Newline hell

The Uzebox now has a fully functional emulator! Download and discuss it here.
Jubatian
Posts: 1561
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Newline hell

Post by Jubatian »

This has come up a few times so far, and I am also running into related problems.

I work on Linux using mcedit, and this tool doesn't even recognize the Microsoft-style newline sequence "\r\n". It is a bit annoying, since at every line end I see a symbol for "\r", and I cannot output it myself (so my line endings end up as Unix-style "\n"). So far the only place I have seen problems with Unix newlines is Microsoft's Notepad, but that thing really shouldn't be used for anything serious.
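To see which style a file really uses, and whether it has already become mixed, a small check over the raw bytes is enough. Below is a minimal Python sketch; the helper names are made up for illustration and are not part of uzem or any Uzebox tool. It counts CRLF pairs, bare LFs and bare CRs separately, so a CRLF file, a Unix file and a file mixing both can be told apart before editing.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        # every LF that is not the second half of a CRLF pair
        "lf": data.count(b"\n") - crlf,
        # every CR that is not the first half of a CRLF pair (old Mac style)
        "cr": data.count(b"\r") - crlf,
    }


def line_ending_style(data: bytes) -> str:
    """Return "crlf", "lf", "cr", "mixed" or "none" for the given bytes."""
    kinds = [name for name, n in count_line_endings(data).items() if n > 0]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"


if __name__ == "__main__":
    print(line_ending_style(b"int a;\r\nint b;\r\n"))  # crlf
    print(line_ending_style(b"int a;\nint b;\r\n"))    # mixed
```

Reading the file in binary mode matters here: opening it as text in Python would translate the line endings before the check ever sees them.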

I cannot change editors, for the following reasons:
- I like to work in the console. It is handy: I can press CTRL+O to see the console output at any time, even behind the editor, and Midnight Commander itself is also very useful in my workflow.
- My eyes are overly sensitive; I simply cannot work with anything that enforces a white background, and many GUI applications do (I use a black-background theme for MC). Even when they are themeable, many GUI apps end up with a few (improperly coded) menus and the like showing black text on a black background, or bright text on white, when you try to invert their theme.
- I cannot use Vim because I prefer to keep my Hungarian keyboard layout, on which many of the key combos that operate Vim are simply not present (Hungarian requires 9 accented vowels, which take up most of the keys). I frequently edit Hungarian text with mcedit, and I really wouldn't want to use two editors. (And even then, what about code with Hungarian comments? At the company I work for, most plain-text files use Unix line endings, and we have never had any problem with that, even though most people there use Windows, and various GUI tools even on Linux, to view and edit code.)

Given these constraints, my only option is converting source files back and forth if I want "\r\n" line endings in the end result.
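The back-and-forth conversion comes down to two byte replacements. A minimal Python sketch, assuming only CRLF and LF occur (bare CR is ignored); the function names are hypothetical. Normalizing to LF first keeps the conversion to CRLF from turning an existing "\r\n" into "\r\r\n" when a file is already mixed.

```python
def to_unix(data: bytes) -> bytes:
    """CRLF -> LF. Lines that already end in a bare LF are left as they are."""
    return data.replace(b"\r\n", b"\n")


def to_dos(data: bytes) -> bytes:
    """Any LF or CRLF -> CRLF, without doubling existing CR bytes."""
    # Normalize first: a plain replace on a mixed file would produce "\r\r\n".
    return to_unix(data).replace(b"\n", b"\r\n")


if __name__ == "__main__":
    original = b"void f(void);\r\nvoid g(void);\r\n"
    edited = to_unix(original) + b"void h(void);\n"  # edited in a Unix-only editor
    print(to_dos(edited))  # every line ends in CRLF again
```

The round trip is lossless for a file that was consistently CRLF to begin with; the risk is in forgetting the second step, not in the conversion itself.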

Who uses what, by the way? It seems most of you working on the emulator are quite familiar with Unix tools, so I wonder how these files ended up with Microsoft-style line endings.
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Newline hell

Post by Artcfox »

I noticed some files ended up with mixed line endings when working on your -hacks branches, and also after your changes got pulled into the uzem140 branch.

I've just been running dos2unix and then unix2dos on the files to fix them, but then any changes I make look like they touched more lines than they should. So when I run git diff to see what I changed, I've had to tell it to ignore whitespace with the -w flag.

What needs to happen is to settle on a single line-ending style, fix all the files in the repo, and make that the only change in that check-in.

I use Emacs and it doesn't care what the line endings are as long as they are consistent within the same file. Otherwise it shows ^M at the end of some of the lines. So I don't really care which line ending style we settle on, as long as it remains the same within each file. I believe that Uze uses Windows, which is why most files have DOS line endings.

Since Jubatian's editor is the only one that seems to have a problem with DOS line endings, another option is for him to configure his copy of Git to automatically convert the line endings to Unix when he clones, and then convert them back to DOS when he commits. To be honest, I've never had good luck when people on a team use that setting, because it assumes that any file checked in with CRLF was done so by mistake, rather than because "the project was started on Windows."
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Re: Newline hell

Post by uze6666 »

A lot of people have worked on this project, and everyone has their own way and tools. That said, I can't see how someone could work with Notepad. For me, it's important that whatever format we use works in Eclipse, the tool most commonly used on this project as far as I know. Since I'm the long-term maintainer, it's important that the format works without conversion under Windows.

So if \r\n (CRLF) works in both Windows and Linux (and for you), I have no issue changing them globally. We should probably add a rule on the wiki for that.
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Newline hell

Post by Artcfox »

uze6666 wrote: A lot of people have worked on this project, and everyone has their own way and tools. That said, I can't see how someone could work with Notepad. For me, it's important that whatever format we use works in Eclipse, the tool most commonly used on this project as far as I know. Since I'm the long-term maintainer, it's important that the format works without conversion under Windows.

So if \r\n (CRLF) works in both Windows and Linux (and for you), I have no issue changing them globally. We should probably add a rule on the wiki for that.
He was saying that CRLF did not work in his editor, since it can only add Unix line endings, which is what resulted in the files with mixed line endings.

Everything short of Notepad.exe on Windows can handle Unix line endings, and Unix line endings are actually how Git stores things internally. If you are doing the conversion, I would avoid changing any data files for games, as certain games might expect the CRLF to be present in their data files.
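Settling on one style and converting everything in a single check-in, while keeping game data files untouched, could look like the Python sketch below. It is an illustration under assumptions, not a tool from the Uzebox repository: the extension whitelist, the function names and LF as the default target are all hypothetical choices. With dry_run left at True it only lists the files it would touch, so the list can be reviewed first.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical whitelist: which files count as "source" is a project decision.
SOURCE_EXTENSIONS = {".c", ".cpp", ".h", ".hpp", ".s"}
SOURCE_NAMES = {"Makefile"}


def is_source_file(name: str) -> bool:
    path = Path(name)
    return path.suffix.lower() in SOURCE_EXTENSIONS or path.name in SOURCE_NAMES


def convert_source(name: str, data: bytes, ending: bytes = b"\n") -> Optional[bytes]:
    """Return the converted contents of a source file, or None to leave it alone."""
    if not is_source_file(name):
        return None  # game data files keep whatever line endings they have
    unix = data.replace(b"\r\n", b"\n")
    converted = unix if ending == b"\n" else unix.replace(b"\n", ending)
    return converted if converted != data else None


def normalize_tree(root: Path, ending: bytes = b"\n", dry_run: bool = True) -> List[Path]:
    """Convert every source file under root; return the paths that (would) change."""
    changed = []
    for path in sorted(root.rglob("*")):
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue  # skip Git's own metadata and directories
        converted = convert_source(path.name, path.read_bytes(), ending)
        if converted is not None:
            changed.append(path)
            if not dry_run:
                path.write_bytes(converted)
    return changed
```

Run from the repository root as normalize_tree(Path("."), dry_run=False) and committed with nothing else in it, this would give the conversion-only check-in suggested earlier in the thread.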
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Re: Newline hell

Post by uze6666 »

Ah, OK, sorry, I misread. I thought \n alone was causing the issue. You are right: data files should not be changed, only source files.
Jubatian
Posts: 1561
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Re: Newline hell

Post by Jubatian »

To me it seemed like your software autodetects line endings and corrects things accordingly. It forced me to do a "dirty" merge when pulling in changes on the main branch, since I had changed code in my previous edits (with Unix line endings), and the main branch had all of those converted back to Microsoft line endings. However, a file I newly added (the instruction table) had no such changes; it still has Unix line endings.

I could use a converter to force things back to Microsoft format, but that could again break things, either by forgetting to convert something or by accidentally converting something I didn't mean to (which could then merge in silently).

What could probably be done, and would be safest, is that whenever I open a pull request, I first create a single commit in which I run the converter on the affected files. This could still go wrong (I might forget to convert something, so it goes in with mixed line endings), but at least that commit would make clear what I ran the converter on, and there might be fewer problematic merges caused by "stealthy" conversions.

If nobody has a problem with it, and if some modularizing continues, I would prefer to use Unix line endings on new files I add. (Unix line endings are also better for text content in games, if any: one precious byte saved on each line :D )
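The "one byte per line" remark is simple arithmetic: CRLF spends one extra CR byte per line ending, so a text block with n lines costs n extra bytes if it is stored with its line endings intact. A tiny Python sketch to measure it (the function name and sample text are made up):

```python
def crlf_overhead(text: str, encoding: str = "ascii") -> int:
    """Extra bytes needed to store text with CRLF instead of LF line endings."""
    unix = text.replace("\r\n", "\n").encode(encoding)
    dos = unix.replace(b"\n", b"\r\n")
    return len(dos) - len(unix)  # exactly one CR byte per line ending


if __name__ == "__main__":
    dialogue = "PRESS START\nGAME OVER\nTRY AGAIN?\n"
    print(crlf_overhead(dialogue))  # 3
```

This only matters where the line endings actually end up in the game's data; on a microcontroller with limited flash, every such byte counts.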
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Re: Newline hell

Post by uze6666 »

Jubatian wrote: If nobody has a problem with it, and if some modularizing continues, I would prefer to use Unix line endings on new files I add. (Unix line endings are also better for text content in games, if any: one precious byte saved on each line :D )
Sure, go on. If we encounter problems we'll adjust then.
Jubatian
Posts: 1561
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Re: Newline hell

Post by Jubatian »

Here is a related problem, which also somewhat cripples the code's layout.

The coding style in the emulator uses tabs to indent. The problem is that the tab size is not consistent across platforms (I use a tab width of 8, since it goes best with a lot of leftover stuff from DOS and with assembly files, which might have statements right after a short local label). The problem is not the usage of tabs, but rather their inconsistent usage.

In parts of the code spaces are used for indentation, in other parts tabs, which messes up the layout wherever the tab size is not exactly the same as that of whoever edited it last. When I edit parts of the code, I fix these indentation issues on the affected lines by tabbing the code properly (replacing spaces where they are incorrectly used).

Moreover, when editing you should be aware of tab size not only when indenting, but also when commenting or aligning parts of the code (such as breaking a long statement into multiple lines). To make it look right with any tab size setting, tabs should be used only for indentation (and there, consistently only tabs) and for nothing else. Aligning comments should be done with spaces past the natural indentation of the block (and you should not align comments across lines with different indentation levels, since those will misalign with a different tab setting).
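The rule above can be partly checked mechanically. Below is a minimal Python sketch with hypothetical helper names: indent_problem flags the one case that is always wrong (a tab following a space in the leading whitespace), and indent_column shows where a line's code starts for a given tab size. A per-line check cannot tell whether spaces-only leading whitespace is meant as alignment or as indentation, so it does not try to.

```python
import re

# Allowed leading whitespace under the rule above: tabs for indentation,
# then optionally spaces for alignment. A tab after a space is never allowed.
ALLOWED_INDENT = re.compile(r"\t* *")


def leading_whitespace(line: str) -> str:
    return line[: len(line) - len(line.lstrip(" \t"))]


def indent_problem(line: str) -> str:
    """Return a short description of a definite problem, or "" if none is found."""
    if not ALLOWED_INDENT.fullmatch(leading_whitespace(line)):
        return "tab after space in indentation"
    return ""


def indent_column(line: str, tab_size: int) -> int:
    """Screen column where the code starts in an editor using tab_size."""
    return len(leading_whitespace(line).expandtabs(tab_size))


if __name__ == "__main__":
    tabbed = "\tfoo();"
    spaced = "        bar();"  # 8 spaces, meant to be in the same block
    for size in (8, 4):
        print(size, indent_column(tabbed, size), indent_column(spaced, size))
```

With a tab size of 8 both lines start at column 8; with a tab size of 4 the tabbed line moves to column 4 while the space-indented one stays at 8, which is exactly the misalignment described above.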

In mcedit, properly tabbed code can be visually quite descriptive, as mcedit distinguishes tabs from spaces visually (so the indentation levels of properly tabbed code stand out nicely).

I usually don't use tabs in C code (I find a single space perfectly sufficient for indentation), but I follow whatever style is present in others' code (uzem's coding style is quite different from what I naturally use). I just mention this hoping it will help keep the style consistent, which is an important, although not critical, part of the "cleanness" of the code, and a good thing for long-term maintainability.
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Newline hell

Post by Artcfox »

I'm probably guilty of contributing to the mismatch.

Emacs uses a "smart-indent" mode where I can type the code with no indentation at all, and as soon as I type a parenthesis, semicolon, comment character, or the Tab key, it parses the syntax and automatically indents the code using spaces (replacing all existing tabs on the line with spaces). The tabs drive me crazy, because to insert actual tabs I have to press Ctrl-Q and then Tab each time, and any time I type a parenthesis, a semicolon, or a slash, I first have to escape it with Ctrl-Q, otherwise all of the existing tabs on the line immediately get blown away and replaced with spaces. I try to make sure that I go back and manually change everything back to tabs afterward, but I'm sure I miss some, especially when I'm jumping around between different parts of the code.

When I tried telling Emacs to use tabs, the indenting stayed the same, but it converted any run of 8 consecutive spaces into a tab, and then indented the leftover bit that wasn't on an 8-character boundary using spaces. That is extra terrible, because it assumes that everyone else is using a tab size of 8, and that they want tabs mixed with spaces in exactly this way. Yuck.
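The layout described here can be reproduced in a few lines of Python. This is a sketch of the resulting whitespace only, not of Emacs's own code, and the function names are hypothetical: indentation to a given column becomes one tab per full 8 columns plus spaces for the remainder, and viewing it with a different tab size shows the damage.

```python
def mixed_indent(column: int, assumed_tab: int = 8) -> str:
    """Indent to `column` as described above: one tab per full `assumed_tab`
    columns, then spaces for whatever is left over."""
    tabs, spaces = divmod(column, assumed_tab)
    return "\t" * tabs + " " * spaces


def rendered_column(indent: str, tab_size: int) -> int:
    """Column at which an editor with `tab_size` shows the code after `indent`."""
    return len(indent.expandtabs(tab_size))


if __name__ == "__main__":
    # Two lines meant to sit at columns 12 and 16 (4-column indentation levels).
    for column in (12, 16):
        indent = mixed_indent(column)
        print(column, repr(indent), rendered_column(indent, 8), rendered_column(indent, 4))
```

With a tab size of 8 both lines land where they were typed; with a tab size of 4 both collapse to column 8, so two different nesting levels look identical.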

As a result, I'm used to just using spaces as well, because that's what happens automatically, and I have to fight really hard for it to be anything else.