
Ask HN: How / why did “newline at the end of source file” become a thing? - factorialboy
Question for old-timers, I guess.
======
andreareina
Seems it's not so much that source files must end in a newline, but _lines_
must end in a newline[1]. As to why, I imagine that it makes it easier for
line-oriented tools to not have to special-case files where the last line
doesn't end in \n.

[1] https://gcc.gnu.org/legacy-ml/gcc/2003-11/msg01568.html
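
One place where the special-casing shows up is plain concatenation. The following is only a sketch with made-up in-memory contents (the names `proper`, `improper`, `extra`, `concat` and `lines` are hypothetical), showing how an unterminated last line fuses with whatever follows it:

```python
# Hypothetical file contents, held in memory as bytes for illustration.
proper = b"alpha\nbeta\n"    # every line is newline-terminated
improper = b"alpha\nbeta"    # the last line lacks its terminator
extra = b"gamma\n"

def concat(*parts):
    # Byte-wise concatenation, which is what `cat a b` does.
    return b"".join(parts)

def lines(data):
    # Split into lines, dropping the terminators.
    return data.splitlines()

print(lines(concat(proper, extra)))    # [b'alpha', b'beta', b'gamma']
print(lines(concat(improper, extra)))  # [b'alpha', b'betagamma']
```

With the terminator in place, the tool needs no knowledge of where one file ends and the next begins.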

~~~
factorialboy
Haha.. nice to know, thanks for sharing!

------
BjoernKW
While not the original motivation for me personally it's a UX advantage:

If a file I'm editing has a newline at the end, I can skip to the end of the
file and then move upwards again line by line, with the cursor ending up at a
defined position (i.e. the first column of that line) each time.

If, however, there's no newline, the same routine leaves the cursor at a
position determined by the arbitrary length of the last line in that file,
which for me at least is not what I want as a user in most cases.

Then again, maybe that's just weird muscle memory acquired through long-term
exposure to how text editors usually work.

------
bloak
How else could you distinguish between a file containing zero lines and a file
containing one empty line?

(That sounds a bit Zen, doesn't it?)

~~~
AnonHP
Maybe I didn't understand the question, but a file containing zero lines can
be stored on all operating systems. It would be a file of size zero bytes and
would show up in directories as such. A "> filename" command (or even a
"touch" command with a new filename) run on the shell is an easy way to create
such a file. A simple file containing one empty line would contain just the
newline character (depending on the OS, this may be LF or CR-LF or CR) and
would be one or two bytes in size. On disk, both these files would occupy the
minimum allocated block size by the OS for files.

~~~
bloak
Yes, I think you didn't understand. The question was "why". If newline
characters (hypothetically) separated lines rather than terminated them, like
how commas separate fields in some situations rather than terminate them,
then, in that hypothetical situation, you would not be able to distinguish
zero lines from one empty line. This may not be the historical reason for non-
empty files ending with a newline, but it's a motivation for that rule.

In most of the cases where commas separate things either those things are not
allowed to be empty or there has to be at least one thing, so there's no
problem. But you have to watch out for cases in which you don't have those
conditions because then you get the ambiguity.
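
To make the ambiguity concrete, here is a small sketch comparing the two conventions. The function names `join_separated` and `join_terminated` are made up for illustration:

```python
def join_separated(lines):
    # Hypothetical convention: the newline sits *between* lines, like a comma.
    return "\n".join(lines)

def join_terminated(lines):
    # Unix convention: every line is followed by a newline.
    return "".join(line + "\n" for line in lines)

zero_lines = []
one_empty_line = [""]

# As a separator, both cases serialize to the same empty file.
print(repr(join_separated(zero_lines)))       # ''
print(repr(join_separated(one_empty_line)))   # ''

# As a terminator, the two cases stay distinct.
print(repr(join_terminated(zero_lines)))      # ''
print(repr(join_terminated(one_empty_line)))  # '\n'
```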

------
kazinator
In Unix, a text file is defined as a sequence of zero or more lines, each
terminated by a newline.

If the file has any lines at all, then the last line is terminated by a
newline, for the same reason as the first and any other line.

If the last line is missing a newline, then it's an improper text file.

An empty file is just no characters at all. If a file contains nothing but a
newline character, then it's a one-line file containing an empty line.

If the newline were optional at the end of a file, then those two cases (empty
file versus one-line file with optional newline missing) would not be
distinguishable.

The text file representation has to have some sort of unambiguous framing
which indicates the presence of each line. Unix chose the terminating newline
as that framing.
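
As a rough sketch of that definition (the helper names `is_proper_text` and `count_lines` are hypothetical), a file is well-formed if it is empty or ends in a newline, and counting lines amounts to counting terminators, which is also how `wc -l` counts:

```python
def is_proper_text(data):
    # Zero or more lines, each terminated by a newline:
    # the data is either empty or ends with b"\n".
    return data == b"" or data.endswith(b"\n")

def count_lines(data):
    # Count terminators rather than "things between newlines".
    return data.count(b"\n")

for sample in (b"", b"\n", b"a\nb\n", b"a\nb"):
    print(sample, is_proper_text(sample), count_lines(sample))
```

The last sample is the improper case: it has two visible rows of text but only one complete line.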

A lot of tools come from the Unix environment. C came from Unix, and the C
<stdio.h> text streams follow Unix conventions when operating in text mode,
regardless of platform. That is to say, unless you _fopen_ a file in binary
(_"b"_) mode, text mode is in effect, and text mode means newline-terminated
lines. The programming model that you see when manipulating text streams in C
is that of lines terminated by '\n', and nothing else. This is true even if
you are on Windows, where the actual file has "\r\n" termination and, in some
legacy files, character 26 (Ctrl-Z) at the very end.

C ate the world, especially in the area of tooling, and so that representation
and its associated concepts have spread. Other languages imitate the model.

A lot of languages have been written in C in the last 30 years, and it's easy
for such languages to be influenced by C's conventions. For instance, I think
that Python has essentially the same newline-terminated-line view of text
files, on any platform.
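
A quick way to check the Python part is a sketch like the one below. It writes a temporary file with "\r\n" line endings and reads it back in text mode, relying on the default `newline=None` behaviour of `open()`:

```python
import os
import tempfile

raw = b"one\r\ntwo\r\n"  # Windows-style line endings on disk

fd, path = tempfile.mkstemp()
try:
    # Write the raw bytes untouched (binary mode).
    with os.fdopen(fd, "wb") as f:
        f.write(raw)
    # Read back in text mode: "\r\n" is translated to "\n".
    with open(path, "r", encoding="ascii") as f:
        text = f.read()
    # Iterating a text file yields newline-terminated lines.
    with open(path, "r", encoding="ascii") as f:
        lines = list(f)
finally:
    os.remove(path)

print(repr(text))  # 'one\ntwo\n'
print(lines)       # ['one\n', 'two\n']
```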

Language designers want to give programmers a sane, platform-independent model
for working with text files, and the inspiration for that comes from Unix via
C more than anything else, historical circumstances being what they are.

Plus the text file model is very simple. It has no special case for the last
line (no file terminator), and uses a single character for the framing, which
is a lot simpler to program with than a two character sequence like CR-LF,
where either of the two characters can be missing, or they can be in the wrong
order: all these cases to handle in every situation where you're scanning
across lines.

------
undecisive
I'm quite glad it is a thing - not sure whether this is the reason, but I
quite like creating small files using the `cat` command:

    cat - > some_filename.txt

To finish the file, you have to add a newline and then, at the start of that
line, press Ctrl-D.

So with tools where you have to insert an EOF manually like that, it's
impossible to have a file that ends without a newline.

~~~
jolmg
Not exactly.

`cat` stops reading when read(2) returns 0 bytes (read(2) doing that is what's
interpreted as EOF). Ctrl-D doesn't insert anything. Rather, it flushes the
terminal's line buffer (the one that's filled with user input) into the
underlying program's input.

The terminal does line buffering by default to allow the user to modify the
line before submitting it (e.g. deleting with backspace or Ctrl-W). That means
that it automatically flushes when reaching the end of the line (i.e. the
newline). If you flush again with Ctrl-D at the start of a line, cat's call to
read(2) will return 0 bytes, since you're flushing an empty buffer.

That means that:

> So with tools where you have to insert an EOF manually like that, it's
> impossible to have a file that ends without a newline.

is false. It's not impossible, just impractical. You can have a file with an
incomplete line at the end by starting a line and hitting Ctrl-D twice: once
to submit the incomplete line, and then again to flush the empty buffer,
thereby indicating EOF.
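
The "zero-length read means EOF" part can be seen without a terminal. This sketch uses a pipe instead of a tty (so there is no Ctrl-D or line buffering involved, which is a simplification) and a read loop in the style of `cat`:

```python
import os

r, w = os.pipe()

# Write an incomplete line (no trailing newline), then close the write end.
os.write(w, b"no newline here")
os.close(w)

chunks = []
while True:
    chunk = os.read(r, 4096)
    if chunk == b"":   # a zero-length read is how EOF shows up
        break
    chunks.append(chunk)
os.close(r)

data = b"".join(chunks)
print(data)  # b'no newline here'
```

Nothing in the stream marks the end; the reader just sees a read that returns nothing.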

~~~
undecisive
Well I never... very interesting. Thanks!

------
ddgflorida
Keep in mind if you are writing software to read text files, that last newline
may not be there.
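
A minimal sketch of a reader that tolerates this, using in-memory streams for illustration (the helper name `read_lines` is made up):

```python
import io

def read_lines(stream):
    # Yield each line without its terminator, whether or not
    # the final line ends with a newline.
    for line in stream:
        yield line[:-1] if line.endswith("\n") else line

print(list(read_lines(io.StringIO("a\nb\n"))))  # ['a', 'b']
print(list(read_lines(io.StringIO("a\nb"))))    # ['a', 'b']
```

Stripping only a trailing "\n", rather than all trailing whitespace, keeps lines that legitimately end in spaces intact.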

