All About EOF (2012)

majewsky · on March 12, 2019

Unix applications assume end-of-file when read() returns 0 bytes. Ctrl-D can produce that condition, though only in a rather roundabout way. The article is very wrong about this:

> In fact, the Control-D you type in at the shell to end input is simply a signal to the shell to close the standard input stream.

The shell is not involved in this at all, and no file descriptor gets closed. [0] The magic happens in the kernel's VT (virtual terminal) subsystem. When the user presses Ctrl-D, the terminal emulator (e.g. xterm) writes the corresponding byte sequence [1] into the master side of the pty (pseudo-terminal) device. Upon observing this input, the kernel will immediately answer all outstanding read() syscalls on the slave side of the pty device.

Now usually, when the terminal device is in its default mode (aka "canonical" or "cooked" mode), the kernel buffers user input until a full line is observed (i.e. until the user hits Return), so read()s on the slave side block until Return is hit. Because Ctrl-D makes the kernel answer pending read()s right away with whatever is in the buffer, it can be used to send partial lines to an application. Try this, for example:

1. Run `cat`.

2. Type something and press Enter.

3. Type something and press Ctrl-D.

4. Type nothing and press Ctrl-D.

The last step will cause `cat` to exit because the read() returns 0 bytes (since there is nothing in the kernel buffer), which `cat` interprets as encountering the end of the input file.
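The same experiment can be replayed without a keyboard. This is a minimal sketch, assuming a Linux-like system where Python's `os.openpty()` and `termios` are available. The helper names `make_canonical_pty` and `replay_cat_session` are made up for illustration. The script plays the role of the terminal emulator by writing to the master side, and the role of `cat` by calling read() on the slave side.

```python
import os
import termios


def make_canonical_pty():
    """Open a pty pair and put the slave side in canonical mode with EOF = 0x04.

    These are normally the defaults already; they are set explicitly here so the
    sketch does not depend on them. Echo is switched off to keep the master quiet.
    """
    master, slave = os.openpty()
    attrs = termios.tcgetattr(slave)
    attrs[3] |= termios.ICANON            # lflag: line-buffered ("cooked") input
    attrs[3] &= ~termios.ECHO             # lflag: do not echo input back to the master
    attrs[6][termios.VEOF] = b"\x04"      # cc: the EOF character, i.e. Ctrl-D
    termios.tcsetattr(slave, termios.TCSANOW, attrs)
    return master, slave


def replay_cat_session():
    """Replay steps 2 to 4 and return what each read() on the slave side saw."""
    master, slave = make_canonical_pty()
    seen = []
    try:
        os.write(master, b"first line\n")   # step 2: text, then Enter
        seen.append(os.read(slave, 1024))
        os.write(master, b"partial\x04")    # step 3: text, then Ctrl-D
        seen.append(os.read(slave, 1024))
        os.write(master, b"\x04")           # step 4: Ctrl-D on an empty buffer
        seen.append(os.read(slave, 1024))
    finally:
        os.close(master)
        os.close(slave)
    return seen


if __name__ == "__main__":
    for chunk in replay_cat_session():
        print(repr(chunk))
```

On a typical Linux system I would expect the three reads to see the full line including its newline, then the partial line without any 0x04 in it, then an empty byte string. That last zero-length read is the only thing `cat` ever perceives as "end of file".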

[0] Which wouldn't make sense anyway, since the shell can only close its own file descriptors, but the reading happens in an entirely different process.

[1] I'm not incredibly familiar with the master side of ptys, but I would guess that the concrete byte written is 0x04, since the letter D is 0x44 and the Ctrl modifier (in ye olden days) unset the 0x40 bit, leaving only 0x04.
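The arithmetic in [1] is easy to check. A small sketch, where `ctrl` is a hypothetical helper name and the rule is only the traditional ASCII convention described above, not a claim about what any particular terminal emulator does internally:

```python
def ctrl(letter: str) -> int:
    """Byte traditionally produced by Ctrl+<letter>: the letter's ASCII code with bit 0x40 cleared."""
    return ord(letter.upper()) & ~0x40


if __name__ == "__main__":
    print(hex(ctrl("D")))  # the Unix terminal EOF character discussed here
    print(hex(ctrl("Z")))  # the DOS/CP/M-era marker mentioned elsewhere in this thread
```

The same rule maps Ctrl-Z to 0x1A, which is the \x1A byte that comes up in the Windows part of the discussion.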

JdeBP · on March 12, 2019

> The magic happens in the kernel's VT (virtual terminal) subsystem.

No, it is in the line discipline, which all terminals have, real, pseudo, and kernel virtual.

JdeBP · on March 12, 2019

> So what is the difference between the type command used above and the Notepad application? It’s actually hard to say. Possibly the type command has some special code that checks for the Control-Z character in its input.

Actually, it's very easy to say, and I've been pointing to the code for about a decade.

* http://jdebp.eu./FGA/dos-character-26-is-not-special.html

jokh · on March 12, 2019

Why doesn't Windows drop the purported compatibility with CP/M and get rid of control-z being a special file marker at this point? Lots of legacy code relying on it?

JdeBP · on March 12, 2019

It's not in Windows in the first place. The headlined article does explicitly say this.

It's library code in a large number of applications programs.

* http://jdebp.eu./FGA/dos-character-26-is-not-special.html

Arnavion · on March 12, 2019

FWIW it's generally advised to not open files in text mode on Windows anyway. The handling of \x1A is one reason. The automatic mangling of newlines into Windows newlines is another.

Edit: And while cmd's type does process it, modern stuff like PowerShell's Get-Content does not.
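To make the two text-mode effects concrete, here is a rough emulation in Python. It is only a sketch of the behavior described in this thread, not the code of any actual C runtime, and `read_like_text_mode` is a made-up name. It assumes the simplified rule "stop at the first 0x1A, turn CRLF into LF"; real libraries differ in the details.

```python
CTRL_Z = b"\x1a"


def read_like_text_mode(raw: bytes) -> bytes:
    """Roughly what a program sees when a DOS-style library reads `raw` in text mode."""
    # Everything from the first 0x1A onward is treated as past the end of the file.
    head, _, _ = raw.partition(CTRL_Z)
    # Windows line endings are rewritten to plain newlines.
    return head.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    on_disk = b"one\r\ntwo\x1athree\r\n"
    print(read_like_text_mode(on_disk))  # what a text-mode reader would hand back
    print(on_disk)                       # what a binary-mode reader would hand back
```

In binary mode the bytes come back untouched, which is why binary mode is the safer default whenever the exact file contents matter.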

Thorrez · on March 12, 2019

Someone could even write new code that relies on it.

Thorrez · on March 12, 2019

One weird thing I've found about EOF is that EOF is sometimes not actually the end. You can read an EOF from stdin and then continue reading more data.
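This is not limited to stdin. A minimal sketch with a regular file that grows after the reader has already hit the end (`read_past_eof` is a hypothetical name, and the temporary file is only there for the demonstration):

```python
import os
import tempfile


def read_past_eof():
    """Read a file to its end, let it grow, then read again from the same descriptor."""
    writer, path = tempfile.mkstemp()
    try:
        os.write(writer, b"first")
        reader = os.open(path, os.O_RDONLY)
        try:
            seen = [os.read(reader, 1024)]      # the data written so far
            seen.append(os.read(reader, 1024))  # zero bytes: "EOF", as of this moment
            os.write(writer, b"second")         # the file grows after the reader saw EOF
            seen.append(os.read(reader, 1024))  # the same descriptor now yields more data
        finally:
            os.close(reader)
    finally:
        os.close(writer)
        os.unlink(path)
    return seen


if __name__ == "__main__":
    print(read_past_eof())
```

The empty read in the middle only says that nothing was available at that moment. Nothing about the descriptor changes, so a later read can return data again.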

majewsky · on March 12, 2019

See my sibling comment: https://news.ycombinator.com/item?id=19366521 - In short, EOF is probably not what you think it is.