Unix applications assume end-of-file when read() returns 0 bytes. Ctrl-D can generate such a condition in a quite contrived way. The article is very wrong about this:
> In fact, the Control-D you type in at the shell to end input is simply a signal to the shell to close the standard input stream.
The shell is not involved in this at all, and no file descriptor gets closed. [0] The magic happens in the kernel's tty subsystem, specifically in the line discipline. When the user presses Ctrl-D, the terminal emulator (e.g. xterm) writes the corresponding byte sequence [1] into the master side of the pty (pseudo-terminal) device. Upon observing this input, the kernel immediately completes any outstanding read() syscall on the slave side of the pty device, handing over whatever input has been buffered so far.
Now usually, when the terminal device is in its default mode (aka "canonical" or "cooked" mode), the kernel will buffer user input until a full line is observed (i.e. until the user hits Return), so read()s on the slave side will block until Return is hit. Due to the behavior above, Ctrl-D can be used to send partial lines to an application. Try this, for example:
1. Run `cat`.
2. Type something and press Enter.
3. Type something and press Ctrl-D.
4. Type nothing and press Ctrl-D.
The last step will cause `cat` to exit because the read() returns 0 bytes (since there is nothing in the kernel buffer), which `cat` interprets as encountering the end of the input file.
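If you'd rather see steps 3 and 4 without a human at the keyboard, here is a minimal sketch that plays both roles itself: it writes to the master side like a terminal emulator would and reads from the slave side like `cat` would. It assumes Linux and Python's standard `pty`/`termios` modules; `read_slave` and `demo_ctrl_d` are names I made up for illustration, and the 0x04 byte is the guess from footnote [1], which the sketch sets explicitly as the EOF character.

```python
import os
import pty
import select
import termios


def read_slave(fd, timeout=2.0):
    """Wait until the tty has something to report, then do a single read()."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        raise TimeoutError("tty did not become readable")
    return os.read(fd, 1024)


def demo_ctrl_d():
    master, slave = pty.openpty()
    try:
        # Force the settings described above: canonical mode, EOF char 0x04.
        attrs = termios.tcgetattr(slave)
        attrs[3] |= termios.ICANON        # lflag: line-buffered input
        attrs[3] &= ~termios.ECHO         # lflag: don't echo input back
        attrs[6][termios.VEOF] = b"\x04"  # cc: Ctrl-D is the EOF character
        termios.tcsetattr(slave, termios.TCSANOW, attrs)

        os.write(master, b"partial\x04")  # step 3: text, then Ctrl-D
        first = read_slave(slave)
        os.write(master, b"\x04")         # step 4: Ctrl-D on an empty line
        second = read_slave(slave)
        return first, second
    finally:
        os.close(master)
        os.close(slave)


if __name__ == "__main__":
    print(demo_ctrl_d())
```

The first read should come back with the partial line and no trailing newline, the second with zero bytes. Nothing was closed in between; both descriptors stay open the whole time.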
[0] Which wouldn't make sense anyway, since the shell can only close its own file descriptors, but the reading happens in an entirely different process.
[1] I'm not incredibly familiar with the master side of ptys, but I would guess that the concrete byte written is 0x04, since the letter D is 0x44 and the Ctrl modifier (in ye olden days) unset the 0x40 bit, leaving only 0x04.
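The bit arithmetic in [1] is easy to check. A tiny sketch (the helper name `ctrl` is made up here, and clearing only the 0x40 bit follows the description above rather than any particular terminal's implementation):

```python
def ctrl(letter):
    """Control code for an ASCII letter, obtained by clearing the 0x40 bit."""
    return ord(letter.upper()) & ~0x40


if __name__ == "__main__":
    print(hex(ctrl("D")))  # Ctrl-D
    print(hex(ctrl("Z")))  # Ctrl-Z
```

The same rule maps Z (0x5A) to 0x1A, which is the Ctrl-Z byte that comes up on the Windows side of this discussion.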
> So what is the difference between the type command used above and the Notepad application? It’s actually hard to say. Possibly the type command has some special code that checks for the Control-Z character in its input.
Actually, it's very easy to say, and I've been pointing to the code for about a decade.
Why doesn't Windows drop the purported compatibility with CP/M and get rid of control-z being a special file marker at this point? Lots of legacy code relying on it?
FWIW it's generally advised to not open files in text mode on Windows anyway. The handling of \x1A is one reason. The automatic mangling of newlines into Windows newlines is another.
Edit: And while cmd's type does process it, modern stuff like PS's Get-Content does not.
One weird thing I've found about EOF is that EOF is sometimes not actually the end. You can read an EOF from stdin and then continue reading more data.
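On a terminal this falls out of the mechanism described upthread: the 0-byte read is just an empty line being flushed, so nothing stops the next read() from returning data. A rough sketch, assuming Linux and Python's standard `pty` module, with the made-up helper names `read_slave` and `read_after_eof`, and relying on a fresh pty starting out in canonical mode with 0x04 as its EOF character:

```python
import os
import pty
import select


def read_slave(fd, timeout=2.0):
    """Wait until the tty has something to report, then do a single read()."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        raise TimeoutError("tty did not become readable")
    return os.read(fd, 1024)


def read_after_eof():
    master, slave = pty.openpty()
    try:
        os.write(master, b"\x04")    # Ctrl-D on an empty line
        eof = read_slave(slave)      # the "EOF": a 0-byte read
        os.write(master, b"more\n")  # the user keeps typing anyway
        later = read_slave(slave)    # same descriptor, still readable
        return eof, later
    finally:
        os.close(master)
        os.close(slave)


if __name__ == "__main__":
    print(read_after_eof())
```

Whether a given program ever notices depends on whether it keeps calling read() after the first 0-byte result; most stop there.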