
Character by character TTY input in Unix, then and now - gbrown_
https://utcc.utoronto.ca/~cks/space/blog/unix/RawTtyInputThenAndNow
======
bluetomcat
Would you agree that the whole TTY subsystem makes very little sense in this
day and age? Why not let programs operate in raw mode by default and provide
user-level library calls for input line buffering?

~~~
jstimpfle
It's important for job control. Like Ctrl+c or Ctrl+z.

~~~
jml7c5
Why not do that screen/tmux style, where the shell takes input and passes it
to the foreground program, unless it's one of those special key sequences?

~~~
jstimpfle
That's not what happens with screen or tmux (or xterm, or sshd on the remote
side of an ssh connection...). These allocate a pseudo terminal (PTY; which is
a structure in the kernel) and connect to the master end. Using that they
emulate a terminal, which has a virtual keyboard and a virtual screen. A shell
(or another program) is started, with its input and output(s) connected to the
slave end.

The virtual keyboard might be X11 key events in the case of xterm, or a
network stream in the case of sshd, or simply the stdin downstream from
another terminal in the case of tmux. That keyboard character stream is fed
into the write fd of the master end of the PTY.

The virtual display is implemented by processing data read from the read fd of
the master end of the PTY. That might involve drawing to an X11 window in the
case of xterm, or sending back data over the network in the case of sshd, or
writing to stdout connected to another terminal in the case of tmux.

The important part is this: The job control is still done in the kernel as
part of the PTY implementation. That implementation is where for example a
Ctrl+c (0x03 byte) is converted to a SIGINT which is then sent to the slave
program.
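
To make that last point concrete, here is a minimal Python sketch (not taken from any of the programs named above). It plays the role of the terminal emulator: it forks a child whose controlling terminal is the slave end of a fresh PTY, then writes a single 0x03 byte to the master end. The function name `sigint_via_pty` and the "ready" handshake are illustrative assumptions; the default termios settings of a new PTY (ISIG on, VINTR = 0x03) are assumed to be the usual Linux ones.

```python
import os
import pty
import signal
import time


def sigint_via_pty() -> int:
    """Fork a child on a fresh PTY, type Ctrl+C at the master end, and
    return the number of the signal that terminated the child (0 if none)."""
    # In the child, the slave end becomes stdin/stdout/stderr and the
    # controlling terminal; the parent gets the master end.
    pid, master_fd = pty.fork()
    if pid == 0:
        try:
            # Restore the default action; Python normally turns SIGINT
            # into a KeyboardInterrupt exception instead of dying.
            signal.signal(signal.SIGINT, signal.SIG_DFL)
            os.write(1, b"ready\n")  # stdout is the slave end
            time.sleep(30)           # wait to be interrupted
        finally:
            os._exit(0)

    try:
        # Wait until the child is set up, otherwise the Ctrl+C could arrive
        # before the terminal has a foreground process group.
        seen = b""
        while b"ready" not in seen:
            seen += os.read(master_fd, 1024)
        os.write(master_fd, b"\x03")  # the "virtual keyboard" types Ctrl+C
        _, status = os.waitpid(pid, 0)
    finally:
        os.close(master_fd)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0
```

Nothing in the parent calls kill(); it only writes a byte. If the returned value equals `signal.SIGINT`, the conversion from byte to signal happened in the kernel's line discipline, as described above.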

------
caf
The fact that the kernel TTY driver is responsible for handling erase in
cooked mode leads to a wrinkle: in order to do that correctly it needs to know
if you're using a multibyte character set, and if so, which one (because a
single erase must erase an entire multibyte codepoint).

Linux uses the (non-POSIX) IUTF8 termios(2) flag to indicate a terminal is
using UTF-8. If this is set on a terminal which isn't UTF-8, you'll see some
odd results - for example, in a terminal using the single byte character set
ISO-8859-1 but with IUTF8 set and in cooked mode, if you type:

    
    
      Ã¡Ã¡
    

and then backspace once and hit enter, the process reading in cooked mode will
receive "Ã¡\n" - the erase has eaten two ISO-8859-1 codepoints because it
thought that they were a single UTF-8 codepoint.

I've never used a non-UTF-8 multibyte terminal (like BIG5 or SHIFT-JIS) but I
don't think erase would work correctly in cooked mode on such terminals.
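
The experiment can be reproduced without a real ISO-8859-1 terminal by feeding raw bytes into a PTY. The following is a rough Python sketch under a few assumptions: a Linux kernel, the helper name `erase_once` (made up for illustration), and a fallback to the Linux value of IUTF8 (octal 040000) in case the `termios` module of a given Python build does not export the constant.

```python
import os
import termios

# Assumption: fall back to the Linux value of IUTF8 if this Python build
# does not export the constant.
IUTF8 = getattr(termios, "IUTF8", 0o40000)


def erase_once(iutf8: bool) -> bytes:
    """Type the bytes C3 A1 C3 A1, one erase, and Enter into a PTY in
    cooked mode; return the line that a reader on the slave end receives."""
    master_fd, slave_fd = os.openpty()
    try:
        # attrs = [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
        attrs = termios.tcgetattr(slave_fd)
        if iutf8:
            attrs[0] |= IUTF8
        else:
            attrs[0] &= ~IUTF8
        attrs[3] |= termios.ICANON   # cooked (canonical) mode
        attrs[3] &= ~termios.ECHO    # echo is not needed for this test
        termios.tcsetattr(slave_fd, termios.TCSANOW, attrs)

        erase = attrs[6][termios.VERASE]  # usually DEL (0x7f)
        os.write(master_fd, b"\xc3\xa1\xc3\xa1" + erase + b"\n")
        return os.read(slave_fd, 1024)
    finally:
        os.close(master_fd)
        os.close(slave_fd)
```

On Linux, `erase_once(True)` should return `b"\xc3\xa1\n"` (two bytes erased, the behaviour described above), while `erase_once(False)` should return `b"\xc3\xa1\xc3\n"` (only the final byte erased).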

------
3pt14159
This is incredibly timely. I've been trying to figure out if it is possible to
create a universal shim that would always give me readline-like behaviour
while supplying input to an ongoing program.

Something like this:

./shim.sh | ./some_horrible_cli.rb

That way I could always rely on my Linux shortcuts. I.e., no "^A^K" when I try
to kill a line; it would just kill the line.

Is this possible?

~~~
dTal
Good news! This already exists. It's called 'rlwrap' and it's awesome.

~~~
3pt14159
Thank you! :)

------
willvk
Correct me if I'm wrong, but I don't think protected mode existed in old
Unix systems, so there was no delineation between user space and kernel
space as described.

Protected mode and the separation of kernel and user space have only been
available since the 80286 processor, introduced in 1982.

Refs:

[https://en.wikipedia.org/wiki/User_space](https://en.wikipedia.org/wiki/User_space)

[https://en.wikipedia.org/wiki/Protected_mode](https://en.wikipedia.org/wiki/Protected_mode)

~~~
icedchai
There is a world outside of Intel, especially back in the 80's and 90's...

~~~
mwfunk
And especially back in the '60s and '70s, when Unix and its predecessors were
originally developed.

------
gumby
The discussion of ssh is interesting and calls to mind the SUPDUP protocol,
which moves all the character echoing, cooked mode, and even some local
editing (including screen movement for Emacs!) to the local (user) end of the
network connection. Might be worth reviving.

------
rubatuga
This really helps me understand how read() works, as I am currently learning C
programming. I didn't know that Ctrl+D still sends everything before it in the
buffer to the read call, or that a read of length zero and EOF are the same
thing.

~~~
majewsky
Ctrl-D is maybe the hardest part of TTYs to emulate if you want to bypass the
accumulated legacy of the kernel-level TTY device API. It's very easy to make
read() return 0 forever, but it's next-to-impossible to make read() return 0
exactly once, except for TTY slaves.
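
Both behaviours (Ctrl-D flushing a partial line, and read() returning 0 exactly once) can be observed on a PTY with a few lines of Python. This is only a sketch: the function name `ctrl_d_demo` is made up, and it assumes a fresh PTY starts in cooked mode with VEOF set to 0x04, as it does by default on Linux.

```python
import os


def ctrl_d_demo() -> list:
    """Feed Ctrl+D (0x04) into the master end of a PTY and record what
    successive read() calls on the slave end return."""
    master_fd, slave_fd = os.openpty()  # fresh PTY, cooked mode by default
    try:
        results = []

        # Text followed by Ctrl+D: the pending bytes are delivered
        # immediately, without a trailing newline.
        os.write(master_fd, b"abc\x04")
        results.append(os.read(slave_fd, 1024))

        # Ctrl+D on an empty line: read() returns zero bytes, which the
        # reader interprets as end of file.
        os.write(master_fd, b"\x04")
        results.append(os.read(slave_fd, 1024))

        # The terminal is still usable afterwards: the "EOF" was a one-off.
        os.write(master_fd, b"more\n")
        results.append(os.read(slave_fd, 1024))
        return results
    finally:
        os.close(master_fd)
        os.close(slave_fd)
```

On Linux the list should come back as `[b"abc", b"", b"more\n"]`: the empty read in the middle is the "EOF", and the stream carries on afterwards, which is the part that pipes and sockets cannot imitate.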

