Why Is It So Hard to Detect Keyup Events on Linux? (robertelder.org)
148 points by robertelder on Jan 27, 2019 | 131 comments



I mean, that there's no key up event in tty context goes back to the beginning of time, and is not Linux's fault, and not even Unix's (nor VMS's) fault, or anyone's fault, because this all works the way it does because of how terminals worked, and they worked the way they did because it was simple.

If, in the late 70s, key up events in tty context had been important, then terminal vendors would have developed an escape sequence system for expressing those. But it wasn't, so they didn't.

So, yes, it's absolutely impossible to get key up events in tty context, and at this point it will almost certainly stay that way forever, as it's too late to retrofit terminal emulators, drivers, and applications to understand whatever protocol for communicating key up events. (The right way to do this would be to develop an escape sequence protocol that the tty/pty drivers could decode and turn events into out-of-band events to be delivered via ioctl()s, that way applications that don't care about key up events don't see them and don't need to be modified. But who is going to do all that work?)


> So, yes, it's absolutely impossible to get key up events in tty context, and at this point it will almost certainly stay that way forever, as it's too late to retrofit terminal emulators, drivers, and applications to understand whatever protocol for communicating key up events.

Why would that be? It seems like it would just be a matter of adding another termcap entry for your specific terminal emulator, and hoping other people care enough to start implementing it in theirs.

There's already no particularly consistent standard for most terminal events -- just a database of (I think Turing-complete) stack VMs that tell programs how to parse whatever they read from the terminal emulator they're running under. Adding another string-named property to this table that describes key-up events would not be overly difficult.


Hi

> There's already no particularly consistent standard for most terminal events

This is not really true anymore: almost all terminals still in use (and certainly all that are widely used) are VT100 compatible. In fact most are broadly compatible with xterm. So that is pretty much the standard now, and most new features being added to terminals don't take any account of other types of terminal at all.

> It seems like it would just be a matter of adding another termcap entry for your specific terminal emulator, and hoping other people care enough to start implementing it in theirs.

termcap was superseded by terminfo around the late 80s. A few applications still use it, but most are now terminfo. The termcap database on most systems is now derived from terminfo.

terminfo is centrally managed as part of ncurses, and stuff doesn't really get added to it speculatively. The way it usually works is that one or more terminals add a feature; if it makes sense and is popular, applications will use it, and if applications use it then often it will end up in terminfo entries. Definitely if ncurses uses it, possibly if not (this can be controversial).

You are right that there is no real reason a terminal couldn't send an escape sequence for key up, although it would certainly have to be something requested by the application - there are plenty of other escape sequences like that (eg the mouse, focus events).

The tty doesn't really come into it, nor do the kernel or drivers, except in the case of the kernel's own console terminal, and not many people use that very much.

SGR mouse mode is an example of a terminal feature added relatively recently which has been taken up widely.
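To make the opt-in pattern concrete, here is a minimal, illustrative C sketch (my own, not from the comment above) of how an application requests SGR mouse reporting and parses one report. It assumes the terminal is already in raw mode and supports xterm's ?1000/?1006 modes, and it omits error handling:

    /* Sketch: opt in to SGR mouse reporting (xterm ?1000 + ?1006) and parse
       the reports the terminal sends back.  Assumes stdin is already in raw
       mode; cleanup and error handling omitted. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        printf("\033[?1000h\033[?1006h");   /* ask for button press/release reports */
        fflush(stdout);

        char buf[64];
        ssize_t n = read(STDIN_FILENO, buf, sizeof buf - 1);
        if (n > 0) {
            buf[n] = '\0';
            int btn, x, y; char final;
            /* SGR reports look like ESC [ < btn ; x ; y M (press) or ... m (release) */
            if (sscanf(buf, "\033[<%d;%d;%d%c", &btn, &x, &y, &final) == 4)
                printf("button %d at %d,%d (%s)\r\n", btn, x, y,
                       final == 'M' ? "pressed" : "released");
        }

        printf("\033[?1006l\033[?1000l");   /* turn reporting back off */
        fflush(stdout);
        return 0;
    }

A hypothetical key-up mode could follow exactly the same shape: the application asks for it, the terminal answers with escape sequences, and programs that never ask never see them.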


> This is not really true anymore, almost all terminals still in use (and certainly all that are widely used) are VT100 compatible.

VT100 is a tiny subset of what's defined in termcap.


My point is that for new features, like key up would be, it is not necessary to be bound by terminfo. If some or all of xterm, VTE, konsole, rxvt-unicode and iterm2 supported it, that is a large proportion of the modern *nix terminals that people still use. And they are all VT100 compatible terminals. If you look at a lot of recent features, such as SGR mouse mode, RGB colour, underline styles, focus events - they are all based on VT100 style escape sequences, and they all started with a few terminals and existed for a good while (with applications using them) before terminfo added any sort of support.


There are no terminals still in widespread use that are not VT100-compatible, by which I mean they use the escape sequence protocol originally used in the DEC terminals, although of course it has been much extended.


this kind of highlights one of the differences between FOSS and paid software. “not our fault” “too late to fix it”

versus a manager forcing you to fix the damn problem nobody cares whose fault it is.

of course this difference is not innate, just a tendency.


Eh? No, no one can fix this. Oracle couldn't fix this in Solaris. Microsoft can't fix this either (in Windows, in Linux, not at all).

It's not a question of money, or managerial will-power. You're asking to boil a great lake, and if you have the money for it, you will almost certainly find better uses for it. There is no market demand for a "fix" here, so there cannot be any managerial will-power either.

Nor is this due to Unix being open source back in the 70s, because it simply wasn't open source (and save for the erstwhile OpenSolaris, it still isn't). It wasn't entirely a research vehicle either, as AT&T used Unix as the OS for its phone switches. VMS too could not have had this feature, and for the same reasons -- though I'm not familiar enough with VMS, I feel absolutely certain that VMS did not have this either. And the hardware terminals (they were all hardware terminals back then, as there were no graphical terminals in which to emulate a terminal) were decidedly not open-source, and their vendors were in it for the profit.

This, and all the wonky, hacky things to do with terminals, goes back to the way things were almost half a century ago, long long before "FOSS" even came to be.

Software business models have absolutely nothing to do with why you can't have key up events in ttys.

EDIT: And incidentally, windowing systems happened on the scene around 1984-1985. By 1987 even SVR2 had a windowing system in the AT&T Unix PC. But it was too late to retrofit key up events into ttys -- already by then there was over a decade of legacy, and hardware terminals continued to be a thing for some time yet. This is a crucial lesson in engineering: even relatively small amounts of legacy have profound effects on what is feasible, so often you have to get things right in the very beginning (precisely when it's hardest to get things right!).


> No, no one can fix this.

The nice thing about escape sequences is that they are ignored if not understood.

> Microsoft can't fix this either (in Windows

Because it's not an issue in Windows? The Windows console, as primitive as it is, supports monitoring the status of the keyboard (which can be used to infer key up events).


MSFT is adding a pty driver and support for Unix-style applications.


> The nice thing about escape sequences is that they are ignored if not understood.

This is not what I find when I press my arrow keys in a container terminal set up with no TERM variable. I get funky characters spewed across the input line.


> No, no one can fix this.

I don't know about this mythical "one", but I can. I just see no reason to fix it.

It is not so hard to make a patch for /usr/src/linux/drivers/tty/vt/vt.c to make it accept one more escape sequence which would switch the terminal into a special state, where the terminal will send keyup and keydown escape sequences instead of raw keys for each keypress. A week of tinkering with vt.c and voilà, the terminal can send keyup and keydown. (Plus maybe one more week to decide how best to encode keyups and keydowns, so that ideally the terminal would still be usable even after the wrong escape sequence is sent to it at the wrong time.)

The right question is: and then what? Why might I wish to do this? I might invent some application for this new feature, but I'm pretty sure I couldn't invent something useful that can't be done without a keyup event.


That doesn't fix it, on its own. You now also need to make every library and abstraction layer that the packages on your system use be aware of this new feature as well. Can an ncurses program now detect a keyup through this? Can a Docker PTY? Can the JVM? Etc.

In most of those cases, because of the historical precedent, these libraries/runtimes don't even have a concept of "keyup" for their TTY-like abstraction, so adding this to them may entail a bit more work than it sounds—especially if, like the JVM, they have a static interface/protocol built in to specify the API of a "TTY-like object", which cannot be changed, since third-parties have compiled code against it and expect that code to remain ABI-compatible with new versions of the runtime.


Exactly. There's more to it. And yes, it can be done, literally, and I'm sure several participants in this thread could do it. But there's practically no demand, so it won't get done.


> You now also need to make every library and abstraction layer that the packages on your system use be aware of this new feature as well.

No. You miss the point. I need to do nothing. I have no reason to do anything. Give me a reason and then we can discuss what I need to do. I need a rationale to do something; without a rationale my work would be pointless, and you might argue indefinitely that anything I've done already is not enough.


You are the one that said, above, that you could—and I quote—"fix this."

"This", in that phrase, stands for the problem in the OP post title—"it being so hard to detect keyup event on Linux."

And "on Linux", in the OP post, is a shorthand for "in all the arbitrary applications that exist in the Linux userland, or that one might choose to write in the Linux userland." (You can tell that it has this meaning, because under the more literal meaning of "on Linux"—working within the Linux kernel itself—it is already quite easy to detect keyup events.)

Obviously, fixing the entire Linux userland is an unbounded amount of work. That is precisely why the other commenters above are saying that it cannot be done. Not "done" as in "worked upon", but "done" as in "finished."


key-up could give you all sorts of interesting information: you could detect "false zeros" where there are extremely rapid up-down sequences suggestive of motor function disorders in the user (Parkinsonism, etc). You could test the forearm extensors. You could test motor units to exhaustion (hold key until press fails, the "keyup" event). We haven't even gotten to the spinal cord yet, so I'm pretty sure the frontal cortex could be engaged in all sorts of interesting 3rd order ways.


Yes, but why might one want to do it in a terminal? You can start an Xorg server or Wayland and do it in graphics, where there are no such constraints.


There are terminal emulators that can display images, make clickable hyperlinks, track the mouse cursor, and dynamically update the title bar with custom text. None of that was possible in the 70s. To say that this is impossible is to ignore how terminal emulators have actually evolved over the years.


A typical modern terminal emulator running under X will export WINDOWID, which is the X window id of the terminal emulator's window. With this you could detect key-up events. I've never seen anybody do this though. The only uses of this I've ever seen are detecting when the terminal emulator goes out of focus and drawing images on top of the terminal emulator. I've not done a detailed survey, but I believe this used to be the most common way of displaying images "in" terminal emulators before terminal emulators started supporting inline images encoded as escape sequences (e.g. iterm2).
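For illustration, a rough and untested C/Xlib sketch of that WINDOWID idea: select key events on the terminal emulator's window and watch for KeyRelease. It assumes an X11 session and a terminal that actually exports WINDOWID (build with `cc sketch.c -lX11`):

    /* Rough sketch: watch key press/release events on the terminal
       emulator's own X window, identified by the WINDOWID environment
       variable.  Untested illustration, not a recommendation. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <X11/Xlib.h>

    int main(void) {
        const char *id = getenv("WINDOWID");
        if (!id) { fprintf(stderr, "no WINDOWID in environment\n"); return 1; }

        Display *dpy = XOpenDisplay(NULL);
        if (!dpy) { fprintf(stderr, "cannot open display\n"); return 1; }

        Window win = (Window)strtoul(id, NULL, 0);
        /* Key events are not an exclusive mask, so another client may also select them. */
        XSelectInput(dpy, win, KeyPressMask | KeyReleaseMask);

        for (;;) {
            XEvent ev;
            XNextEvent(dpy, &ev);
            if (ev.type == KeyPress)
                printf("key down: keycode %u\n", ev.xkey.keycode);
            else if (ev.type == KeyRelease)
                printf("key up:   keycode %u\n", ev.xkey.keycode);
        }
    }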

The problem with having the terminal emulator implement it is getting the terminal emulator to know when to insert these new key-up escape codes into the pseudo-terminal. The terminal emulator shouldn't send them to the pseudo-terminal when it's in raw mode, because every existing program that runs in raw mode won't know what to do with them. And inserting them into a cooked mode pty simply makes no sense. So it becomes clear that the terminal needs a "rawer than raw" (still bleeding?) mode, and some way for the program requesting bleeding mode to communicate that desire to the terminal emulator. Right now, that's not how the whole stack works. A program tells the pseudo-terminal (kernel) whether it wants raw or cooked mode; that's not something the terminal emulator is involved in.

You could potentially have a terminal emulator that accepts a proprietary "toggle bleeding mode" escape sequence, at which point the terminal emulator starts inserting key up escape codes into the pseudo-terminal, which it assumes has been placed into raw mode appropriately. There's not really any technical reason this couldn't be done, but there's a strong "nobody needs this" social reason for why it hasn't been done. Sure, if you want responsive controls while playing aalib Quake in your terminal emulator across ssh, you'd want this. But who does that as anything other than a gimmick?


Making links clickable requires no changes to the protocols spoken -- the emulator merely recognizes URIs. I know about mouse support. Updating title bars actually goes back a long time. Displaying images... I've never seen -- can you tell me about that?

I've actually outlined in this thread how to implement key up events, if that really was desired, in ttys, but I argue there's no real demand, and it won't happen. And it is impossible now in the sense that... the functionality does not exist.


I’m not just talking about clickable URLs, but actual hyperlinks where the visible title is different from the destination URL.

I don’t know what there is to tell about displaying images. There are terminal emulators that can display images inline when a program emits the image data properly.

I don’t understand your final sentence. How does it make sense to say that it’s impossible because it doesn’t currently exist? That’s like saying it’s impossible for me to drive because I’m not currently in a car. You didn’t say that nothing currently supports it, you said “no one can fix this.”


I wrote this:

> So, yes, it's absolutely impossible to get key up events in tty context, and at this point it will almost certainly stay that way forever, as it's too late to retrofit terminal emulators, drivers, and applications to understand whatever protocol for communicating key up events. (The right way to do this would be to develop an escape sequence protocol that the tty/pty drivers could decode and turn events into out-of-band events to be delivered via ioctl()s, that way applications that don't care about key up events don't see them and don't need to be modified. But who is going to do all that work?)

So "it's impossible [right now]" and "it can be done" but "it's not gonna happen". Is that not a fair summary? Perhaps we can disagree as to the last part.

EDIT: Ah yes, further below I wrote this:

> Eh? No, no one can fix this. Oracle couldn't fix this in Solaris. Microsoft can't fix this either (in Windows, in Linux, not at all).

> It's not a question of money, or managerial will-power. You're asking to boil a great lake, and if you have the money for it, you will almost certainly find better uses for it. There is no market demand for a "fix" here, so there cannot be any managerial will-power either.

That's overbroad, yes. The fix I outlined is not as much as boiling great lakes.


I wouldn’t be at all surprised if it happened. I wouldn’t be surprised if it never happened, either, but I don’t think it’s at all a fair summary to say it definitely never will. It’s not at all reasonable to call it impossible, and Oracle, Microsoft, or random Linux developers could all do it if they felt like it.


Fair enough.


Terminals have been able to display images since the VT200:

https://en.wikipedia.org/wiki/Sixel


Tektronix had graphical terminals in the 70's. Still emulated by xterm.


Terminology, the terminal emulator from Enlightenment, will display images and do other things that xterm never could:

https://www.enlightenment.org/about-terminology.md


>things that xterm never could

Ah, but can Terminology render Tektronix vector graphics? :)


Thanks!


iterm2 supports inline images with a proprietary set of escape codes described here: https://www.iterm2.com/documentation-images.html

Incidentally if you ssh into an untrusted system using iterm2 (or otherwise get foreign escape codes going into iterm2, for instance if you cat the wrong file), I believe it's possible for that system to drop files into your ~/Downloads/ directory without prompting you.

It could be worse though, there used to be terminal emulators you could exploit by setting the title, because they weren't bounds checking correctly.


> No, no one can fix this

It's been solved twice, but I'd like to get that down to just once :)

https://sw.kovidgoyal.net/kitty/protocol-extensions.html#key...

https://gitlab.com/gnachman/iterm2/issues/7440


There's a left out qualification here: "without breaking backwards compatibility". Sure, you can fix it, you just might have to change the entire stack a little bit and say goodbye to running all the programs currently out there. And then you can have a subsystem for backwards-compatibility, or in the worst case, a VM.


There is a ton of value in legacy. Maintaining backwards compatibility is a big deal. At some point you end-of-life a feature, but you generally don't want to make breaking changes. The tty is probably going to be with us for many more years yet (decades), so I don't see it getting end-of-lifed.


It is a big deal, but honestly as somewhat of a visionary I would rather see an open source platform that decided to cut ties with such a legacy system that causes short term pains in favor of longer term advantages. That transition has to happen at some point for a platform to become more versatile, and if it's put off indefinitely, that just resigns itself to being not the best possible platform it can be.


The thing you have to keep in mind is that once you start cutting ties with legacy systems, breaking changes will keep happening. Today, it's key-up events. Tomorrow, it'll be something else. And eventually, the new and shiny subsystem you introduced today will be considered legacy code by someone else, and it too will be tossed out and replaced.

In the end, you'll be choosing between a platform that requires you update every piece of software you maintain every six months and a platform that doesn't. And while the former might seem better from a pure-ideals perspective, the latter is vastly preferable for actual usage. (That's why so many people use Windows, and it's why I now run Linux on my MacBook.)


That's an interesting perspective, but it seems unlikely that that is the main or even an important advantage Linux has over Windows. After all, you can't really run old Linux binaries on a new Linux system without strong determination and a lot of time. There are many other advantages, with its (essentially) fully open source nature definitely being up there.


You're correct that this is not an advantage Linux has over Windows; both Windows and Linux strive to maintain strong backwards-compatibility guarantees. It's an advantage they both have over systems like macOS.

And while Windows maintains near-perfect binary compatibility (I've run programs on Windows 10 that were last compiled a decade ago), Linux instead has excellent source compatibility. I've successfully compiled and used decade-old Linux utilities with no problems.

This difference speaks to the different economic philosophies these platforms have. Microsoft expects you to buy software in a box at a store and use it forever. Linux expects you to use free software developed by volunteers. And Apple expects you to pay over and over again to avoid losing functionality.


The Linux kernel is famous for trying really hard to maintain ABI backwards-compatibility.

In practice this means that statically-linked executables from a decade ago will run just fine on Linux today, at least as far as interactions with the kernel are concerned (interactions with other parts of the system, it depends). The same is not necessarily true of dynamically-linked applications, because user-space libraries' maintainers are not as exacting about backwards-compatibility. Note that this is not an argument for static linking, at least not in its current form.


You forgot to justify why you need to have interesting features in a tty instead of in a graphical environment, where there is no such limitation.


> Sure, you can fix it, you just might have to [...] say goodbye to running all the programs currently out there.

One problem with doing this is that the only reason anyone uses Linux is to run programs that are currently out there. If Linux started to make breaking changes like this, people would switch to some other operating system that didn't break their software.

Sure, lots of people use macOS, and macOS breaks a bunch of APIs every year. But macOS users also buy expensive recurring subscriptions for every app they use, so Mac software companies can afford to employ developers to clean up after Apple every year. Linux doesn't have that; a lot of Linux software is either unmaintained or maintained by volunteer developers in their spare time.


Isn't it a bit cynical then that a VT100 terminal emulator and the unix shell are still my biggest productivity boost as a programmer? The Windows development environment is a terrible mess. Its command window and batch script language are barely usable.

The Windows guys didn't care to fix this problem in a long long time. While Unix/FOSS platforms have traditionally been very strong in developers tooling.

If you look in Windows more generally, they seem to be more wary not to break compatibility than Linux. That's because they need to run ancient programs that cannot be recompiled. I would say that "too late to fix it" is actually a problem with proprietary software, not FOSS.


This isn't a super fair characterization. Visual Studio is miles ahead of Xcode or anything available on Linux as far as debuggers are concerned. You also have Windows Subsystem for Linux if you need it.

Just because it doesn't conform to your particular workflow doesn't mean the development environment is a "mess." Personally I'm more productive in Linux for certain kinds of work, and more productive in Windows for other things (graphics work mainly, as the driver situation for GPUs on Linux is a "mess" to use your words).


Yes, Visual Studio has a reputation for having the best-in-class debugger. Meanwhile, I don't know of any graphical debugger for Linux that I would prefer to gdb. Other than that, Visual Studio has a reputation for being a slow, buggy POS. (Apparently that has gotten a little better with the latest releases.)

The game developers I follow all use Visual Studio exclusively for debugging. Otherwise they use a standalone editor and hand picked tooling, and batch scripts and the command window (or something like mintty which has its own problems) while cursing all the time.


> The game developers I follow all use Visual Studio exclusively for debugging.

I work with all sorts of people in the industry across genres and I've seen plenty of workflows at many studios. I have no idea where you are getting this but you don't have any idea what you're talking about.


I am talking about the game developers that I follow. I like to think that I have a good idea about the game developers that I follow. If you do it differently or know many people who do it differently, that's fine with me. I don't work in the game industry.

The general statement I made was that VS is slow and that it has bugs. I also know from my own experience that this was more true in 2011 than it is now. I still don't like it.


Out of curiosity, who do you follow?


Mostly Casey Muratori and Jonathan Blow. I've also watched a few streams by Abner Coimbre who now works for Jonathan Blow. And generally I like the guys from the circle of RAD Game Tools, although I'm not sure about their workflows. I believe Sean Barrett still uses Visual Studio 6.0 which is from the year 1998 according to Wikipedia.


> Visual Studio is miles ahead of Xcode or anything available on Linux as far as debuggers are concerned.

That's one aspect, sure. The rest of Windows is a mess as far as programming goes, especially once you leave the simple case of "Oh I'll write everything how Microsoft currently prefers and hope it doesn't pull another COM-to-.Net again, pulling the rug out from under my whole ecosystem."


Because writing software as if still sitting at Bell Labs is so much better.


It certainly is if I'm not suddenly destroyed by a minor shakeup at Microsoft.


> Isn't it a bit cynical then that a VT100 terminal emulator and the unix shell are still my biggest productivity boost as a programmer? The Windows development environment is a terrible mess. Its command window and batch script language are barely usable.

It doesn't sound like you do Windows development? Batch files are hardly at the center of it. Things just work differently over here, and if you're gonna insist on doing things the way you're used to then there's going to be corresponding friction, just like I would if I tried Linux's GUI dev tools. Though even the CLI friction on Windows is far less now that WSL has come around. (Though PowerShell's been around for a while, which you somehow omitted.)


>It doesn't sound like you do Windows development?

I do, although I have only done it part time for about 16 months. And I don't do business software. I'm working on a compiler in my spare time, and on an industrial dashboard for work. Both are written in plain C (plus OpenGL and FreeType for the latter) and run under Linux as well.

Command window and batch files are still way more productive for me than using a pile of junk that I don't understand, hidden behind an IDE that I cannot automate.


"... that I don't understand ..." might be core to the problem. With your Linux/Unix tooling you have a lot more experience and invested time in learning. That knowledge isn't easily transferable. I noticed the same when I once in a while try to do "simple" stuff on Windows. However if I look at my full-time Windows colleagues they have comparable productivity on Windows to me on Linux and fail on Linux similarly to me on Windows. I could sit down and learn, and sometimes I do a bit, but for the larger time my interests are elsewhere.


[flagged]


Would you care to substantiate your accusations? What tools do you think are superior to approaches that let you control what happens, and do not require you to push all operations through a slow, RSI-inducing, non-automatable GUI?


You absolutely can automate the GUI. People make very good money doing exactly that. The key difference is that the barrier to doing that automation is higher than automating a CLI.


Ummm... No. Thanks, but no thanks. And I'm not interested in registry cleaners, either.


Developers deciding to develop good developer tooling on a FOSS platform is about the least surprising possible outcome and doesn't really seem to have much to do with whether FOSS or enterprise is better suited to fixing a big, uninteresting problem.


While I am much more productive in the Unix command line than in a GUI, I am still not sure whether it is a real productivity boost or whether I simply have the skill and am just used to the Unix command line.


The command-line is a huge productivity boost:

1) The programmer does not need to synchronize with the results displayed on screen. That's because the command-line is technically not that interactive. Input is coming from a serial line. As long as the programmer knows by heart the state transitions that his previous inputs will effect, he can simply continue typing without caring if the effected state is already displayed on screen. This is serious because typical systems can often not keep up with user input since processing is jerky and many operations simply take a certain amount of time. But GUIs with their focus based input model absolutely require that the state is updated on screen before the programmer can continue with inputs. The interpretation of inputs in GUIs is a function of the on-screen state (besides the user devices' inputs).

So the serialized nature of command-lines not only saves time, but significantly reduces the stress induced upon the programmer.

2) All operations that the programmer knows how to do "interactively" can easily be combined in a script for automation.

Both points are arguably missing from IDEs. The only ad


I'm definitely not arguing that FOSS is always worse or anything. But there are certain hard, minor usability things that tend to never get dealt with in that space.

Detecting key-up fits that category. It isn't a big deal, and it's hard to make happen, so it won't happen (and in this case maybe it shouldn't).


No it doesn't. FOSS had nothing to do with why ttys are the way they are.


It kind of does. The "paid software" will probably have some half-arsed attempt to "fix the damn problem" that makes everything awful, because a manager wasn't interested in hearing about "whose fault it is".

It won't work properly, and you'll constantly be screwed over by it, but hooray, you've got "paid software" which your boss knows was able to "fix the damn problem" so there's no point even saying it can't be done, even though it can't be done - after all they paid good money, and whoever heard of a manager spending good money on a product that doesn't work...

Most of you have probably used software built to this philosophy, a lot of middleboxes for example. Most anti-virus software. Back in the day, "RAM doublers". Because you may laugh at the idea of paying for software to pretend to increase your RAM rather than just actually buying RAM, but remember, "fix the damn problem". Nobody wants to hear about it being impossible, I want software that doubles my RAM, now write it...


> Back in the day "RAM doublers". Because you may laugh at the idea of paying for software to pretend to increase your RAM rather than just actually buying RAM, but remember, "fix the damn problem". Nobody wants to hear about it being impossible, I want software that doubles my RAM, now write it...

Some of those products actually worked. They compressed RAM instead of paging it to disk. Which was a net benefit back then, and often still is.

Similar techniques are in use in modern Windows, macOS and Linux.


> Back in the day "RAM doublers". Because you may laugh at the idea of paying for software to pretend to increase your RAM rather than just actually buying RAM, but remember, "fix the damn problem".

Those RAM doublers fixed the "damn problem" so well, that the implementation of them now lives in Linux AND macOS kernel and is enabled by default on all Macs, Android phones and several other devices.

So I think the "fix the damn problems" mentality actually works.


There are lots of things Windows didn't fix in the name of backwards compatibility. And that's part of what made it so successful.

It is not a difference between FOSS and paid software but more about context: is fixing more important than consistency?

Most FOSS is actually written by paid programmers, with grumpy managers who want that impossible problem to get fixed. If anything, the issue is that it is not always pushed back in the main line, sometimes for good reasons.


There is a long history of commercial UNIX... All have the same limitation. FOSS really has nothing to do with it.


Uh and in commercial software you end up with a clutter of inane features, just because at one time a (potential) customer who seemed important requested it.


Providing features that users want. So terrible.


So is this possible in PowerShell? I don't think so


Actually it totally is. Because PowerShell is not a tty.

https://docs.microsoft.com/en-us/dotnet/api/system.managemen...


It's not impossible to fix because it's FOSS; it's impossible to fix because nobody cares to invest a substantial amount of resources, which is a separate but often entangled issue.

It can also be too late to fix something in a deployed system because the resources required exceed the benefit derived in which case there will be no manager forcing anyone to fix anything.


How does this relate to FOSS? Actually, is there even demand for this to be fixed at all?


> How does this relate to FOSS?

Anything relates to FOSS if it gets people riled up and makes the thread longer.


> versus a manager forcing you to fix the damn problem nobody cares whose fault it is.

All I'll say on this matter is that I find your views surprising, and we'll leave it at that...


What are you talking about? "We can't fix this bug, it's not our fault, too late" is all that Microsoft, Google and Apple ever say.


But we did get mouse events in the terminal and certainly those didn't exist back in the 70s. Or bracketed paste. It's not hard to imagine that there will be a standard developed (well, 5 standards, then 15 standards and eventually one will remain as the de-facto standard) for keyup events, if there was demand for it.


> So, yes, it's absolutely impossible to get key up events in tty context

There is absolutely no reason that many terminal emulators couldn't translate key up events into an escape sequence and forward them on to the application. So long as they receive them somehow themselves (eg an X terminal emulator will be told of them by X).

It doesn't need to be out-of-band and it doesn't need to involve the tty driver.

The right way to do it is to make them only sent when the application requests them.

For example, this is how focus events work. If the application requests them with SM ?1004, \033[I is sent when the window gains focus and \033[O when the window loses it. There are not a lot of applications that use them, but some can (eg vim).
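A minimal C sketch of that request/report cycle for focus events (my illustration, not from the parent; it assumes the terminal supports mode 1004 and that stdin is already in raw mode):

    /* Sketch: request focus-in/focus-out reporting (DEC private mode 1004)
       and watch for the ESC [ I / ESC [ O reports described above. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        printf("\033[?1004h");              /* enable focus reporting */
        fflush(stdout);

        char buf[16];
        for (int i = 0; i < 10; i++) {      /* read a handful of reports, then quit */
            ssize_t n = read(STDIN_FILENO, buf, sizeof buf - 1);
            if (n <= 0) break;
            buf[n] = '\0';
            if (strstr(buf, "\033[I")) printf("gained focus\r\n");
            if (strstr(buf, "\033[O")) printf("lost focus\r\n");
        }

        printf("\033[?1004l");              /* disable focus reporting */
        fflush(stdout);
        return 0;
    }

A key-up protocol done this way would be invisible to every program that never asks for it, which is the point being made here.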


This is somewhat overstated. Unix-like systems have a whole framework of "termcap" and "terminfo" descriptions of terminal capabilities, and it wouldn't take much to add a keyup-reporting mechanism to these that's shared between e.g. xterm, the Linux VGA console and the modern Linux FB-console. This is similar to how "terminals" have been retrofitted to enable, e.g. mouse support, or 256-color support, or viewport-size reporting.


Yes, but termcap/terminfo are about output, not input. Some output sequences could be used to alter terminal behavior, of course, but if no terminals implemented a protocol for sending key up events, then that wasn't going to get implemented. Recall that termcap/terminfo are about abstracting differences in terminals, not about driving terminal evolution. And again, any in-band protocols for sending metadata like key up events would have required much more than merely a terminal capability abstraction layer like termcap or terminfo: it would have required something like a variant on tty/pty driver raw mode so that legacy applications need not be modified to understand this. Or, alternatively, it would have required shells at minimum to understand these new modes so that they could still let you type `tput ...` to recover from apps putting the terminal into that mode but not restoring the terminal to its previous mode upon death/stop.


> Yes, but termcap/terminfo are about output, not input.

Termcap itself--the older system of the two, and the one that's mostly subsumed by terminfo nowadays--supports input-related features already! See e.g. https://www.gnu.org/software/termutils/manual/termcap-1.3/ht...

Anyway, surely some sort of keyboard raw input exists already, if only so that X can use it? AIUI, X does run without root privileges these days, and keyboard input was never the main obstacle to X running without root - video output was. So, I think the article is not quite right in its assumption that detecting raw keyboard events necessarily involves root-level access.


Looks like non-root X still requires the equivalent of root access to the raw input /dev devices - https://wiki.gentoo.org/wiki/Non_root_Xorg#Making_necessary_...


You are absolutely right. Are there any such features that produce additional in-band data for every keystroke?


> Or, alternatively, it would have required shells at minimum to understand these new modes so that they could still let you type `tput ...` to recover from apps putting the terminal into that mode but not restoring the terminal to its previous mode upon death/stop.

This subtlety is the crux of the issue. There's already precedent for input features that the program can opt-in to using output escape codes, like bracketed paste mode. But these don't change input so radically that they'd stop you from using your shell if the terminal was left in that mode after a program crash.

Maybe you could make it work by saying that key state event mode is automatically disabled after every newline from the terminal and the program needs to explicitly re-enable it?
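For reference, the bracketed-paste opt-in mentioned above is tiny; a sketch (the mode number is the standard xterm one):

    /* Sketch: the bracketed-paste precedent.  The application opts in with
       DEC private mode 2004; the terminal then wraps any pasted text in
       ESC [ 200~ ... ESC [ 201~ so the program can tell paste from typing. */
    #include <stdio.h>

    void enable_bracketed_paste(void)  { printf("\033[?2004h"); fflush(stdout); }
    void disable_bracketed_paste(void) { printf("\033[?2004l"); fflush(stdout); }

A key-state mode that auto-disables on newline, as suggested above, would be a heavier change for the terminal side, but the opt-in mechanism itself could look just like this.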


The OS is abstracting away details about where the characters come from, which is a keyboard in this case. The characters could also come from other sources, such as a file, or a speech-to-text program getting input from a microphone. Key-up events make no sense for these sources. Essentially, the author wants to go down a layer of abstraction to use his keyboard not as a character input device but as a grid of buttons. By default, this will require special permissions on most linux distros (and rightly so, as it allows for keylogging), but this is a matter of changing one's udev configuration; root is not inherently required.

In any case, the stated goal "to remotely navigate a robot over an SSH connection using the 'w', 'a', 's', 'd' keys" is misguided to begin with; what happens when your connection drops and the robot can't be stopped?

Addendum: has the author thought about the case where the user is using a keyboard layout where the WASD keys are not together, or where the user is using a non-latin-alphabet keyboard? As someone who uses a dvorak-based layout, I am annoyed at how often developers screw the key/character distinction up and assume everyone uses qwerty.


The terminal is the wrong tool for this job. I believe the author realizes this in their exploration of the topic, but I think it still bears saying explicitly. This task is difficult not because of some big design mess-up, but because this use-case is well outside the design constraints of the technology he turned to first.

EDIT: I’d also like to mention that “Linux” has nothing to do with this. One would face the same issues using a Windows SSH client connecting to a Solaris SSH server.


Right. If you want to control something in real time, use a real time protocol.


Honestly I probably shouldn’t have mentioned SSH, because it also doesn’t really deserve any blame here. People can / do tunnel real-time traffic through SSH without any problem. The real root problem is wanting to get so much detailed information from a tty-like interface. In this case it just happens to be via SSH. The author would have the same issue with telnet, rsh, etc.


He could also totally still use SSH for this. He just has to tunnel something else over it instead of using a TTY. SSH itself is somewhat agnostic to whether you use a TTY or not.


What like SSH?


Sure but it's outside of the design constraints because whenever anyone tries to improve on VT100 crustaceans come out of the woodwork saying "nooo.... that's not proper Unix!"


Every freshman CS student should be given a copy of the Unix-Hater's Handbook on their first day.


That raw keyboard events are not delivered through an SSH connection is entirely expected. At its core, it is a text-only communication protocol. At the discretion of the client terminal, there could be an ANSI escape to enter a mode where raw key events are delivered, akin to unbuffered input. But that is nevertheless beyond the scope of what SSH offers.
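For comparison, the closest thing the existing stack offers is raw (unbuffered) tty mode, which delivers bytes as they are typed but still says nothing about releases; a minimal termios sketch:

    /* Sketch: raw (unbuffered) mode via termios.  Characters arrive
       immediately and unechoed, but there is still no press/release
       distinction, just bytes. */
    #include <stdio.h>
    #include <termios.h>
    #include <unistd.h>

    int main(void) {
        struct termios saved, raw;
        tcgetattr(STDIN_FILENO, &saved);
        raw = saved;
        cfmakeraw(&raw);                           /* no line buffering, no echo */
        tcsetattr(STDIN_FILENO, TCSANOW, &raw);

        char c;
        while (read(STDIN_FILENO, &c, 1) == 1 && c != 'q')
            printf("got byte 0x%02x\r\n", (unsigned char)c);

        tcsetattr(STDIN_FILENO, TCSANOW, &saved);  /* restore the old settings */
        return 0;
    }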


Yes, delivering timed key events would be a security issue.



Huh, why?


If each interactive key stroke is sent in its own packet, then you may be able to guess which keys the user is pressing based on just the timing of packets coming in, and so guess what they are typing without decrypting the stream.

https://www.usenix.org/legacy/events/sec01/full_papers/song/...


But what does timestamping have to do with sending each thing in its own packet? Did you mean something else by 'timed'?


You, as a malicious third party, capture the encrypted packets between a client and a server. You can't decrypt those packets.

However if each keystroke is sent in its own packet, you note the timestamp of the packets and then you can infer which keys the user has pressed from the difference in time between the packets.


I get that, it's just not what I originally read the comment to mean. It sounds like you guys say "timed" to mean "sent in real-time", whereas I read it to mean "includes timestamps", that's why I was confused. (Because in my mind I think of things like "timed automata" which are automata accompanied with time variables, not automata that execute in real-time.) But otherwise yeah, that makes sense.


Exposing the timing of typing in an SSH connection is pretty fatal to security. That means that ssh clients should send individual characters no more often than every, e.g., 10ms, and preferably in multiples of 10ms up to 150ms (say). And they should use SSH2_MSG_IGNORE messages to pad out those packets so that attackers cannot observe the number of characters typed (I forget the details as to whether this last is workable).

Including timing information in the protocol, encrypted of course, would be just fine... But the channel that carries tty session data doesn't have a protocol for conveying timing and other metadata out of band. A protocol would have to be developed for that, or for carrying that metadata in-band, and all of that would require changes on the server side and in the pty drivers.


This weirdly seems like the "right" implementation to me. Somehow I feel like a TTY generally doesn't need or deserve to know when keys are pressed and released.

That said, is there a more end-to-end summary out there of how keyboard input is handled in GNU/Linux? I have the vague understanding that USB HID scancodes are translated into keycodes, which are sent along to X applications or a TTY, but where and how each step happens is still a bit mysterious to me.


> Somehow I feel like a TTY generally doesn't need or deserve to know when keys are pressed and released.

Why? What if I'm playing a game on that TTY?


Games are the reason we have X or Wayland.

Yes you can play games in terminals. Years back I used to be pretty enthusiastic about roguelike development (rec.games.roguelike.development). Those games are almost universally designed in a way that doesn't require key-up info. Where you really need key-up info is in realtime games like tetris, except in practice even for tetris you can do just fine without key-up. Now, if you were playing Quake using an aalib renderer, then you'd definitely want key-up to stay competitive. But then you've also got the problem of sending stereo sound over ssh! Should terminal emulators be made to support that too?

There are lots of terminal games. There are lots of realtime games. The demand for games that are both at the same time is pretty damn niche. And when you plausibly find yourself in that niche, you realize that needing key up info is the least of your problems. This is indicative of using the wrong tool for the job.

Trying to play Quake in your terminal emulator with responsive controls is like trying to take your car out into the bay to do some fishing, so you go at it with a caulk gun and expanding foam, slap a sail on top, only to realize you don't have a keel, so the wind flips you right over. Just get a boat.


Mainly because there exists no protocol by which the terminal (really, terminal emulator) can indicate these events, and the tty/pty drivers don't know how to decode that non-existent protocol to spare existing apps having to be modified to understand it.

This all goes back to the 70s, when people first hooked up typewriters to computers as terminals: everything was far too simple to make it possible to add such a protocol, and nobody cared about key up events, and they weren't going to care for a long time because the hardware and software all had to be rather simple (exceedingly so by our standards today).


You made a nice post but that's not the question being asked. The historical record is different from the question of whether a terminal needs/deserves it this decade. ("need" obviously being non-literal)


The submission's title is "Why Is It So Hard to Detect Keyup Event on Linux?". I knew the answer to that very specific question, so I offered it :)

As to whether we need to evolve this functionality today, I would hazard that the answer is "no". I did sketch out how it would be done, if we really need it and people want to implement it. But my guess is that there are too few people with the desire and funding to work on this problem. Reading TFA today is the first time I've ever heard of anyone wanting key up events in the tty (a tty that has some 45+ years of history), so I'm a bit skeptical that this will somehow become such a blindingly obviously highly desirable feature _now_ that it might get done.


> The submission's title is "Why Is It So Hard to Detect Keyup Event on Linux?". I knew the answer to that very specific question, so I offered it :)

You didn't reply to the top level article, you replied to a specific comment, which was asking a different question.

And "not enough desire to make the change now that we're so entrenched" is still different from "does it deserve to know, if we were doing it right". And I would say the latter is true. (Still ignoring "need" because "need" is really vague.)


Fair enough. That post said:

> Why? What if I'm playing a game on that TTY?

So my answer was still on point.


The predicate to that specific "Why" was "doesn't need or deserve to know", not "is so hard to detect on linux".


ReadConsoleInput https://docs.microsoft.com/en-us/windows/console/readconsole...

INPUT_RECORD https://docs.microsoft.com/en-us/windows/console/input-recor...

KEY_EVENT_RECORD https://docs.microsoft.com/en-us/windows/console/key-event-r...

    bKeyDown
    If the key is pressed, this member is TRUE. Otherwise, this member is FALSE (the key is released).
Also, please note that INPUT_RECORD contains a union of key, mouse, window buffer size, menu and focus event records. I do not want to say the interface is better thought out per se, but it is definitely richer.
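For comparison with the tty situation, a minimal sketch of consuming those records (standard Win32 calls; quitting on the Escape release is just for the example):

    /* Sketch: detecting key releases on the Windows console via
       ReadConsoleInput and KEY_EVENT_RECORD.bKeyDown. */
    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
        INPUT_RECORD rec;
        DWORD n;

        for (;;) {
            if (!ReadConsoleInput(in, &rec, 1, &n) || n == 0)
                break;
            if (rec.EventType == KEY_EVENT) {
                KEY_EVENT_RECORD k = rec.Event.KeyEvent;
                printf("vk=0x%02x %s\n", (unsigned)k.wVirtualKeyCode,
                       k.bKeyDown ? "down" : "up");
                if (k.wVirtualKeyCode == VK_ESCAPE && !k.bKeyDown)
                    break;                         /* quit on Escape release */
            }
        }
        return 0;
    }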


The reason why Windows has this stuff is because it never was particularly supportive of remote terminals. Going back to DOS days, when writing a console app, it was either the kind that only needed print, or else you basically had full control over the screen area, changing individual characters, colors etc in a random access way. The first approach would use DOS output functions that allowed for things like stdout/stderr to work. The second approach couldn't be redirected properly.

Windows inherited that model, and mostly just kept developing it until lately.


This is functionality that the terminal emulator reasonably should provide to enable games or other interactive applications. I believe that would solve the author's complaints. Some work has been done along these lines in both Kitty and iTerm2. Not everyone likes the idea because it breaks the basic abstraction of a terminal. I kinda like it, though, and I'm optimistic that the situation will improve in the coming years.


The title really should be "On a TTY" and not "On Linux" - the reality is, it's not that hard. The TTY is a TTY - it's designed for typing, not general input. You could always forward your input events through another channel, even over SSH if you wanted.


IIRC /dev/input/event* gives you that
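A sketch of that approach (the device number here is just an example; it needs read permission on the node, i.e. root or an input group granted via a udev rule, which is the permissions point made elsewhere in the thread):

    /* Sketch: reading key press/release/autorepeat directly from an evdev
       node, bypassing the tty entirely. */
    #include <fcntl.h>
    #include <linux/input.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/dev/input/event3", O_RDONLY);   /* pick your keyboard */
        if (fd < 0) { perror("open"); return 1; }

        struct input_event ev;
        while (read(fd, &ev, sizeof ev) == sizeof ev) {
            if (ev.type != EV_KEY)
                continue;
            /* value: 1 = key down, 0 = key up, 2 = autorepeat */
            printf("code %u %s\n", (unsigned)ev.code,
                   ev.value == 1 ? "down" : ev.value == 0 ? "up" : "repeat");
        }
        close(fd);
        return 0;
    }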


Meta:

I'll again have to criticize this submission's title. It shouldn't be "Why Is It So Hard to Detect Keyup Event on Linux?" (it's not a problem with the kernel), it should be something along the lines of "Why can't I detect key up events via SSH?".

And the answer to that is simple: That's not what it's designed for.

Or, even better: instead of concentrating on the complaints part of the article, focus on the part in which you're providing value to your readers: "Detecting keyboard events without a display server" or, if you want to get in on the long-headlines trend: "Detecting key up events in a TTY environment is hard. Here are some ways".


I kinda disagree. At least some of the article is specific to Linux's console implementation. Not the "why", but the specific workaround given for the terminal doesn't necessarily generalize to other POSIXs. I think it's fair to specify that in the title.

And the complexity of the article's answer is certainly interesting to people who care about receiving keyUp events.


No, the submission title is the title of the linked page. That is correct.


It's not uncommon to correct click-baity titles in a submission. By "submission" I'm also referring to the original article.


Your comment does not claim click bait. Your comment claims you disagree with the blog title. The rule is: Don’t editorialize.


"... unless it is misleading or linkbait", which I indeed think to be the case here. I clicked because I thought frameworks like Wayland and X or something else inherent about the architecture of modern Linux systems made it difficult to detect key up events, but it just turns out the author was trying to use something that doesn't have any concept of key events, leading to the feeling I was misled.


Would it work if the sender immediately started sending a fast stream of repeating characters over? Then the keyup on the receiver is when the stream stops.


Very interesting. In a different context I had to tackle a pretty similar problem with Redtamarin [0].

Traditionally under the CLI you will manage key input with a readline() command or something similar to kbhit(), and depending on your needs you'll use getchar() and then track whether a CR or LF is entered for the "end of command", also EOF.

This is blocking, so nothing else can happen, and depending on how you do it, you can only read single-byte chars and not multibyte chars (like CJK input)

something like

    while( run )
    {
        i=kbhit();

        if( i != 0 )
        {
            key = String.fromCharCode( getchar() );

            if( (key == "\n") || (key == "\r") )
            {
                run = false;
            }
            else
            {
                buffer += key;
            }

            i = 0;
        }
    }
another way to do it

    while( run )
    {
        b = fgetc( stdin );

        if( b == EOF )
        {
            if( kbuffer.length == 0 )
            {
                return null;
            }

            run = false;
        }
        else if( b == LF )
        {
            run = false;
        }
        else
        {
            kbuffer.writeByte( b );
        }
    }
which has 2 great advantages: being able to read multi-byte input (thanks to fgetc(), which reads the raw byte) and being able to detect EOF (CTRL+D under POSIX, CTRL+Z under WIN32), but it is still blocking forever; using fgetc() the detection of EOF is done automatically for you (while getchar(), getc(), etc. do not detect that EOF)

now because Redtamarin is based on AVM2 and AS3, there is one part of the API which tries to reimplement the Flash API, with things like KeyboardEvent that should be able to be non-blocking even in a CLI environment

That KeyboardEvent should detect keyUp or keyDown, but yeah, it is hard to detect, and if you try to do that for multiple platforms like Windows / macOS / Linux it gets nightmarish

In a little experiment [1] I found out you can do "stupid things" that actually work, like spawning a child worker (AVM2 uses pthread), blocking on the user input (like above) and then sending a message back to your main worker and so receiving a "key event"; all that allows you to listen for input asynchronously

But then, what about telling the difference between keyUp and keyDown? I decided to ignore the keyUp because in fact it does not really matter on the CLI, or at least I don't see any use for it; for a GUI yes I can see the use cases, but for a CLI? not so much

Purely on the CLI (no X server) you don't really listen for key events, you read the stdin stream; the only special events are signals like SIGINT, SIGHUP, etc. or special kinds of signals like EOF.

The other things you can alter are the buffering of that stdin, the raw mode/cooked mode, and echo off/on.

So for a use case like navigating something using the 'w', 'a', 's', 'd' keys you just need to go async to listen for char input (which key is pressed), and probably use a mix of a timeout on the last key pressed and a "diff" between the "prev key" and the "last key".

If last key pressed is 'w' then go up, if you keep receiving a 'w' key you keep going up, if prev key was 'w' and the last key is different you change direction.

And to detect when to stop going up: if the last key pressed was 'w', you just keep the time when this key was pressed; if 1 second has elapsed and no more key events are received you stop going up, ergo you don't need to detect keyUp, but maybe I'm missing something.

  [0]: https://github.com/Corsaair/redtamarin
  [1]: https://twitter.com/redtamarin/status/900794336031510530


This is quite possibly one of the best critiques of YAGNI that I've ever read.

I grew up in the waterfall (as opposed to agile) era of the 80s and 90s. Back then, it was top priority to catch mistakes as early in development as possible. This is the top article I could find on the concept, which seems to stir a lot of debate:

https://developers.slashdot.org/story/03/10/21/0141215/softw...

The gist of it is that if a bug costs $1 to fix during development, it costs $10 to fix during testing and $100 to fix once it's in production. Maybe someone can find the original quote.

Had I been working on terminals in the 70s, I would have been the annoying person in the back of the room who raised their hand and said "what about key ups?" There would have been a lot of muttering, much debate about how to store keymap bit arrays and what might happen if they get out of sync, every edge case would be explored, and in the end my opinion would be noted somewhere and character streams would have moved forward as the "simpler" implementation.

But it's not simpler, because perfectly valid use cases were excluded. Basically, their decision meant that we couldn't have video games in the terminal. Kind of a big deal, if you ask me.

After so many decades of this, it's hard for me to drink the Kool-Aid on new frameworks, even if they're ones we use every day like C++ (operator overloading was maybe a bad idea in hindsight), git (can't store empty folders without .gitkeep), Angular (two way binding - oops), React (front end PHP), even HTML/CSS/Javascript (difficulty encoding our own tags as components/nondeterministic inheritance across browsers/mutability). These frameworks are great, but it takes a certain level of suspension of disbelief to buy into them.

Give me 5 minutes with any technology and I'll find the conceptual flaws and bugs that impose major hurdles on its conceptual simplicity, utility and reusability. Basically everything I touch breaks. It's like a knack that makes me a good programmer, but also a Debbie Downer.

https://www.youtube.com/watch?v=MZF6EK7x4Dk

P.S. The workaround for the keyup thing is probably to set the key repeat threshold and repeat delays to 0 and check for repeated keydown events each main loop, setting the keymap entry to true if the key is still down (false otherwise). It's not perfect because it can't easily detect multiple keys down or modifier keys, but that was one of the ways we did it in classic Mac OS anyway.
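For what it's worth, here is a rough sketch of that P.S. workaround translated to a raw-mode tty: treat autorepeat as a heartbeat and infer "key up" when it stops. The 150 ms threshold is an arbitrary guess and must exceed the autorepeat interval, and it assumes stdin has already been put into raw mode:

    /* Rough sketch: infer key releases from the autorepeat stream.  Each
       repeated byte means "still held"; silence past the threshold is
       treated as a release.  Imperfect (chords, modifiers, repeat rate). */
    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        char held = 0;                        /* key we currently believe is down */
        struct pollfd pfd = { .fd = STDIN_FILENO, .events = POLLIN };

        for (;;) {
            int ready = poll(&pfd, 1, 150);   /* wait up to 150 ms for a repeat */
            if (ready > 0) {
                char c;
                if (read(STDIN_FILENO, &c, 1) != 1 || c == 'q')
                    break;
                if (c != held)
                    printf("key down: %c\r\n", c);
                held = c;
            } else if (ready == 0 && held) {
                printf("key up (inferred): %c\r\n", held);
                held = 0;
            }
        }
        return 0;
    }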


What makes you believe those were valid use cases for those in charge back in 1970s? Just because some hobbyist wants to run a real time game on a terminal worth thousands of dollars doesn't mean that others are going to accommodate that.

The most likely reason why they never bothered with key-up is because they were an evolution of the teletype, and inherited its protocols. That was the use case, not a generic human interface.

But even if there was some explicit consideration of this idea, I'm pretty sure it'd die almost immediately, because carrying information about keystrokes would double the bandwidth of an average session, and we're talking about the time when typical transmission speed was measured in baud, and 4-digit speeds were considered fast (the first terminal supported a maximum of 2400 baud). You need a much more profligate culture to add support for things like that "just in case".


The counterargument is that it would have taken so long to do it the "right" way that someone else would have shipped the simpler implementation and it would be widely adopted by users that were happy to have something "good enough" so by the time your "right solution" shipped they wouldn't care about it anymore.

I mean, TTY was first written in the 70s. If I write code that works fine except that it has a bug that will require a rewrite from scratch in 40 years, I'd be pretty happy with the quality.


Bugs might cost $1 to fix during development, but 9/10 of those bugs don't affect anything visible during testing, and 99/100 of those bugs don't affect anything visible during production.


Finding faults out of context is not really a skill; any moron can do that, and unfortunately many do.


What about Wayland?


This is on a tty. So outside of X as well.

Both X and Wayland have key down and up events.


Yep. All GUI systems that I'm aware of have explicit keyDown/keyUp events. They're essential for a number of things.

If the Unix console was being designed today, I expect it would also have them, along with a sane system for reporting terminal capabilities and probably a zillion other things.

It's too bad none of the console modernization projects have really gained traction. The best we seem to be able to do is fish + UTF-8 + maybe Powerline.



