Why Is It So Hard to Detect Keyup Event on Linux? (robertelder.org)
156 points by EntICOnc on May 17, 2021 | hide | past | favorite | 128 comments

I'm the author of the `keyboard`[1] library mentioned in the article.

- It reads all events because it's meant to interface with your keyboard directly, for global hotkeys, macros, disabling or remapping keys, etc. Interacting with windows and per-application hotkeys are explicitly not goals of the library at the moment.

- It reads /dev/input because the library was developed to work in as many environments as possible, including headless installations like Raspberry Pis that may not have a graphical environment or even a monitor. There's an open change to _try_ to communicate with the X server first and fall back to /dev/input, but event suppression is not yet working reliably with this.

- It could read /dev/input by just being in the `input` group, but then `dumpkeys` doesn't work and you are stuck typing numeric scan codes instead of key names.

Due to a series of unfortunate personal circumstances I've been unable to give the proper maintenance the library requires, but the issues and thank yous have never stopped, and I'm slowly getting back to it again.


Little known trick: if you run the library as a standalone module (`python -m keyboard`), it prints a JSON object for each detected event. You can save them to a file and pipe them back (`python -m keyboard < events.txt`) to replay them like a macro.
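If you want to do more than blind replay, the recorded lines are easy to post-process. A minimal sketch of reconstructing inter-event delays from such a file (the `event_type`/`name`/`time` field names are assumptions here; check the actual JSON the module prints):

```python
import json

def parse_events(lines):
    """Parse JSON-lines keyboard events into (delay, event) pairs for replay.

    Assumes each line is a JSON object with at least 'event_type'
    ('down'/'up'), 'name', and a float 'time' field -- roughly the shape
    `python -m keyboard` emits; adjust field names to the real output.
    """
    events = [json.loads(line) for line in lines if line.strip()]
    pairs = []
    prev_time = None
    for ev in events:
        delay = 0.0 if prev_time is None else ev["time"] - prev_time
        prev_time = ev["time"]
        pairs.append((delay, ev))
    return pairs
```

From there, replaying is just sleeping for each delay and re-injecting the event.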

Little known trick 2: I also created `mouse`[2], the companion library for my second favorite peripheral. It's the same thing, but for mouse events.

[1]: https://github.com/boppreh/keyboard

[2]: https://github.com/boppreh/mouse

Hi! I've used your library in various projects and am very grateful for it!

I figured it'd be worth a shot to ask you this here, since I just caught you: have you considered adding some support for "rehooking"/"restarting" the library, so that callees can restart your library whenever they detect that their environment has changed (e.g. input devices have been (dis)connected, etc)?

I've contributed some info to this issue https://github.com/boppreh/keyboard/issues/264 as to how we're dealing with this in some of our projects. Funnily enough, I didn't know about `python -m keyboard`; had I known about it, it would definitely have saved me some time!

Once again, thank you very much for your hard work!


I've been thinking about how to handle environment changes for a while, and it's pretty hard to reliably detect and respond to them in all supported systems.

But I really like the idea of a manual "re-init" method as temporary measure. This should be trivial in Windows, and doable in Linux after some refactoring.

Thanks for the idea, I'll take a look tomorrow.

For those interested, I wrote a similar library for Ruby: https://github.com/rickhull/device_input

Thank you very much for the library. Recently I have been learning touch typing, and detecting system-wide characters typed per day, plus speed and accuracy, is on my todo list. I already selected the keyboard module for this after some initial research.

> A tty doesn't know about the concept of key press or key release events. It only knows about 'data stream in' and 'data stream out'.

This is basically the long and the short of it; it's a very old API, and retrofitting key-up detection reliably is a serious problem.

Perhaps this is nitpicking, but I don't think the issue here is "data stream" vs. not-"data stream".

Perhaps a better distinction is "data stream of characters" vs. "data stream that can express key events (and perhaps other stuff as well)".

This is correct; Plan 9 continues using a data stream but puts events in the data stream rather than characters.

Even on Linux the low level kernel API for getting keypress events works this way.

It's really a lower level protocol issue. The serial terminals upon which the "tty" model is based had no event stream outside of characters being sent or received. There are events provided by the serial port itself and delivered out-of-band though (example: modem DCD line drops, SIGHUP is generated.)

To fix this low level problem, someone will need to modify xterm, or libvte, or any other terminal, to send escape sequences for key release events, when such mode is requested by an application.

Already done. Kitty and other modern terminals have an extended keyboard interface which allows applications to detect key press, key release, and arbitrary key combinations (unlike legacy vt100-style terminal emulators, where e.g. Tab and Ctrl-I are indistinguishable). For backwards compatibility you need to switch to this mode by sending an escape sequence.
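A rough sketch of what consuming that mode could look like from the application side (the flag values and sequence shapes are my reading of the kitty protocol docs, so treat them as assumptions; the regex covers only the simplest form of the sequences, which can carry extra subfields):

```python
import re

# Progressive-enhancement push in kitty's protocol: flag 1 disambiguates
# escape codes, flag 2 adds key-event types (press/repeat/release).
ENABLE = "\x1b[>3u"   # push flags 1|2
DISABLE = "\x1b[<u"   # pop one entry off the flags stack

# Matches the simple form "CSI keycode ; modifiers : event-type u".
KEY_RE = re.compile(r"\x1b\[(\d+)(?:;(\d+)(?::(\d+))?)?u")

EVENT_TYPES = {1: "press", 2: "repeat", 3: "release"}

def parse_key(seq):
    m = KEY_RE.fullmatch(seq)
    if not m:
        return None
    keycode = int(m.group(1))
    mods = int(m.group(2) or 1) - 1          # 1 encodes "no modifiers"
    event = EVENT_TYPES.get(int(m.group(3) or 1), "press")
    return {"codepoint": keycode, "mods": mods, "event": event}
```

The application writes ENABLE to the terminal (through the SSH session, if any), reads sequences like the above from stdin, and writes DISABLE on exit.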


> and other modern terminals

Do you know of any source documenting which terminal emulators support this feature other than kitty?

Have a look here: https://gitlab.freedesktop.org/terminal-wg/specifications/-/...

Basically Kovid Goyal, the author of kitty, volunteered to write a spec, based on the work he did for kitty, but got fed up with what he perceived as pointless bikeshedding. The authors of other terminals represented in the terminal-wg (e.g. iterm2, mintty and vte -- which powers terminator, gnome-terminal and others) kept going for a bit, but there is no shared spec yet and I haven't checked if anyone else other than Goyal has implemented something at this point (I thought vte had, but I didn't immediately find it when I looked). It also looks like it might take some time till some de-facto standard emerges.

Userspace doesn't implement debouncing on Linux; that's driver territory; Linux isn't an RTOS. It's unclear why this is even mentioned at all.

And you don't need root if your program's user is in the input group on most distros. You're opening a device node in /dev/input/XXX, the standard UNIX permissions model applies...

Yes, I think the article's description of getch() as being inadequate due to a lack of debouncing is a bit weird. I would think that debouncing would be done even earlier, in the keyboard itself, since the debouncing you need is necessarily specific to the actual hardware. While USB HID keyboards are state-based (they transmit a packet listing which keys are pressed), the PS/2 protocol is event-based, and transmits "make" and "break" messages for key-press and key-release events, respectively.

(That said, I wouldn't be surprised if debouncing were done in the driver. I just think it's the wrong place for it.)
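To make the make/break framing concrete, here's a toy decoder for a set-2 byte stream (illustrative only; it ignores the 0xE0 extended-key prefix and typematic repeats):

```python
# Minimal sketch of decoding a PS/2 scan-code set 2 byte stream into
# press/release events. 0xF0 prefixes a "break" (release) code.
def decode_ps2(stream):
    events = []
    release = False
    for byte in stream:
        if byte == 0xF0:        # break prefix: the next code is a release
            release = True
            continue
        events.append(("break" if release else "make", byte))
        release = False
    return events
```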

I suspect the author may have some experience implementing ad-hoc buttons via GPIO polling from userspace, and maybe used getch() in doing so.

The thing about getch() is that there's not really one standard getch(). There's a Windows version of getch() which is compatible with, notably, Borland's version for DOS (I think), and then various people have reimplemented getch() for other systems.

They might have meant to write getc() or getchar()...

That would make no sense at all. I think the comparison to getch() is the natural one here.

I suppose that, from the Linux perspective of going through curses for this use case, mentioning "debouncing" in the same sentence seems awkward, but curses does at least provide a polling API. Honestly I have no idea what they're talking about there.

> And you don't need root if your program's user is in the input group on most distros.

It's complicated. Being in the input group is part of it, but to actually get the access you need to interface with the freedesktop-derived multiseat stack, which is as clunky as everything else that comes from the fd.o folks.

tl;dr: It's not about Linux, it's the terminal.

Linux exposes your keyboards as /dev/inputN devices and (with the exception of oddball devices or drivers without the hardware capability to detect them) every single one of them presents clean down/up pairs on every press, every time. If you want to read the device state, read it from the kernel.

The problem is that the author doesn't want to detect the device state. The author wants to write a program to run at the command line. And the command line is an environment descended from a long line of tty environments going back to the original teletypes of the 1960's. And those devices were never designed to expose a "keyboard" as a "device". They were a source of "characters" only.

Now, over time countless people writing countless terminal emulators have tried to address this shortcoming, in countless not-quite compatible ways. And that's been about as successful as you'd expect. You can do it, but...

But again, it's not about device support. Your command line program isn't and has never been connected to a "keyboard" device via any useful abstraction. The same feature that allows you to pipe input into it instead of typing at it makes this problem hard.
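To illustrate the kernel side of this: each record in /dev/input/eventN is a fixed-size `struct input_event`, and EV_KEY events carry explicit press/release/repeat values. A sketch of decoding them (assumes the 64-bit struct layout; actually opening a device node needs the right permissions):

```python
import struct

# struct input_event from <linux/input.h>: struct timeval (two longs on
# 64-bit), then u16 type, u16 code, s32 value. 32-bit systems differ.
EVENT_FMT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FMT)

EV_KEY = 0x01
VALUE_NAMES = {0: "up", 1: "down", 2: "repeat"}

def key_events(buf):
    """Yield (code, 'up'/'down'/'repeat') for each EV_KEY event in buf."""
    for off in range(0, len(buf) - EVENT_SIZE + 1, EVENT_SIZE):
        _sec, _usec, etype, code, value = struct.unpack_from(EVENT_FMT, buf, off)
        if etype == EV_KEY:
            yield code, VALUE_NAMES.get(value, "?")
```

Feed it chunks read from `open("/dev/input/event0", "rb")` and you get exactly the clean down/up pairs described above.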

What is the correct architecture or design to make a TUI on a terminal, then? Honest question.

You don't go through the TTY model, but read directly from devices (if the kernel allows it). But people will hate you for it, since it doesn't fit with the abstraction layers that are commonly used, one of which is that only characters matter, not keypresses. Consider how such a TUI that directly interacts with keyboard devices (because it wants to detect keyup events) would work over SSH. It cannot. Abstraction layers limit certain functionality, but they also allow workflows to exist that otherwise could not (or would be much more complicated). You have to judge in every instance whether the correct balance has been made, but I don't think this aspect of the Unix terminal design is awful.

I think the other problem mentioned in the article is that it has to work over SSH. TUI on a local machine has no problem hooking into input devices; display-side, the difference between GUI and TUI is just what API calls you use to render stuff. But if you want to drive things remotely, you're constrained by the remoting protocol - which, in TFA's case, is SSH without X forwarding.

Yeah, but it's the same problem. "Over ssh" means "on a terminal". Terminals have input streams, they don't have "keyboards".

Not necessarily! SSH is terminal-agnostic: It doesn't care what or even if you're running a terminal or a shell on either end.

It's perfectly reasonable to set up a pipe that sends binary data through SSH:

    cat somefile.bin | ssh someuser@somehost "cat - > /tmp/somefile.bin"
That example assumes the shell of the user on the other side can execute the `cat` command but what if you didn't use a shell at all? What if you just had `/usr/bin/myprogram` as your shell? That can work too! You just need to set things up correctly (the SSH tunnel and make sure your program knows what to do when various escape sequences/signals get sent through the pipe by the SSH client--so the session can end cleanly, mostly). Going this low-level probably isn't necessary though...

For the author's original example you could write `read_raw_input.py` and then pipe it through ssh to your controlling program on the other side say, `receive_input.py`:

    /usr/bin/read_raw_input.py | ssh someuser@somehost "receive_input.py"
...as long as the two Python scripts were set up to write and read from stdout/stdin, everything should work fine, be reasonably low-latency, and won't require much setup other than SSH keys.

On the other hand, if I were to do this I'd write my Python code so that it communicates with the other end directly rather than rely on the SSH daemon, but that's just me (even if it was using SSH/paramiko internally).

Sure, and that'd be fine for a special-purpose app that you use for yourself, but imagine being a regular user and installing this app, and the author wants you to install something on both the client and server side, and to run ssh in this unusual way. And you have to ensure that you have the proper permissions for raw access to the keyboard on the client side.

If it's something that truly requires that sort of experience to work properly, ok, but if it's just a sort of "I want a fancy text UI that requires key-up events", this process is a bit much to ask of your users.

It seems to me that terminals mostly had a semi-batch I/O mindset. The basic unit is a char, not a signal over time like `<sensor> down for half a second`, which only makes sense for high-frequency or real-time user interactions.

The original ttys were literally Teletypes https://www.bell-labs.com/usr/dmr/www/picture.html designed to send text directly to another Teletype over a phone line. As such they sent and received sequences of characters, not key events, yes. OFC these are also basically the same ASCII text streams that Unix pipes between shell commands and the file system.

I get the legacy aspect both in internals of the time and usage. But I'm curious how to upgrade the old way to support finer grained interactions.

That's simply neither possible nor desired. It's like trying to think of a way to make ints support fractions in C without storing a second int for the denominator.

What you really want is that SSH implements support for a remote keyboard so that the application on the other end can simply read local machine key events and this is as far away from "upgrade the old way" as possible.

Good point. This reminds me of Wayland vs Xorg.

Like others have said, if your user interface relies on key-up, it's not really possible in general. A given Linux machine may only have a serial console, and AFAIK you're not getting key-up from any serial terminals; certainly not any common terminals. It's just not part of the design, and short of a time machine, it's unlikely to become part of the design, because DEC isn't around to update the spec for vtXXX terminals or their implementations.

No, a character stream is a character stream. You don't get key down or any key events. Why does everyone talk about key up exclusively as if there were 99 supported features and key up was the only forgotten one that you need to get to 100?

There could easily have been escape sequences for released keys, just like there are for non-character keys like the arrow keys. You'll have to change your terminal emulator to support this too, though. This would work fine with SSH.

You would also need escape sequences for pressed keys.

I mean, on Unix it's ncurses, right? If no-one's ever added key-event support to ncurses then that's a(n additional) sign that it likely isn't possible and that TUIs just have to make do without the capability.

Honest answer: one that doesn't involve the need to capture individual keypresses. It just doesn't work on a Unix tty. Use a web app or other smart client (c.f. the IBM 3270, which had this problem largely solved in 1971 by putting the key navigation intelligence into the terminal).

I was about to say that: distributing real-time input processing near the terminal makes sense. Also I wonder how video games handle that separation.

You're not supposed to write very advanced TUIs (advanced being subjective here). It's not what the interfaces were designed for.

The interfaces for advanced UIs are Wayland and X. TUIs are meant for compatibility, not fancy ability.

Right. And on top of that, one of the useful benefits of a TUI is that you can use them while ssh'd into a remote server. If your fancy TUI is expecting to get raw access to the keyboard, it's not going to work in a ssh session.

Who says? Actual question.

> TUIs are meant for compatibility

Nonsense. Our machines don't even get GUIs installed.

That's exactly what the compatibility is for. Tools that work in the absence of anything advanced, with nothing but a few escape codes to "draw".

Otherwise it wouldn't work on the kernel console (the only thing available without a GUI installed), over SSH, over serial links, even still on dedicated terminals and teletypes.

Terminal is still a first-class UI mode, not some fallback.

If some tooling doesn't work for us (which means over a terminal), then we don't consider it. If something doesn't expose functionality over a terminal, that functionality doesn't exist.

Sure, it's a first class UI mode from the 70's, which is extremely useful. Any development of it would be in direct conflict with its qualities.

> If some tooling doesn't work for us (which means over a terminal), then we don't consider it.

Who is "we"? Even the most die-hard TUI fan uses them through a GUI terminal emulator, making that argument fall entirely apart.

Correct architecture for terminal applications is command line; line oriented input, line oriented output. Not handling individual key events.

Even back in the 1980s, when I was asked to reimplement the hierarchical DEC VMS "help" command for SunOS (in a terminal window), it was necessary to handle individual characters rather than use line-oriented input. The VMS "help" command had been written for VT100/VT200 terminals and their cousins, and was designed to take action on a single key press, rather than waiting for a return/enter action to "send" the command.

Not always, mind; having to press enter after every character press in vim would be a major drag!

ed/ex works perfectly fine with only line input and output. AFAIK TECO started out like that too.

Gvim works for visual editing

You can read individual characters just fine from a command line app. That's what tty "raw" mode does.

We need a text-mode equivalent of X11 (and then ssh forwarding for it).

To add, you can read the kb events even in a command line program. Using GUI APIs works fine in terminal programs if the GUI is running. Works even remotely in a GUI that supports network transparency, like X does. Eg like this:


> Linux exposes your keyboards as /dev/inputN devices

Yeah, but those are only readable by root.

The modern Linux stack allows non-root users to access these if they're "bound" to the right 'seat' and 'session' in the freedesktop-derived multiseat infrastructure. AIUI, this is how non-root X works as well. It's incredibly clunky stuff because no one has ever made the effort to refactor it nicely (the typical problem with all fd.o stuff), but this is supposed to be possible.

this comment was an excellent read. is there any kind of book that covers the historical development of computer tech in this way, that isn't just a full-blown technical reference?

Not at that level of completeness, no. Most people who puzzle this out start with writing escape characters in shell to dazzle up their bash prompts and then escalate from there to figuring out the rest of the protocol.

Seeing this near the top of HN is a bit like seeing a "Why Does Italian Have All These Screwy Verb Endings?" post near the top of a web forum for professional students of linguistics ...

What's obvious to you might not be obvious to everyone. I've only heard reference to the issue previously but never looked at a detailed discussion, mostly out of not caring because I've never had to do something that depended on detecting keyup events in Linux in a non-browser context.

Really? There is almost no information in this article that I have ever heard before, and I imagine that I'm a pretty typical reader of this forum. You might have a very mis-calibrated model of HN readers.

This is silly: you can't detect key up or key down in a tty; you detect what the terminal emulator gives you: chars and Esc sequences.

It's as sane as asking why you can't detect joystick movements or the kettle boiling through a tty.

It bothers me how little people understand about the terminal and "the command line", despite using it everyday.

If people read man bash from top to bottom they would probably get a feel for what is going on. Learn how changing the title of your tty works and there is not much left to _not_ understand.

Yeah, the author is not running into a Linux problem. He is running into an SSH problem.

>Something worth noting to avoid confusion is that if you run the example python keyboard example in the introductory paragraph of this article over an SSH connection, the code will still work and run, but when you type characters, nothing will happen.

>What gives?!? That's because it will be detecting 'local' keyboard events from the machine you just SSHed into! If it's a cloud server like EC2 or something, there probably aren't any keyboards attached to it!

"What gives?!?"? Is this supposed to be a joke? What else did you expect? Expecting your local keyboard to connect to the remote machine is as insane as expecting your keyboard to read the keys you want to press from your mind.

The problem he is running into is that SSH only forwards a character stream to the remote machine and that is entirely an SSH problem and the solution to forwarding the keyboard has been X forwarding. What he is asking is that SSH should be able to forward the keyboard without X.

He even mentioned a workaround that involves compensating for the lack of this feature by forwarding the keyboard himself.

>Well, you have to build your own client/server application where a client/server listens locally on the machine with the keyboard, and then forwards these events to the remote machine where you're running the applications that needs to respond to these events.

My suggestion is that he should send an email to the openssh developers to add keyboard forwarding without X.

> Are you interested in detecting local 'key up' events over an SSH connection without involving an X server? Oh boy, are you in for a disappointment! It turns out, that it's impossible. I don't mean the "I tried really hard and couldn't figure out how to do it, so it must therefore be impossible" kind of impossible, but the real kind of impossible where the keyup event doesn't get communicated at all during a non-X forwarded SSH session. Here is an experiment you can do to prove this to yourself [...]

This isn't really a good experiment, as it assumes that everything that can be sent over SSH will always be sent. This is not the case in the terminal world, where by default you get a processed character stream and have to explicitly enable anything else. For example, kitty's keyboard protocol [0] mentioned elsewhere in this thread needs to be activated by sending the escape sequence CSI > 1 u to the terminal on the other side of the SSH connection.

[0] https://sw.kovidgoyal.net/kitty/keyboard-protocol.html

> So, what do you do when you want to send 'key release' events over SSH when the target machine doesn't have an X server? Well, you have to build your own client/server application where a client/server listens locally on the machine with the keyboard, and then forwards these events to the remote machine where you're running the applications that needs to respond to these events. That sounds like a lot of work because you have to set up all the sockets and custom messaging protocols, but there's no way around it because the key release events simply aren't forwarded over your terminal-based SSH session when there is no X server forwarding configured.

This isn't that much work if you write it as a trivial Web app using WebSockets. Then you can leverage the browser's OS abstraction layer instead of having to write your own.

I don't think the article is fair. I'm not used to working at such a low level on Linux, but it is certainly doable without root. SDL has been doing it for ages. Doing it from the terminal is another thing entirely, since it was never designed for such kinds of events.

I'm right there with you, but as someone who is used to the "lower levels" of Linux (to use your phrasing) this seems like the exploration of someone beginning to understand they were on the question side of an "X / Y problem" (https://xyproblem.info/).

Put differently, they're making the transition from Linux as a commodity used to build the thing they're dreaming of to Linux as a critical design component.

For example, he calls out in the beginning:

> They are interested in performing some real-time based task that is controlled using keyboard presses. In my case, the goal was to remotely navigate a robot over an SSH connection using the 'w', 'a', 's', 'd' keys.

As the author discovers, that's fundamentally not how SSH works. This sort of behavior could be achieved using other mechanisms, but it's not even really an issue of the tty/pty. They're just trying to map a functional model from a different operating system to Linux.

As the discussion of keyboard handling moves into Python code, I would again have gone a different path.

When the author took a turn into Python my first thought was this is trivial for evdev to handle (https://en.m.wikipedia.org/wiki/Evdev / https://python-evdev.readthedocs.io/en/latest/).

Next thing I know they touch on the kernel mechanisms managed by evdev.... And pivot across to X11 (which would seem to make sense until one realizes that the transition to Wayland from X11 is far further along than a layperson might imagine).

In the end, a good write-up which shows a lot of "raw power" on the part of the author. With some additional tutelage, exploration, or guidance they could really take their understanding to the next level.

For the folks who are in the weeds (like me) it is a good guided tour. Seeing into the "beginner's mind" (and taking it to heart) can provide perspective as to how to make software more intuitive.

(Typed and butchered from my phone).

I could be wrong but I'm pretty sure SDL just reads the keydown/keyup events it gets from the X server (that's why you don't need to choose a keyboard device when you use it). So without X running SDL wouldn't be able to grab keyboard events.

It can grab joystick/gamepad events though since it reads those directly. In that case you have to enumerate then specify which gamepad/joystick device to use so it can open the correct /dev/ path.

It can certainly be done by interfacing with the X server, but it limits you to systems with a running X server (e.g. no headless Raspberry Pis, no Wayland).

I have a planned change to use this as the default, and fall back to /dev/input on non-X systems. It's not quite there yet, especially the capability of suppressing key events.

Small nitpick, if you use WASD for control, it may be better to use the keycodes and not the characters. People with non-QWERTY layouts probably want to use whatever keys they have on this location (ex: ZQSD for AZERTY layout).
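A sketch of the position-based approach using Linux input-event key codes (code values from <linux/input-event-codes.h>; QWERTY's WASD and AZERTY's ZQSD occupy the same physical keys, so the same codes cover both):

```python
# Bind actions to physical key positions (scan/key codes) rather than
# characters, so AZERTY users get ZQSD automatically.
MOVEMENT_BY_KEYCODE = {
    17: "forward",   # KEY_W -- 'z' on AZERTY, same physical key
    30: "left",      # KEY_A -- 'q' on AZERTY
    31: "backward",  # KEY_S
    32: "right",     # KEY_D
}

def action_for(key_code):
    return MOVEMENT_BY_KEYCODE.get(key_code)
```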

This might be a dumb question, but why would I want to detect a Keyup event in the first place?

Let's say you're porting a fighting game, such as Blazblue, to Linux.

Carl Clover (Blazblue character) attacks on keydown, while Carl's doll attacks on key-up. Expert players change the rhythms of their key-down / key-up to make combos possible.

In general, these kinds of fighting game characters are called 'Negative Edge' characters. There's a large number of them in many different fighting games. I know it's present in Street Fighter, Marvel vs Capcom, Mortal Kombat, and Injustice. Even SSB:U (Shield-release / perfect shields) is a negative-edge event, showing just how common negative edges have become in modern games.

I personally am only really good at Blazblue: and thus, I know that Carl Clover, and to a lesser extent Taokaka and Lichi, are negative-edge characters. It's an entire "character design" philosophy, to make certain characters feel much different from others. But it's all over the place.


I HATE playing negative edge characters. If I realize a character has negative-edges, I run the heck away from them. Nonetheless, I accept that fighting games are fun because there's "always a character" that matches someone's personality.

If someone else has fun playing negative edge characters, I want to welcome them into the community, and therefore want to support negative-edge gameplay in a game. It's not about "me", it's about "the community of players".

Or even simpler. Let's say you're porting a side scrolling old school Mario-type game. When you release the "move right" key, the guy stops moving right.

Lol. KISS bites me again.

Simplicity is boring. Thanks for writing that comment. I learned something new about games I've only played casually (by which I mean, button-mashing)

For Blazblue specifically, I'm not a Carl player (ugggh... negative edges). But just remember: whenever the doll is attacking, it reacted to the player LETTING GO of a button (key-up).

All of Carl's moves himself are on key-down.

So think of what your fingers must do to consistently pull off combos like: https://youtu.be/D8gPPB9YD6s?t=65

You get the benefit of playing two-characters vs the opponent's one character, but it means having to think about both characters (as well as having a wonky control scheme for the 2nd character). Furthermore, Carl is designed to play with the doll, so you only deal the same damage as everyone else if you successfully pull off these combos.

But having two characters means you can set up unblockables more easily (doll hits high, Carl hits low), or weird pressure strings / frame traps that are unavailable to most typical characters. Which is where Carl's unique advantage really comes in. So it's not a "damage" thing, it's more about the mind-games you play with the opponent when it's 2vs1.

Terminology: A, B, C, and D are the four attack buttons in the game. Numbers represent which direction you push (2A means A-button while holding "Down". 2 is down on the numpad. 8 is up, 6 is right, 4 is left). Every fighting game community has invented their own terminology. But knowing this should give you enough information to watch that combo-tutorial adequately.

A, B, C map to Carl, D maps to the doll. Holding D means that the doll starts to move left-right (4 or 6 with your left hand), letting go of D means that the doll attacks (negative edging). A, B, and C on Carl play as a typical fighting game character.

Carl is... not easy... to play.

The problem isn't Linux, but the terminal environment...

Then let's say there's a hypothetical terminal video game that acts on negative edges.

My point: negative-edge gameplay is common in fighting games (and maybe other video games). It's reasonable to expect that a video game designer would use that technique.

I've played terminal action games before. Let's say you're programming one and have some useful function mapped to the negative edge. How do you implement that in the terminal?


Megaman's charge shot (and most "charge shot" video games, including Samus / Metroid, Rocket Knight Adventures, etc. etc.)

Maybe those platformers are more common? Either way, the negative edge is all over video games; it's a good control scheme.

As you have discovered, the terminal is not really a good environment for games. It's really just that simple. If you pick the terminal as your platform, you are intentionally picking a limited platform, and working within those limits is then kinda the point. Having poor input handling is part of those limits. If you just want to make a good game, then just use Wayland (or X11) instead, or some wrapper of those.

The Linux terminal is not a good environment for that.

The Windows terminal supports key-up events... as long as I can remember, anyway. I think a number of Windows programmers who are looking at Linux's command-line API wonder why it's so hard to get something that's common in Windows.

You can interface with input devices directly if you want that. The output can still be in a VT10x terminal, or it can be directly in a framebuffer if you want software-rendered graphics. Or better yet: X11 or Wayland.


>In my case, the goal was to remotely navigate a robot over an SSH connection using the 'w', 'a', 's', 'd' keys. Real-time tasks like this require extremely high responsiveness to key events for palatable performance.

Yeah, I don't think a SSH connection is the right tool either, but there's the answer (and basically the entire article follows from that).

In retrospect, thinking about the end problem, he really should have written a custom front-end that used SSH (or TLS) and a custom protocol to remotely send input and receive telemetry, as most terminal emulators don't support this kind of behavior. It could still be a TUI, but it'd be local and use evdev. What'd be nice is that you could then actually use a joystick if you had one ^_^

To properly implement something like Ctrl+Tab for switching between tabs (assuming the application has such a concept).

All applications getting keyup events seems like a big security issue.

Is this the same case with Wayland?

Is what the same as Wayland? Maybe you've misunderstood the article?

The process is running as root and reading directly from /dev/input; the article acknowledges that it's essentially a keylogger.

I am now curious, though. How does Wayland handle key events? I have to imagine it only lets the active window listen in on key events, right? Maybe there's an API for subscribing to specific key combinations?

iirc it's pretty much completely up to the compositor. The spec is worded to allow "focus" to enter a surface and deliver events to it, but it never specifies under what conditions the compositor should do so.

So the compositor chooses what surface to deliver events to based on its own desires (like letting pointer focus enter background apps) and the user's input. I think there is a protocol (used by Xwayland?) to allow a client to get events from any window if the compositor/user allows it.

That's how it is. It's up to the compositor to decide what events to send to what clients. Usually this means the compositor only forwards input events to the focused surface.

> Maybe there's an API for subsribing to specific key combinations?

Why would you ever let a non-focused application subscribe to any key combinations?

> Why would you ever let a non-focused application subscribe to any key combinations?

* Media playback controls

* Hotkeys to take screenshots, or record videos

* Quake style terminals

* Ingame chat and game invites

* Color pickers

* Screen controls like F.lux

* Window management hotkeys

* Push-to-Talk in group voice calls

And that's just what I personally use on my own machine.

It's a massive security issue that any installed software can listen to any input activity or view/affect other windows, and that does need to be reined in. But to claim that there's no legitimate utility to global hotkeys is absurd. We need robust permissions, not a completely crippled experience.

It doesn't even need to be installed. Back in the 90's, some Unix workstations shipped with X11 security disabled by default. Your keyboard could be sniffed just by plugging into the network. Fun times.

In my experience most of these use cases are handled on linux by the window manager calling a script that sends a message to the relevant application, e.g. over dbus.

Sure, and that works fine. Just let my application tell the window manager:

* What key events I care about (i.e. what the keybind should be named in the settings UI).

* What script or message to pass back to me when that event is activated.

* What the default keycombo should be, if any.

Some applications prefer to provide a ui for the user to configure a hotkey. This is naturally discoverable within the scope of the other settings in its preferences screen.

Providing a CLI interface for automation or binding a hotkey is IMO more powerful and useful, but it's not discoverable. Doubly so for controlling it over D-Bus.

The logical thing, given both a substantial use case and a desire to limit applications' access to global input, is that a permission system ought to have been built into Wayland, such that applications could request not only global access to the keyboard but also permission to be notified when a particular key press happened.

Since this feature was a staple of desktop operating systems for decades it ought to have been part of the plan from the start.

People have been asking for similar grab-type things since early on in Wayland, and there have been many proposals, but nobody has actually stepped forward to build such a permission system that would work everywhere and would not create additional security problems or would not severely overcomplicate things. It's not a simple thing to do, by any means. If you think you know how, I would urge you to get started designing it and contributing it to some of the major Wayland implementations.

I don't really want to be in the business of growing wheat or baking bread if what I actually want is a sandwich but I will happily explain why ___ sandwich shack is doing it wrong because Monday morning quarterbacking is more fun.

I get that, but everyone else also seems to have that attitude, so you see why it doesn't really get done when nobody has any wheat or bread :)

That doesn't seem like something that an application should be getting involved with. An application should only be able to manage its own windows and key presses to its own window.

So if I'm in my text editor and decide I want to change the currently playing song, I need to switch windows/contexts, press a hotkey or GUI button, and switch back?

The current model where everything running can listen to everything you do is not acceptable, but that doesn't mean there's no usecases that strongly benefit from global hotkeys.

> So if I'm in my text editor and decide I want to change the currently playing song, I need to switch windows/contexts, press a hotkey or GUI button, and switch back?

Looks to me like something the DE should manage for you, not the application.

But everyone seems to disagree with me!

But how do you teach your DE to do new things for you? And what is your DE in the first place?

The answer to both these questions is: an application. Think of applications hooking into global shortcuts as plugins for your DE.

> Looks to me like something the DE should manage for you, not the application.

> But everyone seems to disagree with me!

The original post you responded to was suggesting that your DE, or at least something outside of your application, provide an API to register specific global hotkeys that you can listen for. In response, you said:

> Why would you ever let a non-focused application subscribe to any key combinations?

So to me when reading this thread, it looks like you're the one disagreeing with this idea. This is the first post of yours I've seen where you suggest this, and the thread begins with you shooting down the entire concept as ridiculous.

Unless you're suggesting that the DE comes with a preconfigured set of global hotkeys and cannot be altered or extended, nor handled differently by different applications. In which case yes, I strongly disagree with that.

There's a reasonable middleground between a free-for-all and nothing but what the DE already thought of, and that's a well-defined API boundary alongside per-application, per-feature permissions. Bonus points if the DE handles all key-combination assignment in a consistent UI and applications can only register a suggested keycombo, a context for when the hotkey should be activated, and a function to respond to the event.

> Unless you're suggesting that the DE comes with a preconfigured set of global hotkeys and cannot be altered or extended

Yes, that's what I'm suggesting. It works fine for almost all use cases, like on macOS.

I guess it's just not a popular opinion lol!

I've had custom global hotkeys for stuff in MacOS for over a decade.

But that's not how things like playback controls are implemented - they have non-extensible separate APIs for that. That's the point.

You've also been given a raft of other functions that depend on them in this thread.

I'm not sure what your hangup here actually is.

IME, most of those functions tend to get implemented in the DE when it turns out they're actually useful, or some other mechanism can be used to activate them. So no reason to get hung up on that particular method to do it either.

I did edit the post you responded to a few times in the last few minutes, so heads up there.

> > Unless you're suggesting that the DE comes with a preconfigured set of global hotkeys and cannot be altered or extended

> Yes that's what I'm suggesting.

Okay, let's use the example of music playback. If your DE handles music playback controls, how does it tell the music player when to stop and start, or skip to next track?

Does it hardcode a list of music players, each of which provides its own bespoke API, and the DE calls into the application?

Or does it provide a way for any music player to call into the DE, and listen for those play/pause/skip events?

And if it does that, then why is the list of events hardcoded? Why not allow the music player to say "I also play podcasts, and I want to provide a hotkey to speed up/slow down the playback speed, or provide a separate hotkey to skip ahead 30 seconds that's distinct from skipping the entire episode."? Why not allow Discord to provide a push-to-talk hotkey so that people on a group call can actually hear each other without dogs barking and keys clacking and people coughing?


Look at the VSCode Extension API for instance:

* Extensions are at least partially isolated from the main process and each other. I don't know how far this goes or how secure, but for our purposes they could be as isolated as you want.

* They can register a command which when activated, performs some action from within the extension.

* Those commands show up in the command list at the top of the screen when you hit "Control + Shift + P" (which is itself a command subject to all the same rules) and users can select whichever command they want to run.

* Those commands can also be assigned to a hotkey.

* Extensions can suggest a default binding for this hotkey, but users can go into the VSCode settings page and assign whatever keycombo they desire.

* Users can remove keybindings for commands that have them by default.

* Users can assign keybindings to commands that don't have them by default.

* All keybind editing is performed through the same UI, which is owned by VSCode and not the extension.

* VSCode itself is the thing listening to those key events, and it calls into the extension to activate it.

* There is a concept of context for when these keybinds apply, such as when a markdown document is active in the editor. Extensions provide these contexts by default and users can override them as they choose.

* Extensions themselves don't even start up until an activation event is met. These can be opening a file of a certain language, running a command, etc...

What about this model is objectionable in your mind? Who benefits from the lockdown you propose where no new ideas can be tried until the DE authority deigns to allow it?
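The registration model described above can be sketched in a few lines. This is a toy illustration, not any real DE or VSCode API: applications register named commands with a suggested default binding, the central authority owns the actual key-to-command table, and the user can rebind anything through the authority's own UI.

```python
class HotkeyRegistry:
    """Toy central hotkey authority (the DE/editor owns this, not the apps)."""

    def __init__(self):
        self.commands = {}   # command name -> callback into the application
        self.bindings = {}   # keycombo -> command name

    def register(self, name, callback, default_binding=None):
        self.commands[name] = callback
        # Apply the app's suggestion only if the combo is still free.
        if default_binding and default_binding not in self.bindings:
            self.bindings[default_binding] = name

    def rebind(self, keycombo, name):
        # User-driven, via the authority's consistent settings UI.
        self.bindings = {k: v for k, v in self.bindings.items() if v != name}
        self.bindings[keycombo] = name

    def dispatch(self, keycombo):
        name = self.bindings.get(keycombo)
        if name:
            self.commands[name]()  # the authority calls into the app

reg = HotkeyRegistry()
reg.register("player.toggle", lambda: print("play/pause"), "XF86AudioPlay")
reg.rebind("ctrl+alt+p", "player.toggle")  # user overrides the default
reg.dispatch("ctrl+alt+p")                 # prints "play/pause"
```

Note that under this model the application never sees raw key events at all; it only ever receives calls to commands it registered, which is what makes per-feature permissions tractable.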

All of your concerns regarding music players are actually already addressed by the MPRIS spec: https://specifications.freedesktop.org/mpris-spec/latest/

This problem has been solved in the DEs for quite some time now, global hotkeys is probably the very last thing you want to reach for, when all other options are exhausted.

And when that happens, I don't know about other DEs, but the way you're describing with extensions is mostly how GNOME Wayland already works. If you want to intercept a key, you ship an extension for that, and display a UI to rebind the keys, or you can place additional entries in the system's keyboard settings panel. What you can't do is have random unprivileged processes intercept keys without the user's permission.

Probably because nobody on earth wants what would be the convenient design to dictate how they use something.

Depends on an application. Plenty of applications have a legitimate reason to hook into global keyboard handlers, or otherwise instrument other applications or the system itself. Removing that capability would be crippling to the utility of a computer.

It sort of works in the mobile space (+/- accessibility services, which are the way to get that functionality if you need it), but that's only because mobile devices are consumption-oriented; there's only so much you can do with them, and they eschew utility in order to streamline you into being monetized by third-party services.

Problematic. Means you are relying on the DE for all UI development. With Windows and X11 the DE is reconfigurable to some extent by applications.

There is also more to this, showing the Wayland security model is incoherent. If debugging is enabled on your system I can simply debug your root terminal and start injecting commands into memory.

Wayland obviously cannot solve all security issues and privilege escalations in the underlying OS -- the solution to that problem would be to just disable debugging within that security context.

In Wayland the situation is also not so much different from Windows, you would just reconfigure and extend the DE itself. You are already relying on them to provide most of the UI to an extent.

In regards to your first point, I am saying that Wayland does not do what it sets out to do and the proper way of doing it is a much larger problem than what Wayland can address. It is possibly better to avoid breaking everyone's stuff if you're just going to need to shortly do it again to fix the things Wayland does not and can not fix.

For your latter point, Wayland is much different. In Windows there are ways to inject behavior into other programs or the UI elements. This exists so you can change the UI. Wayland's idea of security is very much against that, at least in the way it can be done on Windows/X11.

I don't know what you mean you'll need to break it again, this one particular hole is already plugged in Wayland. That won't cause additional breakage, and you'll still want to be plugging those additional holes like ptrace anyway.

Input injection is being worked on in a different library, that brings a standard API that's supposed to work the same across Wayland, X11, and with sandboxed contexts: https://gitlab.freedesktop.org/libinput/libei

Not all applications need to be a classic "windowed application".

I know this is a rare use case, but the DE and the app need to work together to allow global shortcuts. I, for example, have a keyboard with many otherwise-useless media keys that I configured to run some specific scripts, and some I could very easily set up as global shortcuts for some KDE apps. Not sure if other DEs offer to set up global shortcuts for you.

Something most users will probably use is screen recording and screen readers; these applications need global shortcuts (and access to the screen and window elements).

I am currently writing a utility to spy on Super Hexagon's KeyPress/KeyRelease XEvents with the goal of recording very precise timestamps for each event. Since the walls in the game arrive at a constant rate, I want to graph my keypress/keyrelease events modulo the time between walls.

My success rate for finishing Hyper Hexagonest[1] is falling; being able to inspect recent attempts might reveal where my timing is drifting.

(it's only ~25%-ish finished, unfortunately)

[1] https://www.youtube.com/watch?v=JJ96olZr8DE
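The "modulo the time between walls" analysis is simple to express. A sketch, with an entirely made-up wall period and timestamps; the real utility would feed it timestamps captured from the XEvents:

```python
WALL_PERIOD = 0.8  # hypothetical seconds between walls

def phase_of(timestamps, period=WALL_PERIOD):
    """Fold each event timestamp into its phase within the wall cycle, [0, period)."""
    return [t % period for t in timestamps]

presses = [0.10, 0.91, 1.72, 2.55]  # illustrative KeyPress timestamps (seconds)
print(phase_of(presses))  # approximately 0.10, 0.11, 0.12, 0.15 -- phase creeps upward
```

If the player's timing is drifting, the phases trend in one direction instead of clustering; that trend is exactly what a histogram of the folded timestamps would show.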

WorkRave can't function well without it. Currently, it's working through a compatibility layer I think, but if that breaks, the application can't be made to work on Wayland AFAIK.

Alternative shortcut managers might also run into trouble (say, clipboard managers with customizable shortcuts) although those could work with a simple permission prompt for each new shortcut. Very annoying and obtrusive to the user, but the security principles remain.

There are also tools that extend some window managers with i3-like shortcuts and configurations; the config files are parsed on the fly and they need to work in the background.

Then there are tools like AutoHotkey that can be great for productivity through scripting and custom shortcuts. A lot of AutoHotkey's functionality can be replicated using the standard hotkey API, but not everything.

There are also diagnostic tools (see Windows Steps Recorder) that record all keystrokes and generate a step-by-step report about what happened, with screenshots as a guide.

IMO global key capture should still be possible with the right capabilities set because there are valid use cases for it. Requiring additional permissions is fine IMO, but completely removing the option to do this is a pain.

In order to implement https://www.semicomplete.com/projects/keynav/ (or anything else that listens for a specific key combo to activate)

Isn't that kind of thing the responsibility of the desktop environment service? Not an application?

As long as the desktop environment provides a mechanism for applications to register their global shortcuts then all is well. However, without such a feature the application has to just listen for all keystrokes and check if it matches something the user's configured.

Example: I've bound several scenes in OBS to keyboard shortcuts so that I can switch scenes no matter what application I'm in. I'm pretty sure OBS does this by listening to all keyboard events on the root window. It's not very efficient, but it works.

Now imagine having to support a global shortcut daemon per DE that exists. I suppose someone would write an abstraction but really, it's something that FreeDesktop.org should provide (if it already doesn't).

I suppose the proper answer is that that very much depends on your philosophy of desktop environments; I happen to prefer modular desktops built out of separate programs from separate authors which may or may not even know about each other. This has the advantage of making it trivial to retrofit new features without needing to ask your DE upstream. Or to be more pithy: Okay, fine; what desktop environments implement keynav? With X11, I can make the answer be "(probably) all of them".

Global Hotkeys for something like Quake terminal, or voice chat push-to-talk. Obviously I don't see why you would encumber the core protocol for it, but a subscription model for it as a wlroots or FreeDesktop portal could be interesting.

PureText[0] is a Windows application I use regularly. It adds a keyboard shortcut for pasting as plain text.

[0] https://stevemiller.net/puretext/

Because they are serial character devices?

Detecting keyup only in a windowed environment seems like a perfectly good place to be. The problem there might be that there is no such thing as a "desktop Linux", so writing a nice cross-platform desktop app for any Linux, regardless of window manager and desktop environment, is, I assume, a bit of a black art.

The X11 example only has a basic event loop with a check for whether the key-up event was repeated. Everything else is setting up the window and tearing it down on close. That should work in any window manager.

Most of the examples are either global listeners or outside of a window environment.
