The bad news is the defaults on modern platforms are often very bad for latency. The good news is that it is possible to achieve good latency on most modern systems with a lot of attention to detail. With good hardware and good software it is even possible to e.g. run console emulators with lower latency than they would have on original hardware connected to a CRT. I just wrote a three part series detailing a lot of ways to improve latency:
> Delay rendering until just before VSync: If you get it slightly wrong and your frame takes slightly more time to render than you thought, your frame may not be done in time for VSync. Then it will have to wait a whole extra frame and the previous frame will be displayed twice, causing a hitch in any animations.
According to the docs, there are extensions WGL_EXT_swap_control_tear/GLX_EXT_swap_control_tear [0] that cause late frames to tear instead of waiting a full frame. They don't work on my machine (my Intel HD 4000 reports that the extension is supported and then silently fails), but this should be the ideal swap mechanism.
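For reference, requesting it looks roughly like this (a minimal sketch, assuming an OpenGL context is already current and skipping error handling; the GLX path is analogous with glXSwapIntervalEXT):

```cpp
// Minimal sketch (Windows/WGL): ask for adaptive vsync so a late frame
// tears instead of waiting for the next vsync. Assumes an OpenGL context
// is already current; error handling is left out.
#include <windows.h>
#include <cstring>

typedef const char* (WINAPI *PFNWGLGETEXTENSIONSSTRINGEXTPROC)(void);
typedef BOOL        (WINAPI *PFNWGLSWAPINTERVALEXTPROC)(int interval);

bool EnableAdaptiveVsync() {
    auto getExtensions = (PFNWGLGETEXTENSIONSSTRINGEXTPROC)
        wglGetProcAddress("wglGetExtensionsStringEXT");
    auto swapInterval = (PFNWGLSWAPINTERVALEXTPROC)
        wglGetProcAddress("wglSwapIntervalEXT");
    if (!getExtensions || !swapInterval)
        return false;

    // WGL_EXT_swap_control_tear is what allows a negative interval.
    if (!std::strstr(getExtensions(), "WGL_EXT_swap_control_tear"))
        return false;

    // -1: vsync as usual, but tear instead of waiting when a frame is late.
    return swapInterval(-1) != FALSE;
}
```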
Tearing is a pretty bad artifact so I wouldn't say that enabling it is ideal. I'm a stickler for low latency but even I don't think it's worth it in most cases. It's possible to achieve great latency without tearing. The ideal swap mechanism would be VRR when available.
Those GL extensions likely predate modern compositing window managers and fail to work when compositing is enabled. As I discuss in the platform-specific considerations section, tearing on Windows requires either full screen or Multiplane Overlay, which is not supported by OpenGL.
On a non-VRR display, I believe it's nicer to have a momentary tearline near the top of the screen than a momentary frame skip. The stutter is more noticeable than the tearline. (Some feature of how GPUs work means that tearlines landing exactly at vsync are mysteriously delayed half a frame. Maybe someone else knows why; I don't, and neither do the people I've talked to.)
~2 msec (mouse)
8 msec (average time we wait for the input to be processed by the game)
16.6 msec (game simulation)
16.6 msec (rendering code)
16.6 msec (GPU is rendering the previous frame, current frame is cached)
16.6 msec (GPU rendering)
8 msec (average for missing the vsync)
16.6 msec (frame caching inside of the display)
16.6 msec (redrawing the frame)
5 msec (pixel switching)
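(Those stages add up to 2 + 8 + 16.6 + 16.6 + 16.6 + 16.6 + 8 + 16.6 + 16.6 + 5 = 122.6 msec end to end.)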
I'm not very familiar with graphics pipelines, but some stuff here seems wrong. If a game is rendering at 60fps, the combined compute time for simulation+rendering should be 16.6 ms. You can't start simulating the next tick while rendering the previous tick unless you try to do some kind of copy-on-write memory management for the entire game state. And with double buffering, the GPU should be writing frame n to the display cable at the same time as it's computing frame n+1, and the display writing the frame to its cache buffer should be happening at the same time as the GPU writes the frame to the cable.
By my count that's a whole 50 ms that shouldn't be there.
From the linked article:
One thread is calculating the physics and logic for frame N while another thread is generating rendering commands based on the simulation results of frame N-1.
Maybe modern games do use CoW memory?
[The GPU] might collect all drawing commands for the whole frame and not start to render anything until all commands are present.
It might, but is this typical behavior? This implies that the GPU would just sit idle if it finished rendering a frame before the CPU finished sending commands to draw the next one — why would it do that?
Most monitors wait until a new frame was completely transferred before they start to display it adding another frame of latency.
Maybe this is what is meant by the "16.6 (frame caching inside of the display)" item? That might be real then.
Games generally don't use copy on write, but they do often explicitly pipeline processing across multiple frames (usually by manually copying the necessary data from sim "owned" memory to render "owned" memory, but varying amounts of double buffering are also used). This was especially true after the transition to multi-core but before the many-core regime of today. Transitioning from a single threaded engine, it was easier to run effectively a single-threaded simulation frame and a single-threaded render frame in parallel than to fully multithread everything. Graphics APIs took a while to support multithreading, as well.
These days game programmers have gotten experienced enough to get closer to fully saturating all cores in both the simulation and render steps, so you sometimes no longer see the two full frames of latency there.
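Roughly, the hand-off looks like this. This is just a sketch of the pattern, not any engine's actual code; all the types and functions are hypothetical placeholders, and the synchronization that keeps the sim from overwriting a snapshot the renderer is still reading is left out:

```cpp
// Sketch of explicit sim -> render pipelining: the sim thread produces a
// snapshot for frame N while the render thread consumes the snapshot for
// frame N-1. No copy-on-write, just an explicit copy into render-owned memory.
#include <array>
#include <atomic>

struct SimState       { float time = 0.f; /* full game state, sim-owned */ };
struct RenderSnapshot { float time = 0.f; /* just the data rendering needs */ };

// Hypothetical placeholders for real sim/render work.
static void StepSimulation(SimState& s)                         { s.time += 1.f / 60.f; }
static void CopyForRender(const SimState& s, RenderSnapshot& r) { r.time = s.time; }
static void SubmitDrawCommands(const RenderSnapshot&)           { /* build GPU commands */ }

static std::array<RenderSnapshot, 2> snapshots;  // double-buffered hand-off
static std::atomic<int> ready{-1};               // index of the last finished snapshot

void SimThreadLoop(SimState& state) {
    for (int slot = 0;; slot = 1 - slot) {
        StepSimulation(state);                  // simulate frame N
        CopyForRender(state, snapshots[slot]);  // explicit copy, no CoW
        ready.store(slot, std::memory_order_release);
        // NOTE: a real engine also waits here so it can't lap the renderer.
    }
}

void RenderThreadLoop() {
    for (;;) {
        int slot = ready.load(std::memory_order_acquire);
        if (slot >= 0)
            SubmitDrawCommands(snapshots[slot]);  // renders frame N-1 (or older)
    }
}
```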
> 16.6 msec (GPU is rendering the previous frame, current frame is cached)
Not entirely sure what this is about. Maybe some sort of triple buffering is being employed as a way to reduce hitches? If you push the engine really close to the 16 ms limit for each stage of your pipeline, sometimes something out of your control, like the OS deciding to do some heavy background work, will push you over your limit. Without the extra buffer, you will miss your vsync and the user will perceive a very disturbing judder.
I agree that it seems like the "game" part of the latency has about 33ms extra, but the source of this breakdown[0] seems to be knowledgeable and includes measurements that corroborate many of the claims. I was surprised, for example, that vsync seemed to add 2 frames of latency rather than 1 in this test.
The total time in this breakdown is in line with the measured total time, so if the source is wrong about the game by claiming it takes longer than it does, they're also claiming that some other stages take less time than they do by basically the same amount. I would bet on the monitor, but I don't have much reason to think they're wrong to begin with.
> If a game is rendering at 60fps, the combined compute time for simulation+rendering should be 16.6 ms.
It can work this way--e.g. nvidia exposes an 'ultra low latency mode' in their driver that caps prerendered frames to zero--but typically for smoother animation and higher average fps gpus will have a queue of several frames that they're working on, and this is irrespective of how many render targets you have in your swapchain. Danluu's breakdown above is actually correct for the typical case.
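Applications can also cap that queue themselves instead of relying on a driver setting. A minimal Windows/D3D11 sketch via DXGI's SetMaximumFrameLatency, assuming the device already exists and ignoring errors:

```cpp
// Sketch: cap how many frames the driver/GPU may queue ahead of display
// (Windows, D3D11/DXGI). Assumes 'device' already exists; errors ignored.
#include <d3d11.h>
#include <dxgi.h>

void CapFrameQueue(ID3D11Device* device) {
    IDXGIDevice1* dxgiDevice = nullptr;
    if (SUCCEEDED(device->QueryInterface(__uuidof(IDXGIDevice1),
                                         reinterpret_cast<void**>(&dxgiDevice)))) {
        // 1 = at most one queued frame ahead of the GPU (default is 3).
        dxgiDevice->SetMaximumFrameLatency(1);
        dxgiDevice->Release();
    }
}
```

A cap of 1 trades some smoothness and average throughput for lower input-to-display latency, which is basically what the driver's low latency modes do for you.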
---
Thought I'd clarify how this works since there's lots of confusion in this thread. In the early days you would directly write pixels to memory and they'd be picked up by a RAMDAC and beamed out to the screen. So if you wanted to invert the color of the bottom right pixel it would take at most two frames or 33ms of latency if you were running at 60fps double buffered: first you set your pixel in the back buffer, wait up to 16.66ms to finish drawing the current front buffer, flip buffers, wait 16.65ms for the electron gun to make its way down to the bottom right corner, and then finally draw the inverted pixel.
With modern gpus, the situation is very similar to sending commands to a networked computer somewhere far away. You have a bit of two-way negotiation at the beginning to allocate gpu memory, upload textures/geometry/shaders, etc., and then you have a mostly one-way stream of commands. The gpu driver can queue these commands to an arbitrary depth, regardless of your vsync settings, double/triple buffering, etc., and is actually free to work on things out of order. You have to explicitly mark dependencies, and a 'present' call isn't intrinsically tied to when that buffer will actually end up displayed on screen. So there's no actual upper bound on latency here; even at 360hz, if the gpu is perpetually 10 frames behind the cpu, each frame only takes 2.77ms to simulate and 2.77ms to render but the overall input lag could still be ~30ms. (In practice though, drivers will typically only render 2-3 frames ahead.)
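(To put numbers on that last example: at 360hz a refresh slot is 1000/360 ≈ 2.77ms, so a queue that stays 10 frames deep adds roughly 10 × 2.77 ≈ 28ms of waiting on top of the per-frame simulate and render time, which is where the ~30ms comes from.)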
The Apple //e, with its 1MHz clock and 8-bit CPU, had an average latency from keypress to character display of 30msec. Modern computers are dramatically slower in keypress to text display. There are reasons, but end users see a slower system.
Modems in the 90s would still negotiate 300 baud connections depending on how good the phone connection was, which could depend on the weather. I think there also needs to be a full duplex loop from the terminal emulator to the unix host and back for the character to show up on the screen. And consider that the PC the emulator was running on was a 90s PC, and the web server, I think, was a shared host at the ISP. Some vi commands would also trigger a lot of screen update action. It was very easy to type far ahead of what was on the screen. If I lost track of what I had typed, I would have to stand up and walk away from the keyboard for a sec to let the screen catch up.
I started using modems in the late 80's. My first one was 1200 baud. Never once did a modem negotiate 300 baud on its own, even in the worst conditions. Were you connecting through a tin-can with string?
Terminal emulators are not CPU-intensive applications. I ran one on an Apple II (8-bit, 1 MHz) and it could keep up with at least 2400 baud. If you were refreshing the screen with vi, I could see it being slow.
I was mostly inserting tags at the start and end of paragraphs with `I` and `A`, and other repetitive markup where I might have to hit `I`, then the down arrow three times, and then something else. I'd also do a lot of `:` ex-based search and replace.
The way the university dial-up pool worked (and the ISP where I worked for 6 weeks in 96), there would be a room full of phone lines and modems where the dial-in would happen. Sometimes you would get one bad line, and sometimes you would get one bad modem, and probably sometimes you would get a bad line going to a bad modem. To keep users from paying local tolls, you would have to have several locations for the modem pools. In the case of the ISP, Bill Blue rented garages around the county and had T1s or something run out to the garages.
I didn't say it was common to connect at 300 baud, or that the vi story had anything to do with 300 baud. I know I did connect at 300 baud more than once in the early 90s, and that is when I found out that I could read usenet news at 300 baud w/o using a pager.
At one job I had in 95, a few times I did connect at 300. (I'd hang up and try again if that happened, but sometimes I would vi at 1200 baud. 2400 I think was normal, and sometimes I'd get lucky and connect at 48k. It was nominally a 56k modem, but I never saw it connect at that.) I can't remember the name of the terminal emulator I used from Windows in 95, or even what the browser was; it must have been Mosaic?

I was coding up web pages for lawyers at https://www.lawinfo.com / experienced attorneys referral service before Guenter sold it to Thomson Reuters. Pre-web, the outfit would place ads in yellow pages nationally, and then transfer calls to attorneys who subscribed. I supported the computers for the folks who took the calls from the yellow page ads, and the computers for the folks who cold called attorneys all day, but most of the day I was creating HTML in vi for lawyers. I think we used something called LANtastic, and we had a commercial CRM system that ran on DOS and dialed the phones for the sales team... it's on the tip of my tongue... I remember loading new phone numbers into it from some vendor feed for the sales force. We were in a weird strip mall in Encinitas, and I remember hanging out with the folks who worked next door at some sort of computer business that made our PCs but also worked on some sort of B2B software.
Not a big deal, 122.6 ms is still way below the Doherty Threshold :)
Dropping into a real terminal on Linux feels so weird when typing. I swear sometimes I see a letter on the screen before I actually touch the key. Similar to playing an Atari on a CRT, paddle games, like Breakout, feel like you're physically attached to the on-screen paddle with a sturdy rod as opposed to the mushy feel you get from a mouse in modern games.
I do, mostly. I often ran into terminal emulation issues with other terminals, especially when using older software.
These days, I could probably switch to another terminal, as I'm in tmux quite often, which creates those compatibility issues itself, no matter what "backend" you're using.
The developer of xterm actually maintains a battery of tests which exercise corner cases in the DEC VT protocol and ensure that xterm conforms in the manner that a real terminal would: https://invisible-island.net/vttest/vttest.html
Xterm really is a terminal emulator. Most other "modern" TEs are more like shitty xterm emulators.
I tried to switch from XTerm to GNOME Terminal. It went well for a while, and better Unicode and emoji display was nice, but then a new version of GNOME Terminal came out which broke the ability to use the Meta keys for sending an ESC prefix; it is now hard-coded to only accept the Alt keys to do that. So I had to switch back to XTerm.
Are you using old PowerShell or the newer Open Source PowerShell 6/7? Try also running it from Terminal, the official modern terminal app from the store. It's much faster than bare cmd or PowerShell; IIRC it's because of conhost.
Intro: https://james.darpinian.com/blog/latency
Techniques to improve latency in your applications: https://james.darpinian.com/blog/latency-techniques
Platform-specific considerations: https://james.darpinian.com/blog/latency-platform-considerat...