The bad news is the defaults on modern platforms are often very bad for latency. The good news is that it is possible to achieve good latency on most modern systems with a lot of attention to detail. With good hardware and good software it is even possible to e.g. run console emulators with lower latency than they would have on original hardware connected to a CRT. I just wrote a three part series detailing a lot of ways to improve latency:
> Delay rendering until just before VSync: If you get it slightly wrong and your frame takes slightly more time to render than you thought, your frame may not be done in time for VSync. Then it will have to wait a whole extra frame and the previous frame will be displayed twice, causing a hitch in any animations.
According to the docs, there are extensions WGL_EXT_swap_control_tear/GLX_EXT_swap_control_tear [0] that cause late frames to tear instead of waiting a full frame. They don't work on my machine (my Intel HD 4000 reports that the extension is supported and then silently fails), but this should be the ideal swap mechanism.
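For reference, requesting it looks roughly like this (a minimal sketch, assuming an OpenGL context is already current and skipping error handling; the GLX path is analogous with glXSwapIntervalEXT):

```cpp
// Minimal sketch (Windows/WGL): ask for adaptive vsync so a late frame
// tears instead of waiting for the next vsync. Assumes an OpenGL context
// is already current; error handling is left out.
#include <windows.h>
#include <cstring>

typedef const char* (WINAPI *PFNWGLGETEXTENSIONSSTRINGEXTPROC)(void);
typedef BOOL        (WINAPI *PFNWGLSWAPINTERVALEXTPROC)(int interval);

bool EnableAdaptiveVsync() {
    auto getExtensions = (PFNWGLGETEXTENSIONSSTRINGEXTPROC)
        wglGetProcAddress("wglGetExtensionsStringEXT");
    auto swapInterval = (PFNWGLSWAPINTERVALEXTPROC)
        wglGetProcAddress("wglSwapIntervalEXT");
    if (!getExtensions || !swapInterval)
        return false;

    // WGL_EXT_swap_control_tear is what allows a negative interval.
    if (!std::strstr(getExtensions(), "WGL_EXT_swap_control_tear"))
        return false;

    // -1: vsync as usual, but tear instead of waiting when a frame is late.
    return swapInterval(-1) != FALSE;
}
```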
Tearing is a pretty bad artifact so I wouldn't say that enabling it is ideal. I'm a stickler for low latency but even I don't think it's worth it in most cases. It's possible to achieve great latency without tearing. The ideal swap mechanism would be VRR when available.
Those GL extensions likely predate modern compositing window managers and fail to work when compositing is enabled. As I discuss in the platform-specific considerations section, tearing on Windows requires either full screen or Multiplane Overlay, which is not supported by OpenGL.
On a non-VRR display, I believe it's nicer to have a momentary tearline near the top of the screen than a momentary frame skip. The stutter is more noticeable than the tearline. (Some feature of how GPUs work means that tearlines landing exactly at vsync are mysteriously delayed half a frame. Maybe someone else knows why; I don't, and neither do the people I've talked to.)
~2 msec (mouse)
8 msec (average time we wait for the input to be processed by the game)
16.6 msec (game simulation)
16.6 msec (rendering code)
16.6 msec (GPU is rendering the previous frame, current frame is cached)
16.6 msec (GPU rendering)
8 msec (average for missing the vsync)
16.6 msec (frame caching inside of the display)
16.6 msec (redrawing the frame)
5 msec (pixel switching)
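(Those stages add up to 2 + 8 + 16.6 + 16.6 + 16.6 + 16.6 + 8 + 16.6 + 16.6 + 5 = 122.6 msec end to end.)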
I'm not very familiar with graphics pipelines, but some stuff here seems wrong. If a game is rendering at 60fps, the combined compute time for simulation+rendering should be 16.6 ms. You can't start simulating the next tick while rendering the previous tick unless you try to do some kind of copy-on-write memory management for the entire game state. And with double buffering, the GPU should be writing frame n to the display cable at the same time as it's computing frame n+1, and the display writing the frame to its cache buffer should be happening at the same time as the GPU writes the frame to the cable.
By my count that's a whole 50 ms that shouldn't be there.
From the linked article:
One thread is calculating the physics and logic for frame N while another thread is generating rendering commands based on the simulation results of frame N-1.
Maybe modern games do use CoW memory?
[The GPU] might collect all drawing commands for the whole frame and not start to render anything until all commands are present.
It might, but is this typical behavior? This implies that the GPU would just sit idle if it finished rendering a frame before the CPU finished sending commands to draw the next one — why would it do that?
Most monitors wait until a new frame was completely transferred before they start to display it adding another frame of latency.
Maybe this is what is meant by the "16.6 (frame caching inside of the display)" item? That might be real then.
Games generally don't use copy on write, but they do often explicitly pipeline processing across multiple frames (usually by manually copying the necessary data from sim "owned" memory to render "owned" memory, but varying amounts of double buffering are also used). This was especially true after the transition to multi-core but before the many-core regime of today. Transitioning from a single threaded engine, it was easier to run effectively a single-threaded simulation frame and a single-threaded render frame in parallel than to fully multithread everything. Graphics APIs took a while to support multithreading, as well.
These days game programmers have gotten experienced enough to get closer to fully saturating all cores in both the simulation and render steps, so you sometimes no longer see the two full frames of latency there.
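Roughly, the hand-off looks like this. This is just a sketch of the pattern, not any engine's actual code; all the types and functions are hypothetical placeholders, and the synchronization that keeps the sim from overwriting a snapshot the renderer is still reading is left out:

```cpp
// Sketch of explicit sim -> render pipelining: the sim thread produces a
// snapshot for frame N while the render thread consumes the snapshot for
// frame N-1. No copy-on-write, just an explicit copy into render-owned memory.
#include <array>
#include <atomic>

struct SimState       { float time = 0.f; /* full game state, sim-owned */ };
struct RenderSnapshot { float time = 0.f; /* just the data rendering needs */ };

// Hypothetical placeholders for real sim/render work.
static void StepSimulation(SimState& s)                         { s.time += 1.f / 60.f; }
static void CopyForRender(const SimState& s, RenderSnapshot& r) { r.time = s.time; }
static void SubmitDrawCommands(const RenderSnapshot&)           { /* build GPU commands */ }

static std::array<RenderSnapshot, 2> snapshots;  // double-buffered hand-off
static std::atomic<int> ready{-1};               // index of the last finished snapshot

void SimThreadLoop(SimState& state) {
    for (int slot = 0;; slot = 1 - slot) {
        StepSimulation(state);                  // simulate frame N
        CopyForRender(state, snapshots[slot]);  // explicit copy, no CoW
        ready.store(slot, std::memory_order_release);
        // NOTE: a real engine also waits here so it can't lap the renderer.
    }
}

void RenderThreadLoop() {
    for (;;) {
        int slot = ready.load(std::memory_order_acquire);
        if (slot >= 0)
            SubmitDrawCommands(snapshots[slot]);  // renders frame N-1 (or older)
    }
}
```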
> 16.6 msec (GPU is rendering the previous frame, current frame is cached)
Not entirely sure what this is about. Maybe some sort of triple buffering is being employed as a way to reduce hitches? If you push the engine really close to the 16 ms limit for each stage of your pipeline, sometimes something out of your control, like the OS deciding to do some heavy background work, will push you over your limit. Without the extra buffer, you will miss your vsync and the user will perceive a very disturbing judder.
I agree that it seems like the "game" part of the latency has about 33ms extra, but the source of this breakdown[0] seems to be knowledgeable and includes measurements that corroborate many of the claims. I was surprised, for example, that vsync seemed to add 2 frames of latency rather than 1 in this test.
The total time in this breakdown is in line with the measured total time, so if the source is wrong about the game by claiming it takes longer than it does, they're also claiming that some other stages take less time than they do by basically the same amount. I would bet on the monitor, but I don't have much reason to think they're wrong to begin with.
> If a game is rendering at 60fps, the combined compute time for simulation+rendering should be 16.6 ms.
It can work this way--e.g. nvidia exposes an 'ultra low latency mode' in their driver that caps prerendered frames to zero--but typically for smoother animation and higher average fps gpus will have a queue of several frames that they're working on, and this is irrespective of how many render targets you have in your swapchain. Danluu's breakdown above is actually correct for the typical case.
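Applications can also cap that queue themselves instead of relying on a driver setting. A minimal Windows/D3D11 sketch via DXGI's SetMaximumFrameLatency, assuming the device already exists and ignoring errors:

```cpp
// Sketch: cap how many frames the driver/GPU may queue ahead of display
// (Windows, D3D11/DXGI). Assumes 'device' already exists; errors ignored.
#include <d3d11.h>
#include <dxgi.h>

void CapFrameQueue(ID3D11Device* device) {
    IDXGIDevice1* dxgiDevice = nullptr;
    if (SUCCEEDED(device->QueryInterface(__uuidof(IDXGIDevice1),
                                         reinterpret_cast<void**>(&dxgiDevice)))) {
        // 1 = at most one queued frame ahead of the GPU (default is 3).
        dxgiDevice->SetMaximumFrameLatency(1);
        dxgiDevice->Release();
    }
}
```

A cap of 1 trades some smoothness and average throughput for lower input-to-display latency, which is basically what the driver's low latency modes do for you.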
---
Thought I'd clarify how this works since there's lots of confusion in this thread. In the early days you would directly write pixels to memory and they'd be picked up by a RAMDAC and beamed out to the screen. So if you wanted to invert the color of the bottom right pixel it would take at most two frames or 33ms of latency if you were running at 60fps double buffered: first you set your pixel in the back buffer, wait up to 16.66ms to finish drawing the current front buffer, flip buffers, wait 16.65ms for the electron gun to make its way down to the bottom right corner, and then finally draw the inverted pixel.
With modern gpus, the situation is very similar to sending commands to a networked computer somewhere far away. You have a bit of two-way negotiation at the beginning to allocate gpu memory, upload textures/geometry/shaders, etc., and then you have a mostly one-way stream of commands. The gpu driver can queue these commands to an arbitrary depth, regardless of your vsync settings, double/triple buffering, etc., and is actually free to work on things out of order. You have to explicitly mark dependencies, and a 'present' call isn't intrinsically tied to when that buffer will actually end up displayed on screen. So there's no actual upper bound on latency here; even at 360hz, if the gpu is perpetually 10 frames behind the cpu, each frame only takes 2.77ms to simulate and 2.77ms to render but the overall input lag could still be ~30ms. (In practice though, drivers will typically only render 2-3 frames ahead.)
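(To put numbers on that last example: at 360hz a refresh slot is 1000/360 ≈ 2.77ms, so a queue that stays 10 frames deep adds roughly 10 × 2.77 ≈ 28ms of waiting on top of the per-frame simulate and render time, which is where the ~30ms comes from.)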
The Apple //e, with its 1MHz clock and 8-bit CPU, had an average latency from keypress to character display of 30msec. Modern computers are dramatically slower in keypress to text display. There are reasons, but end users see a slower system.
Modems in the 90s would still negotiate 300 baud connections depending on how good the phone connection was, which could depend on the weather. I think there also needs to be a full duplex loop from the terminal emulator to the unix host and back for the character to show up on the screen. And consider that the PC the emulator was running on was a 90s PC, and the web server, I think, was a shared host at the ISP. Some vi commands would also trigger a lot of screen update action. It was very easy to type far ahead of what was on the screen. If I lost track of what I had typed, I would have to stand up and walk away from the keyboard for a sec to let the screen catch up.
I started using modems in the late 80's. My first one was 1200 baud. Never once did a modem negotiate 300 baud on its own, even in the worst conditions. Were you connecting through a tin-can with string?
Terminal emulators are not CPU-intensive applications. I ran one on an Apple II (8-bit, 1 MHz) and it could keep up with at least 2400 baud. If you were refreshing the screen with vi, I could see it being slow.
I was mostly inserting tags at the start and end of paragraphs with `I` and `A`, and other repetitive markup where I might have to hit `I`, then the down arrow three times, and then something else. I'd also do a lot of `:` ex-based search and replace.
The way the university dial-up pool worked (and the ISP where I worked for 6 weeks in 96), there would be a room full of phone lines and modems where the dial-in would happen. Sometimes you would get one bad line, and sometimes you would get one bad modem, and probably sometimes you would get a bad line going to a bad modem. To keep users from paying local tolls, you would have to have several locations for the modem pools. In the case of the ISP, Bill Blue rented garages around the county and had T1s or something run out to the garages.
I didn't say it was common to connect at 300 baud, or that the vi story had anything to do with 300 baud. I know I did connect at 300 baud more than once in the early 90s, and that is when I found out that I could read usenet news at 300 baud w/o using a pager.
At one job I had in 95, a few times I did connect at 300. (I'd hang up and try again if that happened, but sometimes I would vi at 1200 baud. 2400 I think was normal, and sometimes I'd get lucky and connect at 48k. It was nominally a 56k modem, but I never saw it connect at that.) I can't remember the name of the terminal emulator I used from Windows in 95, or even what the browser was; it must have been Mosaic?

I was coding up web pages for lawyers at https://www.lawinfo.com / experienced attorneys referral service before Guenter sold it to Thomson Reuters. Pre-web, the outfit would place ads in yellow pages nationally, and then transfer calls to attorneys who subscribed. I supported the computers for the folks who took the calls from the yellow page ads, and the computers for the folks who cold called attorneys all day, but most of the day I was creating HTML in vi for lawyers. I think we used something called LANtastic, and we had a commercial CRM system that ran on DOS and dialed the phones for the sales team... it's on the tip of my tongue... I remember loading new phone numbers into it from some vendor feed for the sales force. We were in a weird strip mall in Encinitas, and I remember hanging out with the folks who worked next door at some sort of computer business that made our PCs but also worked on some sort of B2B software.
Not a big deal, 122.6 ms is still way below the Doherty Threshold :)
Dropping into a real terminal on Linux feels so weird when typing. I swear sometimes I see a letter on the screen before I actually touch the key. Similar to playing an Atari on a CRT, paddle games, like Breakout, feel like you're physically attached to the on-screen paddle with a sturdy rod as opposed to the mushy feel you get from a mouse in modern games.
I do, mostly. I often ran into terminal emulation issues with other terminals, especially when using older software.
These days, I could probably switch to another terminal, as I'm in tmux quite often, which creates those compatibility issues itself, no matter what "backend" you're using.
The developer of xterm actually maintains a battery of tests which exercise corner cases in the DEC VT protocol and ensure that xterm conforms in the manner that a real terminal would: https://invisible-island.net/vttest/vttest.html
Xterm really is a terminal emulator. Most other "modern" TEs are more like shitty xterm emulators.
I tried to switch from XTerm to GNOME Terminal. It went well for a while, and better Unicode and emoji display was nice, but then a new version of GNOME Terminal came out which broke the ability to use the Meta keys for sending an ESC prefix; it is now hard-coded to only accept the Alt keys to do that. So I had to switch back to XTerm.
Are you using old PowerShell or the newer Open Source PowerShell 6/7? Try also running it from Terminal, the official modern terminal app from the store. It's much faster than bare cmd or PowerShell; IIRC it's because of conhost.
Intro: https://james.darpinian.com/blog/latency
Techniques to improve latency in your applications: https://james.darpinian.com/blog/latency-techniques
Platform-specific considerations: https://james.darpinian.com/blog/latency-platform-considerat...