Desktop compositing latency is real (lofibucket.com)
538 points by dezgeg on Nov 21, 2017 | 291 comments



I've been experimenting with this too, in the context of the Windows front-end for xi editor. It's absolutely true that the compositor adds a frame of latency, but I have a very different take than "turn it off."

First, it's possible to design an app around the compositor. Instead of sending a frame to the system, send a tree of layers. When updating the content or scrolling, just send a small delta to that tree. Further, instead of sending a single frame, send a small (~100ms) slice of time and attach animations so the motion can be silky smooth. In my experiments so far (which are not quite ready to be made public but hopefully soon), this gets you window resizing that almost exactly tracks the mouse (as opposed to lagging at least one frame behind), low latency, and excellent power usage.
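To make the "attach an animation instead of pushing frames" idea concrete, here's a rough sketch, assuming the API in question is DirectComposition (named further down in the thread). This is illustrative, not xi's actual code; the helper name and coefficients are mine:

```cpp
// Minimal sketch: hand the compositor a ~100ms scroll animation instead of
// per-frame updates, assuming an existing IDCompositionDevice/visual tree.
#include <dcomp.h>
#include <wrl/client.h>
#pragma comment(lib, "dcomp.lib")

using Microsoft::WRL::ComPtr;

// Scroll `content` by `deltaPx` over ~100 ms; the compositor interpolates
// the offset on every refresh by itself, so the app doesn't wake per frame.
HRESULT AnimateScroll(IDCompositionDevice* device,
                      IDCompositionVisual* content,
                      float currentY, float deltaPx)
{
    ComPtr<IDCompositionAnimation> anim;
    HRESULT hr = device->CreateAnimation(&anim);
    if (FAILED(hr)) return hr;

    // y(t) = currentY + (deltaPx / 0.1s) * t, for t in [0, 0.1s]
    anim->AddCubic(0.0, currentY, deltaPx / 0.1f, 0.0f, 0.0f);
    anim->End(0.1, currentY + deltaPx);

    content->SetOffsetY(anim.Get());   // attach the animation to the layer
    return device->Commit();           // one commit covers the whole slice
}
```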

Further, Microsoft engineers have said that hardware overlays are coming, in which the graphics card pulls content from the windows as it's sending the video out the port, rather than taking an extra frame time to paint the windows onto a composited surface. These already exist on mobile, where power is important, but making it work for the desktop is challenging. When that happens, you get your frame back.

So I think the answer is to move forward, embrace the compositor, and solve the engineering challenges, rather than move backwards, even though adding the compositor did regress latency metrics.

Edit: here's the talk I was referencing that mentions overlays: https://www.youtube.com/watch?v=E3wTajGZOsA


The issue with sending a tree of layers is that this prevents some important optimizations: avoiding overdraw with early Z becomes impossible, because your app doesn't know anything about the positions of the scrollable layers and so has to be conservative and paint all of their contents. So you trade a frame of latency for lots of overdraw, which is a tradeoff I'm not really comfortable making. (Note that today, almost all apps overdraw like crazy anyway, but that should be fixed.) :)

> Further, Microsoft engineers have said that hardware overlays are coming, in which the graphics card pulls content from the windows as it's sending the video out the port, rather than taking an extra frame time to paint the windows onto a composited surface.

This seems like the best solution to me. It allows apps to paint content intelligently to prevent overdraw while avoiding any latency.


It might be lots of overdraw in the general web case, but for a text editor almost everything the app renders is going to be shown on the screen. The exception is a bit of stuff just outside the scroll viewport, so it can respond instantly to scroll requests. This feels like a good tradeoff to me.


> So you trade a frame of latency for lots of overdraw, which is a tradeoff I'm not really comfortable making.

I would jump at the chance to make that trade. My 8 year old CPU is rarely taxed by normal usage. What's the point in having a faster computer if it feels slower?


Overdraw doesn't just use some CPU. If that were all, it'd be an easy problem to solve. It causes extra data to be transferred over various buses, extra stops in a pipeline of operations that is already cramped trying to provide a frame every few ms, and probably other problems that I don't know about. This problem isn't as easy as 'throw some more CPU at it'.


I think it’s only that complex and taxing for GUI frameworks that mix CPU and GPU rendering. Like Win32 GDI or GTK+.

For modern GUI frameworks that were designed for the GPU, like MS WPF/XAML or Mozilla WebRender, overdraw just uses some GPU resources, but that's it. GPUs are designed for much more complex scenes. In-game stuff like particles, lights, volumetric effects, and foliage involves tons of overdraw. In comparison, a 2D GUI with overdraw is a very light workload for the hardware.


Overdraw actually matters quite a bit for integrated GPUs, especially on HiDPI. On high-end NVIDIA or AMD GPUs, sure, 2D overdraw doesn't matter too much. But when power is a concern, you don't want to be running on the discrete GPU, so it's worth optimizing overdraw.


That seems like a dubious claim to me. In my experience maxing out memory or PCIe bandwidth is rarely a bottleneck, and extremely unlikely / impossible with low CPU usage.

Do you have a link or more detail about 'already cramped pipelines'?


2D rendering on GPUs is almost entirely memory bound.

Source: I've been working on GPU 2D rendering full time for years now.


Depends heavily on the GPU and the scene being rendered. On Android I've seen low-end GPUs bottlenecked by the shader ALU of all things, even though the 2D renderer only produces trivial shaders. Turns out it's easier to whack off shader compute cores than it is to muck with the memory bus in some cases.

Most common situation is the GPU isn't taxed at all, though, and doesn't even leave idle clocks. We end up just being limited by GL draw call overhead.


You seem to be implying that rendering of a text editor has to be all 2D and that rendering a frame pushes the memory bandwidth to its limits, both of which can't possibly be true. Why can games run at 144Hz and above but a text editor can't afford to overdraw for decreased latency?


> Instead of sending a frame to the system, send a tree of layers.

Which API does one use to do this?

I'm slightly surprised that hardware overlays aren't already a feature, especially given that they're handled by the graphics card. I know there's a special API for video overlay, especially DRM video (where part of the requirement is that the system doesn't allow it near the compositor where it could be screenshotted). Can you do video "genlock" on Windows? (edit: https://msdn.microsoft.com/en-us/library/windows/desktop/dd7... )

I'm also wondering how VR/AR stuff handles this on Windows.


DirectComposition. It's been there since Windows 8, and is used by Chrome among other apps (see https://bugs.chromium.org/p/chromium/issues/detail?id=524838).

It's possible some hardware already does overlays, given that the talk I linked above was 2 years ago. I haven't researched this carefully.

There's a bunch of stuff in the interface to support video and also integrated 3D content ("create swapchain for composition"), but I don't know how well it works. In my experiments and reading through the Chromium bug tracker, Microsoft's implementation of all this stuff is far from perfect, and it's hard, for example, to completely avoid artifacts on resizing.
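For anyone curious, the "create swapchain for composition" path amounts to roughly the wiring below (a sketch using D3D11 + DXGI + DirectComposition; error handling omitted, and the format/flag choices are my assumptions, not a prescription):

```cpp
// Rough sketch: a D3D11 device, a DXGI swap chain created for composition,
// and a DirectComposition visual that scans it out.
#include <d3d11.h>
#include <dxgi1_2.h>
#include <dcomp.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d11.lib")
#pragma comment(lib, "dxgi.lib")
#pragma comment(lib, "dcomp.lib")

using Microsoft::WRL::ComPtr;

void CreateCompositionSwapChain(HWND hwnd, UINT width, UINT height)
{
    ComPtr<ID3D11Device> d3d;
    D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr,
                      D3D11_CREATE_DEVICE_BGRA_SUPPORT, nullptr, 0,
                      D3D11_SDK_VERSION, &d3d, nullptr, nullptr);

    ComPtr<IDXGIDevice> dxgiDevice;
    d3d.As(&dxgiDevice);
    ComPtr<IDXGIFactory2> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    DXGI_SWAP_CHAIN_DESC1 desc = {};
    desc.Width = width;
    desc.Height = height;
    desc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    desc.BufferCount = 2;
    desc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL;  // flip model required
    desc.AlphaMode = DXGI_ALPHA_MODE_PREMULTIPLIED;

    ComPtr<IDXGISwapChain1> swapChain;
    factory->CreateSwapChainForComposition(d3d.Get(), &desc,
                                           nullptr, &swapChain);

    // Hand the swap chain to the compositor as the content of a visual.
    ComPtr<IDCompositionDevice> dcomp;
    DCompositionCreateDevice(dxgiDevice.Get(), IID_PPV_ARGS(&dcomp));
    ComPtr<IDCompositionTarget> target;
    dcomp->CreateTargetForHwnd(hwnd, TRUE, &target);
    ComPtr<IDCompositionVisual> visual;
    dcomp->CreateVisual(&visual);
    visual->SetContent(swapChain.Get());
    target->SetRoot(visual.Get());
    dcomp->Commit();
}
```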


> It's possible some hardware already does overlays, given that the talk I linked above was 2 years ago. I haven't researched this carefully.

All hardware does some overlays, though not a large number. (You can see Intel GPU overlay setup at [1].) One scanout overlay is already in use on all major OS's, to draw the mouse cursor.

[1]: https://github.com/torvalds/linux/blob/e60e1ee60630cafef5e43...


Typically hardware will have the base layer, a cursor layer, and possibly a video overlay. But in many cases, the video "overlay" is actually emulated by the driver using a shader to do the colorspace conversion and the texture hardware to do scaling.


People generally aren't composing other effects on top of a video overlay. Making it work in the general case requires hardware that can do all the things the compositing engine is doing in software.


The usecase mentioned in Jesse's talk (linked in my root comment) is displaying a notification from some other app while playing a game. They added the "flip_discard" swapchain effect so that the OS can paint the notification on top of the game content before flipping it in hardware. This is something of a hack; I think you're right that the endgame for this is that the hardware can indeed implement the full compositing stack. I'm not sure how far away we are from this.


In a text editor, we don't want to compose a whole frame just because a character was inserted. The situation of drawing directly into the visible frame buffer at any time we like without caring about V-sync is pretty much ideal. If the user inserts three characters very rapidly, but each draw has to wait 1/60th of a second for a V-sync, that will be visible.


> If the user inserts three characters very rapidly, but each draw has to wait 1/60th of a second for a V-sync, that will be visible.

A new character won't trigger a draw call though. Rather, the three characters will queue up in an internal buffer, and when the next draw is due, all three will be drawn.
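Something like this minimal sketch of the coalescing idea (hypothetical editor code, not any particular editor's implementation):

```cpp
// Keypresses go straight into the document buffer; drawing happens once per
// refresh and picks up everything accumulated since the previous frame.
#include <string>

struct Editor {
    std::string text;
    bool dirty = false;

    void OnKey(char c) {        // called per keypress, at any time
        text.push_back(c);
        dirty = true;           // no draw here, just mark the frame stale
    }

    void OnVsync() {            // called once per display refresh
        if (!dirty) return;
        Redraw();               // all queued characters appear together
        dirty = false;
    }

    void Redraw() { /* paint `text` */ }
};
```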


That's how it should work. But when you look at the scatter plots, note how there are multiple bands under Windows 10. E.g. for gvim, not only is there an extra overall delay, but extra clusters of additional latency.


How are you testing the mouse cursor following the window borders on resize? I've read that windows turns off the hardware sprite mouse cursor when resizing windows so that it can software render it to always line up properly.


Visual inspection for now. I've got an Arduino and a high-speed camera, so my plan for the next step is to send mouse and keyboard events from the Arduino, blinking an LED at the same time, then capture both the LED and the monitor in the video. Then a bit of image analysis. This is the only way to be quantitative and capture all the sources of latency.
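For what it's worth, the Arduino side of such a rig could be as simple as the hypothetical sketch below, assuming a 32u4-based board (Leonardo/Micro) that can act as a USB keyboard; the pin and timings are arbitrary:

```cpp
// Emit a USB keypress and flash an LED at the same instant, so a high-speed
// camera can measure LED-to-pixel latency from the footage.
#include <Keyboard.h>

const int LED_PIN = 9;   // arbitrary choice; any free digital pin works

void setup() {
  pinMode(LED_PIN, OUTPUT);
  Keyboard.begin();
}

void loop() {
  digitalWrite(LED_PIN, HIGH);  // reference event visible to the camera
  Keyboard.press('a');          // the keystroke under test
  Keyboard.release('a');
  delay(50);
  digitalWrite(LED_PIN, LOW);
  delay(2000);                  // one sample every couple of seconds
}
```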


The implication of what I was saying is that windows will always show the cursor lining up with the border of a window during resizing.


Would that it were so.


Hmm, were you inspired by, or part of, https://github.com/google/walt ?


Would you be testing this with a high refresh rate monitor (120hz+) ? Would that matter?


Yes and yes. My son has a 144Hz gaming monitor, and my main monitor for coding is a Dell 4k. We'll test both.


I also believe that is true.

My reason: I had adjusted the screen brightness in software, but the mouse cursor was still brighter than white anywhere else. When I dragged a window, the cursor turned dim.


Oh. This explains why the cursor flickers briefly and gets drawn partially across multiple monitors when it's at the split, instead of the normal mouse cursor, which is drawn on only one monitor at a time.


>attach animations so the motion is silky smooth

Yeah, because there isn't enough latency already.

Hell no.


> Further, Microsoft engineers have said that hardware overlays are coming, in which the graphics card pulls content from the windows as it's sending the video out the port, rather than taking an extra frame time to paint the windows onto a composited surface.

So they are copying the Xerox Alto.


This is also why picking a good monitor is important for software development.

Some monitors have tons of input lag (60-70ms), and that's part of the time it takes for what you're typing to show up on the display. The same goes for seeing your mouse cursor move.

I did a huge write up on picking a good monitor for development which can be found at: https://nickjanetakis.com/blog/how-to-pick-a-good-monitor-fo...

I ended up buying the Dell UltraSharp U2515H http://amzn.to/2jF3WHp. It has very low input lag (10-15ms) and it runs natively at 2560x1440.


"I can send an IP packet to Europe faster than I can send a pixel to the screen. How f’d up is that?" - John Carmack


I had a hard time believing that quote at the time. John ended up answering the question I asked about it on SuperUser [1] with details about his setup. His writeup is impressive: he actually set up a high-speed camera to record the actual latency.

[1] https://superuser.com/q/419070/2269


I used to work for Philips' TV branch; they used a similar setup. Instead of a camera, they mounted a light sensor to the display and measured end-to-end delay for various color transitions (white-to-black, black-to-white, ...). All other manufacturers and many serious reviewers perform similar tests.

Of course, these tests are always performed with all built-in image processing disabled, those can easily add 2 frames of delay on modern televisions. MPEG deblocking and noise removal are performed by most TV brands to improve image quality at a cost of latency.


Right, the issue is that 60Hz is ridiculously slow. Ethernet hardware is multiple orders of magnitude faster. It's just because of broadcast TV and CRT legacy. G-Sync / FreeSync allow up to 240Hz, which is ~4ms latency.


60Hz is 16.7ms between each picture. However, that only gives you a lower bound on the latency. The actual time from GPU sending picture, to picture being on screen, can be a lot more. See the John Carmack answer above.


No, the issue was that he was using a display with a crazy amount of buffering because they were doing so much filtering and doing it poorly.


It's not just monitors; even keyboards can have 40+ms worth of delay, and it all adds up.


I don't think that was what he was referring to


"I video record showing both the game controller and the screen with a 240 fps camera" https://superuser.com/q/419070/2269

So, it's the total response time from key press on a game controller to screen that is important, not individual components. Sure, 0.04 seconds on its own is not a big deal, but when five or six things each take 0.04 seconds you hit noticeable delays.


G-Sync and FreeSync have nothing to do with the available maximum refresh rate on any monitor. That is a property of the display interface.


Solution: send the pixel to the screen over IP!

(joking, but with a tiny serious element)


this reminds me of the argument that instead of going to USB C 3.1 as the grand unifying connector for all peripherals (including video), we should have instead migrated to Ethernet cables for everything.


Currently ethernet doesn't look cheap enough for video to me.

Even a simple 1920x1080 60Hz monitor needs 3 Gb/s, a 2560x1440 144Hz monitor 13 Gb/s, and a 4k 60Hz one 12 Gb/s.

DisplayPort 1.2 offers 17 Gb/s and DisplayPort 1.3 offers 26 Gb/s.
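Those figures follow from the raw pixel rate at 24 bits per pixel, ignoring blanking and protocol overhead — a quick back-of-the-envelope check:

```cpp
// Raw video bandwidth = width x height x bits-per-pixel x refresh rate.
#include <cstdio>

int main() {
    struct { int w, h, hz; } modes[] = {
        {1920, 1080,  60},
        {2560, 1440, 144},
        {3840, 2160,  60},
    };
    for (auto m : modes) {
        double gbps = 1.0 * m.w * m.h * 24 * m.hz / 1e9;
        std::printf("%dx%d@%dHz: %.1f Gb/s\n", m.w, m.h, m.hz, gbps);
    }
}
// Prints roughly 3.0, 12.7, and 11.9 Gb/s, matching the numbers above.
```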


The 40 Gbps and 100 Gbps Ethernet standards were defined in 2010 [1]. Google "buy 100 Gbps Ethernet" and you will find it is real, although it is currently geared primarily for datacenters.

[1] wikipedia.org/wiki/100_Gigabit_Ethernet


Actually not a bad idea. Common Ethernet speed is one gigabit per second.

And there are high throughput devices like LIDARs (eg Velodyne LiDAR known from Google/Waymo self-driving cars) that are connected via Ethernet rather than USB.

Having had a 1 Gbit Ethernet LAN since 2004, I would prefer the option to upgrade to 10 Gbit home Ethernet in the near future.


It's a terrible idea. USB 3 is faster than 1 Gbit/s, and there's no way you can drive a 5K monitor with any current or near-future Ethernet standard; the bandwidth demands are too punishing.

Ethernet isn't optimized for short-haul signals, like computer to screen over a few metres at most; it's for 100m+ runs in datacentres. The signals have to be a lot more durable.


> "USB-3 is faster than 1Gbit/s"

But Cat 6 Ethernet is 10 Gbit/s.


Still not as fast as USB-3 + Thunderbolt, which is already 40 Gbit/s and intended to be pushed to at least 80 Gbit/s to support 8K screens.


> there's no way you can drive a 5K monitor with any current or near future Ethernet standard, the bandwidth demands are too punishing.

Really? You need more than 100 Gbit for a 5k monitor? Because there is a 40/100 Gbit Ethernet PHY standard right now (has been for a while, since 2011). And work is being done on up to 200 Gbit (due to be ratified in late 2018).


5120 * 2880 * 32bits * 60FPS is only 28.3 Gbit and both 40Gbit and 100Gbit are well established standards.

Granted, 40 Gbit cards are running $50 and 100 Gbit cards are running ~$700, but that's mostly from the low volumes involved.

Really though the larger issue is Ethernet connectors are not designed to be plugged in and removed regularly.


> "Ethernet connectors are not designed to be plugged in and removed regularly"

Well, one could easily use a smaller and easier plug while still being pin-compatible with Ethernet.


Ethernet over multimode fiber; I want to be able to have accessories hundreds of meters away from my computer.


Careful, multi-mode fiber is near its scaling limits. 10GBASE-SR and 25GBASE-SR exist, but that's about it. (By contrast, 40GBASE-LR and 100GBASE-LR work over single-mode fiber, and DWDM + coherent optics can push the medium much further.) Wouldn't want to be stuck at 25 gbps forever ;-)


Yeah, some of us can type pretty fast. And for mice, some of us take clicker games seriously.


Just make sure those cables are never bent beyond the minimum bend radius.


It's amazing how much you can bend fiber. Fiber is really hard to break just by bending it, assuming it is well insulated. You can have performance loss with bends and kinks though.


Just use bend-insensitive fiber. It's not like copper cables like being kinked, either.


in all seriousness, why not?


The little plastic bits always snap off. Ethernet is great for semipermanent wires but not very good for peripherals like portable hard drives that are constantly disconnected and reconnected.


That problem is largely solved if you use 8P8C connectors with a rubber boot protecting the latching tab.

https://info.pcboard.ca/wp-content/uploads/2017/04/RJ45-Conn...


You can't tell me that's better than the USB-C connector.

Even HDMI is better.


My MacBook Pro’s USB-C connectors glitch if you move them slightly. I like the data rate but I’m not happy with RAM-pack wobble in 2017.


Even with the protective guard the little plastic tabs inevitably break on me. I can't even say those have fared any better than the unprotected versions.


My main complaint is that the protective rubber often gets stiff as it ages, which can make it pretty dang hard to remove old cables (until you resort to pliers).


Yeah, live hinges (that is, plastic bits that intentionally bend, not rotate on pins) are always going to break eventually. They're just so easy to make that it's a trade off that doesn't fall in the consumer's favor.


Living hinges can have lifespan in the millions of cycles with the correct materials and design. I'm sure it's possible to make an Ethernet connector that doesn't break easily, but it would cost more, and people aren't willing to pay.


What’s that to do with displays which aren’t unplugged frequently?


I plug/unplug from my display several times daily, since the MBP connected to it travels with me. Granted, the end of the cable in the monitor typically doesn't move¹

¹well, one of them. The crap LG monitors we have constantly forget about the peripherals that are attached, and one easy way to fix this is to detach/reattach the peripheral. I suspect that the company is never going to RMA these devices, sadly.


USB-C is used for all sorts of devices many of which are unplugged frequently.


The obvious ones: Ethernet cables are big (imagine one on a phone), and aren't designed to be plugged and unplugged the massive number of times that USB cables are (spring contacts wear out, little clips break).


well maybe the actual plug could be modified into a small form, while still being pin-compatible with cat 5/5e/6.


USB cables have little to do with the question of sending pixels via IP

Outside HDMI and some DisplayPort cables, all display connectors click or screw in

And we’re talking about desktop OS and displays


I'm not even sure about sending it as IP, rather as raw Ethernet frames (or simply just using the cables).


Some people tried, most went out of business. Either chapter 11 or bought for assets and IP. OnLive was the biggest one.


people working on compositors should have that printed out and hung above their coffee machines.


Indeed. I would rather take an excellent monitor, keyboard and mouse over a faster machine without them. Sadly this means I am pretty much limited to desktops, as finding a laptop with a colour-accurate 120Hz+ display and an excellent keyboard seems to be impossible unless I look at some gaming laptop monstrosity that weighs 5 kg and has a 1 hour battery life. If anyone reading this knows of a good laptop for development please let me know!


I can't offer you 120Hz, but the ThinkPad P51 with 4K screen has excellent colour gamut and accuracy and the best laptop keyboard in the business (with the possible exception of the old-style ThinkPad keyboard currently available only in the limited-edition "ThinkPad 25"). Battery life is pretty good.

Downsides: relatively bulky, pricey, no touchscreen or stylus input. (Upside of that: matt rather than glossy screen, which I think is difficult to combine with a touchscreen.)


Razer Blades? Not sure about refresh rate, but they have amazing keyboards and displays from what I hear


A good (and easy to remember/no other affiliation) source for this data: https://displaylag.com/


Is that still a good monitor vs 4k Dell offerings? Reasonable price here in Brazil.


Yes. I'm going to buy a second one in a few days (waiting to see if there's any Black Friday deals).

At this point I've been using 1 for almost a year and it's been nothing but great. I want to get a second one and orient it vertically.


That's an affiliate link fwiw


It's true, but there's also a 4,000 word detailed blog post that describes how I came to pick that monitor along with giving you insights and tips on how to pick a different monitor if that one isn't for you.


Why do people care about affiliate links? Serious question.


It's good to know that the person posting the link has a financial stake in it. Depending on the context this can mean a lot - like a review or recommendation.


Except it's Amazon. I could understand that position if it was a Dell affiliate link. But Amazon sells so many brands of monitors, any of which could carry the same affiliate code, that I don't see how that matters.


Just like the FTC, I do not care about the existence of the affiliate link being on HN, but only the disclosure. It makes for a fair(er) market.


I pointed it out, and I don't really care exactly--I just think it's common courtesy for the person to disclose it.

He didn't, so I did for him.


It's a conflict of interest. People are more likely to share links that get them the most affiliate $$$.


Why not go with 5k 27" monitor?

http://www.dell.com/en-us/work/shop/cty/pdp/spd/dell-up2715k...

Sure, it's over $1k, but if you're making $10k/mo writing code on it, then you should be able to afford it.


> if you're making $10k/mo writing code on it

hah... if only the world was silicon valley.


I mean, even in SV after taxes is that really a common salary?


Marginal Utility. A $400 27" 4k Ultrasharp isn't 1/3rd as good as a 5k 27" Ultrasharp.


I have the same monitor and it's pretty good.

The only thing I don't quite like is the pixel size; it's a bit small for native scaling. I kind of wish I had gone for the 27". I went with two 25" because two 27" were a bit too big for me.


P2715Q user here. One of the best monitors currently available, excellent color reproduction, top of the line quality. The 4K resolution however is a major problem when you are using Windows 7. Some applications will not scale correctly when you scale up, and in the end you get a mess. Windows 10 is better, but not perfect.


Have the same monitor, it's my third Dell monitor since I bought my 2405fpw back in 2005. I've always gotten excellent performance from them and the 2405 lasted 12 years.

The 4k resolution and scaling can cause issues with some apps as you noted. This is especially true if you use multiple panels with heterogeneous resolutions, connect with displayport and occasionally turn one monitor off. This causes windows to move everything to the other display which is a mess. However after futzing with it for a few days post-purchase I had it all working well enough to deal with, even though I occasionally get slack in an over-maxed window on the smaller display. Btw this program is really useful for smoothing mouse movement between the larger and smaller displays in windows 10: https://github.com/mgth/LittleBigMouse


I looked at 4k monitors but ultimately decided against it because as soon as you start scaling then you lose the extra screen real estate.

At 4k 150% scaling you end up with the same screen space as 2560x1440.


I personally like the higher resolution that scaling provides, but if you don't mind low DPI you should check out the 42.5" 4k panel that Dell makes[0]. I have one in my home office, and while it's not great for design and photo editing, the sheer real-estate available is mind blowing.

[0] http://www.dell.com/en-us/work/shop/dell-43-ultra-hd-4k-mult...


I agree and I _almost_ didn't buy the 4k panel, but as I noted in my comment above my first Dell panel lasted 12 years, and that's a long time.


Ditto, have a pair of those. They are the nicest I've used in a long time. Haven't noticed the Windows issues, but then I only run a couple applications under win in a VM. No issues with Linux.

They've released the UP2718Q at a serious price difference. I'd like to see one in person, but a 3x price increase requires a lot of improvement over an already great monitor.


I have a LG 27UD68-P (and just ordered a second one) and have found it pretty good for programming. Here in Europe they come with a swivel stand. The only real disadvantage compared to the Dell is they don't have DisplayPort daisy-chaining, but they support HDMI 2.0 so you can use that for 4k60.


Do you have 20/20 vision?

With glasses (20/20) I can comfortably read text from up to 3 feet away from the 25" version at 1:1 scaling. Even tiny text like the HN text field box.


We're using them at work, I'm very pleased with them. And the bang/buck ratio is great.


How is the 2715H?


All modern (2014+) Dell IPS displays have excellent input lag


If you are looking for such monitors, look for gaming monitors with a TN panel, with 2ms or 4ms response time. Compared to office monitors or even TVs there is a big difference. (Mind that TN panels have pros and cons.)


Much worse viewing angles is a pain.


I agree.

Which is why it's hilarious that whenever I ask around for recommendations, especially on Reddit, everyone tries to peg it as a non-issue. As if your head never moves and you only have one monitor.


The problem is:

TN Panel:

Pro: Cheap, low latency

Con: Everything else


What happens if you have multiple monitors? i.e. a TN-panel for gaming and an IPS for graphic design?


This is pretty much my setup. Higher quality TN panels don't have that much of a dropoff from median quality IPS panels.


I will gladly believe this is a real problem, but this page does not demonstrate that (at least not convincingly); the metric used is simply too poor.

To quote:

> I used my own hacky measurement program written in C++ that sends a keypress to the application to be benchmarked. Then it waits until the character appears on screen. Virtual keypresses were sent with WinAPIs SendInput and pixels copied off screen with BitBlt.

So, this is measuring some rather artificial and fairly uninteresting time.
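Concretely, that kind of probe presumably boils down to something like the sketch below (my reconstruction, not the article's actual code; the watched pixel and the "changed" test are assumptions):

```cpp
// Inject a virtual keypress with SendInput, then poll one screen pixel via
// BitBlt/GetPixel until it changes. A real tool would add a timeout.
#include <windows.h>
#include <cstdio>

int main() {
    const int px = 200, py = 200;              // pixel expected to change

    HDC screen = GetDC(nullptr);
    HDC mem = CreateCompatibleDC(screen);
    HBITMAP bmp = CreateCompatibleBitmap(screen, 1, 1);
    SelectObject(mem, bmp);

    BitBlt(mem, 0, 0, 1, 1, screen, px, py, SRCCOPY);
    COLORREF before = GetPixel(mem, 0, 0);

    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);

    INPUT in = {};
    in.type = INPUT_KEYBOARD;
    in.ki.wVk = 'A';
    SendInput(1, &in, sizeof(INPUT));          // virtual key-down

    COLORREF now = before;
    while (now == before) {                    // spin until the pixel changes
        BitBlt(mem, 0, 0, 1, 1, screen, px, py, SRCCOPY);
        now = GetPixel(mem, 0, 0);
    }
    QueryPerformanceCounter(&t1);

    in.ki.dwFlags = KEYEVENTF_KEYUP;           // release the key afterwards
    SendInput(1, &in, sizeof(INPUT));

    std::printf("latency: %.1f ms\n",
                1000.0 * (t1.QuadPart - t0.QuadPart) / freq.QuadPart);
}
```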

You really do need to measure the complete stack here, especially if your theory is that issues like vsync are at stake, because the level at which vsync happens can vary, and because there are interactions here that may matter. E.g. if there are 100 reasons to wait for vsync, and you remove one of them... you're still going to wait for vsync. It's not 100% clear that this measurement actually corresponds to anything real. Also, note that a compositor need not necessarily do anything on the CPU, so by trying to read back the composited image, you may inadvertently be triggering some kind of unnecessary (or OS-dependent) sync. E.g. it's conceivable that regardless of tearing on screen you want the read-back to work "transactionally", so you might imagine that a read requires additional synchronization that mere rendering might not.

And of course: all this is really complicated; there are many moving parts we're bound to overlook. It's just common sense to try to measure something that matters, to avoid whole classes of systemic error.

Ideally, you'd measure from real keypress up to visible light; but at the very least you'd want to measure from some software signal such as SendInput up to an HDMI output (...and using the same hardware and as-similar-as-possible software). Because at the very least that captures the whole output stack, which he's interested in.

Another advantage of a whole-stack measurement is that it puts things into perspective: say the additional latency is 8ms; then it's probably relevant to know at least roughly how large the latency overall is.


The time measured is artificial but not necessarily uninteresting. You can't measure the true end to end latency this way, but you can compare different user applications to see how much latency they add. It's not perfect but it is the best available way to measure latency of arbitrary applications without extra hardware.

I built a browser latency benchmark based on a similar method: https://google.github.io/latency-benchmark/

I did plenty of testing and found that the measurements obtained this way do correlate with true hardware based latency measurement.

All that said, it is a travesty that modern OSes and hardware platforms do not provide the appropriate APIs to measure latency accurately. A lot of what is known about latency at low levels of the stack is thrown out before you get to the APIs available to applications.


I'm sure they correlate: but the thing with correlation is that you can have correlation even when there are whole classes of situations where the relationship doesn't hold.

On the same OS + drivers I'd be willing to believe that this measure is almost certainly useful (even there, it's not 100%). But it's exactly the kind of thing where a different way of, say... compositing... might cause the implementation to work a little differently, such that you're comparing apples to oranges. If BitBlt simply gets access a little earlier or later in the same pipeline, then hey presto: you've got a difference in software that is meaningless in reality.


> Ideally, you'd measure from real keypress up to visible light;

This, I think. When measuring latencies for e.g. reaction time experiments for neurophysiology I'd always use a photodiode or similar to figure out when things actually get on the screen, as that's the only thing which matters. IIRC this was always with V-Sync on, but still, with custom DirectX applications it was impossible to get lower than 2 frames of latency on Windows 7. So it has always been a bit of a mystery to me how these articles talk about keyboard-to-screen latency of < 1 actual frame. Maybe this explains it, i.e. they're not measuring actual latency from keyboard to screen? I just don't know enough about graphics, but even with V-Sync off I'm not sure you can just tell the GPU 'hey, here's a pixel, now put it on screen right now'?


Just another point of data with compton compositor on Arch with several text editors. https://news.ycombinator.com/item?id=14800713


Once you throw in another 15+ms of keyboard and USB latency...


> Don’t you find it a bit funny that Windows 95 is actually snappier than Windows 10? It’s really a shame that response times in modern computers are visibly worse than those in twenty years ago.

My Amiga A1200 (with only extra RAM added) feels faster and more responsive than any modern computer with a Windows or GNU/Linux desktop.


Layers of abstraction take you further away from the metal. The more layers of abstraction your keypress must traverse before rendering is complete and the photons have reached your retina, the longer it will be until that happens.

Layers of abstraction make complex tasks more reachable by a larger number of programmers by reducing the amount of specialist knowledge about those lower layers required to do the job. The more layers of abstraction you have, the less they have to worry about lower levels and can just get on with what they want to do.


> Layers of abstraction make complex tasks more reachable by a larger number of programmers by reducing the amount of specialist knowledge about those lower layers required to do the job. The more layers of abstraction you have, the less they have to worry about lower levels and can just get on with what they want to do.

Sometimes this is certainly true, but I think when it comes to abstraction layers, people tend to overestimate the benefits and underestimate the costs. Lately I've been thinking a lot about Joel's Law of Leaky Abstractions.

20 years ago we had native UI toolkits that were fast and responsive, relatively simple to program against, and yielded a common design language that was shared across all the apps on a given OS. Now we have monstrosities like Electron that are bulky and slow, yield non-native UIs across the board, and require programmers to understand a whole mess of technologies and frameworks to use effectively.

I mean, sure, now you don't have to rewrite your web code to build a desktop app, but don't get me started on the utter quagmire that is modern web development. These days it feels like software development has an infection, and instead of carving out the infection and letting it heal, we just keep piling more and more bandaids on top of it.


Absolutely, the amount of technologies often required for full stack web development is astounding. Even for a smallish website, you can be quickly staring at 30+ essential pieces of technology you need to at least understand somewhat. Several languages, package managers, IDEs, frameworks, a database, debuggers, compilers, project management tools, virtual machines, server software, your OS, supporting protocols (SSL, TCP/IP, SSH, DNS etc.) and the list goes on. Who can truly understand all of that simultaneously? All of that bloat just massively increases the chance of errors and inefficiencies.


Well said...

It wouldn't be so bad if the many layers of abstractions actually resulted in increased productivity, but they clearly don't.

Simplicity >> abstractions.


Numerous games, including those having complex graphics and behavior, can render 120+ frames per second and realtime interactions (physics, optics, reactions) on pretty average hardware. I don’t think that game scripters who make final things like scenery or ui face complexity much harder than those in gtk/qt/wpf/htmljs widget programming. Details would be interesting though, since I’m no game developer.

If true, there must be something very wrong with traditional UI systems?

Edit: my apologies, I didn’t read the article first and thought it measured complete feedback like “press ctrl-f and wait for element to popup”, but I’m interested in my question regarding games anyway.


There isn't much of a difference between a GUI toolkit you'd find in a desktop application and the GUI framework you'd see in a game - the most likely difference will be that the game GUI will be redrawn every frame whereas the desktop GUI won't (and there are game GUI frameworks that cache their output to avoid redrawing the entire widget tree every frame).

The difference when it comes to why games can be snappier is that games are "allowed" to bypass most of the layers and cruft that exists between the user and the hardware, including the compositor that the linked article is talking about (in Windows at least).

Fortunately in Linux with Xorg you can get stuff on screen as fast as the code can ask for it, as long as you are not using a compositor (so you can even play games in a window with no additional lag!).

Hopefully the Wayland mania won't get to kill Xorg, since yet another issue Wayland has and X11 doesn't is that with Wayland you are forced into a composition model.


The funny thing is that there is no technical reason at all for compositing to have worse latency, even for games.

Think about the actual operations that are involved. You certainly never want to render directly into the front-buffer (you'd end up scanning out partially rendered scenes). So you render to a back buffer. Which you then blit to the front buffer (assuming you're in windowed mode; in full-screen mode the compositor goes out of the way anyway and lets you just flip).

The only difference between the various modes of operation is who does that final blit. In plain X, it's the X server. In Wayland, it's the Wayland compositor. In X with a compositor, it's the compositor.

Now granted, some compositors might be silly and re-composite the whole screen each frame, but clearly that can be avoided in most cases.

Depending on the setup, there can also be some issues with the scheduling of the game's jobs vs. the compositor's jobs on the GPU. Valve are working on this currently for VR, since the problem is much more noticeable there -- clearly it can be fixed one way or another (on Radeon GPUs you could do a blit via asynchronous compute if need be, for example), but note that compositing actually doesn't change this issue (since the X server's jobs also need to compete with the game's jobs for scheduling time).

So if compositing has worse latency, it's because nobody has cared enough to polish the critical paths. Conversely, compositing clearly does have advantages in overall image quality. So why not fix the (entirely fixable) technical problems with compositing?


There is a very good practical reason why the compositor is in no position to fix that even if it could theoretically be possible.

A major source of the compositor latency (or actually, of the increased response time you get with a compositor) is that the "render to back buffer" (i.e. into the compositor's texture, in the best case) and the "blit to the front buffer" (which is done by the compositor by drawing the window geometry) do not happen at the same time.

From a technical perspective it is perfectly possible for a compositor to create a tight integration between a program and the compositor itself: simply synchronize the program's updates with the compositor updates. Every time the program says "I'm done drawing" (either via an explicit notification to the compositor, via glXSwapBuffers or whatever), issue a screen update.

The problem however here is the compositor has to take into account multiple windows from multiple programs so you cannot have a single window dictating the compositor updates. Imagine for example two windows with animations running, one at 100fps and another at 130fps. Depending which window is active (assuming that the compositor syncs itself with the active window), it would affect the perception of other window's updates (since what the user will see will be at the rate of the foreground window's update rate). Moreover, beyond just the perception, it will also affect the other windows' animation loops themselves - if a background window finishes drawing itself and notifies the compositor while the compositor is in the middle of an update, the background window will have to wait until the update is finished - thus having the foreground window indirectly also affect the animation loops of the background windows. This can be avoided through triple buffering, but that introduces an extra frame of latency - at least for background windows.

So to avoid the above problems, what all compositors do is decouple window update notifications from screen updates, and instead perform the screen updates at some predefined interval - usually the monitor refresh rate, with updates synchronized to it. However that creates the increased response time you get with the compositor being a few milliseconds behind the user's actions, the most common example being window manipulation like resizing and moving windows lagging behind the mouse cursor (which is drawn by the GPU directly, thus bypassing the compositor).

Hence the linked article recommending a 144Hz monitor to avoid this, although this is just a workaround that makes the problem less visible but doesn't really solve it.
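A toy simulation of that decoupling, in case it helps (not any real compositor's code; the rates are arbitrary): the app finishes frames whenever it likes, while the compositor latches whatever is newest on its own fixed cadence.

```cpp
// The app completes frames on its own schedule and just publishes the newest
// one; the "compositor" repaints at a fixed 60 Hz and shows whatever happens
// to be current at that moment.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

int main() {
    std::atomic<int> latest{0};    // id of the app's last completed frame
    std::atomic<bool> run{true};

    // "Application": completes a frame roughly every 7 ms (~140 fps).
    std::thread app([&] {
        for (int frame = 1; run; ++frame) {
            std::this_thread::sleep_for(std::chrono::milliseconds(7));
            latest = frame;        // notify: new content is ready
        }
    });

    // "Compositor": ignores those notifications and scans out at 60 Hz.
    for (int vsync = 0; vsync < 60; ++vsync) {
        std::this_thread::sleep_for(std::chrono::microseconds(16667));
        std::printf("vsync %2d shows app frame %d\n", vsync, latest.load());
    }
    run = false;
    app.join();
}
```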


This "do not happen at the same time" is true in plain Xorg as well, though, since the final blit to the screen happens in the X server and not in the application.

Your example of 100fps vs. 130fps on the same screen is inherently unsolvable in a proper way with anything less than a 1300fps display. So you have a bunch of tradeoffs, and I'm sorry to say that if the tradeoff you prefer is tearing, you're in the losing minority by far.

That said, if you truly wanted to write a tearing Wayland compositor, you could easily do so, and in any case plain X is still going to work as well.


Without a compositor, when you ask to draw a line, a rectangle, a circle or even a bitmap, it is drawn immediately. Sure, it isn't done in zero time, there is some latency, but that is the case with any graphics system :-).

As for the compositor, it isn't impossible to create a Wayland "compositor" that draws directly on the front buffer either, it is just harder and pointless since Xorg exists :-P.

But yeah, if everyone abandons Xorg (and by everyone I mean Everyone, not just the popular kids) and nobody forks it (which I doubt will happen, as there are a ton of people who dislike Wayland) and nobody else steps up to do something about it, then yeah, I'll most likely just make my own suckless Wayland compositor. I'd prefer the world to stay sane though so I can continue doing other stuff :-P.


The reason that apps on Windows can bypass the compositor is that their buffers can be used as the scanout buffer directly when fullscreen. On Linux (both Xorg and Wayland), this same exact behavior is supported with a compositor. For strange legacy reasons, it's known as "fullscreen unredirection". If you're running windowed on all three, you see the same compositor latency.


Note that on Linux you are not forced to use a compositor; personally I do not use one and so I do not have any such penalty when running games in windowed mode.


> the most likely difference will be that the game GUI will be redrawn every frame whereas the desktop GUI wont (and there are game GUI frameworks that cache their output to avoid redrawing the entire widget tree every frame).

Modern UI toolkits (WPF, QML, JavaFX) operate on a scene graph, so they work exactly the same. Android is slowly catching up; it's a disgusting mix of the worst of both worlds.


I'll take a frame of latency and Wayland's not-completely-insane model over X any day.


Yet strangely, Wayland removes the frames-of-latency known as 'remote windowing' in its quest for 'improvement'..


Because it vastly simplified many things, and oh look, compositors have managed to add remoting back in somehow without it!

I mean christ, Windows has no support for remote windowing in its API and still has RDP, even for single windows.

Linux people just love to complain about any change that actually makes the system better because then they can't feel quite as elitist for using it.


> can render 120+ frames per second and realtime interactions (physics, optics, reactions) on pretty average hardware

Yes, but when you consider that the underlying "average" hardware of 2017 has a million times as many transistors and runs a thousand times as fast as the Amiga that seems less impressive.


It's a lot easier to optimize the APIs underlying those scripts because there are far fewer of them in even the most complicated video games than there are comparable abstractions in modern OSs. And there's more motivation. People accept the slightly lower responsiveness in normal OS interactions whereas even millisecond delays in competitive games are intolerable.


That said, whether delays of hundreds of milliseconds in an OS are tolerable or not is itself an on-topic question.


They would have problems rendering text though. I'm working on a text editor and did some research on the fastest way to render text. It's really hard to beat the OS API.


Good point, I also touched pango-level text rendering and remember how long some layout calculations can take. Do things get better with DirectWrite/2D, or is it just a facade over old techniques incompatible with game environments?

Edit: I also like how the OSX go-fullscreen animation is done. They render the new window once (with e.g. lots of text) completely in the background and simply scale the old window to fullscreen with an alpha transition between the two. The first few frames give enough time for the new window to be rendered, and then it magically appears as if it were being live-resized. I suspect few users actually notice the trick.


Cost/value. AAA games with "complex graphics" take years to develop, cost millions of dollars to produce, require dozens of developers, are extremely power hungry, and require specialized GPU programming to make look good and render fast. They are typically judged by how fast/smoothly they perform, so it makes sense to direct resources to this. This isn't an approach most people want to take for average mobile or desktop GUI apps.


Every realtime videogame, from AAA shooters built by hundreds of developers to tiny one man band indie platformers, is more responsive than the average desktop app. This happens because if the controls don't respond well, the game is automatically bad. Sadly this isn't the case with desktop apps


I think you severely underestimate the difficulty of making a game engine.


I was not talking about making a low-level engine; I only theorized that game making is somewhat as hard as modern UI at the top level, where you script it and "draw" 3d/2d UI or interaction parts. For a fair comparison on that scale, game engines should correspond to at least font and vector rendering like pango/ft or cairo, or even direct blit ops, not to widget positioning. For one example, it is pretty easy to take Unity3D and make a 9pool game — it is just a ~two-hour tutorial on YouTube for people with no CG background at all.


Layers of abstraction aren't necessarily a problem if the designers care about latency. Oculus have invested huge resources in minimising latency, but most designers are fairly tolerant of latency if they can trade it off for increased throughput or lower development costs.

We saw this particularly strongly with Android - circa 2011, Google realised that Android latency was having a major impact on UX, so they invested the resources to address it. Unfortunately, early architectural decisions meant that they never quite caught up with Apple, who had prioritised latency in iOS from the outset.


Consequently, we have more and more incompetent people writing software with absolutely no concerns about performance.


I was seriously impressed with how well BeOS performed too.

I've got a Blizzard 1230-IV with 128mb of RAM bunged into my A1200 at the moment and it runs rings around my desktop running Gnome.


Haiku has the same rendering model, so it's equally snappy here. :)


> "or GNU/Linux desktop"

I believe Wayland has made some latency improvements.


Wayland uses a composition model which is inherently slower than having direct access to the front buffer.


Not necessarily: video output is ultimately limited by the display. If the display runs at, say, 60Hz and both drawing on an off-screen buffer and compositing together take up less than ~16ms, the result should be exactly the same as drawing directly on the front buffer.

The main problem is that modern GPU rendering is “pipelined”, so it's entirely possible to have a drawing operation that takes 16ms and a compositing operation that also takes 16ms, and still have your application running at 60FPS, albeit 1 frame "behind" the input. Most developers are not aware of that. (Including me, until recently. I learned about this while trying to figure out why my VR application felt "wobbly", despite running at the recommended 90FPS) The HTC Vive ships with a really neat tool for visualizing that: http://i.imgur.com/vqp01xn.png


This assumes you are synchronizing the updates with the monitor's refresh cycle. However, if you aren't (and the major reason you see lag in compositors is because they do such synchronization), then composition is indeed slower, since it involves several more moving parts and the need to orchestrate the refresh of multiple windows (as opposed to the instant "I want to draw on the screen now" model that X11/Xorg without a compositor and Windows without DWM use).

I give a few more details here:

https://news.ycombinator.com/item?id=15748880


Yeah, having to synchronize multiple windows is probably a pain. I guess that's a much smaller issue with a VR application (the OpenVR compositor supports overlays, but they're not used that often, and there's a clear priority to the "main" VR content)

I guess a valid approach would be to double buffer all windows on the compositor side, and render the "stale" buffer of any window that fails to update within a specified frame budget (16ms minus expected compositing time); that way at least well-behaved apps would have no noticeable latency. There would probably need to be some level of coordination with apps that already do their own double buffering; not sure how that's currently handled. Perhaps a hybrid approach between compositing and direct rendering is also possible, where different screen regions get composited at different frame rates. (Should work as long as there's no transparency involved.)


Compositors already do that, you render into a compositor managed texture and the compositor simply uses whatever is there so applications can update at their own leisure.


... and when you give people direct access to the front buffer, they write code that tears or generally scans out incomplete renders and users end up blaming the operating system.

Compositing is a good thing, and in the vast majority of cases its latency isn't actually intrinsically higher than writing directly to the front buffer. Certainly its intrinsic latency is never higher than writing directly to the front buffer if you build a system without visual artifacts. (Because at the end of the day, all compositing does is shift around who does the job of putting things on the front buffer; the jobs themselves stay the same for all practical purposes.)


But i want the tearing, or at least i prefer it to the latency that compositors impose! This is why compositors must not be forced and instead be a user option. I do not see why i have to suffer a subpar computing experience because of some clueless users.

I even force vsync off system-wide where possible (that is, in Windows; in Linux I haven't seen such an option, and even in Windows DWM ignores the setting).


And then Gnome goes and does it one "better" by hooking the mouse pointer directly to the main render loop...


Except that the Amiga didn't have any memory protection, something not really possible when connected to Internet.

A more interesting comparison is BeOS: it had memory protection and it was (probably still is) way more responsive than Linux and Windows. That said, with an SSD a computer's responsiveness is good enough, and if display latency bothers you, buy a monitor with a high refresh rate! You'll have a fix much sooner than waiting for a software fix of the issue.


A fun experience some years back: running Wine on OSX and browsing the file system with Explorer was a lot faster than with OSX's own Finder.


open a hard drive folder, wait until it draws one icon at a time and get back to me



I recently measured it with my phone's camera in slow motion mode. The system is an AMD Ryzen 1800X with an AMD R9 280x GPU, KDE Plasma with the KWin window manager in compositing mode. Key press to screen output latency was ~33 milliseconds (90 fps recording, so increments of 11) in KWrite. The computer feels plenty responsive with that latency, and I hate latency...

It is a full stack real world result - for comparison purposes it makes sense to measure only the software as in the article, but in reality you want to optimize everything. Especially screens can be quite bad - tens of milliseconds, up to 100 in the worst. USB lag is usually quite low - when I measured it once for low-latency serial comm it was usually < 2 ms.


33 ms is two frames, if your monitor is at 60 Hz. If you tried vscode or other electron app, it might be 49 ms (3 frames). These are the numbers I'm getting from 1900X with Nvidia 1080 GPU, Gnome3, 4k@60hz, but without measuring latency of the keyboard itself.

Modern keyboards are another part of the problem. They can also take their sweet time since keypress until packet appears at the USB bus. See https://danluu.com/keyboard-latency/


Modern motherboards often still have a PS/2 port! And most USB keyboards still support PS/2, a passive adapter works great.


The problem is often in keyboard controller, not in the interface. Apple managed to get fastest keyboard with only 15ms lag; other may be order of magnitude slower.


Could someone explain to me why on earth 15ms of lag for a key press is considered good? It is a switch for gods sake. It should be near instant.


The linked article explains it (TLDR: key travel time, scanning keyboard matrix, debouncing).


It’s a common myth that debouncing needs to meaningfully affect latency. It does not. It will affect the maximum repeat rate, but you can pretty much report an event the moment you see an edge.


A keyboard doesn't need to implement a scanning matrix. It could hook up each key individually to its own IO port.


Requires a bigger chip (100+ IO pins) and more complex wiring diagrams than most inexpensive keyboard makers are willing to invest.


Replace that with _any_ keyboard manufacturer. Similarly, while your keyboard may advertise USB2 or even USB3, the actual key-press USB interface is always running at USB 1 low speed (1.5 Mbit).

I spent a fair amount of time trying to find a keyboard that works with a device I have that only accepts high-speed devices; 30 or 40 keyboards later I gave up... If someone actually knows of such a thing I would be interested. Same basic problem with mice. I guess the thought process is: hey, USB2 supports split transactions, and the keyboard/mouse won't actually generate even 1.5 Mbit of data, so we are going to continue to sell the same basic mouse/keyboard interfaces we were selling 20 years ago, wrapped in styling variations.

PS: Some of the physical button keyboards I found with configurable colors/etc, usb hubs, do support USB3... For the color controls, or hub. The keyboard endpoint is still at low speed...


Likely because they want to implement the minimum necessary HID spec (or, in a nicer tone, the HID spec with the most compatibility), which would be the one supported by the BIOS.


I don't understand why someone hasn't come out with a dedicated keyboard chip yet. If it's cheap enough you don't need to run all the keys to a single chip; you could have multiple chips that all talk over a serial bus to one that is designated the master.


Another layer of serial interfaces would make it even slower.


A serial interface can run a lot faster than anyone can type, eg i2c high speed is max 3.2Mbit/s.

SPI can be way faster than that I believe.


True, but adding another link (microcontroller) in the chain is going to add delay anyway. I think the original suggestion was pointless: instead of adding more microcontrollers you can just replace the main uC with one that has enough pins. The reason this is not done is that uCs with >100 pins are usually more powerful and expensive, so you can't just pay for more pins -- you also have to pay for more processing power and features you don't need.


Depends on the speed/latency of the interface. Although you could probably say the same about the scanning matrix too.


> Could someone explain to me why on earth 15ms of lag for a key press is considered good? It is a switch for gods sake. It should be near instant.

I don't claim that 15ms of lag is to be considered good, but the problem that one has to solve is debouncing the switches.


Debouncing shouldn’t add lag. On the first closure detection you can send the key-down code. You then need some debounce logic to decide when to send the key-up code, but after the key is solidly down you are again in a state to send the key-up code as soon as the up begins.

The only time there should be lag is when a very short keypress happens: the key-up might be delayed while the controller rules out bounce.
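A sketch of that approach (hypothetical firmware logic; the lockout constant is an assumption):

```cpp
// Debouncing that adds no latency on key-down: report the edge immediately,
// then ignore further transitions for a short lockout window.
#include <cstdint>

const uint32_t DEBOUNCE_US = 5000;   // 5 ms lockout, an assumption

struct Key {
    bool stable_down = false;
    uint32_t last_edge_us = 0;
};

// Called from a fast scan loop with the raw switch level and a microsecond
// timestamp. Returns +1 for a key-down event, -1 for key-up, 0 for nothing.
int debounce(Key& k, bool raw_down, uint32_t now_us) {
    if (raw_down != k.stable_down &&
        now_us - k.last_edge_us >= DEBOUNCE_US) {
        k.stable_down = raw_down;     // accept the edge immediately
        k.last_edge_us = now_us;      // and start the lockout window
        return raw_down ? +1 : -1;
    }
    return 0;                         // bounce (or no change): ignore
}
```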


For reference, on a 2012 MacBook Air the numbers are 18-20ms in either a regular app or VSCode.


I'm getting min 18.5-avg 24.3-max 35.2 ms in vscode on 2015 rMBP. However, the test won't finish and Typometer complains ("Previously undetected block cursor found"). In Emacs, it won't run at all.


How are you measuring from key activation?


Yeah, that seems to be a weak point of some visual measurements, especially when laptop and other scissor keyboards score better than older switch-based ones.


I don't - just filmed keyboard and screen together; key down is easy to see. I have a quality keyboard (Fujitsu KBPC PX eco) connected by PS/2, and as stated above I'd expect little extra latency from USB. As measured, anyway, there is no space for significant keyboard lag in the result. The reason why I measured latency was that I seemed to notice a change after changing GPU driver kernel options. End to end was easiest to measure. The result was close to the theoretical minimum so I stopped there.


A recent iPhone, the Samsung S8, and the Pixel phones can record at 240 fps, which gives you much better precision if you have access to that kind of phone.


Someone recently gave me an old PowerBook G3, running Mac OS 8.6. I was amazed by how responsive the UI is compared to today's UIs, from Mac to Windows to iOS to Android. When I clicked something, it felt like there was a pushrod between the mouse button and the menu, which triggered it instantly.


Well, compositing was introduced in 10.2; I'm not sure running classic Mac OS was an advantage in this case.


Did you use OS X prior to 10.2? I guarantee you it was slower and worse in every way. Especially compared to classic. There is a reason they continued installing OS 9 side-by-side before 10.2.


OS X did compositing on the CPU before Quartz Extreme in 10.2.


> Virtual keypresses were sent with WinAPIs SendInput and pixels copied off screen with BitBlt.

This methodology alone could account for the differences in timing between Win7 and Win10. For all we know, Win10 could just be slower at getting the pixels back to the program from BitBlt, or SendInput could be slower triggering events, or a multitude of other issues.

The best way to truly detect key-to-photon latency is with an external video recorder that has both the screen and keyboard in frame. Grant a few ms of noise for key travel distance.


I'd be curious to see message traces and UMAPC (user-mode asynchronous procedure call) traces of this between the two. My hypothesis is that Win10 does quite a bit more in UMAPCs than Win7 does, in the interest of keeping the system 'responsive' even at the cost of latency. For those not aware, UMAPCs only run when a thread is in an 'alertable' state (see MSDN, as that's not exactly simple to explain: https://msdn.microsoft.com/en-us/library/windows/desktop/ms6... ); as such they tend to run at input waits or other runtime idle points, unless the application makes very heavy use of Windows' built-in asynchronous methods and alertable waits.
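
For anyone who hasn't bumped into UMAPCs, here's a tiny self-contained illustration of the alertable-wait behaviour (plain Win32; this says nothing about how Win10's input path is actually structured): the queued APC just sits there until the target thread enters an alertable wait.

    #include <windows.h>
    #include <stdio.h>

    /* The APC body runs on the target thread, but only while that thread is
       in an alertable wait (SleepEx / WaitFor...Ex with bAlertable = TRUE). */
    static VOID CALLBACK my_apc(ULONG_PTR param)
    {
        printf("APC ran on thread %lu with param %lu\n",
               GetCurrentThreadId(), (unsigned long)param);
    }

    static DWORD WINAPI worker(LPVOID arg)
    {
        (void)arg;
        printf("worker: non-alertable sleep...\n");
        Sleep(1000);                  /* a queued APC does NOT run here     */
        printf("worker: entering alertable wait\n");
        SleepEx(INFINITE, TRUE);      /* ...it runs here and ends the sleep */
        return 0;
    }

    int main(void)
    {
        HANDLE h = CreateThread(NULL, 0, worker, NULL, 0, NULL);
        QueueUserAPC(my_apc, h, 42);  /* queued immediately, delivered later */
        WaitForSingleObject(h, INFINITE);
        CloseHandle(h);
        return 0;
    }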

I would also be curious to compare a D2D application versus a GDI application, as the majority of the work has gone to D2D in the last few years. Please note that a D2D application in this case means one using a swap chain and device, not an ID2D1HwndRenderTarget (which rasterizes and composites on the GPU but has GDI compatibility built in).


As I understand it:

- Compositing is done on the GPU

- BitBlt is done on the CPU

- Copies from GPU -> CPU are slow

So, yeah, compositing adds a frame of VSync latency, but these measurements are complete bunkum.
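
For concreteness, the kind of readback being criticised looks roughly like this (a sketch of the general BitBlt-probe technique, not the article's actual code). Everything below goes through GDI on the CPU, and on a composited desktop what you read back is whatever DWM most recently composed.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* Probe a 1x1 region at (100, 100); the coordinates are arbitrary. */
        HDC     screen = GetDC(NULL);                 /* DC for the whole screen */
        HDC     mem    = CreateCompatibleDC(screen);  /* CPU-side scratch DC     */
        HBITMAP bmp    = CreateCompatibleBitmap(screen, 1, 1);
        HGDIOBJ old    = SelectObject(mem, bmp);

        /* Copy the screen pixel into the memory bitmap -- the "readback". */
        BitBlt(mem, 0, 0, 1, 1, screen, 100, 100, SRCCOPY);
        COLORREF c = GetPixel(mem, 0, 0);
        printf("pixel at (100,100): R=%d G=%d B=%d\n",
               GetRValue(c), GetGValue(c), GetBValue(c));

        SelectObject(mem, old);
        DeleteObject(bmp);
        DeleteDC(mem);
        ReleaseDC(NULL, screen);
        return 0;
    }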


BitBlt is done in DMA RAM by the CPU, so it may be even worse than just a copy, as there is likely a wait involved too to prevent shared access. Using DMA RAM prohibits the GPU/driver from doing optimizations on that memory that it could do if the buffer were in dedicated GPU RAM. This is why DX12 resources are generally always copied into non-shared buffers.


That's why I prefer to play games in fullscreen as opposed to "borderless windowed", I have noticed quite a bit of input lag in the latter mode.


This is because fullscreen mode allows the use of something called "DirectDraw exclusive mode", which bypasses the compositor and lets the application present straight to the display.

https://msdn.microsoft.com/en-us/library/windows/desktop/dd3...


I wonder if it would be feasible to use this to reduce input lag in an editor?


Yes, but you lose access to any user interface components you don't paint yourself, and you can eat some lag/flickering when switching out of the app as control is returned to the compositor. If you end up needing to show the Open File picker from the OS or pop up the Print dialog you'll need to exit exclusive mode.


It depends on the game for me, usually I prefer borderless windowed because I tend to alt-tab out a lot.


Me too. It's just so much quicker to alt-tab to my browser or Telegram or iTunes; even on my beefy rig it takes multiple seconds for my desktop to take control again.

In my experience the vast majority of games that I play don't suffer any perceptible ill effect from running in borderless fullscreen. If you're playing the game competitively it's another story, but for me running around shooting aliens in Destiny 2 or something I haven't noticed any degradation of my experience. There's of course the odd (usually older) game that doesn't support borderless fullscreen, but sometimes there are mods to support it.


This article is about Windows, but I wonder how Wayland on Linux measures up.


It should be possible to reduce the latency quite a bit if a Wayland compositor had that goal in mind. It's something we're working on for Sway. Sometimes you have to choose between (1) rendering correctly and (2) responding immediately to user feedback. When resizing windows, for example, we can scale the old buffer up (stretching it) while we wait for a new buffer from the client, or we can wait to give you that feedback until the client has prepared a new buffer at the right size.


Me too, since devs like those of libinput are specifically and actively addressing such latencies.


Wayland does everything with compositing. The Wayland people love v-sync because they hate tearing. So chances are that this effect applies...


And then Gnome goes and builds on that by hooking the mouse pointer up to the redraw...


I would like to see end-to-end measurements (=high speed camera footage analysis) before making final conclusions. Not saying that compositing doesn't add latency, but I feel like the system is so complex that this sort of userspace software measurement might not tell the whole story


My work laptop (the only Windows computer I use) runs Windows 7, and I intend to keep it that way as long as Windows 7 still gets updates. This article just confirms my bias, and I freely admit I am biased. I do not like Windows very much to begin with, but as far as Windows goes, I think Windows 7 ____ing nailed it (for people without touchscreens, anyway).

On a related note, I have noticed that Outlook 2013 exhibits a notable lag between a keystroke and a character appearing in the message window. I have not done any measurements, but my best guess is that it is in the order of hundreds of milliseconds. If you type fast (I like to think that I do), Outlook can keep up throughput-wise, but this lag is terribly annoying.


> On a related note, I have noticed that Outlook 2013 exhibits a notable lag between a keystroke and a character appearing in the message window.

Try switching to text-only mails, no zoom... and if you must write HTML mails, do not have an image that is larger size than the window. As soon as there is an image that doesn't fit into the window at 100% zoom Outlook begins to crawl.


My work computer is stuck with Office 2007, because I have Office 2007 Professional, and I need Access about once per year for an arcane reason. Office Pro is fairly expensive, so for the time being, I am stuck with 2007. I am still not entirely sure if I should be happy or sad about it. ;-) But I have used Outlook 2013 on coworkers' computers every now and then, and it was pretty laggy.

These days, I do a lot more programming than sysadmin'ning and help desk, but when I was the IT support guy at our company, my overall impression of Office 2013 was not very good. I have seen it just stop working on a handful of computers (out of about 75-80, so that is a lot), in such a way that I could only "fix" it by uninstalling and reinstalling Office from scratch. On one of our CAD workstations, Outlook and Autodesk Inventor started a feud where an update to MS Office caused Inventor to crash, and the subsequent reinstallation of Inventor caused Outlook to crash when we tried to write an email. (Then we reinstalled Office, and suddenly things worked magically, so I remain clueless as to what happened.) The latter may be Autodesk's fault as much as Microsoft's (I get the vague impression that they care even less about their software crashing than Microsoft does, as long as the license is paid for). But the impression I get is that MS Office has suffered quite a bit over the years. Therefore I am not entirely unhappy about being stuck on Office 2007. I do miss OneNote, the one program from their Office suite I really like, but I have org-mode, so I can manage. ;-)

EDIT: Sorry for venting, that one has been building up for a long time.


Maybe you have that smooth typing animation enabled? It can be turned off via the registry; google for it.


The difference in latency is only noticeable in Windows 7 with DWM disabled, though. Is that your current setting?


I always thought I was the only one noticing this. With compositing enabled, both with DWM and on GNU/Linux, the whole interaction seems to become "soft" instead of the raw feel that is much nicer and snappier. From my experience it also has to do with passing through the stack to the GPU when compositing; running it all from the CPU is what makes it feel snappy.

I've also been researching removing the triple-buffered vsync on W10. It seems it was possible in the first builds by replacing some system files, but that option is gone now with the recent big releases.

Given that, I do not see a real reason why compositing is needed on W10, as transparency etc. aren't important factors.


Makes me think of the "smooth scrolling" option that you can find in most web browsers. Never liked that; it's the first thing I hunt down after a new install.

That's because using it feels like scrolling through molasses, for whatever reason.


> Actually, I don’t know why a compositing window manager should enforce V-Sync anyway? Obviously you get screen tearing without it but the option should still be there for those who want it.

Every additional option (especially in the realm of video settings) opens the door for additional complexity, implementation error, and user error in unintentionally setting the undesired mode. It's perfectly understandable why window managers would settle on one or the other of two extremely different render-to-screen approaches, especially when general consensus for quite some time now in the graphics space has been that minimizing the potential for tearing is preferable.


> your keyboard is already slower than you might expect.

An extract from the linked article:[0]

> A major source of latency is key travel time. It’s not a coincidence that the quickest keyboard measured also has the shortest key travel distance by a large margin.

They're not measuring from when the signal is sent from the keyboard; they're measuring from when force begins to be applied to the key. If you have a clicky or tactile switch (Cherry MX Blues, Greens, Browns, Clears, etc.) then the latency measured here will be way out of proportion to how it actually /feels/.

[0]: https://danluu.com/keyboard-latency/


further reading:

https://pavelfatin.com/typing-with-pleasure/

https://danluu.com/keyboard-latency/

In Windows 7, classic mode disabled the DWM compositor and V-Sync. It's incredibly dumb that Microsoft would arbitrarily remove that feature in Windows 10 to push their ugly as sin post-metro UI.


The DWM compositor is bad for games too. The only way to take it out of the equation is to use your GPU in exclusive mode.


According to this: https://www.youtube.com/watch?v=BTURkjYJ_uk

Firefox's Servo engine can composite CSS elements / display lists at 500 frames per second.

Maybe the next version of the Windows / Linux desktop should use Servo?


Your application renders a frame then the compositor gets it and does its transformations, if any. The composited result is then rendered to the screen thus adding one frame of latency. That's why the article says one solution would be to get a 144Hz monitor, it would reduce the time between frames so an extra frame of latency wouldn't be as bad.

You could potentially reduce this delay as well by having the application and the compositor in communication. Since rendering is going to be synced to vblank if you can get the application to not try to sync as well and instead just notify the compositor when it is done drawing a frame you could potentially get the application drawing and the compositor drawing in the same vblank interval. This is what Wayland and DRI3 Present let you do in the Linux world, I assume Windows has something similar but you'd need to opt-in to it so I bet nothing uses it.
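
Windows does expose at least part of this through dwmapi: DwmFlush() blocks until the next DWM composition pass, and DwmGetCompositionTimingInfo() reports the compositor's timing, so an app that opts in can schedule its drawing against the compositor's cadence. A minimal sketch that only measures the composition period (MSVC-style lib pragma; assumes DWM is actually running):

    #include <windows.h>
    #include <dwmapi.h>
    #include <stdio.h>
    #pragma comment(lib, "dwmapi.lib")

    int main(void)
    {
        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);

        DwmFlush();                        /* line up with a composition pass  */
        QueryPerformanceCounter(&t0);
        for (int i = 0; i < 60; i++)
            DwmFlush();                    /* block until the next composition */
        QueryPerformanceCounter(&t1);

        double ms = 1000.0 * (double)(t1.QuadPart - t0.QuadPart)
                    / (double)freq.QuadPart / 60.0;
        printf("average composition period: %.2f ms\n", ms);
        return 0;
    }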


That's throughput, though, not latency. I would guess that servo's input latency is significantly more than 2ms, even if their throughput for certain rendering operations is 500 fps.


"Money can buy bandwidth. Latency requires bribing God."


Depends on the complexity of the html and CSS in question. To reproduce a full desktop environment, even just using Canvas, would be challenging and I highly doubt it would allow 500fps.


Thanks for the kind words :) But that was an artificial benchmarking mode that turned off any synchronization. It couldn't actually show the picture at hundreds of FPS, because the physical hardware can only update at 60.


Don't confuse bandwidth and latency.


I've been using sway[0] as my WM for some time now (it's a sort of port of i3 to Wayland) and it's incredible that you can actually tell it is much faster than WMs running on X.

[0] http://swaywm.org/


It's funny you mention this - it's only in the past few days that we've been taking this sort of thing more seriously, and our work is unreleased!


The irony is that the latency is most often not the fault of DWM but of the applications themselves. Since DWM acts as the screen's double buffer, your application needs to be in sync with the DWM frame timing; not being in sync means latency and flickering.


No left margin on webpages is real too, and it annoys me


Are you on mobile? It centres perfectly fine for me on desktop. The linked CSS file uses this method:

    body {
    	max-width: 844px;
    	margin-left: auto;
    	margin-right: auto;
    	font-family: Verdana, Arial, Helvetica, sans-serif;
    }
I guess wrapping the whole article with a div with 0.5em margin would fix it on mobile.


Ah they are using margin:auto to center. I thought it must be an override since most user agents include a default body margin.

Yes, I'm on mobile. My OnePlus 5 hides the first one or two pixels under the bezel when looking at it straight on, so the first character on each line gets a little cut off. Not sure if this is just my model or if other phones do this also.


Either way the conclusion is the same: websites should have a minimum margin! I'm sure the author of the website is receptive to this feedback, so I sent an email.

Also, Firefox (and Safari on iOS) should have a "view text-optimised version" button in the URL bar; maybe that would help you here? I don't know if other browsers have it though.


Is there a website that demonstrates the effects of latency after pressing a key? I know there's examples of different frame rates shown with moving circles, but I don't think that's quite the same.

I mean is there really a noticeable difference between say 20 and 40 ms?


Sorry, so they had to write some code to test what they couldn't perceive but believe they can perceive? I feel like it's plausible that this is partly a psychological problem.


Well, it does say

> At least I can feel the difference when typing.


I can feel people's auras. Discuss.


I suspect you know what 'feel' and 'perceive' mean, and just being quarrelsome.


I'm suggesting it's in their head and they can't actually feel or perceive it.


Why would enforcing vsync add more than 1/60s of latency to anything?

This looks way more like badly designed animations than some fundamental problem coming from the hardware.


Perhaps it ends up being multiple vsync waits for a given rendered frame? Something like the application or OpenGL driver waiting for vsync before rendering into its buffer, then the compositor waiting for the next vsync before actually compositing/flipping.


This is a common source of delay in composited apps/games, yes. Ideally what you want is to have a completed frame ready for the compositor at least a few milliseconds before the next vertical sync arrives, but it's easy to screw that up, especially if you're getting fancy. Triple buffering also enters the picture here (though mostly for games), because in the bad old days you had exactly two buffers, and if both were in use (one being scanned out to the monitor, the other holding your most recent completed frame) everything had to grind to a halt and wait before rendering or game code could continue. Triple buffering solved this by adding an extra buffer, at the cost of an entire frame's worth of display latency, in exchange for your code spending less time spinning and waiting on the GPU. If someone is careless they could definitely end up with triple buffering enabled for their app (like if they're rendering using a media-oriented framework that turns it on).

The 'Fast Sync' option NVIDIA added to their drivers in the last year or two is a fix for the triple buffering problem - you get spare buffers, but instead of adding a frame of latency the GPU always grabs the most recently completed frame for scanout. Of course, if a compositor is involved you now need the compositor to do this, and then for the compositor to utilize this feature when presenting a composited desktop to the GPU. I don't think any modern compositor does this at present.
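
The "most recently completed frame wins" part is easy to sketch in plain C. This is just the textbook lock-free triple buffer, not a claim about NVIDIA's actual implementation: the producer never blocks, and the consumer always picks up the newest published slot, so a slow consumer drops frames instead of accumulating latency.

    #include <stdatomic.h>
    #include <stdint.h>

    typedef struct { uint32_t pixels[4]; } frame_t;  /* tiny stand-in for a frame */

    static frame_t     slots[3];
    static atomic_uint shared = 1;     /* low 2 bits: index of the middle slot,
                                          bit 2: "a new frame was published"      */
    static unsigned    back   = 0;     /* slot owned by the producer (renderer)   */
    static unsigned    front  = 2;     /* slot owned by the consumer (scanout)    */

    /* Producer: render into its private slot, then swap it with the shared
       middle slot and mark it dirty. Never waits, never blocks the consumer. */
    void publish_frame(const frame_t *f)
    {
        slots[back] = *f;
        unsigned prev = atomic_exchange(&shared, back | 4u);  /* publish + dirty  */
        back = prev & 3u;              /* adopt the old middle slot as new back   */
    }

    /* Consumer: if something new was published, swap front with middle;
       otherwise keep showing the current front slot (no waiting either way). */
    const frame_t *latest_frame(void)
    {
        if (atomic_load(&shared) & 4u) {                      /* anything new?    */
            unsigned prev = atomic_exchange(&shared, front);  /* clears the dirty */
            front = prev & 3u;
        }
        return &slots[front];
    }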


I see. Locking everything until you get a vsync surely makes the code simpler.

I'd file that in the same bin as hardcoding the frames of animations and playing them all back at whatever refresh rate the device happens to have.


Smartphones suffer from input latency too, though I’m unsure of the underlying cause (curious how iOS handles window/view drawing). It only seems to be getting worse, though I haven’t done tests on this. While each new model undoubtedly has better tech specs, the interface responsiveness doesn’t seem to improve.


Ghetto latency test: finger-scroll alternately up and down very quickly, and see at which frequency your finger and the scroll position are 180° out of phase, i.e. your finger is up while the contents are down or vice versa. Smartphones seem to be fine according to that test. Android is very good and iOS is even better.


Hm I get about 4 up and downs per second before the scroll position is 180deg out of phase in Safari on iPhone 7+. That translates to about 125 (1000/8) ms latency?


Yes, that is how it works :) That value seems surprisingly bad. My limited experience with iDevices (don't own one) has been that the offset between a stationary finger's position on the page and the finger's position while scrolling (another way to measure - unless it is specifically fudged with some kind of prediction to make scrolling feel less detached) is very small. But I can't argue with data. FWIW, I like to test Android in the scroll view of the OS settings app or the address book. Those are well implemented and presumably don't add unnecessary lag.


"The big problem with latency is that it accumulates. Once some component introduces delay somewhere in the input chain you aren’t going to get it back. That’s why it’s really important to eliminate latency where you can." - a lesson that applies to many things besides the narrow case of Windows 10.


> Don’t you find it a bit funny that Windows 95 is actually snappier than Windows 10?

Comparing them on the same hardware?


There was a similar article a while ago comparing it to an Apple II, where the Apple II on its own hardware was snappier than a modern computer.


I might be interested in seeing that if you can find the link.

I have a transwarped IIGS, which for CPU benchmarks is slower than most of the modern emulators, but on the actual hardware it's pretty amazing (particularly since it boots faster from a CompactFlash card than most Win10 PCs I've seen). I would guess that a USB keyboard -> Windows -> emulator -> app response -> draw -> emulator -> Windows -> GPU path is much, much longer than the IIGS keyboard-poll-and-draw cycle, even given the roughly thousandfold clock speed advantage a modern PC has.


Not a problem, let's find some P2 boxes on eBay.

PS: every time I boot an old desktop I get this feeling. The new stacks are amazing, they do it all, but... I love the immediate feel of the old ones, even at some cost. And this comes from a compositing fetishist.


Try popping the start menu open on a Windows 10 machine with a mechanical hard drive.


It wasn't much better on Windows 95 on average hardware around the time of the release of Windows 95. Heck, if you dared click [Start] as soon as the desktop displayed (read: Windows hadn't yet finished booting) then the whole OS would hang for several minutes.


Windows 95 had a ton of "clever" tricks, like "OLE chicken", aka shimming in a fake OLE instead of loading the real thing just to display the desktop faster; executing anything triggered the .dll load anyway, but the official metric was showing the blue desktop...

https://blogs.msdn.microsoft.com/oldnewthing/20040705-00/?p=...


If you want to have the most minimal Windows setup:

- Don't use an antivirus

- Stop unused services running in the background (e.g. via services.msc)

- Turn off all visual effects, including compositing and animation

Then you might want to set up a firewall to block all the nonsense like SMB, NetBIOS, etc. You can also set up a cheap old machine to act as your firewall, reverse proxy cache, antivirus/antispam, etc.

You can set up a script to turn on all the printing related services when you are actually going to use a printer.


Pick up an LTSB release and remove the desktop (custom shell).

It looks a lot like Arch / Debian with first class hardware support.


While LTSB looks good (fewer bundled apps, that's great), it is very dissimilar from Arch or Debian.

Regarding hardware compatibility, please be aware of the fact that Debian supports many more processor architectures than Windows.

Maybe Windows will support some peripherals better, but Linux has improved a lot in this respect. Chances are out of the box support for hardware is better in Linux than it is on Windows these days.


Is it really fair to compare something like Gvim to Slack though? I'm assuming Gvim is going to be orders of magnitude faster than an Electron app.


I wonder if any compositing window managers support FreeSync. That could help significantly, for any v-sync related latency.

Vulkan rendering would help as well.


FreeSync only helps when you're running a GPU heavy game that can't keep up with the monitor's refresh rate (dips below 60/120/144/whatever Hz). All desktop compositors definitely can and do render at your monitor's refresh rate :)

Vulkan wouldn't help much. It has less overhead (no validation in production, etc.), but GL/GLES are plenty fast for any compositing tasks. The difference might be completely negligible.


Well that certainly explains why a W7 with Classic theme feels faster than with the default Aero theme. Thanks to OP for digging!


Is this related to the latest Windows 10 updates, or was it always the case? I mention this because, maybe unrelated, there is some slow drawing in the UI after the October major update. I notice it when I log in and the desktop is drawn.


The article mainly explores the difference with DWM (Desktop Window Manager) enabled and disabled. DWM's main role is to render all of the windows into separate buffers in memory and then "compose" them on the fly. This, for example, avoids the glitches you used to get when dragging a window over a frozen application.

Since Windows 8, DWM cannot be disabled. By default in W7 it is also enabled (the infamous Aero), but at least you can get rid of it.


It has been the case since Windows Vista. Prior to Windows 10 though, you could disable the desktop compositor.


WOW! I usually run about 7 Android emulators on one monitor, and my web browser and other stuff on the other monitor. Things get really laggy sometimes; turning off this feature made a HUGE difference. My experience has improved dramatically.


The argument about vsync and framebuffers seems mistaken. Vsync only prevents partial renders, not full renders, so disabling it does not remove latency in most cases.

This article makes it sound like there is some magical way to draw directly into the buffer without it being redrawn, which is not true. The best you can get is a chance of faster drawing because you can write into the buffer while it's being drawn... (and probably get some tearing).

The idea that compositing is somehow slower is also very misleading... how exactly is a stacked renderer faster?

I think the author is blaming a poor implementation on technical details that they only partly understand.


> vsync only prevents partial renderers, not full renders, so disabling it does not remove latency in most cases,

V-Sync will, as the name implies, wait for the vertical sync before showing a frame, precisely so a partial draw is never visible. This "wait" is what is adding the latency.

> the idea that compositing is somehow slower is also very misleading... how exactly is a stacked renderer faster?

Yes, compositing is slower because when an application wants to draw its window, it needs to send the image to the compositor (or acquire the handle to the backing texture for the window that the compositor uses, or whatever - that is an implementation detail) and then the compositor will at some point later draw it together with all the other windows (almost all compositors are V-Synced, meaning that you will see at most 60Hz updates - or whatever refresh rate your monitor is running at). This adds a very noticeable delay between the program needing to update itself and the update being visible to the user.

On the other hand, without a compositor and assuming a window system based on clipping regions (like X11 and Windows without DWM) with direct to frontbuffer drawing, the application will ask the window system to prepare for drawing in the window (which usually means the window system will setup the clipping to be inside the window's visible area), then perform the drawing directly on the framebuffer and notify the window system that it is done (so that the clipping stuff can go away). Notice how nothing here waits on anything else, like v-sync (or any other interval) and how this totally ignores other windows - each window draws itself immediately when needed instead of having to orchestrate an update for all windows on the screen.

Of course with the latter approach you do get tearing, since windows can draw themselves during a monitor refresh, but whether that is a problem is up to the user. Personally I care so little about tearing that I barely notice it, yet I immediately notice any sort of V-Sync or compositor-induced lag, so I always try to avoid these.
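
For the Windows-without-DWM flavour of that second path, the "prepare clipping, draw straight into the window, signal done" sequence is literally the classic GDI paint cycle: BeginPaint hands back a DC clipped to the window's invalid region and EndPaint releases it. A bare-bones sketch (with DWM running, the same calls land in the window's redirection surface rather than the front buffer, which is the whole point of this thread):

    #include <windows.h>

    static LRESULT CALLBACK wnd_proc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
    {
        switch (msg) {
        case WM_PAINT: {
            PAINTSTRUCT ps;
            /* BeginPaint sets up a DC clipped to this window's invalid
               region -- the "prepare for drawing" step. */
            HDC dc = BeginPaint(hwnd, &ps);
            RECT rc;
            GetClientRect(hwnd, &rc);
            FillRect(dc, &rc, (HBRUSH)(COLOR_WINDOW + 1));
            TextOutA(dc, 10, 10, "drawn straight into the window",
                     lstrlenA("drawn straight into the window"));
            /* EndPaint tears the clipping down again -- the "I'm done" step. */
            EndPaint(hwnd, &ps);
            return 0;
        }
        case WM_DESTROY:
            PostQuitMessage(0);
            return 0;
        }
        return DefWindowProc(hwnd, msg, wp, lp);
    }

    int WINAPI WinMain(HINSTANCE inst, HINSTANCE prev, LPSTR cmd, int show)
    {
        (void)prev; (void)cmd; (void)show;

        WNDCLASSA wc = {0};
        wc.lpfnWndProc   = wnd_proc;
        wc.hInstance     = inst;
        wc.lpszClassName = "GdiDirectDemo";
        wc.hCursor       = LoadCursor(NULL, IDC_ARROW);
        RegisterClassA(&wc);

        CreateWindowA("GdiDirectDemo", "GDI direct draw",
                      WS_OVERLAPPEDWINDOW | WS_VISIBLE,
                      CW_USEDEFAULT, CW_USEDEFAULT, 400, 300,
                      NULL, NULL, inst, NULL);

        MSG msg;
        while (GetMessage(&msg, NULL, 0, 0) > 0) {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
        return 0;
    }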


I also noticed that there's a lot more compositing latency on MacOS


How did you measure this? That's contrary to what I've seen using https://itunes.apple.com/us/app/is-it-snappy/id1219667593.


I find the exact opposite: the UI latency of macOS seems to me a lot lower and more 'snappy' than Windows 7/8/10, especially with Metal 2 under High Sierra.


The compositor adds some latency, yes.

It also removes latency for everything else. We no longer need to suffer when apps are slow to redraw their content whenever we move windows around.


Does anybody know if DirectX and/or OpenGL also suffer from the latency? For example, would a Windows XP VM in Windows 10 have the one-frame lag?


Gnome 3 is the worst at this, especially in wayland. Moving your mouse while the compositor is busy causes input to be lost as well


"use an operating system that has a stacking window manager"

Exactly, one more reason gnu+linux is the superior os.


As the author mentions, i3wm ftw. Seriously check it out.


i3wm is indeed amazing.

Good article. Just turned DWM off on the Win7 machine at work, and it's like a free hardware upgrade! Everything is more responsive.


Placebo, and as a bonus, enjoy tearing videos.


A 100 Hz UWQHD monitor for work worked for me.


I am so glad I don't notice this.


The data speaks for itself, but I'm trying to perceive any latency in my Firefox (Win10) and I'm unable to. Typing seems instantaneous to me. I hate the milliseconds the start menu takes to animate though. Win7 start was instant.


From my experience latency is rarely bad when it's very very consistent. After some time the brain compensates.

I do recall playing World of Tanks with an 800 ms ping. After about 3 months or so I was back to thinking there was no ping, until a problem with my ISP was fixed and I dropped to 8 ms. After which I ran into walls a lot.

The brain can adapt to such things rather well, given enough time.


I agree with you about consistency but 792ms is something though!

I remember when we upgraded from a 28.8k modem to ISDN. My Quake got a lot better purely because of lower latency. A friend was left behind on modem and he could tell the difference too, I was virtually unbeatable to him after the upgrade.

Of course we both knew that it was down to an unfair advantage so it wasn't a true victory.


What I hate even more is when I press the Windows key and start typing an application's name into the Start menu textbox, and it misses the first two or three keypresses.


As a comparison point, gnome 3 does the right thing and buffers the input until it can handle it. On an old machine I usually have the full name typed out and enter pressed before any animation actually starts being displayed.


How about they disable animation when windows are activated by keyboard? Or just disable it entirely ;-)


That's weird, I don't have that problem. What build of Win10 are you running? I wonder if it depends on the kind of keyboard you're using.


I think it somehow depends on whether the OS is installed on a spinning disk or an SSD. I know it's stupid but this is my theory based on observation. Would love for someone to verify it.


I'm sure a spinning disk would make it worse, but I do have an SSD (admittedly, a 2013 model and not the fastest). To answer the grandparent, build 15063 and I'm using the laptop keyboard.

I tried a few times and I can consistently make Windows miss at least the first keypress if I haven't opened the Start menu in a few minutes.


> Typing seems instantaneous to me.

When the only source of latency is the extra ~16ms added by the compositing, you likely can't feel it, but it adds up with other sources of latency that can make things feel bad more often. Such as:

- latency added by text editors, depending on their quality and the amount of work they are doing (colors, intellisense, file size, etc.)

- latency added by keyboards; some are worse than others

- latency added by monitors; good ones are in the 1 ms range but bad ones can be as high as 40 ms

Sadly the latency on all of these fronts has generally been trending higher as computers have gotten more powerful.


Are there any such tests using i3/X11 vs sway/wayland? I'm curious about Linux input latency now.


The solution is simple - triple buffering


AFAIK DWM does use triple buffering.



