I've been experimenting with this too, in the context of the Windows front-end for xi editor. It's absolutely true that the compositor adds a frame of latency, but I have a very different take than "turn it off."
First, it's possible to design an app around the compositor. Instead of sending a frame to the system, send a tree of layers. When updating the content or scrolling, just send a small delta to that tree. Further, instead of sending a single frame, send a small (~100ms) slice of time and attach animations so the motion can be silky smooth. In my experiments so far (which are not quite ready to be made public, but hopefully soon), this gets you window resizing that almost exactly tracks the mouse (as opposed to lagging at least one frame behind), low latency, and excellent power usage.
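To make that concrete: on Windows, the natural way to express a layer tree is DirectComposition. Here's a hedged sketch of a scrolling layer with a compositor-side animation — illustrative only, not the actual xi-win code; it assumes an existing HWND, IDXGIDevice, and pre-rendered content surface:

```cpp
// Illustrative sketch: hand the compositor a small visual tree plus a ~100ms
// scroll animation, instead of pushing a freshly rendered frame every vsync.
// Assumes hwnd, dxgiDevice and contentSurface already exist (placeholders).
#include <dcomp.h>
#include <dxgi.h>
#include <wrl/client.h>
#pragma comment(lib, "dcomp.lib")
using Microsoft::WRL::ComPtr;

HRESULT SetUpScrollingLayer(HWND hwnd, IDXGIDevice* dxgiDevice,
                            IUnknown* contentSurface, float targetScrollY)
{
    ComPtr<IDCompositionDevice> device;
    HRESULT hr = DCompositionCreateDevice(dxgiDevice, IID_PPV_ARGS(&device));
    if (FAILED(hr)) return hr;

    ComPtr<IDCompositionTarget> target;
    hr = device->CreateTargetForHwnd(hwnd, TRUE, &target);
    if (FAILED(hr)) return hr;

    // Root visual plus one child layer holding the pre-rendered document.
    ComPtr<IDCompositionVisual> root, contentLayer;
    device->CreateVisual(&root);
    device->CreateVisual(&contentLayer);
    contentLayer->SetContent(contentSurface);
    root->AddVisual(contentLayer.Get(), TRUE, nullptr);
    target->SetRoot(root.Get());

    // Describe the scroll over the next ~100ms; the compositor interpolates
    // every refresh, so the app doesn't have to wake up per frame.
    ComPtr<IDCompositionAnimation> scroll;
    device->CreateAnimation(&scroll);
    scroll->AddCubic(0.0, 0.0f, targetScrollY / 0.1f, 0.0f, 0.0f); // linear ramp
    scroll->End(0.1, targetScrollY);
    contentLayer->SetOffsetY(scroll.Get());

    return device->Commit();   // one small delta sent to the compositor
}
```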
Further, Microsoft engineers have said that hardware overlays are coming, in which the graphics card pulls content from the windows as it's sending the video out the port, rather than taking an extra frame time to paint the windows onto a composited surface. These already exist on mobile, where power is important, but making it work for the desktop is challenging. When that happens, you get your frame back.
So I think the answer is to move forward, embrace the compositor, and solve the engineering challenges, rather than move backwards, even though adding the compositor did regress latency metrics.
The issue with sending a tree of layers is that this prevents some important optimizations: avoiding overdraw with early Z becomes impossible, because your app doesn't know anything about the positions of the scrollable layers and so has to be conservative and paint all of their contents. So you trade a frame of latency for lots of overdraw, which is a tradeoff I'm not really comfortable making. (Note that today, almost all apps overdraw like crazy anyway, but that should be fixed.) :)
> Further, Microsoft engineers have said that hardware overlays are coming, in which the graphics card pulls content from the windows as it's sending the video out the port, rather than taking an extra frame time to paint the windows onto a composited surface.
This seems like the best solution to me. It allows apps to paint content intelligently to prevent overdraw while avoiding any latency.
It might be lots of overdraw in the general web case, but for a text editor almost everything the app renders is going to be shown on the screen. The exception is a bit of stuff just outside the scroll viewport, so it can respond instantly to scroll requests. This feels like a good tradeoff to me.
> So you trade a frame of latency for lots of overdraw, which is a tradeoff I'm not really comfortable making.
I would jump at the chance to make that trade. My 8 year old CPU is rarely taxed by normal usage. What's the point in having a faster computer if it feels slower?
Overdraw doesn't just use some CPU. If that were all, it'd be an easy problem to solve. It causes extra data to be transferred over various buses, extra stops in a pipeline of operations that is already cramped trying to provide a frame every few ms, and probably other problems that I don't know about. This problem isn't as easy as 'throw some more CPU at it'.
I think it's only that complex and taxing for GUI frameworks that mix CPU and GPU rendering, like Win32 GDI or GTK+.
For modern GUI frameworks that were designed for the GPU, like MS WPF/XAML or Mozilla WebRender, overdraw just uses some GPU resources, but that's it. GPUs are designed for much more complex scenes. In-game stuff like particles, lights, volumetric effects, and foliage involves tons of overdraw. In comparison, a 2D GUI with overdraw is a very light workload for the hardware.
Overdraw actually matters quite a bit for integrated GPUs, especially on HiDPI. On high-end NVIDIA or AMD GPUs, sure, 2D overdraw doesn't matter too much. But when power is a concern, you don't want to be running on the discrete GPU, so it's worth optimizing overdraw.
That seems like a dubious claim to me. In my experience maxing out memory or pcie bandwidth is rarely a bottleneck, and extremely unlikely / impossible with low CPU usage.
Do you have a link or more detail about 'already cramped pipelines'?
Depends heavily on the GPU and the scene being rendered. On Android I've seen low-end GPUs bottlenecked by the shader ALU of all things, even though the 2D renderer only produces trivial shaders. Turns out it's easier to whack off shader compute cores than it is to muck with the memory bus in some cases.
Most common situation is the GPU isn't taxed at all, though, and doesn't even leave idle clocks. We end up just being limited by GL draw call overhead.
You seem to be implying that rendering of a text editor has to be all 2D and that rendering a frame pushes the memory bandwidth to its limits, both of which can't possibly be true. Why can games run at 144Hz and above, but a text editor can't afford to overdraw for decreased latency?
> Instead of sending a frame to the system, send a tree of layers.
Which API does one use to do this?
I'm slightly surprised that hardware overlays aren't already a feature, especially given that they're handled by the graphics card. I know there's a special API for video overlay, especially DRM video (where part of the requirement is that the system doesn't allow it near the compositor where it could be screenshotted). Can you do video "genlock" on Windows? (edit: https://msdn.microsoft.com/en-us/library/windows/desktop/dd7... )
I'm also wondering how VR/AR stuff handles this on Windows.
It's possible some hardware already does overlays, given that the talk I linked above was 2 years ago. I haven't researched this carefully.
There's a bunch of stuff in the interface to support video and also integrated 3D content ("create swapchain for composition"), but I don't know how well it works. In my experiments and reading through the Chromium bug tracker, Microsoft's implementation of all this stuff is far from perfect, and it's hard, for example, to completely avoid artifacts on resizing.
> It's possible some hardware already does overlays, given that the talk I linked above was 2 years ago. I haven't researched this carefully.
All hardware does some overlays, though not a large number. (You can see Intel GPU overlay setup at [1].) One scanout overlay is already in use on all major OS's, to draw the mouse cursor.
Typically hardware will have the base layer, a cursor layer, and possibly a video overlay. But in many cases, the video "overlay" is actually emulated by the driver using a shader to do the colorspace conversion and the texture hardware to do scaling.
People generally aren't composing other effects on top of a video overlay. Making it work in the general case requires hardware that can do all the things the compositing engine is doing in software.
The usecase mentioned in Jesse's talk (linked in my root comment) is displaying a notification from some other app while playing a game. They added the "flip_discard" swapchain effect so that the OS can paint the notification on top of the game content before flipping it in hardware. This is something of a hack; I think you're right that the endgame for this is that the hardware can indeed implement the full compositing stack. I'm not sure how far away we are from this.
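For reference, opting into the flip model from the app side looks roughly like this (a hedged DXGI sketch; the D3D11 device, DXGI factory, and window are assumed to already exist):

```cpp
// Illustrative sketch: create a swapchain using the flip model so DWM can flip
// the app's buffer instead of copying it. Device/factory/hwnd setup assumed.
#include <d3d11.h>
#include <dxgi1_4.h>

HRESULT CreateFlipDiscardSwapChain(IDXGIFactory2* factory, ID3D11Device* device,
                                   HWND hwnd, IDXGISwapChain1** outSwapChain)
{
    DXGI_SWAP_CHAIN_DESC1 desc = {};
    desc.Width = 0;                                   // 0 = use the window size
    desc.Height = 0;
    desc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    desc.BufferCount = 2;                             // flip model needs >= 2
    desc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD;  // share buffers with DWM
                                                      // instead of blitting
    return factory->CreateSwapChainForHwnd(device, hwnd, &desc,
                                           nullptr, nullptr, outSwapChain);
}
```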
In a text editor, we don't want to compose a whole frame just because a character was inserted. The situation of drawing directly into the visible frame buffer at any time we like without caring about V-sync is pretty much ideal.
If the user inserts three characters very rapidly, but each draw has to wait 1/60th of a second for a V-sync, that will be visible.
> If the user inserts three characters very rapidly, but each draw has to wait 1/60th of a second for a V-sync, that will be visible.
A new character won't trigger a draw call, though. Rather, the three characters will queue up in an internal buffer, and when the next draw is due, all three will be drawn.
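Something like this, in a simplified frame-driven editor loop (names are illustrative, not any particular editor's code):

```cpp
// Minimal sketch of keystroke coalescing: input is buffered as it arrives and
// drained once per display refresh, so three fast keypresses cost one draw.
#include <string>
#include <vector>

struct Editor {
    std::vector<char> pending;              // keystrokes since the last draw

    void onKeyPress(char c) {               // called whenever input arrives
        pending.push_back(c);               // no draw here, just buffer it
    }

    void onVsync(std::string& document) {   // called once per refresh
        if (pending.empty()) return;
        document.append(pending.begin(), pending.end());
        pending.clear();
        redraw(document);                   // one draw covers all queued chars
    }

    void redraw(const std::string&) { /* paint the visible text */ }
};
```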
That's how it should work. But when you look at the scatter plots, note how there are multiple bands under Windows 10. E.g. for gvim, not only is there an extra overall delay, but extra clusters of additional latency.
How are you testing the mouse cursor following the window borders on resize? I've read that windows turns off the hardware sprite mouse cursor when resizing windows so that it can software render it to always line up properly.
Visual inspection for now. I've got an Arduino and a high-speed camera, so my plan for the next step is to send mouse and keyboard events from the Arduino, blinking an LED at the same time, then capture both the LED and the monitor in the video. Then a bit of image analysis. This is the only way to be quantitative and capture all the sources of latency.
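The Arduino side of that plan is simple; a sketch along these lines (untested, assumes an ATmega32U4 board such as a Leonardo or Micro that can act as a USB keyboard):

```cpp
// Illustrative Arduino sketch: inject a keypress over USB HID and raise an LED
// at the same instant, so a high-speed camera can capture both the LED and the
// resulting screen update in one shot.
#include <Keyboard.h>

const int LED_PIN = 13;

void setup() {
  pinMode(LED_PIN, OUTPUT);
  Keyboard.begin();
}

void loop() {
  digitalWrite(LED_PIN, HIGH);  // mark t=0 for the camera
  Keyboard.press('a');          // inject the keypress over USB HID
  Keyboard.release('a');
  delay(50);                    // keep the LED visible for a few frames
  digitalWrite(LED_PIN, LOW);
  delay(2000);                  // wait before the next sample
}
```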
My reason for asking: I had adjusted the screen brightness using software, but the mouse cursor was still brighter than white anywhere else. When I dragged a window, the cursor turned dim.
Oh. This explains why the cursor flickers briefly and gets drawn partially across multiple monitors when at the split instead of the normal mouse cursor which is drawn only on one monitor at a given time.
This is also why picking a good monitor is important for software development.
Some monitors have tons of input lag (60-70ms), and that's how long it takes for what you're typing to reach the display. The same lag applies to seeing your mouse cursor move.
I had a hard time believing that quote at the time. John ended up answering the question I asked about it on SuperUser [1] with details about his setup. His writeup is impressive: he actually set up a high-speed camera to record the actual latency.
I used to work for Philips' TV branch; they used a similar setup. Instead of a camera they mounted a light sensor to the display and measured end-to-end delay for various color transitions (white-to-black, black-to-white, ...). All other manufacturers and many serious reviewers perform similar tests.
Of course, these tests are always performed with all built-in image processing disabled, those can easily add 2 frames of delay on modern televisions. MPEG deblocking and noise removal are performed by most TV brands to improve image quality at a cost of latency.
Right, the issue is that 60Hz is ridiculously slow. Ethernet hardware is multiple orders of magnitude faster. It's just because of broadcast TV and CRT legacy. G-Sync / FreeSync allow up to 240Hz, which is ~4ms latency.
60Hz is 16.7ms between each picture. However, that only gives you a lower bound on the latency. The actual time from GPU sending picture, to picture being on screen, can be a lot more. See the John Carmack answer above.
So it's the total response time from key press on a game controller to screen that matters, not individual components. Sure, 0.04 seconds on its own is not a big deal, but when five or six things each take 0.04 seconds you hit noticeable delays.
This reminds me of the argument that instead of going to USB-C 3.1 as the grand unifying connector for all peripherals (including video), we should have migrated to Ethernet cables for everything.
The 40 Gbps and 100 Gbps Ethernet standards were defined in 2010 [1]. Google "buy 100 Gbps Ethernet" and you will find it is real, although it is currently geared primarily for datacenters.
Actually not a bad idea. Gigabit Ethernet already gives you one gigabit per second.
And there are high throughput devices like LIDARs (eg Velodyne LiDAR known from Google/Waymo self-driving cars) that are connected via Ethernet rather than USB.
Having had a 1 Gbit Ethernet LAN since 2004, I would prefer the option to upgrade to 10 Gbit home Ethernet in the near future.
It's a terrible idea. USB 3 is faster than 1 Gbit/s, and there's no way you can drive a 5K monitor with any current or near-future Ethernet standard; the bandwidth demands are too punishing.
Ethernet isn't optimized around short-haul signals, like computer to screen over a few metres at most, it's for 100m+ runs in datacentres. The signals have to be a lot more durable.
> there's no way you can drive a 5K monitor with any current or near future Ethernet standard, the bandwidth demands are too punishing.
Really? You need more than 100 Gbit for a 5K monitor? Because there is a 40/100 Gbit Ethernet PHY standard right now (has been for a while, since 2011), and work is being done on up to 200 Gbit (due to be ratified in late 2018).
Careful, multi-mode fiber is near its scaling limits. 10GBASE-SR and 25GBASE-SR exist, but that's about it. (By contrast, 40GBASE-LR and 100GBASE-LR work over single-mode fiber, and DWDM + coherent optics can push the medium much further.) Wouldn't want to be stuck at 25 gbps forever ;-)
It's amazing how much you can bend fiber. Fiber is really hard to break just by bending it, assuming it is well insulated. You can have performance loss with bends and kinks though.
The little plastic bits always snap off. Ethernet is great for semipermanent wires but not very good for peripherals like portable hard drives that are constantly disconnected and reconnected.
Even with the protective guard the little plastic tabs inevitably break on me. I can't even say those have fared any better than the unprotected versions.
My main complaint is that the protective rubber often gets stiff as it ages, which can make it pretty dang hard to remove old cables (until you resort to pliers).
Yeah, live hinges (that is, plastic bits that intentionally bend, not rotate on pins) are always going to break eventually. They're just so easy to make that it's a trade off that doesn't fall in the consumer's favor.
Living hinges can have a lifespan in the millions of cycles with the correct materials and design. I'm sure it's possible to make an Ethernet connector that doesn't break easily, but it would cost more, and people aren't willing to pay.
I plug/unplug from my display several times daily, since the MBP connected to it travels with me. Granted, the end of the cable in the monitor typically doesn't move¹
¹well, one of them. The crap LG monitors we have constantly forget about the peripherals that are attached, and one easy way to fix this is to detach/reattach the peripheral. I suspect that the company is never going to RMA these devices, sadly.
The obvious ones: Ethernet cables are big (imagine one on a phone), and aren't designed to be plugged and unplugged the massive number of times that USB cables are (spring contacts wear out, little clips break).
Indeed. I would rather take an excellent monitor, keyboard and mouse over a faster machine without them. Sadly this means I am pretty much limited to desktops, as finding a laptop with a colour-accurate 120Hz+ display and an excellent keyboard seems to be impossible unless I look at some gaming laptop monstrosity that weighs 5 kg and has a 1 hour battery life. If anyone reading this knows of a good laptop for development, please let me know!
I can't offer you 120Hz, but the ThinkPad P51 with 4K screen has excellent colour gamut and accuracy and the best laptop keyboard in the business (with the possible exception of the old-style ThinkPad keyboard currently available only in the limited-edition "ThinkPad 25"). Battery life is pretty good.
Downsides: relatively bulky, pricey, no touchscreen or stylus input. (Upside of that: a matte rather than glossy screen, which I think is difficult to combine with a touchscreen.)
It's true, but there's also a 4,000 word detailed blog post that describes how I came to pick that monitor along with giving you insights and tips on how to pick a different monitor if that one isn't for you.
It's good to know that the person posting the link has a financial stake in it. Depending on the context this can mean a lot - like a review or recommendation.
Except it's Amazon. I could understand that position if it was a Dell affiliate link. But Amazon sells so many brands of monitors, any of which could carry the same affiliate code, that I don't see how that matters.
The only thing I don't quite like is the pixel size; it's a bit small for native scaling. I kind of wish I had gone for the 27". I went with two 25" because two 27" were a bit too big for me.
P2715Q user here. One of the best monitors currently available: excellent color reproduction, top-of-the-line quality. The 4K resolution, however, is a major problem when you are using Windows 7. Some applications will not scale correctly, and in the end you get a mess. Windows 10 is better, but not perfect.
Have the same monitor, it's my third Dell monitor since I bought my 2405fpw back in 2005. I've always gotten excellent performance from them and the 2405 lasted 12 years.
The 4k resolution and scaling can cause issues with some apps as you noted. This is especially true if you use multiple panels with heterogeneous resolutions, connect with DisplayPort and occasionally turn one monitor off. This causes Windows to move everything to the other display, which is a mess. However, after futzing with it for a few days post-purchase I had it all working well enough to deal with, even though I occasionally get slack in an over-maxed window on the smaller display. Btw, this program is really useful for smoothing mouse movement between the larger and smaller displays in Windows 10: https://github.com/mgth/LittleBigMouse
I personally like the higher resolution that scaling provides, but if you don't mind low DPI you should check out the 42.5" 4k panel that Dell makes[0]. I have one in my home office, and while it's not great for design and photo editing, the sheer real-estate available is mind blowing.
Ditto, have a pair of those. They are the nicest I've used in a long time. Haven't noticed the Windows issues, but then I only run a couple applications under win in a VM. No issues with Linux.
They've released the UP2718Q at a serious price difference. I'd like to see one in person, but a 3x price increase requires a lot of improvement over an already great monitor.
I have a LG 27UD68-P (and just ordered a second one) and have found it pretty good for programming. Here in Europe they come with a swivel stand. The only real disadvantage compared to the Dell is they don't have DisplayPort daisy-chaining, but they support HDMI 2.0 so you can use that for 4k60.
With glasses (20/20) I can comfortably read text from up to 3 feet away from the 25" version at 1:1 scaling. Even tiny text like the HN text field box.
If you are looking for such monitors, look for gaming monitors with a TN panel and a 2ms or 4ms response time. Compared to office monitors or even TVs there is a big difference. (Mind that TN panels have pros and cons.)
Which is why it's hilarious that whenever I ask around for recommendations, especially on Reddit, everyone tries to peg it as a non-issue. As if your head never moves and you only have one monitor.
I will gladly believe this is a real problem, but this page does not demonstrate that (at least not convincingly); the metric used is simply too poor.
To quote:
> I used my own hacky measurement program written in C++ that sends a keypress to the application to be benchmarked. Then it waits until the character appears on screen. Virtual keypresses were sent with WinAPIs SendInput and pixels copied off screen with BitBlt.
So, this is measuring some rather artificial and fairly uninteresting time.
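Concretely, the quoted methodology amounts to roughly this (a hedged reconstruction, not the author's actual program; the probe coordinates and the "pixel changed" test are placeholders):

```cpp
// Rough reconstruction: inject a virtual keypress with SendInput, then poll the
// screen with BitBlt until a probe pixel changes. A real tool would add a
// timeout and repeat the measurement many times.
#include <windows.h>
#include <cstdio>

int main() {
    const int probeX = 100, probeY = 100;        // assumed caret position
    HDC screen = GetDC(nullptr);
    HDC mem = CreateCompatibleDC(screen);
    HBITMAP bmp = CreateCompatibleBitmap(screen, 1, 1);
    SelectObject(mem, bmp);

    BitBlt(mem, 0, 0, 1, 1, screen, probeX, probeY, SRCCOPY);
    COLORREF before = GetPixel(mem, 0, 0);

    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);

    // Send a virtual 'A' key down + key up to the focused application.
    INPUT in[2] = {};
    in[0].type = INPUT_KEYBOARD; in[0].ki.wVk = 'A';
    in[1].type = INPUT_KEYBOARD; in[1].ki.wVk = 'A';
    in[1].ki.dwFlags = KEYEVENTF_KEYUP;
    QueryPerformanceCounter(&t0);
    SendInput(2, in, sizeof(INPUT));

    // Poll until the probe pixel changes, i.e. the character became visible.
    for (;;) {
        BitBlt(mem, 0, 0, 1, 1, screen, probeX, probeY, SRCCOPY);
        if (GetPixel(mem, 0, 0) != before) break;
    }
    QueryPerformanceCounter(&t1);
    printf("latency: %.2f ms\n",
           1000.0 * (t1.QuadPart - t0.QuadPart) / freq.QuadPart);

    DeleteObject(bmp); DeleteDC(mem); ReleaseDC(nullptr, screen);
    return 0;
}
```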
You really do need to measure the complete stack here, especially if your theory is that issues like vsync are at stake, because the level at which vsync happens can vary, and because there are interactions here that may matter. E.g. if there are 100 reasons to wait for vsync, and you remove one of them... you're still going to wait for vsync. It's not 100% clear that this measurement actually corresponds to anything real. Also, note that a compositor need not necessarily do anything on the CPU, so by trying to read back the composited image, you may inadvertently be triggering some kind of unnecessary (or OS-dependent) sync. E.g. it's conceivable that regardless of tearing on screen you want the read-back to work "transactionally", so you might imagine that a read requires additional synchronization that mere rendering might not.
And of course: all this is really complicated, and there are many moving parts we're bound to overlook. It's just common sense to try to measure something that matters, to avoid whole classes of systemic error.
Ideally, you'd measure from real keypress up to visible light; but at the very least you'd want to measure from some software signal such as SendInput up to an HDMI output (...and using the same hardware and software that is as similar as possible), because at the very least that captures the whole output stack, which he's interested in.
Another advantage of a whole-stack measurement is that it puts things into perspective: say the additional latency is 8ms; then it's probably relevant to know at least roughly how large the latency overall is.
The time measured is artificial but not necessarily uninteresting. You can't measure the true end to end latency this way, but you can compare different user applications to see how much latency they add. It's not perfect but it is the best available way to measure latency of arbitrary applications without extra hardware.
I did plenty of testing and found that the measurements obtained this way do correlate with true hardware based latency measurement.
All that said, it is a travesty that modern OSes and hardware platforms do not provide the appropriate APIs to measure latency accurately. A lot of what is known about latency at low levels of the stack is thrown out before you get to the APIs available to applications.
I'm sure they correlate: but the thing with correlation is that you can have correlation even when there are whole classes of situations where the relationship doesn't hold.
On the same OS + drivers I'd be willing to believe that this measure is almost certainly useful (even there, it's not 100%). But it's exactly the kind of thing where a different way of, say... compositing... might cause the implementation to work a little differently, such that you're comparing apples to oranges. If BitBlt simply gets access a little earlier or later in the same pipeline, then hey presto: you've got a difference in software that is meaningless in reality.
> Ideally, you'd measure from real keypress up to visible light;
This, I think. When measuring latencies for e.g. reaction time experiments for neurophysiology, I'd always use a photodiode or similar to figure out when things actually get on the screen, as that's the only thing which matters. IIRC this was always with V-Sync on, but even with custom DirectX applications it was impossible to get lower than 2 frames of latency on Windows 7. So it has always been a bit of a mystery to me how these articles talk about keyboard-to-screen latency of < 1 actual frame. Maybe this explains it, i.e. they're not measuring actual latency from keyboard to screen? I just don't know enough about graphics, but even with V-Sync off I'm not sure you can just tell the GPU 'hey, here's a pixel, now put it on screen right now'?
> Don’t you find it a bit funny that Windows 95 is actually snappier than Windows 10? It’s really a shame that response times in modern computers are visibly worse than those in twenty years ago.
My Amiga A1200 (with only extra RAM added) feels faster and more responsive than any modern computer with a Windows or GNU/Linux desktop.
Layers of abstraction take you further away from the metal. The more layers of abstraction your keypress must traverse before rendering is complete and the photons have reached your retina, the longer it will be until that happens.
Layers of abstraction make complex tasks more reachable by a larger number of programmers by reducing the amount of specialist knowledge about those lower layers required to do the job. The more layers of abstraction you have, the less they have to worry about lower levels and can just get on with what they want to do.
> Layers of abstraction make complex tasks more reachable by a larger number of programmers by reducing the amount of specialist knowledge about those lower layers required to do the job. The more layers of abstraction you have, the less they have to worry about lower levels and can just get on with what they want to do.
Sometimes this is certainly true, but I think when it comes to abstraction layers, people tend to overestimate the benefits and underestimate the costs. Lately I've been thinking a lot about Joel's Law of Leaky Abstractions.
20 years ago we had native UI toolkits that were fast and responsive, relatively simple to program against, and yielded a common design language that was shared across all the apps on a given OS. Now we have monstrosities like Electron that are bulky and slow, yield non-native UIs across the board, and require programmers to understand a whole mess of technologies and frameworks to use effectively.
I mean, sure, now you don't have to rewrite your web code to build a desktop app, but don't get me started on the utter quagmire that is modern web development. These days it feels like software development has an infection, and instead of carving out the infection and letting it heal, we just keep piling more and more bandaids on top of it.
Absolutely, the amount of technologies often required for full stack web development is astounding. Even for a smallish website, you can be quickly staring at 30+ essential pieces of technology you need to at least understand somewhat. Several languages, package managers, IDEs, frameworks, a database, debuggers, compilers, project management tools, virtual machines, server software, your OS, supporting protocols (SSL, TCP/IP, SSH, DNS etc.) and the list goes on. Who can truly understand all of that simultaneously? All of that bloat just massively increases the chance of errors and inefficiencies.
Numerous games, including those with complex graphics and behavior, can render 120+ frames per second with realtime interactions (physics, optics, reactions) on pretty average hardware. I don't think that game scripters who make final things like scenery or UI face complexity much harder than that of gtk/qt/wpf/htmljs widget programming. Details would be interesting though, since I'm no game developer.
If true, mustn't there be something very wrong with traditional UI systems?
Edit: my apologies, I didn’t read the article first and thought it measured complete feedback like “press ctrl-f and wait for element to popup”, but I’m interested in my question regarding games anyway.
There isn't much of a difference between a GUI toolkit you'd find in a desktop application and the GUI framework you'd see in a game - the most likely difference will be that the game GUI will be redrawn every frame whereas the desktop GUI won't (and there are game GUI frameworks that cache their output to avoid redrawing the entire widget tree every frame).
The difference when it comes to why games can be snappier is that games are "allowed" to bypass most of the layers and cruft that exists between the user and the hardware, including the compositor that the linked article is talking about (in Windows at least).
Fortunately in Linux with Xorg you can get stuff on screen as fast as the code can ask for it, as long as you are not using a compositor (so you can even play games in a window with no additional lag!).
Hopefully the Wayland mania won't get to kill Xorg, since yet another issue Wayland has and X11 doesn't is that with Wayland you are forced into a compositing model.
The funny thing is that there is no technical reason at all for compositing to have worse latency, even for games.
Think about the actual operations that are involved. You certainly never want to render directly into the front-buffer (you'd end up scanning out partially rendered scenes). So you render to a back buffer. Which you then blit to the front buffer (assuming you're in windowed mode; in full-screen mode the compositor goes out of the way anyway and lets you just flip).
The only difference between the various modes of operation is who does that final blit. In plain X, it's the X server. In Wayland, it's the Wayland compositor. In X with a compositor, it's the compositor.
Now granted, some compositors might be silly and re-composite the whole screen each frame, but clearly that can be avoided in most cases.
Depending on the setup, there can also be some issues with the scheduling of the game's jobs vs. the compositor's jobs on the GPU. Valve are working on this currently for VR, since the problem is much more noticeable there -- clearly it can be fixed one way or another (on Radeon GPUs you could do a blit via asynchronous compute if need be, for example), but note that compositing actually doesn't change this issue (since the X server's jobs also need to compete with the game's jobs for scheduling time).
So if compositing has worse latency, it's because nobody has cared enough to polish the critical paths. Conversely, compositing clearly does have advantages in overall image quality. So why not fix the (entirely fixable) technical problems with compositing?
There is a very good practical reason why the compositor is in no position to fix that even if it could theoretically be possible.
A major source of the compositor latency (or actually, of the increased response time you get with a compositor) is that the "render to back buffer" (i.e. the compositor's texture, in the best case) and the "blit to the front buffer" (which is done by the compositor when drawing the window geometry) do not happen at the same time.
From a technical perspective it is perfectly possible for a compositor to create a tight integration between a program and the compositor itself: simply synchronize the program's updates with the compositor updates. Every time the program says "i'm done drawing" (either via an explicit notification to the compositor, via glXSwapBuffers or whatever), issue a screen update.
The problem however is that the compositor has to take into account multiple windows from multiple programs, so you cannot have a single window dictating the compositor updates. Imagine for example two windows with animations running, one at 100fps and another at 130fps. Depending on which window is active (assuming that the compositor syncs itself with the active window), it would affect the perception of the other window's updates (since what the user will see will be at the rate of the foreground window's update rate).

Moreover, beyond just the perception, it will also affect the other windows' animation loops themselves - if a background window finishes drawing itself and notifies the compositor while the compositor is in the middle of an update, the background window will have to wait until the update is finished - thus having the foreground window indirectly also affect the animation loops of the background windows. This can be avoided through triple buffering, but that introduces an extra frame of latency - at least for background windows.
So to avoid the above problems, what all compositors do is decouple window update notifications from screen updates and instead perform the screen updates at some predefined interval - usually the monitor refresh rate, synchronized to it. However, that creates the increased response time you get with the compositor, being a few milliseconds behind the user's actions; the most common example is window manipulation like resizing and moving windows lagging behind the mouse cursor (which is drawn by the GPU directly, thus bypassing the compositor).
Hence the linked article recommending a 144Hz monitor to avoid this, although this is just a workaround that makes the problem less visible but doesn't really solve it.
This "do not happen at the same time" is true in plain Xorg as well, though, since the final blit to the screen happens in the X server and not in the application.
Your example of 100fps vs. 130fps on the same screen is inherently unsolvable in a proper way with anything less than a 1300fps display. So you have a bunch of tradeoffs, and I'm sorry to say that if the tradeoff you prefer is tearing, you're in the losing minority by far.
That said, if you truly wanted to write a tearing Wayland compositor, you could easily do so, and in any case plain X is still going to work as well.
Without a compositor, when you ask to draw a line, a rectangle, a circle or even a bitmap, it is drawn immediately. Sure, it isn't done in zero time, there is some latency, but that is the case with any graphics system :-).
As for the compositor, it isn't impossible to create a Wayland "compositor" that draws directly on the front buffer either, it is just harder and pointless since Xorg exists :-P.
But yeah, if everyone abandons Xorg (and by everyone I mean Everyone, not just the popular kids) and nobody forks it (which I doubt will happen, as there are a ton of people who dislike Wayland) and nobody else steps up to do something about it, then yeah, I'll most likely just make my own suckless Wayland compositor. I'd prefer the world to stay sane though so I can continue doing other stuff :-P.
The reason that apps on Windows can bypass the compositor is that their buffers can be used as the scanout buffer directly in fullscreen. On Linux (both Xorg and Wayland), this same exact behavior is supported with a compositor. For strange legacy reasons, it's known as "fullscreen unredirection". If you're running windowed on all three OSes, you see the same compositor latency.
Note that on Linux you are not forced to use a compositor; personally I do not use one, and so I do not have any such penalty when running games in windowed mode.
> the most likely difference will be that the game GUI will be redrawn every frame whereas the desktop GUI won't (and there are game GUI frameworks that cache their output to avoid redrawing the entire widget tree every frame).
Modern UI toolkits (WPF, QML, JavaFX) operate on a scene graph, so they work exactly the same. Android is slowly catching up; it's a disgusting mix of the worst of both worlds.
> can render 120+ frames per second and realtime interactions (physics, optics, reactions) on pretty average hardware
Yes, but when you consider that the underlying "average" hardware of 2017 has a million times as many transistors and runs a thousand times as fast as the Amiga that seems less impressive.
It's a lot easier to optimize the APIs underlying those scripts because there are far fewer of them in even the most complicated video games than there are comparable abstractions in modern OSs. And there's more motivation. People accept the slightly lower responsiveness in normal OS interactions whereas even millisecond delays in competitive games are intolerable.
They would have problems rendering text though. I'm working on a text editor and did some research on the fastest way to render text. It's really hard to beat the OS API.
Good point, I also touched pango-level text rendering and can remember how long some layout calculations may take. Do things get better with DirectWrite/2D or is it just a facade to old techniques incompatible with game environments?
Edit: I also like how the OS X go-fullscreen animation is done. They render the new window once (with e.g. lots of text) completely in the background and simply scale the old window to fullscreen with an alpha transition between the two. The first few frames give enough time for the new window to be rendered, and then it magically appears as if being live-resized. I suspect few users actually notice the trick.
Cost/value.
AAA games with "complex graphics" take years to develop, cost millions of dollars to produce, require dozens of developers, are extremely power hungry, and require specialized GPU programming to make look good and render fast.
They are typically judged by how fast/smoothly they perform, so it makes sense to direct resources to this.
This isn't an approach most people want to take for average mobile or desktop GUI apps.
Every realtime videogame, from AAA shooters built by hundreds of developers to tiny one man band indie platformers, is more responsive than the average desktop app. This happens because if the controls don't respond well, the game is automatically bad. Sadly this isn't the case with desktop apps
I was not talking about making a low-level engine; I only theorized that gamemaking is somewhat as hard as modern UI at the top level where you script it and "draw" 3d/2d UI or interaction parts. For a fair comparison on that scale, game engines should correspond to at least font and vector rendering like pango/ft or cairo, or even direct blit ops, not to widget positioning. For one example, it is pretty easy to take unity3d and make a 9pool game - it is just a ~two-hour tutorial on youtube for people with no CG background at all.
Layers of abstraction aren't necessarily a problem if the designers care about latency. Oculus have invested huge resources in minimising latency, but most designers are fairly tolerant of latency if they can trade it off for increased throughput or lower development costs.
We saw this particularly strongly with Android - circa 2011, Google realised that Android latency was having a major impact on UX, so they invested the resources to address it. Unfortunately, early architectural decisions meant that they never quite caught up with Apple, who had prioritised latency in iOS from the outset.
Not necessarily: video output is ultimately limited by the display. If the display runs at, say, 60Hz and both drawing on an off-screen buffer and compositing together take up less than ~16ms, the result should be exactly the same as drawing directly on the front buffer.
The main problem is that modern GPU rendering is “pipelined”, so it's entirely possible to have a drawing operation that takes 16ms and a compositing operation that also takes 16ms, and still have your application running at 60FPS, albeit 1 frame "behind" the input. Most developers are not aware of that. (Including me, until recently. I learned about this while trying to figure out why my VR application felt "wobbly", despite running at the recommended 90FPS) The HTC Vive ships with a really neat tool for visualizing that: http://i.imgur.com/vqp01xn.png
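To make the throughput-vs-latency distinction concrete, a toy calculation (plain arithmetic, no graphics API involved):

```cpp
// Two pipelined stages of 16 ms each still deliver a new frame every 16 ms
// (60 FPS), but each frame reaches the screen 32 ms after its input.
#include <cstdio>

int main() {
    const double draw_ms = 16.0, composite_ms = 16.0;
    for (int frame = 0; frame < 4; ++frame) {
        double input_time = frame * draw_ms;           // input sampled
        double draw_done  = input_time + draw_ms;      // app finishes rendering
        double on_screen  = draw_done + composite_ms;  // compositor flips
        printf("frame %d: input %5.1f ms -> on screen %5.1f ms (latency %4.1f ms)\n",
               frame, input_time, on_screen, on_screen - input_time);
    }
    // Output: one new frame every 16 ms, each one 32 ms behind its input.
    return 0;
}
```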
This assumes you are synchronizing the updates with the monitor's refresh cycle, however if you aren't (and the major reason you see lag in compositors is because they do such synchronizations) then composition is indeed slower since it involves several more moving parts and the need to orchestrate the refresh of multiple windows (as opposed to the instant "i want to draw on the screen now" model that X11/Xorg without a compositor and Windows without DWM use).
Yeah, having to synchronize multiple windows is probably a pain. I guess that's a much smaller issue with a VR application (the OpenVR compositor supports overlays, but they're not used that often, and there's a clear priority to the "main" VR content)
I guess a valid approach would be to double buffer all windows on the compositor side, and render the "stale" buffer of any window that fails to update within a specified frame budget (16ms - expected compositing time), that way at least well behaved apps would have no noticeable latency. There would probably need to be some level of coordination with apps that already do their own double buffering, not sure how that's currently handled. Perhaps a hybrid approach between compositing and direct rendering is also possible, where different screen regions get composited at different frame rates. (Should work as long as there's no transparency involved)
Compositors already do that, you render into a compositor managed texture and the compositor simply uses whatever is there so applications can update at their own leisure.
... and when you give people direct access to the front buffer, they write code that tears or generally scans out incomplete renders and users end up blaming the operating system.
Compositing is a good thing, and in the vast majority of cases its latency isn't actually intrinsically higher than writing directly to the front buffer. Certainly its intrinsic latency is never higher than writing directly to the front buffer if you build a system without visual artifacts. (Because at the end of the day, all compositing does is shift around who does the job of putting things on the front buffer; the jobs themselves stay the same for all practical purposes.)
But I want the tearing, or at least I prefer it to the latency that compositors impose! This is why compositors must not be forced, and should instead be a user option. I do not see why I have to suffer a subpar computing experience because of some clueless users.
I even force vsync off system-wide where possible (that is, in Windows; in Linux I haven't seen such an option, and even in Windows DWM ignores the setting).
Except that the Amiga didn't have any memory protection, something that's not really viable when connected to the Internet.
A more interesting comparison is BeOS: it had memory protection and it was (probably still is) way more responsive than Linux and Windows.
That said, with an SSD a computer's responsiveness is good enough, and if display latency bothers you, buy a monitor with a high refresh rate!
You'll have a fix much sooner than if you wait for a software fix of the issue.
I recently measured it with my phone's camera in slow motion mode. The system is an AMD Ryzen 1800X with an AMD R9 280X GPU, KDE Plasma with the KWin window manager in compositing mode. Key press to screen output latency was ~33 milliseconds (90 fps recording, so increments of 11 ms) in KWrite.
The computer feels plenty responsive with that latency, and I hate latency...
It is a full stack real world result - for comparison purposes it makes sense to measure only the software as in the article, but in reality you want to optimize everything. Especially screens can be quite bad - tens of milliseconds, up to 100 in the worst. USB lag is usually quite low - when I measured it once for low-latency serial comm it was usually < 2 ms.
33 ms is two frames, if your monitor is at 60 Hz. If you tried vscode or other electron app, it might be 49 ms (3 frames). These are the numbers I'm getting from 1900X with Nvidia 1080 GPU, Gnome3, 4k@60hz, but without measuring latency of the keyboard itself.
Modern keyboards are another part of the problem. They can also take their sweet time since keypress until packet appears at the USB bus. See https://danluu.com/keyboard-latency/
The problem is often in the keyboard controller, not in the interface. Apple managed to make the fastest keyboard with only 15ms lag; others may be an order of magnitude slower.
It's a common myth that debouncing needs to meaningfully affect latency. It does not. It will affect the maximum repeat rate, but you can pretty much report an event the moment you see an edge.
Replace with _any_ keyboard manufacturer. Similarly, while your keyboard may advertise USB 2 or even USB 3, the actual key-press USB interface is always running at USB 1 low speed (1.5 Mbit).
I spent a fair amount of time trying to find a keyboard to work on a device that I have that only works with high-speed devices; 30 or 40 keyboards later I gave up... If someone actually knows of such a thing I would be interested. Same basic problem with mice. I guess the thought process is: hey, USB 2 supports split transactions, and the keyboard/mouse won't actually generate even 1.5 Mbit of data, so we are going to continue to sell the same basic mouse/keyboard interfaces we were selling 20 years ago, wrapped in styling variations.
PS: Some of the physical button keyboards I found with configurable colors/etc, usb hubs, do support USB3... For the color controls, or hub. The keyboard endpoint is still at low speed...
Likely because they want to implement the minimum necessary HID spec (or, in a nicer tone, the HID spec with the most compatibility), which would be the one supported by the BIOS.
I don't understand why someone hasn't come out with a dedicated keyboard chip yet. If it's cheap enough, you don't need to run all the keys to a single chip; you could have multiple chips that all talk over a serial bus to one that is designated the master.
true, but adding another link (microcontroller) in the chain is going to add delay anyway. I think the original suggestion was pointless: instead of adding more microcontrollers you can just replace the main uC with the one that has enough pins. The reason this is not done is uCs with >100 pins are usually more powerful and expensive, so you can't just pay for more pins -- you also have to pay for more processing power and features you don't need.
Debouncing shouldn't add lag. On the first closure detection you can send the key-down code. You then need some debounce logic to decide when to send the key-up code, but after the key is solidly down you are again in a state to send the key-up code as soon as the release begins.
The only time there should be lag is when a very short keypress happens; the key-up might be delayed while the controller rules out bounce.
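A hedged sketch of that policy, in illustrative firmware-style code (not any particular keyboard's controller; the 5 ms bounce window is an assumption):

```cpp
// Debouncing that adds no latency to the press edge: key-down is reported on
// the very first closure; only a very short tap delays the key-up report.
#include <cstdint>

struct Key {
    bool reportedDown = false;
    uint32_t pressedAtMs = 0;
    uint32_t releasedAtMs = 0;
};

const uint32_t DEBOUNCE_MS = 5;   // assumed bounce window

// Call at the scan rate with the raw switch reading and a millisecond clock.
// Returns +1 to emit key-down, -1 to emit key-up, 0 for no event.
int debounce(Key& k, bool rawClosed, uint32_t nowMs) {
    if (rawClosed && !k.reportedDown) {
        // Closures right after a release are release bounce, not a new press.
        if (nowMs - k.releasedAtMs < DEBOUNCE_MS) return 0;
        // Otherwise report key-down on the first closure: zero added latency.
        k.reportedDown = true;
        k.pressedAtMs = nowMs;
        return +1;
    }
    if (!rawClosed && k.reportedDown) {
        // A key that has been solidly down longer than the bounce window can't
        // be press bounce, so report the key-up immediately too. Only a very
        // short tap has to wait the window out.
        if (nowMs - k.pressedAtMs >= DEBOUNCE_MS) {
            k.reportedDown = false;
            k.releasedAtMs = nowMs;
            return -1;
        }
    }
    return 0;
}
```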
I'm getting min 18.5-avg 24.3-max 35.2 ms in vscode on 2015 rMBP. However, the test won't finish and Typometer complains ("Previously undetected block cursor found"). In Emacs, it won't run at all.
Yeah, that seems to be a weak point of some visual measurements, especially when laptop and other scissor keyboards score better than older switch-based ones.
I don't - just filmed keyboard and screen together; key down is easy to see. I have a quality keyboard (Fujitsu KBPC PX eco) connected by PS/2, and as stated above I'd expect little extra latency from USB. As measured, anyway, there is no space for significant keyboard lag in the result.
The reason why I measured latency was that I seemed to notice a change after changing GPU driver kernel options. End to end was easiest to measure. The result was close to the theoretical minimum so I stopped there.
A recent iphone, the samsung s8 and the pixel phones can record at 240fps, which gives you much better precision if you have access to that kind of phone.
Someone recently gave me an old PowerBook G3, running Mac OS 8.6. I was amazed by how responsive the UI is compared to today's UIs, from Mac to Windows to iOS to Android. When I clicked something, it felt like there was a pushrod between the mouse button and the menu, which triggered it instantly.
Did you use OS X prior to 10.2? I guarantee you it was slower and worse in every way. Especially compared to classic. There is a reason they continued installing OS 9 side-by-side before 10.2.
> Virtual keypresses were sent with WinAPIs SendInput and pixels copied off screen with BitBlt.
This methodology alone could account for the differences in timing between Win7 and Win10. For all we know, Win10 could just be slower at getting the pixels back to the program from BitBlt, or SendInput could be slower triggering events, or a multitude of other issues.
The best way to truly detect key-to-photon latency is with an external video recorder that has both the screen and keyboard in frame. Grant a few ms of noise for key travel distance.
I'd be curious to see message traces and UMAPC (user mode asynchronous procedure calls) traces of this between the two. My hypothesis is that Win10 does quite a bit more in UMAPCs than win7 does in the interests of keeping the system 'responsive' even at the costs of latency. For those not aware UMAPCs only run when a thread is available in an 'alertable' state (see MSDN as that's not exactly simple to explain https://msdn.microsoft.com/en-us/library/windows/desktop/ms6... ), as such they tend to wait for input or other runtime idle points unless the application makes very heavy use of windows built in asynchronous methods and alertable waits.
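For anyone unfamiliar with user-mode APCs, a tiny stand-alone illustration of the "alertable" requirement (this just shows the general mechanism; it's not a claim about what Win10 actually queues):

```cpp
// The APC queued to a thread only runs once that thread enters an alertable
// wait, which is why delivery latency depends on when the target thread next
// becomes alertable.
#include <windows.h>
#include <cstdio>

VOID CALLBACK MyApc(ULONG_PTR param) {
    printf("APC ran with param %lu\n", (unsigned long)param);
}

DWORD WINAPI Worker(LPVOID) {
    printf("worker: doing non-alertable work for 1s...\n");
    Sleep(1000);              // an APC queued during this does NOT run yet
    printf("worker: entering alertable wait\n");
    SleepEx(INFINITE, TRUE);  // alertable: queued APCs run now
    return 0;
}

int main() {
    HANDLE t = CreateThread(nullptr, 0, Worker, nullptr, 0, nullptr);
    Sleep(100);                    // let the worker start its busy phase
    QueueUserAPC(MyApc, t, 42);    // delivery waits for the alertable state
    WaitForSingleObject(t, 5000);
    CloseHandle(t);
    return 0;
}
```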
I would also be curious to compare a D2D application versus a GDI application; as the majority of the work has gone to D2D in the last few years. Please note that D2D application in this case means one using a swap chain and device not an ID2DHWNDRenderTarget (this rasterizes and composites on the GPU but has GDI compatibility built in).
BitBlt is done in DMA RAM by the CPU, so it may be even worse than just a copy, as there is likely a wait involved too to prevent shared access. Using DMA RAM prohibits the GPU/driver from doing optimizations on that RAM that it could do if the buffer were in dedicated GPU RAM. This is why DX12 resources are generally always copied into non-shared buffers.
Yes, but you lose access to any user interface components you don't paint yourself, and you can eat some lag/flickering when switching out of the app as control is returned to the compositor. If you end up needing to show the Open File picker from the OS or pop up the Print dialog you'll need to exit exclusive mode.
Me too. It's just so much quicker to alt-tab to my browser or Telegram or iTunes, even on my beefy rig it takes multiple seconds for my desktop to take control again.
In my experience the vast majority of games that I play don't suffer any perceptible ill effect from running in borderless fullscreen. If you're playing the game competitively it's another story, but for me running around shooting aliens in Destiny 2 or something I haven't noticed any degradation of my experience. There's of course the odd (usually older) game that doesn't support borderless fullscreen, but sometimes there are mods to support it.
It should be possible to reduce the latency quite a bit if a Wayland compositor had that goal in mind. It's something we're working on for Sway. Sometimes you have to choose between (1) rendering correctly and (2) responding immediately to user feedback. When resizing windows, for example, we can scale the old buffer up (stretching it) while we wait for a new buffer from the client, or we can wait to give you that feedback until the client has prepared a new buffer at the right size.
I would like to see end-to-end measurements (=high speed camera footage analysis) before making final conclusions. Not saying that compositing doesn't add latency, but I feel like the system is so complex that this sort of userspace software measurement might not tell the whole story
My work laptop (the only Windows computer I use) runs Windows 7, and I intend to keep it that way as long as Windows 7 still gets updates. This article just confirms my bias, and I freely admit I am biased. I do not like Windows very much to begin with, but as far as Windows goes, I think Windows 7 ____ing nailed it (for people without touchscreens, anyway).
On a related note, I have noticed that Outlook 2013 exhibits a notable lag between a keystroke and a character appearing in the message window. I have not done any measurements, but my best guess is that it is in the order of hundreds of milliseconds. If you type fast (I like to think that I do), Outlook can keep up throughput-wise, but this lag is terribly annoying.
> On a related note, I have noticed that Outlook 2013 exhibits a notable lag between a keystroke and a character appearing in the message window.
Try switching to text-only mails, no zoom... and if you must write HTML mails, do not have an image that is larger size than the window. As soon as there is an image that doesn't fit into the window at 100% zoom Outlook begins to crawl.
My work computer is stuck with Office 2007, because I have Office 2007 Professional, and I need Access about once per year for an arcane reason. Office Pro is fairly expensive, so for the time being, I am stuck with 2007. I am still not entirely sure if I should be happy or sad about it. ;-) But I have used Outlook 2013 on coworkers' computers every now and then, and it was pretty laggy.
These days, I do a lot more programming than sysadmin'ning and help desk, but when I was the IT support guy at our company, my overall impression of Office 2013 was not very good. I have seen it just stop working on a handful of computers (out of about 75-80, so that is a lot), in such a way that I could only "fix" it by uninstalling and reinstalling Office from scratch. On one of our CAD workstations, Outlook and Autodesk Inventor started a feud where an update to MS Office caused Inventor to crash, and the subsequent reinstallation of Inventor caused Outlook to crash when we tried to write an email. (Then we reinstalled Office, and suddenly things worked magically, so I remain clueless as to what happened.) The latter may be Autodesk's fault as much as Microsoft's (I get the vague impression that they care even less about their software crashing than Microsoft does, as long as the license is paid for). But the impression I get is that MS Office has suffered quite a bit over the years. Therefore I am not entirely unhappy about being stuck on Office 2007. I do miss OneNote, the one program from their Office suite I really like, but I have org-mode, so I can manage. ;-)
EDIT: Sorry for venting, that one has been building up for a long time.
I always thought I was the only one noticing this.
With compositing enabled, both with DWM and on GNU/Linux, the whole interaction seems to become "soft" instead of the raw feel that is much nicer and snappier. From my experience it also has to do with passing through the stack to the GPU when compositing; running it all from the CPU is what makes it feel snappy.
I've also been researching how to remove the triple-buffered vsync on W10. It seems it was possible in the first builds by replacing some system files, but that option is gone now with the recent big releases.
Given that, I do not see the real reason why compositing would be needed on W10, as transparency etc. aren't important factors.
Makes me think of the "smooth scrolling" option that you can find in most web browsers. I never liked that, and it's the first thing I hunt down after a new install.
That's because using it feels like scrolling through molasses for whatever reason.
> Actually, I don’t know why a compositing window manager should enforce V-Sync anyway? Obviously you get screen tearing without it but the option should still be there for those who want it.
Every additional option (especially in the realm of video settings) opens the door for additional complexity, implementation error, and user error in unintentionally setting the undesired mode. It's perfectly understandable why window managers would settle on one or the other of two extremely different render-to-screen approaches, especially when general consensus for quite some time now in the graphics space has been that minimizing the potential for tearing is preferable.
> your keyboard is already slower than you might expect.
An extract from the linked article:[0]
> A major source of latency is key travel time. It’s not a coincidence that the quickest keyboard measured also has the shortest key travel distance by a large margin.
They're not measuring from when the signal is sent from the keyboard, they're measuring from when the force begins to apply on the key. If you have a clicky or tactile switch (Cherry MX Blues, Greens, Browns, Clears, etc) then the latency measured here will be way disproportionate to how it actually /feels/.
In Windows 7, classic mode disabled the DWM compositor and V-Sync. It's incredibly dumb that Microsoft would arbitrarily remove that feature in Windows 10 to push their ugly as sin post-metro UI.
Your application renders a frame, then the compositor gets it and does its transformations, if any. The composited result is then rendered to the screen, thus adding one frame of latency. That's why the article says one solution would be to get a 144Hz monitor: it would reduce the time between frames (~6.9 ms instead of ~16.7 ms), so an extra frame of latency wouldn't be as bad.
You could potentially reduce this delay as well by having the application and the compositor in communication. Since rendering is going to be synced to vblank, if you can get the application to not try to sync as well and instead just notify the compositor when it is done drawing a frame, you could potentially get the application drawing and the compositor drawing in the same vblank interval. This is what Wayland and DRI3 Present let you do in the Linux world; I assume Windows has something similar, but you'd need to opt in to it, so I bet nothing uses it.
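A hedged sketch of that Wayland mechanism (assumes an already-created wl_surface and a placeholder paint function; error handling omitted):

```cpp
// The client never blocks on vblank itself: it commits a finished buffer and
// asks for a frame callback, which the compositor fires when it wants the
// next frame.
#include <wayland-client.h>
#include <cstdint>

static void frame_done(void *data, struct wl_callback *cb, uint32_t time_ms);
static const struct wl_callback_listener frame_listener = { frame_done };

static void submit_frame(struct wl_surface *surface, struct wl_buffer *buffer) {
    // Tell the compositor "this buffer is done" and what changed.
    wl_surface_attach(surface, buffer, 0, 0);
    wl_surface_damage(surface, 0, 0, INT32_MAX, INT32_MAX);

    // Ask to be notified when the compositor is ready for another frame,
    // instead of waiting on vblank in the client.
    struct wl_callback *cb = wl_surface_frame(surface);
    wl_callback_add_listener(cb, &frame_listener, surface);

    wl_surface_commit(surface);
}

static void frame_done(void *data, struct wl_callback *cb, uint32_t time_ms) {
    wl_callback_destroy(cb);
    struct wl_surface *surface = (struct wl_surface *)data;
    // Render the next frame now; drawing and compositing can land in the same
    // refresh interval because the client never stalls on vblank.
    // paint_into(...); submit_frame(surface, ...);   // placeholders
    (void)surface;
}
```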
That's throughput, though, not latency. I would guess that servo's input latency is significantly more than 2ms, even if their throughput for certain rendering operations is 500 fps.
Depends on the complexity of the html and CSS in question. To reproduce a full desktop environment, even just using Canvas, would be challenging and I highly doubt it would allow 500fps.
Thanks for the kind words :) But that was an artificial benchmarking mode that turned off any synchronization. It couldn't actually show the picture at hundreds of FPS, because the physical hardware can only update at 60.
I've been using sway[0] as my WM for some time now (it's a sort of port of i3 to Wayland), and it's incredible that you can actually tell it is much faster than WMs running on X.
The irony is that the latency is most often not the fault of DWM but of the applications themselves. Since DWM acts as the screen's double buffer, your application needs to be in sync with DWM's frame timing; not being in sync means latency and flickering.
Ah they are using margin:auto to center. I thought it must be an override since most user agents include a default body margin.
Yes I'm on mobile. My OnePlus 5 hides the first one or two pixels under the bezel if you look at it straight on, so the first character on each line gets cut off a little. Not sure if this is just my model or if other phones do this also.
Either way the conclusion is the same: websites should have a minimum margin! I'm sure the author of the website is receptive to this feedback, so I sent an email.
Also, Firefox (and Safari on iOS) should have a "view text-optimised version" button in the URL bar; maybe that would help you here? I don't know if other browsers have it, though.
Is there a website that demonstrates the effects of latency after pressing a key? I know there's examples of different frame rates shown with moving circles, but I don't think that's quite the same.
I mean is there really a noticeable difference between say 20 and 40 ms?
Sorry, so they had to write some code to test what they couldn't perceive but believe they can perceive?
I feel like it's plausible that this is partly a psychological problem?
Perhaps it ends up being multiple vsync waits for a given rendered frame? Something like the application or OpenGL driver waiting for vsync before rendering into its buffer, then the compositor waiting for the next vsync before actually compositing/flipping.
This is a common source of delay in composited apps/games, yes. Ideally you want a completed frame ready for the compositor at least a few milliseconds before the next vertical sync arrives, but it's easy to screw that up, especially if you're getting fancy. Triple buffering also enters the picture here (though mostly for games), because in the bad old days you had exactly two buffers, and if both were in use (one being scanned out to the monitor, the other holding your most recent completed frame) everything had to grind to a halt and wait before rendering or game code could continue. Triple buffering solved this by adding an extra buffer, at the cost of an entire frame's worth of display latency, in exchange for your code spending less time spinning and waiting on the GPU. If someone is careless they could definitely end up with triple buffering enabled for their app (for example, if they're rendering through a media-oriented framework that turns it on).
The 'Fast Sync' option NVIDIA added to their drivers in the last year or two is a fix for the triple buffering problem - you get spare buffers, but instead of adding a frame of latency the GPU always grabs the most recently completed frame for scanout. Of course, if a compositor is involved you now need the compositor to do this, and then for the compositor to utilize this feature when presenting a composited desktop to the GPU. I don't think any modern compositor does this at present.
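To make that cost concrete, here's a toy queueing model of my own (a back-of-the-envelope sketch, not from the article): it only counts how long a just-finished frame can sit before it starts scanning out, and ignores render time, driver batching and the scanout itself.

    #include <stdio.h>

    /* Worst-case wait from "frame finished" to "frame starts scanning out":
     * up to one refresh for the next vblank, plus one full refresh for each
     * completed frame already queued ahead of it. */
    static double worst_wait_ms(int frames_queued_ahead, double refresh_hz)
    {
        return (1 + frames_queued_ahead) * 1000.0 / refresh_hz;
    }

    int main(void)
    {
        printf("double buffered, 60 Hz:        %5.1f ms\n", worst_wait_ms(0, 60.0));
        printf("triple buffered queue, 60 Hz:  %5.1f ms\n", worst_wait_ms(1, 60.0));
        printf("mailbox / Fast Sync, 60 Hz:    %5.1f ms\n", worst_wait_ms(0, 60.0));
        printf("triple buffered queue, 144 Hz: %5.1f ms\n", worst_wait_ms(1, 144.0));
        return 0;
    }

The mailbox row is the whole point of Fast Sync: you keep the spare buffers without ever queueing an extra completed frame, and the 144 Hz row is why a faster monitor shrinks the penalty.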
Smartphones suffer from input latency too, though I’m unsure of the underlying cause (curious how iOS handles window/view drawing). It only seems to be getting worse, though I haven’t done tests on this. While each new model undoubtedly has better tech specs, the interface responsiveness doesn’t seem to improve.
Ghetto latency test: finger-scroll alternately up and down very quickly and see at which frequency your finger and the scroll position are 180° out of phase, i.e. your finger is up while the contents are down or vice versa.
Smartphones seem to be fine according to that test. Android is very good and iOS is even better.
Hm, I get about 4 up-and-downs per second before the scroll position is 180° out of phase in Safari on an iPhone 7+. That translates to about 125 ms (1000/8) of latency?
Yes, that is how it works :)
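For anyone puzzling over the 1000/8 above, the arithmetic is just: 4 full up-and-down cycles per second is a 250 ms period, and being 180° out of phase means lagging by half of it. A trivial sketch of that estimate:

    #include <stdio.h>

    /* Phase-inversion scroll test: if the content is 180 degrees out of
     * phase with the finger, it lags by roughly half the cycle period. */
    static double estimated_latency_ms(double cycles_per_second)
    {
        double period_ms = 1000.0 / cycles_per_second;
        return period_ms / 2.0;
    }

    int main(void)
    {
        printf("%.0f ms\n", estimated_latency_ms(4.0));   /* prints 125 */
        return 0;
    }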
That value seems surprisingly bad. My limited experience with iDevices (I don't own one) has been that the gap between the finger's position on the page when stationary and its position on the page while scrolling (another way to measure, unless it's specifically fudged with some kind of prediction to make scrolling feel less detached) is very small. But I can't argue with data.
FWIW, I like to test Android in the scroll view of the OS settings app or the address book. Those are well implemented and presumably don't add unnecessary lag.
"The big problem with latency is that it accumulates. Once some component introduces delay somewhere in the input chain you aren’t going to get it back. That’s why it’s really important to eliminate latency where you can." - a lesson that applies to many things besides the narrow case of Windows 10.
I might be interested in seeing that if you can find the link.
I have a transwarped IIGS, which on CPU benchmarks is slower than most of the modern emulators, but on the actual hardware it's pretty amazing (particularly since it boots faster from a CompactFlash card than most Win10 PCs I've seen). I would guess that a USB keyboard -> Windows -> emulator -> app response -> draw -> emulator -> Windows -> GPU path is much, much longer than the IIGS's keyboard-poll-and-draw cycle, even given the ~1000x cycles-per-second advantage a modern PC has.
PS: every time I boot an old desktop I have this feeling. The new stacks are amazing, they do it all, but... I love the immediate feel of the old ones, even at some cost. And this comes from a compositing fetishist.
It wasn't much better on Windows 95 on the average hardware of its day. Heck, if you dared click [Start] as soon as the desktop appeared (read: Windows hadn't yet finished booting), the whole OS would hang for several minutes.
Windows 95 had a ton of "clever" tricks, like "OLE Chicken", a.k.a. shimming in a fake OLE instead of loading the real thing just to display the desktop faster. Executing anything triggered the real .dll load anyway, but the official metric was time to the blue desktop...
If you want to have the most minimal Windows setup:
- Don't use an antivirus
- Stop unused services running in the background (e.g. via services.msc)
- Turn off all visual effects, including compositing and animations
Then you might want to set up a firewall to block all the nonsense like SMB, NetBIOS, etc. You can also set up a cheap old machine to act as your firewall, reverse proxy cache, antivirus/antispam, etc.
You can set up a script to turn on the printing-related services only when you're actually going to use a printer.
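For what it's worth, `sc start Spooler` / `sc stop Spooler` in an elevated prompt is the simplest way to script the print spooler on and off. The programmatic equivalent through the Service Control Manager looks roughly like the sketch below ("Spooler" is the stock name of the print spooler service; it must run elevated, and any other printing services you've disabled would need the same treatment):

    #include <windows.h>   /* link with advapi32.lib */
    #include <stdio.h>

    int main(void)
    {
        /* Connect to the Service Control Manager and start the spooler. */
        SC_HANDLE scm = OpenSCManagerW(NULL, NULL, SC_MANAGER_CONNECT);
        if (!scm) {
            fprintf(stderr, "OpenSCManager failed: %lu\n", GetLastError());
            return 1;
        }

        SC_HANDLE svc = OpenServiceW(scm, L"Spooler", SERVICE_START);
        if (!svc) {
            fprintf(stderr, "OpenService failed: %lu\n", GetLastError());
            CloseServiceHandle(scm);
            return 1;
        }

        if (!StartServiceW(svc, 0, NULL))
            fprintf(stderr, "StartService failed: %lu\n", GetLastError());

        CloseServiceHandle(svc);
        CloseServiceHandle(scm);
        return 0;
    }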
While LTSB looks good (fewer bundled apps, that's great), it is still very different from Arch or Debian.
Regarding hardware compatibility, be aware that Debian supports many more processor architectures than Windows does.
Maybe Windows supports some peripherals better, but Linux has improved a lot in this respect. Chances are that out-of-the-box hardware support is better on Linux than on Windows these days.
FreeSync only helps when you're running a GPU heavy game that can't keep up with the monitor's refresh rate (dips below 60/120/144/whatever Hz). All desktop compositors definitely can and do render at your monitor's refresh rate :)
Vulkan wouldn't help much. It has less overhead (no validation in production, etc.), but GL/GLES are plenty fast for any compositing tasks. The difference might be completely negligible.
Is this related to the latest Windows 10 updates, or was it always the case? I ask because, maybe unrelated, there is some slow drawing in the UI after the October major update. I notice it when I log in and the desktop is drawn.
The article mainly explores the difference between having DWM (Desktop Window Manager) enabled and disabled. DWM's main role is to render all of the windows into separate buffers in memory and then "compose" them on the fly. This avoids, for example, the glitches you'd otherwise get when dragging a window over a frozen application.
Since Windows 8, DWM cannot be disabled. It is also enabled by default on W7 (the infamous Aero), but there you can at least get rid of it.
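On Vista/7 that switch is even exposed programmatically through dwmapi. A small sketch (on Windows 8 and later DwmEnableComposition is deprecated and composition can no longer be turned off, so the call is effectively a no-op there):

    #include <windows.h>
    #include <dwmapi.h>    /* link with dwmapi.lib */
    #include <stdio.h>

    int main(void)
    {
        BOOL enabled = FALSE;
        if (SUCCEEDED(DwmIsCompositionEnabled(&enabled)))
            printf("DWM composition is currently %s\n", enabled ? "on" : "off");

        /* Only effective on Vista/7; deprecated from Windows 8 onward. */
        HRESULT hr = DwmEnableComposition(DWM_EC_DISABLECOMPOSITION);
        printf("DwmEnableComposition returned 0x%08lx\n", (unsigned long)hr);
        return 0;
    }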
WOW! I usually run about 7 Android emulators on one monitor, and my web browser and other stuff on the other monitor. Things get really laggy sometimes; turning off this feature made a HUGE difference. My experience has improved dramatically.
the argument about vsync and framebuffers seems mistaken. vsync only prevents partial renders, not full renders, so disabling it does not remove latency in most cases.
this article makes it sound like there is some magical way to draw directly into the buffer without it being redrawn, which is not true. the best you can get is a chance of faster drawing because you can write into the buffer as it is being drawn... (and probably get some tearing)
the idea that compositing is somehow slower is also very misleading... how exactly is a stacked renderer faster?
i think the author is blaming a poor implementation on technical details that they only partly understand.
> vsync only prevents partial renders, not full renders, so disabling it does not remove latency in most cases.
V-Sync prevents those partial renders precisely by waiting: the new frame isn't shown until the next vertical blank, and that wait is what adds the latency.
> the idea that compositing is somehow slower is also very misleading... how exactly is a stacked renderer faster?
Yes, compositing is slower, because when an application wants to draw its window it needs to send the image to the compositor (or acquire the handle to the backing texture for the window that the compositor uses, or whatever - that's an implementation detail), and then the compositor will, at some point later, draw it together with all the other windows (almost all compositors are V-Synced, meaning you will see at most 60 Hz updates - or whatever refresh rate your monitor is running at). This adds a very noticeable delay between the program needing to update itself and the update being visible to the user.
On the other hand, without a compositor and assuming a window system based on clipping regions (like X11 and Windows without DWM) with direct to frontbuffer drawing, the application will ask the window system to prepare for drawing in the window (which usually means the window system will setup the clipping to be inside the window's visible area), then perform the drawing directly on the framebuffer and notify the window system that it is done (so that the clipping stuff can go away). Notice how nothing here waits on anything else, like v-sync (or any other interval) and how this totally ignores other windows - each window draws itself immediately when needed instead of having to orchestrate an update for all windows on the screen.
Of course with the latter approach you do get tearing, since windows can draw themselves during a monitor refresh, but whether that is a problem or not is up to the user. Personally I care so little about tearing that I barely notice it, yet I immediately notice any sort of V-Sync or compositor-induced lag, so I always try to avoid it.
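To make the non-composited path concrete: in classic Win32 it's literally a clipped device context handed to you during WM_PAINT. A window-procedure fragment (class registration, CreateWindow and the message loop are omitted), just to show the "draw immediately inside the clip region, then tell the window system you're done" shape; note that with DWM active this gets redirected to an offscreen surface, so the direct-to-screen behaviour only holds when the compositor is off:

    #include <windows.h>

    static LRESULT CALLBACK wnd_proc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
    {
        switch (msg) {
        case WM_PAINT: {
            PAINTSTRUCT ps;
            HDC dc = BeginPaint(hwnd, &ps);   /* DC is already clipped to the
                                                 window's visible area */
            TextOutA(dc, 10, 10, "drawn immediately", 17);
            EndPaint(hwnd, &ps);              /* "done drawing" - no vblank
                                                 wait, no other windows involved */
            return 0;
        }
        case WM_DESTROY:
            PostQuitMessage(0);
            return 0;
        }
        return DefWindowProcA(hwnd, msg, wp, lp);
    }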
I find the exact opposite: the macOS UI seems to me a lot lower-latency and more 'snappy' than Windows 7/8/10, especially with Metal2 under High Sierra.
The data speaks for itself, but I've been trying to perceive any latency in my Firefox (Win10) and I'm unable to. Typing seems instantaneous to me. I do hate the milliseconds the Start menu takes to animate, though. The Win7 Start menu was instant.
In my experience, latency is rarely bad when it's very, very consistent. After some time the brain compensates.
I recall playing World of Tanks with an 800 ms ping. After about 3 months or so I was back to thinking there was no ping, until a problem with my ISP was fixed and I dropped to 8 ms - after which I ran into walls a lot.
The brain can adapt to such things rather well, given enough time.
I agree with you about consistency, but 792 ms is really something!
I remember when we upgraded from a 28.8k modem to ISDN. My Quake got a lot better purely because of lower latency. A friend was left behind on modem and he could tell the difference too, I was virtually unbeatable to him after the upgrade.
Of course we both knew that it was down to an unfair advantage so it wasn't a true victory.
What I hate even more is when I press the Windows key and start typing an application's name into the Start menu textbox, and it misses the first two or three keypresses.
As a comparison point, gnome 3 does the right thing and buffers the input until it can handle it. On an old machine I usually have the full name typed out and enter pressed before any animation actually starts being displayed.
I think it somehow depends on whether the OS is installed on a spinning disk or an SSD. I know it's stupid but this is my theory based on observation. Would love for someone to verify it.
I'm sure a spinning disk would make it worse, but I do have an SSD (admittedly, a 2013 model and not the fastest). To answer the grandparent, build 15063 and I'm using the laptop keyboard.
I tried a few times and I can consistently make Windows miss at least the first keypress if I haven't opened the Start menu in a few minutes.
When the only source of latency is the extra ~16 ms added by compositing, you likely can't feel it, but it adds up with other sources of latency that can make things feel bad more often (a rough tally is sketched below), such as:
- latency added by text editors, depending on their quality and the amount of work they are doing (colors, intellisense, file size, etc.)
- latency added by keyboards; some are worse than others
- latency added by monitors; good ones are in the 1 ms range, but bad ones can be as high as 40 ms
Sadly the latency on all of these fronts has generally been trending higher as computers have gotten more powerful.
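Here's the rough tally mentioned above. The per-stage numbers are made-up placeholders in the same ballpark as the figures in the list, not measurements; the point is just how quickly a few "small" delays stack into something you can feel.

    #include <stdio.h>

    int main(void)
    {
        /* Placeholder per-stage delays along the keypress-to-pixels path. */
        const struct { const char *stage; double ms; } chain[] = {
            { "keyboard scan/debounce", 10.0 },
            { "editor processing",      15.0 },
            { "compositor (one frame)", 16.7 },
            { "monitor processing",     20.0 },
        };
        double total = 0.0;
        for (size_t i = 0; i < sizeof chain / sizeof chain[0]; i++) {
            total += chain[i].ms;
            printf("%-24s +%5.1f ms  (running total %6.1f ms)\n",
                   chain[i].stage, chain[i].ms, total);
        }
        return 0;
    }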
Edit: here's the talk I was referencing that mentions overlays: https://www.youtube.com/watch?v=E3wTajGZOsA