
Desktop compositing latency is real - dezgeg
http://www.lofibucket.com/articles/dwm_latency.html
======
raphlinus
I've been experimenting with this too, in the context of the Windows front-end
for xi editor. It's absolutely true that the compositor adds a frame of
latency, but I have a very different take than "turn it off."

First, it's possible to design an app around the compositor. Instead of
sending a frame to the system, send a tree of layers. When updating the
content or scrolling, just send a small delta to that tree. Further, instead
of sending a single frame, send a small (~100ms) slice of time and attach
animations so the motion can be silky smooth. In my experiments so far (which
are not quite ready to be made public but hopefully soon), this gets you
window resizing that almost exactly tracks the mouse (as opposed to lagging
at least one frame behind), low latency, and excellent power usage.
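To make the retained-tree idea concrete, here's a miniature sketch (all names are hypothetical illustrations, not the actual xi front-end or the DirectComposition API):

```python
# Hypothetical sketch of a retained layer tree: rather than pushing whole
# frames to the system, the app keeps a tree of layers and sends small
# deltas (scroll offsets, edits) that the compositor can animate.

class Layer:
    def __init__(self, name, offset=(0, 0)):
        self.name = name
        self.offset = offset
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

def apply_scroll_delta(layer, dx, dy):
    # A scroll is a tiny delta to one node, not a repaint of the window.
    x, y = layer.offset
    layer.offset = (x + dx, y + dy)

root = Layer("window")
text = root.add(Layer("text-view"))
apply_scroll_delta(text, 0, -16)   # scroll down one 16 px line
print(text.offset)  # (0, -16)
```

The point is that the system compositor only ever receives the delta (and optionally an animation over it), never a re-rendered frame.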

Further, Microsoft engineers have said that hardware overlays are coming, in
which the graphics card _pulls_ content from the windows as it's sending the
video out the port, rather than taking an extra frame time to _paint_ the
windows onto a composited surface. These already exist on mobile, where power
is important, but making it work for the desktop is challenging. When that
happens, you get your frame back.

So I think the answer is to move forward, embrace the compositor, and solve
the engineering challenges, rather than move backwards, even though adding the
compositor did regress latency metrics.

Edit: here's the talk I was referencing that mentions overlays:
[https://www.youtube.com/watch?v=E3wTajGZOsA](https://www.youtube.com/watch?v=E3wTajGZOsA)

~~~
CyberDildonics
How are you testing the mouse cursor following the window borders on resize?
I've read that windows turns off the hardware sprite mouse cursor when
resizing windows so that it can software render it to always line up properly.

~~~
raphlinus
Visual inspection for now. I've got an Arduino and a high-speed camera, so my
plan for the next step is to send mouse and keyboard events from the Arduino,
blinking an LED at the same time, then capture both the LED and the monitor in
the video. Then a bit of image analysis. This is the only way to be
quantitative and capture all the sources of latency.
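The offline analysis step could look roughly like this (synthetic brightness traces stand in for real video frames; the threshold and numbers are illustrative):

```python
# Given per-frame brightness of the LED region and the screen region in
# the captured video, find the first frame where each "turns on" and
# convert the frame gap to milliseconds.

def first_frame_above(samples, threshold):
    for i, value in enumerate(samples):
        if value > threshold:
            return i
    return None

def latency_ms(led_brightness, screen_brightness, fps, threshold=0.5):
    led_on = first_frame_above(led_brightness, threshold)
    screen_on = first_frame_above(screen_brightness, threshold)
    return (screen_on - led_on) * 1000.0 / fps

# LED lights at frame 2, screen responds at frame 14, captured at 240 fps.
led = [0.0, 0.0] + [0.9] * 18
screen = [0.0] * 14 + [0.8] * 6
print(latency_ms(led, screen, fps=240))  # 50.0 ms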

~~~
CyberDildonics
The implication of what I was saying is that windows will always show the
cursor lining up with the border of a window during resizing.

~~~
raphlinus
Would that it were so.

~~~
eptcyka
Hmm, are you inspired or were part of
[https://github.com/google/walt](https://github.com/google/walt) ?

------
nickjj
This is also why picking a good monitor is important for software development.

Some monitors have tons of input lag (60-70ms); that's how long it takes for
a character you type to show up on the display, and the same delay applies to
seeing your mouse cursor move.

I did a huge write up on picking a good monitor for development which can be
found at: [https://nickjanetakis.com/blog/how-to-pick-a-good-monitor-for-software-development](https://nickjanetakis.com/blog/how-to-pick-a-good-monitor-for-software-development)

I ended up buying the Dell UltraSharp U2515H
[http://amzn.to/2jF3WHp](http://amzn.to/2jF3WHp). It has very low input lag
(10-15ms) and it runs natively at 2560x1440.

~~~
83457
"I can send an IP packet to Europe faster than I can send a pixel to the
screen. How f’d up is that?" - John Carmack

~~~
Jasper_
Right, the issue is that 60Hz is ridiculously slow. Ethernet hardware is
multiple orders of magnitude faster. It's just because of broadcast TV and CRT
legacy. G-Sync / FreeSync allow up to 240Hz, which is ~4ms latency.

~~~
CyberDildonics
No, the issue was that he was using a display with a crazy amount of buffering
because they were doing so much filtering and doing it poorly.

~~~
Retric
It's not just monitors; even keyboards can have 40+ ms of delay, and it all
adds up.

~~~
CyberDildonics
I don't think that was what he was referring to

~~~
Retric
"I video record showing both the game controller and the screen with a 240 fps
camera"
[https://superuser.com/q/419070/2269](https://superuser.com/q/419070/2269)

So, it's the total response time from key press on a game controller to the
screen that is important, not individual components. Sure, 0.04 seconds on its
own is not a big deal, but when five or six things each take 0.04 seconds you
hit noticeable delays.
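Concretely, a back-of-the-envelope sum with made-up but plausible per-stage numbers (not measurements) shows how it adds up:

```python
# Illustrative per-stage latencies on a 60 Hz setup; every number here
# is a plausible guess, not a measured value.
stages_ms = {
    "input device scan + debounce": 15,
    "USB polling": 8,
    "OS input + app processing": 10,
    "render + wait for vsync": 16,
    "compositor (one extra frame)": 17,
    "display processing": 20,
}
total = sum(stages_ms.values())
print(total)  # 86 ms end to end, though no single stage looks alarming
```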

------
emn13
I will gladly believe this is a real problem, but this page does _not_
demonstrate that (at least not convincingly); the metric used is simply too
poor.

To quote:

> I used my own hacky measurement program written in C++ that sends a keypress
> to the application to be benchmarked. Then it waits until the character
> appears on screen. Virtual keypresses were sent with WinAPIs SendInput and
> pixels copied off screen with BitBlt.

So, this is measuring some rather artificial and fairly uninteresting time.

You really do need to measure the complete stack here, especially if your
theory is that issues like vsync are at stake, because the level at which
vsync happens can vary, and because there are interactions here that may
matter. E.g. if there are 100 reasons to wait for vsync, and you remove one of
them... you're still going to wait for vsync. It's not 100% clear that this
measurement actually corresponds to anything real. Also, note that a
compositor need not _necessarily_ do anything on the CPU, so by trying to read
back the composited image, you may inadvertently be triggering some kind of
unnecessary (or OS-dependent) synchronization. E.g. it's conceivable that
regardless of tearing on screen you want the read-back to work
"transactionally", so you might imagine that a read requires additional
synchronization that mere rendering _might_ not.

And of course: all this is really complicated; there are many moving parts
we're bound to overlook. It's just common sense to try to measure something
that matters, to avoid whole classes of systemic error.

Ideally, you'd measure from real keypress up to visible light; but at the very
least you'd want to measure from some software signal such as SendInput up to
an HDMI output (using the same hardware, and software as similar as possible),
because at the very least that captures the whole output stack, which he's
interested in.

Another advantage of a whole-stack measurement is that it puts things into
perspective: say the additional latency is 8ms; then it's probably relevant to
know at least roughly how large the latency overall is.

~~~
modeless
The time measured is artificial but not necessarily uninteresting. You can't
measure the true end to end latency this way, but you can compare different
user applications to see how much latency they add. It's not perfect but it is
the best available way to measure latency of arbitrary applications without
extra hardware.

I built a browser latency benchmark based on a similar method:
[https://google.github.io/latency-benchmark/](https://google.github.io/latency-benchmark/)

I did plenty of testing and found that the measurements obtained this way do
correlate with true hardware based latency measurement.

All that said, it is a travesty that modern OSes and hardware platforms do not
provide the appropriate APIs to measure latency accurately. A lot of what is
known about latency at low levels of the stack is thrown out before you get to
the APIs available to applications.

~~~
emn13
I'm sure they correlate: but the thing with correlation is that you can have
correlation even when there are whole classes of situations where the
relationship doesn't hold.

On the same OS + drivers I'd be willing to believe that this measure is almost
certainly useful (even there, it's not 100%). But it's exactly the kind of
thing where a different way of, say... compositing... might cause the
implementation to work a little differently, such that you're comparing apples
to oranges. If BitBlt simply gets access a little earlier or later in the same
pipeline, then hey presto: you've got a difference in software that is
meaningless in reality.

------
Zardoz84
> Don’t you find it a bit funny that Windows 95 is actually snappier than
> Windows 10? It’s really a shame that response times in modern computers are
> visibly worse than those in twenty years ago.

My Amiga A1200 (with only extra RAM added) feels faster and more responsive
than any modern computer with a Windows or GNU/Linux desktop.

~~~
fredley
Layers of abstraction take you further away from the metal. The more layers of
abstraction your keypress must traverse before rendering is complete and the
photons have reached your retina, the longer it will be until that happens.

Layers of abstraction make complex tasks more reachable by a larger number of
programmers by reducing the amount of specialist knowledge about those lower
layers required to do the job. The more layers of abstraction you have, the
less they have to worry about lower levels and can just get on with what they
want to do.

~~~
wruza
Numerous games, including those having complex graphics and behavior, can
render 120+ frames per second and realtime interactions (physics, optics,
reactions) on pretty average hardware. I don’t think that game scripters who
make final things like scenery or ui face complexity much harder than those in
gtk/qt/wpf/htmljs widget programming. Details would be interesting though,
since I’m no game developer.

If true, there must be something very wrong with traditional UI systems?

Edit: my apologies, I didn’t read the article first and thought it measured
complete feedback like “press ctrl-f and wait for element to popup”, but I’m
interested in my question regarding games anyway.

~~~
badsectoracula
There isn't much of a difference between a GUI toolkit you'd find in a desktop
application and the GUI framework you'd see in a game - the most likely
difference will be that the game GUI will be redrawn every frame whereas the
desktop GUI won't (and there are game GUI frameworks that cache their output to
avoid redrawing the entire widget tree every frame).

The difference when it comes to why games can be snappier is that games are
"allowed" to bypass most of the layers and cruft that exists between the user
and the hardware, including the compositor that the linked article is talking
about (in Windows at least).

Fortunately in Linux with Xorg you can get stuff on screen as fast as the code
can ask for it, as long as you are not using a compositor (so you can even
play games in a window with no additional lag!).

Hopefully the Wayland mania won't kill Xorg, since yet another issue Wayland
has and X11 doesn't is that with Wayland you are forced into a compositing
model.

~~~
nhaehnle
The funny thing is that there is no technical reason at all for compositing to
have worse latency, even for games.

Think about the actual operations that are involved. You certainly never want
to render directly into the front-buffer (you'd end up scanning out partially
rendered scenes). So you render to a back buffer. Which you then blit to the
front buffer (assuming you're in windowed mode; in full-screen mode the
compositor goes out of the way anyway and lets you just flip).

The only difference between the various modes of operation is who does that
final blit. In plain X, it's the X server. In Wayland, it's the Wayland
compositor. In X with a compositor, it's the compositor.

Now granted, some compositors might be silly and re-composite the whole screen
each frame, but clearly that can be avoided in most cases.

Depending on the setup, there can also be some issues with the scheduling of
the game's jobs vs. the compositor's jobs on the GPU. Valve are working on
this currently for VR, since the problem is much more noticeable there --
clearly it can be fixed one way or another (on Radeon GPUs you could do a blit
via asynchronous compute if need be, for example), but note that compositing
actually doesn't change this issue (since the X server's jobs also need to
compete with the game's jobs for scheduling time).

So if compositing has worse latency, it's because nobody has cared enough to
polish the critical paths. Conversely, compositing clearly does have
advantages in overall image quality. So why not fix the (entirely fixable)
technical problems with compositing?

~~~
badsectoracula
There is a very good practical reason why the compositor is in no position to
fix that even if it could theoretically be possible.

A major source of the compositor latency (or rather, the increased response
time you get with a compositor) is that the "render to back buffer" (i.e. the
compositor's texture, in the best case) and the "blit to the front buffer"
(which is done by the compositor by drawing the window geometry) do not happen
at the same time.

From a technical perspective it is perfectly possible for a compositor to
create a tight integration between a program and the compositor itself: simply
synchronize the program's updates with the compositor updates. Every time the
program says "i'm done drawing" (either via an explicit notification to the
compositor, via glXSwapBuffers or whatever), issue a screen update.

The problem however here is the compositor has to take into account multiple
windows from multiple programs so you cannot have a single window dictating
the compositor updates. Imagine for example two windows with animations
running, one at 100fps and another at 130fps. Depending on which window is active
(assuming that the compositor syncs itself with the active window), it would
affect the perception of other window's updates (since what the user will see
will be at the rate of the foreground window's update rate). Moreover, beyond
just the perception, it will also affect the other windows' animation loops
themselves - if a background window finishes drawing itself and notifies the
compositor while the compositor is in the middle of an update, the background
window will have to wait until the update is finished - thus having the
foreground window indirectly also affect the animation loops of the background
windows. This can be avoided through triple buffering, but that introduces an
extra frame of latency - at least for background windows.

So to avoid the above problems, what all compositors do is to decouple window
update notifications from screen updates and instead perform the screen
updates at some predefined interval - usually the monitor refresh rate and
they do the updates synchronized to it. However that creates the increased
response time you get with the compositor being on a few milliseconds behind
the user's actions, with the most common example would be window manipulation
like resizing and moving windows lagging behind the mouse cursor (which is
drawn by the GPU directly, thus bypassing the compositor).

Hence the linked article recommending a 144Hz monitor to avoid this, although
this is just a workaround that makes the problem less visible but doesn't
really solve it.
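That decoupling can be put in numbers with a toy model (60 Hz assumed; this is an illustration, not any particular compositor's scheduler):

```python
import math

# The compositor latches finished frames only at fixed vsync deadlines,
# so a frame completed just after a deadline waits almost a full
# refresh interval before it is shown.
REFRESH_MS = 1000.0 / 60.0  # ~16.7 ms between vsyncs on a 60 Hz panel

def compositor_delay(finish_time_ms):
    # Time from "app finished drawing" until the next vsync picks it up.
    next_vsync = math.ceil(finish_time_ms / REFRESH_MS) * REFRESH_MS
    return next_vsync - finish_time_ms

print(round(compositor_delay(17.0), 1))  # missed the deadline: ~16.3 ms
print(round(compositor_delay(16.0), 1))  # just made it: ~0.7 ms
```

So the worst-case penalty is one full refresh interval, which is exactly why a faster panel only shrinks the problem rather than removing it.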

~~~
nhaehnle
This "do not happen at the same time" is true in plain Xorg as well, though,
since the final blit to the screen happens in the X server and not in the
application.

Your example of 100fps vs. 130fps on the same screen is inherently unsolvable
in a proper way with anything less than a 1300fps display. So you have a bunch
of tradeoffs, and I'm sorry to say that if the tradeoff you prefer is tearing,
you're in the losing minority by far.

That said, if you truly wanted to write a tearing Wayland compositor, you
could easily do so, and in any case plain X is still going to work as well.

~~~
badsectoracula
Without a compositor, when you ask to draw a line, a rectangle, a circle or
even a bitmap, it is drawn immediately. Sure, it isn't done in zero time,
there is some latency, but that is the case with any graphics system :-).

As for the compositor, it isn't impossible to create a Wayland "compositor"
that draws directly on the front buffer either, it is just harder and
pointless since Xorg exists :-P.

But yeah, if everyone abandons Xorg (and by everyone i mean _Everyone_ , not
just the popular kids) and nobody forks it (which i doubt it'll happen as
there are a ton of people who dislike Wayland) and nobody else steps up to do
something about it, then yeah, i'll most likely just make my own suckless
Wayland compositor. I'd prefer the world to stay sane though so i can continue
doing other stuff :-P.

------
ahartmetz
I recently measured it with my phone's camera in slow motion mode. The system
is an AMD Ryzen 1800X with AMD R9 280x GPU, KDE Plasma with KWin window
manager in compositing mode. Key press to screen output latency was ~33
milliseconds (90 fps recording, so in increments of ~11 ms) in KWrite.
feels plenty responsive with that latency, and I hate latency...

It is a full stack real world result - for comparison purposes it makes sense
to measure only the software as in the article, but in reality you want to
optimize everything. Especially screens can be quite bad - tens of
milliseconds, up to 100 in the worst. USB lag is usually quite low - when I
measured it once for low-latency serial comm it was usually < 2 ms.

~~~
vetinari
33 ms is two frames, if your monitor is at 60 Hz. If you tried VS Code or
another Electron app, it might be 49 ms (3 frames). These are the numbers I'm
getting
from 1900X with Nvidia 1080 GPU, Gnome3, 4k@60hz, but without measuring
latency of the keyboard itself.

Modern keyboards are another part of the problem. They can also take their
sweet time since keypress until packet appears at the USB bus. See
[https://danluu.com/keyboard-latency/](https://danluu.com/keyboard-latency/)

~~~
floatboth
Modern motherboards often still have a PS/2 port! And most USB keyboards still
support PS/2, a passive adapter works great.

~~~
vetinari
The problem is often in the keyboard controller, not in the interface. Apple
managed to make the fastest keyboard, with only 15 ms of lag; others may be an
order of magnitude slower.

~~~
fhood
Could someone explain to me why on earth 15 ms of lag for a key press is
considered good? It is a switch, for god's sake. It should be near instant.

~~~
vetinari
The linked article explains it (TLDR: key travel time, scanning keyboard
matrix, debouncing).

~~~
looiid
It’s a common myth that debouncing needs to meaningfully affect latency. It
does not. It will affect the maximum repeat rate, but you can pretty much
report an event the moment you see an edge.
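A minimal sketch of that idea, report on the first edge and hold off afterwards (illustrative only, not any particular keyboard firmware):

```python
# Edge-triggered debouncing: report the key event immediately on the
# first observed transition, then ignore further transitions for a
# short hold-off window. Latency cost is ~zero; only the maximum
# repeat rate is limited by the hold-off.

DEBOUNCE_MS = 5

def debounce(samples):
    """samples: list of (time_ms, level) pairs read from the switch."""
    events = []
    last_edge = None
    prev = 0
    for t, level in samples:
        if level != prev and (last_edge is None or t - last_edge >= DEBOUNCE_MS):
            events.append((t, level))  # report immediately on the edge
            last_edge = t
        prev = level
    return events

# A press at t=10 ms with 2 ms of contact bounce: one event, no delay.
noisy = [(0, 0), (10, 1), (11, 0), (12, 1), (20, 1)]
print(debounce(noisy))  # [(10, 1)]
```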

------
forgotmypw
Someone recently gave me an old PowerBook G3, running Mac OS 8.6. I was amazed
by how responsive the UI is compared to today's UIs, from Mac to Windows to
iOS to Android. When I clicked something, it felt like there was a pushrod
between the mouse button and the menu, which triggered it instantly.

~~~
raverbashing
Well, compositing was introduced in 10.2; I'm not sure running Classic Mac
OS was an advantage in this case.

~~~
sandyarmstrong
Did you use OS X prior to 10.2? I guarantee you it was slower and worse in
every way. Especially compared to classic. There is a reason they continued
installing OS 9 side-by-side before 10.2.

~~~
galad87
OS X did compositing on the CPU before Quartz Extreme in 10.2.

------
jsd1982
> Virtual keypresses were sent with WinAPIs SendInput and pixels copied off
> screen with BitBlt.

This methodology alone could account for the differences in timing between
Win7 and Win10. For all we know, Win10 could just be slower at getting the
pixels back to the program from BitBlt, or SendInput could be slower
triggering events, or a multitude of other issues.

The best way to truly detect key-to-photon latency is with an external video
recorder that has both the screen and keyboard in frame. Grant a few ms of
noise for key travel distance.

~~~
mastax
As I understand it:

- Compositing is done on the GPU

- BitBlt is done on the CPU

- Copies from GPU -> CPU are slow

So, yeah, compositing adds a frame of VSync latency, but these measurements
are complete bunkum.

~~~
leeter
BitBlt is done in DMA RAM by the CPU, so it may be even worse than just a
copy, as there is likely a wait involved too to prevent shared access. Using
DMA RAM prohibits the GPU/driver from doing optimizations on that RAM that it
could do if the buffer was in dedicated GPU RAM. This is why DX12 resources
are generally always copied into non-shared buffers.

------
deniska
That's why I prefer to play games in fullscreen as opposed to "borderless
windowed", I have noticed quite a bit of input lag in the latter mode.

~~~
cybermancy
This is because fullscreen mode allows the use of something called "DirectDraw
Exclusive mode" which bypasses Windows for making calls to the GPU.

[https://msdn.microsoft.com/en-us/library/windows/desktop/dd375451\(v=vs.85\).aspx](https://msdn.microsoft.com/en-us/library/windows/desktop/dd375451\(v=vs.85\).aspx)

~~~
nulagrithom
I wonder if it would be feasible to use this to reduce input lag in an editor?

~~~
kevingadd
Yes, but you lose access to any user interface components you don't paint
yourself, and you can eat some lag/flickering when switching out of the app as
control is returned to the compositor. If you end up needing to show the Open
File picker from the OS or pop up the Print dialog you'll need to exit
exclusive mode.

------
pmoriarty
This article is about Windows, but I wonder how Wayland on Linux measures up.

~~~
upofadown
Wayland does everything with compositing. The Wayland people love v-sync
because they hate tearing. So chances are that this effect applies...

~~~
digi_owl
And then Gnome goes and builds on that by hooking the mouse pointer up to the
redraw...

------
zokier
I would like to see end-to-end measurements (=high speed camera footage
analysis) before making final conclusions. Not saying that compositing doesn't
add latency, but I feel like the system is so complex that this sort of
userspace software measurement might not tell the whole story

------
krylon
My work laptop (the only Windows computer I use) runs Windows 7, and I intend
to keep it that way as long as Windows 7 still gets updates. This article just
confirms my bias, and I freely admit I _am_ biased. I do not like Windows very
much to begin with, but as far as Windows goes, I think Windows 7 ____ing
nailed it (for people without touchscreens, anyway).

On a related note, I have noticed that Outlook 2013 exhibits a notable lag
between a keystroke and a character appearing in the message window. I have
not done any measurements, but my best guess is that it is on the order of
hundreds of milliseconds. If you type fast (I like to think that I do),
Outlook can keep up throughput-wise, but this lag is terribly annoying.

~~~
mschuster91
> On a related note, I have noticed that Outlook 2013 exhibits a notable lag
> between a keystroke and a character appearing in the message window.

Try switching to text-only mails, no zoom... and if you must write HTML
mails, do not include an image that is larger than the window. As soon as
there is an image that doesn't fit into the window at 100% zoom, Outlook
begins to crawl.

~~~
krylon
_My_ work computer is stuck with Office 2007, because I have Office 2007
Professional, and I need Access about once per year for an arcane reason.
Office Pro is fairly expensive, so for the time being, I am stuck with 2007. I
am still not entirely sure if I should be happy or sad about it. ;-) But I
have used Outlook 2013 on coworkers' computers every now and then, and it was
pretty laggy.

These days, I do a lot more programming than sysadmin'ning and help desk, but
when I was the IT support guy at our company, my overall impression of Office
2013 was not very good. I have seen it just stop working on a handful of
computers (out of about 75-80, so that is a lot), in such a way that I
could only "fix" it by uninstalling and reinstalling Office from scratch. On
one of our CAD workstations, Outlook and Autodesk Inventor started a feud
where an update to MS Office caused Inventor to crash, and the subsequent
reinstallation of Inventor caused Outlook to crash when we tried to write an
email. (Then we reinstalled Office, and then suddenly things worked magically,
so I remain clueless as to what happened.) The latter may be Autodesk's fault
as much as Microsoft's (I get the vague impression that they care even less
about their software crashing than Microsoft, as long as the license is paid
for). But the impression I get is that MS Office has suffered quite a bit over
the years. Therefore I am not entirely unhappy about being stuck on Office
2007. I do miss OneNote, the one program from their Office suite I really
like, but I have org-mode, so I can manage. ;-)

EDIT: Sorry for venting, that one has been building up for a long time.

------
tauio111
I always thought I was the only one noticing this. With compositing enabled,
both with DWM and on GNU/Linux, the whole interaction seems to become "soft"
instead of the raw feel that is much nicer and snappier. From my experience it
also has to do with passing through the stack to the GPU when compositing;
running it all from the CPU is what makes it feel snappy.

I've also been researching about removing the triple buffer vsync on W10. It
seems it was possible in the first builds by replacing some system files, but
that option is gone now with the recent big releases.

Given that, I do not see a real reason why compositing would be needed on
W10, as transparency etc. aren't important factors.

~~~
digi_owl
Makes me think of the "smooth scrolling" option that you can find in most web
browsers. Never liked that, and first thing i hunt down after a new install.

This is because using it feels like scrolling through molasses for whatever
reason.

------
fixermark
> Actually, I don’t know why a compositing window manager should enforce
> V-Sync anyway? Obviously you get screen tearing without it but the option
> should still be there for those who want it.

Every additional option (especially in the realm of video settings) opens the
door for additional complexity, implementation error, and user error in
unintentionally setting the undesired mode. It's perfectly understandable why
window managers would settle on one or the other of two extremely different
render-to-screen approaches, especially when general consensus for quite some
time now in the graphics space has been that minimizing the potential for
tearing is preferable.

------
half-kh-hacker
> your keyboard is already slower than you might expect.

An extract from the linked article:[0]

> A major source of latency is key travel time. It’s not a coincidence that
> the quickest keyboard measured also has the shortest key travel distance by
> a large margin.

They're not measuring from when the signal is sent from the keyboard, they're
measuring from when the force begins to apply on the key. If you have a clicky
or tactile switch (Cherry MX Blues, Greens, Browns, Clears, etc) then the
latency measured here will be _way disproportionate_ to how it actually
/feels/.

[0]: [https://danluu.com/keyboard-latency/](https://danluu.com/keyboard-latency/)

------
jd3
further reading:

[https://pavelfatin.com/typing-with-pleasure/](https://pavelfatin.com/typing-with-pleasure/)

[https://danluu.com/keyboard-latency/](https://danluu.com/keyboard-latency/)

In Windows 7, classic mode disabled the DWM compositor and V-Sync. It's
incredibly dumb that Microsoft would arbitrarily remove that feature in
Windows 10 to push their ugly as sin post-metro UI.

------
RyanRies
The DWM compositor is bad for games too. The only way to take it out of the
equation is to use your GPU in exclusive mode.

------
srcmap
According to this:
[https://www.youtube.com/watch?v=BTURkjYJ_uk](https://www.youtube.com/watch?v=BTURkjYJ_uk)

Firefox's servo engine can compose CSS elements/Display List together at 500
frames / second.

Maybe next version of Windows / Linux desktop should use FF's servo engine?

~~~
amaranth
Your application renders a frame, then the compositor gets it and does its
transformations, if any. The composited result is then rendered to the screen,
thus adding one frame of latency. That's why the article says one solution
would be to get a 144Hz monitor: it would reduce the time between frames, so
an extra frame of latency wouldn't be as bad.
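The arithmetic behind the 144Hz suggestion, as a quick check:

```python
# One extra composited frame costs a full refresh interval, so a
# faster panel directly shrinks the penalty.
def extra_frame_ms(refresh_hz):
    return 1000.0 / refresh_hz

for hz in (60, 144, 240):
    print(f"{hz} Hz: +{extra_frame_ms(hz):.1f} ms per extra frame")
# 60 Hz: +16.7 ms, 144 Hz: +6.9 ms, 240 Hz: +4.2 ms
```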

You could potentially reduce this delay as well by having the application and
the compositor in communication. Since rendering is going to be synced to
vblank, if you can get the application to not try to sync as well and instead
just notify the compositor when it is done drawing a frame, you could
potentially get the application drawing and the compositor drawing in the same
vblank interval. This is what Wayland and DRI3 Present let you do in the Linux
world, I assume Windows has something similar but you'd need to opt-in to it
so I bet nothing uses it.

------
gciruelos
i've been using sway[0] as my wm for some time now (it's a sort of port of i3
to wayland) and it's incredible that you can actually tell that it is much
faster than wms running on X.

[0] [http://swaywm.org/](http://swaywm.org/)

~~~
Sir_Cmpwn
It's funny you mention this - it's only in the past few days that we've been
taking this sort of thing more seriously, and our work is unreleased!

------
revanx_
The irony is that the latency is most often not the fault of the DWM but of
the applications themselves. Since DWM acts as the screen's double buffer,
your application needs to be synchronous with the DWM frame timing; not being
in sync means latency and flickering.

------
caleblloyd
No left margin on webpages is real too, and it annoys me

~~~
vanderZwan
Are you on mobile? It centres perfectly fine for me on desktop. The linked CSS
file uses this method:

    
    
    body {
        max-width: 844px;
        margin-left: auto;
        margin-right: auto;
        font-family: Verdana, Arial, Helvetica, sans-serif;
    }
    

I guess wrapping the whole article with a div with 0.5em margin would fix it
on mobile.

~~~
caleblloyd
Ah they are using margin:auto to center. I thought it must be an override
since most user agents include a default body margin.

Yes I'm on mobile. My OnePlus 5 hides the first one or two pixels under the
bezel if looking at it straight on, so the first character on each line gets a
little cutoff. Not sure if this is just my model or if other phones do this
also.

~~~
vanderZwan
Either way the conclusion is the same: websites should have a minimum margin!
I'm sure the author of the website is receptive to this feedback, so I sent an
email.

Also, Firefox (and Safari on iOS) should have "view text-optimised version"
button in the URL, maybe that would help you here? I don't know if other
browsers have it though.

------
igor_p
Is there a website that demonstrates the effects of latency after pressing a
key? I know there's examples of different frame rates shown with moving
circles, but I don't think that's quite the same.

I mean is there really a noticeable difference between say 20 and 40 ms?

------
Quarrelsome
sorry, so they had to write some code to test what they couldn't perceive but
believe they can perceive? I feel like it's plausible that this is partly a
psychological problem?

~~~
mfukar
Well, it does say

> At least I can feel the difference when typing.

~~~
Quarrelsome
I can feel people's auras. Discuss.

~~~
mfukar
I suspect you know what 'feel' and 'perceive' mean, and are just being
quarrelsome.

~~~
Quarrelsome
I'm suggesting it's in their head and they can't actually feel or perceive it.

------
marcosdumay
Why would enforcing vsync add more than 1/60s of latency to anything?

This looks way more like badly designed animations than some fundamental
problem coming from the hardware.

~~~
0xcde4c3db
Perhaps it ends up being multiple vsync waits for a given rendered frame?
Something like the application or OpenGL driver waiting for vsync before
rendering into its buffer, then the compositor waiting for the next vsync
before actually compositing/flipping.
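
That stacking can be sketched with a back-of-the-envelope model (my own
illustrative numbers, assuming a 60 Hz display and one full-frame wait per
v-synced stage):

```python
FRAME_MS = 1000 / 60  # ~16.7 ms per refresh interval at 60 Hz

def worst_case_latency_ms(vsync_waits):
    """Each stage that blocks on its own vsync can add up to one full frame."""
    return vsync_waits * FRAME_MS

# The app waits for one vsync, then the compositor waits for the next:
print(round(worst_case_latency_ms(2), 1))  # 33.3 -- two frames, not one
```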

~~~
kevingadd
This is a common source of delay in composited apps/games, yes. Ideally what
you want is to have a completed frame ready for the compositor at least a few
milliseconds before the next vertical sync arrives, but it's easy to screw
that up, especially if you're getting fancy. Triple buffering also enters the
picture here (though mostly for games), because in the bad old days you had
exactly two buffers, and if both were in use (one being scanned out to the
monitor, the other holding your most recent completed frame) everything had to
grind to a halt and wait before rendering or game code could continue. Triple
buffering solved this by adding an extra buffer, at the cost of an entire
frame's worth of display latency in exchange for your code spending less time
spinning and waiting on the GPU. If someone is careless they could definitely
end up with triple buffering enabled for their app (e.g. if they're rendering
using a media-oriented framework that turns it on).

The 'Fast Sync' option NVIDIA added to their drivers in the last year or two
is a fix for the triple buffering problem - you get spare buffers, but instead
of adding a frame of latency the GPU always grabs the most recently completed
frame for scanout. Of course, if a compositor is involved you now need the
compositor to do this, and then for the compositor to utilize this feature
when presenting a composited desktop to the GPU. I don't think any modern
compositor does this at present.
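
A toy model of the difference (my own naming and simplifications, not
NVIDIA's implementation; it also ignores the blocking a real fixed-size queue
would impose): a renderer that completes two frames per vblank, presented
either FIFO-style or by always grabbing the newest completed frame:

```python
from collections import deque

def scanout(frames_per_vblank, vblanks, fast_sync):
    """Which frame IDs reach the screen over a number of vblanks."""
    queue, shown, frame_id = deque(), [], 0
    for _ in range(vblanks):
        for _ in range(frames_per_vblank):  # renderer outpaces the display
            frame_id += 1
            queue.append(frame_id)
        if fast_sync:
            shown.append(queue[-1])  # present the newest completed frame
            queue.clear()            # older completed frames are discarded
        else:
            shown.append(queue.popleft())  # FIFO: present the oldest frame
    return shown

print(scanout(2, 3, fast_sync=False))  # [1, 2, 3] -- increasingly stale
print(scanout(2, 3, fast_sync=True))   # [2, 4, 6] -- newest frame each vblank
```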

------
mzzter
Smartphones suffer from input latency too, though I’m unsure of the underlying
cause (curious how iOS handles window/view drawing). It only seems to be
getting worse, though I haven’t done tests on this. While each new model
undoubtedly has better tech specs, the interface responsiveness doesn’t seem
to improve.

~~~
ahartmetz
Ghetto latency test: finger-scroll alternately up and down very quickly, and
see at which frequency your finger and the scroll position are 180° out of
phase, i.e. your finger is up while the contents are down or vice versa.
Smartphones seem to be fine according to that test. Android is very good and
iOS is even better.

~~~
mzzter
Hm I get about 4 up and downs per second before the scroll position is 180deg
out of phase in Safari on iPhone 7+. That translates to about 125 (1000/8) ms
latency?
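
For what it's worth, the conversion is just half the oscillation period,
assuming 180° out of phase means a half-cycle lag:

```python
def latency_ms_from_phase_test(updowns_per_second):
    """Half the oscillation period, in milliseconds."""
    period_ms = 1000 / updowns_per_second
    return period_ms / 2

print(latency_ms_from_phase_test(4))  # 125.0, matching the estimate above
```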

~~~
ahartmetz
Yes, that is how it works :) That value seems surprisingly bad. My limited
experience with iDevices (I don't own one) has been that the offset between
where a stationary finger sits on the page and where the finger is while
scrolling (another way to measure, unless it is specifically fudged with some
kind of prediction to make scrolling feel less detached) is very small. But I
can't argue with data. FWIW, I like to test Android in the scroll view of the
OS settings app or the address book. Those are well implemented and presumably
don't add unnecessary lag.

------
stmw
"The big problem with latency is that it accumulates. Once some component
introduces delay somewhere in the input chain you aren’t going to get it back.
That’s why it’s really important to eliminate latency where you can." \- a
lesson that applies to many things besides the narrow case of Windows 10.

------
djsumdog
Are there any such tests using i3/X11 vs sway/wayland? I'm curious about Linux
input latency now.

------
babuskov
> Don’t you find it a bit funny that Windows 95 is actually snappier than
> Windows 10?

Comparing them on the same hardware?

~~~
amiga-workbench
Try popping the start menu open on a Windows 10 machine with a mechanical hard
drive.

~~~
laumars
It wasn't much better on Windows 95 on average hardware of the time. Heck, if
you dared click [Start] as soon as the desktop was displayed (read: Windows
hadn't yet finished booting), the whole OS would hang for several minutes.

~~~
rasz
Windows 95 had a ton of "clever" tricks, like the "OLE chicken": shimming in
a fake OLE instead of loading the real thing, just to display the desktop
faster. Executing anything triggered the real .dll load anyway, but the
official metric was showing the blue desktop....

[https://blogs.msdn.microsoft.com/oldnewthing/20040705-00/?p=...](https://blogs.msdn.microsoft.com/oldnewthing/20040705-00/?p=38573)

------
partycoder
If you want to have the most minimal Windows setup:

\- Don't use an antivirus

\- Stop unused services running in the background (e.g: services.msc)

\- Turn off all visual effects, including compositing and animation

Then you might want to set up a firewall to block all the nonsense like SMB,
NetBIOS, etc. You can also set up a cheap old machine to act as your firewall,
reverse proxy cache, antivirus/antispam, etc.

You can set up a script to turn on all the printing related services when you
are actually going to use a printer.

~~~
fra0
Pick up an LTSB release and remove the desktop (custom shell).

It looks a lot like Arch / Debian with first-class hardware support.

~~~
partycoder
While LTSB looks good (fewer bundled apps, that's great), it is very
dissimilar from Arch or Debian.

Regarding hardware compatibility, be aware that Debian supports many more
processor architectures than Windows does.

Maybe Windows supports some peripherals better, but Linux has improved a lot
in this respect. Chances are that out-of-the-box hardware support is better
on Linux than on Windows these days.

------
0xJRS
Is it really fair to compare something like Gvim to Slack, though? I'm
assuming Gvim is going to be orders of magnitude faster than an Electron app.

------
nwah1
I wonder if any compositing window managers support FreeSync. That could help
significantly, for any v-sync related latency.

Vulkan rendering would help as well.

~~~
floatboth
FreeSync only helps when you're running a GPU-heavy game that can't keep up
with the monitor's refresh rate (dips below 60/120/144/whatever Hz). All
desktop compositors definitely can and do render at your monitor's refresh
rate :)

Vulkan wouldn't help much. It has less overhead (no validation in production,
etc.), but GL/GLES are _plenty_ fast for any compositing tasks. The difference
might be completely negligible.

------
mschuster91
Well, that certainly explains why W7 with the Classic theme feels faster than
with the default Aero theme. Thanks to OP for digging!

------
wslh
Is this related to the latest Windows 10 updates, or was it always the case?
I mention this because, though it may be unrelated, there is some slow
drawing in the UI after the October major update. I can notice it when I log
in and the desktop is drawn.

~~~
yoz-y
The article mainly explores the difference between having DWM (Desktop Window
Manager) enabled and disabled. DWM's main role is to render all of the
windows into separate buffers in memory and then "compose" them on the fly.
This avoids, for example, the glitches you used to get when dragging a window
over a frozen application.

Since Windows 8, DWM cannot be disabled. In W7 it is also enabled by default
(the infamous Aero), but at least you can get rid of it.

------
autokad
WOW! I usually run about 7 Android emulators on one monitor, with my web
browser and other stuff on the other monitor. Things get really laggy
sometimes; turning off this feature made a HUGE difference. My experience has
improved dramatically.

------
jheriko
the argument about vsync and framebuffers seems mistaken. vsync only prevents
partial renders, not full renders, so disabling it does not remove latency in
most cases.

this article makes it sound like there is some magical way to draw directly
into the buffer without it being redrawn, which is not true. the best you can
get is a chance of faster drawing, because you can write into the buffer
while it's being drawn... (and probably get some tearing)

the idea that compositing is somehow slower is also very misleading... how
exactly is a stacked renderer faster?

i think the author is blaming a poor implementation on technical details that
they only partly understand.

~~~
badsectoracula
> vsync only prevents partial renders, not full renders, so disabling it
> does not remove latency in most cases,

V-Sync prevents partial frames precisely by making the swap wait, and this
"wait" is what adds the latency.

> the idea that compositing is somehow slower is also very misleading... how
> exactly is a stacked renderer faster?

Yes, compositing is slower: when an application wants to draw its window, it
needs to send the image to the compositor (or acquire the handle to the
backing texture for the window that the compositor uses, or whatever - those
are implementation details), and then the compositor will at some point later
draw it together with all the other windows (almost all compositors are
V-Synced, meaning that you will see at most 60Hz updates - or whatever refresh
rate your monitor is running at). This adds a very noticeable delay between
the program needing to update itself and the update being visible to the
user.

On the other hand, without a compositor and assuming a window system based on
clipping regions (like X11 and Windows without DWM) with direct to frontbuffer
drawing, the application will ask the window system to prepare for drawing in
the window (which usually means the window system will setup the clipping to
be inside the window's visible area), then perform the drawing directly on the
framebuffer and notify the window system that it is done (so that the clipping
stuff can go away). Notice how nothing here waits on anything else, like
v-sync (or any other interval) and how this totally ignores other windows -
each window draws itself immediately when needed instead of having to
orchestrate an update for all windows on the screen.

Of course, with the latter approach you do get tearing, since windows can
draw themselves during a monitor refresh, but whether that is a problem is up
to the user. Personally, I care so little about tearing that I barely notice
it, yet I immediately notice any sort of V-Sync or compositor-induced lag, so
I always try to avoid these.
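
A toy latency model of the two paths (illustrative numbers and invented
function names, not X11 or DWM APIs): the composited path holds an update
until the compositor's next vblank, while direct front-buffer drawing shows
it as soon as it is drawn:

```python
import math

REFRESH_MS = 1000 / 60  # a 60 Hz v-synced compositor

def composited_visible_at(draw_done_ms):
    """The update becomes visible at the next compositor vblank."""
    return math.ceil(draw_done_ms / REFRESH_MS) * REFRESH_MS

def direct_visible_at(draw_done_ms):
    """The update hits the front buffer immediately (possibly with tearing)."""
    return draw_done_ms

# A draw that finishes 1 ms into a refresh interval:
print(round(composited_visible_at(1.0), 1))  # 16.7 -- held until the vblank
print(round(direct_visible_at(1.0), 1))      # 1.0 -- on screen right away
```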

------
baxuz
I also noticed that there's a lot more compositing latency on macOS.

~~~
acdha
How did you measure this? That's contrary to what I've seen using
[https://itunes.apple.com/us/app/is-it-snappy/id1219667593](https://itunes.apple.com/us/app/is-it-snappy/id1219667593).

------
outworlder
The compositor adds some latency, yes.

It also removes latency for everything else. We no longer need to suffer when
apps are slow to redraw their content whenever we move windows around.

------
throwaway613834
Does anybody know if DirectX and/or OpenGL also suffer from the latency? For
example, would a Windows XP VM in Windows 10 have the one-frame lag?

------
oskenso
GNOME 3 is the worst at this, especially on Wayland. Moving your mouse while
the compositor is busy causes input to be lost as well.

------
arca_vorago
"use an operating system that has a stacking window manager"

Exactly, one more reason gnu+linux is the superior os.

------
rekshaw
As the author mentions, i3wm ftw. Seriously check it out.

------
antouank
i3wm is indeed amazing.

Good article. Just turned DWM off on the Win7 machine at work, and it's like
a free hardware upgrade! Everything is more responsive.

~~~
rasz
placebo, and as a bonus enjoy tearing in videos.

------
pmarreck
A 100Hz UWQHD monitor for work worked for me.

------
baybal2
The solution is simple - triple buffering

~~~
bwat49
afaik dwm does use triple buffering

------
fourthark
I am so glad I don't notice this.

------
dingo_bat
The data speaks for itself, but I'm trying to perceive any latency in my
Firefox (Win10) and am unable to. Typing seems instantaneous to me. I hate
the milliseconds the Start menu takes to animate, though. The Win7 Start menu
was instant.

~~~
malbertife
What I hate even more is when I press the Windows key and start typing an
application's name into the Start menu textbox, and it misses the first two or
three keypresses.

~~~
ben-schaaf
As a comparison point, gnome 3 does the right thing and buffers the input
until it can handle it. On an old machine I usually have the full name typed
out and enter pressed before any animation actually starts being displayed.

~~~
phkahler
How about they disable animation when windows are activated by keyboard? Or
just disable it entirely ;-)

