First, it's possible to design an app around the compositor. Instead of sending a frame to the system, send a tree of layers. When updating the content or scrolling, just send a small delta to that tree. Further, instead of sending a single frame, send a small (~100ms) slice of time and attach animations so the motion can be silky smooth. In my experiments so far (which are not quite ready to be made public but hopefully soon), this gets you window resizing that almost exactly tracks the mouse (as opposed to lagging at least one frame behind), low latency, and excellent power usage.
Further, Microsoft engineers have said that hardware overlays are coming, in which the graphics card pulls content from the windows as it's sending the video out the port, rather than taking an extra frame time to paint the windows onto a composited surface. These already exist on mobile, where power is important, but making it work for the desktop is challenging. When that happens, you get your frame back.
So I think the answer is to move forward, embrace the compositor, and solve the engineering challenges, rather than move backwards, even though adding the compositor did regress latency metrics.
Edit: here's the talk I was referencing that mentions overlays: https://www.youtube.com/watch?v=E3wTajGZOsA
> Further, Microsoft engineers have said that hardware overlays are coming, in which the graphics card pulls content from the windows as it's sending the video out the port, rather than taking an extra frame time to paint the windows onto a composited surface.
This seems like the best solution to me. It allows apps to paint content intelligently to prevent overdraw while avoiding any latency.
I would jump at the chance to make that trade. My 8 year old CPU is rarely taxed by normal usage. What's the point in having a faster computer if it feels slower?
For modern GUI frameworks that were designed for the GPU, like MS WPF/XAML or Mozilla WebRender, overdraw just uses some GPU resources, but that’s it. GPUs are designed for much more complex scenes. In-game stuff like particles, lights, volumetric effects, and foliage involves tons of overdraw. In comparison, a 2D GUI with overdraw is a very light workload for the hardware.
Do you have a link or more detail about 'already cramped pipelines'?
Source: I've been working on GPU 2D rendering full time for years now.
Most common situation is the GPU isn't taxed at all, though, and doesn't even leave idle clocks. We end up just being limited by GL draw call overhead.
Which API does one use to do this?
I'm slightly surprised that hardware overlays aren't already a feature, especially given that they're handled by the graphics card. I know there's a special API for video overlay, especially DRM video (where part of the requirement is that the system doesn't allow it near the compositor where it could be screenshotted). Can you do video "genlock" on Windows? (edit: https://msdn.microsoft.com/en-us/library/windows/desktop/dd7... )
I'm also wondering how VR/AR stuff handles this on Windows.
It's possible some hardware already does overlays, given that the talk I linked above was 2 years ago. I haven't researched this carefully.
There's a bunch of stuff in the interface to support video and also integrated 3D content ("create swapchain for composition"), but I don't know how well it works. In my experiments and reading through the Chromium bug tracker, Microsoft's implementation of all this stuff is far from perfect, and it's hard, for example, to completely avoid artifacts on resizing.
All hardware does some overlays, though not a large number. (You can see Intel GPU overlay setup at .) One scanout overlay is already in use on all major OS's, to draw the mouse cursor.
A new character won't trigger a draw call though. Rather, the three characters will queue up in an internal buffer, and when the next draw is due, all three will be drawn.
My reason being, I had adjusted the screen brightness using software - but the mouse cursor is still brighter than white anywhere else. When I dragged a window, the cursor turns dim.
Yeah, because there isn't enough latency already.
So they are copying the Xerox Alto.
Some monitors have tons of input lag (60-70ms), and that's the time it takes for what you're typing to reach the display. The same delay applies to seeing your mouse cursor move.
I did a huge write up on picking a good monitor for development which can be found at:
I ended up buying the Dell UltraSharp U2515H http://amzn.to/2jF3WHp. It has very low input lag (10-15ms) and it runs natively at 2560x1440.
Of course, these tests are always performed with all built-in image processing disabled, those can easily add 2 frames of delay on modern televisions. MPEG deblocking and noise removal are performed by most TV brands to improve image quality at a cost of latency.
So it's the total response time from key press on a game controller to screen that is important, not the individual components. Sure, 0.04 seconds on its own is not a big deal, but when five or six things each take 0.04 seconds you hit noticeable delays.
(joking, but with a tiny serious element)
Even a simple 1920x1080 60Hz monitor needs 3 Gb/s, a 2560*1440 144Hz monitor 13 Gb/s, a 4k 60Hz 12 Gb/s.
DisplayPort 1.2 offers 17 Gb/s and DisplayPort 1.3 offers 26 Gb/s.
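For anyone who wants to check those figures, here is a rough sketch of the arithmetic. It assumes 24 bits per pixel and ignores blanking intervals and link encoding overhead, so real link requirements are somewhat higher; the function name is made up for illustration.

```python
def uncompressed_gbps(width, height, refresh_hz, bits_per_pixel=24):
    """Raw pixel data rate in Gb/s (no blanking, no link encoding overhead)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


# The three display modes mentioned above.
modes = {
    "1920x1080 @ 60Hz": (1920, 1080, 60),
    "2560x1440 @ 144Hz": (2560, 1440, 144),
    "3840x2160 @ 60Hz": (3840, 2160, 60),
}

for name, (w, h, hz) in modes.items():
    print(f"{name}: {uncompressed_gbps(w, h, hz):.1f} Gb/s")
```

The results land at roughly 3, 13 and 12 Gb/s, which matches the numbers quoted above.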
And there are high throughput devices like LIDARs (eg Velodyne LiDAR known from Google/Waymo self-driving cars) that are connected via Ethernet rather than USB.
Having had a 1 Gbit Ethernet LAN since 2004, I would like the option to upgrade to 10 Gbit home Ethernet in the near future.
Ethernet isn't optimized around short-haul signals, like computer to screen over a few metres at most, it's for 100m+ runs in datacentres. The signals have to be a lot more durable.
But Cat 6 Ethernet is 10 Gbit/s.
Really? You need more than 100 Gbit for a 5k monitor? Because there is a 40/100 Gbit Ethernet PHY standard right now (has been for a while, since 2011). And work is being done on up to 200 Gbit (due to be ratified in late 2018).
Granted, 40 Gbit cards are running $50 and 100 Gbit cards are running ~$700, but that's mostly from the low volumes involved.
Really though the larger issue is Ethernet connectors are not designed to be plugged in and removed regularly.
Well, one could easily use a smaller and easier plug while still being pin-compatible with Ethernet.
Even HDMI is better.
¹well, one of them. The crap LG monitors we have constantly forget about the peripherals that are attached, and one easy way to fix this is to detach/reattach the peripheral. I suspect that the company is never going to RMA these devices, sadly.
Outside HDMI and some DisplayPort cables, all display connectors click or screw in
And we’re talking about desktop OS and displays
Downsides: relatively bulky, pricey, no touchscreen or stylus input. (Upside of that: matt rather than glossy screen, which I think is difficult to combine with a touchscreen.)
At this point I've been using 1 for almost a year and it's been nothing but great. I want to get a second one and orient it vertically.
He didn't, so I did for him.
Sure, it's over $1k, but if you're making $10k/mo writing code on it, then you should be able to afford it.
hah... if only the world was silicon valley.
The only thing I don't quite like is the pixel size, it's a bit small for native scaling. I kind of wish i had gone for the 27". I went with two 25" because two 27" were a bit too big for me.
The 4k resolution and scaling can cause issues with some apps as you noted. This is especially true if you use multiple panels with heterogeneous resolutions, connect with displayport and occasionally turn one monitor off. This causes windows to move everything to the other display which is a mess. However after futzing with it for a few days post-purchase I had it all working well enough to deal with, even though I occasionally get slack in an over-maxed window on the smaller display. Btw this program is really useful for smoothing mouse movement between the larger and smaller displays in windows 10: https://github.com/mgth/LittleBigMouse
At 4k 150% scaling you end up with the same screen space as 2560x1440.
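The scaling claim is easy to verify. A tiny sketch, assuming the OS simply divides the physical resolution by the scale factor (the function name is hypothetical):

```python
def effective_desktop_size(width_px, height_px, scale_percent):
    """Logical desktop size after UI scaling: physical pixels / scale factor."""
    return (width_px * 100 // scale_percent, height_px * 100 // scale_percent)


print(effective_desktop_size(3840, 2160, 150))  # (2560, 1440)
```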
They've released the UP2718Q at a serious price difference. I'd like to see one in person, but a 3x price increase requires a lot of improvement over an already great monitor.
With glasses (20/20) I can comfortably read text from up to 3 feet away from the 25" version at 1:1 scaling. Even tiny text like the HN text field box.
Which is why it's hilarious that whenever I ask around for recommendations, especially on Reddit, everyone tries to peg it as a non-issue. As if your head never moves and you only have one monitor.
Pro: Cheap, low latency
Con: Everything else
> I used my own hacky measurement program written in C++ that sends a keypress to the application to be benchmarked. Then it waits until the character appears on screen. Virtual keypresses were sent with WinAPIs SendInput and pixels copied off screen with BitBlt.
So, this is measuring some rather artificial and fairly uninteresting time.
You really do need to measure the complete stack here, especially if your theory is that issues like vsync are at stake, because the level at which vsync happens can vary, and because there are interactions here that may matter. E.g. if there are 100 reasons to wait for vsync, and you remove one of them... you're still going to wait for vsync. It's not 100% clear that this measurement actually corresponds to anything real. Also, note that a compositor need not necessarily do anything on the CPU, so by trying to read back the composited image, you may inadvertently be triggering some kind of unnecessary (or OS-dependent) sync. E.g. it's conceivable that regardless of tearing on screen you want the read-back to work "transactionally", so you might imagine that a read requires additional synchronization that mere rendering might not.
And of course: all this is really complicated, and there are many moving parts we're bound to overlook. It's just common sense to try to measure something that matters, to avoid whole classes of systemic error.
Ideally, you'd measure from real keypress up to visible light; but at the very least you'd want to measure from some software signal such as SendInput up to an HDMI output (...and using the same hardware and software that is as similar as possible). Because at the very least that captures the whole output stack, which he's interested in.
Another advantage of a whole-stack measurement is that it puts things into perspective: say the additional latency is 8ms; then it's probably relevant to know at least roughly how large the latency overall is.
I built a browser latency benchmark based on a similar method: https://google.github.io/latency-benchmark/
I did plenty of testing and found that the measurements obtained this way do correlate with true hardware based latency measurement.
All that said, it is a travesty that modern OSes and hardware platforms do not provide the appropriate APIs to measure latency accurately. A lot of what is known about latency at low levels of the stack is thrown out before you get to the APIs available to applications.
On the same OS + drivers I'd be willing to believe that this measure is almost certainly useful (even there, it's not 100%). But it's exactly the kind of thing that a different way of, say... compositing... might cause the implementation to work a little differently, such that you're comparing apples to oranges. If BitBlt simply gets access a little earlier or later in the same pipeline, hey presto: you've got a difference in software that is meaningless in reality.
This, I think. When measuring latencies for e.g. reaction time experiments for neurophysiology I'd always use a photodiode or similar to figure out when things actually get on the screen, as that's the only thing which matters. IIRC this was always with V-Sync on, but still, with custom DirectX applications it was impossible to get lower than 2 frames of latency on Windows 7. So it has always been a bit of a mystery to me how these articles talk about keyboard-to-screen latency of < 1 actual frame. Maybe this explains it, i.e. they're not measuring actual latency from keyboard to screen? I just don't know enough about graphics, but even with V-Sync off I'm not sure you can just tell the GPU 'hey here's a pixel now put it on screen right now'?
My Amiga A1200 (only has extra RAM) feels faster and more responsive than any modern computer with a Windows or GNU/Linux desktop.
Layers of abstraction make complex tasks more reachable by a larger number of programmers by reducing the amount of specialist knowledge about those lower layers required to do the job. The more layers of abstraction you have, the less they have to worry about lower levels and can just get on with what they want to do.
Sometimes this is certainly true, but I think when it comes to abstraction layers, people tend to overestimate the benefits and underestimate the costs. Lately I've been thinking a lot about Joel's Law of Leaky Abstractions.
20 years ago we had native UI toolkits that were fast and responsive, relatively simple to program against, and yielded a common design language that was shared across all the apps on a given OS. Now we have monstrosities like Electron that are bulky and slow, yield non-native UIs across the board, and require programmers to understand a whole mess of technologies and frameworks to use effectively.
I mean, sure, now you don't have to rewrite your web code to build a desktop app, but don't get me started on the utter quagmire that is modern web development. These days it feels like software development has an infection, and instead of carving out the infection and letting it heal, we just keep piling more and more bandaids on top of it.
It wouldn't be so bad if the many layers of abstractions actually resulted in increased productivity, but they clearly don't.
Simplicity >> abstractions.
If true, it must be something very wrong with traditional ui systems?
Edit: my apologies, I didn’t read the article first and thought it measured complete feedback like “press ctrl-f and wait for element to popup”, but I’m interested in my question regarding games anyway.
The difference when it comes to why games can be snappier is that games are "allowed" to bypass most of the layers and cruft that exists between the user and the hardware, including the compositor that the linked article is talking about (in Windows at least).
Fortunately in Linux with Xorg you can get stuff on screen as fast as the code can ask for it, as long as you are not using a compositor (so you can even play games in a window with no additional lag!).
Hopefully the Wayland mania won't get to kill Xorg, since yet another issue Wayland has and X11 doesn't is that with Wayland you are forced into a compositing model.
Think about the actual operations that are involved. You certainly never want to render directly into the front-buffer (you'd end up scanning out partially rendered scenes). So you render to a back buffer. Which you then blit to the front buffer (assuming you're in windowed mode; in full-screen mode the compositor goes out of the way anyway and lets you just flip).
The only difference between the various modes of operation is who does that final blit. In plain X, it's the X server. In Wayland, it's the Wayland compositor. In X with a compositor, it's the compositor.
Now granted, some compositors might be silly and re-composite the whole screen each frame, but clearly that can be avoided in most cases.
Depending on the setup, there can also be some issues with the scheduling of the game's jobs vs. the compositor's jobs on the GPU. Valve are working on this currently for VR, since the problem is much more noticeable there -- clearly it can be fixed one way or another (on Radeon GPUs you could do a blit via asynchronous compute if need be, for example), but note that compositing actually doesn't change this issue (since the X server's jobs also need to compete with the game's jobs for scheduling time).
So if compositing has worse latency, it's because nobody has cared enough to polish the critical paths. Conversely, compositing clearly does have advantages in overall image quality. So why not fix the (entirely fixable) technical problems with compositing?
A major source of the compositor latency (or actually, the increased response time you get with a compositor) is that the "render to back buffer" (ie. the compositor's texture, in the best case) and the "blit to the front buffer" (which the compositor performs by drawing the window geometry) do not happen at the same time.
From a technical perspective it is perfectly possible for a compositor to create a tight integration between a program and the compositor itself: simply synchronize the program's updates with the compositor updates. Every time the program says "i'm done drawing" (either via an explicit notification to the compositor, via glXSwapBuffers or whatever), issue a screen update.
The problem however here is the compositor has to take into account multiple windows from multiple programs so you cannot have a single window dictating the compositor updates. Imagine for example two windows with animations running, one at 100fps and another at 130fps. Depending which window is active (assuming that the compositor syncs itself with the active window), it would affect the perception of other window's updates (since what the user will see will be at the rate of the foreground window's update rate). Moreover, beyond just the perception, it will also affect the other windows' animation loops themselves - if a background window finishes drawing itself and notifies the compositor while the compositor is in the middle of an update, the background window will have to wait until the update is finished - thus having the foreground window indirectly also affect the animation loops of the background windows. This can be avoided through triple buffering, but that introduces an extra frame of latency - at least for background windows.
So to avoid the above problems, what all compositors do is decouple window update notifications from screen updates and instead perform the screen updates at some predefined interval, usually the monitor refresh rate, synchronized to it. However, that creates the increased response time you get with a compositor, which is always a few milliseconds behind the user's actions. The most common example is window manipulation, like resizing and moving windows, lagging behind the mouse cursor (which is drawn by the GPU directly, thus bypassing the compositor).
Hence the linked article recommending a 144Hz monitor to avoid this, although this is just a workaround that makes the problem less visible but doesn't really solve it.
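To put rough numbers on why a faster monitor only shrinks the problem, here is a minimal sketch of the "wait for the next refresh" part of the delay. It is a deliberately simplified model: it assumes the compositor presents exactly on each refresh and ignores rendering and scanout time. The function names are made up for illustration.

```python
import math


def wait_until_next_vsync_ms(event_ms, refresh_hz):
    """Time an update finished at event_ms waits for the next refresh tick."""
    period = 1000.0 / refresh_hz
    next_vsync = math.ceil(event_ms / period) * period
    return next_vsync - event_ms


def average_wait_ms(refresh_hz, samples=1000):
    """Average wait for updates spread evenly across one refresh period."""
    period = 1000.0 / refresh_hz
    total = sum(
        wait_until_next_vsync_ms(i * period / samples, refresh_hz)
        for i in range(samples)
    )
    return total / samples


for hz in (60, 144):
    print(f"{hz}Hz: average wait ~{average_wait_ms(hz):.1f} ms, "
          f"worst case ~{1000.0 / hz:.1f} ms")
```

In this model the average wait is about half a refresh period, so going from 60Hz to 144Hz shrinks the wait but never removes it.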
Your example of 100fps vs. 130fps on the same screen is inherently unsolvable in a proper way with anything less than a 1300fps display. So you have a bunch of tradeoffs, and I'm sorry to say that if the tradeoff you prefer is tearing, you're in the losing minority by far.
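The 1300fps figure is just the least common multiple of the two frame rates: the lowest refresh rate at which every frame from both windows lands exactly on a refresh boundary. A quick sketch (helper names are hypothetical):

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def common_refresh_hz(rates):
    """Lowest refresh rate that is an exact multiple of every content rate."""
    return reduce(lcm, rates)


rates = [100, 130]
display_hz = common_refresh_hz(rates)
print(f"display would need {display_hz} Hz")
for r in rates:
    print(f"{r} fps content: one new frame every {display_hz // r} refreshes")
```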
That said, if you truly wanted to write a tearing Wayland compositor, you could easily do so, and in any case plain X is still going to work as well.
As for the compositor, it isn't impossible to create a Wayland "compositor" that draws directly on the front buffer either, it is just harder and pointless since Xorg exists :-P.
But yeah, if everyone abandons Xorg (and by everyone i mean Everyone, not just the popular kids) and nobody forks it (which i doubt it'll happen as there are a ton of people who dislike Wayland) and nobody else steps up to do something about it, then yeah, i'll most likely just make my own suckless Wayland compositor. I'd prefer the world to stay sane though so i can continue doing other stuff :-P.
Modern UI toolkits (WPF, QML, JavaFX) operate on a scene graph, so they work exactly the same. Android is slowly catching up; it's a disgusting mix of the worst of both worlds.
I mean christ, Windows has no support for remote windowing in its API and still has RDP, even for single windows.
Linux people just love to complain about any change that actually makes the system better because then they can't feel quite as elitist for using it.
Yes, but when you consider that the underlying "average" hardware of 2017 has a million times as many transistors and runs a thousand times as fast as the Amiga that seems less impressive.
Edit: I also like how OSX go-fullscreen animation is done. They render new window once (with e.g. lots of text) completely in background and simply scale old window to fullscreen with alpha-transition between the two. First few frames give enough time for new window to be rendered and then it magically appears as being live-resized. I suspect few users actually notice the trick.
We saw this particularly strongly with Android - circa 2011, Google realised that Android latency was having a major impact on UX, so they invested the resources to address it. Unfortunately, early architectural decisions meant that they never quite caught up with Apple, who had prioritised latency in iOS from the outset.
I've got a Blizzard 1230-IV with 128mb of RAM bunged into my A1200 at the moment and it runs rings around my desktop running Gnome.
I believe Wayland has made some latency improvements.
The main problem is that modern GPU rendering is “pipelined”, so it's entirely possible to have a drawing operation that takes 16ms and a compositing operation that also takes 16ms, and still have your application running at 60FPS, albeit 1 frame "behind" the input. Most developers are not aware of that. (Including me, until recently. I learned about this while trying to figure out why my VR application felt "wobbly", despite running at the recommended 90FPS) The HTC Vive ships with a really neat tool for visualizing that: http://i.imgur.com/vqp01xn.png
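A toy model of that pipelining effect, in case it helps. It assumes every stage is clocked at the pace of the slowest one and that stages for consecutive frames overlap; the function name and the model itself are illustrative only, not how any particular driver schedules work.

```python
def pipelined_frames(stage_ms, num_frames):
    """Return (frame, input_sampled_ms, on_screen_ms) for an overlapped pipeline.

    Each frame passes through every stage in order, while the next frame
    starts as soon as the first stage is free again.
    """
    tick = max(stage_ms)  # the slowest stage sets the frame interval
    frames = []
    for n in range(num_frames):
        start = n * tick
        done = start + tick * len(stage_ms)
        frames.append((n, start, done))
    return frames


# One 16ms drawing stage followed by one 16ms compositing stage.
frames = pipelined_frames([16, 16], 4)
interval = frames[1][2] - frames[0][2]  # time between displayed frames
latency = frames[0][2] - frames[0][1]   # input sampled -> on screen

print(f"new frame every {interval} ms, but each frame is {latency} ms old")
```

Throughput stays at one frame per 16ms (about 60FPS) while the latency is two stage times, which is the "1 frame behind" effect described above.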
I give a few more details here:
I guess a valid approach would be to double buffer all windows on the compositor side, and render the "stale" buffer of any window that fails to update within a specified frame budget (16ms - expected compositing time), that way at least well behaved apps would have no noticeable latency. There would probably need to be some level of coordination with apps that already do their own double buffering, not sure how that's currently handled. Perhaps a hybrid approach between compositing and direct rendering is also possible, where different screen regions get composited at different frame rates. (Should work as long as there's no transparency involved)
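A minimal sketch of that "fall back to the stale buffer" idea, purely as a thought experiment. The `Window` type, the buffer names, and the 2ms compositing estimate are all assumptions for illustration; no real compositor API is being modelled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Window:
    name: str
    last_presented: str                      # buffer shown in the previous composite
    pending: Optional[str] = None            # buffer the app is currently drawing
    pending_ready_ms: float = float("inf")   # when the pending buffer is finished


def composite(windows, deadline_ms):
    """Pick one buffer per window: the new one if ready in time, else the stale one."""
    shown = {}
    for w in windows:
        if w.pending is not None and w.pending_ready_ms <= deadline_ms:
            # The app met the budget: promote its new buffer.
            w.last_presented, w.pending = w.pending, None
            w.pending_ready_ms = float("inf")
        shown[w.name] = w.last_presented
    return shown


FRAME_MS = 16.0
EXPECTED_COMPOSITE_MS = 2.0                  # assumed cost of the composite itself
deadline = FRAME_MS - EXPECTED_COMPOSITE_MS

windows = [
    Window("editor", "editor#1", pending="editor#2", pending_ready_ms=5.0),
    Window("slow_app", "slow#1", pending="slow#2", pending_ready_ms=20.0),
]
print(composite(windows, deadline))
```

Here the well-behaved window gets its fresh buffer on screen this frame, while the slow one keeps showing its previous buffer and is picked up on a later composite.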
Compositing is a good thing, and in the vast majority of cases its latency isn't actually intrinsically higher than writing directly to the front buffer. Certainly its intrinsic latency is never higher than writing directly to the front buffer if you build a system without visual artifacts. (Because at the end of the day, all compositing does is shift around who does the job of putting things on the front buffer; the jobs themselves stay the same for all practical purposes.)
I even force vsync off system-wide where possible (that is in Windows, in Linux i haven't seen such an option and even in Windows DWM ignores the setting).
A more interesting comparison is BeOS: it had memory protection and it was (probably still is) way more responsive than Linux and Windows.
That said, with an SSD a computer's responsiveness is good enough, and if display latency bothers you, buy a monitor with a high refresh rate!
You'll have a fix much sooner than waiting for a software fix of the issue..
It is a full stack real world result - for comparison purposes it makes sense to measure only the software as in the article, but in reality you want to optimize everything. Especially screens can be quite bad - tens of milliseconds, up to 100 in the worst. USB lag is usually quite low - when I measured it once for low-latency serial comm it was usually < 2 ms.
Modern keyboards are another part of the problem. They can also take their sweet time from keypress until the packet appears on the USB bus. See https://danluu.com/keyboard-latency/
I spent a fair amount of time trying to find a keyboard to work on a device that I have that only works with high speed devices, 30 or 40 keyboards later I gave up... If someone actually knows of such a thing I would be interested. Same basic problem with mice.. I guess the thought-process is that hey USB2 supports split transactions, and the keyboard/mouse won't actually generate even 1.5M bit of data, so we are going to continue to sell the same basic mouse/keyboard interfaces we were selling 20 years ago wrapped in styling variations.
PS: Some of the physical button keyboards I found with configurable colors/etc, usb hubs, do support USB3... For the color controls, or hub. The keyboard endpoint is still at low speed...
SPI can be way faster than that I believe.
I don't claim that 15ms of lag is to be considered good, but the problem that one has to solve is debouncing the switches.
The only time there should be lag is when a very short keypress happens, the key up might be delayed while the controller rules out bounce.
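Here is a small sketch of that style of debouncing: report key-down on the very first closed reading, and only delay key-up until the contact has stayed open long enough to rule out bounce. The class name, the 5ms window, and the 1ms polling in the example are assumptions for illustration, not taken from any real keyboard firmware.

```python
class EagerDebouncer:
    """Report 'down' immediately; report 'up' only after a stable open period."""

    def __init__(self, debounce_ms=5):
        self.debounce_ms = debounce_ms
        self.pressed = False
        self.open_since = None  # time of the first open reading in the current run

    def sample(self, now_ms, raw_closed):
        """Feed one raw switch reading; return 'down', 'up', or None."""
        if raw_closed:
            self.open_since = None  # any closed reading cancels a pending release
            if not self.pressed:
                self.pressed = True
                return "down"       # no added latency on key-down
            return None
        if self.pressed:
            if self.open_since is None:
                self.open_since = now_ms
            elif now_ms - self.open_since >= self.debounce_ms:
                self.pressed = False
                self.open_since = None
                return "up"         # release confirmed after the debounce window
        return None


# Readings at 1ms intervals: press, one bounce at t=1, then a real release at t=4.
readings = [(0, True), (1, False), (2, True), (3, True), (4, False),
            (5, False), (6, False), (7, False), (8, False), (9, False)]

debouncer = EagerDebouncer(debounce_ms=5)
events = []
for t, closed in readings:
    event = debouncer.sample(t, closed)
    if event is not None:
        events.append((t, event))

print(events)  # [(0, 'down'), (9, 'up')]
```

The key-down arrives with zero added delay, and only the key-up is held back by the debounce window, which matches the point above.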
This methodology alone could account for the differences in timing between Win7 and Win10. For all we know, Win10 could just be slower at getting the pixels back to the program from BitBlt, or SendInput could be slower triggering events, or a multitude of other issues.
The best way to truly detect key-to-photon latency is with an external video recorder that has both the screen and keyboard in frame. Grant a few ms of noise for key travel distance.
I would also be curious to compare a D2D application versus a GDI application, as the majority of the work has gone to D2D in the last few years. Please note that a D2D application in this case means one using a swap chain and device, not an ID2D1HwndRenderTarget (this rasterizes and composites on the GPU but has GDI compatibility built in).
- Compositing is done on the GPU
- BitBlt is done on the CPU
- Copies from GPU -> CPU are slow
So, yeah, compositing adds a frame of VSync latency, but these measurements are complete bunkum.
In my experience the vast majority of games that I play don't suffer any perceptible ill effect from running in borderless fullscreen. If you're playing the game competitively it's another story, but for me running around shooting aliens in Destiny 2 or something I haven't noticed any degradation of my experience. There's of course the odd (usually older) game that doesn't support borderless fullscreen, but sometimes there are mods to support it.
On a related note, I have noticed that Outlook 2013 exhibits a notable lag between a keystroke and a character appearing in the message window. I have not done any measurements, but my best guess is that it is in the order of hundreds of milliseconds. If you type fast (I like to think that I do), Outlook can keep up throughput-wise, but this lag is terribly annoying.
Try switching to text-only mails, no zoom... and if you must write HTML mails, do not have an image that is larger size than the window. As soon as there is an image that doesn't fit into the window at 100% zoom Outlook begins to crawl.
These days, I do a lot more programming than sysadmin'ning and help desk, but when I was the IT support guy at our company, my overall impression of Office 2013 was not very good. I have seen it just stop working on a handful of computers (out of about 75-80, so that is a lot), in such a way that I could only "fix" it by uninstalling and reinstalling Office from scratch. On one of our CAD workstations, Outlook and Autodesk Inventor started a feud where an update to MS Office caused Inventor to crash, and the subsequent reinstallation of Inventor caused Outlook to crash when we tried to write an email. (Then we reinstalled Office, and then suddenly things worked magically, so I remain clueless as to what happened.) The latter may be Autodesk's fault as much as Microsoft's (I get the vague impression that they care even less about their software crashing than Microsoft, as long as the license is paid for). But the impression I get is that MS Office has suffered quite a bit over the years. Therefore I am not entirely unhappy about being stuck on Office 2007. I do miss OneNote, the one program from their Office suite I really like, but I have org-mode, so I can manage. ;-)
EDIT: Sorry for venting, that one has been building up for a long time.
I've also been researching about removing the triple buffer vsync on W10. It seems it was possible in the first builds by replacing some system files, but that option is gone now with the recent big releases.
Given that, I do not see the real reason why compositing would be needed on W10, as transparency etc. aren't important factors.
This is because using it feels like scrolling through molasses for whatever reason.
Every additional option (especially in the realm of video settings) opens the door for additional complexity, implementation error, and user error in unintentionally setting the undesired mode. It's perfectly understandable why window managers would settle on one or the other of two extremely different render-to-screen approaches, especially when general consensus for quite some time now in the graphics space has been that minimizing the potential for tearing is preferable.
An extract from the linked article:
> A major source of latency is key travel time. It’s not a coincidence that the quickest keyboard measured also has the shortest key travel distance by a large margin.
They're not measuring from when the signal is sent from the keyboard, they're measuring from when the force begins to apply on the key. If you have a clicky or tactile switch (Cherry MX Blues, Greens, Browns, Clears, etc) then the latency measured here will be way disproportionate to how it actually /feels/.
In Windows 7, classic mode disabled the DWM compositor and V-Sync. It's incredibly dumb that Microsoft would arbitrarily remove that feature in Windows 10 to push their ugly as sin post-metro UI.
Firefox's Servo engine can composite CSS elements/display lists together at 500 frames per second.
Maybe next version of Windows / Linux desktop should use FF's servo engine?
You could potentially reduce this delay as well by having the application and the compositor in communication. Since rendering is going to be synced to vblank if you can get the application to not try to sync as well and instead just notify the compositor when it is done drawing a frame you could potentially get the application drawing and the compositor drawing in the same vblank interval. This is what Wayland and DRI3 Present let you do in the Linux world, I assume Windows has something similar but you'd need to opt-in to it so I bet nothing uses it.
font-family: Verdana, Arial, Helvetica, sans-serif;
Yes I'm on mobile. My OnePlus 5 hides the first one or two pixels under the bezel if looking at it straight on, so the first character on each line gets a little cutoff. Not sure if this is just my model or if other phones do this also.
Also, Firefox (and Safari on iOS) should have "view text-optimised version" button in the URL, maybe that would help you here? I don't know if other browsers have it though.
I mean is there really a noticeable difference between say 20 and 40 ms?
> At least I can feel the difference when typing.
This looks way more like badly designed animations than some fundamental problem coming from the hardware.
The 'Fast Sync' option NVIDIA added to their drivers in the last year or two is a fix for the triple buffering problem - you get spare buffers, but instead of adding a frame of latency the GPU always grabs the most recently completed frame for scanout. Of course, if a compositor is involved you now need the compositor to do this, and then for the compositor to utilize this feature when presenting a composited desktop to the GPU. I don't think any modern compositor does this at present.
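A rough sketch of that "grab the most recently completed frame" rule, as I understand it from the description above. This is a simplified model with made-up timings, not NVIDIA's actual implementation, and it assumes frame completion times are listed in ascending order.

```python
def newest_frame_at_each_vsync(frame_done_ms, vsync_ms):
    """For each vsync, return the index of the newest frame finished by then.

    Returns None for a vsync at which no frame has completed yet.
    """
    picks = []
    for v in vsync_ms:
        ready = [i for i, done in enumerate(frame_done_ms) if done <= v]
        picks.append(ready[-1] if ready else None)
    return picks


# An app rendering a frame every 5ms on a display refreshing roughly every 16.7ms.
frame_done_ms = [5, 10, 15, 20, 25, 30, 35]
vsync_ms = [16.7, 33.3]

picks = newest_frame_at_each_vsync(frame_done_ms, vsync_ms)
for v, i in zip(vsync_ms, picks):
    age = v - frame_done_ms[i]
    print(f"vsync at {v} ms scans out frame {i}, finished {age:.1f} ms earlier")
```

Frames 0, 1, 3 and 4 are never shown in this example. The rendering work for them is thrown away in exchange for the displayed frame always being as fresh as possible.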
I find that about as likely as hardcoding the frames of animations and displaying them all regardless of the device's refresh frequency.
Comparing them on the same hardware?
I have a transwarped IIGS, which for CPU benchmarks is slower than most of the modern emulators, but on the actual hardware it's pretty amazing (particularly since it boots faster from a CompactFlash card than most Win10 PCs I've seen). I would guess that a USB keyboard->windows->emulator->app response->draw->Emulator->windows->gpu path is much, much longer than the IIGS keyboard poll rate draw cycle, even given the 1000x cycle/sec advantage a modern PC has.
ps: every time I boot an old desktop I always have this feeling. The new stacks are amazing, they do it all, but .. I love the immediate feeling of the old ones, even at some cost. And this comes from a compositing fetishist.