I'll spend some time looking into iTerm2's latency. I'm sure there are some low-hanging fruit here. But there have also been a handful of complaints that latency was too low—when you hit return at the shell prompt, the next frame drawn should include the next shell prompt, not the cursor on the next line before the new shell prompt has been read. So it's tricky to get right, especially considering how slow macOS's text drawing is.
If I could draw a whole frame in a reasonable amount of time, this problem would be much easier! But I can't. Using Core Text, it can easily take over 150ms to draw a single frame for a 4k display on a 2015 MacBook Pro. The deprecated Core Graphics API is significantly faster, but it does a not-so-great job with anything but ASCII text, doesn't support ligatures, etc.
Using layers helps on some machines and hurts on others. You also lose the ability to blur the contents behind the window, which is very popular. It also introduces a lot of bugs—layers on macOS are not as fully baked as they are on iOS. So this doesn't seem like a productive avenue.
How is Terminal.app as fast as it is? I don't know for sure. I do know that they ditched NSScrollView. They glued some NSScrollers onto a custom NSView subclass and (presumably) copy-pasted a bunch of scrolling inertia logic into their own code. AFAICT that's the main difference between Terminal and iTerm2, but it's just not feasible for a third-party developer to do.
Holy cow! I wonder if iTerm2 would benefit from using something like pathfinder for text rendering. I mean, web browsers are able to render huge quantities of (complex, non-ASCII, with weird fonts) text in much less than 150ms on OS X somehow; how do they manage it? Pathfinder is part of the answer for how Servo does it, apparently.
I'm guessing that you're somehow preventing Core Text from taking advantage of its caching. Apple's APIs can be a bit fussy about their internal caches; for example, the glyph cache is (or at least used to be) protected by a global lock, so if you tried to rasterize glyphs on multiple threads you would get stalls. Try to reuse Core Text and font objects as much as possible. Also check to make sure you aren't copying bitmaps around needlessly; copying around 3840x2160 RGBA buffers on the CPU is not fast. :)
For a terminal, all you really need to do for fast performance is to cache glyphs and make sure you don't get bogged down in the shaper. Pathfinder would help when the cache is cold, but in a terminal the cache hit rate will be 99% unless you're dealing with CJK or similar. There's a lot of lower-hanging fruit than adopting Pathfinder, which is undergoing a major rewrite anyway.
(I'm the author of Pathfinder.)
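For what it's worth, the "cache glyphs, skip the shaper" idea fits in a few dozen lines. Here's a minimal sketch in C against the Core Text API (the structure and names are mine, not iTerm2's code): resolve the ASCII glyph IDs once per font, then draw runs of cached glyphs with CTFontDrawGlyphs instead of building an attributed string and a CTLine per cell every frame.

    // Minimal sketch of an ASCII glyph cache on top of Core Text.
    // This illustrates the idea; it is not iTerm2's actual drawing code.
    #include <CoreText/CoreText.h>

    typedef struct {
        CTFontRef font;          // the one CTFontRef we keep reusing
        CGGlyph   glyphs[128];   // glyph ID for each ASCII code point
    } AsciiGlyphCache;

    // Fill the cache once per font; this is the only place shaping-ish work happens.
    static void cache_init(AsciiGlyphCache *cache, CTFontRef font) {
        UniChar chars[128];
        for (UniChar i = 0; i < 128; i++) chars[i] = i;
        cache->font = (CTFontRef)CFRetain(font);
        CTFontGetGlyphsForCharacters(font, chars, cache->glyphs, 128);
    }

    // Draw a run of ASCII cells at fixed-width positions: no attributed string,
    // no CTLine, just cached glyph IDs handed straight to Core Text.
    static void draw_ascii_run(const AsciiGlyphCache *cache, CGContextRef ctx,
                               const char *text, size_t len,
                               CGPoint origin, CGFloat cell_width) {
        CGGlyph glyphs[256];
        CGPoint positions[256];
        if (len > 256) len = 256;
        for (size_t i = 0; i < len; i++) {
            glyphs[i] = cache->glyphs[(unsigned char)text[i] & 0x7F];
            positions[i] = CGPointMake(origin.x + i * cell_width, origin.y);
        }
        CTFontDrawGlyphs(cache->font, glyphs, positions, len, ctx);
    }

Non-ASCII text, combining characters, and ligatures would still go through the full Core Text path, but as noted above, that path is cold for typical terminal workloads.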
I've confirmed that I always use the same NSFont in the attributed string used to create the CTLineRef.
If you're curious, the text drawing code is in drawTextOnlyAttributedStringWithoutUnderline:atPoint:positions:backgroundColor:graphicsContext:smear: and is located here: https://github.com/gnachman/iTerm2/blob/master/sources/iTerm...
If you're able to spare some cycles, please get in touch with me. firstname.lastname@example.org.
"(I'm the author of Pathfinder.)"
I keep thinking that sometime soon the magic of HN will wear off, or we will hit "peak hackernews" or something like that.
This draws the curves directly in the pixel shader.
TextEdit - http://i.imgur.com/RIDBuKP.png
Xcode - http://i.imgur.com/PYFLOxH.png
SublimeText - http://i.imgur.com/ZhnQR1v.png
CotEditor - http://i.imgur.com/J9TPiO6.png
I switched to iTerm2 recently after repeated Terminal.app slowdowns and crashes, and have no issues doing the exact same things I did in the previous application. Good work.
Wow, what were you doing? I think I've seen maybe one Terminal.app crash in the past few years, and I'm a very heavy terminal user.
I have no idea what the cause of the instability was but I couldn't fix it and couldn't tolerate it.
With regard to the shell prompt, I'm almost positive that Terminal.app is using some heuristic to wait for the next prompt after a <return>. A little experiment with a tunable spinloop and a fake prompt in C suggests that Terminal.app waits for about 1ms after a return character before updating the screen: a delay of 950us produces an "instant" prompt, while a delay of 1050us shows the cursor at the start of the next line. As the article notes, a 1ms delay is not really noticeable, and that kind of delay only has to happen in a handful of situations.
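For the curious, the experiment looked roughly like this (the exact delay value and the prompt string are just illustrative):

    /* fake-prompt.c: emit a newline, spin for N microseconds, then print a
     * fake prompt. Vary N to find the threshold at which the terminal draws
     * an intermediate frame with the cursor alone on the next line. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static void spin_us(long us) {
        struct timespec start, now;
        clock_gettime(CLOCK_MONOTONIC, &start);
        do {
            clock_gettime(CLOCK_MONOTONIC, &now);
        } while ((now.tv_sec - start.tv_sec) * 1000000L +
                 (now.tv_nsec - start.tv_nsec) / 1000L < us);
    }

    int main(int argc, char **argv) {
        long delay_us = argc > 1 ? atol(argv[1]) : 1000;
        printf("\n");             /* what the "command" outputs after return */
        fflush(stdout);
        spin_us(delay_us);        /* simulated command/shell latency */
        printf("fake-prompt$ ");  /* the next "prompt" */
        fflush(stdout);
        return 0;
    }

Running it with arguments on either side of 1000 and watching for the intermediate frame is the kind of bisection that produces the 950us/1050us boundary described above.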
There are a lot of reasons, mostly font rendering, Unicode handling, and visual effects, while pure text mode is basically just copying memory around and nothing more.
> You only need to be as fast as one screen refresh period, 1/60th of a second usually, plenty of cycles on modern cpus. Many games that are very complex graphically do it without problem
Rendering 60fps (or even 1,000fps) is not remotely the same as achieving low latency.
Even if a game is rendering at 60fps, input latency (the time between a user clicking a button and something happening on the screen) is often over 100ms.
This article breaks it down; there's a link to it in the original story about terminal latency posted by OP.
Your reply also reads like games are getting away with introducing 100s of milliseconds of additional latency, which just isn't the case for the most part. There is hardware and kernel buffering that userland software can do little about, admittedly, that can easily add up to getting you above 100ms as you say. Even there, some improvements can be made:
Gsync reduces latency by getting rendered frames displayed as quickly as possible, by tying refresh rates to frame rates. "Time warp" reprojects rendered scenes shortly before refresh to help reduce latency between head movements and screen movements in VR - but this is also effectively "improving" throughput and worst case latency, by ensuring there's always a new head-rotated frame - even if the main scene renderer wasn't able to provide a full scene update yet. There's some talk of going back to single buffering and chasing the beam, although I'm skeptical if that'll actually happen for anything but the most niche applications.
Moreover, text mode display hardware does not absolve one of Unicode handling.
Will you be releasing a beta version for users to test, or will any updates go straight to the main release version?
On a related note, I am big into latency analysis and driving down latency in interactive systems. I'm quite familiar with the touchscreen work cited at the top, and having played with the system I can attest that <1ms latency feels actually magical. At that level, it really doesn't feel like any touchscreen you've ever used - it genuinely feels like a physical object you're dragging around (the first demo of the system only let you drag a little rectangle around a projected screen). It's amazing what they had to do to get the latency down - a custom DLP projector with hacked firmware that could only display a little square at a specified position at thousands of FPS, a custom touchscreen controller, and a direct line between the two. No OS, no windowing system, nada. After seeing that demo, I can't help but believe that latency is the one thing that will make or break virtual reality - the one thing that separates "virtual" from "reality". I want to build a demo someday that does the latency trick in VR - a custom rig that displays ultra-simple geometry that has sub-millisecond latency to human head movement. I will bet that even simple geometry will feel more realistic than the most complex scene at 90 FPS.
On the one hand, I have to agree that Terminal.app is quite good and very impressive. I don't bother with a third party terminal application and I do everything in the terminal.
However, one of the very valuable things about working in the terminal is the safety and immunity that it provides. No matter what bizarro virus attachment you send me, I can text edit it in a terminal without risk. There's nothing you can paste to me in irc that will infect or crash my computer.
Or at least, that's how it should be.
But the trickier we get with the terminal - the more things it does "out of band" and the more it "understands" the content of the text that it is rendering, the more of this safety we give up.
Frankly, it bothers me greatly that the terminal would have any idea whatsoever what text editor I am running or that I am running a text editor at all. It bothers me even more to think that I could copy or paste text and get results that were anything other than those characters ...
Make terminals fancy at your peril ...
I'm not sure what you mean by this. Terminal.app doesn't know that you're running a text editor. It does know that you're running a process called 'vim', which is kind of magic, but not too much (ps has always been able to show what processes are attached to a given tty, for example). If you're referring to the parent comment's "full mouse support for apps like vim", they just mean it supports "mouse reporting" control sequences, which date back to xterm. If anything, Terminal.app is late to support this (only in the latest release, whereas alternatives like iTerm have supported it for ages).
> It bothers me even more to think that I could copy or paste text and get results that were anything other than those characters ...
Well, the terminal has to interpret escape sequences for colors and such in order to display them, so why shouldn't it also preserve that metadata when copying and pasting? Like any other rich text copy+paste, it will only be kept if you paste into a rich text field; discarded if you paste into a plaintext field.
That said, there are a few 'odd' things Terminal.app supports: e.g. printf '\e[2t' to minimize the window. (This also comes from xterm.)
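To make "interpret escape sequences for colors" concrete, here's a toy C sketch of handling a basic SGR sequence (real terminal parsers are full state machines covering hundreds of sequences; the function name and structure are made up):

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy handler for "ESC [ <params> m" (SGR). Returns bytes consumed,
     * or 0 if the input doesn't start with a CSI sequence. */
    size_t handle_sgr(const char *s) {
        if (s[0] != 0x1b || s[1] != '[') return 0;
        const char *p = s + 2;
        for (;;) {
            long code = strtol(p, (char **)&p, 10);
            if (code == 0)                      printf("reset attributes\n");
            else if (code >= 30 && code <= 37)  printf("foreground color %ld\n", code - 30);
            else if (code >= 40 && code <= 47)  printf("background color %ld\n", code - 40);
            if (*p == ';') { p++; continue; }   /* more parameters follow */
            if (*p == 'm') p++;                 /* end of the SGR sequence */
            return (size_t)(p - s);
        }
    }

    int main(void) {
        const char *sample = "\x1b[1;31m";   /* bold red foreground */
        printf("consumed %zu bytes\n", handle_sgr(sample));
        return 0;
    }

Copying "with color" just means keeping the attributes those codes set alongside the characters instead of throwing them away.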
I'm afraid there aren't many of us who care about the subtle details of interaction that mean the most to one's experience. Working in audio, I know the difference between hitting a button and hearing a sound 40ms afterwards versus 4ms afterwards. I would much prefer the 4ms, even if it means sacrificing half of the system's features.
I feel like such a product will never reach the market, because the market will think they need lots of features, which results in a sacrifice of latency and other UI consistencies. There's always some developer writing the weakest link of an otherwise perfect system. For example, the CPU/GPU hardware, kernel, and browser's accelerated rendering are all engineered with millions of man-hours to be as blazingly fast as possible, and then a web developer comes along and puts a single setInterval() call in their online game or something, and all the optimization benefit goes to the trash. Or, because animation is a trend in UI design right now, developers purposely put in hundreds of milliseconds of delay between common actions like minimizing windows, switching desktops, scrolling, opening/closing apps on mobile, etc.
Basically in order for your dream of true virtual reality to be achieved, the principles and respect for low latency has to be maintained across the whole system's stack, especially the higher-level parts.
I've dabbled in music, and I can fully agree with the audio latency comment. Human hearing is exquisitely tuned for latency (probably since it's integral to direction-finding), so even the slightest delay is noticeable. Hitting a note in an orchestra even a few ms late makes you stick out like a sore thumb.
iTerm2 is vastly better on mac, but still far inferior to gnome-terminal.
And: what's with "disk contention"? How do terminals do that and what's relevant to the discussion?
At the end of the day, there is a trade-off to be made. Terminals (or any program, really) can have 1-frame input latency (typically 1/60 sec) if they give up v-sync and accept tearing, or they can have a worst-case 2-frame input latency with v-sync, and then you're looking at 2/60 sec or ~33ms.
The way I understand it, triple buffering adds latency in exchange for smoother framerates when you can't hit the display's refresh rate.
Double buffering renders the next frame while displaying the current.
That results in latency of 1 frame since input.
Triple buffering adds another frame to the queue, resulting in a 2 frame lag.
With double buffering the framerate gets cut in half if it cannot meet vsync; with triple buffering it can also get cut in thirds. So double buffering is 60 -> 30, where the frame lasts 2 refreshes. Triple is 60 -> 40, where one frame is displayed for 1 refresh and another is displayed for 2 (two frames every three refreshes).
Nowadays it's probably better to use adaptive vsync, which simply disables vsync when the framerate drops. This will reintroduce tearing, which might be preferable in fast action games.
"In triple buffering the program has two back buffers and can immediately start drawing in the one that is not involved in such copying. The third buffer, the front buffer, is read by the graphics card to display the image on the monitor. Once the image has been sent to the monitor, the front buffer is flipped with (or copied from) the back buffer holding the most recent complete image. Since one of the back buffers is always complete, the graphics card never has to wait for the software to complete. Consequently, the software and the graphics card are completely independent and can run at their own pace. Finally, the displayed image was started without waiting for synchronization and thus with minimum lag.
Due to the software algorithm not having to poll the graphics hardware for monitor refresh events, the algorithm is free to run as fast as possible. This can mean that several drawings that are never displayed are written to the back buffers. Nvidia has implemented this method under the name "Fast sync"."
Triple buffering uses more display buffer memory, but roughly the same cpu/gpu load as vsync-off. It's great for latency. It makes a whole lot of sense. It's been around for like 20 years. But you rarely see it used ...
I think it's funny to have the suckless project page for st go on and on about how XTerm is clunky and old and unmaintainable, when the result of this small and clean minimalist terminal still loses out on terminal performance, which subconsciously and consciously detracts from the experience.
XTerm has the logic for handling partial screen updates and window obscuring that other terminals don't bother with, because it was written in an era when these weren't mere 10-50msec delays but 100+msec delays. Anyone who used dtterm on a Sun IPX knows what I'm talking about.
I'm also a Terminal.app user.
> alacritty and terminal.app are fast enough that they’re actually limited by the speed of tmux.
Initial testing has shown it not to (noticeably) impact perf in our highly unscientific benchmarks.
I'm a native Windows user these days, so I can't use it quite yet, and as such have fallen behind the times on news.
When new Mac users ask for general app recommendations, they often seem to get immediately steered away from Terminal.app and into iTerm. I'd understand this phenomenon if T.app was horrible, but it's rather good!
(I have a theory about the long shadow of Internet Explorer causing the use of stock OS apps to subconsciously feel passé)
And iTerm still has feature edges, e.g. truecolor, or better multiplexer support (Terminal only supports vertical pane splitting).
I'm 90% sure that's not accurate. It's had tabs for quite a while.
I don't think that's true. I definitely had tabs in Lion (10.7) and I'm pretty sure they're at least as old as Leopard (10.5) and maybe older.
Here I don't really see what you mean, since linux-st is the terminal emulator with the lowest latency (comparable only to alacritty), so it looks like it's simple (though not that simple) and quite fast (on Linux).
If you double-check the plots, you will notice that "linux-st" isn't performing badly at all. The author also suspects XQuartz as one reason for the higher latency, which makes absolute sense.
It's handy for keeping "tail" or "watch" commands or similar visible — the same reasons people use tmux, tiling window managers, so on.
The UX isn't perfect, but it's useful enough that I've stuck with iTerm despite the lower performance (and bugs — it's pretty buggy, and the main author rarely seems to address Gitlab issues).
iTerm has other nice features. It can run without a title bar (saves space), it does cmd-click-to-open-file, and it has a lot of customization options. I don't really use most of the features; the tiling aspect is the main feature I rely on.
My setup is to run MacVim on the left half of the monitor and then iTerm2 on the right half. iTerm is then split into generally three horizontal splits.
I love the tmux integration. I used tmux before anyway, and it's honestly not that different if you used tmux's built-in mouse support, but focus-follows-mouse in terminal panes is a nice touch.
Displaying images inline is alright I guess but I don't actually use it that much.
There's a bunch of stuff listed on their features page that sound useful but I don't actually use (yet). Idk I suppose I haven't noticed any appreciable difference in speed.
As far as I know, in Terminal.app you can't use cmd as the meta key, which immediately kills it for me as an emacs user (furthermore, in iTerm2 you can set it up so that left cmd = meta, right cmd = cmd, which I find very useful).
In my view, simplicity often leads to better performance as a side effect -- but of course there are many exceptions.
Nevertheless, I wouldn't start optimising software unless the software is really unusable. Optimising software to perform well in rare corner cases is not a good idea imho, if the price is adding a lot of complexity.
It's a really helpful benchmark, IMO, as it's the main problem I see with different terminals. On a chromebook, most SSH clients are effectively useless because if you accidentally run a command that prints a lot of output (even just 'dmesg'), the terminal locks up for a huge amount of time, seconds or even minutes. You can't even interrupt the output quickly.
I appreciate that it's a different problem to the latency that the OP is trying to measure, but as a benchmark, it's actually very useful.
> The closest thing that I care about is the speed at which I can ^C a command when I’ve accidentally output too much to stdout, but as we’ll see when we look at actual measurements, a terminal’s ability to absorb a lot of input to stdout is only weakly related to its responsiveness to ^C.
Given how easy it is to accidentally spew something that I don't want to wait for, even if it is spewing quickly, I'm squarely with him in not caring about the speed of display. Slow it down to just faster than my eyes can make sense of it, and make ^C fast, and my life will be better.
When SSHed into a remote machine, if I run a command that spews a lot of text, how quickly does the terminal respond to ^C, stop printing text, and return me to the prompt?
Based on my findings, ^C responsiveness is highly related to the speed of the output, because the process running in the shell may be way ahead of the terminal's parsing/rendering. Imagine you run `cat foo`: the shell could take around 1s to send the output over to the terminal, while the terminal might then take 10 seconds to parse and render it. So after 1 second a ^C will actually do nothing, because the cat call has already finished. This is the case with Hyper: it hangs due to slow parsing and too much DOM interaction (hterm was not designed for this sort of thing).
There's actually a mechanism for telling the process to pause and resume (sending the XOFF/XON control characters), which allows the terminal and shell to stay completely in sync (^C very responsive). However, this only really works well in bash, as oh-my-zsh for example overrides the key with a custom binding. Related links (there's a rough sketch of the idea after them):
Original PR: https://github.com/sourcelair/xterm.js/pull/447
Post-PR bug: https://github.com/sourcelair/xterm.js/issues/511
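The actual change lives in xterm.js, but the mechanism is simple enough to sketch in C terms (the names and thresholds here are made up): when the emulator's unparsed/unrendered backlog gets too deep, it writes an XOFF byte to the pty master so the line discipline blocks the application's writes, and it writes XON once the backlog drains.

    #include <stddef.h>
    #include <unistd.h>

    #define XOFF 0x13               /* DC3, ^S: "stop sending"   */
    #define XON  0x11               /* DC1, ^Q: "resume sending" */
    #define HIGH_WATER (1 << 20)    /* pause above ~1 MB queued (made-up number) */
    #define LOW_WATER  (64 << 10)   /* resume once below ~64 KB (made-up number) */

    static int paused;

    /* Call after reading from the pty, with the total bytes still waiting to
     * be parsed/rendered. Relies on the foreground program honoring ^S/^Q
     * (IXON), which is exactly what some zsh setups rebind away. */
    void apply_flow_control(int pty_master_fd, size_t queued_bytes) {
        if (!paused && queued_bytes > HIGH_WATER) {
            char c = XOFF;
            write(pty_master_fd, &c, 1);  /* line discipline suspends the app's output */
            paused = 1;
        } else if (paused && queued_bytes < LOW_WATER) {
            char c = XON;
            write(pty_master_fd, &c, 1);  /* let the app write again */
            paused = 0;
        }
    }

Because the producer stalls instead of racing minutes ahead of the renderer, a ^C reaches a process that is still actually running, which is what makes it feel responsive.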
If it instead takes a long time, there are probably large buffers somewhere in between:
* If you're talking about ssh to a faraway machine, the TCP layer is probably responsible. (I'm not even sure if there's anything you can do about this; the buffer (aka "window size" in TCP terminology, plus the send and receive buffers on their respective ends) is meant to be at least the bandwidth-delay product, and as far as I know, the OS doesn't provide an interface to tell it you don't need a lot of bandwidth for this connection. It'd be nice if you could limit the TCP connection's bandwidth to what the terminal could sink.)
* If you're talking about something running on the machine itself, it's probably an over-large buffer inside the terminal program itself.
If this becomes popular enough, then zsh will figure out how to offer the feature that bash already does, and terminals will happily adopt it. Then everyone's lives are better! :-)
I agree totally with the speed of display updates not really being important - if my terminal is spewing hundreds of pages of text, it doesn't matter whether it's redrawing at 50fps or 5fps. My hunch is that the slowest terminals are the ones that insist upon drawing every single character of output. They then end up with a huge buffer of text that needs to be rendered even though the ^C may have been sent and the noisy program has been terminated.
I trust the author's benchmarks over your experience. (Doubly so since my experience matches the author's.)
So why not (say) buffer it up somewhere, show a preview, show a message like "I would be spewing output right now; press <Space> to see the last screen, press Ctrl-C to stop, press 's' to just spew output". The speeds and amounts of data for which this happens could be completely configurable.
You can always put a limit on how much to buffer up (I'm not saying that this system should completely buffer output files until you run out of disk space), and sometimes 'spewing' is actually what we want.
The vt220 had a 'slow scroll' speed which would buffer text and scroll it at a viewable speed, limited by the quite small memory on the screen. It also had a 'pause' key which would pause the display and then continue the output (again limited by the terminal's memory). See also PC 'scroll lock' key.
The way to tackle this is not to "buffer output up". In fact, the way to tackle this is the opposite of filling up buffers with output.
It is to decouple the terminal emulation from the rendering. Mosh runs the terminal emulator on the remote server machine, and transmits snapshots of state (using a difference algorithm for efficiency) at regular intervals over the network to the client, which renders the terminal state snapshots to the display on the local client.
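A crude sketch of that decoupling in C (this is just the general shape of state synchronization, not Mosh's actual protocol; all the names are invented):

    #include <stdint.h>
    #include <stdio.h>

    #define ROWS 24
    #define COLS 80

    typedef struct { uint32_t codepoint; uint8_t fg, bg, attrs; } Cell;
    typedef struct { Cell cells[ROWS][COLS]; } Screen;

    /* Emit an update for every cell that changed since the last state the
     * client acknowledged. A real implementation would batch these into one
     * datagram and retransmit until acknowledged. */
    void send_diff(const Screen *current, const Screen *acked) {
        for (int r = 0; r < ROWS; r++) {
            for (int c = 0; c < COLS; c++) {
                const Cell *a = &current->cells[r][c];
                const Cell *b = &acked->cells[r][c];
                if (a->codepoint != b->codepoint || a->fg != b->fg ||
                    a->bg != b->bg || a->attrs != b->attrs) {
                    printf("update row=%d col=%d cp=U+%04X fg=%u bg=%u\n",
                           r, c, (unsigned)a->codepoint,
                           (unsigned)a->fg, (unsigned)a->bg);
                }
            }
        }
    }

If a program spews a thousand screenfuls between two snapshots, the client still only receives the difference to the latest state, roughly one screenful, which is why the approach stays responsive no matter how fast stdout is.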
Various tools can and do deal with the spew though. Offhand: script (typescript), screen, and tmux. If you're running a session through a serial terminal emulator (e.g., minicom), that would be another instance.
I'm not going to claim these are particularly mainstream uses (though I've made use of each of them, and been grateful for the ability to do so). But they do exist, and I suspect there are others.
    terminal      stdout [MB/s]   idle 50 [ms]
    urxvt              34.9           19.8
    xterm               2.2            1.9
    rxvt                4.3            7.0
    aterm               6.0            7.0
    konsole            13.1           13.0    note: stops moving when printing large file
    terminator          9.1           29.4    note: stops moving when printing large file
    st                 23.0           11.2
    alacritty          45.5           15.5
Side note: I was supervising some kid who absolutely couldn't believe that the reason his 'workstation was locking up' was that he was catting giant logfiles in a second pane of his single-process gnome terminal. I told him to use Xterm or something else, since they spawned one process per window, and he tried for a while, but went back and continued to complain, because he just couldn't believe that the terminal could get bogged down, and, further, he missed his pretty anti-aliased fonts.
Latency for input: https://github.com/pavelfatin/typometer
For me this is a great illustration of how much latency there is in the GUI. Not sure if everyone can feel it, but to me console mode is much more immediate and less "stuffy".
I'm a fast typist, but this is 2017 - computers are fast, right? Nope. On the newest, maxed-out Dell XPS there is a HUGE latency difference I feel if I use the pure Linux console vs any graphical one (be it GNOME, Windows, or Mac).
Typing and working with pure text is really fast and you INSTANTLY feel and see the difference. Try it.
It definitely has slow throughput at catting files, e.g. if I cat the output of "seq 100000". The latency seems better though. Probably not as good as text mode.
I honestly don't know what text/console mode even is. I know there is a VGA "spec" -- I think all graphics drivers for PC-compatible devices have to support VGA and text mode? Or is it part of the BIOS?
The easiest way to turn it off is to add "nomodeset" to your kernel commandline.
The correct term, by the way, is not "text mode", but rather "a virtual console" or "virtual tty" or such.
Hopefully that gives you enough search terms to learn more.
The confusion arises because people, as you have done, erroneously conflate the kernel virtual terminals with "text mode". In fact, the norm nowadays is for kernel virtual terminals to use graphics mode. It permits a far larger glyph repertoire and more colours, for starters, as well as things like software cursors that can be a wider range of shapes and sprites for mouse pointers.
The point being that it's not measuring latency, nor throughput of the rendering, but rather throughput of the emulation.
People notice the extra latency though. I sure do. I remember what it's like to have a CRT getting photons displayed nearly immediately after a keystroke. That's exactly what makes an old 286 feel snappier than a 2017 macbook pro while typing.
I have to look into Kmscon which seems promising.
Estimating, it takes at least 500 - 1500 ms. I have no idea what's happening here... if it's an X thing, a driver thing, etc.
I don't recall it being fast on any machine I've used recently. It's at least 100x slower than it should be to be usable -- it should be around 5 to 15 ms, or even less.
Specifically I run kmscon on one virtual terminal + tmux for scrolling/tabs, then an X server running chromium on another. I'm not a web dev, but I am continuously having to switch between the two for tracking merge requests, ticket status, testing via our frontends etc.
It's not bad, but kmscon is pretty much abandoned at this point, and I don't know of any way to have a setup like this run nicely on multiple monitors. It was meant to be just an experiment at an über-minimal setup; I was planning to switch back to my previous i3 + st based setup after a month or so, but now it's been most of a year and I'm still using it for some reason.
I think the big thing I really enjoy about this setup is the complete lack of window management. Even the minimal window management I had to do with i3 (1-2 terminals running tmux + 1-2 browser instances) is gone. It feels like that's removed a small unnoticed stress point from my work day. If I ever get round to setting up a window manager again I think I'm going to try and keep it limited to 1 terminal + 1 browser instance and rely entirely on their internal tab support.
Are you running a distro or did you build this setup yourself?
What role does kmscon play? I think you could just run raw tmux in one VT and X in another? Although to me it seems slow to switch between the two.
It seems like Linux should support the multi-monitor setup as I said in a sibling comment -- maybe I will take some time to investigate it.
Was latency once of your considerations, or was it mainly lack of window management?
If you have time a screenshot would be helpful :)
kmscon has better handling for colors, fonts, etc. than the linux console; that's the only real reason I'm using it. On my laptop's builtin display I have no delay switching between any of the linux console/kmscon/X; when I plug in an external monitor I do get ~1-2 second delay switching from the linux console/kmscon -> X, no delay the other way.
There does appear to be some bug with switching from X -> kmscon, it just shows a black screen, but I've gotten used to switching X (VT 3) -> linux console (VT 1) -> kmscon (VT 2) which seems to work around that. There's also another bug where the Ctrl key seems to get stuck down in kmscon when switching to it sometimes, has only happened ~4 times in the last 8 months and I can fix it by just running `systemctl restart kmsconvt@tty2` and attaching to my tmux session again.
Since I'm not doing any frontend changes I don't ever really need to look at both my terminal and browser at the same time, so I haven't taken the time to see if I can have different VTs displaying on different monitors. I prefer the portability of a laptop over having the most productive single-location setup.
Latency was not at all a consideration, it was purely an exercise in how minimal a setup I could have. I spent a couple of weeks without having X installed and using command line browsers when I needed to, but using GitLab and JIRA through command line browsers was a real pain (and if I recall correctly some stuff was impossible).
A screenshot is difficult since it's multiple VTs and I don't think kmscon has any kind of builtin screenshot support. Just imagine a full-screen terminal with tmux, you hit Ctrl-Alt-F2, now it's a full screen browser; that's basically it.
One other thing I do have set up to make life a little easier is a couple of little aliases syncing the X clipboard and the tmux paste buffer so I can copy-paste between X and my terminal. And I have DISPLAY set up in the terminal so things like xdg-open and urlview can open stuff in my web browser.
That is, the X server would only know about one monitor. But the kernel would know about both, and it could run processes connected to a TTY which writes to the second monitor. Rather than a TTY connected to an xterm connected to the X server. (I think that is the way it works)
This goes back to my question: is text mode part of the graphics driver or part of the BIOS? I assume the BIOS has no knowledge of dual monitors, but my knowledge is fuzzy there.
rxvt versions newer than 2.7.1 and, more recently, rxvt-unicode seem to have some other issues that make them really slow (particularly non-bitmap font rendering), but rxvt-unicode supports mixing bitmap and other (Terminus) fonts which seems to be a reasonable solution.
Copyright (C) 2016 Microsoft Corporation. All rights reserved.
Loading personal and system profiles took 542ms.
I think it doesn't display if your profiles take less than 500ms to load.
EDIT: Just tested with a clean profile with the line "Start-Sleep -m xxxx" for various values of xxxx, and the message has shown up with times just above 500ms but not below.
Hyper, which currently uses a fork of hterm, is in the process of moving over to xterm.js due to the feature/performance improvements we've made over the past 12 months. Hyper's 100% CPU/crash issue, for example, should be fixed through some clever management of the buffer and minimizing changes to the DOM when the viewport will completely change on the next frame.
I'd love to see the same set of tests on Hyper after they adopt xterm.js and/or on VS Code's terminal.
Related: I'm currently in the process of reducing xterm.js' memory consumption in order to support truecolor without a big memory hit.
> on my old and now quite low-end laptop
Trust me, that's not a low-end laptop. Either it has the shittiest CPU ever and a terribly mismatched amount of memory, or the author's view of what is high-end or low-end is skewed; in either case, what's low-end nowadays would be ≤4GB RAM. 16GB is LOTS, useful for developers who run large builds and/or VMs regularly.
I very much like the rest of the article though, would love to see some latency improvements here and there!
For user-facing terminals/workstations, I would consider ≤8GB low-end ("unusable"). 16GB would be mid-low ("usable"), 32GB would be mid-high ("comfortable"), ≥64GB would be high ("good").
For servers, ≤64GB is low-end, ≥512GB being high-end.
That 8GB would be considered low-end does not mean that no one uses it, though. Some people might still rock 2GB laptops, or use the original 256MB Pi 1 as a light desktop.
- 8GB is easily available and what most cheap laptops sport,
- 16GB is either default or an addon for cheaper laptops,
- 32GB is a premium that is not always available, and
- 64GB is usually only available in huge workstation or gamer "laptops", although there are some decent-size Dells with it.
While those are not my choice of metrics, they do seem to support the "low", "mid-low", "mid-high" and "high" labels I personally added. At most, you could argue that ≤8GB is low, 16GB is mid and ≥32GB is high. 10 years ago, 16GB would have been high-end for a laptop, but no more.
> And it turns out that when extra latency is A/B tested, people can and do notice latency in the range we’re discussing here.
Yes, this is true. But the methodology is important, and the test used doesn't really apply to typing in terminals. The test isn't a "type and see if you can tell it's slow" test; it's a hit-the-mark hand-eye coordination test, something you don't do when typing text. Latency when playing Guitar Hero is super duper important, way more important than in most other games, which is why they have a latency calibrator right in the game. Latency when playing a Zelda game is a lot less important, but they still try very hard to reduce latency.
The same people who can distinguish between 2ms of difference in a drum beat also can't distinguish between an extra 30ms of response time when they click a button in a dialog box.
I'd like to see a stronger justification for why lower latency in a terminal is just as important as it is for hand-eye coordination tasks in games.
~2 msec (mouse)
8 msec (average time we wait for the input to be processed by the game)
16.6 (game simulation)
16.6 (rendering code)
16.6 (GPU is rendering the previous frame, current frame
16.6 (GPU rendering)
8 (average for missing the vsync)
16.6 (frame caching inside of the display)
16.6 (redrawing the frame)
5 (pixel switching)
Games that actually run at 60fps usually do not have greater than 100ms latency. They also don't miss vsync every other frame on average; that 8ms thrown in there looks bizarre to me. Render code and GPU rendering are normally the same thing. Both current and previous frame GPU rendering is listed, huh? Sim & render code run in parallel, not serially. The author even said that in his article, but lists them separately... ?
Consumer TVs come with like 50ms of latency by default. That's often half of it right there. Games are often triple-buffered too, which accounts for some of it. The ~2ms right at the top belongs in the 8ms wait; it disappears completely.
I just get the feeling the author of this list was trying hard to pad the numbers to make his point. It feels like a very hand-wavy analysis masquerading as a proper accounting of latency.
I noticed during my thesis on realtime systems that my brain had difficulties compensating for latency, and even more so for jitter (inconsistent latency). I make more typos when latency is high. I notice it when playing music in a high-latency context, or when an ssh connection has a high ping. We did an experiment slowing down mouse clicks by 100ms. Users hated the results.
I'm also more productive if I can notice my typos 100ms earlier, or confirm that everything I previously typed is good earlier. Even if it takes me 500ms to process the information displayed on screen, it stays on the critical path of the overall task speed performance.
My analysis would be something like this (assuming 60 fps):
1. Input latency - variable from hardware
2. Input event queuing - avg 8 ms (input event arrives in frame X but is handled in frame X+1)
3. Animation, game simulation, submitting GPU commands, other CPU work - 16.7 ms
4. GPU work - 16.7 ms
5. Display controller scanning out frame - 16.7 ms
6. Display panel latency - < 10 ms?
Steps 3 (specifically the submitting GPU commands part) and 4 can happen within one frame interval so you can be double buffered in that case. Otherwise you need triple buffering to maintain 60 fps. If you have a window compositor, then you need to account for when that latches and releases the buffer you submitted and if it performs copies or steals your buffer. This is the part that's especially murky. Games have it easy with fullscreen non compositing mode.
As far as the latency analysis, I'd probably just lump display latency all in one and not try to break it down further. There's some number of ms it takes from the GPU output until pixels are on-screen. Consumer TVs are worse than computer LCDs because the TVs now are doing all kinds of filtering. OTOH, consumer TVs are also trending toward 240Hz refresh, so there is a counter-force causing latency to go down too.
Someone else here mentioned there are two kinds of triple buffering, which I didn't know, but FWIW I was talking about the 3-frames of latency kind. There's also a 2 frames of latency kind, and it sounds like that's the kind you're talking about?
Anyway, I tend to think about game latency as simply 1, 2 or 3 frames of latency, depending solely on how much buffering is going on. That explains almost all game latency, and the games are bad because they don't run at 60Hz, tons of games run at 15-30Hz with double or triple buffering, so the latency is automatically bad. No other sources of latency are needed to explain why. There is the 1/2 frame of latency between input and when the system polls the input and recognizes it, that's fair. So a typical double-buffered game has 2.5 frames of latency, and then add on the display device latency, and that's the sum total. It doesn't need to be made to look more complicated than that, IMO.
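A back-of-the-envelope version of that frame counting (the constants are illustrative assumptions, not measurements):

    #include <stdio.h>

    int main(void) {
        double frame_ms      = 1000.0 / 60.0; /* 60 Hz refresh interval            */
        double sample_frames = 0.5;           /* average wait until input is polled */
        double buffer_frames = 2.0;           /* double buffering                   */
        double display_ms    = 10.0;          /* assumed panel/display processing   */

        double total = (sample_frames + buffer_frames) * frame_ms + display_ms;
        printf("~%.0f ms input-to-photon at 60 fps with double buffering\n", total);
        /* (0.5 + 2) * 16.7 + 10 ~= 52 ms; triple buffering adds another ~17 ms,
         * and a 30 fps game roughly doubles the frame terms. */
        return 0;
    }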
In that light, it totally makes sense that typing latency matters. If you aim to type in a smooth rhythm and use the screen as feedback, any hiccup will slow you down.
You can more easily hear a 10-20ms delay than you can see one; it's a physical feature of our human hardware. And hand-eye coordination tasks are all about anticipating an event. Hitting or catching a baseball, for example: we can see it coming, and the pattern of its trajectory is what allows us to compensate for the 100ms of delay in our nervous system and brain and achieve 1ms accuracy.
Neither of those is true for typing. Don't get me wrong; I want lower latency in my terminals and editors. I just don't buy that it's particularly important until the latency hits a threshold of badness, which is probably around 100ms. People largely aren't complaining about terminal latency nearly as much as they complain about video game latency, even though both are widely used.
The reason Guitar Hero has latency adjustment controls and almost no other games do is that it mixes audio with hand-eye coordination tasks. I can very easily notice 5ms of delay in Guitar Hero. But I have no idea what my terminal latencies are, and I generally don't care until it stalls for more than probably 200ms. It makes a very subtle responsiveness difference when there's an extra 30ms of latency while I type, but it doesn't make a large functional difference or compromise my ability to type in any easily felt or measurable way. With Guitar Hero, on the other hand, I drastically lose my ability to play the game when the latency is off by 20ms.
Anyway, I appreciate the response & discussion, but I still want to hear a stronger justification for typing latency being very important. There might be one, I just don't think I've heard it yet.
This talk blew my mind and made me feel like a terrible engineer. He's talking about end-to-end latency in VR, which actually has a commercial motivation because VR products with high latency will make you sick. (this obviously doesn't happen with shells and terminals!)
He's talking about the buffering and filtering at every step of the way. And it's not just software -- sensors have their own controllers to do filtering, which requires buffering, before your OS kernel can even SEE the first byte, let alone user space. On the other side, display devices and audio output devices also do nontrivial processing after you've sent them your data.
It's an integrated hardware/software problem and Carmack is great at explaining it! It's a dense, long-ish talk, but worth it.
On a MacBookPro11,1 - 2.8 GHz this shows:
iTerm2: 2.182 total
iTerm2 with tmux: 0.860 total
terminal.app: 0.135 total
terminal.app with tmux: 0.910 total
Surprisingly, iTerm2 is faster with tmux than terminal.app is with tmux. But terminal.app without tmux is the fastest.
Does anyone know why the performance with tmux is so different between the two terminals?
Gedit gained this problem when it switched to GTK3. The forced animation for scrolling doesn't help. Mousepad still has low latency, at least in the version included with Debian Unstable, but I worry that port of XFCE to GTK3 will make it as bad as GNOME.
top just doesn't look the same without the changes trickling down the screen, matrix like..
Thankfully I can run GlassTTY font connected to the KVM serial console for a near approximation.. but it's still too fast :)
Grew up in the VC/GUI transition era, but buying a vt220 and running it at 19200 on a used Vax taught me a Zen of command line that nothing else could... Not only did you have to think about what the command would do, but also how much text it would display, and whether you'd need to reboot the terminal after it got hosed up...
So this isn't exactly the same, but I improved my Vim muscle memory considerably by running the MS-DOS version of Vim inside of DOSBox, reducing its speed of emulation to the lowest option, and finding the most efficient ways to edit files at a decent pace by making use of the best command sequences possible at any given point in time.
I'd imagine that terminal users will often be looking at the last bit of output as they type, and hardly looking at the thing they're typing at all (one glance before hitting Return). They aren't going to notice a bit of latency. And terminals are often used to communicate over networks that introduce a lot more latency than any of the measurements here.
I think, for me, this is a bit like the sales pitch for Alacritty -- fastest terminal ever as long as you don't mind that it doesn't scroll. Someone is using their terminal very differently from the way I use mine.
Basically we want the stuff in our brains to be able to manifest itself as fast as we're thinking it. Already having to drive human hands to make this happen is a big penalty; we don't need unnecessary extra latency if it can be helped.
Still waiting for that neural interface... plug me in please.
> Terminals were fullscreened before running tests. This affects test results, and resizing the terminal windows can and does significantly change performance (e.g., it’s possible to get hyper to be slower than iterm2 by changing the window size while holding everything else constant).
Perhaps we don't notice because it's so much lower at window sizes much less than fullscreen?
If you're getting unusual slowness in Linux/BSD, some things to check: (1) any significant overhead on the system that could be making forking slow; (2) some option in your $HOME/.bashrc that is adding latency, e.g., indexing options or dynamic stuff in your PS1; (3) unusual filesystem stuff, e.g., your home directory is mounted over NFS.
If you're on Linux and it is reproducible, run strace against a bash launch, and ctrl+c it when it gets stuck. Have a look at the recent system calls.
It's instructive to dig in and debug it; you may be really surprised at the insane amount of stuff a shell can be doing in those 3-4 seconds. Just some examples of what my shells are doing: archiving my history (diffing & zipping files), bash completion, git completion, reading color definitions, running OS-specific scripts, running host-specific scripts, adding (reading & parsing) lots of aliases & functions, running /etc/bashrc (which on my macOS laptop is almost 3k lines long).
The last performance issue I had was my history archiving junk got too big, I was saving my full .bash_history into an archive every day without clearing .bash_history, so they kept getting bigger. Diffing against the previous day, and writing a smaller file, fixed it.
You can binary search by putting a couple of "date +%s%N" in your .bashrc or .bash_profile to find out what's taking the longest.
On my system, hitting Ctrl-Shift-T inside xterm is almost instant, and that starts a bash process. Likewise for Ctrl-B C in tmux.
Bash's startup files are annoying, but you can probably pinpoint the problem with strace.
Or maybe try running:
strace bash --norc --rcfile=/dev/null
bash-completion (available in many package managers) used to be one culprit, should be less of one now due to on-demand completion loading, but still noticeable. I don't use it. Another is rvm, nvm, etc (managers of ruby or nodejs versions for development, install process typically adds to your .bashrc).
Doesn't measuring keypresses to display latency require special hardware?
All tests were done on a dual-core 2.6GHz 13” Mid-2014 MacBook Pro. The machine has 16GB of RAM and a 2560x1600 screen. The OS X version was 10.12.5. Some tests were done in Linux (Lubuntu 16.04) to get a comparison between macOS and Linux. 10k keypresses were used for each latency measurement.
Latency measurements were done with the . key and throughput was done with default base32 output, which is all plain ASCII text. This is significant. For example, terminal.app appears to slow down when outputting non-latin unicode characters.
However, I don't know if that's intentional, because they don't think it's ready yet for people who won't install/compile the whole Rust stack from scratch?
Oh my.. I wish that was the case. Even current development machines tend to have just 8 around here..
With most WYSIWYG word processors, there's enough latency between keypress and visual update that I find them impossible to use.
When Apple came out with Pages, it was apparent that they paid strong attention to latency. That means the latency is small enough that (for me) using it isn't an exercise in frustration.
As long as you press "i" first! :)