
Terminal and shell performance - darwhy
https://danluu.com/term-latency/
======
gnachman
iTerm2 author here.

I'll spend some time looking into iTerm2's latency. I'm sure there are some
low-hanging fruit here. But there have also been a handful of complaints that
latency was too low—when you hit return at the shell prompt, the next frame
drawn should include the next shell prompt, not the cursor on the next line
before the new shell prompt has been read. So it's tricky to get right,
especially considering how slow macOS's text drawing is.
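
One mitigation is to coalesce: after pty output arrives, wait a few milliseconds of quiet before drawing, so the frame that follows a keypress usually contains the complete prompt. Roughly (a simplified sketch, not the actual iTerm2 code; the grace period is a tuning knob):

    use std::time::{Duration, Instant};

    // Track when pty output last arrived and only draw once the stream has
    // been quiet briefly, trading a little latency for frames that contain
    // the whole prompt rather than a half-updated one.
    struct FrameScheduler {
        last_output: Option<Instant>,
        grace: Duration, // a few milliseconds; exact value is an assumption
    }

    impl FrameScheduler {
        fn on_pty_output(&mut self) {
            self.last_output = Some(Instant::now());
        }

        fn should_draw_now(&self) -> bool {
            match self.last_output {
                Some(t) => t.elapsed() >= self.grace,
                None => false,
            }
        }
    }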

If I could draw a whole frame in a reasonable amount of time, this problem
would be much easier! But I can't. Using Core Text, it can easily take over
150ms to draw a single frame for a 4k display on a 2015 macbook pro. The
deprecated core graphics API is significantly faster, but it does a not-so-
great job at anything but ASCII text, doesn't support ligatures, etc.

Using layers helps on some machines and hurts on others. You also lose the
ability to blur the contents behind the window, which is very popular. It also
introduces a lot of bugs—layers on macOS are not as fully baked as they are on
iOS. So this doesn't seem like a productive avenue.

How is Terminal.app as fast as it is? I don't know for sure. I do know that
they ditched NSScrollView. They glued some NSScrollers onto a custom NSView
subclass and (presumably) copy-pasted a bunch of scrolling inertia logic into
their own code. AFAICT that's the main difference between Terminal and iTerm2,
but it's just not feasible for a third-party developer to do.

~~~
tomjakubowski
> If I could draw a whole frame in a reasonable amount of time, this problem
> would be much easier! But I can't. Using Core Text, it can easily take over
> 150ms to draw a single frame for a 4k display on a 2015 macbook pro.

Holy cow! I wonder if iTerm2 would benefit from using something like
pathfinder[1] for text rendering. I mean, web browsers are able to render huge
quantities of (complex, non-ASCII, with weird fonts) text in much less than
150ms on OS X somehow; how do they manage it? Pathfinder is part of the answer
for how Servo does it, apparently.

[1]:
[https://github.com/pcwalton/pathfinder](https://github.com/pcwalton/pathfinder)

~~~
pcwalton
> Using Core Text, it can easily take over 150ms to draw a single frame for a
> 4k display on a 2015 macbook pro.

I'm guessing that you're somehow preventing Core Text from taking advantage of
its caching. Apple's APIs can be a bit fussy about their internal caches; for
example, the glyph cache is (or at least used to be) a global lock, so if you
tried to rasterize glyphs on multiple threads you would get stalls. Try to
reuse Core Text and font objects as much as possible. Also check to make sure
you aren't copying bitmaps around needlessly; copying around 3840x2160 RGBA
buffers on the CPU is not fast. :)

For a terminal, all you really need to do for fast performance is to cache
glyphs and make sure you don't get bogged down in the shaper. Pathfinder would
help when the cache is cold, but on a terminal the cache hit rate will be 99%
unless you're dealing with CJK or similar. There's a lot lower hanging fruit
than adopting Pathfinder, which is undergoing a major rewrite anyway.
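
Schematically, the cache is just a map from glyph key to rasterized bitmap (a simplified sketch; the names and the Bitmap type are placeholders, not Core Text or any real renderer's API):

    use std::collections::HashMap;

    #[derive(Clone)]
    struct Bitmap {
        width: u32,
        height: u32,
        pixels: Vec<u8>,
    }

    #[derive(Hash, PartialEq, Eq, Clone, Copy)]
    struct GlyphKey {
        font_id: u32,
        glyph_id: u32,
        px_size: u32,
    }

    struct GlyphCache {
        cache: HashMap<GlyphKey, Bitmap>,
    }

    impl GlyphCache {
        // Rasterize each (font, glyph, size) at most once, then reuse the
        // cached bitmap on every subsequent frame.
        fn get_or_rasterize(
            &mut self,
            key: GlyphKey,
            rasterize: impl FnOnce(GlyphKey) -> Bitmap, // the expensive call
        ) -> &Bitmap {
            // In a terminal the hit rate is ~99%, so the expensive
            // rasterization (and any global glyph-cache lock inside the OS)
            // is rarely touched.
            self.cache.entry(key).or_insert_with(|| rasterize(key))
        }
    }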

(I'm the author of Pathfinder.)

~~~
rsync
"iTerm2 author here"

...

"(I'm the author of Pathfinder.)"

I keep thinking that sometime soon the magic of HN will wear off, or we will
hit "peak hackernews" or something like that.

Not today!

~~~
nkristoffersen
Same reason I love HN. We don't have to guess what the creator was thinking.
They'll tell us :-)

------
nneonneo
I think one thing that this really points out is just how much care Apple has
poured into Terminal.app. It's _very_ good, and every time I have to use
another terminal application (ugh conhost.exe) I am reminded of this. It's got
a bunch of really thoughtful little features (showing what processes are
attached to the current pty, option-clicking to move the cursor rapidly, full
mouse support for apps like vim, good linewrap detection, and recently support
for rich-text copy/paste which is useful for showing coloured terminal output,
etc. etc.), and it remains really fast and snappy despite these features.

On a related note, I am big into latency analysis and driving down latency in
interactive systems. I'm quite familiar with the touchscreen work cited at the
top, and having played with the system I can attest that <1ms latency feels
actually magical. At that level, it really doesn't feel like any touchscreen
you've ever used - it genuinely feels like a physical object you're dragging
around (the first demo of the system only let you drag a little rectangle
around a projected screen). It's amazing what they had to do to get the
latency down - a custom DLP projector with hacked firmware that could only
display a little square at a specified position at thousands of FPS, a custom
touchscreen controller, and a direct line between the two. No OS, no windowing
system, nada. After seeing that demo, I can't help but believe that latency is
the one thing that will make or break virtual reality - the one thing that
separates "virtual" from "reality". I want to build a demo someday that does
the latency trick in VR - a custom rig that displays ultra-simple geometry
that has sub-millisecond latency to human head movement. I will bet that even
simple geometry will feel more realistic than the most complex scene at 90
FPS.

~~~
rsync
"It's got a bunch of really thoughtful little features (showing what processes
are attached to the current pty, option-clicking to move the cursor rapidly,
full mouse support for apps like vim, good linewrap detection, and recently
support for rich-text copy/paste which is useful for showing coloured terminal
output, etc. etc.), and it remains really fast and snappy despite these
features."

On the one hand, I have to agree that Terminal.app is quite good and very
impressive. I don't bother with a third party terminal application and _I do
everything in the terminal_.

However, one of the very valuable things about working in the terminal is the
safety and immunity that it provides. No matter what bizarro virus attachment
you send me, I can text edit it in a terminal without risk. There's nothing
you can paste to me in irc that will infect or crash my computer.

Or at least, that's how it should be.

But the trickier we get with the terminal - the more things it does "out of
band" and the more it "understands" the content of the text that it is
rendering, the more of this safety we give up.

Frankly, it bothers me greatly that the terminal would have _any idea
whatsoever_ what text editor I am running or that I am running a text editor
at all. It bothers me even more to think that I could copy or paste text and
get results that were anything other than those characters ...

Make terminals fancy at your peril ...

~~~
comex
> Frankly, it bothers me greatly that the terminal would have any idea
> whatsoever what text editor I am running or that I am running a text editor
> at all.

I'm not sure what you mean by this. Terminal.app doesn't know that you're
running a text editor. It does know that you're running a process called
'vim', which is kind of magic, but not too much (ps has always been able to
show what processes are attached to a given tty, for example). If you're
referring to the parent comment's "full mouse support for apps like vim", they
just mean it supports "mouse reporting" control sequences, which date back to
xterm. If anything, Terminal.app is late to support this (only in the latest
release, whereas alternatives like iTerm have supported it for ages).

> It bothers me even more to think that I could copy or paste text and get
> results that were anything other than those characters ...

Well, the terminal has to interpret escape sequences for colors and such in
order to display them, so why shouldn't it also preserve that metadata when
copying and pasting? Like any other rich text copy+paste, it will only be kept
if you paste into a rich text field; discarded if you paste into a plaintext
field.

That said, there are a few 'odd' things Terminal.app supports: e.g. printf
'\e[2t' to minimize the window. (This also comes from xterm.)
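
For reference, the mouse-reporting handshake an application performs looks roughly like this (a simplified sketch; mode 1000 is xterm's basic button-tracking mode):

    use std::io::{self, Write};

    fn main() -> io::Result<()> {
        let mut out = io::stdout();
        out.write_all(b"\x1b[?1000h")?; // ask the terminal to report button presses
        out.flush()?;
        // While enabled, each click arrives on stdin as ESC [ M Cb Cx Cy
        // (a real program would switch the tty to raw mode and parse those bytes).
        out.write_all(b"\x1b[?1000l")?; // turn reporting back off before exiting
        out.flush()
    }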

------
jwilm
We can do better in Alacritty. For those interested, I've filed a bug on our
issue tracker about where this latency is coming from and what can be done:
[https://github.com/jwilm/alacritty/issues/673](https://github.com/jwilm/alacritty/issues/673)

At the end of the day, there is a trade-off to be made. Terminals (or any
program, really) can have 1-frame input latency (typically 1/60 sec) by giving
up v-sync and accepting tearing, or they can have a worst-case 2-frame input
latency with v-sync, and then you're looking at 2/60 sec, or ~33ms.

~~~
zokier
Triple buffering solves the vsync latency issue by burning additional CPU/GPU
time, bringing the worst-case latency back to one frame.

~~~
taw55
I don't think that's correct.

The way I understand it, triple buffering _adds_ latency in exchange for
higher frame rates when you can't hit the display's refresh rate.

Double buffering renders the next frame while displaying the current one. That
results in one frame of latency from input. Triple buffering adds another frame
to the queue, resulting in a 2-frame lag.

With double buffering the framerate gets cut in half if it cannot meet vsync,
with triple buffering it can also get cut in thirds. So double buffering is 60
-> 30, where the frame lasts 2 refreshes. Triple is 60 -> 40, where one frame
is displayed for 1 refresh and another is displayed for 2.

Nowadays it's probably better to use adaptive vsync, which simply disables
vsync when the framerate drops. This will reintroduce tearing, which might be
preferable in fast action games.

~~~
speleo_engr
Triple buffering confusingly means different things. Both you and the OP are
correct. The OP meant something in line with what Wikipedia says:
[https://en.wikipedia.org/wiki/Multiple_buffering#Triple_buff...](https://en.wikipedia.org/wiki/Multiple_buffering#Triple_buffering)

"In triple buffering the program has two back buffers and can immediately
start drawing in the one that is not involved in such copying. The third
buffer, the front buffer, is read by the graphics card to display the image on
the monitor. Once the image has been sent to the monitor, the front buffer is
flipped with (or copied from) the back buffer holding the most recent complete
image. Since one of the back buffers is always complete, the graphics card
never has to wait for the software to complete. Consequently, the software and
the graphics card are completely independent and can run at their own pace.
Finally, the displayed image was started without waiting for synchronization
and thus with minimum lag.[1]

Due to the software algorithm not having to poll the graphics hardware for
monitor refresh events, the algorithm is free to run as fast as possible. This
can mean that several drawings that are never displayed are written to the
back buffers. Nvidia has implemented this method under the name "Fast sync"."
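
In code, the "flip whichever back buffer finished most recently" behavior looks roughly like this (a simplified sketch with buffer indices only, no real GPU work):

    struct TripleBuffer {
        front: usize,         // buffer currently being scanned out
        ready: Option<usize>, // most recently completed back buffer, if any
        drawing: usize,       // back buffer the renderer is filling right now
    }

    impl TripleBuffer {
        fn new() -> Self {
            TripleBuffer { front: 0, ready: None, drawing: 1 }
        }

        // Renderer finished a frame: publish it and start drawing into the
        // remaining free buffer. It never waits for vsync.
        fn frame_done(&mut self) {
            let finished = self.drawing;
            self.drawing = (0..3)
                .find(|&i| i != self.front && i != finished)
                .unwrap();
            // May overwrite an older frame that was never shown.
            self.ready = Some(finished);
        }

        // At vsync, scan out the newest complete frame, so worst-case latency
        // stays around one refresh interval instead of two.
        fn vsync(&mut self) {
            if let Some(newest) = self.ready.take() {
                self.front = newest;
            }
        }
    }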

~~~
ploxiln
It annoys me that most games don't seem to offer this option - if they have
this level of control, they just offer vsync on or off. The AMD gpu control
panel can force triple buffering, but only for OpenGL, and I think the vast
majority of games on Windows use DirectX.

Triple buffering uses more display buffer memory, but roughly the same cpu/gpu
load as vsync-off. It's great for latency. It makes a whole lot of sense. It's
been around for like 20 years. But you rarely see it used ...

------
mikejmoffitt
Interesting results. I have loved XTerm for a long time because it "felt
snappy". On MacOS I've always preferred Terminal.app to the often recommended
iTerm2 for similar reasons.

I think it's funny to have the suckless project page for st go on and on about
how XTerm is clunky and old and unmaintainable, but the result of this small
and clean minimalist terminal is that it ends up a loser in terminal
performance, which subconsciously and consciously detracts from the experience.

~~~
gumby
I've really wondered that too: why is everyone recommending iterm? I'm glad
it's just a matter of taste -- I'm perfectly happy with Terminal (and running
a shell inside Emacs in my terminal :-).

~~~
patrec
It's a matter of taste in that both have features that the other doesn't have
(Terminal being faster is its main one, but a significant one).

As far as I know, in Terminal you can't use cmd as the meta key, which
immediately kills it for me as an emacs user (furthermore, in iTerm2 you can
set it up so that left cmd = meta, right cmd = cmd, which I find very useful).

~~~
gumby
You sure can (I assume you meant opt), and have been able to since the NeXT
days. There are many Emacs users at Apple (look at the Emacs key bindings in
the text widgets).

------
joosters
_the most common terminal benchmark I see cited (by at least two orders of
magnitude) is the rate at which a terminal can display output, often measured
by running cat on a large file. This is pretty much as useless a benchmark as
I can think of_

It's a really helpful benchmark, IMO, as it's the main problem I see with
different terminals. On a chromebook, most SSH clients are effectively useless
because if you accidentally run a command that prints a lot of output (even
just 'dmesg'), the terminal locks up for a huge amount of time, seconds or
even minutes. You can't even interrupt the output quickly.

I appreciate that it's a different problem to the latency that the OP is
trying to measure, but as a benchmark, it's actually very useful.

~~~
btilly
Later in the same paragraph he addresses your exact problem, and points out
that the speed of quitting a command that prints too much output is poorly
correlated with the speed of spewing output.

Given how easy it is to accidentally spew something that I don't want to wait
for, even if it is spewing quickly, I'm squarely with him in not caring about
the speed of display. Slow it down to just faster than my eyes can make sense
of it, and make ^C fast, and my life will be better.

~~~
Tyriar
I looked into this quite a bit when I was optimizing xterm.js.

Based on my findings, ^C responsiveness is highly related to the speed of the
output, because the process running in the shell may be way ahead of the
terminal's parsing/rendering. Imagine you run `cat foo`: the shell could take
around 1 second to send the output over to the terminal, but the terminal
might then take 10 seconds to parse and render it. So after 1 second a ^C will
actually do nothing because the cat call has already finished. This is the
case with Hyper: it hangs due to slow parsing and too much DOM interaction (as
hterm was not designed for this sort of thing).

There's actually a mechanism for telling the process to pause and resume
(sending XOFF/XON signals), which allows the terminal and shell to stay
completely in sync (^C very responsive). However, these only really work well
in bash as oh-my-zsh for example overrides the signal with a custom
keybinding. Related links:

Original PR:
[https://github.com/sourcelair/xterm.js/pull/447](https://github.com/sourcelair/xterm.js/pull/447)
Post-PR bug:
[https://github.com/sourcelair/xterm.js/issues/511](https://github.com/sourcelair/xterm.js/issues/511)
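
Schematically, the flow control looks something like this (a simplified sketch, not the actual xterm.js code; the thresholds and the pty handle are placeholders):

    use std::io::{self, Write};

    // ASCII flow-control bytes: DC3 pauses the producer, DC1 resumes it.
    const XOFF: u8 = 0x13;
    const XON: u8 = 0x11;

    // Pause the producer when the terminal's unparsed backlog grows too
    // large, and resume it once parsing has caught up. `pty` is whatever
    // handle writes to the pty master.
    fn throttle<W: Write>(pty: &mut W, backlog_bytes: usize, paused: &mut bool) -> io::Result<()> {
        const HIGH_WATER: usize = 256 * 1024; // assumption: tune to taste
        const LOW_WATER: usize = 32 * 1024;

        if !*paused && backlog_bytes > HIGH_WATER {
            pty.write_all(&[XOFF])?;
            *paused = true;
        } else if *paused && backlog_bytes < LOW_WATER {
            pty.write_all(&[XON])?;
            *paused = false;
        }
        Ok(())
    }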

~~~
scottlamb
If the sending process blocks after filling the pipe (on Linux, 512 bytes),
hitting ctrl-C should be effective—any of these terminals should be able to
sink that in negligible time.

If it instead takes a long time, there are probably large buffers between:

* If you're talking about ssh to a faraway machine, the TCP layer is probably responsible. (I'm not even sure if there's anything you can do about this; the buffer (aka "window size" in TCP terminology, plus the send and receive buffers on their respective ends) is meant to be at least the bandwidth-delay product, and as far as I know, the OS doesn't provide an interface to tell it you don't need a lot of bandwidth for this connection. It'd be nice if you could limit the TCP connection's bandwidth to what the terminal could sink.)

* If you're talking about something running on the machine itself, it's probably an over-large buffer inside the terminal program itself.

------
def-
Was curious and tried this out a bit on Linux+X11 on an i7 6700k with the
igpu:

    
    
                 stdout [MB/s]  idle 50 [ms]
        urxvt             34.9          19.8
        xterm              2.2           1.9
        rxvt               4.3           7.0
        aterm              6.0           7.0
        konsole           13.1          13.0  note: stops moving when printing large file
        terminator         9.1          29.4  note: stops moving when printing large file
        st                23.0          11.2
        alacritty         45.5          15.5

~~~
sim-
Can you paste or describe your measuring technique?

~~~
def-
For the throughput I created a file with 1 GB of random source code and ran
`time cat file`
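
Something like this will produce a suitable file (a rough sketch; any large plain-text file works, the exact content doesn't matter much):

    use std::fs::File;
    use std::io::{BufWriter, Write};

    // Write ~1 GiB of repeated source-code-like lines to big.txt for
    // `time cat big.txt` style throughput runs.
    fn main() -> std::io::Result<()> {
        let mut out = BufWriter::new(File::create("big.txt")?);
        let line = b"fn main() { println!(\"hello, terminal throughput\"); }\n";
        let target = 1usize << 30; // ~1 GiB
        let mut written = 0;
        while written < target {
            out.write_all(line)?;
            written += line.len();
        }
        out.flush()
    }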

Latency for input:
[https://github.com/pavelfatin/typometer](https://github.com/pavelfatin/typometer)

------
chubot
If you're sensitive to latency and run Linux, try hitting Ctrl-Alt-F1, and do
a little work in console mode at the terminal. (Ctrl-Alt-F7 to get back.)

For me this is a great illustration of how much latency there is in the GUI.
Not sure if everyone can feel it, but to me console mode is much more
immediate and less "stuffy".

~~~
ezequiel-garzon
I wonder how many folks use linux virtual consoles for day-to-day work. I
guess it's not practical for web developers given that you probably have to
switch to X to check results with a graphical browser, no?

~~~
Nemo157
I do :)

Specifically I run kmscon on one virtual terminal + tmux for scrolling/tabs,
then an X server running chromium on another. I'm not a web dev, but I am
continuously having to switch between the two for tracking merge requests,
ticket status, testing via our frontends etc.

It's not bad, but kmscon is pretty much abandoned at this point, and I don't
know of any way to have a setup like this run nicely on multiple monitors. It
was meant to be just an experiment at an über-minimal setup; I was planning to
switch back to my previous i3 + st based setup after a month or so, but now
it's been most of a year and I'm still using it for some reason.

I think the big thing I really enjoy about this setup is the complete lack of
window management. Even the minimal window management I had to do with i3 (1-2
terminals running tmux + 1-2 browser instances) is gone. It feels like that's
removed a small unnoticed stress point from my work day. If I ever get round
to setting up a window manager again I think I'm going to try and keep it
limited to 1 terminal + 1 browser instance and rely entirely on their internal
tab support.

~~~
chubot
That sounds interesting. I'm having trouble visualizing what you are doing
though.

Are you running a distro or did you build this setup yourself?

What role does kmscon play? I think you could just run raw tmux in one VT and
X in another? Although to me it seems slow to switch between the two.

It seems like Linux should support the multi-monitor setup as I said in a
sibling comment -- maybe I will take some time to investigate it.

Was latency one of your considerations, or was it mainly the lack of window
management?

If you have time a screenshot would be helpful :)

~~~
Nemo157
I'm running Arch linux, I just installed kmscon and enabled the
kmsconvt@tty2.service systemd service to have kmscon take over the second VT.
It includes its own login manager that replaces getty. I then also login to
the linux console on VT 3 and manually launch X from there (with just chromium
in my .xinitrc).

kmscon has better handling for colors, fonts, etc. than the linux console;
that's the only real reason I'm using it. On my laptop's builtin display I
have no delay switching between any of the linux console/kmscon/X; when I plug
in an external monitor I do get ~1-2 second delay switching from the linux
console/kmscon -> X, no delay the other way.

There does appear to be some bug with switching from X -> kmscon: it just
shows a black screen, but I've gotten used to switching X (VT 3) -> linux
console (VT 1) -> kmscon (VT 2), which seems to work around that. There's also
another bug where the Ctrl key seems to get stuck down in kmscon when
switching to it sometimes; it has only happened ~4 times in the last 8 months
and I can fix it by just running `systemctl restart kmsconvt@tty2` and
attaching to my tmux session again.

Since I'm not doing any frontend changes I don't ever really need to look at
both my terminal and browser at the same time, so I haven't taken the time to
see if I can have different VTs displaying on different monitors. I prefer
the portability of a laptop over having the most productive single-location
setup.

Latency was not at all a consideration, it was purely an exercise in how
minimal a setup I could have. I spent a couple of weeks without having X
installed and using command line browsers when I needed to, but using GitLab
and JIRA through command line browsers was a real pain (and if I recall
correctly some stuff was impossible).

A screenshot is difficult since it's multiple VTs and I don't think kmscon has
any kind of builtin screenshot support. Just imagine a full-screen terminal
with tmux, you hit Ctrl-Alt-F2, now it's a full screen browser; that's
basically it.

One other thing I do have set up to make life a little easier is some little
aliases syncing the X clipboard and the tmux paste buffer so I can copy-paste
between X and my terminal. And I have DISPLAY set in the terminal so things
like xdg-open and urlview can open stuff in my web browser.

------
Tyriar
I work on the VS Code terminal/xterm.js[1].

Hyper, which currently uses a fork of hterm, is in the process of moving over
to xterm.js due to the feature/performance improvements we've made over the
past 12 months. Hyper's 100% CPU/crash issue[2], for example, should be fixed
through some clever management of the buffer and by minimizing changes to the
DOM when the viewport will completely change on the next frame.

I'd love to see the same set of tests on Hyper after they adopt xterm.js
and/or on VS Code's terminal.

Related: I'm currently in the process of reducing xterm.js' memory
consumption[3] in order to support truecolor without a big memory hit.

[1]:
[https://github.com/sourcelair/xterm.js](https://github.com/sourcelair/xterm.js)
[2]:
[https://github.com/zeit/hyper/issues/94](https://github.com/zeit/hyper/issues/94)
[3]:
[https://github.com/sourcelair/xterm.js/issues/791](https://github.com/sourcelair/xterm.js/issues/791)

------
tomsmeding
> even the three year old hand-me-down laptop I’m using has 16GB of RAM

> on my old and now quite low-end laptop

Trust me, that's not a low-end laptop. Either that has the shittiest CPU ever
and a terribly mismatched amount of memory, or the author's view of what is
high-end or low-end is skewed; in either case, what's low-end nowadays would
be ≤4GB RAM. 16GB is LOTS, useful for developers that run large builds and/or
VMs regularly.

I very much like the rest of the article though, would love to see some
latency improvements here and there!

~~~
arghwhat
It may very well be your view that is skewed.

For user-facing terminals/workstations, I would consider ≤8GB low-end
("unusable"). 16GB would be mid-low ("usable"), 32GB would be mid-high
("comfortable"), ≥64GB would be high ("good").

For servers, ≤64GB is low-end, ≥512GB being high-end.

That 8GB would be considered low-end does not mean that no one uses it,
though. Some people might still rock 2GB laptops, or use the original 256MB Pi
1 as a light desktop.

~~~
tomsmeding
Apparently that's true then. I'm in the Netherlands, so maybe the US is
different in this regard. Thanks for clearing that up!

~~~
59nadir
No, you're completely right in that 16 GB RAM is not considered low-end. You
only have to browse for laptops for about 2 minutes to discover that 16 isn't
at all "low-end".

~~~
arghwhat
I think you missed the point I was trying to make entirely (that people's
definitions of "low end" differ, although "low-end" steadily moves up), but
from that perspective:

- 8GB is easily available and what most cheap laptops sport,

- 16GB is either default or an addon for cheaper laptops,

- 32GB is a premium that is not always available, and

- 64GB is usually only available in _huge_ workstation or gamer "laptops",
although there are some decent-size Dells with it.

While those are not my choice of metrics, they do seem to support the "low",
"mid-low", "mid-high" and "high" labels I personally added. At most, you could
argue that ≤8GB is low, 16GB is mid and ≥32GB is high. 10 years ago, 16GB
would have been high-end for a laptop, but no more.

------
dahart
I love the analysis of terminal latencies! And I'm in full agreement with the
overall goal of less latency everywhere. But, of course, I feel like picking a
few nits.

> And it turns out that when extra latency is A/B tested, people can and do
> notice latency in the range we’re discussing here.

Yes, this is true. But the methodology is important, and the test used doesn't
really apply to typing in terminals. The test isn't a "type and see if you can
tell it's slow" test, it's a hit the mark hand-eye coordination test,
something you don't do when typing text. Latency when playing Guitar Hero is
super duper important, way more important than most other games, which is why
they have a latency calibrator right in the game. Latency when playing a Zelda
game is a _lot_ less important, but they still try very hard to reduce
latency.

The same people who can distinguish a 2ms difference in a drum beat can't
distinguish an extra 30ms of response time when they click a button in a
dialog box.

I'd like to see a stronger justification for why lower latency in a terminal
is just as important as it is for hand-eye coordination tasks in games.

    
    
      ~2 msec (mouse)
      8 msec (average time we wait for the input to be processed 
      by the game)
      16.6 (game simulation)
      16.6 (rendering code)
      16.6 (GPU is rendering the previous frame, current frame 
      is cached)
      16.6 (GPU rendering)
      8 (average for missing the vsync)
      16.6 (frame caching inside of the display)
      16.6 (redrawing the frame)
      5 (pixel switching)
    

I find this list pretty strange. It's generally right - there are a bunch of
sources of latency. But having done optimization for game consoles for a
decade, this explanation of game latency feels kinda weird.

Games that actually run at 60fps usually do not have more than 100ms of
latency. They also don't miss vsync every other frame on average; that 8ms
thrown in there looks bizarre to me. Render code and GPU rendering are
normally the same thing. Both current and previous frame GPU rendering are
listed, huh? Sim & render code run in parallel, not serially. The author even
said that in his article, but lists them separately... ?

Consumer TVs come with like 50ms of latency by default. That's often half of
it right there. Games are often triple-buffered too, that accounts for some of
it. The ~2ms right at the top belongs in the 8ms wait, it disappears
completely.

I just get the feeling the author of this list was trying hard to pad the
numbers to make his point; it feels like a very hand-wavy analysis
masquerading as a proper accounting of latency.

~~~
sunnyps
I had a similar discussion with a colleague a few weeks ago. They were double
counting latency in different pipeline stages when a delay in one stage causes
high latency in the other stage.

My analysis would be something like this (assuming 60 fps):

1. Input latency - variable from hardware

2. Input event queuing - avg 8 ms (input event arrives in frame X but is
handled in frame X+1)

3. Animation, game simulation, submitting GPU commands, other CPU work - 16.7 ms

4. GPU work - 16.7 ms

5. Display controller scanning out frame - 16.7 ms

6. Display panel latency - < 10 ms?

Steps 3 (specifically the submitting GPU commands part) and 4 can happen
within one frame interval so you can be double buffered in that case.
Otherwise you need triple buffering to maintain 60 fps. If you have a window
compositor, then you need to account for when that latches and releases the
buffer you submitted and if it performs copies or steals your buffer. This is
the part that's especially murky. Games have it easy with fullscreen non
compositing mode.

~~~
dahart
An interesting aspect of this is that for hand-eye coordination tasks, 60Hz
provides a perceptual smoothness advantage over 30Hz. There's some benefit to
having a higher frame rate even if it costs some latency. I don't know where
the trade off point is, but it's not always better to reduce latency if it
means a lower frame rate. It probably is always better to have both low
latency and high frame rate, but that's hard.

As far as the latency analysis, I'd probably just lump display latency all in
one and not try to break it down further. There's some number of ms it takes
from the GPU output until pixels are on-screen. Consumer TVs are worse than
computer LCDs because the TVs now are doing all kinds of filtering. OTOH,
consumer TVs are also trending toward 240Hz refresh, so there is a counter-
force causing latency to go down too.

Someone else here mentioned there are two kinds of triple buffering, which I
didn't know, but FWIW I was talking about the 3-frames of latency kind.
There's also a 2 frames of latency kind, and it sounds like that's the kind
you're talking about?

Anyway, I tend to think about game latency as simply 1, 2 or 3 frames of
latency, depending solely on how much buffering is going on. That explains
almost all game latency, and games are bad because they don't run at 60Hz:
tons of games run at 15-30Hz with double or triple buffering, so the latency
is automatically bad. No other sources of latency are needed to explain why.
There is the 1/2 frame of latency between input and when the system polls the
input and recognizes it; that's fair.
2.5 frames of latency, and then add on the display device latency, and that's
the sum total. It doesn't need to be made to look more complicated than that,
IMO.
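
For concreteness, here's the back-of-the-envelope version of that accounting (a rough sketch; the numbers are just assumptions):

    // N frames of buffering, plus half a frame of average input-poll wait,
    // plus whatever the display itself adds.
    fn estimated_latency_ms(buffered_frames: f64, refresh_hz: f64, display_ms: f64) -> f64 {
        let frame_ms = 1000.0 / refresh_hz;
        (buffered_frames + 0.5) * frame_ms + display_ms
    }

    fn main() {
        // Double-buffered 60Hz game on a consumer TV adding ~50ms:
        println!("{:.1} ms", estimated_latency_ms(2.0, 60.0, 50.0)); // ≈ 91.7 ms
    }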

------
chubot
If you care about end-to-end latency, I highly recommend this talk by John
Carmack:

[https://www.youtube.com/watch?v=lHLpKzUxjGk](https://www.youtube.com/watch?v=lHLpKzUxjGk)

This talk blew my mind and made me feel like a terrible engineer. He's talking
about end-to-end latency in VR, which actually has a commercial motivation
because VR products with high latency will make you sick. (this obviously
doesn't happen with shells and terminals!)

He's talking about the buffering and filtering at every step of the way. And
it's not just software -- sensors have their own controllers to do filtering,
which requires buffering, before your OS kernel can even SEE the the first
byte, let alone user space. On the other side, display devices and audio
output devices also do nontrivial processing after you've sent them your data.

It's an integrated hardware/software problem and Carmack is great at
explaining it! It's a dense, long-ish talk, but worth it.

------
chillee
One other thing to note is that compositors seem to add a fairly large amount
of latency. I ran the app linked in the "Typing with Pleasure" post and I saw
roughly a 20ms improvement across various text editors with the compositor
turned off (I'm using Termite with Compton as my compositor).

[http://imgur.com/0G3qbpr](http://imgur.com/0G3qbpr)

------
diggan
Fun test to check how fast the terminal can handle loads of text: run "time
head -c 1000000 /dev/urandom"

On a MacBookPro11,1 (2.8 GHz) this shows:

iTerm2: 2.182 total

iTerm2 with tmux: 0.860 total

terminal.app: 0.135 total

terminal.app with tmux: 0.910 total

Surprisingly, iTerm2 is faster with tmux than terminal.app is with tmux. But
terminal.app without tmux is the fastest.

Does anyone know why the performance with tmux is so different between the two
terminals?

~~~
mbernstein
My iTerm2 is taking 17s vs. 0.3s for Terminal.app. Any idea why yours is an
order of magnitude faster?

~~~
diggan
Hm, do you have the same CPU? This was tested with build 3.0.15.

------
gue5t
It'd be cool if someone would replicate this on Linux under X11 and a few
Wayland compositors, and throw urxvt, rxvt, xterm, aterm, and some others in
the mix.

~~~
bennofs
And alacritty
([https://github.com/jwilm/alacritty](https://github.com/jwilm/alacritty))
would be interesting to see as well.

------
mrob
The common LibVTE-based terminals have problems with latency because they're
deliberately capped at 40fps. Xterm doesn't have this problem.

Gedit gained this problem when it switched to GTK3. The forced animation for
scrolling doesn't help. Mousepad still has low latency, at least in the
version included with Debian Unstable, but I worry that the port of XFCE to GTK3
will make it as bad as GNOME.

------
cat199
So, on the other side, anyone want to build a true 'terminal emulator' that
has baud-speed emulation?

top just doesn't look the same without the changes trickling down the screen,
matrix like..

Thankfully I can run GlassTTY font connected to the KVM serial console for a
near approximation.. but it's still too fast :)

Grew up in the VC/GUI transition era, but buying a vt220 and running it at
19200 on a used Vax taught me a Zen of command line that nothing else could...
Not only did you have to think about what the command would do, but also how
much text it would display, and whether you'd need to reboot the terminal
after it got hosed up...

~~~
acuozzo
> So, on the other side, anyone want to build a true 'terminal emulator' that
> has baud-speed emulation?

So this isn't exactly the same, but I improved my Vim muscle memory
considerably by running the MS-DOS version of Vim inside of DOSBox, reducing
its speed of emulation to the lowest option, and finding the most efficient
ways to edit files at a decent pace by making use of the best command
sequences possible at any given point in time.

------
cannam
Interesting article, but I don't quite get it.

I'd imagine that terminal users will often be looking at the last bit of
output as they type, and hardly looking at the thing they're typing at all
(one glance before hitting Return). They aren't going to notice a bit of
latency. And terminals are often used to communicate over networks that
introduce a lot more latency than any of the measurements here.

I think, for me, this is a bit like the sales pitch for Alacritty -- fastest
terminal ever as long as you don't mind that it doesn't scroll. Someone is
using their terminal very differently from the way I use mine.

~~~
blunte
It depends how you use a terminal. If you type a lot of commands, and you use
emacs keys to jump around within the current line you're typing (like ctrl-p
to go up to the previously entered command, ctrl-a to jump to the beginning of
the line, replace the first word, etc.), those latencies add up.

Basically we want the stuff in our brains to be able to manifest itself as
fast as we're thinking it. Already having to drive human hands to make this
happen is a big penalty; we don't need unnecessary extra latency if it can be
helped.

Still waiting for that neural interface... plug me in please.

------
iClaudiusX
I found this note in the appendix interesting with respect to why we don't
seem to notice this latency:

> Terminals were fullscreened before running tests. This affects test results,
> and resizing the terminal windows can and does significantly change
> performance (e.g., it’s possible to get hyper to be slower than iterm2 by
> changing the window size while holding everything else constant).

Perhaps we don't notice because it's so much lower at window sizes much less
than fullscreen?

------
portlander12345
Slightly off-topic, but this reminded me of a mystery: Does anyone else
experience that bash takes a long and variable time to start, on many systems
and without any fancy setup of any kind? What can a shell be doing that it
takes three or four seconds to start?

~~~
cturner
Is part of this an experience of cygwin? Forking is slow under cygwin. If you
can get to a recent Windows 10 with local admin rights, you can install
windows-services for linux, and bash should perform well even on ten-year-old
hardware.

If you're getting unusual slowness in Linux/BSD, some things to check: (1) any
significant overhead on the system that could be making forking slow; (2) some
option in your $HOME/.bashrc that is adding latency, e.g., indexing options or
dynamic stuff in your PS1; (3) unusual filesystem stuff, e.g., your home
directory is mounted over NFS.

If you're in linux and it is reproducible, run strace against a bash launch,
and ctrl+c it when it gets stuck. Have a look at the recent system calls.

~~~
portlander12345
This is on Mac OS. I might try the tracing idea though.

------
chubot
How did he actually measure the latency in this article?

Doesn't measuring keypresses to display latency require special hardware?

 _All tests were done on a dual core 2.6GHz 13” Mid-2014 Macbook pro. The
machine has 16GB of RAM and a 2560x1600 screen. The OS X version was 10.12.5.
Some tests were done in Linux (Lubuntu 16.04) to get a comparison between
macOS and Linux. 10k keypresses were used for each latency measurement._

 _Latency measurements were done with the . key and throughput was done with
default base32 output, which is all plain ASCII text. This is significant. For
example, terminal.app appears to slow down when outputting non-latin unicode
characters._

------
wallstquant
It would also be great to see how these scale with terminal size. I personally
use iterm2 but after switching to a new Macbook pro this year with two 5k
displays it's noticeably slower. I'm assuming some O(n^2) scaling behind the
scenes but I haven't measured anything myself. Still, @gnachman I love your
term especially with terminalplots.jl and drawing Julia repl images inline.

------
victorhooi
I want to try Alacritty on OSX - but the big turnoff for me is the lack of
binaries.

However, I don't know if that's intentional, perhaps because they don't think
it's ready yet for people who won't install/compile the whole Rust stack from
scratch?

~~~
steveklabnik
> Precompiled binaries will eventually be made available on supported
> platforms. This is minimally blocked on a stable config format. For now,
> Alacritty must be built from source.

[https://github.com/jwilm/alacritty#about](https://github.com/jwilm/alacritty#about)

------
wmf
I guess this isn't measuring end-to-end latency which would be discretized in
units of frames and would have a constant overhead from the display pipeline.
I wonder if the differences between terminals would look much smaller if
measured that way.

------
chuckdries
I'm surprised Hyper did as well as it did on the latency test; after all, it
does run in Electron, which I'd expect to add a lot of overhead between
keypress and text display.

~~~
Tyriar
You should expect latency similar to a web app in Chrome, just that this has a
backing pty processing the input and output.

------
darklajid
> even the three year old hand-me-down laptop I’m using has 16GB of RAM

Oh my.. I wish that was the case. Even current development machines tend to
have just 8 around here..

~~~
cannam
I have a (three-year-old) laptop with 16GB at work and another (three-year-
old) laptop with 8GB here, and speaking as a C++-etc developer, I never notice
any difference. Maybe if you use a substantial number of VM environments.

------
aorth
Interesting to see that both alacritty and Terminal.app are very fast but
running tmux inside them kills performance.

~~~
busterarm
I've never once worried or felt bothered by performance issues while running
iTerm2, but I have in Terminal.app and I'm a heavy/complex tmux user.

------
mrbill
I wonder how much of a performance hit font antialiasing in iTerm2 causes, or
if it was turned on during these tests.

~~~
gjvc
font antialiasing on the original OS X completely destroyed performance in the
terminal and gave the impression that the whole system was slow.

------
tome
How can st be so slow? It's tiny and hardly does anything!

~~~
wmf
Maybe it's doing hardly anything one character at a time.

------
adekok
Latency is why I could never use a WYSIWYG word processor. While I don't like
VI that much, its latency is low enough that it's not a problem. I.e., I press
a key, and miracle of miracles, _the character appears on the screen_.

Using a WYSIWYG word processor, there's enough latency between keypress and
visual update that I find them impossible to use.

When Apple came out with Pages, it was apparent that they paid strong
attention to latency. That means the latency is small enough that (for me)
using it isn't an exercise in frustration.

~~~
glitch003
> I press a key, and miracle of miracles, the character appears on the screen.

As long as you press "i" first! :)

