Terminal and shell performance (danluu.com)
404 points by darwhy on July 18, 2017 | 204 comments

iTerm2 author here.

I'll spend some time looking into iTerm2's latency. I'm sure there is some low-hanging fruit here. But there have also been a handful of complaints that latency was too low: when you hit return at the shell prompt, the next frame drawn should include the next shell prompt, not the cursor on the next line before the new shell prompt has been read. So it's tricky to get right, especially considering how slow macOS's text drawing is.

If I could draw a whole frame in a reasonable amount of time, this problem would be much easier! But I can't. Using Core Text, it can easily take over 150ms to draw a single frame for a 4k display on a 2015 macbook pro. The deprecated core graphics API is significantly faster, but it does a not-so-great job at anything but ASCII text, doesn't support ligatures, etc.

Using layers helps on some machines and hurts on others. You also lose the ability to blur the contents behind the window, which is very popular. It also introduces a lot of bugs—layers on macOS are not as fully baked as they are on iOS. So this doesn't seem like a productive avenue.

How is Terminal.app as fast as it is? I don't know for sure. I do know that they ditched NSScrollView. They glued some NSScrollers onto a custom NSView subclass and (presumably) copy-pasted a bunch of scrolling inertia logic into their own code. AFAICT that's the main difference between Terminal and iTerm2, but it's just not feasible for a third-party developer to do.

> If I could draw a whole frame in a reasonable amount of time, this problem would be much easier! But I can't. Using Core Text, it can easily take over 150ms to draw a single frame for a 4k display on a 2015 macbook pro.

Holy cow! I wonder if iTerm2 would benefit from using something like pathfinder[1] for text rendering. I mean, web browsers are able to render huge quantities of (complex, non-ASCII, with weird fonts) text in much less than 150ms on OS X somehow; how do they manage it? Pathfinder is part of the answer for how Servo does it, apparently.

[1]: https://github.com/pcwalton/pathfinder

> Using Core Text, it can easily take over 150ms to draw a single frame for a 4k display on a 2015 macbook pro.

I'm guessing that you're somehow preventing Core Text from taking advantage of its caching. Apple's APIs can be a bit fussy about their internal caches; for example, the glyph cache is (or at least used to be) a global lock, so if you tried to rasterize glyphs on multiple threads you would get stalls. Try to reuse Core Text and font objects as much as possible. Also check to make sure you aren't copying bitmaps around needlessly; copying around 3840x2160 RGBA buffers on the CPU is not fast. :)

For a terminal, all you really need to do for fast performance is to cache glyphs and make sure you don't get bogged down in the shaper. Pathfinder would help when the cache is cold, but on a terminal the cache hit rate will be 99% unless you're dealing with CJK or similar. There's a lot lower hanging fruit than adopting Pathfinder, which is undergoing a major rewrite anyway.

(I'm the author of Pathfinder.)
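To make the "cache glyphs" advice concrete, here is a minimal sketch of a glyph cache. It is not iTerm2's or Core Text's code: `GlyphCache`, `fake_rasterize`, and `draw_screen` are hypothetical names, and the rasterizer is a stand-in that returns a string instead of a bitmap. It only illustrates why the hit rate climbs quickly when a screen keeps reusing the same few characters.

```python
class GlyphCache:
    """Caches rasterized glyphs keyed by (character, font name, point size)."""

    def __init__(self, rasterize):
        self._rasterize = rasterize  # expensive call: (char, font, size) -> bitmap
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, char, font, size):
        key = (char, font, size)
        if key in self._cache:
            self.hits += 1
        else:
            # Cold cache: pay the rasterization cost once, then reuse the result.
            self.misses += 1
            self._cache[key] = self._rasterize(char, font, size)
        return self._cache[key]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def fake_rasterize(char, font, size):
    # Hypothetical stand-in for a real rasterizer; returns a label, not pixels.
    return f"<bitmap {char!r} {font} {size}pt>"


def draw_screen(cache, lines, font="Menlo", size=12):
    # Look up every cell's glyph through the cache instead of rasterizing it.
    return [[cache.get(ch, font, size) for ch in line] for line in lines]
```

Drawing the same two short lines costs one miss per distinct character and hits for everything else; on a full screen of mostly ASCII text the ratio tilts much further toward hits.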

Thanks for the response. I have beat my head against a wall for months trying to get to the bottom of this. I already tried and abandoned using multiple threads within drawRect:, and I agree there is a global lock :). For the case I benchmark there are no bitmaps besides NSFillRect on the background color (which is surprisingly slow, but unavoidable without doing something weird like drawing background colors on a separate layer).

I've confirmed that I always use the same NSFont in the attributed string used to create the CTLineRef.

If you're curious, the text drawing code is in drawTextOnlyAttributedStringWithoutUnderline:atPoint:positions:backgroundColor:graphicsContext:smear: and is located here: https://github.com/gnachman/iTerm2/blob/master/sources/iTerm...

If you're able to spare some cycles, please get in touch with me. gnachman@gmail.com.

"iTerm2 author here"


"(I'm the author of Pathfinder.)"

I keep thinking that sometime soon the magic of HN will wear off, or we will hit "peak hackernews" or something like that.

Not today!

Same reason I love HN. We don't have to guess what the creator was thinking. They'll tell us :-)

Another recent new contender: http://sluglibrary.com

This draws the curves directly in the pixel shader.

I have a bunch of iTerm windows open on my MacBook Pro, I don't want my battery life to suffer

I can't see how it would. The GPU has to be accessed either way, so doing it directly should be the same or better than through an OSX API.

I'm home and was able to run some benchmarks. Looks like 3.1 has significantly better latency than 3.0 did, although there's still room for improvement to reach Terminal's performance. My test results are here: http://iterm2.com/misc/latency/

Off topic — GitHub should use this tool [1] to analyze the latency of Atom.

TextEdit - http://i.imgur.com/RIDBuKP.png

Xcode - http://i.imgur.com/PYFLOxH.png

SublimeText - http://i.imgur.com/ZhnQR1v.png

CotEditor - http://i.imgur.com/J9TPiO6.png

[1] https://github.com/pavelfatin/typometer


I've had no problems with latency, but I'd like to add something not in the original article as a latency-adjacent consideration: stability. Stability is an incredibly important attribute, since the output rate of a dead terminal is zero.

I switched to iTerm2 recently after repeated Terminal.app slowdowns and crashes, and have no issues doing the exact same things I did in the previous application. Good work.

> I switched to iTerm2 recently after repeated Terminal.app slowdowns and crashes

Wow, what were you doing? I think I've seen maybe one Terminal.app crash in the past few years, and I'm a very heavy terminal user.

I second that - I've never seen Terminal.app outright _crash_. The one thing that will hang it is copying MBs of text out of the buffer, but even then it doesn't crash (just stall for a long time).

I had the same problem. After using terminal.app exclusively for literally 13 years it became unstable and unusable for me on the new Touch Bar Mac and I switched to iTerm2. That is a noticeably slower program, especially with a big scroll back, but at least it's stable!

I have no idea what the cause of the instability was but I couldn't fix it and couldn't tolerate it.

You might be happy to know that macOS 10.12.6 (released today) says it "Improves the stability of Terminal app".

I had a similar issue recently with a touch bar mac. It would crash pretty reliably when deleting lines during interactive rebases. This was in tmux with EDITOR=vim, hitting dd.

I got multiple segfaults from Terminal.app when scrolling back in bash history inside tmux for very long commands (ones which would wrap). This was only on the recent MacOS beta & I haven’t seen it since the last point release.

Looks like macOS 10.12.6 (released today) says it "Improves the stability of Terminal app".

I had this problem extensively recently, using ssh when text fills the entire terminal.

What text rendering approach does Terminal.app use, then? Is it also some custom thing? Maybe it does its own font caching - given that the terminal uses fixed-size fonts with simple fonts (fixed-width, no kerning, ligatures, etc.), it seems quite believable that they would avoid the overhead of CoreText (which has to handle a LOT more font complexity).

With regards to the shell prompt, I'm almost positive that Terminal.app is using some heuristic to read the prompt after a <return>. A little experiment with a tunable spinloop and a fake prompt in C suggests that Terminal.app waits for about 1ms after a return character before updating the screen: a delay of 950us produces an "instant" prompt, while a delay of 1050us shows the cursor at the start of the next line. As the article notes, a 1ms delay is not really noticeable, and that kind of delay only has to happen in a handful of situations.
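The experiment described above could look roughly like the sketch below. The original was written in C; this is a Python approximation, so interpreter overhead adds to the nominal delay and the 950us/1050us thresholds will not carry over exactly. `PROMPT`, `spin_wait`, and `fake_prompt_session` are made-up names for illustration.

```python
import time

PROMPT = "fake$ "  # hypothetical prompt string


def spin_wait(delay_us):
    """Busy-wait for roughly delay_us microseconds without sleeping."""
    deadline = time.perf_counter_ns() + delay_us * 1000
    while time.perf_counter_ns() < deadline:
        pass


def fake_prompt_session(delay_us, lines, out):
    """Print a prompt, then re-print it delay_us after each input line arrives."""
    out.write(PROMPT)
    out.flush()
    for _ in lines:
        # By now the tty has already echoed the newline to the terminal;
        # the spin decides whether the prompt lands in the same frame.
        spin_wait(delay_us)
        out.write(PROMPT)
        out.flush()
```

To try it interactively, call `fake_prompt_session(950, sys.stdin, sys.stdout)` (after `import sys`), press return a few times, then repeat with a larger delay and watch whether the cursor is briefly visible at the start of the empty line.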

Nope, they use core text, same as me. And they do support ligatures, if the font has them. 1ms delay sounds like a reasonable heuristic.

I've never seen Terminal.app use ligatures. I just tested with Lucida Grande, which has ff/ffi/ffl ligatures, and Terminal.app didn't use them at all (it just spaced the characters out on a fixed-width grid). Can you give some examples where Terminal.app will use ligatures?

It uses ligature level 1. fi, etc., are level 2. Try a font like FiraCode to see ligatures on mostly punctuation like ==.

gnachman Thanks for Iterm2. We have many developers using it every day.

As much as I keep my fingers crossed for your efforts, you will not achieve pure text console speed (or even come close to it), such as the Linux console, which people can easily compare to iTerm.

There are a lot of reasons, mostly font rendering, Unicode handling, and all the effects, while pure text mode is simply copying memory areas and nothing more.

full answer: https://unix.stackexchange.com/questions/41225/can-a-termina...

You only need to be as fast as one screen refresh period, 1/60th of a second usually, plenty of cycles on modern cpus. Many games that are very complex graphically do it without problem. Maybe an opengl or similar renderer would alleviate the actual drawing of the glyphs on the screen.

> You only need to be as fast as one screen refresh period, 1/60th of a second usually, plenty of cycles on modern cpus. Many games that are very complex graphically do it without problem

Sorry, no. You're totally conflating two orthogonal concepts: throughput and latency.

Rendering 60fps (or even 1,000fps) is not remotely the same as achieving low latency.

Even if a game is rendering at 60fps, input latency (the time between a user clicking a button and something happening on the screen) is often over 100ms.
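A back-of-the-envelope sketch of that distinction: in a pipelined system every stage can finish one frame per refresh (so throughput stays at 60fps) while any single input still has to pass through all the stages in turn. The stage count of six used below is an illustrative assumption, not a measurement of any particular game, and `pipeline_stats` is a made-up helper.

```python
def pipeline_stats(stage_count, frame_time_ms):
    """Return (frames_per_second, input_to_display_latency_ms) for a full pipeline.

    Assumes every stage takes exactly one frame time and that the stages
    overlap, so a new frame leaves the pipeline every frame_time_ms.
    """
    throughput_fps = 1000.0 / frame_time_ms
    latency_ms = stage_count * frame_time_ms
    return throughput_fps, latency_ms


fps, latency = pipeline_stats(stage_count=6, frame_time_ms=1000 / 60)
print(f"{fps:.0f} fps, {latency:.0f} ms from input to display")
```

Adding a stage leaves the frame rate untouched and adds a full frame of latency, which is why a frame counter says nothing about how responsive input feels.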

This article breaks it down... there's a link to it in the original story about terminal latency posted by OP:


Some of the listed problems are throughput problems, not latency problems (both in this thread and the original article), so it's not a mistake to address both types of performance problems. And while throughput and latency might be orthogonal concepts, they're not nearly so orthogonal in practice when optimizing. Reducing e.g. an average 48ms frame time, or a 100+ms frame stall, will improve both throughput and worst case latency. There are a few optimizations which focus more exclusively on one or the other, but my experience is that they tend to be rarer.

Your reply also reads like games are getting away with introducing 100s of milliseconds of additional latency, which just isn't the case for the most part. There is hardware and kernel buffering that userland software can do little about, admittedly, that can easily add up to getting you above 100ms as you say. Even there, some improvements can be made:

Gsync reduces latency by getting rendered frames displayed as quickly as possible, by tying refresh rates to frame rates. "Time warp" reprojects rendered scenes shortly before refresh to help reduce latency between head movements and screen movements in VR - but this is also effectively "improving" throughput and worst case latency, by ensuring there's always a new head-rotated frame - even if the main scene renderer wasn't able to provide a full scene update yet. There's some talk of going back to single buffering and chasing the beam, although I'm skeptical if that'll actually happen for anything but the most niche applications.

A high quality GL renderer would be lovely. It would need to be very capable, and a lot of the time core text spends is on layout rather than rendering, but at least for the plain-ASCII case this would be a huge win. OTOH, it would always look a little off.

Nowadays, though, very few consoles actually use the display hardware in text mode.

Moreover, text mode display hardware does not absolve one of Unicode handling.

Good to read you are going to spend some time on it, because iTerm2 is my favourite on OSX and the difference vs Terminal.app surprised me quite a bit.

Will you be releasing a beta version for users to test, or will any updates go straight to the main release version?

I just wanted to say thanks, for taking the time to dig into this recent issue: https://gitlab.com/gnachman/iterm2/issues/794 in regards to IO lag that is pretty relevant to this discussion. I was pretty surprised to realize how much faster Terminal was in those tmux side-by-side videos.

Alacritty has an interesting reason for tearing:

* https://github.com/jwilm/alacritty/issues/598

Thanks for iTerm2. When I moved from Linux to macOS, I installed it and loved it. It's one of those apps that I simply always have open.

Any thoughts on rewriting iTerm2 in Rust?

I don't honestly see how that would help. A rewrite is a lot of work, and it doesn't seem the bottlenecks here are language-specific anyway.

I think one thing that this really points out is just how much care Apple has poured into Terminal.app. It's very good, and every time I have to use another terminal application (ugh conhost.exe) I am reminded of this. It's got a bunch of really thoughtful little features (showing what processes are attached to the current pty, option-clicking to move the cursor rapidly, full mouse support for apps like vim, good linewrap detection, and recently support for rich-text copy/paste which is useful for showing coloured terminal output, etc. etc.), and it remains really fast and snappy despite these features.

On a related note, I am big into latency analysis and driving down latency in interactive systems. I'm quite familiar with the touchscreen work cited at the top, and having played with the system I can attest that <1ms latency feels actually magical. At that level, it really doesn't feel like any touchscreen you've ever used - it genuinely feels like a physical object you're dragging around (the first demo of the system only let you drag a little rectangle around a projected screen). It's amazing what they had to do to get the latency down - a custom DLP projector with hacked firmware that could only display a little square at a specified position at thousands of FPS, a custom touchscreen controller, and a direct line between the two. No OS, no windowing system, nada. After seeing that demo, I can't help but believe that latency is the one thing that will make or break virtual reality - the one thing that separates "virtual" from "reality". I want to build a demo someday that does the latency trick in VR - a custom rig that displays ultra-simple geometry that has sub-millisecond latency to human head movement. I will bet that even simple geometry will feel more realistic than the most complex scene at 90 FPS.

"It's got a bunch of really thoughtful little features (showing what processes are attached to the current pty, option-clicking to move the cursor rapidly, full mouse support for apps like vim, good linewrap detection, and recently support for rich-text copy/paste which is useful for showing coloured terminal output, etc. etc.), and it remains really fast and snappy despite these features."

On the one hand, I have to agree that Terminal.app is quite good and very impressive. I don't bother with a third party terminal application and I do everything in the terminal.

However, one of the very valuable things about working in the terminal is the safety and immunity that it provides. No matter what bizarro virus attachment you send me, I can text edit it in a terminal without risk. There's nothing you can paste to me in irc that will infect or crash my computer.

Or at least, that's how it should be.

But the trickier we get with the terminal - the more things it does "out of band" and the more it "understands" the content of the text that it is rendering, the more of this safety we give up.

Frankly, it bothers me greatly that the terminal would have any idea whatsoever what text editor I am running or that I am running a text editor at all. It bothers me even more to think that I could copy or paste text and get results that were anything other than those characters ...

Make terminals fancy at your peril ...

> Frankly, it bothers me greatly that the terminal would have any idea whatsoever what text editor I am running or that I am running a text editor at all.

I'm not sure what you mean by this. Terminal.app doesn't know that you're running a text editor. It does know that you're running a process called 'vim', which is kind of magic, but not too much (ps has always been able to show what processes are attached to a given tty, for example). If you're referring to the parent comment's "full mouse support for apps like vim", they just mean it supports "mouse reporting" control sequences, which date back to xterm. If anything, Terminal.app is late to support this (only in the latest release, whereas alternatives like iTerm have supported it for ages).

> It bothers me even more to think that I could copy or paste text and get results that were anything other than those characters ...

Well, the terminal has to interpret escape sequences for colors and such in order to display them, so why shouldn't it also preserve that metadata when copying and pasting? Like any other rich text copy+paste, it will only be kept if you paste into a rich text field; discarded if you paste into a plaintext field.

That said, there are a few 'odd' things Terminal.app supports: e.g. printf '\e[2t' to minimize the window. (This also comes from xterm.)
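For reference, the control sequences mentioned here are just short byte strings written to the terminal. A small sketch that builds them without sending anything (the function names are my own; the sequences are the xterm ones: `CSI 2 t` to iconify, `CSI 1 t` to de-iconify, and `CSI ? 1000 h` / `l` to turn basic mouse reporting on and off):

```python
ESC = "\x1b"
CSI = ESC + "["  # Control Sequence Introducer


def iconify_window():
    """xterm window op: minimize the window (same as printf '\\e[2t')."""
    return CSI + "2t"


def deiconify_window():
    """xterm window op: restore a minimized window."""
    return CSI + "1t"


def set_mouse_reporting(enabled):
    """Enable or disable xterm button press/release mouse reporting."""
    return CSI + "?1000" + ("h" if enabled else "l")
```

Writing `iconify_window()` to stdout in a terminal that honours window ops minimizes it; whether a given terminal does so is up to that terminal and its settings.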

>On a related note, I am big into latency analysis and driving down latency in interactive systems.

I'm afraid there aren't many of us that care about the subtle details of interaction that mean the most to one's experience. Working in audio, I know the difference between hitting a button and hearing a sound 40ms afterwards, and 4ms afterwards. I would much prefer to use the 4ms, even if it means sacrificing half of the system's features.

I feel like such a product will never reach the market, because the market will think they need lots of features, which results in a sacrifice of latency and other UI consistencies. There's always some developer writing the weakest link of an otherwise perfect system. For example, the CPU/GPU hardware, kernel, and browser's accelerated rendering are all engineered with millions of man-hours to be as blazingly fast as possible, and then a web developer comes along and puts a single setInterval() call in their online game or something, and all the optimization benefit goes to the trash. Or, because animation is a trend in UI design right now, developers purposely put in hundreds of milliseconds of delay between common actions like minimizing windows, switching desktops, scrolling, opening/closing apps on mobile, etc.

Basically, in order for your dream of true virtual reality to be achieved, the principles of and respect for low latency have to be maintained across the whole system's stack, especially the higher-level parts.

Well, Apple clearly does care, or Terminal.app wouldn't have such a low input latency (it's legitimately hard to engineer!).

I've dabbled in music, and I can fully agree with the audio latency comment. Human hearing is exquisitely tuned for latency (probably since it's integral to direction-finding), so even the slightest delay is noticeable. Hitting a note in an orchestra even a few ms late makes you stick out like a sore thumb.

I can't disagree more, gnome-terminal is fantastically better and it's not the only one. Faster, never gives me disk contention like Terminal.app does (unless I explicitly cause it), has TrueColor (it's ironic that the design heavy Mac terminal does not have full color support), supports far more active sessions at a usable speed, I can go on and on.

iTerm2 is vastly better on mac, but still far inferior to gnome-terminal.

In my opinion, terminal.app with transparency feels way faster than iTerm without transparency. Gnome-terminal is fine, but I dislike the default colours etc (but that's just me).

And: what's with "disk contention"? How do terminals do that, and how is it relevant to the discussion?

Oh definitely, see my comment here about an excellent talk by John Carmack on this very subject (end-to-end hardware/software latency in VR):



I think both Carmack and Abrash have written about latency experiments with VR doing very simple setups with simple geometry or stripe patterns. I don't recall where, but if you can dig those things up they might give you some more ideas.

Have they gotten meta working yet? Enabling meta broke input of many normal characters, especially on non-English layouts, last I saw.

We can do better in Alacritty. For those interested, I've filed a bug on our issue tracker about where this latency is coming from and what can be done: https://github.com/jwilm/alacritty/issues/673

At the end of the day, there is a trade-off to be made. Terminals (or any program, really) can have 1-frame input latency (typically 1/60 sec) by giving up v-sync and accepting tearing, or they can have a worst-case 2-frame input latency with v-sync, and then you're looking at 2/60 sec or ~33ms.
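The arithmetic behind those two figures, as a quick sketch (`frames_to_ms` is a made-up helper, and 60Hz is assumed as the refresh rate):

```python
def frames_to_ms(frames, refresh_hz=60):
    """Convert a latency measured in display refreshes to milliseconds."""
    return frames * 1000.0 / refresh_hz


for frames in (1, 2):
    print(f"{frames} frame(s) at 60Hz = {frames_to_ms(frames):.1f} ms")
```

On a 120Hz or 144Hz display the same one-frame and two-frame bounds shrink proportionally.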

Triple buffering solves the latency issue of vsync by burning additional CPU/GPU time, bringing the worst-case latency back to 1 frame.

I don't think that's correct.

The way I understand it, triple buffering adds latency in exchange for higher frame rates when rendering can't keep up with the display's refresh rate.

Double buffering renders the next frame while displaying the current. That results in latency of 1 frame since input. Triple buffering adds another frame to the queue, resulting in a 2 frame lag.

With double buffering the framerate gets cut in half if it cannot meet vsync, with triple buffering it can also get cut in thirds. So double buffering is 60 -> 30, where the frame lasts 2 refreshes. Triple is 60 -> 40, where one frame is displayed for 1 refresh and another is displayed for 2.
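The 60 -> 30 and 60 -> 40 figures can be reproduced with a small model. This is a sketch under simplifying assumptions: every frame takes the same time to render, expressed as a multiple of the refresh period, and "triple buffering" means the queued variant described in this comment, where the renderer never has to wait as long as a frame takes at least one refresh. The function names are made up.

```python
import math


def double_buffered_fps(refresh_hz, render_refreshes):
    """With vsync and one back buffer, the renderer waits for the swap,
    so each frame occupies a whole number of refreshes."""
    return refresh_hz / math.ceil(render_refreshes)


def queued_triple_buffered_fps(refresh_hz, render_refreshes):
    """With a spare back buffer the renderer starts the next frame at once,
    so the frame rate follows the render time, capped at the refresh rate."""
    return min(refresh_hz, refresh_hz / render_refreshes)


def display_refreshes(render_refreshes, n_frames):
    """Refresh index at which each triple-buffered frame first appears."""
    return [math.ceil((i + 1) * render_refreshes) for i in range(n_frames)]


print(double_buffered_fps(60, 1.5))         # 30.0
print(queued_triple_buffered_fps(60, 1.5))  # 40.0
print(display_refreshes(1.5, 4))            # [2, 3, 5, 6]
```

The last line shows the uneven pacing: frames appear on refreshes 2, 3, 5, 6, so one frame stays up for a single refresh and the next for two.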

Nowadays it's probably better to use adaptive vsync, which simply disables vsync when the framerate drops. This will reintroduce tearing, which might be preferable in fast action games.

Triple buffering confusingly means different things. Both you and the OP are correct. The OP meant something in line with what Wikipedia says: https://en.wikipedia.org/wiki/Multiple_buffering#Triple_buff...

"In triple buffering the program has two back buffers and can immediately start drawing in the one that is not involved in such copying. The third buffer, the front buffer, is read by the graphics card to display the image on the monitor. Once the image has been sent to the monitor, the front buffer is flipped with (or copied from) the back buffer holding the most recent complete image. Since one of the back buffers is always complete, the graphics card never has to wait for the software to complete. Consequently, the software and the graphics card are completely independent and can run at their own pace. Finally, the displayed image was started without waiting for synchronization and thus with minimum lag.[1]

Due to the software algorithm not having to poll the graphics hardware for monitor refresh events, the algorithm is free to run as fast as possible. This can mean that several drawings that are never displayed are written to the back buffers. Nvidia has implemented this method under the name "Fast sync"."

It annoys me that most games don't seem to offer this option - if they have this level of control, they just offer vsync on or off. The AMD gpu control panel can force triple buffering, but only for OpenGL, and I think the vast majority of games on Windows use DirectX.

Triple buffering uses more display buffer memory, but roughly the same cpu/gpu load as vsync-off. It's great for latency. It makes a whole lot of sense. It's been around for like 20 years. But you rarely see it used ...
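A toy simulation of the "render as fast as possible, show the newest complete frame" behaviour described in the Wikipedia quote. Times are in arbitrary integer ticks to avoid rounding issues; the render time is assumed constant, a frame finishing exactly on a refresh is assumed to make it, and `fast_sync_schedule` is a hypothetical name.

```python
def fast_sync_schedule(render_ticks, refresh_ticks, n_refreshes):
    """For each refresh, return (frame_index, age_in_ticks) of the frame shown.

    Frame i starts rendering at i * render_ticks and completes one
    render_ticks later; the display always picks the newest complete frame.
    Returns None for a refresh that happens before any frame is complete.
    """
    shown = []
    for k in range(1, n_refreshes + 1):
        vsync = k * refresh_ticks
        completed = vsync // render_ticks  # frames finished by this refresh
        if completed == 0:
            shown.append(None)
            continue
        frame = completed - 1
        started = frame * render_ticks
        shown.append((frame, vsync - started))
    return shown


# Rendering takes 4 ticks, the display refreshes every 10 ticks.
print(fast_sync_schedule(4, 10, 3))  # [(1, 6), (4, 4), (6, 6)]
```

In this run frames 0, 2, 3 and 5 are rendered but never displayed, which is the extra work being traded for fresher frames at each refresh.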

Let's say we suffer from some form of insanity and decide to create a full-screen, GPU accelerated terminal. Would it be possible to use GSync/Freesync for variable refresh rates?

Interesting results. I have loved XTerm for a long time because it "felt snappy". On MacOS I've always preferred Terminal.app to the often recommended iTerm2 for similar reasons.

I think it's funny to have the suckless project page for st go on and on about how XTerm is clunky and old and unmaintainable, but the result of this small and clean minimalist terminal is a clear loser in terminal performance, which subconsciously and consciously detracts from the experience.

Sometimes it is easy to be faster than everyone else by simply having less code. Sometimes you need to think about how you can do as little as possible instead.

XTerm has the logic for handling partial screen updates and window obscuring that other terminals don't bother with, because it was written in an era when these weren't mere 10-50 msec delays but 100+ msec delays. Anyone who used dtterm on a Sun IPX knows what I'm talking about.

I'm also a Terminal.app user.

I'm amused to find that XTerm still includes a Tektronix 4014 emulator.

OTOH Alacritty can supposedly redraw the entire terminal in 2 ms, so perhaps those tradeoffs have changed over the last few decades.

An interesting side effect of Alacritty's choices:

> alacritty and terminal.app are fast enough that they’re actually limited by the speed of tmux.

FWIW, We're working on adding native support for scrolling. Someone has actually made an attempt at it recently: https://github.com/jwilm/alacritty/pull/657

Initial testing has shown it not to (noticeably) impact perf in our highly unscientific benchmarks.

Ah great!

I'm a native Windows user these days, so I can't use it quite yet, and as such, have fallen behind the times on news.

That's awesome! No scrolling is what prevents me from using alacritty. Hope this gets merged!

> On MacOS I've always preferred Terminal.app to the often recommended iTerm2 for similar reasons.

When new Mac users ask for general app recommendations, they often seem to get immediately steered away from Terminal.app and into iTerm. I'd understand this phenomenon if T.app was horrible, but it's rather good!

(I have a theory about the long shadow of Internet Explorer causing the use of stock OS apps to subconsciously feel passé)

That's ecosystem inertia: in the days of yore Terminal was pretty damn terrible, it only got tabs in 10.10 (Sierra) and xterm-256 in Mountain Lion (10.8), so people got into the habit of recommending iTerm 2 and then kept doing that even as Terminal became usable (though not necessarily great).

And iTerm still has feature edges e.g. truecolor, or better multiplexer support (Terminal only supports vertical pane splitting).

> it only got tabs in 10.10 (Sierra)

I'm 90% sure that's not accurate. It's had tabs for quite a while.

Yes I distinctly remember using Terminal's tabs on snow leopard (10.6), confirmed by [0]. Also, I believe 256color was available in that release, though I'm less sure.

[0]: https://stackoverflow.com/questions/6736602/how-do-i-set-mac...

Tabs are way older than Sierra. I started using the terminal about 5 years ago and have always used tabs.

> it only got tabs in 10.10 (Sierra)

I don't think that's true. I definitely had tabs in Lion (10.7) and I'm pretty sure they're at least as old as Leopard (10.5) and maybe older.

iTerm2 also has Tmux integration: treat and control Tmux windows (and thus tabs) and panes as if they were native iTerm elements, and do so using the same keyboard shortcuts. Very very few other emulators do this (on any OS).

It got tabs in Leopard (10.5).

> "st on macOS was running as an X client under XQuartz, which may explain its high tail latency."

I think the problem with st on these charts is related to macOS and XQuartz. The Linux st numbers are competitive with everything except Terminal.app and eshell.

Having st be simple does not necessarily make it fast. Sometimes there is also a tradeoff involved.

Here I don't really see what you mean since linux-st is the term-emulator with the lowest latency (only comparable to alacritty) so it looks like it's simple (though not that simple) and quite fast (on Linux).

Suckless isn't focussing on macOS, as Apple software clearly isn't targeting expert users, but rather the mass market of noob users.

If you double check the plots, you will notice that "linux-st" isn't performing badly at all. The author also suspects XQuartz as one reason for the higher latency, which makes perfect sense.

I've really wondered that too: why is everyone recommending iterm? I'm glad it's just a matter of taste -- I'm perfectly happy with Terminal (and running a shell inside Emacs in my terminal :-).

iTerm has support for tiling terminal panes within a single window. You can drag any tab to subdivide a pane as much as you like.

It's handy for keeping "tail" or "watch" commands or similar visible — the same reasons people use tmux, tiling window managers, so on.

The UX isn't perfect, but it's useful enough that I've stuck with iTerm despite the lower performance (and bugs — it's pretty buggy, and the main author rarely seems to address Gitlab issues).

iTerm has other nice features. It can run without a title bar (saves space), it does cmd-click-to-open-file, and it has a lot of customization options. I don't really use most of the features; the tiling aspect is the main feature I rely on.

I love terminal panes. I'm not sure at this point what comes out of the box and what is custom configuration but I have keybindings for creating vertical and horizontal splits, and additional keybindings for navigating left/right/up/down.

My setup is to run MacVim on the left half of the monitor and then iTerm2 on the right half. iTerm is then split into generally three horizontal splits.

I think it's because terminal.app has caught up a lot without noticeable speed hits, and some % of those recommendations might be out of date.

I love the deep aesthetic customization options (though they're really non-essential)

I love the tmux integration, I used tmux before anyway and it's honestly not that different if you used tmux's built in mouse support but focus follows mouse in terminal panes is a nice touch.

Displaying images inline is alright I guess but I don't actually use it that much.

There's a bunch of stuff listed on their features page that sound useful but I don't actually use (yet). Idk I suppose I haven't noticed any appreciable difference in speed.


Good to see these replies here because I've always thought I was crazy. A new bump would come out, I'd try it again, and immediately feel it was too slow before going back to Terminal.app almost immediately.

It's a matter of taste in that both have features that the other doesn't have (Terminal being faster is its main one, but a significant one).

As far as I know, in terminal you can't use cmd as meta key, which immediately kills it for me as an emacs user (furthermore in iterm2 you can set it up so that left cmd = meta, right cmd = cmd, which I find very useful).

You sure can (I assume you meant opt), and have been able to since the NeXT days. There are many Emacs users at Apple (look at the Emacs key bindings in the text widgets)

For me it allows me to make my terminal minimal or dare I say, downright beautiful, which is something I definitely cannot say for any of the other terminal emulators out there, save for a few newcomer Electron-based abominations like Hyper.

screenshot: http://imgur.com/a/mFWYC

For me it's a relatively trivial reason: my muscle memory has been trained over the years that cmd-[1-9] switches tabs, and there's no way to configure that in Terminal.app (last time I checked) without unstable SIMBL plugins.

I found terminal.app to be incredibly buggy on 10.12. I was experiencing more than one crash per day.

This is going to sound mean however I say it, so I'm not going to sugar-coat it: that's how pretty much all suckless software works. Software becomes fast via profiling and optimization, not by ritualistically applying dubious notions of simplicity (C is simple? since when? UNIX is simple? since when?).

As pointed out above, suckless isn't focussing on macOS/XQuartz users. The latency of "linux-st" isn't that bad according to the plots of the author.

In my view, simplicity often leads to better performance as a side effect -- but of course there are many exceptions.

Nevertheless, I wouldn't start optimising software unless the software is really unusable. Optimising software to look well in rare corner cases is not a good idea imho, if the price is adding a lot of complexity.

To be fair, suckless never claimed to care about performance.

"Slow and featureless" is hardly an advertisement for good software.

"Minimalist", "easy to understand" might be.

Ironically a lot of suckless stuff kind of sucks. I don't know of any that are particularly worth using.

Seems some at least disagree but don't mention what parts of suckless are worth mention. Anyone care to elaborate?

> the most common terminal benchmark I see cited (by at least two orders of magnitude) is the rate at which a terminal can display output, often measured by running cat on a large file. This is pretty much as useless a benchmark as I can think of

It's a really helpful benchmark, IMO, as it's the main problem I see with different terminals. On a chromebook, most SSH clients are effectively useless because if you accidentally run a command that prints a lot of output (even just 'dmesg'), the terminal locks up for a huge amount of time, seconds or even minutes. You can't even interrupt the output quickly.

I appreciate that it's a different problem to the latency that the OP is trying to measure, but as a benchmark, it's actually very useful.


> The closest thing that I care about is the speed at which I can ^C a command when I’ve accidentally output too much to stdout, but as we’ll see when we look at actual measurements, a terminal’s ability to absorb a lot of input to stdout is only weakly related to its responsiveness to ^C.

Later in the same paragraph he addresses your exact problem, and points out that the speed of quitting a command that prints too much output is poorly correlated with the speed of spewing output.

Given how easy it is to accidentally spew something that I don't want to wait for, even if it is spewing quickly, I'm squarely with him in not caring about the speed of display. Slow it down to just faster than my eyes can make sense of it, and make ^C fast, and my life will be better.

Can we create a metric specifically for that?

When SSHed into a remote machine, if I run a command that spews a lot of text, how quickly does the terminal respond to ^C, stop printing text, and return me to the prompt?

I looked into this quite a bit when I was optimizing xterm.js.

Based on my findings, ^C is highly related to the speed of the output because the process running in the shell may be way ahead of the terminal's parsing/rendering. Imagine you run `cat foo`, the shell could take around 1s to send the output over to the terminal, the terminal might then take 10 seconds to parse and render the output. So after 1 second a ^C will actually do nothing because the cat call has finished. This is the case with Hyper, it hangs due to slow parsing and too much DOM interaction (as hterm was not designed for this sort of thing).
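A minimal sketch of that scenario, assuming fixed throughput on both sides and unbounded buffering between the process and the terminal. The function name and the 100 MB file size are made up for illustration; the 1 s and 10 s figures are the ones from the example above.

```python
def backlog_after(seconds, total_bytes, producer_bps, terminal_bps):
    """Return (bytes produced but not yet rendered, producer still running?)
    after `seconds`, assuming nothing ever blocks the producer."""
    produced = min(total_bytes, producer_bps * seconds)
    rendered = min(produced, terminal_bps * seconds)
    still_running = produced < total_bytes
    return produced - rendered, still_running


# Hypothetical numbers: `cat` emits a 100 MB file in 1 s,
# while the terminal needs 10 s to parse and render it.
total = 100_000_000
pending, running = backlog_after(2, total,
                                 producer_bps=total / 1,
                                 terminal_bps=total / 10)
print(pending, running)
```

Two seconds in, the producer has already exited, so ^C has nothing left to interrupt, and the terminal still has most of the file queued up.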

There's actually a mechanism for telling the process to pause and resume (sending XOFF/XON signals), which allows the terminal and shell to stay completely in sync (^C very responsive). However, these only really work well in bash as oh-my-zsh for example overrides the signal with a custom keybinding. Related links:
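Roughly how that pause/resume keeps the two sides in sync, as a toy simulation. This is not xterm.js's actual code: the function, the tick model and the high/low watermarks are all assumptions made for illustration. XOFF and XON are the Ctrl-S and Ctrl-Q control characters.

```python
def simulate(ticks, produce_per_tick, render_per_tick, high=None, low=0):
    """Return the largest backlog seen over `ticks` steps.

    If `high` is set, the terminal sends XOFF once the backlog reaches it
    and XON once the backlog has drained down to `low`.
    """
    backlog, paused, worst = 0, False, 0
    for _ in range(ticks):
        if not paused:
            backlog += produce_per_tick
        backlog = max(0, backlog - render_per_tick)
        worst = max(worst, backlog)
        if high is not None:
            if backlog >= high:
                paused = True    # terminal sends XOFF
            elif backlog <= low:
                paused = False   # terminal sends XON
    return worst


print(simulate(100, 10, 1))                    # no flow control
print(simulate(100, 10, 1, high=50, low=10))   # with flow control
```

Without flow control the backlog grows for as long as the process runs; with it, the backlog stays near the high watermark, so a ^C only has that much queued output to wait behind.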

Original PR: https://github.com/sourcelair/xterm.js/pull/447

Post-PR bug: https://github.com/sourcelair/xterm.js/issues/511

If the sending process blocks after filling the pipe (on Linux, 512 bytes), hitting ctrl-C should be effective—any of these terminals should be able to sink that in negligible time.

If it instead takes a long time, there are probably large buffers between:

* If you're talking about ssh to a faraway machine, the TCP layer is probably responsible. (I'm not even sure if there's anything you can do about this; the buffer (aka "window size" in TCP terminology, plus the send and receive buffers on their respective ends) is meant to be at least the bandwidth-delay product, and as far as I know, the OS doesn't provide an interface to tell it you don't need a lot of bandwidth for this connection. It'd be nice if you could limit the TCP connection's bandwidth to what the terminal could sink.)

* If you're talking about something running on the machine itself, it's probably an over-large buffer inside the terminal program itself.
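For a sense of scale on the ssh case, a back-of-the-envelope sketch of the bandwidth-delay product and how long a terminal would take to drain that much data. The link speed, round-trip time and terminal throughput below are hypothetical numbers, not measurements.

```python
def bandwidth_delay_product(bandwidth_bits_per_s, rtt_s):
    """Bytes that can be in flight on a path: bandwidth times round-trip time."""
    return bandwidth_bits_per_s * rtt_s / 8


def drain_time_s(buffered_bytes, terminal_bytes_per_s):
    """Seconds a terminal needs to sink data that is already buffered."""
    return buffered_bytes / terminal_bytes_per_s


# Hypothetical: 100 Mbit/s link, 100 ms round trip, terminal sinking 2 MB/s.
bdp = bandwidth_delay_product(100_000_000, 0.1)
print(bdp, drain_time_s(bdp, 2_000_000))
```

Under those assumptions roughly 1.25 MB can already be in flight when you hit ^C, and a slow terminal would need over half a second to get through it.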

Well, given that I mostly use the default shell for my system, which tends to be bash, I prefer shells that send XOFF/XON signals. And now I can pick a very responsive terminal over a fast rendering one. :-)

If this becomes popular enough, then zsh will figure out how to offer the feature that bash already does, and terminals will happily adopt it. Then everyone's lives are better! :-)

In my experience, the speed of a terminal responding to a ^C is highly correlated with the speed of its output. It's true that there's no need for a slow rendering terminal to also be slow at reacting to ^C, but it always tends to be the slow terminals that jam up.

I agree totally with the speed of display updates not really being important - if my terminal is spewing hundreds of pages of text, it doesn't matter whether it's redrawing at 50fps or 5fps. My hunch is that the slowest terminals are the ones that insist upon drawing every single character of output. They then end up with a huge buffer of text that needs to be rendered even though the ^C may have been sent and the noisy program has been terminated.

The author of the article just benchmarked a bunch of performance stuff about terminals under different conditions. He said that in his benchmarks they are not well correlated and gave a specific terminal which specifically demonstrates it.

I trust the author's benchmarks over your experience. (Doubly so since my experience matches the author's.)

This might be a dumb question, but why do terminals spew output anyhow? Past a certain point it's clearly unreadable. Once the terminal starts getting behind this is a UI failure IMO.

So why not (say) buffer it up somewhere, show a preview, show a "I would be spewing output right now, press <Space> to see what the last screen is, press Ctrl-C to stop, press 's' to just spew output". The speeds and amounts of data for which this happens could be completely configurable.

You can always put a limit on how much to buffer up (I'm not saying that this system should completely buffer output files until you run out of disk space), and sometimes 'spewing' is actually what we want.

Hardware terminals had this feature, ish.

The vt220 had a 'slow scroll' speed which would buffer text and scroll it at a viewable speed, limited by the quite small memory on the screen. It also had a 'pause' key which would pause the display and then continue the output (again limited by the terminal's memory). See also PC 'scroll lock' key.

You might want to read up on the design of mosh.

The way to tackle this is not to "buffer output up". In fact, the way to tackle this is the opposite of filling up buffers with output.

It is to decouple the terminal emulation from the rendering. Mosh runs the terminal emulator on the remote server machine, and transmits snapshots of state (using a difference algorithm for efficiency) at regular intervals over the network to the client, which renders the terminal state snapshots to the display on the local client.
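A toy version of the snapshot-and-diff idea, to show why intermediate output never has to cross the network. This is not mosh's actual protocol or diff algorithm; the row-based screen model and both function names are assumptions for illustration, and both snapshots are assumed to have the same number of rows.

```python
def diff_screens(old, new):
    """Return {row_index: new_text} for the rows that differ between snapshots."""
    return {i: row for i, (prev, row) in enumerate(zip(old, new)) if prev != row}


def apply_diff(screen, diff):
    """Client side: patch the locally held screen with the changed rows."""
    return [diff.get(i, row) for i, row in enumerate(screen)]


# The client last saw this screen...
client = ["$ make", "cc a.c", ""]
# ...and the server-side emulator has since processed any amount of output,
# ending up in this state. Only the latest state matters.
server = ["$ make", "cc z.c", "done"]

patch = diff_screens(client, server)
print(patch)
print(apply_diff(client, patch) == server)
```

However much the program printed between two snapshots, the client only receives the rows that differ in the final state.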

Not a dumb question at all.

Various tools can and do deal with the spew though. Offhand, script (typescript), screen, and tmux. If you're running a session through a serial terminal emulator (e.g., minicom), that would be another instance.

I'm not going to claim these are particularly mainstream uses (though I've made use of each of them, and been grateful for the ability to do so). But they do exist, and I suspect there are others.

That's what 'stty discard' was for, but apparently Linux doesn't implement it.

Reminds me of my 300/1200 baud days, when you couldn't live without it.

Was curious and tried this out a bit on Linux+X11 on an i7 6700k with the igpu:

             stdout [MB/s]  idle 50 [ms]
    urxvt             34.9          19.8
    xterm              2.2           1.9
    rxvt               4.3           7.0
    aterm              6.0           7.0
    konsole           13.1          13.0  note: stops moving when printing large file
    terminator         9.1          29.4  note: stops moving when printing large file
    st                23.0          11.2
    alacritty         45.5          15.5

Nice.. I've apparently gravitated myself to the fastest of the lot naturally..

Side note: was supervising some kid who absolutely couldn't believe that the reason his 'workstation was locking up' was that he was catting giant logfiles in a second pane of his single-process gnome terminal.. I told him to use Xterm or something else, since they spawned one process per window, and he tried for a while, but went back, and continued to complain, because he just couldn't believe that the terminal could get bogged down, and further, missed his pretty anti-aliased fonts.


Can you paste or describe your measuring technique?

For the throughput I created a file with 1 GB of random source code and ran `time cat file`
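A rough Python equivalent of the `time cat file` measurement, in case you want the MB/s figure directly. The function names are made up, and this only times how long the writes take to complete, which is an assumption about what the terminal has actually rendered by then.

```python
import sys
import time


def throughput_mb_per_s(num_bytes, elapsed_s):
    """Convert a byte count and a duration into MB/s (1 MB = 10**6 bytes)."""
    return num_bytes / elapsed_s / 1e6


def time_output(path, out=None):
    """Copy the file at `path` to `out` (stdout by default) and return MB/s."""
    if out is None:
        out = sys.stdout.buffer
    start = time.perf_counter()
    written = 0
    with open(path, "rb") as f:
        while chunk := f.read(65536):
            out.write(chunk)
            written += len(chunk)
    out.flush()
    return throughput_mb_per_s(written, time.perf_counter() - start)
```

For example, a 1 GB file that takes 10 s to print works out to `throughput_mb_per_s(1_000_000_000, 10)`, i.e. 100 MB/s.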

Latency for input: https://github.com/pavelfatin/typometer

Could you try terminology as well?

I really like terminology :) Can you test yakuake and kmscon[0] also?

EDIT: [0] https://www.freedesktop.org/wiki/Software/kmscon/

If you're sensitive to latency and run Linux, try hitting Ctrl-Alt-F1, and do a little work in console mode at the terminal. (Ctrl-Alt-F7 to get back.)

For me this is a great illustration of how much latency there is in the GUI. Not sure if everyone can feel it, but to me console mode is much more immediate and less "stuffy".

I recommend this to every terminal user.

I'm a fast typer, but this is 2017 - computers are fast, right? Nope. On the newest, maxed-out Dell XPS there is a HUGE latency difference I feel if I use pure linux console vs any graphical one (be it gnome/windows or mac).

Typing and working with pure text is really fast and you INSTANTLY feel and see the difference. Try it.

You may need to do some work to boot your console into text mode. If your linux console is using graphics mode even on modern systems it can still be a dog, especially at 4K. Presumably this is a result of using ancient lowest-common-denominator interfaces since my system has the raw power to run a snappy text-mode only terminal.

Oh do you have details on this? I recall doing this on an older machine and I would get an 80x25 terminal. Now when I do it, I get a high-res display. I guess the latter is in graphics mode?

It definitely has slow throughput at catting files, e.g. if I cat the output of "seq 100000". The latency seems better though. Probably not as good as text mode.

I honestly don't know what text/console mode even is. I know there is a VGA "spec" -- I think all graphics drivers for PC-compatible devices have to support VGA and text mode? Or is it part of the BIOS?

Not an expert, but basically 'text mode' entails something like directly putting the text characters into a memory buffer, and the monitor rendering those in hardware via onboard fonts.

The resolution change is likely the result of Kernel Mode Setting (KMS). This is the ability for the linux kernel itself to set the resolution of the display, among other things.

The easiest way to turn it off is to add "nomodeset" to your kernel commandline.

The correct term, by the way, is not "text mode", but rather "a virtual console" or "virtual tty" or such.

Hopefully that gives you enough search terms to learn more.

Actually, the correct term is text mode. It is precisely whether the display hardware is operating in text mode or graphics mode that is the subject of discussion here. With the hardware in text mode, even without CRTC tricks, scrolling is relatively fast compared to graphics mode. Scrolling the display in graphics mode involves shifting much more data around, as does rendering individual characters.

The confusion arises because people, as you have done, erroneously conflate the kernel virtual terminals with "text mode". In fact, the norm nowadays is for kernel virtual terminals to use graphics mode. It permits a far larger glyph repertoire and more colours, for starters, as well as things like software cursors that can be a wider range of shapes and sprites for mouse pointers.

Yeah, text-mode latency (or lack thereof) is incredible. For the most jarring difference, try the Ctrl-C trick mentioned in article: `cat` a huge text file that takes many seconds to read from disk in your gui console, wait a couple of seconds, hit Ctrl-C, and compare that to the same exercise in the console.

Most "modern" GUI terminals make this artificially fast by not rendering every flushed buffer; instead, they simulate what ought to be on screen, and then render at 60Hz or whatever your refresh rate is.
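To put a number on that, a small sketch that counts how many repaints happen when a terminal draws at most once per refresh interval instead of once per flush. The function name and the one-flush-per-millisecond workload are assumptions for illustration.

```python
def frames_rendered(flush_times_ms, frame_interval_ms=1000 / 60):
    """Count repaints if the terminal draws at most once per frame interval,
    and only in intervals where at least one flush arrived."""
    return len({int(t // frame_interval_ms) for t in flush_times_ms})


# Hypothetical workload: the program flushes output once per millisecond
# for a full second.
flushes = list(range(1000))
print(len(flushes), frames_rendered(flushes))
```

A thousand flushes collapse into sixty repaints, which is why `cat`-style throughput numbers mostly reflect how fast the emulator parses rather than how fast it draws.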

The point being that it's not measuring latency, nor throughput of the rendering, but rather throughput of the emulation.

You're describing exactly what makes them feel slow. All of that buffering is adding latency. Everything is buffering: the screen, the GPU, the OS, the window manager, the shell.

People notice the extra latency though. I sure do. I remember what it's like to have a CRT getting photons displayed nearly immediately after a keystroke. That's exactly what makes an old 286 feel snappier than a 2017 macbook pro while typing.

I remember how slow it was accessing text mode video RAM on the ISA bus compared to constructing a display in main RAM, and indeed buffering changes there and rendering them across en bloc to video RAM. Buffering is not a recent invention.

I would really like to use just the linux virtual terminal. Unfortunately it has no unicode support and there is an unsolved issue with amd GPUs that causes the switching from a vt to X.org to take an unbearable amount of time.

I have to look into Kmscon which seems promising.

Is it a GPU issue? I have an NVidia GPU and I still wouldn't regularly hit Ctrl-Alt-F1/F7 to switch tasks.

Estimating, it takes at least 500 - 1500 ms. I have no idea what's happening here... if it's an X thing, a driver thing, etc.

I don't recall it being fast on any machine I've used recently. It's at least 100x slower than it should be to be usable -- it should be around 5 to 15 ms, or even less.

In my case the delay is much worse: around 5s. It seems to be related to DPM. The only description of the problem I have found is here: https://bugs.freedesktop.org/show_bug.cgi?id=92982f

I wonder how many folks use linux virtual consoles for day-to-day work. I guess it's not practical for web developers given that you probably have to switch to X to check results with a graphical browser, no?

I do :)

Specifically I run kmscon on one virtual terminal + tmux for scrolling/tabs, then an X server running chromium on another. I'm not a web dev, but I am continuously having to switch between the two for tracking merge requests, ticket status, testing via our frontends etc.

It's not bad, but kmscon is pretty much abandoned at this point, and I don't know of any way to have a setup like this run nicely on multiple monitors. It was meant to be just an experiment at an über-minimal setup, I was planning to switch back to my previous i3 + st based setup after a month or so, but now it's been most of a year and I'm still using it for some reason.

I think the big thing I really enjoy about this setup is the complete lack of window management. Even the minimal window management I had to do with i3 (1-2 terminals running tmux + 1-2 browser instances) is gone. It feels like that's removed a small unnoticed stress point from my work day. If I ever get round to setting up a window manager again I think I'm going to try and keep it limited to 1 terminal + 1 browser instance and rely entirely on their internal tab support.

That sounds interesting. I'm having trouble visualizing what you are doing though.

Are you running a distro or did you build this setup yourself?

What role does kmscon play? I think you could just run raw tmux in one VT and X in another? Although to me it seems slow to switch between the two.

It seems like Linux should support the multi-monitor setup as I said in a sibling comment -- maybe I will take some time to investigate it.

Was latency one of your considerations, or was it mainly lack of window management?

If you have time a screenshot would be helpful :)

I'm running Arch linux, I just installed kmscon and enabled the kmsconvt@tty2.service systemd service to have kmscon take over the second VT. It includes its own login manager that replaces getty. I then also login to the linux console on VT 3 and manually launch X from there (with just chromium in my .xinitrc).

kmscon has better handling for colors, fonts, etc. than the linux console; that's the only real reason I'm using it. On my laptop's builtin display I have no delay switching between any of the linux console/kmscon/X; when I plug in an external monitor I do get ~1-2 second delay switching from the linux console/kmscon -> X, no delay the other way.

There does appear to be some bug with switching from X -> kmscon, it just shows a black screen, but I've gotten used to switching X (VT 3) -> linux console (VT 1) -> kmscon (VT 2) which seems to work around that. There's also another bug where the Ctrl key seems to get stuck down in kmscon when switching to it sometimes, has only happened ~4 times in the last 8 months and I can fix it by just running `systemctl restart kmsconvt@tty2` and attaching to my tmux session again.

Since I'm not doing any frontend changes I don't ever really need to look at both my terminal and browser at the same time so haven't taken the time into seeing if I can have different VT displaying on different monitors. I prefer the portability of a laptop over having the most productive single location setup.

Latency was not at all a consideration, it was purely an exercise in how minimal a setup I could have. I spent a couple of weeks without having X installed and using command line browsers when I needed to, but using GitLab and JIRA through command line browsers was a real pain (and if I recall correctly some stuff was impossible).

A screenshot is difficult since it's multiple VTs and I don't think kmscon has any kind of builtin screenshot support. Just imagine a full-screen terminal with tmux, you hit Ctrl-Alt-F2, now it's a full screen browser; that's basically it.

One other thing I do have setup to make life a little easier are some little aliases syncing the X clipboard and tmux paste buffer so I can copy-paste between X and my terminal. And I have DISPLAY setup in the terminal so things like xdg-open and urlview can open stuff in my web browser.

One thing I just thought of: maybe you can run dual monitors, but one is text mode, and one is graphics?

That is, the X server would only know about one monitor. But the kernel would know about both, and it could run processes connected to a TTY which writes to the second monitor. Rather than a TTY connected to an xterm connected to the X server. (I think that is the way it works)

This goes back to my question: is text mode part of the graphics driver or part of the BIOS? I assume the BIOS has no knowledge of dual monitors, but my knowledge is fuzzy there.

I used to use text consoles for work and X only for the browser for many years. I switched only to terminals in X when I hacked up a rxvt enough to be very close to the text mode console (including the font and pretty closely matching GPM for cut-paste). It is generally responsive "enough", though for years I keep blabbing on about figuring out some way to check the latency.

rxvt versions newer than 2.7.1 and, more recently, rxvt-unicode seem to have some other issues that make them really slow (particularly non-bitmap font rendering), but rxvt-unicode supports mixing bitmap and other (Terminus) fonts which seems to be a reasonable solution.

As far as I know Linux VTs were never meant to be used for real work, but as a last resort in emergency situations. Fonts for example are limited to 256 glyphs, which means most non-English languages are only partially supported.

Linux VTs is all we had for a long time. Even if X was around, not everybody had computers powerful enough to comfortably run X. I had a computer that would swap just by having X open with fvwm.

Mmm. If you want an example of a console that seemingly was never meant to be used for real work, try the old Sun SPARCStation Solaris consoles. Those were implemented using the boot PROM's graphics drivers which were written in Forth and not exactly optimised for speed. https://www.youtube.com/watch?v=ntJmmI6iIEc has an example starting at about 0:50 -- it used to feel like you could practically watch the cursor moving from left to right as it printed longer lines... (I think Sun assumed you'd either be logging in remotely or using a graphical windowing system, so they never worried about the performance of the text console. Linux on the same hardware installed its own video driver written in C for the terminal so it was dramatically faster.)

Still faster than a hardware terminal... I wouldn't be surprised if it was meant to emulate these to some degree..

Meanwhile on Windows 10 it sometimes takes 2 seconds until the start menu pops up, or sometimes a full second until a right-click context menu comes up. On a clean Windows 10 install on a 3.4GHz PC with plenty of RAM :)

I do everything with a full Debian VM which I SSH into from a Javascript SSH client running in Chrome. This is generally much more performant than doing things in the Windows 10 shell.

For the start menu, I'm guessing this is either hard drive latency or network latency. The network latency can be particularly bad if you have an intermittent connection, like when you connect to a mobile hotspot on your phone from a moving train, the start menu can hang for minutes. It would literally be faster to reboot the computer than to wait for the start menu to popup.

I have a SSD and fast internet with low ping. Besides, the start menu shouldn't need to connect to the internet.

PowerShell gives you an interesting little number when you start it up:

Windows PowerShell

Copyright (C) 2016 Microsoft Corporation. All rights reserved.

Loading personal and system profiles took 542ms.

Mine does not give me that number - do I need to do something to turn it on?

>do I need to do something to turn it on?

I think it doesn't display if your profiles take less than 500ms to load.

EDIT: Just tested with a clean profile with the line "Start-Sleep -m xxxx" for various values of xxxx, and the message has shown up with times just above 500ms but not below.

I don't think so; I didn't explicitly do anything. I have a Lenovo and they put all kinds of crap on there, so maybe this had something to do with it, I dunno.

I work on the VS Code terminal/xterm.js[1].

Hyper which currently uses a fork of hterm, is in the process of moving over to xterm.js due to the feature/performance improvements we've made over the past 12 months. Hyper's 100% CPU/crash issue[2] for example should be fixed through some clever management of the buffer and minimizing changing the DOM when the viewport will completely change on the next frame.

I'd love to see the same set of tests on Hyper after they adopt xterm.js and/or on VS Code's terminal.

Related: I'm currently in the process of reducing xterm.js' memory consumption[3] in order to support truecolor without a big memory hit.

[1]: https://github.com/sourcelair/xterm.js

[2]: https://github.com/zeit/hyper/issues/94

[3]: https://github.com/sourcelair/xterm.js/issues/791

> even the three year old hand-me-down laptop I’m using has 16GB of RAM

> on my old and now quite low-end laptop

Trust me, that's not a low-end laptop. Either that has the shittiest cpu ever and a terribly mismatched amount of memory, or the author's view of what is high-end or low-end is skewed; in either case, what's low-end nowadays would be ≤4GB RAM. 16GB is LOTS, useful for developers that run large builds and/or VMs regularly.

I very much like the rest of the article though, would love to see some latency improvements here and there!

It may very well be your view that is skewed.

For user-facing terminals/workstations, I would consider ≤8GB low-end ("unusable"). 16GB would be mid-low ("usable"), 32GB would be mid-high ("comfortable"), ≥64GB would be high ("good").

For servers, ≤64GB is low-end, ≥512GB being high-end.

That 8GB would be considered low-end does not mean that no one uses it, though. Some people might still rock 2GB laptops, or use the original 256MB Pi 1 as a light desktop.

Apparently that's true then. I'm in the Netherlands, so maybe the US is different in this regard. Thanks for clearing up!

No, you're completely right in that 16 GB RAM is not considered low-end. You only have to browse for laptops for about 2 minutes to discover that 16 isn't at all "low-end".

I think you missed the point I was trying to make entirely (that people's definitions of "low end" differ, although "low-end" steadily moves up), but from that perspective:

- 8GB is easily available and what most cheap laptops sport,

- 16GB is either default or an addon for cheaper laptops,

- 32GB is a premium that is not always available, and

- 64GB is usually only available in huge workstation or gamer "laptops", although there are some decent-size Dells with it.

While those are not my choice of metrics, they do seem to support the "low", "mid-low", "mid-high" and "high" labels I personally added. At most, you could argue that ≤8GB is low, 16GB is mid and ≥32GB is high. 10 years ago, 16GB would have been high-end for a laptop, but no more.

I love the analysis of terminal latencies! And I'm in full agreement with the overall goal of less latency everywhere. But, of course, I feel like picking a few nits.

> And it turns out that when extra latency is A/B tested, people can and do notice latency in the range we’re discussing here.

Yes, this is true. But the methodology is important, and the test used doesn't really apply to typing in terminals. The test isn't a "type and see if you can tell it's slow" test, it's a hit the mark hand-eye coordination test, something you don't do when typing text. Latency when playing Guitar Hero is super duper important, way more important than most other games, which is why they have a latency calibrator right in the game. Latency when playing a Zelda game is a lot less important, but they still try very hard to reduce latency.

The same people who can distinguish between 2ms of difference in a drum beat also can't distinguish between an extra 30ms of response time when they click a button in a dialog box.

I'd like to see a stronger justification for why lower latency in a terminal is just as important as it is for hand-eye coordination tasks in games.

  ~2 msec (mouse)
  8 msec (average time we wait for the input to be processed by the game)
  16.6 (game simulation)
  16.6 (rendering code)
  16.6 (GPU is rendering the previous frame, current frame is cached)
  16.6 (GPU rendering)
  8 (average for missing the vsync)
  16.6 (frame caching inside of the display)
  16.6 (redrawing the frame)
  5 (pixel switching)

I find this list pretty strange. It's generally right - there are a bunch of sources of latency. But having done optimization for game consoles for a decade, this explanation of game latency feels kinda weird.

Games that actually run at 60fps usually do not have greater than 100ms latency. They also don't miss vsync every other frame on average, that 8ms thrown in there looks bizarre to me. Render code and GPU rendering are normally the same thing. Both current and previous frame GPU rendering is listed, huh? Sim & render code run in parallel, not serially. The author even said that in his article, but lists them separately... ?

Consumer TVs come with like 50ms of latency by default. That's often half of it right there. Games are often triple-buffered too, that accounts for some of it. The ~2ms right at the top belongs in the 8ms wait, it disappears completely.

I just get the feeling the author of this list was trying hard to pad the numbers to make his point; it feels like a very hand-wavy analysis masquerading as a proper accounting of latency.

This is going to be purely anecdotal feedback, but it may nonetheless help you relate to the importance of low feedback latency; the best way is still to experience it yourself.

I noticed during my thesis on realtime systems that my brain had difficulties compensating for latency, even more for jitter (inconsistent latency). I make more typos when I have a high latency. I notice it when playing music in a high latency context, or when the ssh connection has a high ping. We did the experiment slowing down the click of the mouse by 100ms. Users hated the results.

I'm also more productive if I can notice my typos 100ms earlier, or confirm that everything I previously typed is good earlier. Even if it takes me 500ms to process the information displayed on screen, it stays on the critical path of the overall task speed performance.

I had a similar discussion with a colleague a few weeks ago. They were double counting latency in different pipeline stages when a delay in one stage causes high latency in the other stage.

My analysis would be something like this (assuming 60 fps):

1. Input latency - variable from hardware
2. Input event queuing - avg 8 ms (input event arrives in frame X but is handled in frame X+1)
3. Animation, game simulation, submitting GPU commands, other CPU work - 16.7 ms
4. GPU work - 16.7 ms
5. Display controller scanning out frame - 16.7 ms
6. Display panel latency - < 10 ms?

Steps 3 (specifically the submitting GPU commands part) and 4 can happen within one frame interval so you can be double buffered in that case. Otherwise you need triple buffering to maintain 60 fps. If you have a window compositor, then you need to account for when that latches and releases the buffer you submitted and if it performs copies or steals your buffer. This is the part that's especially murky. Games have it easy with fullscreen non compositing mode.
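Adding up the six steps above, as a quick sketch. The hardware input latency is left out because it is variable, the panel figure uses the 10 ms upper bound, and treating "steps 3 and 4 within one frame interval" as saving exactly one frame is my reading, not something stated outright.

```python
FRAME_MS = 1000 / 60  # one frame at 60 fps, about 16.7 ms


def pipeline_latency_ms(cpu_gpu_share_frame=False):
    """Sum the pipeline stages, excluding hardware input latency."""
    stages = {
        "input event queuing (avg)": 8,
        "CPU work": FRAME_MS,
        "GPU work": FRAME_MS,
        "display scan-out": FRAME_MS,
        "display panel (upper bound)": 10,
    }
    total = sum(stages.values())
    if cpu_gpu_share_frame:
        # Assumption: CPU submission and GPU work fit in one frame interval.
        total -= FRAME_MS
    return total


print(round(pipeline_latency_ms(), 1))
print(round(pipeline_latency_ms(cpu_gpu_share_frame=True), 1))
```

That comes to roughly 68 ms in the fully pipelined case and about 51 ms when the CPU and GPU work share a frame, before any compositor overhead.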

An interesting aspect of this is that for hand-eye coordination tasks, 60Hz provides a perceptual smoothness advantage over 30Hz. There's some benefit to having a higher frame rate even if it costs some latency. I don't know where the trade off point is, but it's not always better to reduce latency if it means a lower frame rate. It probably is always better to have both low latency and high frame rate, but that's hard.

As far as the latency analysis, I'd probably just lump display latency all in one and not try to break it down further. There's some number of ms it takes from the GPU output until pixels are on-screen. Consumer TVs are worse than computer LCDs because the TVs now are doing all kinds of filtering. OTOH, consumer TVs are also trending toward 240Hz refresh, so there is a counter-force causing latency to go down too.

Someone else here mentioned there are two kinds of triple buffering, which I didn't know, but FWIW I was talking about the 3-frames of latency kind. There's also a 2 frames of latency kind, and it sounds like that's the kind you're talking about?

Anyway, I tend to think about game latency as simply 1, 2 or 3 frames of latency, depending solely on how much buffering is going on. That explains almost all game latency, and the games are bad because they don't run at 60Hz, tons of games run at 15-30Hz with double or triple buffering, so the latency is automatically bad. No other sources of latency are needed to explain why. There is the 1/2 frame of latency between input and when the system polls the input and recognizes it, that's fair. So a typical double-buffered game has 2.5 frames of latency, and then add on the display device latency, and that's the sum total. It doesn't need to be made to look more complicated than that, IMO.
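The "frames of latency" model above is easy to turn into numbers. A minimal sketch, where the function name and parameters are made up and display latency defaults to zero because it depends on the device:

```python
def game_latency_ms(refresh_hz, buffered_frames=2, input_poll_frames=0.5,
                    display_ms=0):
    """Latency as (buffered frames + half a frame of input polling) times
    the frame time, plus whatever the display device adds."""
    frame_ms = 1000 / refresh_hz
    return (buffered_frames + input_poll_frames) * frame_ms + display_ms


for hz in (60, 30, 15):
    print(hz, round(game_latency_ms(hz), 1),
          round(game_latency_ms(hz, buffered_frames=3), 1))
```

A double-buffered game at 60Hz lands around 42 ms before the display, while the same 2.5 frames at 30Hz is already about 83 ms, which matches the point that frame rate alone explains most of the difference.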

Typing text at speed is not really that different from playing an instrument at speed. When you pluck a string you're able to keep it in rhythm by using the resulting noise of your own playing as the cue to synchronize the next beat. When latency is high or varying, as sometimes happens with software synthesizers, even the simplest rhythm starts falling apart. Professional audio applications want very low latency, sub 20ms as a baseline, but preferably below 10ms.

In that light, it totally makes sense that typing latency matters. If you aim to type in a smooth rhythm and use the screen as feedback, any hiccup will slow you down.

Except that the big difference is the audio and the hand-eye coordination part.

You can more easily hear a 10-20ms delay than you can see one; it's a physical feature of our human hardware. And hand-eye coordination tasks are all about anticipating an event. Take hitting or catching a baseball: we can see it coming, and the pattern of its trajectory is what allows us to compensate for the 100ms of delay in our nervous system & brain and gives us 1ms accuracy.

Neither of those is true for typing. Don't get me wrong; I want lower latency in my terminals and editors. I just don't buy that it's particularly important until the latency hits a threshold of badness, which is probably around 100ms. People largely aren't complaining about terminal latency nearly as much as they complain about video game latency, even though both are widely used.

The reason Guitar Hero has latency adjustment controls and almost no other games do is that it mixes audio with hand-eye coordination tasks. I can very easily notice a 5ms difference in delay in Guitar Hero. But I have no idea what my terminal latencies are, and I generally don't care until it stalls more than probably 200ms. It makes a very subtle responsiveness difference when there's an extra 30ms latency while I type, but it doesn't make a large functional difference or compromise my ability to type in any easily felt or measurable way. With Guitar Hero, on the other hand, I drastically lose my ability to play the game when the latency is off by 20ms.

Anyway, I appreciate the response & discussion, but I still want to hear a stronger justification for typing latency being very important. There might be one, I just don't think I've heard it yet.

If you care about end-to-end latency, I highly recommend this talk by John Carmack:


This talk blew my mind and made me feel like a terrible engineer. He's talking about end-to-end latency in VR, which actually has a commercial motivation because VR products with high latency will make you sick. (this obviously doesn't happen with shells and terminals!)

He's talking about the buffering and filtering at every step of the way. And it's not just software -- sensors have their own controllers to do filtering, which requires buffering, before your OS kernel can even SEE the first byte, let alone user space. On the other side, display devices and audio output devices also do nontrivial processing after you've sent them your data.

It's an integrated hardware/software problem and Carmack is great at explaining it! It's a dense, long-ish talk, but worth it.

One other thing to note is that compositors seem to add a fairly large amount of latency. I ran the app linked in the "Typing with Pleasure" post and I saw a roughly 20ms improvement across various text editors with the compositor turned off (I'm using Termite with Compton as my compositor).


Fun test to check how fast the terminal can handle loads of text: run "time head -c 1000000 /dev/urandom"

On a MacBookPro11,1 - 2.8 GHz this shows:

iTerm2: 2.182 total

iTerm2 with tmux: 0.860 total

terminal.app: 0.135 total

terminal.app with tmux: 0.910 total

Surprisingly, iTerm2 is faster with tmux than terminal.app is with tmux. But terminal.app without tmux is the fastest.

Does anyone know why the performance with tmux is so different between the two terminals?
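
One caveat for anyone replicating this: raw bytes from /dev/urandom include control characters and escape sequences, which the terminal (and tmux) will interpret rather than just draw, so the test partly measures escape handling. A minimal sketch of an ASCII-only variant, assuming `head -c` and `od` are available; the `ascii_noise` name is made up for illustration:

```shell
# ascii_noise BYTES: read BYTES random bytes and print them as hex pairs,
# so the terminal only ever sees plain ASCII (roughly 3x BYTES of output).
# -An drops the offset column, -v disables collapsing of repeated lines.
ascii_noise() {
    head -c "$1" /dev/urandom | od -An -v -tx1
}

# In bash, compare the two side by side:
#   time head -c 1000000 /dev/urandom
#   time ascii_noise 1000000
```

The two timings aren't directly comparable byte for byte, since the hex version prints about three times as many characters, but it separates plain text throughput from escape sequence parsing.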

Maybe tmux doesn't pass the full text through to the terminal, and 0.9 seconds is how long tmux takes to work its way through the buffer.

Incidentally, while still fast enough for this test, /dev/[u]random is not very fast on macOS: ~15 MiB/s on my MBP.

My iTerm2 is taking 17s vs. 0.3s for Terminal.app. Any idea why yours is an order of magnitude faster?

Hm, do you have the same CPU? This was tested with build 3.0.15.

It'd be cool if someone would replicate this on Linux under X11 and a few Wayland compositors, and throw urxvt, rxvt, xterm, aterm, and some others in the mix.

And alacritty (https://github.com/jwilm/alacritty) would be interesting to see as well.

And Konsole. I love it. Too bad it is tied to KDE libs. It also feels quite snappy to me.

I'd like to see Terminology perform there.

When I first switched from the terminal emulator that came with the first DE I was using (gnome 2?) to urxvt, it seemed very fast. When I later switched to Terminology, it seemed even faster. I've stayed with Terminology. I agree it'd be very interesting to see a proper comparison between Terminology, urxvt and others.

The common LibVTE-based terminals have problems with latency because they're deliberately capped at 40fps. Xterm doesn't have this problem.

Gedit gained this problem when it switched to GTK3. The forced animation for scrolling doesn't help. Mousepad still has low latency, at least in the version included with Debian Unstable, but I worry that the port of XFCE to GTK3 will make it as bad as GNOME.

So, on the other side, anyone want to build a true 'terminal emulator' that has baud-speed emulation?

top just doesn't look the same without the changes trickling down the screen, matrix like..

Thankfully I can run GlassTTY font connected to the KVM serial console for a near approximation.. but it's still too fast :)

Grew up in the VC/GUI transition era, but buying a vt220 and running it at 19200 on a used Vax taught me a Zen of command line that nothing else could... Not only did you have to think about what the command would do, but also how much text it would display, and whether you'd need to reboot the terminal after it got hosed up...

> So, on the other side, anyone want to build a true 'terminal emulator' that has baud-speed emulation?

So this isn't exactly the same, but I improved my Vim muscle memory considerably by running the MS-DOS version of Vim inside of DOSBox, reducing its speed of emulation to the lowest option, and finding the most efficient ways to edit files at a decent pace by making use of the best command sequences possible at any given point in time.

You could use cool-retro-term[0] for the visual side, but that doesn't support baud rate emulation either. Maybe you could pipe things through `pv` with the `-L` flag to limit speed?

[0]: https://github.com/Swordfish90/cool-retro-term
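
A rough sketch of the `pv -L` idea, assuming pv is installed and an 8N1 serial line (1 start bit, 8 data bits, 1 stop bit, so 10 bits on the wire per byte). The function names are made up for illustration:

```shell
# baud_to_bytes BAUD: bytes per second on an 8N1 serial line
# (10 bits on the wire for every byte of payload).
baud_to_bytes() {
    echo $(( $1 / 10 ))
}

# slow_tty BAUD: copy stdin to stdout at roughly the given line speed.
# Requires pv; -q hides pv's own progress display, -L caps bytes per second.
slow_tty() {
    pv -q -L "$(baud_to_bytes "$1")"
}

# Example: ls -l /usr/bin | slow_tty 19200
```

This only throttles output going through the pipe, so full-screen programs that expect to talk to a tty directly will probably not behave the same as on a real serial terminal.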

Interesting article, but I don't quite get it.

I'd imagine that terminal users will often be looking at the last bit of output as they type, and hardly looking at the thing they're typing at all (one glance before hitting Return). They aren't going to notice a bit of latency. And terminals are often used to communicate over networks that introduce a lot more latency than any of the measurements here.

I think, for me, this is a bit like the sales pitch for Alacritty -- fastest terminal ever as long as you don't mind that it doesn't scroll. Someone is using their terminal very differently from the way I use mine.

It depends how you use a terminal. If you type a lot of commands, and you use emacs keys to jump around within the current line you're typing (like ctrl-p to go up to the previously entered command, ctrl-a to jump to the beginning of the line, replace the first word, etc.), those latencies add up.

Basically we want the stuff in our brains to be able to manifest itself as fast as we're thinking it. Already having to drive human hands to make this happen is a big penalty; we don't need unnecessary extra latency if it can be helped.

Still waiting for that neural interface... plug me in please.

I found this note in the appendix interesting with respect to why we don't seem to notice this latency

> Terminals were fullscreened before running tests. This affects test results, and resizing the terminal windows can and does significantly change performance (e.g., it’s possible to get hyper to be slower than iterm2 by changing the window size while holding everything else constant).

Perhaps we don't notice because it's so much lower at window sizes much less than fullscreen?

Slightly off-topic, but this reminded me of a mystery: Does anyone else experience that bash takes a long and variable time to start, on many systems and without any fancy setup of any kind? What can a shell be doing that it takes three or four seconds to start?

Is this partly a Cygwin experience? Forking is slow under Cygwin. If you can get to a recent Windows 10 with local admin rights, you can install Windows Subsystem for Linux, and bash should perform well even on ten-year-old hardware.

If you're getting unusual slowness in linux/BSD, some things to check: (1) any significant overhead on the system that could be making forking slow; (2) some option in your $HOME/.bashrc that is adding latency, e.g., indexing options or dynamic stuff in your PS1; (3) unusual filesystem stuff, e.g., your home directory is mounted over NFS.

If you're in linux and it is reproducible, run strace against a bash launch, and ctrl+c it when it gets stuck. Have a look at the recent system calls.

This is on Mac OS. I might try the tracing idea though.

Yes, that happens all the time for me. And more often than not, it's because of all the crap I put in my startup scripts without realizing it until it gets too big. Sometimes it's the volume of crap other people have put in /etc/bashrc. Sometimes it's a slow filesystem causing one of the startup commands to hang a little. Sometimes it's a command that's just big and bloated.

It's instructive to dig in and debug it; you may be really surprised what an insane amount of stuff a shell can be doing in those 3-4 seconds. Just some examples of what my shells are doing: archiving my history (diffing & zipping files), bash completion, git completion, reading color definitions, running OS-specific scripts, running host-specific scripts, adding (reading & parsing) lots of aliases & functions, running /etc/bashrc (which on my macOS laptop is almost 3k lines long).

The last performance issue I had was my history archiving junk got too big, I was saving my full .bash_history into an archive every day without clearing .bash_history, so they kept getting bigger. Diffing against the previous day, and writing a smaller file, fixed it.

You can binary search by putting a couple of "date +%s%N" in your .bashrc or .bash_profile to find out what's taking the longest.
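
A minimal sketch of that binary search, assuming GNU date (as far as I know, the BSD date shipped with macOS doesn't support %N, so you'd need GNU coreutils there). The `elapsed_ms` helper and the file layout are hypothetical:

```shell
# elapsed_ms START_NS END_NS: difference between two `date +%s%N` readings, in ms.
elapsed_ms() {
    echo $(( ($2 - $1) / 1000000 ))
}

# Hypothetical layout inside ~/.bashrc:
#   t0=$(date +%s%N)
#   ... first half of the file ...
#   t1=$(date +%s%N)
#   ... second half of the file ...
#   t2=$(date +%s%N)
#   echo "first half:  $(elapsed_ms "$t0" "$t1") ms" >&2
#   echo "second half: $(elapsed_ms "$t1" "$t2") ms" >&2
```

Whichever half is slower gets split again, and so on until the slow command is isolated.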

Yep, I have profiled my shrc scripts in exactly this fashion before for exactly this reason.

I don't have this experience -- I highly suspect NFS or another network file system.

On my system, hitting Ctrl-Shift-T inside xterm is almost instant, and that starts a bash process. Likewise for Ctrl-B C in tmux.

Bash's startup files are annoying, but you can probably pinpoint the problem with strace.

Or maybe try running:

    strace bash

    strace bash --norc --rcfile=/dev/null

However this isn't the full story because bash's startup sequence is a nightmare and neither of those is likely to be exactly what happens when you're starting a new shell.

My bash always starts near instantaneously, because I have never tolerated it not doing so.

bash-completion (available in many package managers) used to be one culprit, should be less of one now due to on-demand completion loading, but still noticeable. I don't use it. Another is rvm, nvm, etc (managers of ruby or nodejs versions for development, install process typically adds to your .bashrc).

An easy way to figure this out is to add `set -x` to .bash_profile or its equivalent.
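
For what it's worth, the trace is easier to read with timestamps in it. A sketch, assuming GNU date for the %N field; the placement comments are just one way to lay it out, and spawning date for every traced line adds some overhead of its own:

```shell
# At the very top of ~/.bash_profile (or whichever startup file bash reads):
PS4='+ $(date "+%s.%N") '   # prefix every traced command with a timestamp
set -x                      # start printing each command as it runs

# ... the existing startup commands stay here ...

# At the very bottom:
set +x                      # stop tracing before the prompt appears
```

Large gaps between consecutive timestamps point at the slow commands.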

How did he actually measure the latency in this article?

Doesn't measuring keypresses to display latency require special hardware?

All tests were done on a dual core 2.6GHz 13” Mid-2014 Macbook pro. The machine has 16GB of RAM and a 2560x1600 screen. The OS X version was 10.12.5. Some tests were done in Linux (Lubuntu 16.04) to get a comparison between macOS and Linux. 10k keypresses were used for each latency measurement.

Latency measurements were done with the . key and throughput was done with default base32 output, which is all plain ASCII text. This is significant. For example, terminal.app appears to slow down when outputting non-latin unicode characters.

It would also be great to see how these scale with terminal size. I personally use iterm2 but after switching to a new Macbook pro this year with two 5k displays it's noticeably slower. I'm assuming some O(n^2) scaling behind the scenes but I haven't measured anything myself. Still, @gnachman I love your term especially with terminalplots.jl and drawing Julia repl images inline.

I want to try Alacritty on OSX - but the big turnoff for me is the lack of binaries.

However, I don't know if that's intentional, because they don't think it's ready yet for people who won't install/compile the whole Rust stack from scratch?

> Precompiled binaries will eventually be made available on supported platforms. This is minimally blocked on a stable config format. For now, Alacritty must be built from source.


I guess this isn't measuring end-to-end latency which would be discretized in units of frames and would have a constant overhead from the display pipeline. I wonder if the differences between terminals would look much smaller if measured that way.

I'm surprised Hyper did as well as it did on the latency test, after all it does run in Electron, which I'd expect to add a lot of overhead between keypress and text display

You should expect latency similar to a web app in Chrome, just that this has a backing pty processing the input and output.

> even the three year old hand-me-down laptop I’m using has 16GB of RAM

Oh my.. I wish that was the case. Even current development machines tend to have just 8 around here..

I have a (three-year-old) laptop with 16GB at work and another (three-year-old) laptop with 8GB here, and speaking as a C++-etc developer, I never notice any difference. Maybe if you use a substantial number of VM environments.

Interesting to see that both alacritty and Terminal.app are very fast but running tmux inside them kills performance.

I've never once worried or felt bothered by performance issues while running iTerm2, but I have in Terminal.app and I'm a heavy/complex tmux user.

I wonder how much performance hit font antialiasing in iterm2 causes, or if it was turned on during these tests.

Font antialiasing on the original OS X completely destroyed performance in the terminal and gave the impression that the whole system was slow.

How can st be so slow? It's tiny and hardly does anything!

Maybe it's doing hardly anything one character at a time.

Because it's running inside XQuartz on macOS. There is another line on the graph, "linux-st" with much better results.

Latency is why I could never use a WYSIWYG word processor. While I don't like VI that much, its latency is low enough that it's not a problem, i.e. I press a key, and miracle of miracles, the character appears on the screen.

Using a WYSIWYG word processor, there's enough latency between keypress and visual update that I find them impossible to use.

When Apple came out with Pages, it was apparent that they paid strong attention to latency. That means the latency is small enough that (for me) using it isn't an exercise in frustration.

> I press a key, and miracle of miracles, the character appears on the screen.

As long as you press "i" first! :)

Mellel is a word processor app with lower latency than Pages.
