
Latency Numbers Every Programmer Should Know - ingve
https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html
======
jorangreef
Since I came across Jeff Dean's tip a few years ago about back-of-the-envelope
calculations and latency numbers
([http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-the-envelope-calculations-to-choo.html](http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-the-envelope-calculations-to-choo.html)),
I can't count the number of times
that sketching out a design's complexity has saved me from implementing
designs that can't possibly work, and eventually resulted in designs that are
orders of magnitude faster.

In fact, it's not just about designs that work, or designs that are fast, but
getting into the practice of estimating complexity in terms of hardware
numbers also makes for safer code, especially where validating user data is
concerned.
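
For a made-up but concrete example of the kind of envelope math involved:
validating a 10 MB upload with an accidentally quadratic scan versus a linear
one (numbers rounded to orders of magnitude):

    O(n^2): (10^7 bytes)^2 = 10^14 comparisons
            at ~10^9 simple ops/sec  ->  ~10^5 seconds, i.e. more than a day
    O(n):   10^7 comparisons at ~10^9 ops/sec  ->  ~10 ms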

Just recently even, it kept me from what might have been a potential denial
of service in
[https://github.com/ronomon/mime](https://github.com/ronomon/mime), and led
to discovering a vulnerability in several popular email parsers
([https://snyk.io/blog/how-to-crash-an-email-server-with-a-single-email/](https://snyk.io/blog/how-to-crash-an-email-server-with-a-single-email/)).

I think Martin Thompson summarized it well as "Mechanical Sympathy":
[https://mechanical-sympathy.blogspot.com/2011/07/why-mechanical-sympathy.html](https://mechanical-sympathy.blogspot.com/2011/07/why-mechanical-sympathy.html)

~~~
mdpopescu
Great article, thanks for that (and for evidence #1236576 that I have no clue
when it comes to software security).

One rather off-topic observation: April 23 to June 25 is somewhat shorter than
the 90-day window you mentioned. ("A few days before the 90-day public
disclosure deadline...") What was the reason for that? It doesn't appear to be
because those who were going to fix it had already done so - they published
their fixes _after_ the public disclosure.

(I'm just curious, not criticizing or anything.)

~~~
jorangreef
Thanks!

Regarding the 90-day window, you are spot on. I never realized that until now.
I made a mistake with the month, it should have been July 25 not June 25, so
it came out after 60 days, not 90 days as I intended.

That's evidence #1236577 that I have no clue at all!

------
dvh
Grace Hopper explaining nanosecond:
[https://m.youtube.com/watch?v=JEpsKnWZrJ8](https://m.youtube.com/watch?v=JEpsKnWZrJ8)

~~~
pwaivers
Great video. I've never seen her speak before. She is very likable!

------
zawerf
I think the coolest thing about this visualization is that it is now "broken".

I remember when I first saw this, there were still visible red and green
squares on the top rows. Today those numbers are so small, those squares are
missing completely!

(That said, whoever owns this page should update the scale of the blocks so
it's more useful going forward.)

~~~
e12e
Thank you for linking to your attachment blog post. What's the status of
ronomon? It seems [https://ronomon.com/](https://ronomon.com/) assumes an
existing user and gives no more information?

~~~
jorangreef
Thanks! Ronomon is in private beta. My email is in my profile if you'd like to
chat more.

------
tw04
3ms for a disk drive isn't something you should plan for. 3ms assumes a 15k
drive with little to no workload. Under load you're likely at 10ms, and under
heavy load you may be seeing 20ms+. SATA drives? 5ms best case; under load
with anything but sequential workloads, you might as well just take the
afternoon off.
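
For context, rough numbers for the rotational floor alone (before any seek
or queueing):

    15,000 RPM: 60 / 15,000 = 4 ms per revolution  ->  ~2 ms average rotational latency
     7,200 RPM: ~8.3 ms per revolution             ->  ~4.2 ms average rotational latency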

~~~
bufferoverflow
Maybe it's an average and they include the SSDs.

Here's a better list:

[https://gist.github.com/eshelman/343a1c46cb3fba142c1afdcdeec...](https://gist.github.com/eshelman/343a1c46cb3fba142c1afdcdeec17646)

~~~
MR4D
Great link - it’s much more readable than the OP.

Highly recommended!

------
preinheimer
If ping times are your jam, we've collected a few between major cities across
the globe: [https://wondernetwork.com/pings](https://wondernetwork.com/pings)

~~~
Iem3ohvi
Averages are not particularly interesting. CDFs or violin plots are, since
loading a site can take hundreds of requests, which turns your 99th
percentile into what the user actually experiences.

Edit: I just noticed that clicking on it shows box plots. Good enough!

~~~
preinheimer
Thanks!

We've got all of our data going back years in AWS Athena, we're just waiting
to have time to do something fun with it.

------
mkesper
This is a good example of bad data display: the mind doesn't register ten
lines, for example, as having ten times the value of a single line.

------
moftz
It will never change, but another hardware latency worth keeping in mind is
that electrical signals travel at roughly 6 in/ns in circuit boards.
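For scale, that means a 12-inch trace costs about 2 ns one way, already a
couple of L1 cache references' worth of delay.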

------
dmayle
This is great for server development, but I'd love to see the numbers for
mobile phones nowadays. Rather than year, mobile phone by year class...

------
jandrese
I'm surprised how cheap a branch mispredict is. I had thought the relatively
long pipelines in modern processors would have made that more painful. It
seems weird Intel devotes so much silicon to improving the branch predictor
when the penalty is so light.

~~~
OskarS
Try it for yourself! Create an array with a million random bytes from 0 to
255, then count (in a for loop) with an "if" statement how many of them are
less than 128. This branch will be essentially impossible to predict, and the
branch predictor will fail roughly 50% of the time.

Then try again, but sort the array before you begin (which will make the
branch trivial to predict). Time the difference. I think you'll be surprised.

~~~
AceJohnny2
Well, I took the bait. Time to go through an unsorted vs sorted array on an
iMac17,1 with a 4 GHz Core i7 (not sure which CPU exactly):

* 1MB: 6.52ms vs 2.16ms (3x speedup)

* 1GB: 5.73s vs 2.25s (2.5x speedup)

Tested with this C code, called as "bpredict-time [size] [sort]":

    
    
        #include <stdio.h>
        #include <stdint.h>
        #include <stdlib.h>
        #include <sys/time.h>
        
        #define SIZE (1024*1024)
        
        int cmp (const void *val1,
                 const void *val2)
        {
            return *(uint8_t*)val1-*(uint8_t*)val2;
        }
            
        int
        main (int argc,
              char **argv)
        {
            uint8_t *array;
            size_t size = SIZE;
            struct timeval start, end;
        
            if (argc > 1) {
                size = atoi(argv[1]);
            }
            
            array = malloc(sizeof(*array) * size);
            if (array == NULL) { return -1;}
        
            for (size_t i=0; i<size; i++) {
                array[i] = random();
            }
        
            if (argc > 2) {
                qsort(array, size, sizeof(*array), cmp);
            }
        
            (void)gettimeofday(&start, NULL);
            int cnt = 0;
            for (size_t i=0; i<size; i++) {
                if (array[i] < 128) { cnt++;}
            }
            (void)gettimeofday(&end, NULL);
        
            
            printf("N ints < 128: %d\n", cnt);
            printf("Time: %f\n", (float)(end.tv_sec-start.tv_sec)\
                +(float)(end.tv_usec-start.tv_usec)/1000000);
            free(array);
            
            return 0;
        }

~~~
Sohcahtoa82
Why do you use "(void)gettimeofday(&start, NULL);" instead of just
"gettimeofday(&start, NULL);"?

~~~
AceJohnny2
To avoid the compiler complaining about the unused return value, by telling it
I'm explicitly choosing to ignore the value.

The complaint doesn't show up at the default warning levels. I think it only
appears with -W? (fun fact: -Wall actually has _fewer_ warnings enabled than
-W, at least with Clang)

Edit: actually, the warning _is_ enabled by default, but only triggers for
functions that have the attribute 'warn_unused_result' set, which is almost
none of them, which is why this warning is not well-known. Indeed, in my
example my (void) isn't necessary, and I couldn't get the warning to trigger
even with -Wpedantic.
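
A minimal sketch of the attribute in action (the function name here is made
up, and behaviour differs slightly between compilers):

    /* Marked so the compiler warns when the return value is ignored
       (GCC and Clang both support this attribute). */
    __attribute__((warn_unused_result))
    static int must_check(void)
    {
        return 42;
    }

    int main(void)
    {
        must_check();       /* warning: ignoring return value ... [-Wunused-result] */
        (void)must_check(); /* Clang treats the cast as an explicit opt-out;
                               GCC has historically warned even with the cast. */
        return 0;
    }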

------
CalChris
These latencies should include register access (nominally 1 clock cycle).
Compilers work very hard to keep things in registers and it's nice to know the
relative benefit of register vs L1.

------
jdoliner
There are a few things in this diagram that always drive me a little crazy.
Why is it that 1,000ns ≈ 1μs while 1,000,000ns = 1ms (almost equals vs.
regular equals). It kinda makes sense to use almost equals for some of the
real values, but they do it for pure values that are just there for
conversions... but not with all of them, only with some.

All this being said, this is actually a very useful and well done diagram.
But, imo, it would be even better if it used equal signs in a consistent way.

~~~
drewmol
My guess is it's a typo: "10,000ns ~ 10us = [green cube]" is supposed to read
"10,000ns = 10us = [green cube]". That line is a key, as are the black, red
and blue ones; all of the actual speed measures are approximate.

------
akuji1993
I thought it was funny not to see the roundtrips change at all. All the other
measurements explode in size the further back you go, and the roundtrips just
stay the same.

~~~
exDM69
It's the speed of light staying constant despite best efforts of physicists
and engineers. The distance between California and the Netherlands hasn't
changed either.

~~~
Iem3ohvi
Technically neutrino signalling could make it faster, but the few
milliseconds gained are not worth it, even for the HFT folks... but thanks to
them we have shorter fiber routes across the Atlantic at least.

~~~
supahfly_remix
> thanks to them we have shorter fiber routes across the Atlantic at least.

Can you explain how HFT caused a shorter fiber route across the Atlantic? Is
this route open to the non-HFT public?

I read Flash Boys and am aware of a custom fiber link between Weehawken, NJ
and Chicago, IL, but I thought they were moving to microwave. I thought there
were some HFT links (fiber or microwave) within Europe.

~~~
daddylonglegs
There are microwave links across Europe:

[https://sniperinmahwah.wordpress.com/2016/01/26/hft-in-the-banana-land/](https://sniperinmahwah.wordpress.com/2016/01/26/hft-in-the-banana-land/)

There are apparently shortwave links across the Atlantic:

[https://sniperinmahwah.wordpress.com/2018/05/07/shortwave-trading-part-i-the-west-chicago-tower-mystery/](https://sniperinmahwah.wordpress.com/2018/05/07/shortwave-trading-part-i-the-west-chicago-tower-mystery/)

[https://sniperinmahwah.wordpress.com/2018/07/16/shortwave-trading-part-iv-sleuthing-examples-research-tools-techniques-deputies-wanted/](https://sniperinmahwah.wordpress.com/2018/07/16/shortwave-trading-part-iv-sleuthing-examples-research-tools-techniques-deputies-wanted/)

These should have a lower latency than fibre links although the shortwave link
will probably have rather a low data rate.

------
harry8
Seems like, while being kept up to date, it hasn't kept pace with the way
computing has changed: multicore, NUMA, the extra cache level in the
hierarchy, etc.

Lacking:

- L3 cache reference
- major fault
- TLB miss
- syscall overhead
- inter-thread latency, 64b, threads on same package
- inter-thread latency, 64b, across packages
- latency to ethernet tx, 64b
- latency between 2 machines attached to local switch, 64 byte transfer

Comments?

------
mmirate
With SSDs, don't the critical determiners of latency and bandwidth become the
data transfer buses, rather than the storage media themselves?

(especially if one uses an external USB SATA controller, in the vain hope of
having storage that is all three of durable, fast and easily-replaced)

------
barrystaes
Wow, it seems they did a great job in 1990 with the packet roundtrip from CA
to the Netherlands; it still lists ~150ms for 2018. I think it's closer to
half that or less, depending on which CA this is referring to. Assuming it's
not in Tokyo.

~~~
dmayle
The trip is dominated by the speed of light (or more accurately, the speed of
electricity, which is close to the speed of light, at about 300,000
kilometers per second).

The circumference of the earth is about 40,000 kilometers. A round trip from
CA to the Netherlands will have electricity covering roughly that distance.
The speed of electricity times 150 milliseconds is 45,000 kilometers.

That means that the number won't change over time.

~~~
hhmc
Won't the majority of the trip be over a fibre-optic medium, which would mean
the speed of light (albeit in a glass medium), rather than the speed of
electricity?

This also implies that you could materially change that RTT number if you
used a faster medium, e.g. via air by way of microwave towers.
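
Rough numbers for the fibre case (the great-circle distance is approximate,
and real routes are longer):

    great-circle distance, California (SF) <-> Amsterdam  ~  8,800 km
    light in glass fibre (refractive index ~1.5)          ~  200,000 km/s
    round-trip propagation  ~  2 * 8,800 / 200,000        ~  88 ms
    plus longer real routes, switches and routers         ->  ~150 ms observed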

~~~
Xylakant
> This also implies that you could materially change that RTT number if you
> used a faster medium, e.g. via air by way of microwave towers.

Which is done for high-speed trading networks:
[https://arstechnica.com/information-technology/2016/11/private-microwave-networks-financial-hft/](https://arstechnica.com/information-technology/2016/11/private-microwave-networks-financial-hft/)

------
ebikelaw

        function getDCRTT() {
            // Assume this doesn't change much?
            return 500000; // ns
        }
    

:thinking face emoji:

High by 10x in my experience.

------
philipodonnell
It looks like from 2006-2018 the only things that have improved are disk read
speeds and the "commodity network" packet speed. Most of this chart is
unchanged over that time. Has progress on CPUs, memory, processing speed
really stopped over the last 12 years?

~~~
exDM69
It measures latency, not throughput. CPU and memory latency have not improved
much, but throughput has, through increased core counts and instruction-level
parallelism. If you look at throughput per watt, the improvement is even
greater.

~~~
bostonpete
I assume the "send 2000 bytes over a commodity network" is measuring
throughput rather than latency. If not, I really don't understand what it's
describing...

~~~
gpderetta
It is literally the length of a 2000*8 bit packet on a serial link, in ns.
The number seems way off unless my math is wrong or 200Gb Ethernet counts as
commodity today.
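
For reference, the serialization-time arithmetic at a few line rates (the
44 ns figure is the chart's 2018 value quoted elsewhere in the thread):

    2,000 bytes = 16,000 bits
    at   1 Gb/s  ->  16,000 ns on the wire
    at  10 Gb/s  ->   1,600 ns
    at 400 Gb/s  ->      40 ns  (~ the chart's 44 ns)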

------
Waterluvian
Changing the scale using colour but keeping the block the same really makes it
hard for me to gain any sense of scale beyond what I already know.

------
dj43nq
This page needs some work; my iPad doesn't render it properly. The Linux box
didn't like it either.

------
skyde
Can someone explain this part to me?

Send 2000 bytes over network = 44 ns; roundtrip in same datacenter = 500,000 ns.

Isn't a roundtrip simply a send from node A to node B and then a send from
node B to node A?

~~~
detaro
The first might be just the time it occupies on the wire (although it'd be a
400G network, that's not quite "commodity" yet IMHO...), vs roundtrip being
transfer time, time through switches, time through network stacks, ... to send
a packet and reply to it as fast as possible.

~~~
skyde
not sure what you mean by "on the wire".

But even if this is assumed to be the time to transfer a UDP packet from one
network card buffer to another network card buffer directly connected by a
cable, this seems extremely low.

~~~
detaro
"on the wire" as in what you'd see if you'd connect an oscilloscope to the
"wire" (which would be fiberoptics at that speed...) and watched how long it
took from packet start to end. That could work, but it could also just be an
error on the page.

------
ecesena
Is there a cloud version, e.g. what we should expect on AWS?

~~~
valarauca1
These numbers are based on values from Google's production stack, insofar as
they are based on Jeff Dean's post:
[http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-the-envelope-calculations-to-choo.html](http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-the-envelope-calculations-to-choo.html)

That being said, these numbers are roughly identical to what you can expect on
AWS.

------
Mauricio_
What about your average call to malloc() / new?

------
ssvss
Is there a similar chart for bandwidth numbers?

------
lucio
So there's no more room for improvement at the microchip level. Will software
become leaner? Is Moore's law reaching a plateau?

------
stevage
If you're a web developer, only one of these is even slightly relevant.

------
bluetomcat
Unless the programmer is programming at a very low level, the listed events
are out of his control. CPU caching is mostly transparent in the
ISA, disk seeking is scheduled by the kernel or in storage controllers, file
buffers are cached in the kernel, application frameworks also provide layers
of caching.

For the majority of programmers who want to get their shit done with
straightforward code, few dependencies and acceptable performance, this is
"interesting to know" but not "should know".

~~~
dragontamer
> Unless the programmer is programming at a very low level, the listed events
> are out of his control

> CPU caching is mostly transparent in the ISA

Nonsense. Even in a relatively high-level language like Java, you can use
primitive types like int[] to ensure that certain elements are close to each
other in memory. As such, you can have good memory access patterns even in a
high-level language like Java or C#.

I'm fairly certain this stuff is important when choosing data structures:
vector vs linked list, for instance. Linked lists are harder to cache than
vectors, and this chart helps explain why these two O(n) traversals can have
dramatically different performance characteristics.
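
A minimal C sketch of the effect (array size and node layout are made up for
illustration; exact numbers vary by machine):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1 << 24)  /* ~16M elements */

    struct node {
        long value;
        struct node *next;
    };

    static double seconds(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        /* Contiguous array: sequential access the prefetcher can stream. */
        long *arr = malloc(N * sizeof(*arr));
        /* Same values in a linked list whose nodes are visited in shuffled
           order, so every step is a pointer chase and a likely cache miss. */
        struct node *nodes = malloc(N * sizeof(*nodes));
        size_t *perm = malloc(N * sizeof(*perm));
        if (!arr || !nodes || !perm) return 1;

        for (size_t i = 0; i < N; i++) { arr[i] = i; perm[i] = i; }
        for (size_t i = N - 1; i > 0; i--) {          /* Fisher-Yates shuffle */
            size_t j = (size_t)random() % (i + 1);
            size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
        }
        for (size_t i = 0; i < N; i++) {
            nodes[perm[i]].value = i;
            nodes[perm[i]].next = (i + 1 < N) ? &nodes[perm[i + 1]] : NULL;
        }

        double t0 = seconds();
        long sum_arr = 0;
        for (size_t i = 0; i < N; i++) sum_arr += arr[i];
        double t1 = seconds();
        long sum_list = 0;
        for (struct node *p = &nodes[perm[0]]; p; p = p->next) sum_list += p->value;
        double t2 = seconds();

        printf("array: sum=%ld in %.3fs, list: sum=%ld in %.3fs\n",
               sum_arr, t1 - t0, sum_list, t2 - t1);
        free(arr); free(nodes); free(perm);
        return 0;
    }

On typical hardware the shuffled list traversal takes well over an order of
magnitude longer than the array scan, even though both do O(n) work.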

> disk seeking is scheduled by the kernel or in storage controllers, file
> buffers are cached in the kernel

But you can read a file from beginning to end. Even in a very high-level
language like SQL, you can often ensure a high-speed sequential table scan if
you write your joins properly. And knowledge of sequential scans can help you
decide which indexes to set up for your tables.

Knowing whether you have SSDs or spinning disks can also be helpful when
architecting a SQL system.

~~~
bluetomcat
> As such, you can have good memory access patterns even in a high-level
> language like Java or C#.

CPU- and data-intensive heavy lifting is rarely done in such programs; it is
delegated either to specialised libraries or to some middleware in the form
of an RDBMS. Most of these programs spend most of their time waiting for some
IO event, so the few microseconds gained from using a vector with a few
hundred elements are negligible.

> But you can read a file from beginning to end.

That's what most programs actually do most of the time, because files are
essentially a stream abstraction. Programs that jump around a file would map
it into memory, and then the CPU and the kernel would do their best to cache
the hot regions, even if accesses to those regions are temporally or
spatially distant.

~~~
OskarS
> CPU- and data-intensive heavy lifting is rarely done in such programs

That's absurd. They aren't as high-performance as C or C++, but Java and C#
both have screamingly fast JIT compilers, and plenty of high-performance code
is written in them. We're not talking about Prolog here. And memory access
patterns absolutely make a huge difference in performance in these languages.

Sure, you CAN ignore that kind of stuff if you want to, but good programmers
don't.

------
jiveturkey
Tufte should be required reading if one is publishing something like this. I
assume this is the work of an undergrad because it's horrible.

EDIT: The downvote is absurd. It's an unquestionably horrible visualization.
I don't know which is worse, the poor presentation or the lack of credit to
the original author (Dean).

~~~
mmirate
> the lack of credit to the original author (Dean)

The linked GitHub repo's description reads "Jeff Dean's latency numbers
plotted over time".

~~~
jiveturkey
Yes, the GitHub repo. 5% of the target audience might read that. It's trivial
to add the credit on the graphical page, and there's no excuse not to do so.

Third fault, which to me should ban this page from the internet: he has dared
to put "2018" on the page, insinuating that it's new data or some new
insight, as opposed to being 15+ years old.

When the original _text_ is more understandable than your visualization, you
dun goofed.

