
Latency/throughput tradeoffs, illustrated with coffee - KentBeck
https://medium.com/@kentbeck_7670/inefficient-efficiency-5b3ab5294791
======
Ixiaus
I liked this article and how it was framed for businesses, but it seems
generally useful for many different types of activities:

- Pull requests should usually optimize for latency, not throughput (i.e.
smaller PRs/changes are usually better)

- Release frequently instead of infrequently (implying that frequent releases
will be small, while infrequent releases will be very large)

- Non-strictness (latency-optimized) is more composable than strictness
(throughput-optimized)

... this orbits another mental model I was exposed to a few years ago that I
call "weak-signal thinking".

------
neogodless
What I find to be one of the most annoying trade-offs between latency and
throughput (and etiquette) is single-lane bridges.

If there are three cars on either side, the fastest way to get all six cars
across is to allow all of the cars on one side to go, and then all of the
cars on the other side to go. There's some increased latency for the side that
goes second, but 4 of the 6 cars end up crossing sooner than they would with
what actually happens (1 is the same either way, and 1 is slower).

One car goes on one side. Then one car goes on the other side. They continue
to alternate. It's so slow that what was once six cars quickly becomes twenty.
But if you "sneak" in behind the car in front of you, you see unhappy faces
and the occasional middle finger!
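
For concreteness, here's a toy sketch of the two policies in C (the 1.0-unit
crossing time and 0.2-unit convoy headway are made-up numbers, not anything
from the comment):

    #include <stdio.h>

    /* Toy model: crossing takes CROSS time units; a car following another
     * in the same direction can enter the bridge FOLLOW units behind it;
     * a car going the opposite way must wait for the bridge to clear. */
    #define CROSS  1.0
    #define FOLLOW 0.2

    int main(void) {
        /* Batch policy: all of side A convoys across, then all of side B. */
        double a_batch[3], b_batch[3];
        for (int i = 0; i < 3; i++) a_batch[i] = i * FOLLOW + CROSS;
        double clear = a_batch[2];                    /* bridge clears */
        for (int i = 0; i < 3; i++) b_batch[i] = clear + i * FOLLOW + CROSS;

        /* Alternating policy: one car per side, bridge clears in between. */
        double a_alt[3], b_alt[3], t = 0.0;
        for (int i = 0; i < 3; i++) {
            t += CROSS; a_alt[i] = t;                 /* car from side A */
            t += CROSS; b_alt[i] = t;                 /* car from side B */
        }

        for (int i = 0; i < 3; i++)
            printf("A%d: batch %.1f alt %.1f | B%d: batch %.1f alt %.1f\n",
                   i + 1, a_batch[i], a_alt[i], i + 1, b_batch[i], b_alt[i]);
        return 0;
    }

Under those numbers, A2, A3, B2, and B3 all cross sooner when batched, A1
ties, and only B1 is worse, which is exactly the 4/1/1 split above.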

~~~
JacobDotVI
I'm not sure where you are from, but when I encountered one of these bridges
in Kauai, Hawaii a few years back, it had specific instructions to follow the
car in front of you. I always admired whoever had the foresight to design such
instructions, and never realized it was not like that elsewhere.

From a quick google, it looks like this is the norm throughout Hawaii:

https://www.hawaii-aloha.com/blog/2014/01/08/one-lane-bridge-ahead/

https://vacations.hawaiilife.com/blog/misc/one-lane-bridge-etiquette

~~~
neogodless
As if I needed one more reason to move to Hawaii!

That's subtle genius. I like this version of the etiquette: 5-7 cars at a
time!

------
roland35
Making coffee is a good analogy for any kind of hardware electronics
development! Some things need to be planned out ahead of time like the overall
mechanical design, but other things probably should be iterated quickly, maybe
by using pre-made development kits and breadboards instead of fully designed
circuit boards for initial firmware.

As the article said at the end, it all basically boils down to "it depends..."
:)

------
willis936
Or do what superscalar CPUs do and have 8 coffee machines running all the
time.

~~~
dragontamer
CPUs are the latency-optimized machines of today.

If a CPU sees a line of 10 people, it will brew 10 cups of coffee, speculating
that most of them want coffee. If only 9 cups were needed, it will throw away
the extra coffee.

-----------

In practice, this truly happens: CPUs perform branch prediction over a for-
loop.

    
    
    for (int i = 0; i < 32; i++) {
        doA();
    }
    doB();  // executed after the loop
    

The value "i" hasn't been calculated yet, but the CPU performs branch
prediction. Modern CPUs can accurately loops of size ~32 or less. Modern CPUs
will literally fill their pipelines with 32x "doA()" statements, and even the
doB() statement BEFORE the i<32 check was even tested.

Now the branch predictor might be wrong! Let's say that doA() is:

    
    
    if (rand() % 100 == 0) {  // a 1% chance...
        i++;                  // ...of incrementing the loop counter
    }
    

The CPU will likely fail to predict this, and will then be forced to throw
away the work. Nonetheless, it's overall beneficial for the CPU to
speculatively try all the loop iterations anyway (the alternative is leaving
the CPU pipeline empty, which has roughly the same cost as a failed
speculation anyway).

The CPU wins if it is correct, and ties if it is wrong. So it might as well
speculate.
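
A minimal sketch of that "wins or ties" point, using the classic sorted-vs-
unsorted branch demo (the array size and the 128 threshold are arbitrary
choices of mine; compile without heavy optimization, or the compiler may turn
the branch into a branchless select):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 10000000

    /* values are 0..255, so plain subtraction cannot overflow */
    static int cmp(const void *a, const void *b) {
        return *(const int *)a - *(const int *)b;
    }

    int main(void) {
        static int data[N];
        for (long i = 0; i < N; i++) data[i] = rand() % 256;

        long sum = 0;
        clock_t t0 = clock();
        for (long i = 0; i < N; i++)
            if (data[i] >= 128) sum += data[i];  /* ~50% taken, no pattern */
        clock_t t1 = clock();

        qsort(data, N, sizeof data[0], cmp);     /* same data, now sorted */
        for (long i = 0; i < N; i++)
            if (data[i] >= 128) sum += data[i];  /* same branch, predictable */
        clock_t t2 = clock();

        printf("unsorted: %.2fs  sorted: %.2fs  (sum=%ld)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
        return 0;
    }

The work per iteration is identical; the unsorted pass is slower only because
the predictor keeps guessing wrong and the pipeline keeps throwing away
speculated work.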

~~~
fiter
In your example, it sounds like you mean throughput-optimized. According to
the original post, brewing hot water for 10 cups would introduce additional
latency.

~~~
dragontamer
> In your example, it sounds like you mean throughput-optimized. According to
> the original post, brewing hot water for 10 cups would introduce additional
> latency.

Nope. CPUs are latency-optimized.

The "1st cup of coffee" always takes the same amount of time in CPU-land. The
2nd-cup of coffee was speculatively made, but never "slowed down the first cup
of coffee".

----------

A throughput-optimized machine, like a GPU (and, strangely enough, a hard
drive), is willing to slow down the 1st cup of coffee for better _overall_
throughput.

Hard drives are interesting: if you have the following "reads":

#1: Read location 1

#2: Read location 100

#3: Read location 50

The hard drive will rearrange the reads into: Read 1, Read 50, Read 100,
because the drive head will reach location 50 before location 100. Remember,
hard drives physically move their arms to each location.

This means that Read 100 is "slowed down"; its latency gets significantly
worse. But the three reads, taken together, finish sooner as a batch.
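
A small sketch of that reordering (the "elevator" idea; the one-time-unit-per-
location seek cost is my simplification):

    #include <stdio.h>
    #include <stdlib.h>

    static int cmp(const void *a, const void *b) {
        return *(const int *)a - *(const int *)b;
    }

    int main(void) {
        int fifo[]   = {1, 100, 50};   /* arrival order: #1, #2, #3 */
        int sorted[] = {1, 100, 50};
        qsort(sorted, 3, sizeof sorted[0], cmp);

        int head, t_fifo = 0, t_sorted = 0;
        head = 0;
        for (int i = 0; i < 3; i++) { t_fifo += abs(fifo[i] - head); head = fifo[i]; }
        head = 0;
        for (int i = 0; i < 3; i++) { t_sorted += abs(sorted[i] - head); head = sorted[i]; }

        printf("seek cost in arrival order: %d, reordered: %d\n", t_fifo, t_sorted);
        return 0;
    }

That prints 150 versus 100: Read #2's latency gets worse, but the batch as a
whole finishes a third sooner.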

~~~
fiter
> The "1st cup of coffee" always takes the same amount of time in CPU-land.
> The 2nd-cup of coffee was speculatively made, but never "slowed down the
> first cup of coffee".

Just to be clear, then: the analogy from the original post doesn't apply.

~~~
dragontamer
> Just to be clear, then: the analogy from the original post doesn't apply.

The analogy from the original post applies to the cases the original post
discusses.

The original "coffee latency" blogpost innately applies to a 1980s style
computer: a simple in-order machine. Its truly correct for that model of
simple computing.

I've added in complications: pipelining, superscalar execution, and
speculative execution, inventions that were deployed in CPUs through the '90s
and 2000s. So things work differently on modern machines, because modern
machines have many, many more features than the "original" computer designs.

The original "cups of coffee" are a good way to start thinking about latency
vs bandwidth problem. I really like the analogy. But it would take a LOT more
writing before I really cover everything going on in modern CPUs.

~~~
fiter
Your original post was missing an explanation, because you referenced the
original analogy without addressing how it no longer applied to the scenario
you were discussing.

For what it's worth, in all my replies I have not been confused about the
behavior of a CPU, only about how you are trying to use the analogy to fit
your exposition.

------
ncmncm
Very often I see false tradeoffs made in pursuit of better latency. When the
system is on top of things, sure, go for latency. Once there is any backlog,
favoring throughput gets you better latency too.

------
crdrost
Joel on Software gave a better illustration of this a while back in talking
about the dangers of multitasking; he said imagine that you have no task-
switching penalties but have to perform two tasks A and B which are in theory
100 units of time each. If you perform them serially, you get the result for A
at time 100 and the result for B at time 200; if you perform them in parallel
switching between them, you get the benefit that at time 51 you can show both
of the recipients that you are 25% complete, but you deliver A at time 199 and
B at time 200. B gets the same result; A gets a strictly better result, by not
multitasking. If you imagine that your reputation is proportional to the
average of the inverses of your times-to-completion, your reputation is 50%
better in the first case due to the 100% improvement on half of your
deadlines; if you had done the same nonsense with three parallel tasks your
reputation would be 83% better or so.
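
A quick check of that arithmetic (taking "reputation" to be, as stated, the
average of the inverses of the completion times):

    #include <stdio.h>

    int main(void) {
        /* Two 100-unit tasks: serial vs. interleaved round-robin. */
        double serial2      = (1.0 / 100 + 1.0 / 200) / 2;
        double interleaved2 = (1.0 / 199 + 1.0 / 200) / 2;
        printf("two tasks:   %.2fx\n", serial2 / interleaved2);

        /* Three 100-unit tasks: finish at 100/200/300 vs. ~298/299/300. */
        double serial3      = (1.0 / 100 + 1.0 / 200 + 1.0 / 300) / 3;
        double interleaved3 = (1.0 / 298 + 1.0 / 299 + 1.0 / 300) / 3;
        printf("three tasks: %.2fx\n", serial3 / interleaved3);
        return 0;
    }

This prints 1.50x and 1.83x, matching the 50% and 83% figures.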

With that said it seems, I don’t know, like something is missing? Throughput
in these project-engineering contexts is little more than the plural of
latency; improving latency usually works to improve throughput. So it would be
nice to figure out what the actually-perpendicular vector is, given that these
two so often go hand-in-hand.

I'd then want to think about situations where you could up-front invest in
building a clean piece of software that is dynamic and highly-adaptable later
(big wait, then lots of features can be delivered faster) vs. a clunker that
was slapped together ad-hoc in order to immediately meet business needs, and
it shows (immediate results but every new feature takes longer and longer).

Between the two of those I have a personality which favors the first; in one
of my early programming jobs I had a lot of trouble being thrown into the tail
end of a system built for years according to the second principle, and so
every little change took weeks to debug because everything was spaghetti—I got
a bit burned. On the flip-side, the second is in some sense Objectively
Correct—lower latencies are really powerful—and I started to adopt some
serious principles from that.

So with new internal tools for example, I have some baseline principles which
speak to the second vision. A new tool starts without CI/CD, it starts without
a database or data persistence, it has a repository but it does not have a
release process or code reviews; it starts without extraneous design or styles
or templates; usually it starts without tests although in theory I like test-
driven development. When I say minimum viable product, I mean that word
minimum and I am somewhat loose on that word viable. If there is supposed to
be communication with a hypothetical API, that API does not exist and instead
there is a file containing some functions which return static JSON blobs that
it might have hypothetically tossed back in response. It is a frontend-first
design that has no backend.
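
For instance, the "functions which return static JSON blobs" bit might look
something like this (sketched in C to match the other snippets in this thread;
the function name and fields are invented for illustration):

    /* fake_api.c: stands in for a backend that does not exist yet.
     * The frontend calls fetch_employees() exactly as it would call a
     * real API client; swapping in the real thing later means replacing
     * only this one function. */
    const char *fetch_employees(void) {
        return "{ \"employees\": ["
               "  { \"name\": \"Ada\",   \"department\": \"HR\"  },"
               "  { \"name\": \"Grace\", \"department\": \"Eng\" } ] }";
    }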

And I keep negotiating what this product is with my stakeholders, until that
frontend has been massaged into something that they can use. Low latency in
learning what my tool-consumer wants is key, so I can't be making it expensive
to change my data model or the like. I want the complaints that “This tool is
extremely useful, I wish it looked pretty and saved my info from session to
session and had the latest data from our HR system” and whatever else it needs
to do to actually be properly viable.

I think that what I am doing is some variant of Domain-Driven Design?
Basically I am trying to suss out major product requirements from nontechnical
folks by having them interact with the product requirements as early as
possible, to see what those requirements imply and correct them again and
again. I want to have a technical model of how they look at the world which is
correct, first—and then when I am building the backend I can actually have a
properly principled approach to what I am building because I know what the
terms mean in this system.

~~~
nitely
> if you perform them in parallel switching between them, you get the benefit
> that at time 51 you can show both of the recipients that you are 25%
> complete, but you deliver A at time 199 and B at time 200.

That's not parallelism, that's concurrency. You are basically doing round-
robin. If they were done in parallel, then both tasks would be completed at
time 100. Improving throughput usually also improves the worst-case latency
when there is _at least some_ parallelism; otherwise, I agree that improving
throughput would not make a lot of sense in many cases.

> I think that what I am doing is some variant of Domain-Driven Design?

Sounds like iterative and incremental software development. I dare say Agile.

------
eps
On the first graph, the second "heat" should start with the first "drip." It
will still be a bit slower in total, but not by much.

It's a cute analogy, but it's not 100% accurate.
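
A tiny sketch of that overlap (the stage durations are made up): pipelining
leaves the first cup's latency alone, but lets every later cup ride the
bottleneck stage:

    #include <stdio.h>

    #define HEAT 3   /* time units to heat water for one cup */
    #define DRIP 2   /* time units to drip one cup */
    #define CUPS 4

    int main(void) {
        /* No overlap: each cup waits for the previous one entirely. */
        int unpipelined = CUPS * (HEAT + DRIP);
        /* Overlap: after the first cup, the slowest stage sets the pace. */
        int bottleneck = HEAT > DRIP ? HEAT : DRIP;
        int pipelined  = HEAT + DRIP + (CUPS - 1) * bottleneck;
        printf("unpipelined: %d, pipelined: %d\n", unpipelined, pipelined);
        return 0;
    }

For four cups that's 20 versus 14 time units, with the first cup ready at time
5 either way.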

~~~
ummonk
Yup that is what I would do. Slightly higher latency for the first coffee but
the higher throughput is worth it.

Pipelining for the win.

------
benjohnson
From a one-piece-flow lean perspective that plays into agile development:

If you brew one cup at a time, your customer can tell you how to improve it
before you deliver the second cup.

~~~
roland35
This is a great way to start; then, once you've got it down, start cranking
that coffee out!

------
forrestthewoods
Nice post. Thanks for sharing.

------
peterwwillis
tl;dr choosing between several options seems to depend on the reasons for the
choices

