
Little’s Law: An insight on the relation between latency and throughput - nkurz
http://blog.flux7.com/blogs/benchmarks/littles-law
======
jbert
To me, the interesting question for building reliable, performant systems
is what to do when the request rate (incoming desired throughput) exceeds
your possible throughput.

Broadly, you have to choose between quickly failing the additional requests,
queueing them up yourself (processing a fixed number of requests at a time),
or just attempting to process them all as they come in.

\- If you don't plan for this eventuality you'll basically end up doing the
latter option.

This will overwhelm your processing step, which will then typically cause the
latency of _all_ requests you are processing to go up (e.g. because you're
over your working RAM set, CPU budget, etc.). Worse, it will go up in
unpredictable ways, often with a sharp hockey-stick curve (see the sketch
just after this list).

\- If you queue up requests, you're adding latency to everything right there.
Your backend is still cranking away at the same low latency, but your system
latency now additionally has the queue dwell time added in. On the plus side,
you'll avoid unpredictable performance "cliffs" where everything drops down to
nothing. On the downside, if your queue gets too long, you'll end up in the
unhappy state of very quickly processing stale requests which no-one cares
about.

\- Simply failing the additional requests protects your system, but exposes
error conditions externally. (Of course, so does having a 10s HTTP request
time...)
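
For a feel for how sharp that hockey stick is, the textbook M/M/1 queue gives
mean time-in-system W = 1/(mu - lambda). A minimal Python sketch; the
capacity and load numbers are made up:

    # Hockey-stick latency as offered load approaches capacity, using the
    # classic M/M/1 mean time-in-system: W = 1 / (mu - lambda).
    # The service rate and load points below are made-up numbers.
    mu = 100.0  # backend capacity: requests/sec

    for lam in [50, 80, 90, 95, 99, 99.9]:  # arrival rate: requests/sec
        w = 1.0 / (mu - lam)                # mean latency in seconds
        print(f"load {lam / mu:5.1%} -> mean latency {w * 1000:8.1f} ms")

Latency roughly doubles going from 50% to 80% load, then explodes: 99.9%
load gives a 10-second mean latency in this toy model.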

Hybrid approaches can work well, e.g. running a managed queue but culling the
queue (and failing the associated requests) before you get into the "queue of
death" scenario.

Aaaaand....every approach apart from "just try to process it" requires you to
have an understanding of your backend capacity.

You can test and measure to find a static number, but an additional problem
is that the load caused by real-world requests can differ from your test load.

And your backend capacity can be affected by failing components, high load on
adjacent components, code deployment, backup, cold cache restart, etc etc.

So it's probably best to have a queueing/failing strategy which takes into
account the real-world health and latency of your backend.
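
One shape such a strategy can take is admission control keyed off a moving
estimate of backend latency. A sketch, with the smoothing factor and the
200ms limit as made-up tuning values:

    # Latency-aware shedding: keep an exponentially weighted moving average
    # (EWMA) of observed backend latency, and refuse new work once it
    # drifts past a threshold. ALPHA and LATENCY_LIMIT are made up.
    ALPHA = 0.2
    LATENCY_LIMIT = 0.2   # seconds

    ewma_latency = 0.0

    def record_latency(observed):
        # Call this with each completed request's measured latency.
        global ewma_latency
        ewma_latency = ALPHA * observed + (1 - ALPHA) * ewma_latency

    def should_admit():
        # Because of the hockey stick, this can still trip later than
        # you'd like -- latency looks fine right up until it doesn't.
        return ewma_latency < LATENCY_LIMIT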

Which appears to handle all cases, except that due to the sharp hockey-stick
latency in response to overload, by the time you detect a problem it may be
too late.

Fun.

------
robbfitzsimmons
Amazing how broadly applicable Little's Law is. (I suppose that's why it's a
big-L law.)

We were just recently using it in business school to look at cycle times in
assembly line output.

~~~
bake
Very versatile -- it can also be applied to drug R&D organizations (one Dr.
Jeffrey Low has proposed such an analysis)

~~~
graffitici
The reason for this broad applicability is that it holds for any queueing
system. The beauty of the law is that it holds for any distribution of
arrivals, occupancy, and latency. Just multiplying the averages gets the job
done!
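
Concretely, Little's Law says L = λ·W: the average number of items in the
system equals the average arrival rate times the average time each item
spends in the system. A worked example in Python (numbers made up):

    # Little's Law: L = lambda * W, regardless of the distributions.
    arrival_rate = 200.0   # lambda: requests/sec (made-up number)
    mean_latency = 0.05    # W: seconds per request (made-up number)

    in_flight = arrival_rate * mean_latency
    print(in_flight)       # -> 10.0 requests in the system, on average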

