
Surprising Economics of Load-Balanced Systems
http://brooker.co.za/blog/2020/08/06/erlang.html
======
jacques_chester
This can be flipped into a simple heuristic known as the "square root staffing
rule" (sometimes "... law"). The name comes from call centres.

Basically, the number of servers you need to serve X amount of demand with Y
probability of queuing is not linear in X. You need roughly X servers plus a
safety margin, and that margin only grows with the square root of X.
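
A minimal sketch of the rule (my own illustration; beta is a
quality-of-service knob, with higher values meaning less queuing):

    import math

    def square_root_staffing(offered_load, beta=1.0):
        # offered_load: arrival rate * mean service time (in Erlangs)
        # beta: quality-of-service knob; higher beta, less chance of queuing
        return math.ceil(offered_load + beta * math.sqrt(offered_load))

    # Demand grows 100x, but the idle safety margin only grows 10x:
    print(square_root_staffing(100))     # 110 servers (10 spare)
    print(square_root_staffing(10_000))  # 10100 servers (100 spare)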

The intuition is that a call centre agent is either talking to a customer, or
they are not. If they are not talking to a customer, then they can immediately
serve a customer. The number of agents busy at any given moment follows a
probability distribution (classically Poisson, which is well approximated by
the normal distribution at scale).

With a distribution like that, fluctuations around the average don't grow in
step with the average: if the mean number of busy agents is N, the typical
deviation from that mean grows like the square root of N. So the idle buffer
you need in order to absorb the peaks grows sub-linearly with demand, and
that's where the square root comes from.
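
You can see that square-root scaling directly (a throwaway simulation,
assuming Poisson-distributed busy counts):

    import numpy as np

    rng = np.random.default_rng(0)
    for mean_busy in (100, 10_000):
        busy = rng.poisson(mean_busy, size=100_000)  # busy agents, sampled repeatedly
        print(mean_busy, round(busy.std()))          # prints ~10, then ~100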

Where it sucks for call centres is that demand shows seasonality and shift
planning has relatively inflexible lead times. But in a software scenario we
can typically acquire additional capacity quickly, so precision in forecasts
is less of a problem.

A decent explanation:
https://www.networkpages.nl/the-golden-rule-of-staffing-in-contact-centers/

A short scenario: https://www.xaprb.com/blog/square-root-staffing-law/

A simple calculator linked from the short scenario:
https://www.desmos.com/calculator/8lazp6txab

~~~
hinkley
You can plan deferred maintenance and other tasks around seasonality.

For instance at a call center, the cost of training new people is not
something you want to pay while the phone is ringing off the hook, right? So
you hire too many new people during the preceding lull, train them up, and
hopefully the washout rate is low enough that you meet your hiring quota.

Similarly, if you own your own servers, you provision new hardware well in
advance of a predictable spike in traffic, and then you keep as many of the
old machines running as physics (e.g., building thermal or electrical
capacity) or information theory allows, but you prioritize the new hardware
because it scales better vertically. Once the hype wears off you decommission
the old machines.

Blizzard reportedly used to do this for their MMO. Later they bragged about
architecture changes they made to increase capacity during peak load, but
that's not what I saw. What I saw was how much they could _decrease_ capacity
as active session counts declined. When they started, they sharded their
system, then added more and more shards, availability zones, and regions. And
then that was not enough and too much at the same time.

Shards are probabilistically fair _if the incoming traffic doesn't know about
them_. Blizzard named their shards, so user clustering happens. So they slowly
moved a bunch of jobs to be serviced by a separate cluster of servers, and
shared those workers across multiple shards in the same AZ. Now instead of
each shard provisioning for its own peak load, the hardware is proportional to
the peak load of the AZ, which behaves more like the root sum of squares (your
square root) of the shards' fluctuations. If one shard declines or spikes, it
takes from the pool.
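
The pooling win is easy to put numbers on (a back-of-envelope sketch; the
shard figures are invented):

    import math

    # Four hypothetical shards: (mean load, typical fluctuation) in server-units.
    shards = [(100, 10), (100, 10), (100, 10), (100, 10)]

    # Provisioning each shard on its own for mean + 3 sigma:
    separate = sum(mean + 3 * sigma for mean, sigma in shards)

    # Pooled: means add, but independent fluctuations add as root-sum-square.
    pooled_sigma = math.sqrt(sum(sigma**2 for _, sigma in shards))
    pooled = sum(mean for mean, _ in shards) + 3 * pooled_sigma

    print(separate, round(pooled))  # 520 vs 460: the pool needs fewer spares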

If the pool is undersubscribed, you shut off servers to save money. But you
never shut off the servers that the customers know about, and have developed
an attachment to. That would cause a cascading failure.

~~~
jacques_chester
Seasonality is worth planning for, but the value of investing in plans for
seasonality depends very much on how quickly you can react to changes in
demand. Building data centres has a long lead time and costs a lot of money,
so spending a lot of money on forecasting demand is a worthy investment. On
the other hand, a new container process might take a few seconds to load, so
you can get away with more error.

There's one caveat, though, where planning for seasonality becomes valuable
again: competition for shared resources. Suppose you are using a cloud
provider to provision VMs on demand. If you're caught short by a stockout,
then fast reaction time now becomes a long reaction time. Setting an
approximate or safe base level of capacity on a schedule increases the odds
that you won't be caught dangerously short.
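
One way to set that base level (a hypothetical sketch; the quantile and
headroom values here are made up, not a recommendation):

    import numpy as np

    def capacity_floor(demand_history, quantile=0.9, headroom=1.2):
        # demand_history: observed demand for this hour-of-week, in VM-equivalents.
        # Reserve enough that a stockout can't leave you below a high quantile
        # of historical demand, plus some headroom.
        return int(np.ceil(np.quantile(demand_history, quantile) * headroom))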

(The Blizzard observations were interesting, thank you.)

~~~
hinkley
> If you're caught short by a stockout, then fast reaction time now becomes a
> long reaction time.

I've seen a little evidence in Netflix's public statements that they do a
degree of load shedding. They've reserved enough machines to service all
high-priority traffic, plus a couple of layers of buffer; the buffer gets
parceled out to lower-priority tasks as space is available, and they buy
either more base-load or more spot servers when that queue gets too long.
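
A toy version of that kind of admission scheme (my own sketch, not Netflix's
actual mechanism; the numbers are invented):

    CAPACITY = 100  # total concurrent requests we can serve
    # Slots held back for higher-priority tiers: low-priority work only gets
    # whatever is left over.
    RESERVE_ABOVE = {"high": 0, "medium": 60, "low": 85}

    def admit(in_flight_total, priority):
        # Shed the request if it would eat into the buffer reserved for
        # higher-priority traffic.
        return in_flight_total < CAPACITY - RESERVE_ABOVE[priority]

    print(admit(50, "high"))    # True
    print(admit(50, "medium"))  # False: remaining slots are reserved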

But... we have the cloud because people have been doing a simplistic version
of this for decades in private data centers. The person in charge gets
build-out fatigue, possibly before you were even hired, and you practically
have to harass them to get enough hardware, or you give up and use more
man-power to do the same work a computer should be doing, while slowly losing
all respect for the company.

Cloud is already built. You just show up and pay 140% of the amortized cost of
building and running the system, instead of the up-front cost. Now that person
is either easier to talk to or you can just go around them entirely.

As I saw someone say recently, "Love is not the most powerful force in the
universe, it's spite."

Where was I? Oh yes. So we _can_ do this, but to an extent the people who
manage to pull it off are remarkable, because so often it doesn't work out, or
they succeed in silence and go back to what they were really supposed to be
working on, now that they've figured out how to get more out of their meager
capacity.

------
ezrast
If we were to extend the first graph a bit more to the right, the linear
improvement would quickly trend downwards into negative latencies, a sure sign
that it's not the right answer. But if linear is impossible, then super-linear
is just as impossible. The line the author describes as super-linear is
clearly asymptotic.

If we were discussing overall latency, and not just queue delay, then an
asymptotic decrease towards zero could correspond to a super-linear increase
in throughput capacity. But that's not what's happening here, either: because
average latency is bounded below at one second, total throughput can never
exceed _c_. There is no super-linearity to be found here.
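
To make that bound concrete, here's the textbook Erlang C calculation for the
post's setup with a 1-second service time (my own sketch of the standard
M/M/c formula, not the author's code):

    from math import factorial

    def erlang_c(servers, load):
        # Probability an arriving request has to queue (M/M/c; load in Erlangs).
        rho = load / servers
        queued = load**servers / (factorial(servers) * (1 - rho))
        return queued / (sum(load**k / factorial(k) for k in range(servers)) + queued)

    def mean_latency(servers, load, service_time=1.0):
        # Service time plus expected queueing delay.
        return service_time + erlang_c(servers, load) * service_time / (servers - load)

    # Queue delay decays toward zero, but latency never drops below 1 second:
    for c in (11, 15, 20, 30):
        print(c, round(mean_latency(c, load=10), 4))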

~~~
mjb
Right, 'asymptotic' is a better description of the behavior. I've updated the
post.

