
Load Balancing is Impossible [video] - thepumpkin1979
https://www.youtube.com/watch?v=gas2v1emubU
======
jondubois
It seems that the speaker is discussing small servers; 150 concurrent requests
per server/process sounds very small to me (but maybe this is common in
enterprise?); in this case, I can see why the variance introduced by the
random algorithm is such a problem.

But if you have an efficient system where each server can handle 10K+
concurrent requests, then the variance introduced by the random algorithm
becomes insignificant. Also, it doesn't matter that some requests take much
longer than others; if you're dealing with a lot of users per server, the slow
requests will be distributed evenly too and average out.

I implemented the balls-and-bins algorithm and I found that when dealing with
few buckets and a large number of balls, the variance in load between buckets
became smaller. I even added randomness to the 'weight' of each request (to
account for slow vs. fast requests) and the distribution was still even. The
more requests each server can handle, the more even the distribution is with
the random algorithm.
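
For what it's worth, a minimal sketch of the kind of simulation I mean (the
server count, request counts, and the exponential weight distribution are just
illustrative assumptions):

```python
import random
import statistics

def relative_imbalance(num_servers, num_requests):
    """Assign randomly weighted requests to servers chosen uniformly at random
    and return the coefficient of variation of the per-server load."""
    load = [0.0] * num_servers
    for _ in range(num_requests):
        weight = random.expovariate(1.0)          # stand-in for slow vs. fast requests
        load[random.randrange(num_servers)] += weight
    return statistics.pstdev(load) / statistics.mean(load)

# The relative imbalance shrinks as the number of requests per server grows.
for per_server in (150, 1_000, 10_000):
    print(per_server, relative_imbalance(10, 10 * per_server))
```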

~~~
keegancsmith
10K+ concurrent requests to a single server sounds like a server that is
mostly shuffling bits around. I assume the 150 concurrent is more realistic
for a frontend server which actually does something.

Also note the speaker is the CTO of a CDN (fast.ly); I am guessing he has
experience with large concurrent requests as well :)

~~~
chrishacken
> 10K+ concurrent requests to a single server sounds like a server that is
> mostly shuffling bits around.

Isn't that all a load balancer is supposed to do? I certainly don't want my
load balancers performing computations or logic. I want them to pass that work
off to another server.

~~~
Jweb_Guru
If you actually intend to serve traffic back to the original requestor, you
can't have too many requests handled per server because you have a limited
amount of bandwidth. 150 requests per server on a 1Gbps pipe (which is what
you get on, for instance, a single ELB instance) only actually gives you about
813 KB/s per request for 150 requests/server. For 10,000 requests you'd be
down to 12 KB/s per request. For comparison, the _average_ web page size is
now up to 2048 KB!
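
A quick restatement of that arithmetic (the 1 Gbps figure is just the ELB
example above):

```python
# 1 Gbps pipe expressed in bytes per second.
link_bytes_per_sec = 1_000_000_000 / 8

for concurrent in (150, 10_000):
    per_request_kib = link_bytes_per_sec / concurrent / 1024
    print(f"{concurrent} concurrent -> ~{per_request_kib:.1f} KiB/s per request")
# 150 concurrent   -> ~813.8 KiB/s per request
# 10000 concurrent -> ~12.2 KiB/s per request
```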

To be clear, you can do a lot better with a better pipe, smart caching,
compression, etc. But people often have horribly unrealistic estimates about
how much traffic their servers can handle because they don't take bandwidth
into account, and load balancers are no exception.

~~~
ec109685
The HTML for an average web page is not 2048KB

~~~
Jweb_Guru
The average web page size is more like 2331KB now:
[http://www.httparchive.org/interesting.php?a=All&l=Apr%201%2...](http://www.httparchive.org/interesting.php?a=All&l=Apr%201%202016).
(I didn't say anything about HTML, by the way).

Of course, when you break it down by individual web request, _most_ responses
are still below 800KB, but you shouldn't plan your load around the average case. And
clearly even the average case is well above 12KB, especially for a CDN (which
is responsible for serving the image, video, and large script content). I'm
also pretty confident the page I linked already includes compression (which
decreases size, but can increase time quite a bit; many people expect software
load balancers to be using the absolute fastest compression available, but
that's often not the case in my experience).

~~~
ec109685
But most of those resources are static and cacheable.

~~~
Jweb_Guru
Sure, if you actually set the headers correctly and the browser actually
retains them (recall that browsers are free to discard their caches at any
time). And, if they can't be cached on the user's computer, who caches them?
That's right, the CDNs. Like fast.ly. Who wrote the article.

~~~
ec109685
Regardless, the small pipe on your web server isn't serving the assets.

------
devy
The last approach Tyler, the speaker, introduced in the talk to improve the
fat-tail distribution of the load-balancing latency problem was the
Join-Idle-Queue algorithm[1], which came out of the UIUC Extreme Computing
Group, Microsoft Research, and Azure in 2011.

And according to Tyler's data (fast.ly CDN data, since he's the CTO there), it
beats all the current load-balancing algorithms, including Join-Shortest-Queue.
Six years later, I wonder:

a) have any other novel / better algorithms topped JIQ in LB performance?

b) does Microsoft Azure use JIQ internally in its LB offerings?

c) is any open-source LB software implementing the JIQ algorithm on the
horizon? (According to Tyler, he found none.)

Can someone with expertise shed some light on these? Thanks.

[1]: [https://www.microsoft.com/en-us/research/wp-content/uploads/2011/10/idleq.pdf](https://www.microsoft.com/en-us/research/wp-content/uploads/2011/10/idleq.pdf)
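
For anyone curious what JIQ looks like mechanically, here's a minimal
single-dispatcher sketch based on my reading of the paper (the class name and
the random fallback are my own; the real algorithm shards idle queues across
many dispatchers):

```python
import random
from collections import deque

class JoinIdleQueueDispatcher:
    """Toy Join-Idle-Queue: idle servers enqueue themselves with the dispatcher;
    a request goes to an idle server if available, otherwise to a random one."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.idle = deque(self.servers)            # every server starts idle
        self.outstanding = {s: 0 for s in self.servers}

    def dispatch(self):
        server = self.idle.popleft() if self.idle else random.choice(self.servers)
        self.outstanding[server] += 1
        return server

    def request_finished(self, server):
        self.outstanding[server] -= 1
        if self.outstanding[server] == 0:
            self.idle.append(server)               # drained: report back as idle
```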

~~~
wmf
Google Scholar can come in handy here; there have been 117 papers citing JIQ
and some of them may have improved on it:
[https://scholar.google.com/scholar?cites=3283470738712232937...](https://scholar.google.com/scholar?cites=3283470738712232937&as_sdt=5,44&sciodt=0,44&hl=en)

~~~
devy
I am afraid that a citation count alone doesn't tell the whole story. I guess
someone has to dig into those papers, implement them, and then test with
real-world data to see, right?

~~~
faitswulff
If anyone _does_ want to dig into the papers, a friend and I made a discussion
platform for research. I've added all the papers in Tyler's talk, as well as
two he recommended on Twitter, to a list here:
[https://www.projectcredo.com/briankung/server-load-balancing](https://www.projectcredo.com/briankung/server-load-balancing)

I have zero background in academic computer science, so it may take me a while
to understand even one of these papers.

------
acqq
The slides are here:

[http://www.slideshare.net/Fastly/load-balancing-is-impossible-62833005](http://www.slideshare.net/Fastly/load-balancing-is-impossible-62833005)

Wrap up and Q&A:

[https://www.youtube.com/watch?v=gas2v1emubU&t=30m24s](https://www.youtube.com/watch?v=gas2v1emubU&t=30m24s)

The article from the slide 23:

[http://latencytipoftheday.blogspot.co.at/2014/06/latencytipoftheday-most-page-loads.html](http://latencytipoftheday.blogspot.co.at/2014/06/latencytipoftheday-most-page-loads.html)

"#LatencyTipOfTheDay: MOST page loads will experience the 99%'lie server
response"

~~~
idunno246
I thought the point from that article was a bit of a stretch, as most of the
requests are going to be async ad requests or CDN requests, and hopefully none
of that blocks the user experience. I'd sooner say that most sessions would
experience it, which is nearly as bad. Overall, though, it was really
informative and interesting.

~~~
majewsky
> hopefully none of that blocks the user experience

Let's hope for the best. _cough_

------
thomasahle
Very nice. I was expecting this to ignore all the research on things like
power of two choices and similar provable techniques, but the talk is actually
really thorough in its research.

It's very interesting to see what experiences people have with using all of
this in practice. The title seems a bit clickbaity though.

------
spydum
Good talk. I feel like I've tread exactly this same path and given a similar
talk internally (I accidentally ended up as the LB SME), even used a Poisson
distribution to model the requests and did some code demos.

He references a simple algorithm/paper on simply choosing between two servers,
and the results on smoothing out the variance were impressive (paper:
[https://www.eecs.harvard.edu/~michaelm/postscripts/mythesis.pdf](https://www.eecs.harvard.edu/~michaelm/postscripts/mythesis.pdf)).
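
A minimal sketch of that "power of two choices" idea (the queue-length
bookkeeping here is made up for illustration):

```python
import random

def two_choices(queue_lengths):
    """Pick two distinct servers uniformly at random and send the request to
    the one with the shorter queue; this alone smooths out the worst-case
    load dramatically compared to a single random choice."""
    a, b = random.sample(range(len(queue_lengths)), 2)
    chosen = a if queue_lengths[a] <= queue_lengths[b] else b
    queue_lengths[chosen] += 1
    return chosen
```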

So, no real conclusion; it's just interesting to peel back the layers of LB...
so many people treat it as magic, when really it's a statistical lie, and
impossible to achieve perfect distribution (hence the title)!

------
toast0
Round robin is somewhat better than random here, because you get a more uniform
distribution of requests to servers; but you do have to keep a counter, and if
you have multiple load balancers with few requests, there's a chance that
they'll get synchronized and you'll get bursts of traffic to one server, then
the next, etc.

You still end up with uneven load if the requests are not equal in the load
they generate (essentially always), but trying to be smart (choosing the least
loaded/queued server) doesn't really work either (too much work on the load
balancer, too much latency between request picking and load measuring, etc.).
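
A toy illustration of the two strategies (the server names are placeholders):

```python
import itertools
import random

servers = ["s1", "s2", "s3", "s4"]

# Round robin: perfectly even request counts, but it needs a counter, and two
# balancers that start from the same point will walk the servers in lock-step.
rr = itertools.cycle(servers)
def round_robin():
    return next(rr)

# Random: no shared state to synchronize, but request counts (and load) vary.
def pick_random():
    return random.choice(servers)
```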

------
Pica_soO
Imagine a beach. A beach is made of particles of different sizes, aka lengths.
Now physics applies to this beach, as a line of "processors" that grab one
particle out of the array and release it to the future beach, taking time equal
to the length of the particle. There can be emptiness between particles.

The first property of this is "entropy": the beach self-sorts by length through
interaction, as the time taken to process changes when a particle sits close to
a differently sized one.

The second property is the "light-speed" of this physical simulation. This is
the maximum particle length multiplied by the number of processors.

The third property is causality: the range out to the furthest particle
reachable within the light-speed length, again multiplied by the light-speed.
Obviously causality can be reduced by reducing light-speed.

The fourth property is non-interaction: if you add traversable, unmoving
particles into the scheduler queues, this warps "time", in that large particles
are forced to jump over them, while small particles take free particle-size
time to traverse and, being non-interacting, are invisible to the huge
particles.

The sixth property is determinism. The whole physics simulation is precise,
and from a deterministic input we can derive a deterministic output, as long as
causality remains unviolated.

Now, for scheduling, the data-physics are rather simplistic, as the processor
front, or "time", is not repeatedly interacting with the same beach. We can
also determine ahead of time which processor interacts with which particle,
assuming light-speed is not altered. Also note that a big particle being
processed equals reduced light-speed.

Now, what is optimal scheduling? Optimizing for throughput, as in shoving huge
particles to dedicated processors, with huge fields of non-interacting
particles? Fair scheduling, as in having all particles take average
particle-size time going through? Preventing a causality buildup beyond
light-speed?

PS: I once wrote a little visualization of these data-physics and showed it
around algorithm class; nobody besides the prof ever cared for it. Nice to meet
people who are fascinated by mind games.

PS^2: You can use similar properties to compress data, via Conway's Game of
Life, into deterministic lock-step simulations. I never saw that used in the
wild, as you need either pre-computed simulations to be fast, or some sort of
relational operator over a simple pre-set of simple simulations.

PS^3: Sorry, got carried away. This is fascinating. Should upload the
simulation to GitHub. Excuse any typos.

~~~
spacehacker
> Should upload the simulation to GitHub.

Sounds interesting. I'd fork it!

------
siscia
Great talk; however, it does not explain the relationship between the Poisson
process and the log-normal distribution.

It actually skips that explanation.

------
RangerScience
Huh. It occurs to me that your brain needs to deal with a Poisson process for
incoming information. How does it handle it?

~~~
crasm
Dropping packets.

~~~
witty_username
Lots. You don't notice every sensation from every part of your skin or from
every sense. You're probably only focusing on a specific part of a specific
sense.

------
sjtrny
Why not just use a PID controller to load balance?

~~~
y4mi
That's, AFAIK, not done because of the performance overhead.

Most requests take only a fraction of a second to answer. To actually monitor
threads at that frequency you'd need a significant amount of capacity not only
for the monitoring itself (ever looked at the CPU usage while running top?) but
also for the network communication to send that data over to the LB, which then
needs to process it for load balancing.

~~~
sjtrny
You wouldn't need to monitor individual threads and processes. Only the average
response time for each server, or total CPU load.
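
I haven't seen a load balancer actually ship this, but as a sketch of what a
PID loop driven by average response time could look like (the gains, target,
and weight update are made-up placeholders):

```python
class PID:
    """Textbook PID controller: output = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# One controller per backend: if its average response time runs above the
# target, its routing weight shrinks; if it runs below, the weight grows.
def adjust_weight(weight, controller, target_ms, measured_ms, dt=1.0):
    correction = controller.update(target_ms - measured_ms, dt)
    return max(0.1, weight + 0.01 * correction)
```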

------
jest7325
Interesting talk, and I am wondering if the OP tried load-balancing algorithms
like "least response time" and "least connection"? Today's LBs are pretty
advanced, and sometimes they aren't even used at all, in favor of technologies
like ECMP.

------
bogomipz
In my experience given talks given by Fastly are always worth seeing or
attending.

If have the opportunity to attend a talk given by the founder Artur Bergman
don't miss it.

It will be as informative as it is funny and entertaining. There are many on
youtube from conferences past.

------
graycat
For the case of _load balancing_ as in the OP, here is a statement of the
problem: over time, requests for work arrive at an Internet server farm. The
requests differ in which computing resources at the server farm are needed and,
for each resource, how much work. E.g., a request may need 10 GB of main
memory, 40 GB of paged virtual memory, 3 TB of I/O to rotating disk, and four
processor cores, all for 50 minutes.

So, at each arrival, we have to assign the work to a server. When the
assignment is made, we do know something about the, say, _state_ of each of the
servers, say, what work has been assigned and how busy the server is.

The goal of the work is in some sense to get the best performance from the
server farm for the loads. There are various candidate definitions/measures of
performance, e.g., minimize the average elapsed time for the work of the
requests to be done, that is, minimize the average _response_ time as seen by
the users/customers.

So, the work of the load balancing is just the assignments, one work request
at a time, to the servers. This assignment is made under _uncertainty_, maybe
about the details of each request and, so that we can _plan ahead_, also about
what the next requests will be.

E.g., if we had some idea that soon we would get a request that would need all
of one server, then maybe we should think ahead and reserve an empty server for
that request. If we don't reserve such a server, then to assign this request we
might have to keep waiting until we have an empty server and, unless we make an
effort to have such an empty server, the request could go unassigned for a long
time, thus hurting the goal.

So, we have to make decisions over time under uncertainty to achieve a goal
that is an average of the results.

Now likely we have a problem in the field of applied math called _stochastic
optimal control_. In our case the _control_ is the freedom we have when we make
the assignments. _Stochastic_ means varying with uncertainty over time. The
_stochastic_ part is the uncertainty about the next requests that arrive. The
_optimal_ part is that we want to do as well as possible, likely on some
average measure, only an _average_ because we are working with uncertainty.
Sure, if we don't like the average, we could use the median, etc.

IIRC there was some work on this problem by some especially diligent
researchers at the IBM Watson lab and based in part on work of H. Kushner at
the Division of Applied Mathematics at Brown University.

IIRC the good news is that those researchers made good progress. IIRC the bad
news was that the computations for their solution were a bit too large to be
practical! Also, from the references I saw them using, the math prerequisites
for their work were a bit much!

So, maybe good work for a solution is in the literature but too difficult to
use. Maybe.

My suggestion: Be practical. First cut, assign the next request for work to the
least busy server. If the server farm is so busy that often all the servers are
too busy for more work, then we have to leave requests in a FIFO queue waiting
until some server is not too busy for the first request in the queue. Collect
empirical, production data on this _control_, the FIFO queue, the response
times, any problems, etc. If we see no significant problems, then declare the
problem solved for now. Else focus on where the problems are and look for a
control that also handles those problems.
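
A minimal sketch of that first cut (the busyness metric and the threshold are
placeholders for whatever the production data suggests):

```python
from collections import deque

class LeastBusyBalancer:
    """Assign each request to the least busy server; if every server is above
    a busyness threshold, hold requests in a FIFO queue until one frees up."""

    def __init__(self, server_names, threshold):
        self.threshold = threshold
        self.busy = {name: 0 for name in server_names}
        self.waiting = deque()

    def submit(self, request):
        name = min(self.busy, key=self.busy.get)   # least busy server
        if self.busy[name] >= self.threshold:
            self.waiting.append(request)           # all too busy: queue and wait
            return None
        self.busy[name] += 1
        return name

    def completed(self, name):
        self.busy[name] -= 1
        if self.waiting:
            return self.submit(self.waiting.popleft())
        return None
```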

This material about stochastic optimal control was heavily from R. Bellman.
Other authors include W. Fleming, D. Bertsekas, S. Shreve, E. Dynkin, R.
Rockafellar, R. Wets. The field has been of interest to parts of applied math
(e.g., Brown), operations research, and systems and electronic engineering.

This load balancing at a server farm contains as a special case some of the
topics in queuing theory.

Computer science has long been interested in the problem of _job scheduling_
on batch computer systems which, of course, is a relatively small special case
of load balancing as in the OP. IIRC the usual result of the job scheduling
work was that the problem is NP-complete and otherwise in practice more work to
do than the work being scheduled! Maybe now computer science will also be
interested in something like stochastic optimal control for load balancing.

------
abcdev
Great presentation!

------
RunawayGalaxy
Pretty sure that's a donkey.

------
barrystaes
SFW because no stupid meme slides. So... what's the conclusion of the video I
did not hear?

