But if you have an efficient system where each server can handle 10K+ concurrent requests, then the variance introduced by the random algorithm becomes insignificant. Also, it doesn't matter that some requests take much longer than others; if you're dealing with a lot of users per server, the slow requests will be distributed evenly too and average out.
I implemented the balls-and-bins algorithm and I found that when dealing with few buckets and a large number of balls, the variance in load between buckets became smaller. I even added randomness to the 'weight' of each request (to account for slow vs. fast requests) and the distribution was still even: the more requests each server can handle, the more even the distribution is with the random algorithm.
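The experiment described above is easy to reproduce. Below is a minimal sketch (the function name and the 0.5–1.5 weight range are made up for illustration): it throws randomly weighted balls into bins and reports the relative spread of load, which shrinks as each server handles more requests.

```python
import random
import statistics

def simulate(num_servers, requests_per_server, trials=10):
    """Randomly assign weighted requests (balls) to servers (bins) and
    return the average relative spread of load, (max - min) / mean."""
    spreads = []
    for _ in range(trials):
        load = [0.0] * num_servers
        for _ in range(num_servers * requests_per_server):
            # each request gets a random 'weight' to model slow vs. fast requests
            load[random.randrange(num_servers)] += random.uniform(0.5, 1.5)
        spreads.append((max(load) - min(load)) / statistics.mean(load))
    return statistics.mean(spreads)

# The relative spread shrinks as each server handles more concurrent requests:
light = simulate(num_servers=10, requests_per_server=100)
heavy = simulate(num_servers=10, requests_per_server=10_000)
print(f"{light:.3f} vs {heavy:.3f}")  # heavy load => much smaller relative spread
```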
Also note the speaker is the CTO of a CDN (fast.ly); I am guessing he has experience with large numbers of concurrent requests as well :)
It may be less with HTTPS; it depends on how efficient the CPU is at cryptographic algorithms and how many cores it has.
Fastly is a CDN. I assume they're talking about the performance of their CDN servers, not the web servers of the clients that will process the request, which is indeed a much slower operation.
Also, it'd have to be a server with good async support like Node.js, Go, Haskell, Scala, Tornado, nginx...
Considering that the speaker is CTO at a CDN company (which has to deal with a lot of different kinds of back-ends that are outside of their control), it makes sense that they would need to use an algorithm which can handle all possible scenarios: they can't force their customers to use bigger servers and more efficient systems.
Isn't that all a load balancer is supposed to do? I certainly don't want my load balancers performing computations or logic. I want it to pass that work off to another server.
To be clear, you can do a lot better with a better pipe, smart caching, compression, etc. But people often have horribly unrealistic estimates about how much traffic their servers can handle because they don't take bandwidth into account, and load balancers are no exception.
Of course, when you break it down by individual web request, most responses are still below 800KB, but you shouldn't capacity-plan for the average case. And clearly even the average case is well above 12KB, especially for a CDN (which is responsible for serving the image, video, and large script content). I'm also pretty confident the page I linked already includes compression (which decreases size, but can increase time quite a bit; many people expect software load balancers to be using the absolute fastest compression available, but in my experience that's often not the case).
And according to Tyler's data (fast.ly CDN data since he's the CTO there), it beats all the current load balancing algorithms including Join-Shortest-Queue. Six years later, I wonder:
a) Have any other novel or better algorithms topped JIQ in LB performance?
b) Does Microsoft Azure's LB use JIQ internally in its offerings?
c) Is any open-source LB software implementing the JIQ algorithm on the horizon? (According to Tyler, he found none.)
Can someone with expertise shed some light on these? Thanks.
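For anyone unfamiliar with it, the core of Join-Idle-Queue is simple enough to sketch: servers report themselves to the dispatcher when they go idle, and arriving requests go to a queued idle server if one exists, else to a random server. The class and method names below are hypothetical, not from any real LB implementation:

```python
import random
from collections import deque

class Server:
    """Toy back-end that just counts the requests it handles."""
    def __init__(self):
        self.handled = 0

    def handle(self, request):
        self.handled += 1

class JIQDispatcher:
    """Sketch of Join-Idle-Queue: idle servers enqueue themselves via a
    back-channel; assignment is O(1) with no server polling on the hot path."""
    def __init__(self, servers):
        self.servers = servers
        self.idle = deque()

    def report_idle(self, server):
        # called by a server when it runs out of work
        self.idle.append(server)

    def assign(self, request):
        if self.idle:
            server = self.idle.popleft()          # an idle server is waiting
        else:
            server = random.choice(self.servers)  # fallback: random assignment
        server.handle(request)
        return server
```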
The idea being that, on top of a general purpose message stream, you overlay a back-channel where the stream receiver can signal to the producer how many more messages it's ready to handle.
Effectively what this does is moving the implicit back-pressure you get from network buffers up to an application level thing, where something like an app server can reason about how much load it's willing to take on.
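A toy model of that credit-based back-channel (the names here are illustrative, not the actual ReactiveSocket API): the receiver grants the producer permission for n more messages, and the producer only delivers while credits remain.

```python
from collections import deque

class FlowControlledStream:
    """Sketch of credit-based back-pressure: the receiver signals via
    request(n) how many more messages it is ready to handle."""
    def __init__(self):
        self.credits = 0
        self.buffer = deque()
        self.delivered = []

    def request(self, n):
        # back-channel: receiver says it can handle n more messages
        self.credits += n
        self._drain()

    def publish(self, message):
        # producer side: messages wait in the buffer until credits exist
        self.buffer.append(message)
        self._drain()

    def _drain(self):
        # deliver only as many buffered messages as the receiver allowed
        while self.credits > 0 and self.buffer:
            self.delivered.append(self.buffer.popleft())
            self.credits -= 1
```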
Shameless plug: If you think that seems cool and are into Go, I wrote a client+server impl of ReactiveSocket that I'd love API feedback on from more experienced Go devs: https://github.com/jakewins/reactivesocket-go
I have zero background in academic computer science, so it may take me a while to understand even one of these papers.
Wrap up and Q&A:
The article from slide 23:
"#LatencyTipOfTheDay: MOST page loads will experience the 99%'lie server response"
Let's hope for the best. cough
The end of the article has an edit saying how time-correlated events would throw off the math, but it also would be thrown off by the real-world case that all CLIENTs are not created equal, either. If there is a slow connection between a client and a server, their requests will be served slower.
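The article's arithmetic is worth spelling out: if a page load issues N independent requests, the chance that at least one of them hits a 99th-percentile response is 1 − 0.99^N, which crosses 50% at only about N = 69. A one-liner to check (function name is mine):

```python
# Probability that a page load touches at least one 99th-percentile
# ("99%'lie") server response, assuming independent requests.
def p_hits_p99(num_requests, percentile=0.99):
    return 1 - percentile ** num_requests

# A typical page making ~100 requests almost always sees a p99 response:
print(f"{p_hits_p99(100):.2%}")  # ≈ 63.40%
```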
It's very interesting to see what experiences people have with using all of this in practice. The title seems a bit clickbaity though.
He references a simple algorithm/paper on simply choosing between two servers, and the results on the smoothing out of variance were impressive (paper: https://www.eecs.harvard.edu/~michaelm/postscripts/mythesis.... ).
So, no real conclusion; it's just interesting to peel back the layers of LB. So many people treat it as magic, when really it's a statistical lie, and it's impossible to achieve perfect distribution (hence the title)!
You still end up with uneven load if the requests are not equal in the load they generate (essentially always), but trying to be smart (choosing the least loaded/queued server) doesn't really work either (too much work on the load balancer, too much latency between picking a request and measuring load, etc.).
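The "two random choices" trick from that thesis fits in a few lines (a sketch, not the paper's exact setup): probe just two random servers and pick the less loaded one, which flattens the tail dramatically compared to pure random assignment.

```python
import random

def two_choices(server_loads):
    """Power-of-two-choices: sample two servers at random and send the
    request to the less loaded of the two (only two load probes needed)."""
    a, b = random.sample(range(len(server_loads)), 2)
    return a if server_loads[a] <= server_loads[b] else b

# Quick comparison against purely random assignment:
random.seed(1)
rand_loads, two_loads = [0] * 100, [0] * 100
for _ in range(100_000):
    rand_loads[random.randrange(100)] += 1
    two_loads[two_choices(two_loads)] += 1
print(max(rand_loads), max(two_loads))  # two-choices yields a much lower max load
```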
The first property of this is "entropy": the beach is self-sorting, since particles interact by length, and the time taken to process a particle is close to its size.
The second property is the "light-speed" of this physical simulation: the maximum particle length multiplied by the number of processors.
The third property is causality: the range of the furthest-out particle reachable within light-speed, again multiplied by the light-speed. Obviously, causality can be reduced by reducing light-speed.
The fourth property is non-interaction: if you add traversable, unmoving particles into the scheduler queues, this warps "time", in that large particles are forced to jump over them, while small particles take free particle-size time to traverse them and, being non-interacting, are invisible to the huge particles.
The fifth property is determinism. The whole physics simulation is precise, and from a deterministic input we can derive a deterministic output, as long as causality remains unviolated.
Now for scheduling, the data-physics are rather simplistic, as the processor-front, or "time", is not repeatedly interacting with the same beach. We can also determine ahead of time which processor interacts with which particle, assuming light-speed is not altered.
Also note that a big particle being processed equals reduced light-speed.
Now, what is optimal scheduling? Optimizing for throughput, as in shoving huge particles to dedicated processors with huge fields of non-interacting particles?
Fair scheduling, as in having all particles take average particle-size time to go through?
Preventing a causality buildup beyond light-speed?
PS: I once wrote a little visualization of these data-physics and showed it around algorithm class; nobody besides the prof ever cared for it. Nice to meet people who are fascinated by mind games.
PS^2: You can use similar properties to compress data, via Conway's Game of Life, into deterministic lock-step simulations. I never saw that used in the wild, as you need either pre-computed simulations to be fast, or some sort of relational operator on some pre-set of simple simulations.
PS^3: Sorry, got carried away. This is fascinating. Should upload the simulation to GitHub. Excuse any typos.
Sounds interesting. I'd fork it!
It actually skips that explanation.
Most requests take only a fraction of a second to answer. To actually monitor threads at that frequency, you'd need a significant portion of CPU not only for the monitoring itself (ever looked at the CPU usage while running top?) but also for the network communication to send that data over to the LB, which then needs to process it for load balancing.
If you have the opportunity to attend a talk given by the founder, Artur Bergman, don't miss it.
It will be as informative as it is funny and entertaining. There are many on YouTube from conferences past.
So, at each arrival, we have to assign the work to a server. When the assignment is made, we do know about, say, the state of each of the servers: what work has been assigned and how busy the server is.

The goal of the work is, in some sense, to get the best performance from the server farm for the loads. There are various candidate definitions/measures of performance, e.g., minimize the average for the work of the requests to be done, that is, minimize the average response time as seen by the users/customers.
So, the work of the load balancing is just to assign, one work request at a time, to the servers. This assignment is under uncertainty, maybe about the details of each request and, so that we can plan ahead, also about what the next requests will be.

E.g., if we had some idea that soon we would get a request that would need all of one server, then maybe we should think ahead and have reserved an empty server for that request. If we don't reserve such a server, then to assign this request we might have to keep waiting until we have an empty server and, unless we make an effort to have such an empty server, the request could go unassigned for a long time, thus hurting the goal.

So, we have to make decisions over time under uncertainty to achieve a goal that is an average of the results.
Now likely we have a problem in the field of applied math called stochastic optimal control. In our case, the control is the freedom we have when we make the assignments, varying with uncertainty over time. The stochastic part is the uncertainty of the next requests that arrive. The optimal part is that we want to do as well as possible, likely on some average measure, only average because we are working with uncertainty. Sure, if we don't like average, we could use median, etc.
IIRC there was some work on this problem by some especially diligent researchers at the IBM Watson lab, based in part on work of H. Kushner at the Division of Applied Mathematics at Brown University. IIRC the good news is that those researchers made good progress. IIRC the bad news was that the computations for their solution were a bit too large to be practical! Also, from the references I saw them using, the math prerequisites for their work were a bit much! So, maybe good work for a solution is in the literature but too difficult to use.
My suggestion: Be practical. First cut, assign the next request for work to the least busy server. If the server farm is so busy that often all the servers are too busy for more work, then we have to leave requests in a FIFO queue, waiting until some server is not too busy for the first request in the queue. Collect empirical, production data on this control: the FIFO queue, the response times, any problems, etc. If we see no significant problems, then declare the problem solved for now. Else focus on where the problems are and look for a control that also handles those cases.
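That first-cut suggestion ("least busy server, with a FIFO overflow queue") might look like this in miniature. All names here are hypothetical, and "busy" is reduced to a simple in-flight request counter:

```python
from collections import deque

class LeastBusyBalancer:
    """Sketch: dispatch each request to the least busy server; when every
    server is at capacity, park the request in a FIFO queue until one frees up."""
    def __init__(self, num_servers, capacity):
        self.loads = [0] * num_servers    # in-flight requests per server
        self.capacity = capacity
        self.queue = deque()              # requests waiting for a free server

    def assign(self, request):
        idx = min(range(len(self.loads)), key=self.loads.__getitem__)
        if self.loads[idx] < self.capacity:
            self.loads[idx] += 1
            return idx                    # dispatched immediately
        self.queue.append(request)        # all servers busy: wait in FIFO order
        return None

    def complete(self, idx):
        self.loads[idx] -= 1
        if self.queue and self.loads[idx] < self.capacity:
            self.queue.popleft()          # hand the oldest waiting request over
            self.loads[idx] += 1
```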
This material about stochastic optimal control is heavily from R. Bellman. Other authors include W. Fleming, D. Bertsekas, S. Shreve, E. Dynkin, R. Rockafellar, and R. Wets. The field has been of interest to parts of applied math (e.g., Brown), operations research, and systems and control engineering.

This load balancing at a server farm contains as a special case some of the topics in queuing theory.
Computer science has long been interested in the problem of job scheduling on batch computer systems which, of course, is a relatively small special case of load balancing as in the OP. IIRC the usual result of the job scheduling work was that the problem was NP-complete and otherwise in practice more work to do than the work being scheduled! Maybe now computer science will also be interested in optimal control for load balancing.