
Join-Idle-Queue: Load Balancing Algorithm for Scalable Web Services (2011) - devy
https://www.microsoft.com/en-us/research/publication/join-idle-queue-a-novel-load-balancing-algorithm-for-dynamically-scalable-web-services/
======
openasocket
This paper largely focuses on percent load and mean service latency as metrics
to optimize. But in web services you often care more about the tail-end
latency, the p90, p99 etc. The math for this paper is a lot simpler because
for mean service latency they only need to concern themselves with the mean
service time for jobs, not the whole distribution. That seems like it would
make some interesting follow up work to this paper.

Another thing is I think this paper is assuming the latency for joining idle
queues is zero. This is another simplification that makes the math easier,
because otherwise you have to worry about a processor sending a message to
join an idle queue, but before that message gets sent the processor gets more
work and is no longer idle. It would be interesting to see what impact that
might have on performance.

I'm not too familiar with queuing theory, can anyone point me to work has been
done on long tail distribution for this or other load balancing strategies?

~~~
emfree
> But in web services you often care more about the tail-end latency, the p90,
> p99 etc.

For sure. I think Theorem 2 in the paper implicitly addresses the latency
distribution in this scheme. They're saying that in the limit of a large
system, the queue length distribution at a single backend server depends only
on the service time distribution (how long it takes to actually process each
job) and the service discipline. So if for example job sizes are exponentially
distributed and handled in FIFO order, then the wait time distribution is also
exponential.

It would certainly be nice to see a more explicit discussion of the tail
latency, especially in the simulations the authors did.

------
julienchastang
This reminds me a bit of work stealing algorithms implemented (almost 10 yrs
ago, now) in Java with dequeue data structures and the fork/join framework
[1].

[1]
[http://www.ibm.com/developerworks/library/j-jtp11137/](http://www.ibm.com/developerworks/library/j-jtp11137/)

------
andyfleming
I imagine this thread was inspired by the recent "Load Balancing is Impossible
[video]" post
([https://news.ycombinator.com/item?id=13136536](https://news.ycombinator.com/item?id=13136536)).

~~~
devy
Yep

------
siscia
Believing that this link was actually inspired by the recent "Load Balancing
is Impossible [video]" post
([https://news.ycombinator.com/item?id=13136536](https://news.ycombinator.com/item?id=13136536)).

Before to implement Join-Idle-Queue we should implement "power of two choice"
algorithm.

Somebody would pay 1000 $/year to have an nginx/HAProxy implementation that
just need to be plugged in? Following a sidekiq price model?

If so, write me an email and when we come back from the holidays you would
have an actual implementation, tested and ready to use.

------
allan_s
As the algorithm is from 2011, so now 5 years old, about something trendy
(scalable web services), I'm wondering

1\. has it been implemented in some popular opensource
library/framework/service ? 2\. if not why ? because it's not as practical as
it seems, there's something more efficient exists now ?

~~~
draaglom
1\. It's not in HAProxy or nginx, at least

2\. Random / round-robin are "good enough" in a lot of cases - most people
aren't running at 70%+ utilization

~~~
devy
> 1\. It's not in HAProxy or nginx, at least

Yep. I found this from a recent talk about Load Balancing algorithms[1] and
according to Tyler, the speaker and CTO of Fast.ly, JIQ has not been
implemented in HAProxy or Nginx.[2][3]

> 2\. Random / round-robin are "good enough" in a lot of cases - most people
> aren't running at 70%+ utilization

It depends how you define what "good enough" is and the size of system. In the
aforementioned talk, Tyler had exemplified that 1% traffic with extreme
latency will be exacerbated by the real world experience while a page load
with hundreds of requests.[4]

[1]:
[https://news.ycombinator.com/item?id=13136536](https://news.ycombinator.com/item?id=13136536)

[2]:
[https://youtu.be/gas2v1emubU?t=1845](https://youtu.be/gas2v1emubU?t=1845)

[3]:
[https://youtu.be/gas2v1emubU?t=1895](https://youtu.be/gas2v1emubU?t=1895)

[4]:
[https://youtu.be/gas2v1emubU?t=1070](https://youtu.be/gas2v1emubU?t=1070)

------
euphoria
Shouldn't this have a (2011) tag added to the title?

~~~
dang
Added, and since the title was too long to fit, we took out 'novel' too.

