
Concurrency Limits by Netflix - rshetty
https://github.com/Netflix/concurrency-limits
======
jsnell
The blog post announcing this might be a better read:

[https://medium.com/@NetflixTechBlog/performance-under-load-3e6fa9a60581](https://medium.com/@NetflixTechBlog/performance-under-load-3e6fa9a60581)

(And while reading that I kept thinking "I swear I recently read something
else talking about applying TCP congestion control to RPC queueing". And
indeed I had: [http://www.evanjones.ca/prevent-server-overload.html](http://www.evanjones.ca/prevent-server-overload.html))

~~~
cperciva
_I swear I recently read something else talking about applying TCP congestion
control to RPC queueing_

I did this almost 10 years ago for accessing Amazon SimpleDB, too:
[http://www.daemonology.net/blog/2008-06-29-high-performance-simpledb.html](http://www.daemonology.net/blog/2008-06-29-high-performance-simpledb.html)

~~~
Twirrim
Had a co-worker implement something similar to this dynamic throttling in a
back-end analytical process, built around a DynamoDB table that was shared by
multiple front-end services.

The back-end processing code would throttle itself back hard (to about 50% of
the rate it had reached) whenever it received a DynamoDB throttling error in
response, and would then ramp itself back up steadily. Combined with good
retry logic on the front-end services, it meant we could keep the DynamoDB
table humming along at near maximum usage.
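The cut-to-50%-then-ramp-up behaviour described above is classic AIMD
(additive increase, multiplicative decrease). A minimal sketch of that idea,
with illustrative names and constants (not from any real library):

```java
// AIMD-style self-throttle: halve the rate on a throttling error,
// ramp back up additively on success. All names are hypothetical.
public class AimdRateLimiter {
    private double permitsPerSecond;
    private final double minRate;
    private final double maxRate;
    private final double rampStep;

    public AimdRateLimiter(double initial, double min, double max, double step) {
        this.permitsPerSecond = initial;
        this.minRate = min;
        this.maxRate = max;
        this.rampStep = step;
    }

    // Called when the backing store (e.g. DynamoDB) returns a
    // throttling error: multiplicative decrease to 50% of current rate.
    public void onThrottled() {
        permitsPerSecond = Math.max(minRate, permitsPerSecond * 0.5);
    }

    // Called on a successful request: additive increase.
    public void onSuccess() {
        permitsPerSecond = Math.min(maxRate, permitsPerSecond + rampStep);
    }

    public double currentRate() {
        return permitsPerSecond;
    }
}
```

The multiplicative decrease reacts quickly to overload, while the additive
increase probes capacity slowly, which is why the combination converges
rather than oscillating wildly.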

------
ramchip
This reminds me a lot of the sbroker library in Erlang:

> SBroker is a framework to build fault tolerant task schedulers. It includes
> several built in schedulers based on TCP (and other network scheduling)
> algorithms.

> [...] in an ideal situation a target queue time would be chosen that keeps
> the system feeling responsive and clients would give up at a rate such that
> in the long term clients spend up to the target time in the queue. This is
> sojourn (queue waiting) time active queue management. CoDel and PIE are two
> state of the art active queue management algorithms with a target sojourn
> time, so should use those with defaults that keep systems feeling responsive
> to a user.

[http://www.erlang-factory.com/euc2017/james-fish](http://www.erlang-factory.com/euc2017/james-fish)

[https://github.com/fishcakez/sbroker](https://github.com/fishcakez/sbroker)
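The sojourn-time idea from the quote (drop work once it has waited in the
queue longer than a target) can be sketched in a few lines. This is a
simplified illustration, not sbroker's actual mechanism; real CoDel/PIE also
track intervals and adapt their drop schedule:

```java
// Sojourn-time based shedding: an item that waited longer than the
// target is dropped instead of served. Names are illustrative.
import java.util.ArrayDeque;

public class SojournQueue<T> {
    private static class Entry<T> {
        final T item;
        final long enqueuedNanos;
        Entry(T item, long t) { this.item = item; this.enqueuedNanos = t; }
    }

    private final ArrayDeque<Entry<T>> queue = new ArrayDeque<>();
    private final long targetSojournNanos;

    public SojournQueue(long targetMillis) {
        this.targetSojournNanos = targetMillis * 1_000_000L;
    }

    public void offer(T item) {
        queue.add(new Entry<>(item, System.nanoTime()));
    }

    // Returns the next item, or null if its sojourn (queue waiting)
    // time exceeded the target, in which case it is shed.
    public T poll() {
        Entry<T> e = queue.poll();
        if (e == null) return null;
        long sojourn = System.nanoTime() - e.enqueuedNanos;
        return sojourn <= targetSojournNanos ? e.item : null;
    }
}
```

The key property is that the signal is *waiting time*, not queue length, so
the target can be stated directly in terms of user-perceived responsiveness.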

------
lkrubner
I like the sound of this:

" _Executor -- The BlockingAdaptiveExecutor adapts the size of an internal
thread pool to match the concurrency limit based on measured latencies of
Runnable commands and will block when the limit has been reached._ "

I'm often surprised this kind of auto-scaling thread pool is not a more common
thing in Java land.
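The blocking half of that behaviour can be approximated with just the JDK: a
semaphore in front of a pool that makes submitters block at the limit. The
adaptive part (resizing the limit from measured latencies) is what the
Netflix library adds; in this sketch the limit is static and the class name
is made up:

```java
// Block callers once `limit` tasks are in flight; a crude, static-limit
// stand-in for the adaptive executor described above.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class BlockingLimitedExecutor {
    private final ExecutorService delegate;
    private final Semaphore permits;

    public BlockingLimitedExecutor(int limit) {
        this.delegate = Executors.newCachedThreadPool();
        this.permits = new Semaphore(limit);
    }

    // Blocks the calling thread while the concurrency limit is reached.
    public void execute(Runnable task) throws InterruptedException {
        permits.acquire();
        delegate.execute(() -> {
            try {
                task.run();
            } finally {
                permits.release();
            }
        });
    }

    public void shutdown() { delegate.shutdown(); }

    public boolean awaitTermination(long millis) throws InterruptedException {
        return delegate.awaitTermination(millis, TimeUnit.MILLISECONDS);
    }
}
```

Blocking the submitter is itself a form of back pressure: upstream callers
slow down instead of piling work into an unbounded queue.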

~~~
tyrankh
Agreed - not trying to trivialize the work here, but this seems solvable with
the tools at hand. Perhaps I'm missing some of the subtlety in their use
case, though.

~~~
otterley
Which tools are you referring to, exactly? Some citations would be helpful.

------
baybal2
PROTIP: Managing congestion control at the application level is not a good
business. A better idea is to leave it to the edge and CDN, and have the app
level use computationally cheap optimistic algorithms, since the
communication between the app and your edge will go over your own
high-quality infrastructure.

This adds the flexibility to use different algorithms in different
load-balancing regions (mobile, being a lossy fabric, is better off staying
with conventional algorithms; desktop and server clients can use a smarter
throughput-maximizing algorithm; and in countries where a high percentage of
connections are laggy DSL, you can use something else).

~~~
otterley
Can you refer us to any papers or analyses that support your claim?

~~~
baybal2
No

------
time0ut
This library looks very interesting. I've used a similar approach for pulling
batches of items from a queue (e.g. discover the optimal batch size and inter-
poll wait time). There are plenty of other places we could benefit from
something like this. I can't wait to try this out.

Netflix puts out some amazing Java libraries. I've had excellent results
using Hystrix [0]; it has been a great addition to our systems.

[0] [https://github.com/Netflix/Hystrix](https://github.com/Netflix/Hystrix)

------
Anderkent
This is a pretty cool design if requests to a given endpoint are all supposed
to take about the same time. It's not obvious how you'd adjust it for
workloads with more variance; perhaps rather than using the fastest observed
latency, you could look at your p99 over the last N minutes and see whether
it's been changing?

~~~
nvarsj
I think you bring up the main problem with using TCP Vegas. It's not clear to
me that this will work with heterogeneous requests. If the typical request
time distribution is long-tailed, it might never increase the window size.

~~~
elandau
Even with a heterogeneous workload there is normally a uniform distribution
of request types. Instead of generating complex statistics for average or
tail latencies, especially for multimodal distributions, we just look at the
minimum latencies as a proxy for identifying queuing. So, when there is any
queuing for whatever reason (increased RPS or latency in a dependent
service), all latency measurements will show an increase, especially the
minimum.
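The minimum-latency-as-queuing-proxy idea follows TCP Vegas: treat the
smallest RTT ever seen as the no-queuing baseline and estimate how many
requests are queued from how far the current sample exceeds it. A rough
sketch under that assumption, with illustrative constants and names (not the
library's actual code):

```java
// Vegas-style limit adjustment: grow the limit while estimated queuing
// is low, shrink it when queuing is high. Hypothetical names/constants.
public class VegasLimit {
    private long minRttNanos = Long.MAX_VALUE;
    private double limit;

    public VegasLimit(double initialLimit) { this.limit = initialLimit; }

    public void onSample(long rttNanos) {
        minRttNanos = Math.min(minRttNanos, rttNanos);
        // Estimated requests sitting in a queue rather than being
        // served: limit * (1 - minRtt/rtt), as in TCP Vegas.
        double queued = limit * (1.0 - (double) minRttNanos / rttNanos);
        double alpha = 3, beta = 6; // thresholds, tuned per system
        if (queued < alpha) limit += 1;      // little queuing: grow
        else if (queued > beta) limit -= 1;  // heavy queuing: shrink
        limit = Math.max(1, limit);
    }

    public int getLimit() { return (int) limit; }
}
```

Here the alpha/beta thresholds bound how much estimated queuing is
tolerated, which is exactly the tuning question raised further down the
thread.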

~~~
nvarsj
"uniform distribution of request types" - okay, it makes sense in that
context. Although if that assumption breaks down, your thread limits may
become under- or over-provisioned.

I'm wondering though - how do you pick the right alpha and beta values? It
seems like you need to do testing/validation to ensure you use the right
values, right?

Sorry if I'm sounding critical by the way. I think this is a really cool
project - thanks for open sourcing it!

------
fwef
Would this work when connecting multiple threadpools (java executors)? Imagine
I have a microservice where first threadpool downloads large files to disk (IO
bound) and another threadpool that processes the downloaded data (CPU bound).
Those two threadpools communicate with a bounded queue. Will using the
concurrency-limit Executor allow me to get the best throughput in this
scenario?

------
bvod
Can anyone explain how they decide which requests to reject? The blog post
just mentions that excess RPS gets rejected, but couldn't rejecting arbitrary
requests cause other problems?

~~~
nvarsj
My guess is they use a zero-length or small queue in front of the request
pool. If the queue is full (indicating the server is at its concurrency
limit), it returns a 429 (which is sort of weird - a 503 would fit better). I
don't think that is part of the library, though - the library just provides
the low-level building blocks.

------
0354305
I'm not grasping the concept: do I use this in production to manage
concurrency, or do I use it in testing to fine-tune my system?

~~~
michaelmior
You use this in production to allow either the server or the client to
determine its own optimal concurrency level.

~~~
99300432
Thanks

------
stephen123
I would have thought reactive streams or back pressure would be the usual way
to deal with this issue.

Is this better?

~~~
lllr_finger
Netflix's circuit-breaking OSS project, Hystrix, is commonly seen alongside
Ratpack/RxJava, where both reactive streams and back pressure are in play. I
don't think you're wrong, but Netflix and others using their solutions, like
Hystrix and Hollow, are outside of what I'd consider "usual" problems and
solutions.

~~~
nvarsj
I don't think this library is meant to be used with reactive streams. It
talks a lot about limiting the number of concurrent threads, so it sounds
more like traditional RPC with a request pool that they are trying to size,
informing clients to back off (by returning 429).

