
How to Design a Scalable Rate Limiting Algorithm - pidginbil
https://konghq.com/blog/how-to-design-a-scalable-rate-limiting-algorithm/#.WjlcxXHbXuU.hackernews
======
luckystarr
For a different view on the topic watch "Stop Rate Limiting! Capacity
Management Done Right"
[https://www.youtube.com/watch?v=m64SWl9bfvk](https://www.youtube.com/watch?v=m64SWl9bfvk)

The basic premise is not to do req/s limiting but rather concurrency limiting,
which results in req/s limiting by itself.

Concurrency limiting is rather simple and doesn't require a lot of code
complexity.
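
For a sense of how little code that takes, here is a minimal sketch using a semaphore; the class name, the 503 response, and the callable-based handler are illustrative assumptions, not anything from the talk:

```python
import threading

class ConcurrencyLimiter:
    """Caps in-flight requests; req/s limiting falls out of it naturally."""

    def __init__(self, limit):
        # BoundedSemaphore raises if released more times than acquired.
        self._slots = threading.BoundedSemaphore(limit)

    def handle(self, work):
        # Reject immediately instead of queueing when every slot is taken.
        if not self._slots.acquire(blocking=False):
            return 503  # "server busy"
        try:
            return work()
        finally:
            self._slots.release()

limiter = ConcurrencyLimiter(100)  # tune to backend capacity
limiter.handle(lambda: "response")  # returns "response" while slots are free
```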

~~~
zie
The problem here is, it depends on _WHY_ you are rate limiting. If it's just
to balance so you don't overload your backends, then absolutely this is a
great way to do it.

If, however, you are trying to limit clients because the service is, for
instance, an authentication gateway and you want to cap user/pass attempts at
X, then concurrency limiting isn't a good way to do that.

So you may need _both_, depending on your use cases; it's not a
one-size-fits-all solution.

~~~
bogomipz
What is the use case for your second example? I am not following.

Wouldn't that just be rate limiting by client IP though?

~~~
dboreham
>Wouldn't that just be rate limiting by client IP though?

This won't defeat an adversary with a botnet.

~~~
heavenlyblue
But this is not something you deal with using a rate limiter. By that logic, a
sufficiently large botnet would dwarf any live traffic whatsoever, implying
you'd probably want to filter it at the network level instead.

------
jively
> A better approach is to use a “set-then-get” mindset, relying on atomic
> operators that implement locks in a very performant fashion, allowing you to
> quickly increment and check counter values without letting the atomic
> operations get in the way.

In a highly distributed system you'd probably want to avoid a centralised data
store altogether for fast-moving data like rate limits. CRDTs and bucket
weighting might be a more effective strategy.
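
A CRDT counter is small enough to sketch; this grow-only counter (a G-counter) is the generic textbook structure, not anything specific to the article:

```python
class GCounter:
    """Grow-only counter CRDT: each node increments only its own slot and
    merges by taking per-node maxima, so replicas converge without locks."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def incr(self, n=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other):
        # Merge is commutative, associative, and idempotent.
        for node, c in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), c)

    def value(self):
        return sum(self.counts.values())
```

Nodes would gossip their state over the pub/sub channel and merge whatever arrives, tolerating duplicates and reordering.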

The article states that tracking per node could cause a problem with race
conditions, but that assumes it's the counter that's the problem. If the nodes
are aware of each other and of the relative load of the cluster, you can use
that value to weight the isolated rate limiter, and the only data that needs
to be shared can be broadcast between the nodes using a pub/sub mechanism.

If some variance is permitted (+/- a margin either side of the limit), then
having the nodes synchronise their "token weight" based on the size of the
cluster means the nodes can manage the rate limit in-memory without ever
needing to track it in a data store.
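
Under those assumptions, a weighted in-memory limiter might look like the sketch below; the names are made up, and the cluster size would in practice arrive over pub/sub rather than sit in a field:

```python
import time

class WeightedLocalLimiter:
    """Each node enforces its share of a global limit entirely in memory."""

    def __init__(self, global_limit_per_sec, cluster_size):
        self.global_limit = global_limit_per_sec
        self.cluster_size = cluster_size  # would be updated via pub/sub
        self._window = int(time.time())
        self._count = 0

    @property
    def local_limit(self):
        # This node's weighted share of the global limit.
        return self.global_limit / self.cluster_size

    def allow(self):
        now = int(time.time())
        if now != self._window:  # a new one-second window has started
            self._window, self._count = now, 0
        if self._count >= self.local_limit:
            return False         # over this node's share
        self._count += 1
        return True

# e.g. 100 req/s globally across 4 nodes -> 25 req/s per node
limiter = WeightedLocalLimiter(100, 4)
```

The variance mentioned above comes from traffic not splitting evenly across nodes; the margin is bounded by how stale the cluster-size broadcasts are.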

It does trade off accuracy, but when accuracy matters you can revert to the
set-then-get centralised counter, the trade-off then being performance because
of the increased round-trip time to the data store.

In most rate-limit scenarios, at least from what we've seen, extreme accuracy
isn't usually that important versus being able to scale and rate limit without
having to also scale a data layer to handle the counters.

------
daddykotex
Disregarding the article completely, I'll share my opinion on Kong because we
use it quite a bit at my workplace. We use an old version (0.9.x as of right
now). The things I share below might not be true anymore in newer versions,
but I can't tell; they concern the plugin system.

IMO, the idea of moving things like authentication, rate limiting, etc. into a
proxy is a wonderful one ([https://2tjosk2rxzc21medji3nfn1g-wpengine.netdna-ssl.com/wp-content/uploads/2017/12/illustration2.png](https://2tjosk2rxzc21medji3nfn1g-wpengine.netdna-ssl.com/wp-content/uploads/2017/12/illustration2.png)).
So in theory, the approach Kong takes is wonderful, but I think the
implementation is not so much.

Kong is a layer over nginx and uses Lua as a scripting language to add all
sorts of stuff. You quickly reach the limits of the plugins' capabilities, so
you think: well, I'll write my own plugins. To me that was a very unpleasant
experience; I found the thing hard to test and hard to maintain. Maybe my
criticism is more about Lua than about Kong, but since Kong relies heavily on
Lua, there is not much I can do.

There is also magic happening. You declare a schema.lua file to configure the
store that holds your data. Then you automatically get a DAO interface with a
bunch of methods on it to work with the store. You don't know what methods are
available or what arguments should be passed to these functions.

Anyway, this is my take after spending quite a few hours working on homemade
plugins in Lua for Kong.

In the end, I'm glad Kong is open source and it's a great piece of software.
It helped us reduce our applications' complexity, but make sure you don't
start to shift too much logic into it, because the plugin system can be hard
to work with.

~~~
khaledtaha
Have you found any viable alternatives to Kong?

~~~
johnbrodie
Not OP, but we are using Tyk (tyk.io) at my workplace. It's definitely more
user-friendly than Kong, but we've had other issues with it. Having not used
Kong extensively but knowing what it is, I'll go out on a limb and say Kong is
likely more performant.

~~~
kylequest
Actually it should be the opposite. Definitely do your own benchmark, but take
a look at the latest numbers from Tyk and compare it with the latest numbers
from Kong to get a general idea.

~~~
coopr
This analysis was done by BBVA comparing Tyk and Kong performance
[https://www.bbva.com/en/api-gateways-kong-vs-
tyk/](https://www.bbva.com/en/api-gateways-kong-vs-tyk/) \- disclosure: I work
for Kong, but Kong (the company) had no involvement with the benchmarking in
that article.

~~~
jbramble81
Those BBVA benchmarks look very, very wrong. BBVA achieved 3,822 requests per
second with a 16 GB server running Tyk.

I used a 16Gb Digital Ocean box, and a separate box to run benchmarks from.

Tyk CE with a keyless API:

    docker run --rm williamyeh/wrk -t4 -c100 -d1m --latency --timeout 2s http://10.131.45.89:8080/tykbench/get

    Running 1m test @ http://10.131.45.89:8080/tykbench/get
      4 threads and 100 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     5.79ms    6.97ms  206.76ms   87.28%
        Req/Sec     6.35k     0.93k    10.89k    74.33%
      Latency Distribution
         50%    3.19ms
         75%    6.28ms
         90%   15.93ms
         99%   31.14ms
      1516573 requests in 1.00m, 185.13MB read
    Requests/sec:  25244.99
    Transfer/sec:      3.08MB

As per the above results, I got 25,244 requests per second, and 90% of
requests were served with sub-15ms latency.

Another thing: you can freely scale with Tyk CE as much as you want at no
cost. I currently run 3 gateways in prod and it doesn't cost a penny. The BBVA
article states that you need to pay Tyk to scale - simply not true:

>However, if we want to deploy a significantly larger number of microservices,
or simply ensure high-tolerance to failure, we need to scale up the number of
gateways. In the case of Kong, all it takes is adding a new node and
connecting it to the database. With Tyk we can do the same, although it will
require us to pay for the service. This could be a determining factor when
choosing what API Gateway to use.

I went through the process of picking an API gateway about 3 months ago. Tyk
is not without its warts, but it has far more out-of-the-box features baked in
(including distributed rate limiting) than Kong. And if custom plugins are
required, I can choose between Lua, Python, JS, or anything that supports
gRPC. With Kong you're stuck with Lua... no thank you.

I guess my point here is that if Tyk is not performant enough for you, just
add another gateway. And if I need custom functionality, I don't need to learn
Lua.

------
forgotpassagan
This may be how Kong does it, but it's not really 'high performance'. The
right way to do rate limiting is to limit by IP using a counting Bloom filter
or cuckoo filter along with random samples. When you hit a false positive, a
second, ordinary rate limiter 'mops up' the IPs that are over the first
limiter.

This doesn't give you a hard, exact limit, but it gets the job done storing
far less state. You also need to bucket by IP sub-ranges in IPv6 to stop
people crap-flooding you with tons of unique IPs.
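
A minimal sketch of the counting-Bloom-filter half of this scheme; the class and parameter names are made up, and the second, exact limiter for handling false positives is left out:

```python
import hashlib

class CountingBloomLimiter:
    """Approximate per-IP request counters in fixed memory. A positive
    here may be a false positive, so flagged IPs should be re-checked
    against a small exact limiter, as the comment suggests."""

    def __init__(self, size=1 << 16, hashes=4, limit=100):
        self.size, self.hashes, self.limit = size, hashes, limit
        self.counters = [0] * size  # fixed-size array of small counters

    def _slots(self, ip):
        # Derive k slot indices per IP from salted hashes.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{ip}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def hit(self, ip):
        """Record one request; return True if ip looks over the limit."""
        slots = list(self._slots(ip))
        for s in slots:
            self.counters[s] += 1
        # The minimum counter is an upper bound on this IP's true count,
        # so exceeding the limit here can only over-report, never miss.
        return min(self.counters[s] for s in slots) > self.limit
```

For IPv6, the key passed to `hit` would be a truncated prefix (e.g. the /64) rather than the full address, per the sub-range bucketing above.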

------
thelicx
I have been using Kong since 2016 on various APIs and have been impressed by
the continuous increase in performance with every release. I wish upgrades
were easier, but I know they are working on it (the "migrations" between each
major version). Overall it's a fast and pluggable layer for any API inside or
outside the firewall, especially if you are running containers.

On this note, for those of you who haven't noticed it yet, they have released
an Alpine version of their Docker image, but it's still not the default one. I
would actually recommend using it to further reduce the size of Kong
containers: [https://konghq.com/blog/kong-alpine-docker/](https://konghq.com/blog/kong-alpine-docker/)

------
yread
Nice article (although the illustrations aren't very... illustrative). Does
anybody know how Kong compares with WSO2 API Manager?

------
RealNeatoDude
Excellent article! One question though:

> A better approach is to use a “set-then-get” mindset, relying on atomic
> operators that implement locks in a very performant fashion, allowing you to
> quickly increment and check counter values without letting the atomic
> operations get in the way.

Can you elaborate on this? Why is it more performant? And what are the trade-
offs vs. get-then-set?

~~~
graphememes
Was passing by and saw this; a better explanation and in-depth analysis can be
found here:

> [https://blog.figma.com/an-alternative-approach-to-rate-limiting-f8a06cf7c94c](https://blog.figma.com/an-alternative-approach-to-rate-limiting-f8a06cf7c94c)

------
dmaumenee
The performance degradation of doing strict rate limiting against the Kong
cluster data store (Postgres or Cassandra) is amplified by the lack of
database connection pool management. Each incoming HTTP request induces the
creation of a new database connection; this is very expensive, especially with
Cassandra when authentication is enabled.

------
dmaumenee
I use Kong EE for a client; I have read the Kong EE documentation carefully
and run a lot of tests. The current implementation (0.29) has this behavior
(which does not meet our needs):

1) All incoming requests are taken into account, including those which have
been rejected (with a 429 error). If the consumer exceeds his limit during
many consecutive time windows, all the requests in those consecutive windows
will be rejected (with 429 errors).

2) For a window size of 1 second, the computed weight of the previous window
is always 100%. If the limit was reached during the previous second, all
requests made in the current window will be rejected (with 429 errors).
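
For contrast, the usual sliding-window arithmetic weights the previous window by how much of it still overlaps the sliding window, so its weight decays below 100% as the current window progresses. This generic formula (not Kong's code) shows the expected behavior:

```python
def sliding_window_count(prev_count, curr_count, elapsed_in_window, window_size):
    """Estimate requests over the trailing window: the previous window's
    count is weighted by the fraction of it still inside the window."""
    prev_weight = (window_size - elapsed_in_window) / window_size
    return prev_count * prev_weight + curr_count

# Halfway through a 60s window, the previous window only counts at 50%:
# sliding_window_count(100, 20, 30, 60) -> 70.0
```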

------
jdwyah
One thing that RateLim.it uses to its advantage is calculating the
“nextPossiblePass” for each limit. This allows clients to cache the fact that
something is over the limit until time X and not have to make another request.

In the bursty case this lets clients effectively short-circuit and protect the
system.
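
A token-bucket sketch of the idea: when a check fails, also report the earliest instant a token could be available, so the client can cache the rejection until then. The class and method names here are made up, not RateLim.it's API:

```python
import time

class TokenBucket:
    def __init__(self, rate_per_sec, burst):
        self.rate, self.capacity = rate_per_sec, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def check(self):
        """Return (allowed, next_possible_pass). On rejection the client
        can cache the failure until next_possible_pass and skip further
        requests entirely, short-circuiting bursty traffic."""
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last check.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, now
        # Seconds until one whole token has refilled.
        wait = (1 - self.tokens) / self.rate
        return False, now + wait
```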

Disclaimer: I work on
[https://www.ratelim.it/documentation/basic_rate_limits](https://www.ratelim.it/documentation/basic_rate_limits)

