
How to build a distributed throttling system with Nginx, Lua, and Redis - dreampeppers99
https://leandromoreira.com.br/2019/01/25/how-to-build-a-distributed-throttling-system-with-nginx-lua-redis/
======
cobbzilla
For those working in a Java JAX-RS environment and looking for an additional
rate filter on the app server itself, here is a similar Redis+Lua rate limiter
implemented as a Jersey/JAX-RS filter [1].

It supports multiple limits, for example max 100 requests/minute, 10,000/day,
etc. The Lua magic is here [2].

[1] [https://github.com/cobbzilla/cobbzilla-wizard/blob/master/wi...](https://github.com/cobbzilla/cobbzilla-wizard/blob/master/wizard-server/src/main/java/org/cobbzilla/wizard/filters/RateLimitFilter.java)

[2] [https://github.com/cobbzilla/cobbzilla-wizard/blob/master/wi...](https://github.com/cobbzilla/cobbzilla-wizard/blob/master/wizard-server/src/main/resources/org/cobbzilla/wizard/filters/api_limiter_redis.lua)
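For readers who don't want to dig through the linked Java, the multi-limit idea can be sketched in a few lines. This is a hypothetical in-memory version (the names and the `counters` dict are mine, not the filter's API); the linked code keeps the counters in Redis with per-period TTLs instead:

```python
import time

# Each limit is (max_requests, period_seconds), e.g. 100/minute and 10,000/day.
LIMITS = [(100, 60), (10_000, 86_400)]

def check_limits(counters, key, limits=LIMITS, now=None):
    """Fixed-window check: one counter per (key, period) time bucket.
    Returns True if the request is allowed under *all* limits."""
    t = now if now is not None else time.time()
    buckets = [(key, period, int(t // period)) for _, period in limits]
    # Reject if any limit is already exhausted...
    for (maximum, _), bucket in zip(limits, buckets):
        if counters.get(bucket, 0) >= maximum:
            return False
    # ...otherwise count the request against every limit.
    for bucket in buckets:
        counters[bucket] = counters.get(bucket, 0) + 1
    return True
```

A request has to clear every window at once, so the tighter limit shapes bursts while the day limit caps total volume.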

~~~
zaroth
This isn’t using the same rate-limiting approach as the OP, which follows the
Cloudflare suggestion of a simplified sliding window that weighs
lastInterval against thisInterval, instead of a single current-interval
counter that resets in one instant.

For day-long limits a single interval counter may be less than ideal: if a
burst of activity happens to cross the reset threshold, the allowed rate can
be double the intended limit.

Also, I think a trivial optimization would be to drop the GET call and just
use the value INCR already returns.
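The weighted-window approximation described above fits in a few lines. A hypothetical in-memory sketch (class and names are mine; a real deployment would keep the two interval counters in Redis and expire old buckets):

```python
import time

class SlidingWindow:
    """Approximate sliding window: weight the previous interval's count
    by how much of it still overlaps the trailing window."""

    def __init__(self, limit, interval_s, now=time.time):
        self.limit = limit
        self.interval = interval_s
        self.now = now
        self.counts = {}  # interval index -> request count (never expired here)

    def allow(self):
        t = self.now()
        idx = int(t // self.interval)
        elapsed = (t % self.interval) / self.interval  # fraction into current interval
        prev = self.counts.get(idx - 1, 0)
        cur = self.counts.get(idx, 0)
        # Estimated count over the trailing window of length `interval`:
        estimated = prev * (1.0 - elapsed) + cur
        if estimated >= self.limit:
            return False
        self.counts[idx] = cur + 1
        return True
```

With a plain fixed window, a client could spend the whole budget at 0:59 and again at 1:01; the decaying previous-interval term above smooths that double-spend out.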

------
rogerdonut
Something very similar can be achieved in HAProxy using a powerful feature
called stick tables. [1] [2] [3]

[1] [https://www.haproxy.com/blog/introduction-to-haproxy-stick-t...](https://www.haproxy.com/blog/introduction-to-haproxy-stick-tables/)

[2] [https://www.haproxy.com/blog/bot-protection-with-haproxy/](https://www.haproxy.com/blog/bot-protection-with-haproxy/)

[3] [https://www.haproxy.com/blog/using-haproxy-as-an-api-gateway...](https://www.haproxy.com/blog/using-haproxy-as-an-api-gateway-part-1/)
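For context, a minimal stick-table rate limit looks roughly like this (a sketch under my own made-up thresholds, not taken from the linked posts):

```
frontend web
    bind :80
    # Track per-client-IP request rate over a 10s window.
    stick-table type ip size 100k expire 2m store http_req_rate(10s)
    http-request track-sc0 src
    # Reject clients above 100 requests per 10 seconds.
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }
    default_backend app
```

The `peers` section (not shown) is what shares the table across HAProxy instances to make this distributed.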

~~~
tjungblut
Did anyone here get the HAProxy stick table with peering to work on K8s? I'm
having a hard time making it work.

Or is there a k8s native alternative to this somewhere?

------
bratao
Awesome post, from a fellow Brazilian!

We built a very similar implementation (although not distributed) for a
related problem, using Redis and Laravel.

We had MANY people crawling our website, and we would prefer that they use our
API for that. Using Redis, we block IPs that access more than X URLs while not
logged in (200 URLs right now).

We also had the requirement that all good bots (Bing, Baidu, Google) should
pass through without blocks or any slowdown. Another requirement was that
those good bots be verified (reverse & forward DNS lookup) before entering our
good-bot list.

It is working great for our high-traffic website (~2 million hits/day). You can
check our work here: [https://github.com/Potelo/laravel-block-bots](https://github.com/Potelo/laravel-block-bots)
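The reverse-then-forward DNS check for good bots can be sketched like this (a hypothetical helper of my own; the resolver functions are injected so it can be tested offline, and in production they would wrap `socket.gethostbyaddr` / `socket.gethostbyname_ex`):

```python
def is_verified_bot(ip, allowed_suffixes, resolve_ptr, resolve_a):
    """Verify a claimed crawler the way the major engines recommend:
    1) reverse-lookup the IP to a hostname,
    2) check the hostname belongs to the engine's domain,
    3) forward-lookup that hostname and confirm it maps back to the IP."""
    try:
        host = resolve_ptr(ip)  # e.g. "crawl-66-249-66-1.googlebot.com"
    except OSError:
        return False
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False
    try:
        return ip in resolve_a(host)  # forward lookup must round-trip
    except OSError:
        return False
```

The forward step matters: anyone can point a PTR record at `googlebot.com`, but only Google controls the forward records that resolve back to their IPs.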

~~~
tyingq
_" Using Redis we block IPs who accessed our website more than X times not
logged-in (200 URLs right now)"_

Curious over what time period, and if the 200 accesses included any asset, or
only the main HTML of a page.

Edit: Your link shows a limit of 100 accesses per day.

------
adontz
Not to say they did anything wrong, great work! But if facing the same
problem for an in-house solution, I'd consider using auth_request in the
first place.

[https://nginx.org/en/docs/http/ngx_http_auth_request_module....](https://nginx.org/en/docs/http/ngx_http_auth_request_module.html)

To me, the advantage is architectural: I would not have to specify which
parameters of the request are considered or how they are processed. The
disadvantage is semantic: returning 403 instead of 429. But the original
article returns 403 anyway.

Also, regarding rate limiting by IP: I think it should be set at 10x-100x of
the single-user limit, just as a first line of defense. nginx rate limiting
also has a notion of burst, which helps filter out "smart" crawlers that,
unlike users, send requests for hours on end.

~~~
batbomb
You can change a 403 to a 429 easily in conjunction with auth_request by using
a named location.

One thing I've been doing recently is to deploy oauth2_proxy in _proxy_ mode,
protecting an authorization endpoint. The authorization endpoint can be custom
code in any language; it can have logic for rate limiting or anything else.
Together, oauth2_proxy handles authentication, and the authorization endpoint
might do an LDAP lookup and limit access based on a group or something else.
If you want to return something other than a 401 or a 403, you have to do a
little more work.

Something along the lines of this:

    location / {
        auth_request /auth;
        auth_request_set $backend_status $upstream_status;
        # Route auth failures to the named location below.
        error_page 401 403 = @process_backend_error;
    }

    location = /auth {
        proxy_pass http://oauth2-proxy:8080/authz;
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
        proxy_set_header X-Original-URI $request_uri;
    }

    location @process_backend_error {
        # "return" needs a literal status code; $backend_status is
        # still available here for logging or response headers.
        return 429;
    }

------
ddorian43
A more efficient way (though without the histogram) would be the native Redis
module, written in Rust:
[https://github.com/brandur/redis-cell](https://github.com/brandur/redis-cell)

~~~
zaroth
Interesting option for higher performance with a single command (lower
latency).

However, this does not appear to work in a cluster.

~~~
dreampeppers99
The OP solution requires a single TCP RTT:
[https://github.com/leandromoreira/nginx-lua-redis-rate-measu...](https://github.com/leandromoreira/nginx-lua-redis-rate-measuring#pipeline-and-hash-tag)
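For readers who don't follow the link: the trick is hash tags plus pipelining. Because both the previous- and current-minute counters share the same `{...}` tag, Redis Cluster hashes them to the same slot, so one pipelined round trip to one node can touch both. Illustrative key names (not the repo's exact naming):

    INCR   {user42}:25806251
    EXPIRE {user42}:25806251 120
    GET    {user42}:25806250

Here the suffix is the minute bucket number; the first two commands bump and expire the current bucket while the GET reads the previous one, all in a single RTT.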

------
chmod775
Probably better to use a Redis hash ("map") instead of multiple keys. Redis
also stores these very efficiently as long as the hash holds only a few
fields.
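Sketching that idea with hypothetical key names: keep all of a client's counters as fields of one hash, which small hashes store in Redis's compact encoding:

    HINCRBY rate:{user42} minute:25806251 1
    HINCRBY rate:{user42} day:17921 1
    EXPIRE  rate:{user42} 86400
    HGETALL rate:{user42}

One trade-off: individual hash fields don't get their own TTLs, only the whole key does, so stale per-minute buckets need explicit cleanup.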

