
Scaling a High-Traffic Rate Limiting Stack with Redis Cluster
https://brandur.org/redis-cluster
======
jihadjihad
Redis IMHO is in the pantheon of excellent open-source projects, right up
there with the likes of HAProxy in terms of code quality, speed, and downright
reliability. 100% agree with the notion that more such building blocks need to
be built.

~~~
spmurrayzzz
Agreed. I'd throw nginx into that cohort as well.

~~~
sneak
Someone recently suggested that I read the nginx source code, as it was some
of the most comprehensible and clear C he'd ever seen. I can definitely cosign
that, having now done so. It's amazing!

~~~
meredydd
Hey, can I ask you what prompted you to just go reading nginx code? As pointed
out recently, most people - even people who advocate reading source code for
its own sake - don't actually read code unless they have to.
([http://akkartik.name/post/comprehension](http://akkartik.name/post/comprehension))

So it's interesting to see a counterexample. What led you to go spelunking in
nginx?

~~~
sneak
An associate was writing Lua nginx stuff and needed to refer to the C source
to find some event names or something. He noticed how nice it was and told me
to take a look because the quality impressed him so much, so I read it for fun.

Also, I have seen so many problems arise from the fact that code is easier to
write than to read, so I consciously make an effort to not avoid reading code,
and I’m sure that leaked over here.

------
papercruncher
We use Redis Cluster quite extensively. The one thing to be very cautious
about, and to load test if you're running in a cloud environment, is failover
of nodes that hold a lot of keys. If your nodes are holding multiple GBs of
data then, depending on your persistence and other configuration settings,
Redis may need to hit the disk to recover. If you don't have enough IOPS
provisioned, be prepared for a long recovery time. The other thing that used
to be a problem but is getting much better now is the maturity of the
different client libraries with respect to handling Redis Cluster-specific
idiosyncrasies.
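
For reference, the persistence settings in play are roughly these `redis.conf`
knobs (the values here are illustrative, not a recommendation):

```
# RDB snapshots: dump the dataset if at least 1 key changed in 900s, etc.
save 900 1
save 300 10

# Append-only file: logs every write. Safer on crash, but a large AOF
# must be re-read from disk on recovery, which is where IOPS bite.
appendonly yes
appendfsync everysec
```

The bigger the files these produce, the more disk reads a failing-over node
needs before it can serve traffic again.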

~~~
chucky_z
I just got back from RedisConf, and antirez brought up the idea (or said it's
already in development; he wasn't clear) of releasing an official Redis
Cluster proxy for use with older/less-featured clients.

I believe it was brought up in the keynote (which I missed unfortunately), and
also as part of one of the Redis Clients talks.

------
chucky_z
Excellent article! The use of Lua solves a lot of potential issues here:
competing writes to the same key spaces for rate limiting can otherwise cause
bizarre errors.

The one thing I would note that doesn't seem to be covered: if you are using
a relatively large Lua script and running `EVAL` over and over, the full
script body gets sent to the server every time. Instead, `SCRIPT LOAD ...`
can be run, which spits out a sha1 that can then be run with
`EVALSHA (sha1) (keys) (args)`. This can potentially speed things up as well
as cut down on network overhead.
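
For what it's worth, the sha1 that `SCRIPT LOAD` returns is just the SHA-1 of
the script text, so a client can compute it locally before deciding what to
send. A quick sketch in Python (the script body here is a made-up fixed-window
counter, not the article's):

```python
import hashlib

# A toy fixed-window rate-limit script (illustrative only).
SCRIPT = b"""
local n = redis.call('INCR', KEYS[1])
if n == 1 then redis.call('EXPIRE', KEYS[1], ARGV[1]) end
return n
"""

# Redis caches scripts under the SHA-1 of their source text, so the
# value SCRIPT LOAD would return for this body can be precomputed
# client-side.
sha = hashlib.sha1(SCRIPT).hexdigest()
print(sha)  # 40-char hex digest, usable as EVALSHA's first argument
```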

~~~
hamandcheese
But that requires extra logic and possibly tooling to do correctly. The
scripts aren't persisted, IIRC, so if a node restarts the script won't be
loaded.

~~~
simonw
Client libraries can handle this automatically: you can send the `EVALSHA`
command and it will either execute successfully or reply with a NOSCRIPT
error ("I don't know what that script is"), at which point the client can
re-send the full script with `EVAL`.
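
That's what e.g. redis-py's `register_script` does under the hood, if I
remember right. The retry logic itself is tiny; here's a sketch against a
hypothetical client object (the names are made up, and a stub stands in for a
real server):

```python
import hashlib

class NoScriptError(Exception):
    """Stands in for Redis's NOSCRIPT error reply."""

def run_script(client, script, keys, args):
    # Try the cheap path first: send only the SHA-1 of the script.
    sha = hashlib.sha1(script.encode()).hexdigest()
    try:
        return client.evalsha(sha, keys, args)
    except NoScriptError:
        # Server doesn't have the script cached (e.g. it restarted),
        # so fall back to EVAL, which also caches it for next time.
        return client.eval(script, keys, args)

# A stub client that "forgets" scripts until EVAL is called,
# mimicking a freshly restarted node.
class StubClient:
    def __init__(self):
        self.cache = set()

    def evalsha(self, sha, keys, args):
        if sha not in self.cache:
            raise NoScriptError(sha)
        return "from-cache"

    def eval(self, script, keys, args):
        self.cache.add(hashlib.sha1(script.encode()).hexdigest())
        return "from-eval"

c = StubClient()
print(run_script(c, "return 1", [], []))  # falls back to EVAL
print(run_script(c, "return 1", [], []))  # hits the script cache
```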

------
baconomatic
I couldn't agree more with "We need more building blocks like Redis that do
what they’re supposed to, then get out of the way." Redis has become such a
foundational piece of software for me and the projects I work on.

Plus, it's just plain fun to use.

~~~
dnomad
Frankly this strikes me as really hacky. A million operations a second isn't
even that much. Something like Chronicle [1] can do millions of atomic
operations a second. A cluster of 10 nodes for what are basic in-memory
counters? And the wackiness of Lua scripts to read from the cache?

It all seems a bit much. I've solved similar problems in the trading space
(processing raw market data feeds) with much less.

It's interesting how different communities have their hammers and nails. Redis
seems to have really taken over certain consumer-web-oriented communities. In
other more enterprise communities I've seen people lean heavily on distributed
cache products like Hazelcast etc. And in trading this sort of thing is so
bread and butter and common that everybody has internal solutions.

[1] [https://chronicle.software/](https://chronicle.software/)

~~~
misterbowfinger
A lot of people also make the mistake of benchmarking Redis on multi-core
machines. Redis is single-threaded, so a single instance won't get any faster
just because you have multiple cores. To properly benchmark Redis on a
multi-core machine, you have to run multiple Redis instances on it (one
instance per core).

~~~
GauntletWizard
Yes; a Redis cluster with 10 'nodes' can easily fit on a single machine. When
comparing Redis performance, benchmarking apples-to-apples on a per-core
basis is important.

------
dividuum
I wonder if this would also be a use case for foundationdb. All the
"clustering" would be built-in and performance seems to be quite good
([https://apple.github.io/foundationdb/performance.html](https://apple.github.io/foundationdb/performance.html)),
although probably not comparable to Redis with a configuration that accepts
data loss. Does anyone have experience with that?

~~~
spullara
I've used it for similar things in the past. Best practice on FDB would be to
use snapshot reads on the counters and the atomic ADD mutation, so you never
have conflicts.

~~~
dividuum
Thanks for your response. Interesting to know that this is indeed possible.
foundationdb looks amazing from what I've seen and played with so far.

------
sciurus
It's nice to hear a success story about Redis Cluster. When I worked at
Eventbrite we used Redis heavily, both for the usual use cases (caching,
ephemeral storage like sessions) as well as at the core of services like
reserved seating. We did our own sharding client side as a layer on top of the
redis-py library and relied on sentinel to handle failover. After Redis
Cluster was released, we had some interest in it, but we were nervous enough
about the limitations in its capabilities and the additional complexity of
operating it that we never experimented with it.

------
ttul
I fucking love Redis. We use it inside a large scale email sending platform to
do all manner of rate limiting and real time analysis of streaming data to
make routing decisions. Could not live without Redis.

------
garganzol
Author has an enjoyable writing style. Thumbs up for quality writing.

~~~
simonw
His blog is one of my favorites - so much good stuff on there. A few recent
highlights:

Touring a Fast, Safe, and Complete(ish) Web Service in Rust:
[https://brandur.org/rust-web](https://brandur.org/rust-web)

Scaling Postgres with Read Replicas & Using WAL to Counter Stale Reads:
[https://brandur.org/postgres-reads](https://brandur.org/postgres-reads)

Redis Streams and the Unified Log:
[https://brandur.org/redis-streams](https://brandur.org/redis-streams)

------
shizcakes
Another approach to this problem is to use Twemproxy:
[https://github.com/twitter/twemproxy](https://github.com/twitter/twemproxy),
which can be used like a sidecar Redis load-balancer.

~~~
sciurus
Similarly, Envoy has Redis support that looks promising.

[https://www.envoyproxy.io/docs/envoy/v1.6.0/intro/arch_overv...](https://www.envoyproxy.io/docs/envoy/v1.6.0/intro/arch_overview/redis.html)

------
abalone
Silly question but any idea what tools were used to create the diagrams in
this post?

~~~
awshepard
Hazarding a guess, it looks like it might have been Monodraw, or something
similar.

~~~
baconomatic
Yep, it's Monodraw:
[https://twitter.com/brandur/status/928368179075678208](https://twitter.com/brandur/status/928368179075678208)

------
pulkitsh1234
More details on Stripe's rate limiter(s):
[https://stripe.com/blog/rate-limiters](https://stripe.com/blog/rate-limiters).
An awesome gist is given at the bottom too, which has implementations of the
different rate limiters, and also the `EVAL` part this post talks about.
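
One of the limiters described there, the token bucket, is small enough to
sketch in plain Python without Redis at all (purely illustrative; a real
Redis-backed version needs the read-modify-write to be atomic, which is where
the `EVAL` scripts come in):

```python
import time

class TokenBucket:
    """Toy in-process token bucket: `rate` tokens/sec, burst up to `capacity`."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.now = now
        self.last = now()

    def allow(self, cost=1.0):
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# With capacity 2 and a frozen clock, the third request is rejected.
clock = iter([0.0, 0.0, 0.0, 0.0]).__next__
b = TokenBucket(rate=1.0, capacity=2.0, now=clock)
print([b.allow() for _ in range(3)])  # [True, True, False]
```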

------
xstartup
In adtech, we average over 100 million operations per second and we don't
even touch Redis.

We've been using Memcache all the while and have no desire to change that.

~~~
zxcmx
This would be an interesting post if you mentioned _what_ you were doing 100
million times per second. How tangled are your writes? What are your
consistency requirements?

100 million set operations per second is not the same as 100 million counter
increments etc.

------
sandGorgon
Isn't this the exact use case that Kafka solves? It's great to see Redis
being able to do it just as well as Kafka, probably.

I'm quite interested to see how they implemented a queueing solution without
the new Redis Streams infrastructure.

