I say this frequently both online and when discussing system design with newer devs, but I'll repeat it here: of all the production issues I've debugged, the culprit has never been redis. In fact, redis has been a critical piece of achieving cost-effective scaling. It is one of only two pieces of software (along with postgres) that I blindly recommend without any caveats. From following along here and on your blog about how you approach things and think about the software, I think it's clear that you and your vision for the project are a large part of why it has been so reliable.
Thank you antirez!
You'll love it until you don't.
The scaling and clustering story is not nearly as nice as the quick start.
It's definitely worth recommending... with caveats.
A half-ton pickup truck works well until you need to haul bigger loads. At some point, you either need to haul smaller loads or change to a different, bigger truck.
That seems to me an argument that should be pointed at cloud providers rather than Redis itself ;)
This would be a significant downside to using redis at all, except you can get away with not caring if you outsource the problem with your credit card.
@sethammons "... sharded to (currently) six clusters ..."
I wonder, when did you start sharding? How many GB memory did your previously single-machine-Redis-instance use, before it was time to shard? How much memory does each "sharded node" have?
(It wasn't the CPU (Redis being single-threaded) that forced you to shard, but that you needed more memory? Or something else?)
> "nearly 200 redis nodes"
How much memory in each node, if I may ask?
Sharding is always something you do on day 1 for every project, because a single-machine-Redis-instance is a SPOF and won't pass even the most basic operational readiness checklist.
> (capacity planning questions)
The answers to these questions don't generalize; they depend on your workload. You should figure out some approximation of your data types and request profiles, and use those to get a rough understanding of what one Redis server on a given machine class is capable of delivering. This usually means applying a few different classes of read/write loads against a small cluster at steadily increasing RPS, and documenting how CPU, memory, and latency characteristics change. From these numbers it's possible to derive a capacity plan.
The last time I did this, for my workload and machine class, one Redis instance (one core) delivered 250-500k RPS, invariant to memory used. We'd use the conservative end of that range, combined with the RPS and data set growth rates we predicted, to provision for ~12 months of growth. Operationally, we would deploy (cores-1) or (cores-2) instances per host (I forget exactly) on 32- and 64-core machines. I think they had like 64-128G of RAM, and we made sure to leave enough memory overhead so the AOF or whatever persistence option wouldn't lock up the box. But even the choice of what class of machines to use is a function of your use case: if you have a really large dataset with relatively non-costly (CPU) operations, you want a totally different machine profile than for a relatively small dataset with complex operations. Availability SLOs also factor in.
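To make the arithmetic above concrete, here is a rough sketch of that kind of capacity estimate. The function names, the compound monthly growth model, and the example traffic numbers are all assumptions for illustration; only the 250k RPS per instance figure and the (cores-1) layout come from the comment above. It sizes for RPS only, so memory headroom for persistence has to be checked separately.

```python
import math


def instances_needed(peak_rps, monthly_growth, months, per_instance_rps):
    """Instances required to serve the projected peak RPS.

    Assumes compound monthly growth (hypothetical model) and uses the
    conservative per-instance throughput measured in your own load tests.
    """
    projected_rps = peak_rps * (1 + monthly_growth) ** months
    return math.ceil(projected_rps / per_instance_rps)


def hosts_needed(instances, cores_per_host, reserved_cores=1):
    """Hosts required when running one instance per core, minus reserved cores."""
    instances_per_host = cores_per_host - reserved_cores
    return math.ceil(instances / instances_per_host)


if __name__ == "__main__":
    # Hypothetical workload: 2M RPS today, 5% monthly growth, 12-month horizon.
    n = instances_needed(
        peak_rps=2_000_000,
        monthly_growth=0.05,
        months=12,
        per_instance_rps=250_000,  # conservative end of the measured range
    )
    print("instances:", n)
    print("32-core hosts:", hosts_needed(n, cores_per_host=32))
```

Availability requirements would normally push the host count above what throughput alone suggests, since replicas should not share a box with their primaries.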
All of this is basic operational stuff, and with Redis the answers are pretty well-understood and highly predictable. Completely opposite to systems like Elasticsearch, which was a total nightmare to predict, provision for, and operate.
Cannot agree with that. Whether one does that or not depends on things like Service Level Agreements (SLAs), and one can have a pretty high SLA uptime % without sharding, e.g. if there's an underlying, pretty stable hosting provider that live-migrates if there's a hardware failure.
Thanks for writing about the last time you used Redis. Interesting to hear that, in that case, the machines had something like 64-128G of RAM, and each instance did 250k-500k RPS. Yes, I agree that I'd need to benchmark and think about what type of machine(s) to use (some time later; it's a bit too early for that now). Sounds as if you are / were a pretty large company / project, needing that much memory and that many machines :- )
If you have one instance, you’d better be damn sure that its downtime doesn’t cause an outage, i.e. no redis means services still run.
The “it’s fine, the SLA covers outages” is just a) laziness and b) negligence.
You don’t do that with web servers, you don’t do it with databases. You don’t do it with your redis instance either.
...well, I suppose some people will do it anyway; but you get what you get if you do.
If a major outage gets you fired, you have no one to blame but yourself.
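For what "no redis means services still run" can look like in practice, here is a minimal sketch of a read path that degrades to the source of truth when the cache is unreachable. Everything here is hypothetical: `CacheDown` stands in for whatever connection error your Redis client raises, and `cache_get` / `load_from_source` are placeholder callables, not a real client API.

```python
class CacheDown(Exception):
    """Stand-in for the connection error a real Redis client would raise."""


def get_with_fallback(cache_get, load_from_source, key):
    """Read through the cache, falling back to the source of truth.

    A cache outage is treated like a cache miss, so the request is
    slower but still succeeds.
    """
    try:
        value = cache_get(key)
    except CacheDown:
        # Cache unavailable: skip it entirely and serve from the source.
        return load_from_source(key)
    if value is None:
        # Ordinary cache miss.
        value = load_from_source(key)
    return value


if __name__ == "__main__":
    def broken_cache(key):
        raise CacheDown("connection refused")

    def database(key):
        return f"value-for-{key}"

    # The cache is down, but the lookup still returns a result.
    print(get_with_fallback(broken_cache, database, "user:1"))
```

This only covers the cache use case, and it assumes the backing store can absorb the extra load while the cache is down, which is its own capacity question.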
Sorry, no. A single instance is never acceptable from any risk perspective, no matter what guarantees the hosting provider claims.
Redis is one such tool--its clustering "story" may not be ideal, but even a small, minimal cluster will be more than sufficient for any use-case that a distributed memory cache is fit for, at a scale that will cover 99% of businesses' needs.
~90k/sec on a single core of an L5520. Scaling Redis is a premature optimization for 99.99% of use cases. While it may be sexy to talk about the millions of ops per second that one's project does, in reality the number of projects that have that as a requirement is probably barely in triple digits globally.
In any case, I've had the same experience with only two programs - CouchDB and Redis.