
Dang.

I say this frequently both online and when discussing system design with newer devs, but will repeat here: of all the production issues I've debugged, the culprit has never been Redis. In fact, Redis has been a critical piece of achieving cost-effective scaling. It is one of only two pieces of software (along with Postgres) that I blindly recommend without any caveats. From following along here and on your blog about how you approach things and think about the software, I think it's clear that you and your vision for the project are a large factor in why it has been so reliable.

Thank you antirez!






Redis is simple. Good. Has a nice API. Has good libraries. Single threaded. Extremely hard to scale. Impossibly difficult to cluster in containers because it uses hard-coded IPs to address nodes. Performs poorly with large payloads. Doesn't run on Windows properly. Is extremely expensive as a hosted service (orders of magnitude in some cases, e.g. Azure).

You'll love it until you don't.

The scaling and clustering story is not nearly as nice as the quick start.

It's definitely worth recommending... with caveats.


Redis works well until the point it doesn't - like a lot of other tools.

A half-ton pickup truck works well until you need to haul bigger loads. At some point, you either need to haul smaller loads or change to a different, bigger truck.


> Is extremely expensive as a hosted service (orders of magnitude in some cases, e.g. Azure).

That seems to me an argument that should be pointed at cloud providers rather than Redis itself ;)


The point being that clustering redis is actually very difficult to do at all, never mind in a way that scales.

This would be a significant downside to using Redis at all, except you can get away with not caring if you outsource the problem with your credit card.


Redis is extremely simple to scale. Treat each instance in a cluster as wholly independent from other instances, and build a thin layer on top to implement whatever sharding and availability requirements your use case requires. I've done it many times and it's always been simple and worked brilliantly. Redis's predictability makes it a joy to operate.
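The "thin layer on top" described above can be sketched in a few lines. This is a minimal routing-only sketch, assuming wholly independent instances at known addresses (the node names below are made up); all it does is decide which instance owns a given key:

```python
import hashlib

class ShardRouter:
    """Thin sharding layer over wholly independent Redis instances:
    routing only, no cross-node coordination."""

    def __init__(self, nodes):
        self.nodes = list(nodes)  # hypothetical host:port addresses

    def node_for(self, key):
        # Stable hash so the same key always lands on the same instance
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

router = ShardRouter(["redis-a:6379", "redis-b:6379", "redis-c:6379"])
owner = router.node_for("user:42")
```

In practice you would open a client connection (e.g. redis-py) to the returned node. Note that plain modulo hashing reshuffles most keys whenever the node list changes, which is why consistent hashing is often used in such a layer instead.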

We've done just that. Some folks at work wanted sentinel in front as a cluster management layer, but our particular use case did not work well there. Instead, we have some "logical clusters" of 3 nodes that we replicate reads and writes to, sharded to (currently) six clusters. Some logic around quorum for ensuring writes make it to at least 2/3 nodes in a given cluster, with some optimizations for reads sometimes only needing 1 node to respond. We had to do a bunch of tweaking around state and memory management, and all the details were in the docs, which was great. It does exactly what we need it to do, but it is more expensive to run, and we've not figured out a way to do this right in k8s. We don't care for the cost model going forward for when we want this model serving 10x traffic (which is not pie in the sky, it is known actual volume this solution would need to support). For 10x traffic, we'd be looking at 10x node count. At nearly 200 redis nodes, that can get expensive, esp. if we want to move to a managed solution. Anyway, not sure where I was going with this. Yes, redis can scale. Yes it can stay performant. At some point, it just becomes a lot to manage though and costs can add up. We are going to be designing a new solution to keep costs low and performance high.
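The write path described above (a write must reach at least 2 of the 3 nodes in a logical cluster) can be sketched roughly like this; the client objects are placeholders for real Redis connections, not any particular library's API:

```python
def quorum_write(clients, key, value, quorum=2):
    """Fan a write out to every node in a logical cluster; succeed once
    `quorum` nodes acknowledge (2 of 3 in the setup described above)."""
    acks = 0
    for client in clients:
        try:
            if client.set(key, value):
                acks += 1
        except ConnectionError:
            continue  # node unreachable: the others may still form a quorum
    return acks >= quorum
```

Reads can use the same loop with quorum=1, returning as soon as the first node answers, which matches the read optimization mentioned above.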

@sagichmal "... scale ... implement ... sharding ..."

@sethammons "... sharded to (currently) six clusters ..."

I wonder, when did you start sharding? How many GB memory did your previously single-machine-Redis-instance use, before it was time to shard? How much memory does each "sharded node" have?

(It wasn't the CPU (Redis single threaded) that forced you to shard? But because you needed more memory? (Or sth else?))

> "nearly 200 redis nodes"

How much memory in each node, if I may ask?


> When did you start sharding? How many GB memory did your previously single-machine-Redis-instance use . . .

Sharding is always something you do on day 1 for every project, because a single-machine-Redis-instance is a SPOF and won't pass even the most basic operational readiness checklist.

> (capacity planning questions)

The answers to these questions don't generalize, they depend on your workload. You should figure out some approximation of your data types and request profiles, and use those to get a rough understanding of what one Redis server on a given machine class is capable of delivering. This usually means applying a few different classes of read/write loads against a small cluster at steadily increasing RPS, and documenting how CPU, memory, and latency characteristics change. From these numbers it's possible to derive a capacity plan.

The last time I did this, for my workload and machine class, one Redis instance (one core) delivered 250-500k RPS, invariant to memory used. We'd use the conservative end of that range, combined with the RPS and data set growth rates we predicted, to provision for ~12 months of growth. Operationally, we would deploy (cores-1) or (cores-2) instances per host (I forget exactly) on 32- and 64-core machines. I think they had like 64-128G of RAM, and we made sure to leave enough memory overhead so the AOF or whatever persistence option wouldn't lock up the box. But even the choice of what class of machine to use is a function of your use case: if you have a really large dataset with relatively non-costly (CPU) operations, you want a totally different machine profile than a relatively small dataset with complex operations. Availability SLOs also factor in.
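The arithmetic behind this kind of plan is mechanical once you have measured numbers. A sketch, using the figures above as illustrative defaults (the conservative 250k RPS per instance, 32-core hosts, one core reserved for the OS):

```python
import math

def hosts_needed(target_rps, rps_per_instance=250_000,
                 cores_per_host=32, reserved_cores=1):
    """Conservative capacity plan: low end of measured per-instance
    throughput, (cores - 1) single-core Redis instances per host."""
    instances = math.ceil(target_rps / rps_per_instance)
    instances_per_host = cores_per_host - reserved_cores
    return math.ceil(instances / instances_per_host)

# e.g. provisioning for a predicted 50M RPS peak:
hosts = hosts_needed(50_000_000)
```

A real plan would also cap instances per host by available RAM divided by per-instance dataset size (plus persistence overhead), and take the max of the CPU-bound and memory-bound host counts.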

All of this is basic operational stuff, and with Redis the answers are pretty well-understood and highly predictable. Completely opposite to systems like Elasticsearch, which was a total nightmare to predict, provision for, and operate.


> "Sharding is always something you do on day 1 for every project ... SPOF ..."

Cannot agree with that. If one does that or not, would depend on things like Service Level Agreements (SLA) — and one can have a pretty high SLA uptime %, without sharding, e.g. if there's an underlying pretty stable hosting provider that live migrates if there's a hardware failure.

Thanks for writing about the last time you used Redis. Interesting to hear that, in that case, the machines had sth like 64 – 128 GB RAM, and did 250k – 500k RPS. Yes I agree that I'd need to benchmark and think about what type of machine(s) to use (some time later — a bit too early for that now). Sounds as if you are / were a pretty large company / project, needing that much memory and machines :- )


Nope.

If you have one instance you better be damn sure that downtime doesn’t cause an outage. I.e. no Redis means services still run.

The “it’s fine, the SLA covers outages” is just a) laziness and b) negligence.

You don’t do that with web servers, you don’t do it with databases. You don’t do it with your redis instance either.

...well, I suppose some people will do it anyway; but you get what you get if you do.

If a major outage gets you fired, you have no one to blame but yourself.


> Cannot agree with that. If one does that or not, would depend on things like Service Level Agreements (SLA) — and one can have a pretty high SLA uptime %, without sharding, e.g. if there's an underlying pretty stable hosting provider that live migrates if there's a hardware failure.

Sorry, no. A single instance is never acceptable from any risk perspective, no matter what guarantees the hosting provider claims.


Extremely hard to scale in what sense? A single redis server running on fairly commodity hardware can serve some pretty intense loads (10s of thousands of ops/sec). Once "scale" bigger than that is an issue, companies should already have in place the development staff that will ensure scaling up is done in a sane way, i.e., that have enough sense and/or experience to choose the correct tools for the job.

Redis is one such tool--its clustering "story" may not be ideal but even a small, minimal cluster will be more than sufficient for any use-case that a distributed memory cache is fit for at a scale that will cover 99% of businesses' needs.


> Extremely hard to scale in what sense? A single redis server running on fairly commodity hardware can serve some pretty intense loads (10s of thousands of ops/sec).

~90k/sec on a single core of an L5520. Scaling Redis is a premature optimization for 99.99% of use cases. While it may be sexy to talk about the millions of ops per second one's project does, in reality the number of projects that have that as a requirement is probably barely in triple digits globally.


I think you are using it wrong. It is not meant to be run in a K8s cluster with dynamic IPs in multiple instances. You are supposed to deploy a VM mesh of fixed IPs as the database store. Moreover, it is design-dependent. You can live with just multiple master-slave pairs and have your data implicitly sharded by something, e.g. by user country or continent. It's sad that some people still think a "perfect db" exists: a partition-tolerant, highly available, durable, horizontally scalable, low-latency, ACID-compliant, open-source database doesn't exist and never will (see the CAP theorem, see the broken guarantees found by Jepsen[1]). Live with it and design your architecture appropriately.

[1] https://jepsen.io/


I don’t think most people need a fully partition tolerant solution; even at very high scale and with high SLAs quorum based solutions are used successfully. If you remove strict partition tolerance from your above list there do exist such databases. And I’d argue it’s much easier to develop against such a database and you’re more likely to actually preserve its guarantees in your application than running an ad-hoc cluster of Redis instances with a home-spun clustering scheme where your application logic is heavily implicated in maintaining consistency.

I only used it once, but its clustering story was bad, and I never really understood the niche it filled between Memcache and databases.

Better memory model than Memcache. If one of our memcached servers dies, that data is gone. Redis can write to disk, optionally. We mostly use Redis and memcache as a cache to save load from the db. Even with dozens and dozens of dedicated read hosts, we can knock over our dbs if we are not caching data.
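The optional persistence mentioned above is a couple of lines in redis.conf; both snapshotting (RDB) and the append-only file (AOF) are standard, documented options (the specific values here are just illustrative):

```conf
# RDB: snapshot to disk if at least 1 key changed in 900 seconds
save 900 1

# AOF: log every write, fsync at most once per second
appendonly yes
appendfsync everysec
```

AOF bounds data loss to roughly a second of writes at the cost of more disk I/O, while RDB snapshots are cheaper but can lose everything since the last snapshot.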

There are also memcached-protocol implementations with persistence (MemcacheDB, Couchbase Server).

I read it twice trying to figure out why you were invoking @dang.

In any case, I've had the same experience with only two programs - CouchDB and Redis.



It's been a word for a long time ... we used that when we were kids (at least in earshot of our parents). Searching for "dang" in Algolia results in (by a quick count) more than 80% of the results being for @dang (the person/handle) rather than the pejorative use.


