1. Tooling to make it simpler to start a Redis Cluster on a single host if all you want is to leverage all the cores.
2. Smartness in the AOF rewriting so that instances on the same host make better use of the disk.
3. A caching mode for Redis Cluster. This is useful for deployments where most accesses are idempotent and you have, say, a cluster of 10 physical hosts running 32 Redis instances each. When a host is no longer reachable we don't want failover in that case, just reassignment of its hash slots to the other masters. Think of consistent hashing, but on the server side.
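The server-side reassignment idea in point 3 can be sketched roughly like this. This is a toy model, not Redis source: real Redis Cluster hashes keys into 16384 slots via CRC16, and the function names here are made up for illustration.

```python
# Toy sketch of cache-mode slot reassignment (hypothetical, not Redis code).
# We model only the slot -> master mapping and what happens when a master dies.

NUM_SLOTS = 16384  # Redis Cluster's fixed slot count

def build_slot_map(masters):
    """Evenly partition all slots across the given master nodes."""
    return {slot: masters[slot % len(masters)] for slot in range(NUM_SLOTS)}

def reassign_on_failure(slot_map, failed):
    """Cache mode: no failover. Hand the dead master's slots to the
    survivors round-robin; cached values in those slots are simply lost."""
    survivors = sorted(set(slot_map.values()) - {failed})
    i = 0
    for slot, owner in slot_map.items():
        if owner == failed:
            slot_map[slot] = survivors[i % len(survivors)]
            i += 1
    return slot_map
```

The point of the sketch: with idempotent, cache-style workloads, losing the data behind a dead master is acceptable, so the cluster only needs to agree on a new slot map rather than promote replicas.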
Two years ago I had a private branch, written in a few days, that threaded Redis with N event loops on N threads. After realizing that it worked and scaled, I considered how many bugs such a Redis would have, how much harder it would be to add features, and how incompatible a single huge instance is with the Redis persistence model. At that point, after consulting with other engineers, I decided that for a few years our strategy would be Redis Cluster. Maybe I'll change my mind in the future? Maybe, but first let's explore what can be done with a shared-nothing architecture that keeps the promise of continuing to deliver software that hardly ever crashes.
EDIT: Obviously the proxy itself is multithreaded :-D
EDIT2: Btw jdsully already sent a PR to the original Redis project, so I guess a smart developer exposed to the Redis core is great anyway!
One of my main reasons for doing multithreading and FLASH in the first place was to make Redis work well for much larger value sizes.
I really think we have different use cases in mind.
I understand the point about Redis on Flash. Redis Labs is also doing a lot of things with this model, and I believe it is super useful for some folks. Yet I believe what makes Redis "Redis" is that there are no tricks: it behaves the same regardless of the access pattern. I want to stress this part over the ability to work with bigger data sets. Moreover, I believe that persistent RAM in huge amounts is now on the horizon.
I've always thought that one of the sensible basic data structures that Redis could support operations on is an (int->int) digraph.
There are a lot of things you can build on top of "digraphs as a toolkit object": just see the lengths people go to in order to get PostGIS+pg_routing installed in Postgres, just to run some random digraph queries on their (non-geographic) RDBMS data.
IMHO digraphs are very similar to Redis Streams in that sense: they're something that might seem like a single high-level "use-case" at first glance, but which is actually a fundamental primitive for many use-cases.
However, unlike a hashmap, list, set, etc., a digraph is necessarily (if you want graphwise computations to run efficiently) one large in-memory object with lots of things holding mutual references to one another, which can't just be updated concurrently in isolation. But people would likely demand that multiple writers be able to write to disjoint "areas" of the graph in parallel, rather than having to linearize those writes just in case they end up touching overlapping parts of the graph data.
If I understand Redis Streams, it was created because there's no real need for things like Kafka to do their own data-structure persistence. Data-structure persistence is not the "comparative advantage" of a thing like Kafka; network message-passing with queuing semantics is. High-level infrastructure pieces like Kafka should rely on lower-level infrastructure pieces like Redis for their data persistence, and then they can just do what they do best. Is that accurate?
If so, then I would argue the same thing applies to graph databases: persisting and operating on data-structures is not really their job; it's just something they do because there isn't anything to delegate that work to. Redis could be such a thing. But not if it's single-threaded.
This is, just from the top of my head, a use-case (set of use-cases?) that needs threading (or at least "optimistically threaded, with fallback to linearization") for writes. And it's not really all that niche, is it? Not any more than message queues are niche.
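The "optimistically threaded, with fallback to linearization" idea above can be sketched as partition-level locking: writers lock only the partitions their edits touch, in a fixed order so they cannot deadlock, and batches touching disjoint partitions proceed in parallel. Everything here is hypothetical (class and method names invented for illustration), not taken from any real graph store.

```python
import threading

class PartitionedDigraph:
    """Toy digraph with int nodes sharded into P partitions. Writers that
    touch disjoint partitions run concurrently; overlapping writers
    serialize on the shared partition locks (design sketch only)."""

    def __init__(self, partitions=16):
        self.p = partitions
        self.adj = [dict() for _ in range(partitions)]    # per-partition adjacency
        self.locks = [threading.Lock() for _ in range(partitions)]

    def _part(self, node):
        return node % self.p

    def add_edges(self, edges):
        """Atomically add a batch of (src, dst) edges. Locks are taken in
        ascending partition order, so two batches can never deadlock."""
        needed = sorted({self._part(s) for s, _ in edges})
        for i in needed:
            self.locks[i].acquire()
        try:
            for s, d in edges:
                self.adj[self._part(s)].setdefault(s, set()).add(d)
        finally:
            for i in reversed(needed):
                self.locks[i].release()

    def out_neighbors(self, node):
        return set(self.adj[self._part(node)].get(node, ()))
```

A real implementation would need to handle graph-wide reads (e.g. traversals) against this, which is where the fallback to linearization would come in.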
What I did a year ago was just make Redis cluster able to do Lua scripting transactions on arbitrary keys across the cluster. With rollbacks, like my 2015 patch.
Instead of reinventing a bad wheel, you could have something better.
In 2015 I released a patch allowing transactions with rollbacks in Lua scripts; it used DUMP/RESTORE to snapshot keys. That idea was extended into the cluster solution I released last year. The primary compromise is that, as implemented, it more or less relies on the equivalent of a temporary key migration. You want a transaction on server X involving keys from servers X, Y, and Z? You migrate the keys to X explicitly, run your transaction, then migrate them back.
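The snapshot-then-rollback part can be illustrated with a minimal in-memory sketch. The real patch operates on live Redis keys via DUMP/RESTORE; here the keyspace is just a dict, and `run_transaction` is a made-up name for illustration.

```python
import copy

def run_transaction(db, keys, body):
    """Snapshot the involved keys up front (cf. DUMP), run the body,
    and put the snapshots back (cf. RESTORE) if anything fails."""
    snapshot = {k: copy.deepcopy(db[k]) for k in keys if k in db}
    missing = [k for k in keys if k not in db]
    try:
        body(db)
    except Exception:
        for k in missing:          # drop keys the failed body created
            db.pop(k, None)
        db.update(snapshot)        # restore keys the failed body mutated
        raise
```

This also makes the caveat below concrete: the cost of the snapshot is proportional to the size of the values involved, so big keys are expensive before the transaction even starts.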
Caveat: big keys are slow to snapshot, migrate, prepare for rollback, etc.
I've got incoming changes to my own fork that allow for fast state transfer for large structures (useful on multiple machines, or just heavily threaded + transactions), more or less eliminating the caveats. But that's also not yet publicly available, and I've spent 5-6 of the last 7 weeks sick (maybe 4 full workdays in that period), so I don't have an ETA on any of it.
A really nice contrast to the typical hand-waving, defensiveness, etc, I get from a lot of vendors, product owners, etc.
The beauty of open source is choice, and getting to try new things.
Possibly not, but it's a very good reason not to use multithreaded software, when a simple single-threaded alternative is present.
It's a very practical decision in my case: I'd rather rely on something that I know eliminates entire categories of bugs, if performance is acceptable (it is for me). Also, if it breaks, I can dive in, understand it and fix it. Simplicity should not be underrated.
These are smart people, they've written a great program, but they didn't make the right locking decisions.
The approach antirez mentioned in a sibling thread, a multithreaded proxy process communicating with multiple single-threaded instances, sounds much easier to implement without ending up with data in the wrong places. Though even that doesn't necessarily need to be multithreaded: a master process doing health checking and sending status to worker processes, each running an event loop, would probably work just as well.
"KeyDB works by running the normal Redis event loop on multiple threads. Network IO, and query parsing are done concurrently. Each connection is assigned a thread on accept(). Access to the core hash table is guarded by spinlock. Because the hashtable access is extremely fast this lock has low contention. Transactions hold the lock for the duration of the EXEC command. Modules work in concert with the GIL which is only acquired when all server threads are paused. This maintains the atomicity guarantees modules expect."
This is probably the most important part of the fork.
InnoDB’s buffer pool documentation is instructive: https://dev.mysql.com/doc/refman/5.7/en/innodb-buffer-pool.h...
That's unlikely unless 1) your hosts are seriously underpowered and 2) you have no per-consumer quotas set.
Second, even if quotas could be used - and there are plenty of sites where they cannot - there’s still a broker-replacement scenario that needs to be accounted for. Most folks don’t want to intentionally impede the recovery rate of a failed broker.
That said, Redis has always been faster than I've needed, and that's leaning on it pretty heavily, so I kind of agree with antirez that adding this complexity may not come with much benefit. Will be interesting to see where this goes.
> Access to the core hash table is guarded by spinlock. Because the hashtable access is extremely fast this lock has low contention. Transactions hold the lock for the duration of the EXEC command.
FLASH support was the first step, and we’ve removed a bunch of unnecessary copies along the way but there is still more work to do.
Also note that the benchmarks shown seem to be an apples-and-oranges comparison. If you want to compare the performance of Redis vs. a multithreaded fork, the proper comparison involves running a Redis Cluster (one node per core) against the multithreaded single node.
All else being equal I'd prefer a multithreaded approach as KeyDB is pursuing, but that's just because it makes it easier to utilise the system resources of a single (virtual) machine.
It’s offloading complexity from the developers to the users, which I don’t think is the right approach.
Also, clusters have limitations compared to single-instance Redis. You’ll also get more queries/GB with a multithreaded approach.
Have you benchmarked the cluster approach vs. the locks-and-threads approach?
Redis's codebase is surprisingly readable given the complexity of what it does. This isn't.
This is a lock free key value store in a single file - it doesn't have the network protocol, but any threads dealing with the network IO could use it without doing anything differently.
"Well, that’s the plan: I/O threading is not going to happen in Redis AFAIK, because after much consideration I think it’s a lot of complexity without a good reason. Many Redis setups are network or memory bound actually. Additionally I really believe in a share-nothing setup, so the way I want to scale Redis is by improving the support for multiple Redis instances to be executed in the same host, especially via Redis Cluster."
The RAM would have to be a multiple of the dataset, because it's shared-nothing, running right into one of the two common limits, right?
Multithreading eliminates 99.8% of snapshot overhead: https://www.slideshare.net/secret/sLTnWqcPV7G0HK
Can also help reduce snapshot create and load time by 20-70% or so: https://www.structd.com/
But hey, I'm right behind you :)
E.g.: running KeyDB on AWS, the EC2 instance fails and is replaced by another one running the same config. Will it read the last S3 dump on startup?
Hardcoding "aws" in the core seems like an odd choice though. Wouldn't it be better to make it agnostic and provide some sort of trigger where an external script or utility handles backups? That is, why S3 and not any other service?
I mostly use slaves for recovery with the database dumps as a last resort. So I overlooked that.
Is there a straightforward config to sync to the replica before accepting incoming connections?
It's currently in the unstable branch; once it gets shaken out a bit I'll release 0.9.3 with it included.
That's the TL;DR of "What tradeoffs are required?"
From what I can tell, they extracted an embedded redis DB and wrapped it in a multi-threaded protocol layer. So execution and behavior of Redis is identical, but all the "not actually a DB" parts are faster and more concurrent.
Currently network IO and query parsing are multithreaded. Access to the core hash table is under lock. However network IO and query parsing take up most of the time in Redis queries. Hash tables are pretty fast. The goal is to add a reader/writer lock here in the future.
The lock itself is also interesting. We used a ticket lock which is a variant of a spin lock. This ensures fair locking and reduces the variance in latency dramatically. Standard POSIX locks are too slow.
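For illustration, here is a toy Python rendition of the ticket-lock idea. In C it would be a hardware fetch-and-add on the ticket counter; here CPython's GIL stands in for that atomicity, so this is a sketch of the mechanism rather than a production lock, and the class name is invented.

```python
import itertools
import threading
import time

class TicketLock:
    """Fair spinlock: each acquirer takes the next ticket and spins until
    it is being served, so threads enter strictly in arrival order. This
    is what eliminates the latency variance of an unfair spinlock."""

    def __init__(self):
        self._next_ticket = itertools.count()  # stand-in for atomic fetch-and-add
        self._now_serving = 0

    def acquire(self):
        my = next(self._next_ticket)           # atomic enough under the GIL
        while self._now_serving != my:
            time.sleep(0)                      # yield; real code spins with 'pause'

    def release(self):
        self._now_serving += 1                 # only the current holder writes this
```

The fairness comes from the FIFO ticket order: no thread can barge ahead of one that started waiting earlier, which is exactly the property plain test-and-set spinlocks lack.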
This is an example that is completely lock free.