> A: It's minimum. At least better than: Node = Hash(key) MOD N
Redis is not just a cache. Let me repeat that. Redis is not just a cache. Lots of people use Redis as a primary data store. If Redis is your primary data store, you can't afford to have any of your keys invalidated, ever. "Minimum" might be good enough when you're talking to Memcached, but it's not enough when you're talking to Redis.
Consistent hashing has been a solved problem for a long time if you can afford to misplace a few keys from time to time, which will happen every time a node is added to or removed from the pool. There are Redis client libraries implementing consistent hashing in nearly every language, and most of them work just fine if you use Redis as a cache or if your pool size never changes. Solving this problem again isn't particularly interesting.
What really would be interesting is a server that sits between Redis nodes and clients and intelligently moves keys from one node to another in the background so that no key is ever invalidated even when the pool size changes. I believe that project is called Redis Cluster or something. That might be worth an extra TCP connection. But right now, I'm not seeing why I should prefer Redis-Router to any tried-and-true client library with built-in lossy consistent hashing.
"Minimum" is enough for the people who use Redis as a cache, since a lot of people use it for just plain, simple, old "caching".
Do you have a point other than ranting that this isn't a unicorn? It really bothers me that this is the top comment at the moment.
But a standalone server to talk to other languages such as PHP? Why would I want to add yet another TCP connection, yet another point of failure, and yet another protocol to my software stack when PHP's very own Predis, for example, does consistent hashing just fine while talking to Redis directly? Many other languages like Ruby, Java, and C#/.NET also have Redis clients that support consistent hashing. Sorry Pythonists, everyone else has been having fun with turnkey consistent hashing for 3-4 years already.
Lossless sharding came to mind immediately as a possible benefit of a middle layer, because Redis users have been asking for something like that for ages. When someone says "Redis" and "sharding" in the same breath, I'm sure a lot of people will think "Finally, a way to distribute my larger-than-RAM dataset across multiple machines!" After all, durability is a big deal when it comes to Redis. I'm sorry if my comment came across as rude, but I was honestly quite disappointed because my expectations were probably too high.
Unfortunately it's the weakest part of mongodb.
(for the record = HASH(key) MOD N is not 'consistent hashing'.)
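For contrast, here's a minimal Python sketch of the difference: a toy consistent-hash ring (with virtual nodes, as Ketama-style clients use) next to the naive modulo scheme. The `HashRing` class and helper names are illustrative only, not taken from redis-router or any real client.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring; real Ketama clients look much like this."""
    def __init__(self, nodes, replicas=100):
        # Each node gets many virtual points on the ring for even spread.
        self.ring = sorted(
            (_hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self.points = [h for h, _ in self.ring]

    def get_node(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash.
        idx = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

def mod_n_node(key: str, nodes) -> str:
    # The naive scheme: resizing the pool remaps almost every key.
    return nodes[_hash(key) % len(nodes)]
```

With the ring, adding a node only captures the arcs near its virtual points; with MOD N, nearly every key's node assignment changes.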
redis-router is just a library that wraps redis-py with consistent hashing. Nothing more.
I use it heavily in production since it solves the client-side sharding problem for me. When I wrote this, there was no trustworthy client library that came with consistent hashing.
I don't know what you actually want to see. "Saving the world" is a todo, though. Wait for the new releases; you might like it. :)
Nydus uses the same Ketama algorithm that you use, but I suppose it might not have had that feature when you started to work on Redis Router.
Also, pretty much every up-to-date client library in nearly every popular language, such as PHP, Ruby, C#, and Java.
I'm not complaining about the fact that you created a neat Python library for Redis. My complaint is about the standalone server feature. I don't see why it's needed because nearly every popular language has native libraries that implement consistent hashing (not HASH MOD N), often using the exact same algorithm you're using. So I went looking for bells and whistles that might justify the standalone server, and unfortunately I found none.
> When a hash table is resized and consistent hashing is used, only keys need to be remapped on average, where is the number of keys [...]
It should read "only K/n keys need to be remapped on average, where K is the number of keys [...]". Just thought I'd point this out so it can get fixed; it should only take a few seconds to edit.
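The K/n claim is easy to check empirically. Below is a quick simulation (a sketch using a toy md5-based ring with virtual nodes, not any particular client library) that grows a three-node pool to four and counts how many keys move, for both consistent hashing and HASH MOD N.

```python
import bisect
import hashlib

def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=200):
    # Many virtual points per node keep the remap fraction close to K/n.
    ring = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
    hashes = [p for p, _ in ring]
    return hashes, ring

def locate(hashes, ring, key):
    idx = bisect.bisect(hashes, h(key)) % len(ring)
    return ring[idx][1]

K = 10_000
keys = [f"user:{i}" for i in range(K)]

before = build_ring(["a", "b", "c"])
after = build_ring(["a", "b", "c", "d"])
moved_ring = sum(
    locate(*before, k) != locate(*after, k) for k in keys
)
moved_mod = sum(h(k) % 3 != h(k) % 4 for k in keys)

print(moved_ring / K)  # roughly K/n of the keys move: ~0.25 here
print(moved_mod / K)   # modulo placement moves roughly 3/4 of the keys
```

The exact fractions wobble a bit from run parameters, but consistent hashing stays near K/n while modulo remaps the large majority.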
To put this off as long as possible, you can max out the memory in your boxes and run a large number of Redis instances - 100s in some cases - on each box, hopefully with fewer than one per CPU core to maximize performance.
Then if you begin maxing out memory, you can easily split those instances onto their own hardware. My calculations showed that would allow scaling to trillions of keys without problems.
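For what it's worth, the arithmetic works out under some assumed numbers (none of which come from the comment above; the fleet size and per-instance key count are pure illustration):

```python
# Back-of-envelope sketch of the scaling claim. All figures are assumptions.
instances_per_box = 100          # "100s in some cases"
boxes = 500                      # assumed fleet size
keys_per_instance = 25_000_000   # assumed: small keys/values in a few GB of RAM

total_keys = instances_per_box * boxes * keys_per_instance
print(f"{total_keys:,}")  # 1,250,000,000,000 -> on the order of a trillion keys
```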
I had a quick look, but I don't see an easy way to have redis-py pass the raw return values straight through, though I think it could be done with some effort. So this isn't a drop-in proxy for a Redis server just yet.
Edit: that refers to the TCP/HTTP servers above; the library itself can be used as a drop-in replacement for redis-py in Python.
Redis accepts both a structured protocol and a simple space-separated-tokens protocol (called the inline protocol) that helps sysadmins avoid disaster when they lack a proper redis-cli but need to run a Redis command ASAP. Perhaps this proxy also supports both forms.
You can do that with any modern enough Redis server, straight over telnet:

    $ telnet localhost 6379
    Escape character is '^]'.
    set foo bar
    +OK
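As an illustration of why the telnet session above works, here's a sketch of the two request forms. The encoder helpers are hypothetical (not part of any client library), but the byte layouts follow the Redis protocol: inline commands are just space-separated tokens ending in CRLF, while regular clients send RESP arrays of bulk strings.

```python
def encode_inline(*args: str) -> bytes:
    # The "inline" form: space-separated tokens, CRLF-terminated,
    # which is why typing into a bare telnet session works.
    return (" ".join(args) + "\r\n").encode()

def encode_resp(*args: str) -> bytes:
    # The structured (RESP) form that normal clients send:
    # *<argc>, then $<len>/<data> for each argument.
    out = [f"*{len(args)}\r\n"]
    for a in args:
        out.append(f"${len(a.encode())}\r\n{a}\r\n")
    return "".join(out).encode()

print(encode_inline("set", "foo", "bar"))
print(encode_resp("set", "foo", "bar"))
```

A proxy that wants to be a true drop-in would have to accept both encodings on its listening socket.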
I use redis-router in production as a library. Some of my non-pythoneer friends asked to use it from PHP, so I made a little wrapper with gevent.
It's probably not compatible with current clients. Needs testing :)
I re-read the README now and it explicitly states that libketama should be installed from the git repository. I'll try it again later.
I need a locking mechanism to do it safely, but that will most likely wait until I need it.