
Redis (consistent hashing based) sharding made easy - emre-yilmaz
http://emre.github.io/redis-router/
======
kijin
> _Q: What about data invalidation if I move servers, change the config etc._

> _A: It's minimum. At least better than: Node = Hash(key) MOD N_

Seriously?

Redis is not just a cache. Let me repeat that. Redis is not just a cache. Lots
of people use Redis as a primary data store. If Redis is your primary data
store, you can't afford to have any of your keys invalidated, ever. "Minimum"
might be good enough when you're talking to Memcached, but it's not enough
when you're talking to Redis.

Consistent hashing has been a solved problem for a long time if you can afford
to misplace a few keys from time to time, which will happen every time a node
is added to or removed from the pool. There are Redis client libraries
implementing consistent hashing in nearly every language, and most of them
work just fine if you use Redis as a cache or if your pool size never changes.
Solving this problem again isn't particularly interesting.
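To make the trade-off concrete, here is an illustrative Python sketch (not redis-router's actual implementation; node names and key counts are made up) comparing how many keys get remapped by `Hash(key) MOD N` versus a simple consistent-hash ring when one node is added:

```python
# Compare key remapping when a 5th node joins: naive mod-N hashing
# vs. a consistent-hash ring with virtual nodes.
import hashlib
from bisect import bisect

def h(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def mod_n(key, nodes):
    # naive scheme: node index depends on N, so changing N moves most keys
    return nodes[h(key) % len(nodes)]

class Ring:
    def __init__(self, nodes, replicas=100):
        # place `replicas` virtual points per node on the ring
        self.ring = sorted(
            (h(f"{node}:{i}"), node) for node in nodes for i in range(replicas)
        )
        self.points = [p for p, _ in self.ring]

    def node_for(self, key):
        # first ring point clockwise of the key's hash owns the key
        idx = bisect(self.points, h(key)) % len(self.ring)
        return self.ring[idx][1]

keys = [f"key:{i}" for i in range(10000)]
old_nodes = ["node0", "node1", "node2", "node3"]
new_nodes = old_nodes + ["node4"]

moved_mod = sum(mod_n(k, old_nodes) != mod_n(k, new_nodes) for k in keys)
old_ring, new_ring = Ring(old_nodes), Ring(new_nodes)
moved_ring = sum(old_ring.node_for(k) != new_ring.node_for(k) for k in keys)

print(f"mod-N: {moved_mod} of {len(keys)} keys moved")   # roughly 80%
print(f"ring:  {moved_ring} of {len(keys)} keys moved")  # roughly K/n, ~20%
```

With a cache, the moved keys are just misses; with a primary store, they are effectively lost until you migrate them by hand.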

What really would be interesting is a server that sits between Redis nodes and
clients and intelligently moves keys from one node to another in the
background so that no key is ever invalidated even when the pool size changes.
I believe that project is called Redis Cluster or something. That might be
worth an extra TCP connection. But right now, I'm not seeing why I should
prefer Redis-Router to any tried-and-true client library with built-in lossy
consistent hashing.

~~~
mmcnickle
Nowhere is it suggested that this is suitable for uses where you need
durability. In fact, if you are looking for a library for consistent hashing,
I'd assume you'd understand the drawbacks (invalidation on resize).

Have you a point other than ranting that this isn't a unicorn? It really
bothers me that this is the top comment at the moment.

~~~
kijin
My point is that I don't see much reason to run a server whose sole purpose
is to perform consistent hashing. I have no complaints about using the
client library as part of a Python program. _Actually, apologies to OP because
I didn't realize that this is like the first Redis client library in Python
with consistent hashing built in._

But a standalone server to talk to other languages such as PHP? Why would I
want to add yet another TCP connection, yet another point of failure, and yet
another protocol to my software stack when PHP's very own Predis, for example,
does consistent hashing just fine while talking to Redis directly? Many other
languages like Ruby, Java, and C#/.NET also have Redis clients that support
consistent hashing. Sorry Pythonistas, everyone else has been having fun with
turnkey consistent hashing for 3-4 years already.
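What those client libraries do internally can be sketched in a few lines of Python (hypothetical hosts and replica count; real libraries like Predis add weighting and connection handling on top): the client hashes the key itself and connects straight to the owning node, so no intermediate server or extra protocol is needed.

```python
# Client-side sharding sketch: map each key to a (host, port) with a
# hash ring, then talk to that node directly. Hosts are made up.
import hashlib
from bisect import bisect

NODES = [("10.0.0.1", 6379), ("10.0.0.2", 6379), ("10.0.0.3", 6379)]

def _h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

# 160 virtual points per node, a common default for even distribution
ring = sorted((_h(f"{host}:{port}:{i}"), (host, port))
              for host, port in NODES for i in range(160))
points = [p for p, _ in ring]

def node_for(key):
    """Return the (host, port) that owns `key`; the client connects
    there directly -- no proxy hop in between."""
    return ring[bisect(points, _h(key)) % len(ring)][1]

print(node_for("user:42:profile"))
```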

Lossless sharding came to mind immediately as a possible benefit of a middle
layer, because Redis users have been asking for something like that for ages.
When someone says "Redis" and "sharding" in the same breath, I'm sure a lot of
people will think "Finally, a way to distribute my larger-than-RAM dataset
across multiple machines!" After all, durability is a big deal when it comes
to Redis. I'm sorry if my comment came across as rude, but I was honestly
quite disappointed because my expectations were probably too high.

------
antirez
See also: Twitter's Twemproxy
([http://github.com/twitter/twemproxy](http://github.com/twitter/twemproxy))

~~~
emre-yilmaz
another good alternative:
[https://github.com/disqus/nydus](https://github.com/disqus/nydus)

------
unwind
The "abstract" text (copied from the Wikipedia page) lacks the variable names,
making the text incomprehensible:

 _When a hash table is resized and consistent hashing is used, only keys need
to be remapped on average, where is the number of keys [...]_

it should end "only K/n keys need to be remapped on average, where K is the
number of keys [...]". Just thought I'd point this out so it could get fixed,
should only take a few seconds to edit.

~~~
emre-yilmaz
thanks! [https://github.com/emre/redis-router/commit/0d9307436e4c3b69...](https://github.com/emre/redis-router/commit/0d9307436e4c3b69c257943bd6df1883e742fbc7)

------
peterhunt
Somewhat off-topic, but just wondering: is anyone using consistent hashing for
their DB masters? Every setup I've seen uses a manual sharding table.

~~~
nasalgoat
Yes, and it works fine as long as you accept that data will need to be
migrated at some point when you run out of memory and/or instances.

To put this off as long as possible, you can max out the memory in your boxes
and run a large number of Redis instances - 100s in some cases - on those
servers, ideally with no more than one per CPU core to maximize performance.

Then if you begin maxing out memory, you can easily split those instances onto
their own hardware. My calculations showed that would allow scaling to
trillions of keys without problems.
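The arithmetic behind this pre-sharding scheme can be sketched as follows (all numbers are hypothetical, just to show the shape of the calculation):

```python
# Back-of-envelope for pre-sharding: fix the shard count up front, grow
# capacity later by giving each shard more hardware. Numbers are made up.
boxes = 20
instances_per_box = 100                 # many small Redis processes per box
keys_per_box = 1_000_000_000            # bounded by one box's RAM (assumed)

shards = boxes * instances_per_box      # 2,000 shards, fixed forever
capacity_now = boxes * keys_per_box     # 20 billion keys

# Out of memory? Move each instance to its own machine: same 2,000 shards,
# same hash ring, no keys remapped -- but each shard now gets a whole
# box's RAM to itself.
capacity_split = shards * keys_per_box  # 2 trillion keys
print(f"{shards} shards: {capacity_now:,} -> {capacity_split:,} keys")
```

The point is that the hash ring never changes size; only the instance-to-machine mapping does, so no keys are ever remapped.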

------
janerik
The TCP server example shows a different protocol than the default Redis
protocol. (As I could not install it correctly, I couldn't try it.) If that is
the case, current Redis client libraries cannot be used.

~~~
antirez
Hello,

Redis accepts both a structured protocol and a simple space-separated-tokens
protocol (called the inline protocol) that helps sysadmins avoid a disaster
when they lack a proper redis-cli but need to run a Redis command ASAP.
Perhaps this proxy also supports both forms.

You can do that with any modern enough Redis server:

    $ telnet localhost 6379
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    ping
    +PONG
    set foo bar
    +OK
    get foo
    $3
    bar
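The two request formats can be put side by side in a short Python sketch: the inline form is what you type over telnet, while the unified (RESP) form is what client libraries send. Both encode the same command.

```python
# Encode a Redis command in both request formats Redis accepts.

def inline(*args):
    # inline protocol: space-separated tokens, terminated by CRLF
    return (" ".join(args) + "\r\n").encode()

def resp(*args):
    # unified protocol: *<argc>, then $<byte-length> + payload per argument
    out = [f"*{len(args)}\r\n".encode()]
    for a in args:
        b = a.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(b), b))
    return b"".join(out)

print(inline("SET", "foo", "bar"))  # b'SET foo bar\r\n'
print(resp("SET", "foo", "bar"))    # b'*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n'
```

Either way, the server's replies use the same reply format (`+OK`, `:13`, `$3`, etc.), which is what the question below is about.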

~~~
janerik
I know of that simple protocol. But the responses in the example telnet
session on that site return "True" and "13" where they should be "+OK" and
":13" with the Redis protocol.

~~~
antirez
good point...

------
stephen_mcd
It seems like you've tried to support the set/store (sinterstore, sdiffstore,
etc) methods across multiple instances in a non-atomic way - won't that lead
to potential data loss?

~~~
emre-yilmaz
Yes, these methods are dangerous to use.

I need a locking mechanism to do this safely, but that will most likely wait
until I actually need it.
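Why a cross-shard SINTERSTORE can't be atomic is easy to demonstrate with a sketch (this is not redis-router's code; `FakeNode` is a stand-in for a Redis connection): when the source and destination sets live on different nodes, the proxy must read, intersect, and write back in separate round trips, and a concurrent writer can slip in between any two of them.

```python
# Non-atomic emulation of SINTERSTORE across shards.

class FakeNode:
    """Minimal in-memory stand-in for one Redis shard."""
    def __init__(self):
        self.sets = {}
    def smembers(self, key):
        return set(self.sets.get(key, set()))
    def delete(self, key):
        self.sets.pop(key, None)
    def sadd(self, key, *members):
        self.sets.setdefault(key, set()).update(members)

def cross_shard_sinterstore(dest_node, dest, sources):
    """sources: list of (node, key). Each step below is a separate round
    trip, so another client can modify a source set in between -- that is
    the race a locking mechanism would have to close."""
    result = None
    for node, key in sources:
        members = node.smembers(key)   # read each shard, one at a time
        result = members if result is None else result & members
    dest_node.delete(dest)             # write back in a final, separate step
    if result:
        dest_node.sadd(dest, *result)
    return len(result or ())

a, b = FakeNode(), FakeNode()
a.sadd("s1", "x", "y", "z")
b.sadd("s2", "y", "z")
print(cross_shard_sinterstore(a, "out", [(a, "s1"), (b, "s2")]))  # 2
print(sorted(a.smembers("out")))                                  # ['y', 'z']
```

On a single Redis node the real SINTERSTORE is atomic because the server executes it as one command; that guarantee is exactly what is lost once the keys are spread across shards.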

