
Redis Cluster Tutorial - anuragramdasan
http://redis.io/topics/cluster-tutorial
======
benarent
Great post. I uploaded Salvatore's keynote from Redis Conf 2013 today.
[http://blog.togo.io/redisconf/a-short-term-plan-for-redis-
by...](http://blog.togo.io/redisconf/a-short-term-plan-for-redis-by-antirez/)

~~~
anuragramdasan
Thanks for the link. I have recently started digging into Redis at the source
code level, and it's incredibly fascinating.

~~~
csmuk
It's also beautiful code. It's a great project to learn from.

------
ransom1538
I have a few serious concerns. Suppose you have nodes A,B,C,D running with
slaves, and a user u1 starts retrieving and storing data. The odds are high
that u1 ends up with keys on all of A,B,C,D, since every key is mapped to a
numeric hash slot.

1) User u1 will end up connecting to all nodes. Thus as you scale by adding an
"E", every node ends up with the same number of connections. For example, say
A,B,C,D each have 5k connections open and you add "E": now "E" also has 5k
connections. At that point you are bounded by requests per second and no
amount of hardware can save you.

2) Let's say B and B1 die. (It happens.) Now your system is completely out;
all 5k connections are blocked. It would be nice if A,C,D,E continued to run.

I usually get around things like this with an index server between user u1 and
A,B,C,D. When u1 starts retrieving keys, the system is designed such that u1
only connects to a single node for its keys.

How do you get around 1 and 2? Do other people use index servers?

~~~
seiji
_such that u1 only connects to a single node for its keys. Do other people use
index servers?_

The redis cluster client in your programming language will know the entire
current redis cluster state. The redis cluster client will automatically place
(and read) keys directly to (and from) the correct instances. The direct
reading also reduces latency found in some other cluster implementations where
you _only_ have the option to proxy your requests. You as an application
programmer never have to worry about multiple servers.

When your redis cluster client connects to the cluster, the redis client reads
the current key slot to server map, and uses that for every action (with
updates as necessary as failures/promotions happen).
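
For the curious, the routing itself is simple. Here is a minimal sketch in
Python (not a real client) of the slot mapping from the Redis Cluster spec:
each key hashes to one of 16384 slots via CRC16 (the XMODEM variant), with an
optional {hash tag} forcing related keys onto the same slot; the client then
looks the slot up in its slot-to-node map.

    def crc16(data):
        # CRC16/XMODEM, the variant the Redis Cluster spec mandates.
        crc = 0
        for byte in data:
            crc ^= byte << 8
            for _ in range(8):
                crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
                crc &= 0xFFFF
        return crc

    def key_slot(key):
        # If the key contains a non-empty {tag}, only the tag is hashed,
        # so related keys can be forced onto the same slot (and node).
        start = key.find("{")
        if start != -1:
            end = key.find("}", start + 1)
            if end > start + 1:
                key = key[start + 1:end]
        return crc16(key.encode()) % 16384

    key_slot("user:1000")  # -> a slot in 0..16383, owned by exactly one master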

_Let's say B and B1 die._

Good example, _but_ you should probably be running with at a minimum B, B1,
and B2 (and three entire copies of A, C, D as well). You have to design your
deployments in concert with how your underlying architecture works. Redis
cluster instances _do not_ have to take up an entire machine. Run a couple
masters and a couple replicas per host. Running a cluster with 4 masters means
you should run with 8 replicas (minimum). See the math section of
[https://github.com/mattsta/redis-cluster-
playground](https://github.com/mattsta/redis-cluster-playground) for more
examples.

It's worth stepping back and defining "Cluster" too. A "database cluster" has
no meaning by itself. It doesn't specify redundancy, how availability works,
or how replication works. A Riak cluster has nothing to do with a Redis
cluster. You have to learn the operations on each type as you go.

~~~
antirez
About that: redis-trib, the Redis Cluster config utility, while still pretty
basic, is already able to allocate masters and slaves across different IPs. So
if you give it six nodes like 10.0.0.1:6379 10.0.0.1:6380 10.0.0.2:6379
10.0.0.2:6380 10.0.0.3:6379 10.0.0.3:6380, you can expect a master on each IP
address, with replicas always allocated on nodes that do NOT host their own
master.
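
For example (invocation shape as in the cluster tutorial, with one replica per
master; the IPs are the ones above):

    ./redis-trib.rb create --replicas 1 \
        10.0.0.1:6379 10.0.0.1:6380 \
        10.0.0.2:6379 10.0.0.2:6380 \
        10.0.0.3:6379 10.0.0.3:6380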

------
stiff
If Redis would just use names for databases instead of numbers and drop the
limit on the number of databases... Managing multiple application instances
(production, test, staging for multiple countries) on a single shared Redis
instance, each requiring a separate database, is a huge pain.

~~~
guywithabike
Numbered databases are (de facto) deprecated. Run separate processes instead.

(Redis is single-threaded anyway; you probably don't want to run production
and staging on the same Redis process.)
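
E.g. (a minimal sketch; the config paths and ports are illustrative):

    redis-server /etc/redis/production.conf --port 6379
    redis-server /etc/redis/staging.conf --port 6380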

~~~
dabeeeenster
Still not a suitable solution IMHO. Now you have to deal with multiple ports,
startup scripts, etc.

Pretty much every other database on the planet has database names.

I LOVE Redis but this is just a bonkers limitation IMHO...

~~~
janerik
antirez once commented on this: [https://groups.google.com/forum/#!msg/redis-
db/vS5wX8X4Cjg/8...](https://groups.google.com/forum/#!msg/redis-
db/vS5wX8X4Cjg/8ounBXitG4sJ)

~~~
stiff
I have seen this explanation and frankly I find it ridiculous. Multiple
databases as a dictionary layer?

------
heterogenic
One thing I'd love to see (perhaps in a future iteration?) is a way to have
overlapping ranges for redundancy, instead of mirrors.

The big benefit shows up when balancing write-heavy workloads, where a plain
mirror slave would not otherwise take its share of the total load.

I believe this could be accomplished by running a second DB on each machine as
a slave to the next in the cycle, and that's what I'll probably try when we
scale up to needing a cluster. But having it be a supported scenario (or
facilitated somehow) would make me feel much better.
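
Roughly, with three hosts (addresses illustrative, using the stock slaveof
directive):

    # On each hostN, run a master on 6379 plus a second instance on 6380
    # that replicates the *next* host's master, forming a ring:
    #   host1, redis-6380.conf:  slaveof host2 6379
    #   host2, redis-6380.conf:  slaveof host3 6379
    #   host3, redis-6380.conf:  slaveof host1 6379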

( _edit_ : Just a feature request of course, exciting to see this release!
Thanks for all your hard work antirez!)

------
taf2
I wish Redis had a synchronous write option to guarantee that, when a write
response is received, all nodes in the cluster have also received a copy. Even
if this were just an ack that the data is replicated into memory instead of to
disk. Sure, there are cases where you'd rather operate async, but really I
would prefer to put haproxy in front of 3-6 Redis nodes and let it load
balance both reads and writes, similar to a Galera cluster with keepalived and
a VIP. Assuming synchronous writes, I would not need to be concerned with
split brain...

~~~
seiji
That's in the plan, it's just not implemented yet. (It's implied by the _Note:
Redis Cluster in the future will allow users to perform synchronous writes
when absolutely needed._ )

Something like:

    SYNC 3
    SET key value
    [replies with success only after it's ACK'd on three cluster instances.]
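
In the meantime, standalone Redis (since 3.0) exposes a related primitive,
WAIT, which blocks until the writes issued so far on the connection are
acknowledged by N replicas (it bounds replication lag rather than making the
write transactional). A rough client-side sketch in Python with redis-py; the
host, replica count, and timeout are illustrative:

    import redis

    r = redis.Redis(host="10.0.0.1", port=6379)
    r.set("key", "value")
    # Block until 2 replicas ack, or 100 ms pass; returns the number
    # of replicas that actually acknowledged.
    acked = r.execute_command("WAIT", 2, 100)
    if acked < 2:
        # Fewer replicas than required saw the write; the application
        # decides how to treat the weaker durability guarantee.
        print("only", acked, "replicas acked")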

~~~
taf2
I would definitely take slower write speed to have this extra assurance of
high availability with data consistency... as it is today - it's pretty hard
operate redis in a HA way even with sentinel... I'm sure others have figured
it out but for me the best I have is a 30 second window when sentinel is
switching the backup slave to master and either writes are lost haproxy can
hold but it works sometimes... I can imagine with synchronous replication we'd
have slower write speed but at least failover could be sub-second and possibly
zero downtime... using a vip/keepalived - Still redis is such a good
datastore, it's difficult to imagine a world/service without it. for what it's
worth this looks to be the best setup i've found:
[http://failshell.io/sensu/high-availability-
sensu/](http://failshell.io/sensu/high-availability-sensu/) would be
interested if others have other setups that possibly work better...

~~~
antirez
Hey, going a bit off topic about Sentinel: this should not happen. Sentinel +
Redis + clients as a distributed system clearly cannot guarantee strong
consistency under arbitrary partitions, but in the scenario you describe,
where the master crashes or is partitioned away alone, the window should be a
fraction of a millisecond (the replication link latency) most of the time.

EDIT: to clarify:

1) Sentinels will tell clients master is A.

2) A fails.

3) Sentinels will still say the master is A until the failover.

4) Failover happens.

5) Sentinels will finally report that the master is B.

~~~
taf2
Could be something haproxy is causing when switching masters... maybe I need a
script that switches the VIP when Sentinel detects a failure in the master,
instead of using haproxy with a backup... BTW, thank you for Redis.

~~~
seiji
You shouldn't need haproxy in the loop at all.

If your redis client is Sentinel-aware, you ask your client directly "give me
the redis master server" and it asks Sentinel which server is master. The only
ip address involved for your configuration is the address of the sentinel(s).
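
For instance, with redis-py's Sentinel support (the service name "mymaster"
and the addresses are illustrative):

    from redis.sentinel import Sentinel

    # Only the sentinels' addresses are configured; they tell the client
    # who the current master is, even right after a failover.
    sentinel = Sentinel([("10.0.0.1", 26379), ("10.0.0.2", 26379)],
                        socket_timeout=0.5)

    master = sentinel.master_for("mymaster", socket_timeout=0.5)
    replica = sentinel.slave_for("mymaster", socket_timeout=0.5)

    master.set("key", "value")   # writes go to the current master
    replica.get("key")           # reads can be served by a slave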

haproxy will take 30 seconds to detect your failure. Sentinel will notify you
immediately. Sentinel is the one determining the exact state of the master
election, and it'll be very chatty when a new master replaces the old one.

------
dberg
Seems like Redis Cluster is about turning Redis into Riak.

~~~
antirez
From a distributed systems point of view, Redis and Riak are extremely
different beasts. Both are useful in my opinion, but the semantics are very
different, since Riak is basically an implementation of the Amazon Dynamo
paper. Dynamo-alike stores like Cassandra and Riak are key -> small_value
stores that stay available even under severe partitions, and where no single
node is responsible for a given key. Application-assisted merge operations are
used to merge values later, when the network partition heals.

Redis instead has a master - replicas setup that usually can't resist severe
partitions. In general only the side of the partition with the majority of
masters survives, _assuming_ there is at least one slave for every master that
is not in the majority partition.

In practical terms this means that a few nodes going away usually results in
the cluster still being available (unless you are really unlucky and all the
replicas for a given hash slot go down at the same time), but that under
severe partitions the system is likely unavailable.

The tradeoff makes Redis Cluster able to work without help from the
application, and without merge operations: this is very desirable given the
complex-to-merge and big (millions of elements are common) data structures
that a _single_ key can hold in Redis.

