

Automatic Redis Failover for Ruby - ryanlecompte
https://github.com/ryanlecompte/redis_failover
I just released redis_failover, a new Ruby gem that provides automatic Redis failover support for Ruby. From the project README:

Redis Failover attempts to provide a full automatic master/slave failover solution for Ruby. Redis does not provide an automatic failover capability when configured for master/slave replication. When the master node dies, a new master must be manually brought online and assigned as the slave's new master. This manual switch-over is not desirable in high traffic sites where Redis is a critical part of the overall architecture. The existing standard Redis client for Ruby also only supports configuration for a single Redis server. When using master/slave replication, it is desirable to have all writes go to the master, and all reads go to one of the N configured slaves.

This gem attempts to address both the server and client problems. A redis failover server runs as a background daemon and monitors all of your configured master/slave nodes. When the server starts up, it automatically discovers which node is the master and which are the slaves. Watchers are set up for each of the redis nodes. As soon as a node is detected as being offline, it will be moved to an "unreachable" state. If the node that went offline was the master, then one of the slaves will be promoted as the new master. All existing slaves will be automatically reconfigured to point to the new master for replication. All nodes marked as unreachable will be periodically checked to see if they have been brought back online. If so, the newly reachable nodes will be configured as slaves and brought back into the list of live servers. Note that detection of a node going down should be nearly instantaneous, since the mechanism used to keep tabs on a node is a blocking Redis BLPOP call (no polling). This call fails nearly immediately when the node actually goes offline.

This gem provides a RedisFailover::Client wrapper that is master/slave aware. The client is configured with a single host/port pair that points to the redis failover server. The client will automatically connect to the server to find out the current state of the world (i.e., who's the current master and who are the current slaves). The client also acts as a load balancer in that it will automatically dispatch Redis read operations to one of N slaves, and Redis write operations to the master. If it fails to communicate with any node, it will go back and ask the server for the current list of available servers, and then optionally retry the operation.
======
antirez
In case you wonder, after Redis 2.6 RC1 this is my #1 commitment: to provide a
standard Redis failover tool. Note that Redis Cluster, which will also be one
of the big focuses after 2.6, is not the real fix for this: many users just have
two instances, one master and a slave for failover, or multiple instances that
are coupled this way and are conceptually single servers. Often they don't
actually need Redis Cluster, or even _can not use_ Redis Cluster (because it
does not implement the full Redis API, but a subset). So what happens? All these
users have to invent an HA system for Redis again and again.

It's still a work in progress, but the idea is that the standard Redis failover
will be based on a stand-alone daemon called redis-sentinel that you can place
at different positions in your network. It talks with the other
redis-sentinels, and if the right conditions are met the failover is
performed. So there is no proxy or the like, and the server itself will not be
touched.

------
ryanlecompte
FYI, redis_failover has now been rewritten to sit on top of ZooKeeper to deal
with network partitions, stability, and data consistency. From the README:

redis_failover attempts to provide a full automatic master/slave failover
solution for Ruby. Redis does not provide an automatic failover capability
when configured for master/slave replication. When the master node dies, a new
master must be manually brought online and assigned as the slave's new master.
This manual switch-over is not desirable in high traffic sites where Redis is
a critical part of the overall architecture. The existing standard Redis
client for Ruby also only supports configuration for a single Redis server.
When using master/slave replication, it is desirable to have all writes go to
the master, and all reads go to one of the N configured slaves.

This gem attempts to address these failover scenarios. A redis failover Node
Manager daemon runs as a background process and monitors all of your
configured master/slave nodes. When the daemon starts up, it automatically
discovers the current master/slaves. Background watchers are set up for each of
the redis nodes. As soon as a node is detected as being offline, it will be
moved to an "unavailable" state. If the node that went offline was the master,
then one of the slaves will be promoted as the new master. All existing slaves
will be automatically reconfigured to point to the new master for replication.
All nodes marked as unavailable will be periodically checked to see if they
have been brought back online. If so, the newly available nodes will be
configured as slaves and brought back into the list of available nodes. Note
that detection of a node going down should be nearly instantaneous, since the
mechanism used to keep tabs on a node is via a blocking Redis BLPOP call (no
polling). This call fails nearly immediately when the node actually goes
offline. To avoid false positives (e.g., intermittent flaky network
interruption), the Node Manager will only mark a node as unavailable if it
fails to communicate with it 3 times (this is configurable via --max-failures,
see configuration options below).
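
The failure threshold described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: `NodeTracker` and its methods are hypothetical names, not the gem's API, and the choices to reset the counter on a successful check and to promote the first remaining slave are assumptions that the README text does not confirm.

```ruby
# Illustrative sketch only: NodeTracker and its methods are hypothetical
# names, not part of the redis_failover gem.
class NodeTracker
  attr_reader :master, :slaves, :unavailable

  def initialize(master:, slaves:, max_failures: 3)
    @master = master
    @slaves = slaves.dup
    @unavailable = []
    @max_failures = max_failures
    @failures = Hash.new(0) # node => failed checks since the last success
  end

  # Called whenever a health check against `node` fails.
  def record_failure(node)
    @failures[node] += 1
    mark_unavailable(node) if @failures[node] >= @max_failures
  end

  # Called whenever a health check succeeds. Resetting the counter here is
  # an assumption; the README does not say whether failures must be
  # consecutive.
  def record_success(node)
    @failures[node] = 0
    return unless @unavailable.delete(node)

    @slaves << node            # recovered nodes rejoin as slaves
    @master ||= @slaves.shift  # assumed: promote if no master is left
  end

  private

  def mark_unavailable(node)
    return if @unavailable.include?(node)

    @unavailable << node
    if node == @master
      @master = @slaves.shift  # promote one of the remaining slaves
    else
      @slaves.delete(node)
    end
  end
end

tracker = NodeTracker.new(master: "redis1:6379",
                          slaves: ["redis2:6379", "redis3:6379"])
3.times { tracker.record_failure("redis1:6379") }
puts "master: #{tracker.master}, unavailable: #{tracker.unavailable.inspect}"
```

A single failed check changes nothing here; only the third failure moves the old master to the unavailable list and promotes a slave.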

This gem provides a RedisFailover::Client wrapper that is master/slave aware.
The client is configured with a list of ZooKeeper servers. The client will
automatically contact the ZooKeeper cluster to find out the current state of
the world (i.e., who is the current master and who are the current slaves).
The client also sets up a ZooKeeper watcher for the set of redis nodes
controlled by the Node Manager daemon. When the daemon promotes a new master
or detects a node as going down, ZooKeeper will notify the client near-
instantaneously so that it can rebuild its set of Redis connections. The
client also acts as a load balancer in that it will automatically dispatch
Redis read operations to one of N slaves, and Redis write operations to the
master. If it fails to communicate with any node, it will go back and fetch
the current list of available servers, and then optionally retry the
operation.
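
To make the read/write dispatch concrete, here is a minimal sketch that runs without a Redis server. `RoutingClient`, `FakeConnection`, and the `READ_COMMANDS` list are hypothetical and deliberately incomplete; picking a random slave and falling back to the master when no slave is available are assumptions, not documented behavior of RedisFailover::Client.

```ruby
require "set"

# Stand-in for a real Redis connection; it records what it was asked to run.
FakeConnection = Struct.new(:name, :log) do
  def call(command, *args)
    log << [command, *args]
    name
  end
end

# Hypothetical router, not the gem's actual client class.
class RoutingClient
  # Partial, illustrative list of read-only commands.
  READ_COMMANDS = Set[:get, :mget, :exists, :llen, :smembers, :hgetall].freeze

  def initialize(master:, slaves:)
    @master = master
    @slaves = slaves
  end

  # Send the command to whichever node should handle it.
  def dispatch(command, *args)
    target_for(command).call(command, *args)
  end

  private

  # Reads go to a randomly chosen slave, everything else to the master.
  # Falling back to the master when there are no slaves is an assumption.
  def target_for(command)
    if READ_COMMANDS.include?(command) && !@slaves.empty?
      @slaves.sample
    else
      @master
    end
  end
end

master = FakeConnection.new("master", [])
slave  = FakeConnection.new("slave-1", [])
client = RoutingClient.new(master: master, slaves: [slave])
puts client.dispatch(:set, "greeting", "hello")
puts client.dispatch(:get, "greeting")
```

Keeping the routing decision in one private method means the set of connections can be rebuilt after a ZooKeeper notification without touching the dispatch logic.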

------
clofresh
So just to clarify, the client code still connects directly to the actual
redis instance, it doesn't connect to the failover daemon as a proxy? It would
be useful to describe the connection algorithm in the README.

Also, is the split of reads and writes done by opening connections to both the
master and the slave?

~~~
ryanlecompte
That's right. The client still maintains direct connections with the actual
master/slaves. It's only when it fails to connect with one of them that it
goes to the failover daemon to ask for the current set of available nodes. The
split of the reads/writes is handled by the client, as it knows where to
dispatch commands (to master for writes, and to one of the slaves for reads).
I'll make this clearer in the README.
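
A rough sketch of the connection behavior described above, for illustration only: `RetryingClient`, `NodeUnreachable`, and the `fetch_nodes` callable are hypothetical stand-ins for the gem's client, its connection error, and the call to the failover daemon.

```ruby
# Hypothetical error raised by a connection that cannot reach its node.
class NodeUnreachable < StandardError; end

# Hypothetical client: talks to the master directly and only asks the
# daemon for a fresh node list after a failure.
class RetryingClient
  # fetch_nodes: callable returning { master: conn, slaves: [conn, ...] }
  def initialize(fetch_nodes:, max_retries: 1)
    @fetch_nodes = fetch_nodes
    @max_retries = max_retries
    @nodes = @fetch_nodes.call
  end

  def write(command, *args)
    attempts = 0
    begin
      @nodes[:master].call(command, *args)
    rescue NodeUnreachable
      raise if attempts >= @max_retries # give up, re-raising the error

      attempts += 1
      @nodes = @fetch_nodes.call # ask the daemon for the current view
      retry
    end
  end
end

dead_master = ->(*_args) { raise NodeUnreachable, "connection refused" }
new_master  = ->(command, *args) { "OK #{command} #{args.join(' ')}" }
views = [{ master: dead_master, slaves: [] }, { master: new_master, slaves: [] }]
client = RetryingClient.new(
  fetch_nodes: -> { views.size > 1 ? views.shift : views.first }
)
puts client.write(:set, "greeting", "hello")
```

In this sketch the daemon is never on the request path: it is consulted once at startup and again only after a direct connection fails.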

------
cheald
Awesome. I was implementing my own version of this with doozer and
eventmachine, but I might just use this one instead!

------
DanWaterworth
What happens in the case of a network partition?

~~~
antirez
Can't speak for the solution posted in this article, but I think this is
one of the main design concerns. For redis-sentinel (I described it in another
comment in this thread) the trick is that you place the sentinels where you
want and select a minimum number of agreeing sentinels for failover, so what
happens depends on where you place the sentinels and the minimum agreement you
configure. It's easy to get the desired behavior this way.

~~~
salimane
something to keep in mind is that sometimes the redis server can't accept
connections anymore because of limits etc., but the server is still serving
old connections. so in that case, i think you don't want to just failover...
the tricky part is to know if the server is really down

~~~
antirez
There is no sane condition in Redis that will make it not reply at all,
AFAIK: even if you set maxclients to 1, the next clients will get an error
returned (and the connection closed ASAP). But yes, it is important to
understand what down means. I think one of the safest things to do is "down ==
unreachable". So if you don't get any reply at all for the whole configured
amount of time, the server is down. And of course the other
redis-sentinels have to agree for the failover to start.
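
As an illustration of the "down == unreachable, plus agreement" rule, here is a small sketch. It is a hypothetical model, not redis-sentinel's design or API: `SentinelView`, `down_after`, and `failover_allowed?` are made-up names, and timestamps are plain numbers of seconds so the example stays deterministic.

```ruby
# One sentinel's view of a single Redis server (hypothetical model).
class SentinelView
  def initialize(down_after:, last_reply_at:)
    @down_after = down_after       # seconds of silence before "down"
    @last_reply_at = last_reply_at # time of the most recent reply of any kind
  end

  # Any reply at all, even an error, counts as the server being reachable.
  def record_reply(at)
    @last_reply_at = at
  end

  def considers_down?(now)
    (now - @last_reply_at) >= @down_after
  end
end

# Failover may start only when enough sentinels agree the server is down.
def failover_allowed?(views, now:, min_agreement:)
  views.count { |view| view.considers_down?(now) } >= min_agreement
end

views = Array.new(3) { SentinelView.new(down_after: 30, last_reply_at: 0) }
views.last.record_reply(90) # the third sentinel still hears from the server
puts failover_allowed?(views, now: 100, min_agreement: 2)
puts failover_allowed?(views, now: 100, min_agreement: 3)
```

With a minimum agreement of 2, the two sentinels that have heard nothing for 100 seconds are enough; with 3, the one sentinel that still gets replies blocks the failover.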

~~~
ryanlecompte
The gem has a configurable --max-failures option that can be passed to the
failover daemon. The daemon will only mark a node as being unreachable if it
fails to ping it that many times (default 3). This might be something that
can be improved too, but it was meant to avoid false positives.

