
Setting Up a RethinkDB Cluster on Docker Swarm Mode - nwrk
https://github.com/stefanprodan/aspnetcore-dockerswarm/wiki/RethinkDB-Swarm
======
jondubois
I'm about to start doing the exact same thing for RethinkDB using Kubernetes
instead of Swarm so this article is very useful to me.

Has the author considered this scenario:

If the host which is running one of your RethinkDB instances dies, will Swarm
reschedule it on a different host? What will happen to the data which was
sitting on the now-dead host?

Is there a way to force stateful containers to stick to specific hosts (and
wait for it to come back up) instead of moving them around in Swarm?

Or... Does RethinkDB offer a setting to allow you to automatically remove a
RethinkDB instance from the cluster when it dies (and elect a new master for
the shard)?

Last time I did some testing, I found that RethinkDB didn't let me write new
data while the master replica for a specific shard was down and I couldn't
find a way to tell RethinkDB to just forget the dead instance completely and
elect a new master (among the replicas). I understand there may be some small
data loss with this approach but it still seems reasonable in a lot of
scenarios. Is there a way to achieve this?

Is this achieved by having 2 rdb-primary and 2 rdb-secondary instances? I
can't remember what my setup was but I think I had 3 replicas. Maybe I had to
have the rdb-primary and rdb-secondary clusters join each other (maybe that's
what I was missing).

~~~
RaitoBezarius
> Last time I did some testing, I found that RethinkDB didn't let me write new
> data while the master replica for a specific shard was down and I couldn't
> find a way to tell RethinkDB to just forget the dead instance completely and
> elect a new master (among the replicas). I understand there may be some
> small data loss with this approach but it still seems reasonable in a lot of
> scenarios. Is there a way to achieve this?

According to the Automatic Failover docs [1], it should elect a new primary
among the replicas, provided you meet the requirements (three or more replicas
for the table, a non-transitive network failure, and so on).
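For the "forget the dead instance completely" case, there are two documented
escape hatches: deleting the dead server's row from the `server_config` system
table, and an emergency repair on the affected table. A rough sketch with the
official Python driver (host, server and table names here are illustrative,
and this assumes a reachable cluster):

```python
# Sketch: manually evicting a dead server, assuming the official
# Python driver and a live cluster. Names are illustrative.
import rethinkdb as r

conn = r.connect(host='localhost', port=28015)

# Permanently remove the dead server from the cluster so RethinkDB
# stops waiting for it to come back.
r.db('rethinkdb').table('server_config') \
    .filter({'name': 'rdb_primary_1'}) \
    .delete().run(conn)

# If a table has lost a majority of its replicas, an emergency repair
# forces the surviving replicas to elect a new primary, accepting the
# possible data loss you mention.
r.table('mytable') \
    .reconfigure(emergency_repair='unsafe_rollback') \
    .run(conn)
```

Both operations are deliberate "accept the loss and move on" actions, so they
are not something Swarm should trigger automatically on every reschedule.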

> Is this achieved by having 2 rdb-primary and 2 rdb-secondary instances? I
> can't remember what my setup was but I think I had 3 replicas. Maybe I had
> to have the rdb-primary and rdb-secondary clusters join each other (maybe
> that's what I was missing).

The "join each other" technique is more of an implementation detail than
anything else.

If the rdb-primary container fails, Swarm will spawn a new one. But since its
command is effectively "run a RethinkDB master", it won't rejoin on its own:
you have to make it join the current master (which, after failover, is one of
the replicas).

So we make the primary join the secondary, and the secondary was already
joining the primary. In the end, no matter which pool of containers we crash,
they can always rejoin the cluster.
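A rough sketch of that mutually-joining setup (the service and network names
are assumptions following the linked wiki's naming, and the exact flags may
differ from the article):

```shell
# Sketch: two RethinkDB Swarm services that each join the other, so a
# respawned container from either pool can always find the cluster.
# Names (rethinkdb-net, rdb-primary, rdb-secondary) are illustrative.
docker network create --driver overlay rethinkdb-net

docker service create --name rdb-primary --network rethinkdb-net \
  rethinkdb rethinkdb --bind all --join rdb-secondary:29015

docker service create --name rdb-secondary --network rethinkdb-net \
  rethinkdb rethinkdb --bind all --join rdb-primary:29015
```

RethinkDB keeps retrying an unreachable `--join` target, so the ordering of
the two services at startup shouldn't matter.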

A failure point remains if containers die faster than Docker Swarm can spawn
new ones, while RethinkDB is still trying to make sense of the clusterfuck
happening around it. (That would be a scenario worth exploring.)

[1]: https://rethinkdb.com/docs/failover/

~~~
williamstein
In case you're interested, here are the scripts I use to run RethinkDB on
Kubernetes for SageMathCloud:
https://github.com/sagemathinc/smc/tree/master/src/k8s/rethinkdb

