In Redis Cluster there is no data communication between nodes except for resharding, which only works when the whole cluster is up and is done by the system administrator when adding a node.
So in normal conditions, a node will either:
1) Accept a query, or
2) Tell the client: no, ask instead 18.104.22.168:6380
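To make the two outcomes above concrete, here is a minimal sketch of the node side. The `Node` class, the `slot_for` helper, the slot count, and the CRC32-based key mapping are all assumptions made for illustration; the comment does not say how keys are actually assigned to nodes.

```python
import zlib

NUM_SLOTS = 16  # hypothetical number of key subsets, chosen for illustration


def slot_for(key: str) -> int:
    """Map a key to a key subset (slot). The CRC32 hash is an assumption."""
    return zlib.crc32(key.encode()) % NUM_SLOTS


class Node:
    """Hypothetical node that either serves a key or redirects the client."""

    def __init__(self, address, owned_slots, slot_owners):
        self.address = address
        self.owned_slots = set(owned_slots)
        self.slot_owners = slot_owners  # slot -> address of the responsible node
        self.data = {}

    def handle_get(self, key):
        slot = slot_for(key)
        if slot in self.owned_slots:
            # Case 1: this node is responsible for the key, so accept the query.
            return ("OK", self.data.get(key))
        # Case 2: tell the client which node to ask instead.
        return ("ASK", self.slot_owners[slot])
```

Note that the node never forwards the query itself: it only answers or points elsewhere, which is why no data has to flow between nodes in normal conditions.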
All the nodes are connected only to make sure the state of the cluster is up. If there are too many nodes down from the point of view of a single node, it will reply to the client with a cluster error.
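A rough sketch of that check, seen from a single node. The `max_down` threshold and the `ClusterDownError` name are hypothetical; the comment does not say how many down nodes count as "too many".

```python
class ClusterDownError(Exception):
    """Hypothetical error returned to the client when the cluster looks down."""


def check_cluster(node_states, max_down):
    """Raise if this node sees more than max_down peers as unreachable.

    node_states maps a peer address to True (reachable) or False (down),
    as seen by the node running the check. max_down is an assumed threshold.
    """
    down = [address for address, is_up in node_states.items() if not is_up]
    if len(down) > max_down:
        raise ClusterDownError(f"{len(down)} nodes unreachable: {sorted(down)}")
```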
What I'm sacrificing is only consistency because at any given time there is only a single host that is getting the queries for a given subset of keys.
The exceptions are resharding, which is also fault-tolerant, and slave election (fault tolerance is obtained via replicas).
As a side note, the clients should cache which node is responsible for a given set of keys, so after some time, and when there are no failures or resharding in progress, every client will directly ask the right node, making the solution completely horizontally scalable.
Dumb clients that are unable to keep state will just always do the ask-random-node + retry step.
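The client side could be sketched like this. `CachingClient`, the `send` callback, the `remember` flag, and the CRC32 slot mapping are assumptions for illustration, not the actual client protocol. With `remember=False` it behaves like the stateless client: ask a random node, then retry wherever it is told to go.

```python
import random
import zlib

NUM_SLOTS = 16  # hypothetical number of key subsets


def slot_for(key: str) -> int:
    """Map a key to a key subset (slot). The CRC32 hash is an assumption."""
    return zlib.crc32(key.encode()) % NUM_SLOTS


class CachingClient:
    """Hypothetical client that remembers which node answered for each slot."""

    def __init__(self, send, addresses, remember=True, max_redirects=5):
        # send(address, key) must return ("OK", value) or ("ASK", other_address).
        self.send = send
        self.addresses = list(addresses)
        self.remember = remember
        self.max_redirects = max_redirects
        self.slot_cache = {}  # slot -> address of the node that served it

    def get(self, key):
        slot = slot_for(key)
        # Use the cached owner if known, otherwise start from a random node.
        address = self.slot_cache.get(slot) or random.choice(self.addresses)
        for _ in range(self.max_redirects):
            status, payload = self.send(address, key)
            if status == "OK":
                if self.remember:
                    self.slot_cache[slot] = address
                return payload
            address = payload  # redirected: retry against the suggested node
        raise RuntimeError("too many redirects")
```

Once the cache is warm, every request goes straight to the responsible node in a single round trip, which is where the horizontal scalability comes from.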
Edit: there are few fields like this one that are so totally in the hands of academia. My contribution is from the point of view of a dumb hacker who can't understand complex math but who will try to be much more pragmatic.
This sacrifices availability. Remember, the cluster doesn't include only the servers; it also includes the clients, since ultimately the point of a database server is to provide the data to the clients upon request.
What I did was to stress the tradeoffs that my data model itself was forcing, as Redis handles complex aggregate data and an eventually consistent solution sucks in this context.
So what Redis Cluster users will get is a fast, scalable, consistent solution that will start throwing errors if the network goes down badly, but that will survive if a few nodes go bad or if there are small network problems affecting a small number of nodes. If this sounds too little available, please explain this to me:
We have a network with 10 web servers and 10 DB nodes.
The netsplit will split 8 nodes from all the rest, so 10 web servers will be able to talk with 2 nodes.
I wonder how these two nodes will be able to handle all the traffic usually handled by 10 nodes. Netsplit tolerance is a myth if you don't specify very very very well under what conditions.
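The arithmetic behind that question, as a tiny sketch. It assumes traffic is spread evenly across nodes and that the reachable nodes could serve every key at all; `load_multiplier` is just an illustrative helper.

```python
def load_multiplier(total_nodes: int, reachable_nodes: int) -> float:
    """How many times its usual traffic each reachable node must absorb.

    Assumes load was uniform across total_nodes before the netsplit.
    """
    return total_nodes / reachable_nodes


# 10 DB nodes, a netsplit leaves 2 reachable from the web servers.
print(load_multiplier(10, 2))  # prints 5.0
```

So each of the two surviving nodes would have to take five times its normal load.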