Bravo to antirez and the Redis team.
The 30 seconds is 3 times the longest period we have in info collection (INFO itself is sampled every 10 seconds, while PING is sent every 1 second). So if there was a problem with the timer, within 30 seconds we are sure the new state has fresh readings for every kind of request and information we collect. When TILT mode is exited and the function that evaluates the state is called again, it should see clean values.
Note that from the point of view of Sentinel it is OK if the new time is wrong compared to the real time, because we never use the absolute time. All we need is a computer clock that advances more or less regularly.
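To make the timing argument concrete, here is a minimal Python sketch of a TILT-style check. It is not Sentinel's actual code: the class name `TiltDetector`, the method `tick`, and the 2-second trigger threshold are assumptions for illustration. Only the 30-second period comes from the explanation above. Note that only differences between timestamps are used, never the absolute time.

```python
from dataclasses import dataclass

TICK_TRIGGER_MS = 2_000   # assumed: largest plausible gap between two timer calls
TILT_PERIOD_MS = 30_000   # 3 x the 10 s INFO period, as described above


@dataclass
class TiltDetector:
    """Tracks whether the monitor should distrust its own timing (hypothetical)."""
    previous_ms: int
    tilt: bool = False
    tilt_start_ms: int = 0

    def tick(self, now_ms: int) -> bool:
        """Call on every timer tick; returns True if it is safe to act."""
        delta = now_ms - self.previous_ms
        self.previous_ms = now_ms
        if delta < 0 or delta > TICK_TRIGGER_MS:
            # The clock jumped or the process stalled: enter (or restart) TILT.
            self.tilt = True
            self.tilt_start_ms = now_ms
            return False
        if self.tilt and now_ms - self.tilt_start_ms > TILT_PERIOD_MS:
            # More than 30 s of regular ticks: every reading has been refreshed.
            self.tilt = False
        return not self.tilt
```

While `tick` returns False the caller would keep collecting PING and INFO replies but would not act on them.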
I would love to see a generic tool for handling the clustering/failover problem.
1) The ability to use the master as a message bus to auto-discover things. This is possible because every Redis instance is also a Pub/Sub server.
2) The idea that every Redis instance has a "runid" that changes after every restart.
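As a rough illustration of point 2, the Python sketch below spots a restart by comparing the run ID reported in successive INFO replies. The `run_id:` field is what Redis prints in the server section of INFO; the helper names `parse_run_id` and `RunIdTracker`, and the sample addresses, are made up for this example.

```python
from typing import Dict, Optional


def parse_run_id(info_text: str) -> Optional[str]:
    """Extract the run_id field from the text of an INFO reply, if present."""
    for line in info_text.splitlines():
        if line.startswith("run_id:"):
            return line.split(":", 1)[1].strip()
    return None


class RunIdTracker:
    """Remembers the last run ID seen for each instance address (hypothetical)."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}

    def observe(self, addr: str, run_id: str) -> bool:
        """Record run_id for addr; return True if the instance has restarted."""
        previous = self._seen.get(addr)
        self._seen[addr] = run_id
        return previous is not None and previous != run_id
```

A monitor would call `observe` after each INFO sample and treat a True result as "same address, but a new process".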
And in general the logic of the failover itself, and the fact that the failure detection is precise (some specific reply codes are treated one way, others another way), make a non-specific solution much harder to implement: the "methods" that perform the service-specific tasks may end up being complex, or may sometimes force a complete change to the logic of the system (for example, when there is no Pub/Sub).
"In which monitoring agents rely upon correct, truthful, answers about cluster state from the system they're monitoring"
"@cscotta Hi, you misunderstood how it works: Pub/Sub is only used for discovery on startup. Sentinel-to-sentinel p2p for critical stuff."
Pub/Sub is used to make configuration simpler when you start a Sentinel cluster cold, at a time when everything is working and your master is OK.
This allows us to auto-discover the other Sentinels, to check the slaves, and so forth.
Instead, to understand if a system is down, to decide which Sentinel performs the failover, and for all the critical stuff, Sentinel-to-Sentinel messages are used, regardless of whether the master's Pub/Sub works.
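The split between the two paths can be sketched in Python roughly as follows. This is a toy model, not Sentinel's protocol: `SentinelSketch`, `on_hello`, `master_is_down`, and the `quorum` parameter are hypothetical names, and a peer is modelled as a plain callable that answers "do you also see the master as down?".

```python
from typing import Callable, Dict

# A peer is modelled as a callable answering "do you see the master as down?"
AskPeer = Callable[[], bool]


class SentinelSketch:
    """Toy model: Pub/Sub only for discovery, direct calls for decisions."""

    def __init__(self, name: str, quorum: int) -> None:
        self.name = name
        self.quorum = quorum          # assumed: votes needed to agree on "down"
        self.peers: Dict[str, AskPeer] = {}

    def on_hello(self, peer_name: str, ask: AskPeer) -> None:
        """Discovery path: called for each hello seen on the master's Pub/Sub."""
        if peer_name != self.name:
            self.peers[peer_name] = ask

    def master_is_down(self, my_view_down: bool) -> bool:
        """Critical path: ask known peers directly; Pub/Sub is not involved."""
        if not my_view_down:
            return False
        votes = 1 + sum(1 for ask in self.peers.values() if ask())
        return votes >= self.quorum
```

Once the peers are known, `master_is_down` keeps working even if the master, and with it the Pub/Sub channel, is unreachable.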
Surely if a bug in a client can crash your server that's a bug in the server by definition even if the client is also buggy?