
Introducing pg_auto_failover - ibotty
https://www.citusdata.com/blog/2019/05/30/introducing-pg-auto-failover/
======
keypusher
Given that there is no fencing/stonith mechanism, I’m curious how this avoids
possible split brain scenarios. Let’s say you have one primary and two standbys
in sync replication, as recommended. The primary experiences a network
partition from the other two. Is one of the standbys auto-promoted, or not? If
so, there is now the possibility of two running primaries and a divergent
timeline. If not, then you don’t have auto-failover. What am I missing?

Edit: After looking at the architecture page, it seems that the monitor would
ask the failed node to kill itself? That is polite, but it wouldn’t work if the
monitor is partitioned with the standby nodes. Does this expect that the
primary will shut itself down if it cannot connect to the monitor? Such an
approach could be problematic if the keeper process is hung but pg itself is
still running. Would love clarification.
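
To make the scenario concrete, here is a toy model of the partition described above. It is only a sketch of the question being asked, not pg_auto_failover's actual logic: the `Node` class, the `after_partition` function, and the `self_demote_on_monitor_loss` flag are all hypothetical names, and the promotion and demotion rules are assumptions taken from the reasoning in this comment.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str                  # "primary" or "standby"
    sees_monitor: bool         # can this node's keeper reach the monitor?
    keeper_alive: bool = True  # is the keeper process itself responsive?


def after_partition(nodes, self_demote_on_monitor_loss):
    """Return the names of nodes acting as primary after the partition.

    Assumed rules, for illustration only:
      - if the monitor has lost contact with the primary, it promotes
        the first standby it can still reach;
      - an isolated primary steps down only if self-demotion is enabled
        AND its keeper is alive to carry it out.
    """
    acting_primaries = []
    primary_lost = any(
        n.role == "primary" and not n.sees_monitor for n in nodes
    )
    promoted = False
    for n in nodes:
        if n.role == "primary":
            steps_down = (
                not n.sees_monitor
                and self_demote_on_monitor_loss
                and n.keeper_alive
            )
            if not steps_down:
                acting_primaries.append(n.name)
        elif primary_lost and n.sees_monitor and not promoted:
            acting_primaries.append(n.name)
            promoted = True
    return acting_primaries


# One primary cut off from the monitor, two standbys still connected.
cluster = [
    Node("a", "primary", sees_monitor=False),
    Node("b", "standby", sees_monitor=True),
    Node("c", "standby", sees_monitor=True),
]
print(after_partition(cluster, self_demote_on_monitor_loss=False))
print(after_partition(cluster, self_demote_on_monitor_loss=True))
```

Under these assumed rules, the first call reports two acting primaries (the split brain case) and the second reports only the promoted standby. Setting `keeper_alive=False` on the primary brings the split brain back even with self-demotion enabled, which is the hung-keeper concern.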

------
btown
Is this available on Citus Cloud or any hosted Postgres as of today?

