I've been using Zalando's Patroni-based Postgres operator in k8s at scale for years (mainly OCP, but plain k8s as well).
Features like in-place major version upgrades are unmatched by any of the alternatives I checked.
CNPG (CloudNativePG) comes close: it's second best, and within a year it might take the crown.
(For companies, the best part is that CNPG has enterprise support available, under the name pg4k, a fork of CNPG.)
But, above all, I would warmly recommend that anyone first do their best to use CockroachDB (or YugabyteDB if you prefer) instead.
The benefits of a distributed, horizontally scaled DB usually outweigh the effort of moving to it (which should not be big, as it uses the same pg client/protocol). And it's free if you don't need enterprise features like partitions, etc.
Spilo has, in my experience, been poorly maintained for a while. There has been some slow progress, but it doesn’t seem to be anyone’s priority right now (happy to be corrected).
I was running my own Spilo builds for a while, which was hit-and-miss. For my new (Kubernetes bare metal) cluster deployment I’ve moved over to StackGres. I also evaluated CNPG (promising, but still early-ish days), as well as one other IIRC.
I found StackGres to work most reliably. And it solves the biggest pain with Spilo, which is building an image with the required PG extensions. StackGres instead has its own repository of extensions that it can install from, which is a huge help.
False-positive CVE listings are an issue elsewhere as well, and it's important to understand that there's not much a Docker Postgres image maintainer can do if the problem lies in a Debian or Ubuntu package and isn't getting fixed for some reason.
Very cool. How does failover work? I see that Spilo expects a standard load balancer in front of the cluster, but that load balancer won’t be able to track who the writer instance is on its own.
You can put HAProxy in front with a write frontend and a read frontend, each with its own backend, and list all servers in both backends. To determine whether a server is the writer or a standby, you can give the backends an `external-check command`. That command can be a bash script that connects to the server and executes `SELECT pg_is_in_recovery();`, which returns false on the primary and true on a standby.
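For what it's worth, here is a rough sketch of such a check, written in Python rather than bash. Treat it as an illustration under a few assumptions, not a drop-in script: the backend names `pg_write` and `pg_read`, the `haproxy_check` database user, and passwordless auth (for example via `.pgpass`) are all made up for the example. As far as I understand the HAProxy docs, the check command is called with the proxy address, proxy port, server address and server port as arguments, gets the backend name in `HAPROXY_PROXY_NAME`, and marks the server as up when the command exits with 0.

```python
#!/usr/bin/env python3
"""Sketch of an HAProxy external-check script for Postgres (assumptions noted inline)."""
import os
import subprocess
import sys

CHECK_SQL = "SELECT pg_is_in_recovery();"


def role_from_psql_output(output: str) -> str:
    """Map psql's unaligned boolean output ('t' or 'f') to a role name."""
    value = output.strip()
    if value == "f":
        return "primary"  # not in recovery, so this node accepts writes
    if value == "t":
        return "standby"  # in recovery, so this node is a read-only replica
    raise ValueError(f"unexpected psql output: {value!r}")


def wanted_role_for_backend(backend_name: str) -> str:
    """Assumption: the write backend has 'write' in its name, e.g. 'pg_write'."""
    return "primary" if "write" in backend_name else "standby"


def check_exit_code(actual_role: str, wanted_role: str) -> int:
    """Exit code 0 tells HAProxy the server is healthy for this backend."""
    return 0 if actual_role == wanted_role else 1


def query_role(host: str, port: str) -> str:
    """Ask the server for its role via psql (hypothetical user 'haproxy_check')."""
    result = subprocess.run(
        ["psql", "-h", host, "-p", port, "-U", "haproxy_check", "-d", "postgres",
         "-w", "-At", "-c", CHECK_SQL],
        capture_output=True, text=True, timeout=3, check=True,
    )
    return role_from_psql_output(result.stdout)


def main(argv) -> int:
    # Arguments: <proxy_address> <proxy_port> <server_address> <server_port>
    host, port = argv[3], argv[4]
    wanted = wanted_role_for_backend(os.environ.get("HAPROXY_PROXY_NAME", ""))
    try:
        return check_exit_code(query_role(host, port), wanted)
    except (subprocess.SubprocessError, OSError, ValueError):
        return 1  # any failure (timeout, refused connection, odd output) means "down"


# Only act when called with the four arguments the check is expected to receive.
if __name__ == "__main__" and len(sys.argv) >= 5:
    sys.exit(main(sys.argv))
```

Both backends would point `external-check command` at the same script (with `option external-check` enabled on them), so after a failover the old primary drops out of the write backend and the promoted standby takes its place once the next checks run.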