
Patroni: A Template for PostgreSQL HA with ZooKeeper, Etcd, or Consul - vishesh92
https://github.com/zalando/patroni
======
ZalandoTech
Zalando open source evangelist here. Some of my colleagues are on this thread,
taking your questions. Meanwhile, wanted to plug a few other Postgres/K8s
projects we're working on:

\-- Spilo: HA PostgreSQL cluster using Docker
([https://github.com/zalando/spilo](https://github.com/zalando/spilo))

\-- kube-ingress-aws-controller: Configures AWS Load Balancers according to
Kubernetes Ingress resources ([https://github.com/zalando-incubator/kube-
ingress-aws-contro...](https://github.com/zalando-incubator/kube-ingress-aws-
controller))

\-- external-dns: a Kubernetes Incubator collaboration. Configure external DNS
servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and
Services. ([https://github.com/kubernetes-incubator/external-
dns](https://github.com/kubernetes-incubator/external-dns))

We also have several older Postgres-related tools listed here:
[http://zalando.github.io/](http://zalando.github.io/).

Users and contributors very welcome; just drop a line in our projects' Issues
Trackers.

~~~
jchw
Looking forward to external-dns being merged. DNS records are probably the
last thing in our Kubernetes cluster that isn't automated.

I'm still not sold on using ALBs for Ingress resources. ALB is clearly
superior to ELB when used for the same things, but it seems like costs will
soar when using ALBs as Ingress controllers, since AWS increased the rule
limit and subsequently started charging for rules. Still, the alternative of
having four layers for every route seems nearly as undesirable. I really
don't want ALB -> nginx -> kube-proxy -> container.

~~~
hjacobs
This is a bit off-topic, but the mentioned Ingress controller
([https://github.com/zalando-incubator/kube-ingress-aws-
contro...](https://github.com/zalando-incubator/kube-ingress-aws-controller))
does not use ALB rules; it just provisions ALBs with the right SSL
certificate and points them at an HTTP proxy that does the actual host/path
routing (e.g. Skipper). See [http://kubernetes-on-aws.readthedocs.io/en/latest/admin-
guid...](http://kubernetes-on-aws.readthedocs.io/en/latest/admin-
guide/kubernetes-in-production.html#ingress)

------
Dowwie
Could any contributors to Patroni opine on what impact PostgreSQL 10's
logical replication will have? Are you excited about it?

~~~
hintbits
While logical replication is a long-awaited and exciting feature, I'd use it
to replicate only a subset of my database, not as a replacement for HA. At
present, a logical replica cannot follow the master through a physical
failover and promotion (basically, you have to initialize your logical
replicas from scratch), and it has a bigger lag than physical replication,
especially for long-running transactions, as one has to wait until the commit
before sending the data.

Nevertheless, I think Patroni can help with configuring logical replicas that
will never be part of the HA setup (if one can tolerate reinitializing
logical replicas after failing over). Pull requests are welcome!

------
eikenberry
I've been burned too many times by 3rd-party HA solutions for Postgres. I'm
not touching it again until there is either an official solution or it is
someone else's problem (e.g. RDS).

~~~
hjacobs
We also use RDS at Zalando (and it works great!). But running Patroni on
Kubernetes gives us several benefits over RDS which, in our opinion, warrant
the effort. Some examples include:

* Real superuser access to the cluster

* Installing custom PostgreSQL extensions not available on RDS, such as PgQ (queues in PostgreSQL), PgDecode (logical decoding to Kafka/JSON), and PgJobs (minimalistic jobs in PostgreSQL)

* Easier migrations via S3/streaming replication from and to AWS; RDS currently offers no way out of RDS to another PostgreSQL cluster

* The deployment model allows for more availability/flexibility at lower cost: EBS + S3 let us run single-node PostgreSQL with high uptime

* Multi node PostgreSQL can run on different node types

* OAuth login for PostgreSQL (via PAM)

------
FooBarWidget
I wonder how much things like this are needed going forward. From my
(rudimentary) knowledge of Kubernetes and stateful sets, I think that
Kubernetes is able to solve a lot of the issues surrounding failover, recovery
of the old master, and partitioning, provided that we use Kubernetes in
combination with networked storage that guarantees reliability.

It appears that the way to set up PostgreSQL (or any replicated database) with
Kubernetes is to treat index 0 in the stateful set as the master and
everything else as slaves. Each pod is connected to some networked storage.
If the master goes down, Kubernetes detects that through the health check and
simply reschedules the master on another node. The storage volume that the
old master was using is simply reattached to this other node. The master
going down does not imply storage failure; storage reliability then becomes a
separate problem which is solved in another system (e.g. the RAID system or
whatever).

In this kind of setup, there is no failover support in the database itself.
From the point of view of the database itself, it looks as if the underlying
hardware/OS "automatically" recovered from failure. This way we don't have to
mess with promoting slaves and stuff.

Kubernetes already assigns static IPs to services, so pgpool and similar tools
-- insofar as they are only used to provide a stable network address for
PostgreSQL -- become redundant. Network partitioning is "solved" by treating
the Kubernetes state as the single authoritative description of the network
state.

What do people think about this? Obviously this setup won't work if you don't
have networked storage that can be reattached to another node, but I'm
thinking that maybe reattachable networked storage _should_ be the future.
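
The "index 0 is the master" idea above could be sketched roughly as follows. Everything here is illustrative (names, image, and the storage class are made up), and a real setup would still need replication configuration that the StatefulSet alone does not provide:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3                  # postgres-0 treated as master by convention
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:9.6
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        livenessProbe:         # Kubernetes reschedules the pod when this fails
          exec:
            command: ["pg_isready", "-U", "postgres"]
          initialDelaySeconds: 30
  volumeClaimTemplates:        # one networked volume per pod, reattached on reschedule
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: networked-ssd   # hypothetical reattachable storage class
      resources:
        requests:
          storage: 100Gi
```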

~~~
tokenizerrr
> providing that we use Kubernetes in combination with networked storage that
> guarantees reliability

What kind of networked storage do people like with Kubernetes? I've recently
set up a small cluster not in any cloud, and persistent cross-node storage is
a concern. There are quite a few options, such as GlusterFS, but I'd be
curious to know if anyone here knows about the tradeoffs.

~~~
takeda
If you're on AWS, then EFS is designed for exactly that. It is not cheap, but
setting up a reliable solution yourself is not easy either.

If your goal is file-based storage (not block-based), the application can
make API calls to get data, and the workload is mostly reads, then S3 might
also be an option.

Alternatively, roll your own on EC2 instances, which might not be as easy.

------
cies
This is very interesting. I thought it was best to keep your DB out of the
k8s cluster, but this seems to make Postgres "cloud native".

The main reason for keeping DBs out of k8s was that the storage (persistence)
solutions in k8s were not up to the task yet. Now I wonder: how is Patroni
doing persistence? I cannot find anything about it in the Patroni docs. Maybe
Patroni is so "self-healing" that a proper storage solution is not an issue --
dunno, just guessing here.

~~~
vishesh92
I am not really sure Patroni has anything to do with the persistence of data.
It just uses etcd, ZooKeeper, or Consul to elect a master in case of a
failover, and uses their key-value store to save information about the
current master. It's up to you whether you keep your database in a container
or not, and how the data is managed by the container. This talk by Josh
Berkus explains how Patroni works pretty well:
[https://www.youtube.com/watch?v=OH9WSEiMsAw](https://www.youtube.com/watch?v=OH9WSEiMsAw)

~~~
jan_berlin
The great thing today is that there are lots of options with Patroni; it is
very much up to you and your requirements what kind of storage you select and
how you want to recover from node and storage failure.

Patroni with Spilo, for example, relies on EBS or local disks and can also
ship WAL to S3. On Google Cloud it uses the equivalent services.

On Kubernetes, earlier posts are correct that you may rely on Kubernetes
bringing your pod back with the same underlying volume, thus providing some
kind of availability. But for others this is not good enough; remember, e.g.,
that EBS is single-AZ only. Patroni can help here, either by running and
orchestrating slaves or by giving you automated recovery from S3.
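
As an illustration, a minimal Patroni configuration picks a DCS and a data directory but says nothing about the storage underneath; that choice stays with you. A sketch (all names, addresses, and credentials below are made up):

```yaml
scope: demo-cluster
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.11:8008    # hypothetical node address

etcd:
  host: 10.0.0.5:2379                # hypothetical DCS endpoint

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.11:5432
  data_dir: /var/lib/postgresql/data # lives on whatever volume you chose
  authentication:
    replication:
      username: replicator
      password: secret
```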

------
616c
An interview with the development team. I thought the project name sounded
familiar!

[https://talkpython.fm/episodes/show/72/fashion-driven-
open-s...](https://talkpython.fm/episodes/show/72/fashion-driven-open-source-
software-at-zalando)

------
geoka9
What's zalando/patroni's approach to performance scaling/load balancing? For
example, pgpool2 can distribute SELECTs among multiple slaves, which makes it
useful for heavy analytics loads. Can you guys do that, or do you only handle
HA?

~~~
CyberDem0n
Patroni has a REST API providing health checks for load balancers.
patroni:8008/replica will respond with HTTP status code 200 only if the node
is running as a replica.

In the HAProxy config file you just need to list all Patroni nodes and
specify a health check, and it will do the load balancing for you.

The same approach works, for example, with AWS ELB.
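
A sketch of what that haproxy.cfg could look like for a replica pool (hostnames and ports are hypothetical; the Patroni repo ships a maintained confd template):

```
listen replicas
    bind *:5001
    option httpchk GET /replica
    http-check expect status 200
    server node1 node1.example.com:5432 check port 8008
    server node2 node2.example.com:5432 check port 8008
    server node3 node3.example.com:5432 check port 8008
```

HAProxy probes each node's Patroni REST API on port 8008 and only balances connections across servers whose /replica check returns 200.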

~~~
geoka9
But writes need to be directed to the master, right? Or is Patroni smart
enough to do that by itself?

~~~
CyberDem0n
Correct, writes need to be directed to the master. And maybe not only writes
but some reads as well: only the application can know which statements can be
executed on replicas and which must be executed on the master.

------
hjacobs
Here is a more recent talk about Patroni and Kubernetes (KubeCon Berlin 2017):
[https://www.youtube.com/watch?v=CftcVhFMGSY](https://www.youtube.com/watch?v=CftcVhFMGSY)

------
eicnix
How does this compare to other HA postgres products like stolon or Postgres-
XL?

~~~
vishesh92
As per my understanding, Postgres-XL is not comparable to Patroni; Postgres-XL
is comparable to PostgreSQL itself. Postgres-XL shards the data across
multiple data nodes, whereas Patroni uses etcd, Consul, or ZooKeeper to
provide HA for any Postgres cluster using replication (data is written only
to a single instance and replicated from there).

I think you meant Stolon. Stolon is quite comparable to Patroni and has more
features, but I am not sure how many people are using it. I found more
resources for Patroni as compared to Stolon.

~~~
CyberDem0n
> Stolon is quite comparable to Patroni and has more features than Patroni.

Let me ask a question: what features does Stolon have that Patroni doesn't?
Can you give an example?

From my side, I can provide a list of features available in Patroni but not
in Stolon. For example:

* there is no way to do a controlled failover (switchover) in Stolon.

* in Patroni it's possible to exclude some nodes from a leader race.

* Patroni supports cascading replication

* Patroni can take basebackup from replicas if they are marked with a special tag

* it's even possible to configure Patroni to use a custom backup/recovery solution instead of pg_basebackup

* Patroni can give you a hint that postgres must be restarted to apply some configuration changes

* with Patroni it's even possible to schedule a switchover or a restart of postgres at a specific time (for example 04:00 am, when traffic is minimal)

* with Patroni you can control the High-Availability / Durability trade-off. I.e.: what to do if the master postgres has crashed? Either you try to start it back up or you fail over to a replica. But starting a crashed postgres may take a long time. With Patroni you can configure it so that if postgres hasn't started within some number of seconds, it does a failover.
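
A couple of the points above map to Patroni's per-node tags. A sketch of the relevant patroni.yml fragment (values are illustrative):

```yaml
tags:
  nofailover: true       # exclude this node from the leader race
  noloadbalance: false   # still eligible for read load balancing
  clonefrom: true        # prefer this replica as a source for basebackups
```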

Stolon provides cloud-native deployment and Patroni doesn't? This is not
really true. The main idea of Patroni is to not depend on any specific
technology: it works on bare metal with the same success as on Kubernetes.
It's very easy to build a custom solution with Patroni for your use case. For
example, there is the Spilo project:
[https://github.com/zalando/spilo](https://github.com/zalando/spilo), a Docker
image built with Patroni and WAL-E for cloud deployments.

~~~
vishesh92
Thanks for this. As I mentioned in my previous comment, I couldn't find much
about Stolon. One of the things I like in Stolon is the proxy: _it enforces
connections to the right PostgreSQL master and forcibly closes connections to
unelected masters_. That is not present in Patroni, but I guess it should be
doable using Consul's service discovery and dynamic DNS (I am not sure if
this can be done using etcd or ZooKeeper).

~~~
CyberDem0n
Doesn't HAProxy provide such functionality? Patroni has a REST API which can
be used by HAProxy for health checks.

patroni:8008/master will return HTTP status code 200 only if the node is
running as the elected master.

patroni:8008/replica will return HTTP status code 200 only if the node is
running as a replica.

And the final missing bit, automating the generation of haproxy.cfg, can be
done with confd:
[https://github.com/kelseyhightower/confd](https://github.com/kelseyhightower/confd)

And here is an example of a template file:
[https://github.com/zalando/patroni/blob/master/extras/confd/...](https://github.com/zalando/patroni/blob/master/extras/confd/templates/haproxy.tmpl)

~~~
vishesh92
I am not sure: will HAProxy forcibly close connections to the old master in
case of a failover?

~~~
CyberDem0n
on-marked-down shutdown-sessions

should do the trick
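
For context, on-marked-down shutdown-sessions is a per-server (or default-server) option in haproxy.cfg: when a health check marks a server down, HAProxy kills the established sessions to it instead of letting them linger. A hypothetical master pool using it might look like:

```
listen master
    bind *:5000
    option httpchk GET /master
    http-check expect status 200
    default-server on-marked-down shutdown-sessions
    server node1 node1.example.com:5432 check port 8008
    server node2 node2.example.com:5432 check port 8008
```

So once Patroni's /master endpoint stops returning 200 on the demoted node, clients still connected through HAProxy are cut off rather than continuing to write to an unelected master.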

