Recommended Architectures for PostgreSQL in Kubernetes

wmf · on Sept 29, 2023

Related discussion about CNPG from last week: https://news.ycombinator.com/item?id=37616033

mslot · on Sept 29, 2023

One challenge with running PostgreSQL in production on Kubernetes is that it does not behave well under high memory pressure. PostgreSQL prefers to get NULL back from malloc so that it can abort transactions gracefully (vm.overcommit_memory = 2 on Linux). However, this model of tracking and handling memory pressure is still not available at the cgroup level. Instead, processes get OOM killed. In PostgreSQL, that triggers potentially lengthy crash recovery and therefore downtime. On the flip side, the cost of deploying a hot standby is often lower on Kubernetes, but failing over due to high memory pressure can also lead to pathological behaviour.

A good discussion of this issue by Joe Conway: https://www.crunchydata.com/blog/deep-postgresql-thoughts-th...

Hence, not completely the same as running on a VM.
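
To make the accounting concrete, here is a minimal Python sketch of how strict overcommit decides whether an allocation fails. It assumes the simplified kernel formula (CommitLimit = swap + (RAM - huge pages) * vm.overcommit_ratio / 100) and ignores vm.overcommit_kbytes and the admin/user reserves; the function names are made up for illustration. The point is that the limit is computed for the whole machine, not per cgroup, so a container that hits its own memory limit gets OOM killed instead of seeing malloc return NULL.

```python
def commit_limit_kib(mem_total_kib, swap_total_kib, overcommit_ratio=50, hugetlb_kib=0):
    """Approximate system-wide CommitLimit under vm.overcommit_memory = 2.

    Simplified: ignores vm.overcommit_kbytes and the kernel's reserves.
    50 is the default value of vm.overcommit_ratio.
    """
    return swap_total_kib + (mem_total_kib - hugetlb_kib) * overcommit_ratio // 100


def allocation_would_fail(committed_as_kib, request_kib, limit_kib):
    """True if the request would push Committed_AS past CommitLimit,
    i.e. the case where malloc returns NULL instead of overcommitting."""
    return committed_as_kib + request_kib > limit_kib


# Hypothetical node: 16 GiB of RAM, no swap, default ratio.
limit = commit_limit_kib(mem_total_kib=16 * 1024 * 1024, swap_total_kib=0)
small_request_fails = allocation_would_fail(8_000_000, 300_000, limit)
large_request_fails = allocation_would_fail(8_000_000, 500_000, limit)
```

On a real host the inputs correspond to MemTotal, SwapTotal and Committed_AS in /proc/meminfo.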

alexeldeib · on Sept 29, 2023

The article mentions avoiding overcommit and oom score adjust. You can avoid overcommit by always specifying requests == limits and can use priority class for oom score adjust.
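
A rough Python sketch of the requests == limits part, assuming the documented Kubernetes QoS rules: a pod is Guaranteed only if every container sets both CPU and memory limits with requests equal to them. The pod spec below and the `qos_class` helper are hypothetical; the check compares quantity strings literally (so "4Gi" and "4096Mi" would not match) and ignores init containers.

```python
def qos_class(pod_spec):
    """Classify a pod spec dict as Guaranteed, Burstable or BestEffort."""
    any_set = False
    guaranteed = True
    for container in pod_spec.get("containers", []):
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        if requests or limits:
            any_set = True
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # A missing request defaults to the limit in Kubernetes.
            request = requests.get(name, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Hypothetical Postgres pod with requests == limits for CPU and memory.
postgres_pod = {
    "containers": [
        {
            "name": "postgres",
            "resources": {
                "requests": {"cpu": "2", "memory": "4Gi"},
                "limits": {"cpu": "2", "memory": "4Gi"},
            },
        }
    ]
}

postgres_qos = qos_class(postgres_pod)
```

Note that matching only the memory request and limit, without CPU, still leaves the pod Burstable under these rules.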

There are definitely improvements to be made, like memory QoS/pressure handling, but I'm not sure what you mean about those specifics; they can be handled.

You can always OOM the whole node; I don’t think(?) the fact that there’s a non-root OOM matters? So you could arguably set no pod limit, set the priority class to critical, and let another pod in the workload cgroup get killed.

westurner · on Sept 29, 2023

From the article:

> My advice for running Postgres in Kubernetes is then to:

> - Rely on PostgreSQL replication to synchronize the state within and across Kubernetes clusters – in Kubernetes lingo: choose application level replication (Postgres), instead of storage level replication

> - Fully exploit availability zones in Kubernetes, instead of “siloing” data centers in separate Kubernetes clusters, in order to automatically achieve zero data loss with very low RTO high availability within a single region, out-of-the-box [deploy across multiple availability zones within at least one region]

westurner · on Sept 29, 2023

FWIW, this will create a podman pod with a raw disk mapped through to the whole pod, which systemd can log and respawn: `podman pod create --device=/dev/sdc:/dev/sdc:rwm` https://docs.podman.io/en/stable/markdown/podman-pod-create....
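
If you script this, a small Python sketch for building the same command line might look like the following. The helper name and the pod name "pg" are made up for illustration; the only podman flags used are `--name` and the `--device` form quoted above.

```python
def podman_pod_create_args(pod_name, host_device, container_device=None, permissions="rwm"):
    """Build the argv for `podman pod create` with one device mapped through."""
    if not permissions or not set(permissions) <= set("rwm"):
        raise ValueError("permissions must be a combination of r, w and m")
    # Map the device to the same path inside the pod unless told otherwise.
    container_device = container_device or host_device
    return [
        "podman", "pod", "create",
        "--name", pod_name,
        f"--device={host_device}:{container_device}:{permissions}",
    ]


args = podman_pod_create_args("pg", "/dev/sdc")
```

The resulting list can be passed to `subprocess.run(args, check=True)` on a host where podman is installed.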

If you need encryption of data at rest for industry compliance for example, you would need to provision said storage device(s) and then manage key rotation and unattended upgrade and reboot for the hosts and/or containers that hold the drive keys.

Maybe this is where they explain that part of the iSCSI SAN:

> If you are reading this blog article, and you are thinking about using Postgres in a Cloud Native environment, without any Kubernetes background, my advice is to seek immediate professional assistance of a certified service provider from day 0 – just because everyone is going on Kubernetes doesn’t mean that’s necessarily a good idea for your organization!