
That looks very interesting and super complex.

I wonder how many companies really need this complexity. I bet 99.99% of companies could get away with vertically scaling writes and horizontally scaling read-only replicas, which would reduce the number of moving parts a lot.

I have yet to play much with Kubernetes, but when I see those diagrams it just baffles me how people are OK with running so much complexity in their technical stack.

I generally work with smaller companies, but early on (Kubernetes 1.4 ish) I found that hosting mission-critical stateful services inside Kubernetes was more trouble than it was worth. I now run stand-alone Postgres instances in which each service has its own DB. I’ve found this very reliable.

That being said, I think Kubernetes now has much better support for this kind of thing. But given my method has been so stable, I just keep on going with it.

> stateful services

yeah, either these services natively support partitioning, failover, and self-recovery, or you have to be extremely careful never to cause any eviction or agent crash.

even something born for the cloud like CockroachDB can fail in interesting ways if the load order varies, and you can't just autoscale it: every new node has to be nudged into action with a manual cluster init. Draining nodes after the peak means, for each node, manually telling the cluster not to wait for that node to come alive ever again, waiting for the repartitioning, and then repeating for as many nodes as you need to scale back.
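For concreteness, the manual dance being described looks roughly like this (a hedged sketch: host names are placeholders, security flags such as `--certs-dir` or `--insecure` depend on your deployment, and the node ID comes from `cockroach node status`):

```shell
# one-time bootstrap: a new cluster does nothing until someone runs init
cockroach init --host=any-node:26257

# scaling back down: decommission each node so the cluster stops
# waiting for it and re-replicates its ranges elsewhere
cockroach node status --host=any-node:26257          # look up the node ID
cockroach node decommission 4 --host=any-node:26257
# ...wait for re-replication to complete, then repeat for the next node
```

None of this is hard individually; the pain is that each step is imperative and order-sensitive, which is exactly what autoscalers can't do for you.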

This is the kind of work an operator is supposed to manage, just like if one were dealing with standard HA deployments of any stateful service that doesn’t ship with built-in orchestration (like PostgreSQL).

I've come to the conclusion that, much like how purchasing decisions seem irrational until you realize that different kinds of purchases come out of different budgets, there are different "complexity budgets" or "ongoing operational maintenance burden" budgets in an organization, and some are tighter than others.

It actually is not that complex. I'm using the Crunchy Postgres Operator at my current employer. You get an Ansible playbook to install the operator inside Kubernetes, and after that you get a command-line administration tool that lets you create a cluster with a simple

pgo create cluster <cluster_name>


Most administrative tasks like creating or restoring backups (which can be automatically pushed to S3) are just one or two pgo commands.
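For a flavor of what that looks like day to day (hedged examples assuming the v4-era `pgo` CLI; subcommands and flags vary between operator versions, and `mycluster` is an illustrative name):

```shell
pgo backup mycluster                    # take an ad-hoc backup
pgo show backup mycluster               # list available backups
pgo restore mycluster                   # restore the cluster from a backup
pgo scale mycluster --replica-count=2   # add read replicas
```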

The linked pdf looks complex, because it:

a. compares 3 different operators

b. goes into implementation details that most users are shielded from.

And I'm actually not sure which of the three operators the author is recommending :)

Not the author of the slides, but I know him well. A number of things to chime in on. First, thanks for the kind words about the Crunchy operator.

Second, on the earlier question higher in the thread about why you would choose to run a database in K8s: in my experience, and from what I've observed, it's not so much that you explicitly choose to run a database in K8s. Instead, you've decided on K8s as your orchestration layer for a lot of workloads, and it's become your standardized mechanism for deploying and managing apps. In some sense it's the standard deployment mechanism more than anything else.

If you're running and managing a single Postgres database and don't have any K8s set up anywhere, I can't say I'd recommend going all in on K8s just for that. That said, if you are already using it, then going with one of the existing operators is going to save you a lot.

I agree that the k8s ecosystem isn't quite as complex as it seems at first, but specifically running stateful apps does come pretty close to earning the bad reputation.

(Disclaimer: I've tried and failed several times to get pgsql up and running in k8s with and without operators, so that either makes me unqualified to discuss this, or perfectly qualified to discuss this)

If the operator were simple enough to be installed/uninstalled via a helm chart that Just Worked, I'd feel better about the complexity. But running a complicated, non-deterministic ansible playbook scares me. The other options (installing a pgo installer, or installing an installer to your cluster) are no better.

Also, configuring the operator is more complicated than it should be. Devs and sysadmins alike are used to `brew install postgresql-server` or `apt install postgresql-server` working just fine for 99% of use cases. I'll grant that it's not apples-to-apples since HA pgsql has never been easy, but if the sales pitch is that any superpower-less k8s admin can now run postgres, I think the manual should be shorter.

I run a multi-terabyte, billions-of-rows HA Postgres in Kubernetes using a Helm chart and Patroni (baked into the chart), which uses the native k8s API for automatic failover, plus pgBackRest for seamlessly provisioning new replicas. It's a single Helm chart, and it's by far the easiest database administration I've done in many years.
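From the k8s side, a failover under Patroni can be sanity-checked with plain kubectl (a hedged sketch: the cluster name `pg`, the pod name, and the label keys are illustrative and vary by chart; Spilo-based images, for example, label the primary `spilo-role=master`):

```shell
kubectl get pods -l cluster-name=pg -L spilo-role   # one primary, N replicas
kubectl delete pod pg-0                             # kill the current primary...
kubectl get pods -l cluster-name=pg -L spilo-role   # ...a replica gets promoted
# Patroni keeps leader state in annotations on a k8s object (an Endpoints
# or ConfigMap, depending on configuration), so no separate etcd/Consul:
kubectl get endpoints pg -o yaml
```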

I realise this is possibly asking you to give away secret sauce, but is this written up anywhere? Having an example to point at to be able to say "Look, this isn't scary, we can contemplate retiring that nasty lump of tin underneath that Oracle instance after all" would be quite a valuable contribution.

agreed re: configuring the operator. Cockroach Labs (full disclosure: I'm an employee) is building an HA pgsql alternative that Just Works with k8s to solve exactly this problem: https://www.cockroachlabs.com/blog/kubernetes-orchestrate-sq...

Spencer Kimball and Alex Polvi deploy a scalable stateful application on CockroachDB and k8s in 3 minutes: https://www.youtube.com/watch?v=PIePIsskhrw

Usually things become complex once something isn't going as planned: your database slows down because the pods get scheduled on a weird node with a noisy neighbour, your backups fail because the node went down, or other more hidden issues turn up that take a lot longer to debug than they would on some Postgres running on a normal compute instance somewhere.

It's just additional layers to dig through if something goes wrong. If everything works, even the most complex systems are nice to operate, so I wouldn't call it less complex just because someone wrote a nice wrapper for the happy path.

Building systems on top of complexity doesn't shield anyone from it. The author acknowledges this explicitly:

> High Effort - Running anything in Kubernetes is complex, and databases are worse

By definition, it's more stuff you need to know.

Even if the K8s operator saves time for 95% of the use cases, the last 5% is still required. For instance, how do these operators handle upgrading extensions that require on-disk changes? Can you do those concurrently with major-version PG upgrades? And when the operator doesn't provide a command-line admin tool that fits your needs, how do you proceed?
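As a concrete example of that last 5% (a hedged sketch: the pod name, database, extension, and target version are all illustrative), an upgrade the operator doesn't cover tends to end up as manual surgery inside the pod:

```shell
# update the extension's SQL objects once the new binaries are in the image
kubectl exec -it pg-primary-0 -- \
  psql -U postgres -d mydb -c "ALTER EXTENSION postgis UPDATE TO '3.1.0';"
# if on-disk formats also changed across a major PG version, you're still
# looking at pg_upgrade or a logical dump/restore on top of this
```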

Crunchy PGO is super cool but I'm not sure how we got to the idea that it's not that complex compared to a managed service like RDS.

Coming from someone at Crunchy: I don't disagree with the notion that a managed service is easier than running and managing things yourself inside Kubernetes. Clicking a button and having things taken care of for you is great.

Though personally, I do feel like many of the managed services have not evolved/changed/improved since their inception many years ago. There is definitely some opportunity to innovate here, though that's probably not actually coupled with running it in K8s itself.

I don't think anyone would argue that RDS isn't vastly simpler. If it weren't, there'd be no reason to pay such a premium for it.

btw, the Zalando operator is rougher, but still pretty easy to use. The Crunchy operator does not work in every environment but is extremely simple (btw, the Crunchy operator uses Zalando's building blocks). I've used the Zalando operator since k8s 1.4: no data loss, everything just works. OK, major upgrades are rough, but they are rough even without the Zalando operator.

"But it has to run in Kubernetes!"
