
There is no good reason to run non-test database workloads in Kubernetes or Docker. Databases are designed to sit close to the hardware and have a stable, dedicated chunk of resources for a long time, whereas Kubernetes pods are subject to vaporization at any moment. Databases traditionally have fought the operating system to try to maintain enough control to remain performant. Introducing additional layers into this would be dubious at the best of times, but when the layer is something fundamentally contrary to the application's nature, like stateless orchestration, it's pure farce.

There could not be an application worse-suited to running in Kubernetes et al than a traditional database. Anyone claiming something that rams this square peg into that round hole is "production ready" is showing that they're an empty husk and shouldn't be trusted near anything important.

Note the downvotes already rolling in less than two minutes after I posted this. This subject is a major third rail here. It goes against the agenda of very powerful people and my account has been censured in the past specifically for making this particular argument, that database workloads and Kubernetes don't mix. Keep that in mind when you're asking HN for their experience on this (or any other topic that YC considers critical to the interest of their investments -- they've shown that they're willing to taint the discussion if it gets too dicey).

Using Kubernetes doesn't imply using Docker, even. K8s is 99% an orchestration system, like Terraform or CloudFormation. One resource among many that it orchestrates is containers. It can also orchestrate regular VMs.

That being said, I also disagree that Docker isn't suited to running a DBMS, assuming you actually have a large enterprise (or cloud) datacenter backing your Docker daemon. In such cases:

• You'll probably have a large enough pool of Docker machines (k8s or not) that you're going to be deploying your DBMS container in a way that reserves an entire instance just for it (or it + its accessory containers);

• You'll probably have a SAN, and you'll have many enterprise-y reasons (e.g. live VM migration) to prefer backing your DBMS with said SAN, rather than with local instance storage.

If both of those are true, then Docker has no disadvantages compared to deploying your DBMS as a raw VM.
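The "reserve an entire instance just for it" pattern the first bullet describes can be sketched in Kubernetes terms with a taint plus a matching toleration and node selector. This is a hedged illustration, not anyone's production config: the node name, label values, and resource sizes here are all hypothetical.

```yaml
# Hypothetically, taint the node first so nothing else schedules there:
#   kubectl taint nodes db-node-1 dedicated=database:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  nodeSelector:
    dedicated: database        # only land on nodes labeled for the DB
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "database"
    effect: "NoSchedule"       # tolerate the taint that keeps other pods off
  containers:
  - name: postgres
    image: postgres:15
    resources:
      requests:                # reserve a stable chunk of the node
        cpu: "8"
        memory: 32Gi
      limits:                  # requests == limits gives the pod the
        cpu: "8"               # Guaranteed QoS class, last to be evicted
        memory: 32Gi
```

The taint keeps everything else off the node; the toleration plus selector pins the DBMS (and its accessory containers) onto it.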

As an "enterprisey" person, I disagree. I've seen a lot of enterprise infrastructure that looks like toddlers built it out of Lincoln Logs. And I've seen SANs lose connectivity much more often than a pool of independent local disks all going bad at once. On top of that, running databases on VMs whose hypervisors aren't dedicated to databases results in shitty adminning and overcrowded VM pools destroying database performance and reliability.

Cloud-ish infrastructure is often good for running distributed decentralized databases, but try running Oracle in a bunch of Docker containers on a crappy OpenStack cluster and soon you'll be crying into your scotch.

These efforts to make people think it's a good idea to run databases on K8s are misleading people, and god help those poor teams that waste years trying to stabilize something that a fancy web page and a youtube tutorial said was a great idea.

I assume you also consider databases on cloud with mounted block storage as not production ready too?

For the record, I have to use my reply allocation sparingly, since usually when I start talking about this I'm mysteriously throttled for long periods.

That said -- no, that's not the same thing at all. Barring anomalous conditions, VMs run as long as you keep them running. They won't be reaped and rescheduled onto some other node in the cluster, whether by automated rebalancing processes or by manual `kubectl delete po...` or `kubectl drain`. You can easily set up a VM that will behave more-or-less like conventional hardware if we ignore the perf hit.

This is a pretty simple thing. The reason people say you need to make your apps "12 factor" when you go to k8s is because it doesn't work well if your app cares about state. Databases care deeply about state. You can't just kill a DB server and spin up a new one to pick up where it left off. You can't parallelize a DB workload by spinning up 8 little DB nodes. It's not a web server and it just doesn't work like that. Things like CockroachDB exist specifically because normal databases don't work like that.

This is where people usually bring up things like annotations, labels, StatefulSets, etc. First, note that the facilities that accommodate stateful workloads are not priorities for Kubernetes and are generally not well-tested or consistent. This wouldn't be a news story or an independent project if they were.

Second, please realize you're doing all of that work to try and make Kubernetes do something it's not really designed to do, with potential negative impact on the availability and scheduling processes for the applications that do work well on Kubernetes, when you could just spin up a VM and avoid all of these issues entirely. There's no reason to put a production DB on k8s other than cargo culting.

As someone who designed kubernetes, I completely disagree. We designed it to run stateful workloads. From V1 we set very strong safety guarantees around how pods work and are scheduled. However, like any software infrastructure, you are vulnerable to a lot of possible failure modes. The kernel can hang on NFS mount disk operations. The SAN can go into a gray failure. A cleaning guy can pull a power cord out. Bad code in the kubelet can result in volumes failing to detach. Random people on the internet can open PRs that remove pod safety protections (happens about once every few months).

Just like any other tool that makes some things easier, Kubernetes also makes it easier to shoot yourself in the foot. Just like any solution, you have to know the system well enough to reason about it. There is still a lot that can be done to improve how we explain, document, and describe the system. But people run stateful workloads on Kube all the time, and they do it because it makes their lives easier on the balance.

If Kubernetes was designed with stateful workloads in mind, why did it take until version 1.3 to introduce PetSets as an alpha feature?

Because we prioritized stabilizing the core and having something shipped. Pod safety guarantees were part of 1.0 and ensure “at-most-one” pod with a given name at a given time on any node, which allows us to build a compute model that can be used in higher level primitives. Persistent volumes, reserving space in the DNS schema for services to have subnames, and headless services were all designed in specifically so we could do stateful sets.
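Those primitives are what a StatefulSet composes: a headless service gives each replica a stable DNS subname, and volumeClaimTemplates give each replica its own persistent volume. A minimal sketch, with illustrative names and sizes:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None              # headless: pods get stable DNS names like db-0.db
  selector:
    app: db
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db              # ties pod DNS subnames to the headless service
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: postgres:15
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:        # each replica (db-0, db-1, db-2) gets its own PVC
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi
```

The at-most-one guarantee means db-0's replacement never runs concurrently with the original, and it reattaches db-0's claim rather than getting fresh storage.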

A database is an application like any other. Containers are about managing the lifecycle of the process, and container managers assist in getting the right state to a container. Whether or not a container has state doesn't make it easier or harder to run.

If you aren't managing your state, then yeah you will run into a nightmare when trying to containerize stateful apps... or running them at all. You will literally have the same problems with a VM or physical hardware.

It's important to separate state management from process management. A stateful application is absolutely not harder to containerize than a stateless one. Rather, stateful applications are simply harder to run in any regard.

I would personally argue that it is easier to run a stateful app with a container manager. I know it sounds crazy, but keep in mind that container tools are centered around what each individual application requires, and the tooling tends to make it easier to express and assist in managing the state requirements of that application.

For that matter you can even prevent the scheduler from scheduling your stateful app on a new node, which seems to be the answer for the crux of the argument against containerizing a stateful app.
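Pinning the pod to a node is exactly what a local PersistentVolume does implicitly: the volume's nodeAffinity means the scheduler can only ever place the claiming pod on that one node. A sketch, with a hypothetical node name and disk path:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: db-local-pv
spec:
  capacity:
    storage: 500Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain   # don't scrub the data when the claim goes away
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd0                 # hypothetical local disk path
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["db-node-1"]           # the pod claiming this PV can only run here
```

A PodDisruptionBudget with maxUnavailable: 0 additionally blocks voluntary evictions like `kubectl drain` from removing the pod.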

> Whether or not a container has state doesn't make it easier or harder to run in a container.

I agree, which is why I specifically avoided that language. Containers don't have to be implemented without regard for state -- but if you're talking about Docker or k8s, they are. Docker throws away anything not explicitly cemented in the image or designated as an external volume.
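"Designated as an external volume" looks like this in Compose terms; a hedged sketch with hypothetical service and volume names, not a recommended production layout:

```yaml
# docker-compose.yml
services:
  db:
    image: postgres:15
    volumes:
      - pgdata:/var/lib/postgresql/data   # survives `docker rm`; everything else
                                          # in the container filesystem is discarded
volumes:
  pgdata: {}                              # named volume managed by the Docker daemon
```

Anything written outside that mount point lives only as long as the container does.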

LXC, zones, and jails are containerization techniques that respect state. It's fine to run a database in these if desired. They behave just like real VMs; they have an init process, they get real IPs, they don't automatically destroy the data written to them, and they generally don't mysteriously shut down or get rescheduled. You can't be confident about any of that with Docker or k8s.

Statefulness is not a primary use case for Kubernetes. It took two years for StatefulSets to leave beta and there was a substantial false start in PetSets. As recently as April, which is the last time I seriously looked, there were still competing APIs for defining access to local volumes.

If you want to run a production database workload in a jail or a zone, that sounds fine to me. It's not about containerization in the abstract. It's about the way that Kubernetes and Docker do it.

(I mention Docker and k8s together because for most of k8s history Docker was the only supported runtime. It supposedly can use other runtimes now, but they're not widely used afaik, and behave similarly re: state anyway)

StatefulSets are PetSets. We renamed them. The core API design hasn’t materially changed since the alpha.

No. That's the point. Docker and k8s provide a means to express your state requirements and split state management from process management.

The trick is to express your state requirements. And yeah, you will be burned badly if you don't do this... and maybe docs and such should call this out better to make sure people don't set themselves on fire just because they didn't dig in deeply enough.

But docker and k8s do provide a means to assist in managing this state for you (swarm... not so well just b/c the work hasn't been done).

So actually kube and docker throwing away your state (that you haven't specifically persisted) is basically a good thing, because it makes you very aware of where your state is.

I'm on board insofar as bosses, regulators, and customers find "heightened awareness of state" an acceptable substitute for the production data that was sacrificed to the cause.

Modulo the development cycle of pushing and running a new container being incompatible with state. Sure, Google containerizes everything, but they invested the effort.

Again, separate the app layer from the storage layer.

Why would an app dev be pushing changes to the db deployment (outside of data manipulation itself)?

Just because the app dev wants to spin up a db in dev to shove their data into doesn't mean that's how it should be deployed in prod.

EC2 instances do go down, EBS volumes fail, hardware fails. Maybe not as frequently as pods in Kubernetes get evicted but at sufficient scale it does occur frequently enough overall that you do need to find ways to handle this automatically without human intervention.

Once you’ve achieved that whether your database runs on a VM or in Kubernetes doesn’t make a difference really.

Granted, if you're not at that scale, running a database in Kubernetes is probably not the best of ideas. That has nothing to do with Kubernetes, though; that’s because running a stateful service with decent, working backup, recovery, and automated failover is difficult in any case. If that’s not your job, you’re probably better off using RDS or something equivalent.
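The "working backup" piece at least has a natural Kubernetes shape: a CronJob that dumps the database on a schedule. This is a simplified sketch; the schedule, host, database name, and claim name are hypothetical, and real setups need credential handling and off-cluster storage.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 3 * * *"                 # nightly at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: pg-dump
            image: postgres:15
            command: ["/bin/sh", "-c"]
            args:
            - pg_dump -h db-0.db -U postgres mydb > /backup/mydb-$(date +%F).sql
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: db-backup-pvc  # hypothetical claim for backup storage
```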

At the end of the day, when you can give pods in the form of StatefulSets static IPs, static names, static labels, indexes, and consistent storage, and give them strong guarantees of running, then I'm not really sure you have a strong argument that it's vastly different from the IaaS layer.

Honestly, it sounds like you're arguing that because the Kubernetes API is easier and more accessible to use, it's more dangerous to run state on that layer. That, and a community attitude of being more willing to accept failure, which some would argue is a good thing, others not so much. But I prefer to subscribe to the thought process discussed in the SRE book: failure is inevitable, and putting your databases inside their Kube equivalent saves toil time and hardens your setup.

That said, I would argue most folks on cloud anyway should just use a managed Postgres, but we're not always on cloud, and I don't think claiming that putting state in Kube is inherently wrong is fair.

> They won't be reaped and rescheduled onto some other node in the cluster, whether by automated rebalancing processes or by manual `kubectl delete po...` or `kubectl drain`.

I take it you've never managed a large VM hypervisor (e.g. vSphere) cluster. If your VMs aren't being pinned to particular hypervisor nodes by persistent claims on local instance storage or the like, they end up "floating around" on each restart in pretty much the same way k8s containers do. Especially so if you have live VM migration enabled, in which case you're probably doing the equivalent of `kubectl drain` all the time to deprovision and repair hardware.

Funny you should mention that. I run a large vSphere cluster (that I inherited) now -- large meaning several hundred VMs. Live VM migration is different because it happens totally transparently; from the guest's perspective, there is no disruption at all. On k8s, pods are recycled all the time and afaik there is no "live" migration of pods that doesn't involve killing and restarting the process. k8s's "vaporize the pod first" culture is basically the opposite of enterprise-grade hypervisors, which exist in large part to minimize incidents that would require the destruction of state, even in the face of hardware failure.

True enough, though I would posit that k8s’s strategy (no live migration) makes sense if you assume that you’re running k8s on top of a VM cluster that has its own live migration, such that you’ll never need to issue an API call to the k8s manager for hardware-related reasons. In such cases, the only time you’re doing a `kubectl apply` is for release management reasons—and it’s nearly impossible, in the general case, to automatically compute a “live migration” between e.g. two different versions of a deployment where the architecture is shaped differently.

(It’s not impossible in specific cases, mind you. I’m still waiting on tenterhooks for the moment someone introduces an Erlang-node operator where you can apply hot-migration relups through k8s itself.)

FYI: there is no "reply allocation". HN adds an increasing wait time before you can reply to replies on your posts in a thread to prevent deeply nested rapid fire arguments.

EBS support in Kubernetes was considered experimental as of last year. Not sure about now.

I recall a few nasty issues on GitHub with data loss or unmountable volumes for the early adopters, with the official answer along the lines of "implementation is in progress".

I do not think EBS support in Kubernetes was experimental in 2017. I am one of the maintainers of in-tree EBS driver and we have tried our best to iron out any bugs reported.

There are still bugs, I do not disagree. Data loss bugs are considered top priority, and I am not aware of any such open bugs against the EBS driver.

Pretty sure the bugs in question were closed.

You'll excuse me but no time to go through the history and dig up the tickets for reference.

I downvoted you because of that whole conspiracy theory you tacked onto the end of your post.

But I fully agree that Kubernetes and containers are not well suited to running production databases. In theory they could achieve parity with a dedicated machine or VM, but they're still a long way from that, and they make it very easy to lose your data. I was recovering a database where the persistent volume wasn't set up right and the container got killed and restarted. It was just before the holidays, and it was a nightmare because everyone was on vacation.

Yeah you could get into that kind of problem with a VM or dedicated machine, but the bar is a lot higher, you'd need some kind of hardware failure. Kubernetes makes it really easy to shoot yourself in the foot when running databases.

"I was recovering a database where the persistent volume wasn't set up right and the container got killed and restarted."

In other words, your database application was using scratch storage instead of persistent volumes?

What this anecdote shows is that the developers or admins responsible for setting up the database didn't do it properly.

Also, testing failures and data recovery should be your priority before going to production.

I don't see how you could blame that on software.

Not 100% sure that it was using scratch, but something went wrong with the persistence.

The point is not to say it wasn't human error - clearly it was, but it's an error that wouldn't have been as easy to make without kubernetes. There's a cost to running a database on k8s that largely people ignore. That's before you start talking about backups and recovery which also get harder and require more manual work with more potential for error.

The same good reasons for running any workload on k8s apply to databases as well; it's just that they are more complicated than stateless services due to the (no surprise) state and the clustering/control protocol things that often accompany HA data stores. Kubernetes has the tools available to manage state now, and many (maybe most) databases now offer some support for its native discovery model. So all in all, my current preferred strategy for databases is to prefer hosted if it's available (cloudsql, elastic db), k8s if it isn't, and VMs if it won't work on k8s.
