Thou shalt not run a database inside a container (patrobinson.github.io)
59 points by swyx 3 days ago | 42 comments

I remember when this article came up a long time ago. We had just set up an in-house Kubernetes cluster running on AWS.

We were attempting to run Cassandra on top of it and let's just say the majority of our time for a few months was spent fine tuning the setup (iirc this was early days StatefulSets). In the end we gave up and went to EC2 for Cassandra.

Persistent volumes have come a long way, and container technology in general has rapidly evolved. I have yet to try any databases on K8s in a production environment since that attempt, but I don't believe that it's as bad as it was back then.

Many people are scared of it; one of the biggest reasons usually given is that containers are meant to be ephemeral. While containers allow applications to be ephemeral, and quite frankly make that a lot easier, I don't believe that they're meant to be ephemeral. At the end of the day, containers are just somewhat isolated processes. Processes don't need to be ephemeral, and often aren't. With the right setup, databases can be run from containers, and I believe the rapid evolution of container technology has made this possible today.

Containers are no more ephemeral than any other process. I think people who say this are just saying it because of the default Docker storage settings.

> Containers are no more ephemeral than any other process.

You're missing the point. It's not that containers cannot be long-lived. The whole point is that containers are meant, and expected, to be ephemeral by design. Pets vs cattle and all.

Containers, and cattle-vs-pets, are about making dirty system state disposable and making apps cookie-cutter and repeatably deployable.

But that's not the same as saying containers are only suitable for ephemeral workloads/data. With persistent storage it's not just possible, I think it makes a lot of sense to run something like a database inside a container. My default is to run Postgres etc. in Docker instead of 'apt install postgres', because it's so much easier to get the right Postgres version with Docker than to track down whatever unofficial apt repo has the Postgres version I'm looking for.

The key is I do this in Docker, with a basic folder bind mount on the exact same kind of machine I would use if I installed Postgres via 'apt'. I don't do this in Kubernetes or use cloud storage volume adapters. Kubernetes brings a lot of complexity and risk into ops, and that's not what you want for a database. Docker is mostly a layered filesystem format plus a cgroups isolation wrapper around native Linux processes. So as long as you're not mixing in some other tool that e.g., adds on network traffic routing or something, it's basically just Linux with convenience features.

Switching from 'apt' to Docker with maybe a basic docker-compose.yml is an underrated infrastructure pattern. Operating Postgres from Docker feels about as simple as apt-installed Postgres, with no downsides I've seen besides needing Docker at all. It's also great for deploying in-house apps; moving workloads from vanilla OS processes with systemd and Ansible or whatever over to plain ol' Docker (or maybe Dokku) simplifies things every time I do it. It's fancy tools that build on top of Docker that make things complicated or risky, but someone rejecting vanilla Docker because they don't want to use K8s feels like throwing the baby out with the bathwater.

> But that's not the same as saying containers are only suitable for ephemeral workloads/data.

No one said that.

> With persistent storage it's not just possible,

There's some confusion in your remarks. A container can be ephemeral even if it has access to persistent data. I mean, think about it: isn't a container still ephemeral even if it has a database connection?

Ephemeral is about internal state: recreating a container after deleting it gets you back to the exact same state.

I mean originally all EC2 instances were ephemeral by default. Shutdown and all your Bitcoin goes bye-bye.

What's the functional benefit of using containers over EC2 images for a database? I can't see why anyone would reasonably consider trying such a thing.

Containers make a lot of sense when you're the one developing the software, and you have a lot of micro-services to support, none of which really scale to EC2 levels of utilization.

But EC2s seem better suited to running pre-installed software that can, and will, fully utilize an entire VM. Such as a database.

My intuition on this subject is, docker is the latest, trendy hammer and so everything becomes a nail / container. Maybe people aren't familiar with the tooling around EC2 image creation and deployment.

It is attractive to have everything in your production environment be of the same type and managed through the same handles.

> What's the functional benefit of using containers over EC2 images for a database? I can't see why anyone would reasonably consider trying such a thing.

Version upgrade speed.

How so?

With a VM image, the process can be as simple as: yum update, snapshot, then update the configuration to point to the new AMI ID. And if you're using a tool like Packer to create AMIs, the process might be to kick off the CI/CD pipeline again.

Versus, in the simplest scenario, docker run -e ... -v ... image:<new-version>, which can be executed via Ansible or similar. Another method is to simply bump a version in a Marathon app definition / Kubernetes config and reapply.
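To make the comparison concrete, here is a rough sketch of that "bump the tag and rerun" step. The helper name, container name, host path, password value, and version tags are all made up for illustration; the data path and POSTGRES_PASSWORD variable follow what the official postgres image has conventionally used, but check the image docs for the version you run. It only builds the argument list, it doesn't execute anything.

```python
from typing import Dict, List


def docker_run_args(image: str, tag: str, name: str,
                    env: Dict[str, str], host_dir: str,
                    container_dir: str) -> List[str]:
    """Build the argv for a detached `docker run` with a bind-mounted data dir.

    Hypothetical helper for illustration; it does not run Docker itself.
    """
    args = ["docker", "run", "-d", "--name", name]
    for key in sorted(env):
        args += ["-e", f"{key}={env[key]}"]
    # Bind mount: the data lives on the host, so it outlives the container.
    args += ["-v", f"{host_dir}:{container_dir}"]
    args.append(f"{image}:{tag}")
    return args


# Assumed values, not taken from the thread.
common = dict(
    image="postgres",
    name="db",
    env={"POSTGRES_PASSWORD": "example"},
    host_dir="/srv/pgdata",
    container_dir="/var/lib/postgresql/data",
)

old_args = docker_run_args(tag="16.2", **common)
new_args = docker_run_args(tag="16.3", **common)
```

Everything except the last element is identical between the two runs, which is the whole appeal: stop and remove the old container, then start the new tag against the same directory. This only covers minor version bumps; a major Postgres upgrade still needs a data migration step whichever way the server is installed.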

4 years old. The world is completely different today. We run a number of HA Postgres setups on k8s and it works beautifully. Local NVMe access, with elections backed by k8s primitives.

By NVMe access ... could you expand on what that means exactly? I would like to run some disk-heavy workloads in a similar fashion.


2017 Followup: https://patrobinson.github.io/2017/12/16/should-i-run-a-data...

Tone softens a bit, but still comes to the conclusion it’s not a good fit. Four years on, I can’t imagine a benefit to running a prod DB out of a container either.

I don't see the benefit either. I believe Netflix made this idea popular with sidecar containers as a way to add

> “non-intrusive platform capabilities” [0]

to container stacks.

I understand that the idea might be popular among developers, as it is easier to just add a database container to your stack than to deal with the DB admin. I don't know if Netflix ever recommended databases as sidecar containers, but I have seen it in the wild where devs followed the Netflix model. I sometimes hear people argue that they have to manage containers anyway, so it would be less overhead to manage the DB as a container as well.

0: https://netflixtechblog.com/prana-a-sidecar-for-your-netflix...

The advantage is you deploy your entire application stack the same way you would anything else. Zero effort to roll out a new environment. And if you need to scale it up, you just increase your k8s request/limit on CPU/memory and restart it, and it gets scheduled to a node as appropriate.
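For anyone who hasn't seen what "increase your request/limit" looks like, here is a small sketch using a plain dict shaped like the container entry in a pod template. The `resources.requests` / `resources.limits` layout is the standard Kubernetes container spec; the helper name, image, and sizes are assumptions for illustration, and actually applying the change (kubectl, Helm, whatever) is left out.

```python
import copy
from typing import Any, Dict


def with_resources(container: Dict[str, Any], cpu: str, memory: str) -> Dict[str, Any]:
    """Return a copy of a container spec with new CPU/memory requests and limits.

    Hypothetical helper; requests and limits are set equal here for simplicity.
    """
    updated = copy.deepcopy(container)
    updated["resources"] = {
        "requests": {"cpu": cpu, "memory": memory},
        "limits": {"cpu": cpu, "memory": memory},
    }
    return updated


# Assumed container entry, e.g. from a StatefulSet pod template.
db_container = {
    "name": "postgres",
    "image": "postgres:16",
    "resources": {
        "requests": {"cpu": "1", "memory": "4Gi"},
        "limits": {"cpu": "1", "memory": "4Gi"},
    },
}

scaled = with_resources(db_container, cpu="2", memory="8Gi")
```

Once the updated spec is applied and the pod restarts, the scheduler places it on a node that can satisfy the new request.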

I'm very familiar with an environment that has about 120 TByte of databases - Mongo and Postgres - 100% run in containers over about 75 namespaces in GKE. Handles being restarted without issue.

What benefits do they get from running databases in Kubernetes?

Once you containerize every part of your application stack, you can simplify support/deployment to a single model, and you get the advantages of everything that containerization has to offer (dynamic scaling, robust recovery, trivial application migration, etc.) without having to build in special rules/processes for one-off elements. The database then becomes just one more component in your environment that isn't treated any differently than any other component, with the possible exception of requesting that it be scheduled less ephemerally than other components.

I've seen it work in production for 3+ years - And I don't recall there ever being an issue with the databases being in a container - it's hard to imagine them being anywhere else.

After reading that I start thinking everything would be better containerized. So why not run kubernetes nodes in containers?

You can!

KinD[0] is a project to do just that, although production workloads are an explicit non-goal.

[0]: https://github.com/kubernetes-sigs/kind

Since the parent is running on GKE, they’re already doing that. Postgres in a (kube) container on a (GCE) VM in a (borg) container on a computer.

> So why not run kubernetes nodes in containers?

Plenty of use cases are already covered by that. Running containers in containers is also a thing.

It's containers all the way down.

The beauty of containers is how ephemeral they are — need more? Here’s more. Need less? Bye. Oh, this one died? Kill it.

Terrible fit for databases, inherently.

That's only one way of looking at containers. Another way is to see them as a modularization tool since they abstract most of the machine away while being much lighter weight than a VM.

I've recently been looking into containerizing my personal server. I use the cheapest VM on AWS (I work on Azure now, but vendor lock-in is real). Adding another VM would double my costs, but with containers I would get more isolation and could upgrade the sites I run on this server independently.

I usually use sqlite, but I can see why containers would be nice for similar reasons. Even if you aren't resource constrained with VMs, running a DB via containers might be nice to ensure a consistent dev setup even if you don't use containers in prod.

I do this using Lightsail to run my personal blogs; each blog runs in its own container with sqlite mounted from a local directory for persistence (too lazy to do actual Docker volumes).

Ditto for Nginx, it's also containerized.

It's a great setup b/c I can also deploy the containers locally (and SCP the sqlite DB locally) to run the blogs on my laptop if I need to.

You’re making a feature into a disadvantage when you could just not use the feature.

But don't you want to code your application stack such that your database being killed is a no-op? Shouldn't the state mostly reside in your persistent store (which decidedly needs to be robust)?

Maybe I'm not newfangled enough, but isn't the database your persistent store? If not, maybe this is where the DB in vs out of container folks are talking past each other.

I'm calling out the difference between the database server and the storage mechanism (SSDs, block storage, etc.), where transactions and journal logs are written so as to allow recovery if the database server goes down.

Once you design your database server/application/transactions such that they don't care whatsoever if the database server goes offline, you not only get the ability to run fine in containers, you also make your application stack significantly more robust for all sorts of reasons.

I have colleagues who bemoan a major application server going down - which is a no-op in my world where the ephemeral nodes rarely last more than a day or two in GKE - so we're completely used to them going down.

Building transactions, idempotency, tasks in rabbitmq/celery rather than memory - not only do they make deploying in containers straightforward, they also give an overall robustness win for all applications.
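A minimal sketch of what "transactions + idempotency" can look like in practice. SQLite stands in for the real database here, and the table names, task IDs, and `apply_payment` helper are invented for the example. The idea is that a queue may redeliver a task after a worker or database restart, so the handler records the task ID and applies the change in one transaction; a redelivery hits the primary key and becomes a no-op.

```python
import sqlite3


def apply_payment(conn: sqlite3.Connection, task_id: str,
                  account: str, amount: int) -> bool:
    """Apply a task at most once, even if the queue delivers it twice.

    Returns True if the task was applied, False if it was already processed.
    """
    try:
        # `with conn` commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute(
                "INSERT INTO processed_tasks (task_id) VALUES (?)", (task_id,)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, account),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate task_id: the work was already done, so do nothing.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("CREATE TABLE processed_tasks (task_id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO accounts VALUES ('alice', 0)")
conn.commit()

first = apply_payment(conn, "task-1", "alice", 50)   # applied
second = apply_payment(conn, "task-1", "alice", 50)  # redelivery, ignored
balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'"
).fetchone()[0]
```

With this shape, a worker dying halfway through and the task being retried later doesn't double-apply anything, which is what makes restarts boring.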

I mean, physical servers go down as well.

Sorry, I'm not quite piecing together what scheme you're proposing. Like, where are you storing things like your user/password tables that aren't persistent databases?

I'm familiar with the topics around HA and how you could ignore individual hosts going down, but the whole database?

Not who you replied to, but: you're probably thinking Frontend->Backend->DB or similar, right? Frontend runs in browser and is either served by its own container or the backend, backend is in a containerized server, DB in something more persistent.

Except, the reality is it's Frontend->Backend->DB->Storage. To some extent, the DB is just a standardized (and very quick) caching/sorting mechanism in front of an SSD, HDD, network drive, whatever. Storage can be separate and persistent, even if the DB is in a container and gets killed/dies.

If that's the case, then even if the DB is in an ephemeral container the backing storage should always survive. With it, you can bring up a new DB container with all the same data being available.

So yes, your passwords etc are in the "DB", but really they're in storage on disk. Yeah, your app won't work well if the DB disappears, but that's a given no matter whether your DB is in a container or somewhere else. Once it's back up, everything should work fine.

When the database is down, no transactions occur. But as long as the storage medium (SSD, Spinning disks, PVC, whatever) - doesn't go down - then when the database comes back up - anything that needs to be rolled back is rolled back, and things just keep running along without any problem.

So - the user will see a "Pause" when using the application if the database is down - but that's it.

The great thing about architecting your application stack in such a way that every one of the servers/applications/databases can disappear at a moment's notice is that your application becomes super robust to hardware failure, because having a server go "down" is Business As Usual in Kubernetes.
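A toy version of the "storage survives, database process doesn't" point, with heavy caveats: SQLite in a temp directory stands in for a database server on a persistent volume, and closing the connection without committing stands in for the server going away mid-transaction. It is not a real crash test, just an illustration that committed data is still there and in-flight work is not.

```python
import os
import sqlite3
import tempfile

# The directory plays the role of the persistent volume / bind mount.
data_dir = tempfile.mkdtemp()
db_path = os.path.join(data_dir, "app.db")

# First "database instance": one committed write, one write left in flight.
first = sqlite3.connect(db_path)
first.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")
first.execute("INSERT INTO users VALUES ('alice')")
first.commit()
first.execute("INSERT INTO users VALUES ('bob')")  # never committed
first.close()  # instance goes away; the pending transaction is discarded

# Second "database instance" started against the same storage.
second = sqlite3.connect(db_path)
names = [row[0] for row in second.execute("SELECT name FROM users ORDER BY name")]
second.close()
```

The new instance sees 'alice' but not 'bob', which matches the description above: anything that needs to be rolled back is rolled back, and the application carries on once the database is reachable again.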

While this is somewhat true, most databases are not designed for _fast_ recovery.

Isn't Vitess all about sharding and autoscaling mysql instances with containers and kubernetes?

Yeah, joining the chorus but this is outdated info. We're running a CockroachDB cluster on top of GKE and the management aspect is excellent. For something like Postgres it's more difficult and we still run that outside of containers, but I'm sure it can be done well also.

Not true anymore. We have a massive Vitess deployment in Kubernetes and it runs very well.

I help support SQL Server on Linux for which we provide a number of official containers. There are some limitations, but these have been pretty useful for a number of customers. Also, my understanding is that volumes are pretty mature these days.

Why wouldn't you put the 2016 date in the title? That's standard courtesy.

If you do, just make sure you mount volumes from the host so the data is saved, but yeah, as a general rule, no.

It works perfectly fine with rexray + EBS-like volumes, or Kubernetes PVs backed by EBS-like volumes. So why not?

2016! You can tell by the reference to PetSets being in alpha.
