We were attempting to run Cassandra on top of it, and let's just say the majority of our time for a few months was spent fine-tuning the setup (IIRC this was the early days of StatefulSets). In the end we gave up and went to EC2 for Cassandra.
Persistent volumes have come a long way, and container technology in general has rapidly evolved. I have yet to try any databases on K8s in a production environment since that attempt, but I don't believe that it's as bad as it was back then.
Many people are scared of it; one of the biggest reasons usually given is that containers are meant to be ephemeral. While containers allow applications to be ephemeral, and quite frankly make it a lot easier, I don't believe they're meant to be ephemeral. At the end of the day, containers are just somewhat isolated processes. Processes don't need to be ephemeral, and often aren't. With the right setup, databases can be run from containers, and I believe the rapid evolution of container technology has made this possible today.
You're missing the point. It's not that containers cannot be ephemeral. The whole point is that containers are meant, and expected, to be ephemeral by design. Pets vs cattle and all.
But that's not the same as saying containers are only suitable for ephemeral workloads/data. With persistent storage it's not just possible but, I think, makes a lot of sense to run something like a database inside a container. My default is to run Postgres etc. in Docker instead of 'apt install postgres', because it's much easier to get the right Postgres version with Docker than to track down whatever unofficial apt repo has the Postgres version I'm looking for.
The key is that I do this in Docker, with a basic folder bind mount, on the exact same kind of machine I would use if I installed Postgres via 'apt'. I don't do this in Kubernetes or use cloud storage volume adapters. Kubernetes brings a lot of complexity and risk into ops, and that's not what you want for a database. Docker is mostly a layered filesystem format plus a cgroups isolation wrapper around native Linux processes, so as long as you're not mixing in some other tool that, e.g., adds network traffic routing, it's basically just Linux with convenience features.
Switching from 'apt' to Docker with maybe a basic docker-compose.yml is an underrated infrastructure pattern. Operating Postgres from Docker feels about as simple as apt-installed Postgres, with no downsides I've seen besides needing Docker at all. It's also great for deploying in-house apps; moving workloads from vanilla OS processes with systemd and Ansible or whatever over to plain ol' Docker (or maybe Dokku) simplifies things every time I do it. It's the fancy tools built on top of Docker that make things complicated or risky, and someone rejecting vanilla Docker because they don't want to use K8s feels like throwing the baby out with the bathwater.
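For the record, a minimal sketch of the kind of docker-compose.yml I mean — the version tag, password, and host path here are placeholder assumptions, not recommendations:

```yaml
# docker-compose.yml -- minimal sketch of the bind-mount pattern above
services:
  postgres:
    image: postgres:16            # pin the exact version you need
    restart: unless-stopped
    environment:
      POSTGRES_PASSWORD: change-me
    ports:
      - "5432:5432"
    volumes:
      # plain folder bind mount, same data-on-disk story as an
      # apt-installed Postgres pointed at a local data directory
      - /srv/postgres/data:/var/lib/postgresql/data
```

`docker compose up -d` and you have the exact Postgres version you asked for, with the data living in an ordinary folder on the host.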
No one said that.
> With persistent storage it's not just possible,
There's some confusion in your remarks. A container can be ephemeral even if it has access to persistent data. I mean, think about it: isn't a container still ephemeral even if it has a database connection?
Ephemeral is about internal state: recreating a container after deleting it gets you back the exact same state.
Containers make a lot of sense when you're the one developing the software, and you have a lot of micro-services to support, none of which really scale to EC2 levels of utilization.
But EC2s seem better suited to running pre-installed software that can, and will, fully utilize an entire VM. Such as a database.
My intuition on this subject is that Docker is the latest trendy hammer, and so everything becomes a nail/container. Maybe people aren't familiar with the tooling around EC2 image creation and deployment.
Version upgrade speed.
With a VM image, the process can be as simple as: yum update, snapshot, then update the configuration to point to the new AMI ID. And if you're using a tool like Packer to create AMIs, the process might just be to kick off the CI/CD pipeline again.
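The manual flow could be sketched with the AWS CLI roughly like this — all IDs, names, and the SSH host are hypothetical placeholders:

```shell
# 1. Patch the running build instance
ssh ec2-user@build-host 'sudo yum -y update'

# 2. Snapshot it into a new AMI
NEW_AMI=$(aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "app-$(date +%Y%m%d)" \
  --query ImageId --output text)

# 3. Point the launch template at the new AMI ID
aws ec2 create-launch-template-version \
  --launch-template-id lt-0123456789abcdef0 \
  --source-version '$Latest' \
  --launch-template-data "{\"ImageId\":\"$NEW_AMI\"}"
```

Packer automates steps 1 and 2 from a template, which is what makes the "just re-run the pipeline" version work.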
2017 Followup: https://patrobinson.github.io/2017/12/16/should-i-run-a-data...
Tone softens a bit, but still comes to the conclusion it’s not a good fit. Four years on, I can’t imagine a benefit to running a prod DB out of a container either.
> “non-intrusive platform capabilities” 
to container stacks.
I understand that the idea might be popular among developers, as it is easier to just add a database container to your stack than to deal with the DB admin. I don't know if Netflix ever recommended databases as sidecar containers, but I have seen it in the wild where devs followed the Netflix model. I sometimes hear people arguing that they have to manage containers anyway, so it would be less overhead to manage the DB as a container as well.
I've seen it work in production for 3+ years, and I don't recall there ever being an issue with the databases being in a container. At this point it's hard to imagine them being anywhere else.
KinD is a project to do just that, although production workloads are an explicit non-goal.
Plenty of use cases are already covered by that. And running containers in containers is also a thing.
It's containers all the way down.
Terrible fit for databases, inherently.
I've recently been looking into containerizing my personal server. I use the cheapest VM on AWS (I work on Azure now, but vendor lock-in is real). Adding another VM would double my costs, but with containers I would get more isolation and can upgrade the sites I run on this server independently.
I usually use sqlite, but I can see why containers would be nice for similar reasons. Even if you aren't resource constrained with VMs, running a DB via containers might be nice to ensure a consistent dev setup even if you don't use containers in prod.
Ditto for Nginx; it's also containerized.
It's a great setup b/c I can also deploy the containers locally (and SCP the sqlite DB locally) to run the blogs on my laptop if I need to.
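That flow might look something like the following — the host, paths, and image name are all hypothetical placeholders for whatever the actual setup uses:

```shell
# Pull the production sqlite file down, then run the same
# container image locally against it
scp myserver:/srv/blog/data/blog.db ./data/blog.db
docker run --rm -p 8080:8080 \
  -v "$PWD/data:/app/data" \
  my-blog-image:latest
```

Because the whole persistent state is one sqlite file, copying it plus re-running the image reproduces the site anywhere.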
Once you design your database server/application/transactions such that they don't care whatsoever if the database server goes offline, you not only get the ability to run fine in containers, you also make your application stack significantly more robust for all sorts of reasons.
I have colleagues who bemoan a major application server going down, which is a no-op in my world: ephemeral nodes rarely last more than a day or two in GKE, so we're completely used to them going down.
Building in transactions, idempotency, and tasks in RabbitMQ/Celery rather than in memory not only makes deploying in containers straightforward, it also gives an overall robustness win for all applications.
I mean, physical servers go down as well.
I'm familiar with the topics around HA and how you could ignore individual hosts going down, but the whole database?
Except, the reality is it's Frontend->Backend->DB->Storage. To some extent, the DB is just a standardized (and very quick) caching/sorting mechanism in front of an SSD, HDD, network drive, whatever. Storage can be separate and persistent, even if the DB is in a container and gets killed/dies.
If that's the case, then even if the DB is in an ephemeral container the backing storage should always survive. With it, you can bring up a new DB container with all the same data being available.
So yes, your passwords etc. are in the "DB", but really they're in storage on disk. Yeah, your app won't work well if the DB disappears, but that's a given no matter whether your DB is in a container or somewhere else. Once it's back up, everything should work fine.
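A quick way to see this for yourself, assuming Docker — volume/container names and the password here are arbitrary:

```shell
# Data outlives the container when storage is separate.
docker volume create pgdata
docker run -d --name db1 \
  -e POSTGRES_PASSWORD=secret \
  -v pgdata:/var/lib/postgresql/data \
  postgres:16

# ...write some data, then simulate the container dying:
docker rm -f db1

# A brand-new container attached to the same volume comes up
# with all of the old data intact:
docker run -d --name db2 \
  -e POSTGRES_PASSWORD=secret \
  -v pgdata:/var/lib/postgresql/data \
  postgres:16
```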
So - the user will see a "Pause" when using the application if the database is down - but that's it.
The great thing about architecting your application stack in such a way that every one of the servers/applications/databases can disappear at a moment's notice is that your application becomes super robust to hardware failure, because having a server go "down" is Business As Usual in Kubernetes.