Generally, though, in production you're not going to be taking down DBs on purpose. If it's not supposed to be ephemeral, it doesn't fit the model.
Anecdotally, keeping stateful components outside of K8s makes running your cluster and application so much simpler, and much easier to maintain and troubleshoot. The cost is added configuration friction, though, so often you don't want to do it for your ephemeral deployments (e.g. dev environments, integrated test runners, temporary staging instances).
You can use tools like kustomize to keep your configuration as clean as possible for each deployment type. Only bring in the configurations for the stateful services when needed.
I feel like this is the "right" way for smaller teams to do K8s, assuming it's already a good fit for the application.
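To make the kustomize idea above a bit more concrete from the application's side, here is a minimal sketch, assuming the app reads its database location from environment variables that each overlay sets differently (for example through a patch or a generated ConfigMap). Every variable name and default below is hypothetical, not something from the thread.

```python
import os


def database_dsn(env=None):
    """Build a Postgres DSN from environment variables set by the deployment.

    All variable names and defaults here are hypothetical. The password is
    left out on purpose; it would normally come from a secret.
    """
    env = os.environ if env is None else env
    host = env.get("DB_HOST", "postgres")  # assumed in-cluster Service name
    port = env.get("DB_PORT", "5432")
    name = env.get("DB_NAME", "app")
    user = env.get("DB_USER", "app")
    sslmode = env.get("DB_SSLMODE", "disable")
    return f"postgresql://{user}@{host}:{port}/{name}?sslmode={sslmode}"
```

With no overrides, an ephemeral deployment falls back to the in-cluster defaults. A production overlay would only need to set `DB_HOST` (and probably `DB_SSLMODE`) to point at the external database, so the application image itself stays the same.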
Has that changed? (It may well have, but once burned, twice shy and all that.)
I've never had a problem with Postgres in either Docker or k8s. Docker Compose local volumes and k8s persistent volume claims work really well. But I'm no veteran at this, so I can only speak from the little time I've used them.
The whole reason I do this is that it lets you put your entire stack in one config, and then spin up a local dev environment or deploy to a remote one with zero changes. And that's really magical.
In production I don't use an in-cluster Postgres, and going without one is a bit of a pain in the ass. I would rather use an in-cluster service, but the arguments you hear about being responsible in the event of a failure, and about the vendor taking on the work for HA and such, seem hard to refute.
You could probably run Postgres in production k8s and be fine, though. If I knew what I was doing I likely wouldn't be against it.
Why oh why did I ever leave silicon...
Here’s a different point to think about: is your use of Postgres resilient to network failures, communication errors or one-off issues? Sometimes you have to design for this at the application layer and assume things will go wrong some of the time...
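One common way to design for this at the application layer is to retry transient failures with exponential backoff. The sketch below is only an illustration: `TransientDBError` is a stand-in for whatever connection or timeout exception your driver raises, and `with_retries` and `make_flaky_query` are hypothetical helpers, not part of any Postgres library.

```python
import random
import time


class TransientDBError(Exception):
    """Stand-in for a driver's connection/timeout error (hypothetical)."""


def with_retries(operation, attempts=5, base_delay=0.1, max_delay=2.0,
                 sleep=time.sleep):
    """Run operation(), retrying on TransientDBError with backoff and jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientDBError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the caller
            # Double the delay on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


def make_flaky_query(failures):
    """Build a fake query that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def query():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientDBError("connection reset")
        return "row"

    return query, state
```

For example, `with_retries(query)` with a query built by `make_flaky_query(2)` succeeds on the third call. Blind retries like this are only safe for idempotent operations; a write that may have partly succeeded needs more care.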
As with anything, it could vary with your particular workload... but if I knew my very-stable-yet-cloud-hosted copy of Postgres wasn’t configured for high availability, well, you might get local performance and no update lag, but you'd also carry a lot of risk of downtime and data loss if it goes down or gets corrupted. The advantage of cloud storage is that you don't have to replay as much WAL: you just reattach the disk from before the instance went down, start up as if PG had just crashed, and keep going... even regular disks have failures, after all...
A container is just a collection of namespaces.
We transitioned to k8s, with PG and other data stores in cluster, specifically RabbitMQ and Mongo, which runs surprisingly well in k8s. In any case, after the whole adoption period and a great deal of automation work against the k8s APIs, we were able to get new dev environment provisioning down to 90 seconds.
There was clearly some pent-up demand for development resources, as we went from a few dev environments to roughly 30 in one month's time.
Following that, the team added the ability to "clone" any environment, including production ones; that is, the whole data set and configuration were replicated into a new environment. One could also replicate the incoming data stream into this new environment, essentially giving you an identical instance of a production service with live incoming data.
This was a huge benefit for development and testing, and it further drove demand for environments. If a customer had a bug or an issue, a developer could fire up a new environment with a fix branch, test the fix on the same data and config, and then commit that back to master, from where it made its way into production.
These are the benefits of running data stores governed by one's primary orchestration framework. Sounds less threatening when put that way, eh?