
Use for: Stateless applications (mostly web applications).

Ban for: Databases.




I know that banning databases from k8s is the popular point of view now, but Google has been running databases in containers for nearly a decade. YouTube runs on top of http://vitess.io/, which is a cloud native approach to MySQL.

Certainly approach with caution, but there's no reason for a blanket dismissal of dbs in containers.


I think it's even simpler:

1. Know what it takes to run a database (including storage, backup, upgrade, lifecycle, failure modes)

2. Know how a containerized cluster manager manages processes, storage, lifecycle, and failure modes

If you know both of those, running databases on Kubernetes (can't speak for swarm or mesos) is not hard or surprising, and you can get the benefits of both. If you don't know both of those in detail, you don't have any business running databases in containers.

The intersection of folks who know both is still small. And the risk of problems when you understand only one is still high.


>"If you know both of those, running databases on Kubernetes (can't speak for swarm or mesos) is not hard or surprising,"

Are you speaking from experience when you say it is not hard? Could you elaborate on what databases you are currently running on Kubernetes and how they are configured? Also, are these in production?

If I know number 1 and number 2, does that mean that I automatically understand all of the potential failure modes I might experience from combining 1 and 2? I certainly wouldn't think so.


I'm one of the engineers on OpenShift, and there have been different production databases (SQL and NoSQL alike) running on OpenShift in very large companies for almost 2 years now, as well as many databases in staging and test configurations.

Your point about 1/2 is fair, I was trying to convey that Kube follows certain rules w.r.t. process termination, storage, and safety that can be relied on when you internalize them. What's lacking today is the single doc that walks people through the tradeoffs and is easily approachable (although the stateful set docs do a pretty good job of it). In addition, we've made increasing effort at ensuring that behavior is predictable (why StatefulSets exist, and the changes in 1.5 to ensure terminating pods remain even if the node goes down).

Storage continues to be the most important part of stateful apps in general. On AWS/GCE/Azure you get safe semantics for fencing storage (as long as you don't bend the rules). On metal you'll need a lot more care - the variety of NAS storage comes with lots of tradeoffs, and safe use assumes a level of sophistication that I wouldn't expect unless folks have made an investment in storage infrastructure. I expect that to continue to improve, with things like Ceph and Gluster's direct integrations, VMware storage, and NetApp and other serious NFS integrations.

And it's always possible to treat nodes like pets on the cloud and leverage their local storage if you have good backups - at scale that can be fairly effective, but for one-off DBs, RDS, Aurora, and the like are hard to beat.


I am not very clear on the differences between running Kubernetes via OpenShift vs. on metal or a cloud provider. I even just looked at the RH page and it still wasn't that clear to me. Can you elaborate? Is there a different story for stateful things like running datastores on K8s + OpenShift?


>"I know that banning databases from k8s is the popular point of view now, but Google has been running databases in containers for nearly a decade."

But containerizing a workload isn't the same thing as handing it off to a cluster scheduler to manage. Google hasn't been running databases via K8s for nearly a decade. Who knows how Borg handles volume management internally at Google. I realize K8s has foundations in Borg, but it's still not apples to apples, I don't think.


FYI: https://en.m.wikipedia.org/wiki/Google_File_System

GFS (now Colossus) is not mounted as a legacy volume, but instead is accessed via a userspace library.


Google has never run any database in Docker.

They use internal proprietary technology that doesn't have the same characteristics and flaws as Docker.


Parent didn't say Docker, fwiw.


Uber runs Cassandra in Mesos:

http://highscalability.com/blog/2016/9/28/how-uber-manages-a...

Seems to work pretty well. DCOS has lots of database options.


What about running the DB process(es) in a container mounting non-containerized storage from outside? All the "state" is then external to the container? This is very do-able with Docker and it's where my thoughts are heading for a "best of both worlds".


That's what you do. If you're on GCP or AWS, you typically mount a network volume (Persistent Disk on GCP, EBS on AWS).

Kubernetes ensures that the container always has this volume mounted, and of course only one container at a time can claim the volume for itself.

What you should avoid is using a host mount and pinning a pod to a specific node, because then that pod can only run on that node, and you have no way of migrating without manually moving the mount and unpinning the pod. With Kubernetes, you really want to avoid thinking about nodes at all. State follows pods around; pods don't follow state around.
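
To make that concrete, here's a minimal sketch of the pattern (the names, image, sizes, and reliance on a default StorageClass are my assumptions, not anything from the parent, and I'm writing against the apps/v1 API group; adjust the apiVersion for older clusters):

    # Claim a network-backed volume; the cluster's default StorageClass
    # dynamically provisions a Persistent Disk / EBS volume behind it.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pg-data
    spec:
      accessModes: ["ReadWriteOnce"]   # block storage: attached to one node at a time
      resources:
        requests:
          storage: 100Gi
    ---
    # Single-replica Deployment; the pod mounts the claim wherever it lands.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: postgres
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: postgres
      template:
        metadata:
          labels:
            app: postgres
        spec:
          containers:
          - name: postgres
            image: postgres:9.6
            env:
            - name: POSTGRES_PASSWORD
              value: changeme                          # use a Secret for anything real
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata   # subdir avoids the volume's lost+found
            volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: pg-data

If the node dies, the pod gets rescheduled elsewhere, the volume is detached and reattached to the new node, and the database comes back with its data. No hostPath, no node pinning.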


You also have the option of using local storage and combining it with a consensus protocol to keep the data distributed. You can actually achieve better durability than mainframes.

Spanner uses cross-datacenter Paxos. Your data won't be lost even if an entire datacenter goes dark.

For Vitess (http://vitess.io), we use semi-sync replication that always ensures that at least one other machine has the data.


In K8s that's called "Stateful Sets" (beta in v1.5).

https://kubernetes.io/docs/concepts/abstractions/controllers...


You can solve this without StatefulSets, it's just a manual process, and it requires that you (1) don't use a replication controller (edit: rather, you use a controller with "replicas: 1"), and (2) can ensure that a single pod claims the data volume.

StatefulSets are more geared towards apps that manage their own redundancy, such as Cassandra or Aerospike, where adding another instance is a matter of just starting it. One of the things a StatefulSet permits is preserving the network identity of a pod. For example, if you wanted to deploy Cassandra without StatefulSets, you'd deploy each instance as a separate Deployment + Service pair, called, let's say, cassandra-1, cassandra-2 and so on. You would not be able to use Kubernetes' tooling to scale the cluster. Each instance would use a persistent volume, so effectively it would be almost exactly like a StatefulSet, except Kubernetes would not be handling the pod replication.

In the case of something like Postgres, you'd probably not get any benefit from using a StatefulSet for the master (since only one instance can run), but you can use a StatefulSet to run read-only replicas.
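
For what it's worth, the Cassandra shape looks roughly like this (a sketch only; the headless Service, image tag, and sizes are my own picks, and the apiVersion may need adjusting for older clusters):

    # Headless Service gives each pod a stable DNS name:
    # cassandra-0.cassandra, cassandra-1.cassandra, ...
    apiVersion: v1
    kind: Service
    metadata:
      name: cassandra
    spec:
      clusterIP: None
      selector:
        app: cassandra
      ports:
      - name: cql
        port: 9042
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: cassandra
    spec:
      serviceName: cassandra
      replicas: 3
      selector:
        matchLabels:
          app: cassandra
      template:
        metadata:
          labels:
            app: cassandra
        spec:
          containers:
          - name: cassandra
            image: cassandra:3.11
            ports:
            - containerPort: 9042
            volumeMounts:
            - name: data
              mountPath: /var/lib/cassandra
      volumeClaimTemplates:        # one PVC per pod: data-cassandra-0, data-cassandra-1, ...
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi

Scaling the ring is then a single "kubectl scale statefulset cassandra --replicas=5", with each new pod getting its own claim and a stable identity, which is exactly what you give up with the Deployment-per-instance approach.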


Without Stateful Sets, or a replication controller, doesn't it mean that if my host dies, the DB won't get started anywhere? Who's managing where that pod should spawn next (with its associated storage)?


Right, I simplified a bit there: You would use a replication controller, but it would have "replicas: 1". This way, the pod is rescheduled and the service repointed, and the volume management ensures that the database gets mounted.


Got it, thanks!


Note that replicas: 1 does not actually guarantee "at most 1". If you have block storage with locks (AWS/GCE/Ceph/Cinder), then the second replica won't start until the first is gone. If you try to use "replicas: 1" with a shared filesystem, you can have 2 pods running against that filesystem at once.

StatefulSets guarantee "at most one".


That's solved by setting "strategy.type" to "Recreate", isn't it? This will disable rolling deploys. The replication controller wouldn't then attempt to have two pods running at the same time.
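
For reference, the relevant bit of the Deployment spec would just be (a sketch, field names per the apps/v1 API):

    spec:
      replicas: 1
      strategy:
        type: Recreate   # tear the old pod down before creating the new one

With the default RollingUpdate strategy the old and new pod can briefly overlap, which is exactly what you don't want against a single ReadWriteOnce volume.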


Yep. Even works in Docker Swarm (1.12 and up) via volumes.


From what I can tell this is largely outdated advice. We've been running multiple 800 GB Postgres instances in Docker for over a year now, with not a single problem. Not one.


Would be an interesting write-up. How do you handle upgrades? What is the reason for Docker in your case? Are you managing the individual containers manually or using some tool?


It is very up-to-date advice.

Just because you were lucky to not experience massive issues doesn't mean they aren't present.


In what way is it up-to-date? From what I can tell, the majority of horror stories were related to running "outdated" Linux kernels, which Ubuntu 16.04 fixed for us (we purposefully only started using Docker with the beta version of Ubuntu 16.04, hence a year).

In fact, running the Postgres instances in isolation has given us far more confidence than if they were run "natively". Backing up Docker instances is trivially easy in comparison to running native instances, as you already know what data volumes you need to back up. All our instances use exactly the same backup and restoration script. All our instances get rolled into staging using the same script on a daily basis. No failures so far. Zero.

Would be interested in actual "up-to-date" reasons, other than "Docker's engineering department is not dependable", which, btw, I can empathise with if you were burned in the past.


You're the perfect example of the problem.

The typical dev who thinks running on a beta version of Ubuntu is the norm and calls anything else "outdated".

Yes, Docker may be up to your standards.

No, Docker is not up to the standards of real businesses, who use stable OSes and sometimes even paid support for them.



