> Kubernetes is designed specifically for stateless applications
Is that actually true? The Borg paper[1] mentions that Google uses it to run stateful things like Bigtable and GFS nodes, and as far as I can tell, Kubernetes has the exact same goals as Borg.[2]
I don't think there's a direct quote of the k8s devs saying "this is designed for stateless applications" (although there very well may be), but it's pretty evident if you've been following the project.
A blog post on the official site, about 9 months old, whose very title expresses surprise that k8s might be reasonable for stateful applications: "Stateful Applications in Containers!? Kubernetes 1.3 Says “Yes!”" [0]
k8s's "StatefulSets", which emerged in 1.5 (4 months ago, Dec 2016) and were transitioned from the previous feature "PetSets", are still marked a "beta" feature in the newest release (1.6, made a couple of weeks ago). [1]
In Nov. 2016, CoreOS published a blog post with the subheading "Stateless is Easy, Stateful is Hard" while introducing a new pattern called "operators", which are intended to help Kubernetes better handle stateful applications (since the habit of k8s admins is to terminate pods casually). They're complicated to implement and haven't been widely adopted: basically, you register a third-party resource type, have your operator watch for pod deletions, and intercept those events to make sure everything is shut down in the correct order and with ample opportunity to save state. [2]
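For concreteness, here's roughly what a minimal StatefulSet manifest (from the item above) looks like against the apps/v1beta1 API that shipped with 1.5/1.6; the names and sizes are made up. The volumeClaimTemplates block is the part that's supposed to let the data outlive any individual pod:

    apiVersion: apps/v1beta1        # the beta API discussed above
    kind: StatefulSet
    metadata:
      name: pg                      # hypothetical name
    spec:
      serviceName: pg               # headless Service giving each pod a stable DNS identity
      replicas: 1
      template:
        metadata:
          labels:
            app: pg
        spec:
          containers:
          - name: postgres
            image: postgres:9.6
            ports:
            - containerPort: 5432
            volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumeClaimTemplates:         # one PersistentVolumeClaim per pod, kept across pod restarts
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi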
-----
This is all just the k8s level -- let's not forget that the container engine underlying k8s throws away the container's writable layer (i.e. all changes made on top of the image) by default, and has similarly weird issues with stateful applications. Accidentally remove the wrong container on the docker CLI? You have quite the problem now.
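To spell out what that default means: anything the process writes outside an explicitly mounted volume lives in the container's writable layer and vanishes when the container is removed. A hypothetical docker-compose sketch (made-up names) of the opt-in you have to remember:

    # docker-compose.yml (hypothetical): only the named volume below survives container removal
    version: "2"
    services:
      db:
        image: postgres:9.6
        volumes:
          - pgdata:/var/lib/postgresql/data   # data lives outside the container's writable layer
    volumes:
      pgdata: {}                              # anything written elsewhere is gone with the container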
And on top of ALL THAT, let's also not forget that these databases are designed to sit on whole boxes: consume almost all of the memory, have kernel-level parameters tweaked to their liking, and so forth.
Though databases can cohabit with other workloads if forced, production deployments almost always put them on dedicated hardware, for reasons of performance, reliability, and data safety.
Oh, just use Rook, yet another distributed filesystem, you say? They have a nice k8s operator? Please no.
What's the reason to risk what is presumably important data this way? Just so you can say "Hello LITERALLY EVERYTHING runs in k8s now, can I have an award?"
You wanna do it for dev data or something that doesn't matter, go nuts. But please do not do this for something important. The thought that there are people out there doing this to themselves right now, all to gratify their own vanity by being part of the fad despite their obligations to their colleagues and customers, is so depressing.
It's the container-shaped VC bonfire at work: hype new things to drive adoption.
K8s and Docker Swarm/Compose seem to be trying to replace VMs entirely as a general-purpose compute abstraction. Google certainly uses containers that way, though arguably they're in a unique situation. Mesos is doing it too, in a somewhat different way.
Given that, I'd say stateful workloads are one of the main features k8s and container-platform vendors promote, even though support for them isn't at all baked yet. Most stateless cloud platforms like Heroku or Cloud Foundry have tended to shy away from that, for good reason. And I can't count how many unrealistic and unreliable "run this distributed database... on a single docker host! Slowly!" examples are out there. "12-factor apps" is now used as a disparaging term for "you can't even persist, bro" rather than something to aspire to, which to me seems seriously misguided.
Disclaimer: I work for a software company that competes in this crazy cloud world.
Hey man, I get that you need something NOW and I am sorry about that, but I have to say this is a teeeeensy bit over the top.
Yeah, StatefulSet is still beta. We're getting miles on it before we tell people that we 100% back it. But you know what? People ARE using it. In production. With real data. And they mostly are just fine.
I run a (tiny) database against k8s. I trust it with my own data.
What you say "the container engine underlying k8s will delete all changes made to the image" is true - don't write files straight to your image FS! This is containers 101. We have PersistentVolumes for this very reason. Data that has a lifetime of its own.
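A minimal sketch of what that looks like, with made-up names and sizes; the claim gets bound to real storage by whatever provisioner your cluster runs, and the data sticks around no matter what happens to the pod:

    # A claim for storage whose lifetime is independent of any pod
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pg-data                 # hypothetical name
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
    ---
    # A pod that mounts the claim instead of writing to its image filesystem
    apiVersion: v1
    kind: Pod
    metadata:
      name: pg
    spec:
      containers:
      - name: postgres
        image: postgres:9.6
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: pg-data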
Does this absolve you from backups? No. Does it mean you don't need to think about upgrades? Hell no, you still have to know what your apps are up to.
Nobody is making people use containers for databases, but the power of systems like Kubernetes is pretty addictive, and a lot of people are pouring a lot of energy into this problem.
I understand that k8s is experimental and young -- and IMO that makes it exactly the wrong kind of thing to run production workloads on, Google association notwithstanding. None of this is meant to attack Kubernetes for what it is; it's meant to highlight the absurdity of how it's being used and promoted.
>What you say "the container engine underlying k8s will delete all changes made to the image" is true - don't write files straight to your image FS! This is containers 101. We have PersistentVolumes for this very reason. Data that has a lifetime of its own.
Yeah, I'm not disputing this. It's just the most immediate and shocking example of how Docker can be tricky for stateful apps. On most systems, if your program writes something to the filesystem, it's expected to persist; any possibility of losing files/data is normally opt-in (writing to a temp folder). Are you sure you've covered every nook and cranny where your program expects to read from or write to disk, set all of your symlinks up right, etc.? Why deal with this?
>Does this absolve you from backups? No. Does it mean you don't need to think about upgrades? Hell no, you still have to know what your apps are up to.
The problem with every buzzword or hyped-up piece of software is that everyone just assumes it has magical powers. They have to get some mileage on it to realize that while it may offer some improvements, we still live in the real world.
I know the Kubernetes authors are aware of this, but I wish more Kubernetes users were.
>Nobody is making people use containers for databases, but the power of systems like Kubernetes is pretty addictive, and a lot of people are pouring a lot of energy into this problem.
But why? Kubernetes is cool, but it doesn't seem any more "addictive" than what I had before, which effectively did the same thing: a script that spun up an instance from an image, connected to it with Ansible, provisioned everything automatically, and turned it on, plus similar scripts that let me view the state of my other instances and apply transformations to them. You can argue that they're different scripts and k8s is one program, but it's really just a difference in invocation.
Obviously I know that Kubernetes operates on containers and not VM instances, but in terms of how it affects our daily lives, k8s is, more or less, just another interface into management/automation technology that we've had since virtualization went mainstream.
What are the truly innovative or unique things it brings to the table? It mostly seems to consolidate these management things into one binary, which, don't get me wrong, is a totally fine thing to do. But it doesn't sound like it would give one "addictive powers".
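To make the comparison concrete: the k8s version of "keep N copies of this image running" is a manifest you hand to the API server instead of a script you run. A rough sketch, with made-up names, against the extensions/v1beta1 Deployment API of that era:

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: myapp                   # hypothetical app
    spec:
      replicas: 3                   # the controller keeps three copies running
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
          - name: myapp
            image: registry.example.com/myapp:1.0   # hypothetical image
            ports:
            - containerPort: 8080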
I don't understand why someone would take a square peg and pound it through a round hole. Database servers are designed for dedicated machines: they want all the RAM, they want all the CPU, they want kernel parameters tuned to their liking, sometimes even specific kernel versions (which you can't change from inside a container). In production, if you have any significant amount of traffic, you want to give the database what it wants.
So what practical value do I get by putting PgSQL in a container and/or in Kubernetes? It just sounds like a massive headache for no real benefit.
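And even when people do put a database in k8s, they tend to end up rebuilding the dedicated box inside it: pin the pod to a node set aside for it and request essentially the whole machine, roughly like the sketch below (labels and numbers are made up), with the kernel tuning still done on the node itself.

    apiVersion: v1
    kind: Pod
    metadata:
      name: postgres
    spec:
      nodeSelector:
        role: db                    # hypothetical label on a node reserved for the database
      containers:
      - name: postgres
        image: postgres:9.6
        resources:
          requests:                 # effectively claim the whole machine
            cpu: "15"
            memory: "56Gi"
          limits:
            cpu: "15"
            memory: "56Gi"
    # kernel parameters and kernel version still come from the node, not the container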
If they're expressing surprise at being able to run stateful applications, developing complex new patterns, and shipping multiple iterations of experimental features to support this, doesn't that show that it wasn't designed for it?
>But your previous comment implied that Kubernetes was designed inherently for stateless tasks only.
I would argue that there are things that make conventional databases an inherently bad fit for containerization, but I didn't say k8s was designed specifically to preclude the possibility of ever running any type of stateful app.
"Omega, an offspring of Borg stored the state of the cluster in a centralized Paxos-based transaction-oriented store that was accessed by the different parts of the cluster control plane (such as schedulers), using optimistic concurrency control to handle the occasional conflicts"
Seems like Google never stored state within the containers.
I would love to see a source for your claim.
[1] https://pdos.csail.mit.edu/6.824/papers/borg.pdf
[2] http://queue.acm.org/detail.cfm?id=2898444