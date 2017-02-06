
Additionally, Docker is pretty handy when you're managing clusters of thousands of nodes. At that scale, enforcing best practices, automating workflows, scaling teams, auditing, and preventing configuration drift are much bigger problems than a single server failing.
> I’ve seen DBMS containers running on the same host as service-layer containers. But these service layers are not compatible in terms of hardware requirements.
Some things have a "fixed" size plus temporary, workload-dependent growth (like most application-layer processes). Other things take up all the space available to them (like DBMSes). The latter are what resource quotas are for.
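For the "take up all the space available" kind of workload, a quota only helps if the process sizes itself to the quota rather than to the whole host. A minimal sketch in Python, assuming a Linux container that exposes its memory limit through the usual cgroup v2 (`memory.max`) or cgroup v1 (`memory.limit_in_bytes`) files; `cache_budget` and its 0.75 default fraction are hypothetical choices for illustration, not any DBMS's actual tuning logic.

```python
import os
from typing import Iterable, Optional

# Typical cgroup locations as seen from inside a container (assumed paths;
# the exact mount point can differ between hosts and runtimes).
CGROUP_V2_LIMIT = "/sys/fs/cgroup/memory.max"
CGROUP_V1_LIMIT = "/sys/fs/cgroup/memory/memory.limit_in_bytes"


def read_limit(path: str) -> Optional[int]:
    """Return the memory limit in bytes from a cgroup file, or None if the
    file is missing or reports no limit ("max" in cgroup v2)."""
    try:
        with open(path) as f:
            raw = f.read().strip()
    except OSError:
        return None
    if raw == "max":
        return None
    return int(raw)


def host_memory_bytes() -> int:
    """Physical memory of the whole machine (Unix-like systems only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def cache_budget(
    fraction: float = 0.75,
    paths: Iterable[str] = (CGROUP_V2_LIMIT, CGROUP_V1_LIMIT),
    host_total: Optional[int] = None,
) -> int:
    """Size a 'use everything available' cache from the container's quota,
    falling back to host memory when no quota is set."""
    total = host_total if host_total is not None else host_memory_bytes()
    for path in paths:
        limit = read_limit(path)
        if limit is not None:
            # cgroup v1 reports "unlimited" as a huge number, so min() copes.
            total = min(total, limit)
            break
    return int(total * fraction)
```

Passing `paths` and `host_total` explicitly lets you exercise the logic outside a container; inside one, calling `cache_budget()` with the defaults picks up whichever quota file exists.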
Containers are not meant to be treated like "Unix binaries, but easier to deploy." Containers are just lightweight VMs that don't have to do screwy things with memory balloon drivers to efficiently pack many of those "fixed plus temp growth" workloads onto a host. But containers also support the "all the space available" use case via quotas, which effectively turn containers into regular VMs. (They even cost about the same; quota tracking is expensive!)
> Putting your database inside the container, you’re going to waste your project’s budget. Why? Because you’re giving a lot of extra resources to a single instance, and it gets out of control. In the cloud case you have to launch an instance with 64GB of memory when you need 34. In practice, some of these resources will stay unused.
If you're designing "instances" and running dedicated workloads on them, you're very likely "doing containers wrong." (This is probably a provocative statement; stay with me.)
Containers are to container hosts as VMs are to hypervisors: in both cases, their architecture assumes that if you want resource-efficient deployment, you've got a big generic cluster of hosts, and your guests are loaded onto them using a bin-packing algorithm (taking into account which guests need what extra resources that are only available on certain hosts, etc.)
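To make the bin-packing idea concrete, here is a minimal sketch using first-fit decreasing, one simple packing heuristic (real schedulers have their own placement strategies). It assumes memory is the only resource that matters; `Host` and `first_fit_decreasing` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Workload = Tuple[str, int]  # (name, memory in GB)


@dataclass
class Host:
    """A container host (or hypervisor) with a fixed memory capacity in GB."""
    name: str
    capacity_gb: int
    guests: List[Workload] = field(default_factory=list)

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - sum(mem for _, mem in self.guests)


def first_fit_decreasing(workloads: List[Workload], hosts: List[Host]) -> List[Workload]:
    """Place workloads largest-first onto the first host with room for them.

    Returns the workloads that did not fit anywhere.
    """
    unplaced: List[Workload] = []
    for name, mem in sorted(workloads, key=lambda w: w[1], reverse=True):
        for host in hosts:
            if host.free_gb >= mem:
                host.guests.append((name, mem))
                break
        else:
            # No host had enough free memory for this guest.
            unplaced.append((name, mem))
    return unplaced
```

With two 64 GB hosts and hypothetical workloads of 34 GB (the database), 12, 8, 6 and 4 GB, everything lands on the first host and the second stays empty: the 30 GB that a dedicated 64 GB database instance would leave idle is absorbed by other guests.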
If you don't have a big generic cluster of hosts, your packing options will necessarily be sub-optimal. If your container hosts are real hardware, you're out of luck; if your container hosts are themselves VMs running on some cloud provider, the costs will heavily favor taking advantage of the cloud provider's bin-packing by wrapping each of your containers in a separate VM and deploying those VMs.
(Which is, coincidentally, what Amazon's Elastic Beanstalk does for you, and why it's not the same as Amazon ECS. ECS is for setting up your own "big generic cluster" of container hosts to bin-pack across; Elastic Beanstalk is for wrapping containers in VMs so that AWS will bin-pack at their abstraction level.)
Is that actually the case? Is there a serious risk that a database will be corrupted by a container crash, as the article claims? A regular crash of the computer should not be able to corrupt a database; is a container more dangerous in this regard?
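Database crash safety generally rests on mechanisms provided by the kernel and filesystem under the data volume, which a containerized process uses like any other process: data only counts as written once it has been fsync'd, and updates are arranged so a crash leaves either the old state or the new one. A minimal sketch of that principle for a single file, using only Python's standard library and assuming a POSIX filesystem; real databases typically use a write-ahead log rather than whole-file replacement, so this illustrates the idea, not how any particular DBMS works. `atomic_write` is a hypothetical helper name.

```python
import contextlib
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data` so that a crash leaves either
    the old contents or the new contents, never a torn mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory (same filesystem),
    # so the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the bytes to stable storage
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself survives a power loss.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```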
I have run database clusters on Kubernetes in production without running into this particular problem.
The current state of container orchestrators for running databases is not optimal because, unlike with stateless applications, one size does not fit all database types.
One solution to this problem is CoreOS operators, which introduce third-party resources into Kubernetes that are specific to a database type and contain the logic to manage that database type on Kubernetes.
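The operator pattern boils down to a control loop: read the desired state the user declared, observe what is actually running, and take whatever actions close the gap. A toy sketch of one reconcile pass in Python; `ClusterSpec`, `Member`, and `reconcile` are hypothetical names, not part of Kubernetes or the CoreOS operator code, and a real operator would read and write state through the Kubernetes API and the database itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterSpec:
    """Desired state, as a user might declare it in a custom resource."""
    replicas: int
    version: str


@dataclass
class Member:
    """Observed state of one running database member."""
    name: str
    version: str
    healthy: bool


def reconcile(spec: ClusterSpec, members: List[Member]) -> List[str]:
    """Compare desired and observed state and return the actions to take.

    An operator runs this kind of function in a loop; the database-specific
    knowledge lives in which actions are safe and in what order.
    """
    actions: List[str] = []
    healthy = [m for m in members if m.healthy]

    # 1. Drop failed members (a real operator would also remove them from
    #    the database's own membership list).
    actions += [f"remove {m.name}" for m in members if not m.healthy]

    # 2. Scale toward the desired member count.
    if len(healthy) < spec.replicas:
        actions += [f"add member ({spec.version})"] * (spec.replicas - len(healthy))
    elif len(healthy) > spec.replicas:
        actions += [f"remove {m.name}" for m in healthy[spec.replicas:]]

    # 3. Only when membership is settled, upgrade one outdated member per
    #    pass to avoid taking several members down at once.
    outdated = [m for m in healthy if m.version != spec.version]
    if not actions and outdated:
        actions.append(f"upgrade {outdated[0].name} to {spec.version}")
    return actions
```

The ordering in step 3 is where a database-specific operator earns its keep: a generic controller would happily restart every outdated member at once.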
Corruption occurs on data drives even without Docker, so you still have to plan for it. This is why you enable replication. This is why you snapshot/back up your data daily and have disaster recovery plans.
There are some major reasons why I actually think running databases in Docker containers is a good idea, even if you are mounting a volume for the data:
1) Development environments can be similar to production. This ensures everyone runs the same version that is running in prod.
2) You don't have to worry as much about what is installed on the host machine.
3) In a clustered setup, it's easier to ensure each node is running the same configuration, version, etc.
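With every node built from the same image, "same configuration, version, etc." becomes something you can check mechanically. A minimal sketch, assuming each node's image reference and config have already been collected into a dict (how you gather them is out of scope); `fingerprint`, `drifted_nodes`, and the majority-vote baseline are illustrative choices, not a Docker feature.

```python
import hashlib
import json
from collections import Counter
from typing import Dict, List


def fingerprint(image: str, config: dict) -> str:
    """Stable hash of an image reference plus its configuration.

    sort_keys makes the hash independent of dict key order.
    """
    payload = json.dumps({"image": image, "config": config}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def drifted_nodes(nodes: Dict[str, dict]) -> List[str]:
    """Return the names of nodes whose fingerprint differs from the most
    common one across the cluster."""
    if not nodes:
        return []
    prints = {name: fingerprint(n["image"], n["config"]) for name, n in nodes.items()}
    baseline, _ = Counter(prints.values()).most_common(1)[0]
    return sorted(name for name, fp in prints.items() if fp != baseline)
```

For example, three nodes where two run `postgres:9.6` and one still runs `postgres:9.5` with otherwise identical config would report only the third node.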
One of my issues with all the gripes about Docker is the assertion that it causes problems. In all my time using Docker, 99% of the issues I've run into had nothing to do with Docker itself. Everyone loves to blame it when things go wrong, though.
This article doesn't really back up any of its claims; it just makes blanket statements. Don't like Docker's networking? Use host networking then.
What people don't think about are the countless issues that never come up when using containerization. I never have to worry about whether Python 2.7 is installed on a server I'm going to deploy a Python 3 app on. I also have MUCH higher confidence that if things work in my local development env (which runs the same containers), there is a high chance they will work in production.
YMMV
However, if you are going to be running Docker on real tin (because that's where the value/speed comes in; if you're on AWS, that's a whole 'nother issue), then you might as well use device mapper for what it was originally designed for: mapping Fibre Channel (or iSCSI, or SAS [another SCSI]).
That assumes you want speed and have paid enough to overcome the SPoF in your storage layer (it'll be cheaper and faster than trying to software your way out of it).