I've run extensive benchmarks of Hadoop/HBase in Docker containers, and there is no performance difference. There is no stability difference either (oh, a node might crash? Welcome to something that happens every day across a 300-machine cluster).
Any clustered database setup should recover from failed nodes. Any regular relational database should be pretty close to automated failover with replicated backups and an alert email. Containerization doesn't make this better or worse, but it helps a lot with testing and deployment.
That said, we currently use docker for some of our production databases, mainly for almost-idle services (mongodb for graylog, zookeeper for kafka), but I have had no problem using it for some moderately sized services with a couple thousand writes per second on redis/kafka (which is nothing for them).
We're still using non-containerised versions of the databases that need dedicated bare metal servers, mostly because I don't see the risk/benefit being worth it, but I'd love to hear someone's war stories about running larger-scale databases in docker.
For development, I don't think there's anything better for databases: it beats manual setup, vagrant boxes, and shared development servers by a long shot. I feel that educating everyone on your team on how to use it is well worth the investment. docker-compose makes setting up even a fairly complicated development environment a breeze.
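For what it's worth, here's a minimal sketch of what that looks like; the services and versions are invented for the example, not something from this thread:

```
# Write a compose file describing the dev stack, then bring it all up at once.
cat > docker-compose.yml <<'EOF'
version: '2'
services:
  db:
    image: postgres:9.6
    environment:
      POSTGRES_PASSWORD: devpassword
    ports:
      - "5432:5432"
  cache:
    image: redis:3.2
EOF

docker-compose up -d   # one command, whole environment
```

New team members run one command instead of following a multi-page setup doc, and everyone gets the same database versions.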
I've been wondering this for a while. I'm sure some of the big players do it, but I'd really like to see a case study from one of them.
> Any regular relational database should be pretty close to automated failover
In my experience, most people who work with mysql would not enable automated failover. And I believe the concerns in the article are valid and important if you're considering running mysql in a container.
Edit: Though I do think the article conflates containers with things like kubernetes or mesos in an awkward way. The good arguments are really about running relational dbs in containers on some sort of cluster orchestration system.
At the end of the day, the data and the database run on some production instance on a bare hypervisor at TopGuy (TM) cloud provider. This is enough for everyone to feel more or less good about their situation.
Or do you mean that Docker etc. are as reliable as bare metal?
Heck, you can go to Docker's webpage (https://www.docker.com/what-docker). Notice that while they claim that Docker is:
1. More lightweight
2. 'Open'/run anywhere
They don't say 'more reliable'.
I use Docker for systems I design; I also recognize current container technologies have their limitations. It is my job to know and avoid these pitfalls.
A lot of issues that people encounter with Docker specifically (such as volume management) disappear if you run Kubernetes, simply by ignoring what Docker does and doing something sane instead.
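As an illustration of the "something sane" for volumes (a sketch, assuming your cluster has dynamic provisioning set up; all the names here are made up): instead of hand-managing docker volumes on individual hosts, you declare a claim and let the cluster provision and attach the storage:

```
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
EOF
# A pod then mounts the claim by name; the storage follows the pod
# if it gets rescheduled onto another node.
```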
> Or do you mean that Docker etc. are as reliable as bare metal?
This doesn't really mean anything. Docker the product has a lot of issues, sure. Container technology in general? No. Where do you draw the line here? Is a chroot 'as reliable as bare metal'? At what point is a container not running on bare metal anymore?
As you said, a lot of problems would disappear if people used Kubernetes instead of Docker. At the same time, a lot of replication problems would disappear if people used PostgreSQL instead of MySQL. My point is, when a novice mixes immature technology with immature technology, they are going to have more issues than necessary.
Unless you've gone out of your way, Docker (like other Linux container systems) is just namespacing your process. There's no extra abstraction layer; it's just a more restricted execution environment.
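You can see this for yourself from a shell (a hypothetical demo; the container name is made up):

```
# The container has the same namespace files, just different inode numbers:
ls -l /proc/self/ns/                           # namespaces of your shell
docker run --rm alpine ls -l /proc/self/ns/    # namespaces inside a container

# From the host, the containerized process is an ordinary PID:
docker run -d --name demo alpine sleep 300
ps -o pid,comm -p "$(docker inspect -f '{{.State.Pid}}' demo)"
docker rm -f demo
```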
Docker is ok when it works and hell when it doesn't: it has lots of bugs and regularly regresses. I don't understand why you'd run production infrastructure on it rather than on any of the alternatives.
Run like this, there is no disk or network performance difference between running the db process directly on the host or via a Docker container.
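One plausible reading of "run like this" (my interpretation, not spelled out above): host networking plus a bind-mounted data directory, so the db process hits the host's filesystem and network stack directly, with no NAT, overlay network, or volume driver in the path. The image and paths are illustrative:

```
docker run -d --name pg \
  --net=host \
  -e POSTGRES_PASSWORD=secret \
  -v /srv/pgdata:/var/lib/postgresql/data \
  postgres:9.6
# --net=host skips docker's userland proxy and NAT entirely,
# and the bind mount writes straight to the host's disk.
```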
As others have mentioned, this is a really poor article.
Because the way to change anything in a container is to kill it and restart it. That's a fundamental difference compared to managing/maintaining a database not in a container.
The fact that `docker kill` defaults to SIGKILL instead of SIGTERM is unfortunate, and something one should be aware of before deploying a process with docker, but again, it does not make the process running within the container inherently less reliable.
edit: Looks like `docker stop` does the right-ish thing: it sends a SIGTERM, then only resorts to SIGKILL after a timeout has expired.
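Concretely (the container name "mydb" is made up for the example): the timeout before SIGKILL defaults to 10 seconds, which can be far too short for a database that needs to flush and shut down cleanly, so it's worth raising it:

```
docker stop -t 120 mydb              # SIGTERM, then SIGKILL after 120s
docker kill --signal=SIGTERM mydb    # or send the exact signal yourself
```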
Did you publish your findings anywhere?