
This article is terrible. It's a lot of wishy-washy explanations devoid of technical detail - because there isn't a technical explanation or justification for this list.

I've run extensive benchmarks of Hadoop/HBase in Docker containers, and there is no performance difference. There is no stability difference (oh, a node might crash? Welcome to something that happens every day across a 300-machine cluster).

Any clustered database setup should recover from failed nodes. Any regular relational database should be pretty close to automated failover with replicated backups and an alert email. Containerization doesn't make this better or worse, but it helps a lot with testing and deployment.




While I agree with you, I'd like to caution some users about rushing into dockerising everything in their production environment. If your environment setup is not repeatable and you don't have your configuration management under control, then you have other problems, and using Docker is just going to add another layer of abstraction on top of a mess that your DBA doesn't know how to deal with when things hit the fan. In particular, I can imagine an improper understanding of Docker volumes biting some people, but Docker also has some questionable defaults for networking (userland proxy, rewriting iptables).
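For instance, both of those defaults can be changed in the daemon config - a rough sketch from memory, and note that turning off iptables management means you take over NAT/forwarding yourself:

    $ cat /etc/docker/daemon.json
    {
      "userland-proxy": false,
      "iptables": false
    }
    $ sudo systemctl restart docker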

That being said, we currently use Docker for some of our production databases, mainly for almost-idle services (MongoDB for Graylog, ZooKeeper for Kafka), but I have had no problem using them for some moderately sized services with a couple thousand writes per second on Redis/Kafka (which is nothing for them).

We're still using non-containerised versions of the databases that need dedicated bare-metal servers, mostly because I don't see the risk/benefit being worth it, but I'd love to hear someone's war stories about running larger-scale databases in Docker.

For development, I don't think there's anything better for databases: it beats manual setup, Vagrant boxes, and shared development servers by a long shot. I feel that educating everyone on your team in how to use it is well worth the investment. docker-compose makes setting up even a fairly complicated development environment a breeze - something like the sketch below.
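A minimal sketch of what I mean (service names and versions are just examples):

    $ cat docker-compose.yml
    version: "2"
    services:
      postgres:
        image: postgres:9.6
        ports:
          - "5432:5432"
      redis:
        image: redis:3.2
        ports:
          - "6379:6379"
    $ docker-compose up -d    # whole dev environment in one command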


Yeah, volumes skip unionfs. This article is full of FUD. The author demonstrates they don't really have enough experience to make these claims. I wonder if Google has database nodes in containers? Kubernetes is adding the features for stateful containers now; I think it's stable now.


> I wonder if google has database nodes in containers?

I've been wondering this for a while. I'm sure some of the big players do it, but I'd really like to see a case study from one of them.


YouTube runs MySQL on Borg too, and they open-sourced their management solution: http://vitess.io


Google runs MySQL on Borg internally.


Same for Bigtable and Spanner.


I assume these guys have their own network controllers and kickass fiber-optic links. Network-attached storage in poor cloud environments leads to issues.


Yes, StatefulSets became beta in k8s 1.5. It's very much a win to run your test suite against a recent (within seconds) production database container that was spun up by CI. Yes, you can do this with VMs, but that would take 30 seconds :-)
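Roughly what that CI step can look like - the registry path, image name, and test runner here are made up for illustration:

    # spin up last night's snapshot image and point the tests at it
    $ docker run -d --name ci-db registry.example.com/prod-db-snapshot:latest
    $ ./run_test_suite --db-host "$(docker inspect -f '{{ .NetworkSettings.IPAddress }}' ci-db)"
    $ docker rm -f ci-db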


> I wonder if google has database nodes in containers?

Yes


Kubernetes


Google doesn't use Kubernetes internally (except for their Google Cloud hosted Kubernetes offering, GKE).


Amazon has fully managed database containers: https://aws.amazon.com/rds/


Those aren't containers, they're VMs.


Hadoop and HBase are very different from MySQL. They run on YARN and are designed for containers.

> Any regular relational database should be pretty close to automated failover

In my experience, most people who work with MySQL would not enable automated failover. And I believe the concerns in the article are valid and important if you're considering a container for MySQL.

Edit: Though I do think it conflates containers and things like Kubernetes or Mesos in an awkward way. The good arguments are more about running relational DBs in containers on some sort of cluster orchestration system.


I've run Oracle in a container, ugh what a pig, though it can be done. It's great for development, since you can checkpoint state, pass it around and have 99 containers of bugs on the ground.

At the end of the day, the data and database are run on some production instance on a fully bare hypervisor at a TopGuy (TM) cloud provider. This is enough so that everyone feels more or less good about their situation.


> I've run Oracle in a container

Danger, danger

https://www.theregister.co.uk/2016/02/24/oracle_vmware_licen...


For me, alarm bells were already ringing at "I've run Oracle" ;)



I'm going to have to agree with the other reply. One hand doesn't know what the other is doing, and the team that does audits to prop up revenue isn't going to care about that blog post unless there's legal language in the license that allows you to bypass licensing restrictions. That said, there are free versions of Oracle's databases for development, and they may have exceptions for development purposes, so if that's what someone is using containers for, it might not be the end of the world.


Yep, licensed Oracle instance for production and docker for developers. If you're the sort to run Oracle, the expectation is that you're going to be paying.


Just because they write a blog post on something does not mean they won't sue you for using it (and then offer to drop the suit if you agree to buy a cloud license).


Are you telling me that you'd trust MySQL auto-replication failover enough to have it activate multiple times per day (even with Percona)? On a busy cluster with, say, 300GB?


Seems like a straw man. Why would a process running within a container fail over more frequently than a process running directly on bare metal? That seems like more of a resource/process-scheduling issue than anything to do with containers.


Leaving aside that there's now yet another abstraction layer to have bugs, it is not a straw man to claim that current container technology is not as reliable as bare metal.

Or do you mean that Docker etc. are as reliable as bare metal?

Heck, you can go on Docker's webpage (https://www.docker.com/what-docker). Notice that while they make the claims that Docker is:

1. More lightweight

2. 'Open'/run anywhere

3. Secure

They don't say 'more reliable'.

I use Docker for systems I design; I also recognize current container technologies have their limitations. It is my job to know and avoid these pitfalls.


This conversation should distinguish between Docker as a product and container technology in general.

A lot of issues that people encounter with Docker specifically disappear if you run Kubernetes (such as volume management), simply by ignoring what Docker does and doing something sane instead.

> Or do you mean that Docker etc. are as reliable as bare metal?

This doesn't really mean anything. Docker the product has a lot of issues, sure. Container technology in general? No. Where do you draw the line here? Is a chroot 'as reliable as bare metal'? At what point is a container not running on bare metal anymore?


Of course namespaces in the Linux kernel are very mature. But if that's all people used, there wouldn't be a need for Docker and its extra features - people would still be using LXC (no disrespect to LXC). People have to evaluate the software as a whole, instead of just looking at the core technology. I personally feel that Docker is still unproven in terms of maturity. Stateless? Hell yes. Stateful apps? Well...

As you said, a lot of problems would disappear if people used Kubernetes instead of plain Docker. At the same time, a lot of replication problems would disappear if people used PostgreSQL instead of MySQL. My point is, when a novice mixes immature technology with immature technology, he is going to have more issues than necessary.


At no time was the claim made that containers were more reliable than any other method of running a process, only that running a process within a container is not inherently less reliable than running it uncontained.

Unless you've gone out of your way, Docker (like other Linux container systems) is just namespacing your process. There's no extra abstraction layer; it's just a more restricted execution environment.


I've had numerous performance and stability issues (not just in the containers, but also affecting the main cgroup / "host" rather badly) with Docker, but never with LXC, which, according to you, would be pretty much the same thing - "just namespacing your process". But it isn't.

Docker is ok when it works, hell when it doesn't, has lots of bugs and regularly regresses. I don't understand why you'd run production infrastructure on that and not on any of the alternatives.


FYI: The networks and the disks are entirely abstracted, with multiple extremely complex abstraction layers.


Not necessarily. You could use --net=host and -v /db_data:/db_data (the equivalent of `sudo mount --bind /db_data /the/containers/root/fs/dir/db_data`).
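Something like this - a sketch only, and the image and target path are just an example for a Postgres data directory:

    $ docker run -d --net=host \
        -v /db_data:/var/lib/postgresql/data \
        postgres:9.6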

Run like this, there is no disk or network performance difference between running the db process directly on the host or via a Docker container.

As others have mentioned, this is a really poor article.


Volume mounts inside containers bypass unionfs. There isn't anything inherently different about using a partition from a container or from the host.


> Why is a process running within a container failing over more frequently than a process running directly on bare metal

Because the way to change anything in a container is to kill it and restart it. That's a fundamental difference compared to managing/maintaining a database not in a container.


Unless you've written very poorly behaving software, you kill it by sending it a SIGTERM, and waiting for it to exit. This is true of software both within and outside of containers.

The fact `docker kill` defaults to using SIGKILL instead of SIGTERM is unfortunate, and something one should be aware of before deploying a process with docker, but again, this does not make the process running within the container inherently less reliable.

edit: Looks like `docker stop` does the right-ish thing -- sends a SIGTERM, then only resorts to SIGKILL after a timeout has expired.


Also worth noting that the timeout is a configurable parameter with `docker stop`.
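e.g. (from memory, container name made up):

    # give the database up to two minutes to shut down cleanly before SIGKILL
    $ docker stop --time 120 mydb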


You don't have to operate your container like that. If you need to push configuration to it, you can make it writeable.
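A sketch of what I mean - container name and paths are made up, and whether a plain HUP is enough to pick up the change depends on the daemon:

    # copy a new config into the running container and ask the daemon to reload
    $ docker cp my.cnf mydb:/etc/mysql/conf.d/override.cnf
    $ docker kill --signal=HUP mydb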


Hi XorNot,

Did you publish your findings anywhere?


Sadly no (and I'd have to clear it with my employer). We're going to be doing some much larger scale testing in the next few months (going from 4-5 nodes to 18) in preparation for the docker rollout on said cluster.


Exactly, and thank you.



