Hacker News

Running production databases in Docker last year: https://thehftguy.com/2016/11/01/docker-in-production-an-his...

Performance issues should be the least of your concerns. The Docker daemon and containers simply hung because of filesystem issues on CentOS 6.

I worked at a company that was dockerizing its stateless services and planning to dockerize its Cassandra databases next. Multiple contractors were involved.

Stateless services failed periodically because of the above issue. Load balancers fail over automatically and broken nodes are rebooted from time to time, so the impact was limited. No one cared; it was just part of the daily deployment routine.

I feared the day the Cassandra dockerization would happen. They'd have lost their entire customer dataset (hundreds of millions of customers) the moment two nodes failed simultaneously, which happened a lot on the stateless services.

Thankfully the project never started, and the company didn't go bankrupt. I'm pretty sure employees moved around and the plans got canceled.

Expect a lot of instability in Docker around filesystems, performance, and race conditions. Low-volume stateless web servers rarely trigger these issues, but databases do.




I can't possibly hope to change your mind, but stability issues with the union filesystem drivers in Docker (part of which was not even Docker's problem) and persistent volumes in Kubernetes are two very different things. Cassandra running standalone on the host (and crashing) is no different from Cassandra crashing while running on a PV inside a container.

Moreover, most Linux distros have switched to overlay2 as the default storage driver. If you are running the latest version of RHEL/CentOS/Fedora/Ubuntu, that is most likely the driver you are using.
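As a quick check (assuming a running Docker daemon), `docker info` reports which storage driver is in use:

```shell
# Print only the storage driver name (e.g. overlay2, devicemapper, aufs)
docker info --format '{{.Driver}}'

# Or show the full storage section, including the backing filesystem
docker info | grep -A 3 'Storage Driver'
```

On an up-to-date distro this should report overlay2; seeing devicemapper or aufs is a sign you're on the older, more troublesome drivers discussed in this thread.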


Don't get me wrong: I know it's not a bug in Kubernetes, it's a bug in the filesystem. Kubernetes is only as stable as its weakest part, and the weakest part is the container engine (Docker and what's underneath).

Containers require volumes/filesystems to run and some implementations are buggy as fuck.

Docker abandoned CentOS 6 years ago, whether or not they stated it officially; the last Docker package and the kernel/drivers there are unstable. Similar story on some other distributions.

It wasn't production-ready at all back then, and it's still not a good idea to containerize databases now. Besides bugs that come and go, there are other challenges around lifecycle, performance, and permissions that are not trivial to deal with.


>"I can't possibly hope to change your mind but stability issues with union filesystem driver in docker(part of it was not even docker's problem)"

Can you outline what those stability issues are/were? Was the non-Docker part of the problem kernel-related? Genuinely curious.


See RHEL and Debian sections: https://thehftguy.com/2017/02/23/docker-in-production-an-upd...

The filesystem drivers are buggy as fuck. You would experience kernel panics on Debian Jessie (OverlayFS), or containers and the Docker daemon hanging on CentOS 6 (devicemapper). The fix in both cases is a reboot.

You might not notice it if you barely use Docker, but it becomes glaring at scale. I briefly consulted at a major web company that was deploying their web services to 5-20 nodes, daily. On every service deployment, up to 3 nodes would die.


For sure it is a very different thing. Local SSD or remote drive? That matters a lot for Cassandra.


Kubernetes supports local volumes. With GKE you get local SSDs.
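For context, on GKE local SSDs are attached when a cluster or node pool is created. A minimal sketch (the cluster and pool names here are made up for illustration):

```shell
# Create a node pool whose nodes each get one local SSD attached.
# The SSDs surface on the nodes and can be consumed by pods,
# e.g. through Kubernetes local PersistentVolumes.
gcloud container node-pools create db-pool \
  --cluster my-cluster \
  --local-ssd-count 1 \
  --machine-type n1-standard-8
```

Note local SSDs on GKE are ephemeral per node: if the node is recreated, the data is gone, so the database itself must handle replication (which Cassandra does).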


It doesn't make sense to use GKE for this. Eventually you will just have a bunch of VMs that run only your DB (since you need to avoid interference from other workloads), and there is no support for multi-DC mode... And what are the benefits? Restarting SQL or Cassandra is not a cheap operation and can cause large data migrations.


In the Cassandra case, you would not write the persistent data into the Docker image (that's the part of the file system mounted as a layered file system, using AUFS or OverlayFS). Instead, you would write it to a volume. For a local volume, that's just part of the "normal" file system (ext4, XFS, ...) exposed to the Docker container through a bind mount.

Volumes are quite stable and reliable when based on a stable file system.

So while you could lose the container due to the bug described, you would not lose the persistent data.

It's best practice not to write to the Docker image at all during runtime (no log files, no PID file, etc.), but to write only to volumes or tmpfs mounts. I'm a little suspicious about the crashes you described: are you sure you followed that best practice?
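A minimal sketch of that practice for Cassandra (assuming a running Docker daemon; the official `cassandra` image keeps its data under `/var/lib/cassandra`):

```shell
# Persistent data lives in a named volume on the host filesystem,
# scratch files go to a tmpfs mount; nothing is written into the
# image's layered filesystem at runtime.
docker volume create cassandra-data

docker run -d --name cassandra \
  -v cassandra-data:/var/lib/cassandra \
  --tmpfs /tmp \
  cassandra:3.11
```

With this layout, even if the container or daemon hangs and the node has to be rebooted, the data directory survives on the host and a new container can pick it up.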



