My initial comment below is a detailed response to the "Docker is terrible for performance" camp.
TLDR: "read the Docker manual before trying to benchmark it".
It looks like the guy didn't bother to mark /var/lib/postgres as a volume, which is the recommended way of running a database in Docker. It can be fixed with the following line in his Dockerfile:
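A sketch of what that line might look like (the exact path is an assumption; Debian-packaged Postgres keeps its data under /var/lib/postgresql, so adjust to match the image):

```dockerfile
# Hypothetical fix: declare the data directory as a volume so writes
# bypass the copy-on-write storage layer. Adjust the path to your image.
VOLUME /var/lib/postgresql/data
```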
Read the docs, fix your configuration, and try again. Hint: the IO overhead will be negligible. Also: it will end up being a benchmark of your underlying host storage, not of Docker.
What you -are- benchmarking is namespaces, cgroups, CoW filesystems and your underlying hardware.
As Solomon has already pointed out, as soon as you use a bind-mounted volume (in Docker parlance) you are simply benchmarking your host Linux kernel + host filesystem + host hardware, nothing more. I am unsure whether Docker configures the cgroup CFQ or blkio controllers yet, but that could also play a part.
The TLDR is this: Docker doesn't have an impact on performance because it's just glue code; none of Docker itself sits in any of the hot paths. You are better off benchmarking the individual components in the actual hot path. Also worth noting that compute benchmarks will be absolutely worthless, as they -will- be 1:1 native performance (because there are no additional moving parts).
Blog posts like this one don't start a discussion, don't help anyone, they just get used in random FUD discussions from any and all points of view.
Whether it's accurate or not, people associate Docker with the functionality it facilitates (i.e., cgroups, namespaces, etc.), so I think it's valid to show the performance impact of running an application from within Docker.
Based on my data, they're not 1:1 native performance. I suggest you consider the data before dismissing it.
I think you're conflating his separate points:
"What you -are- benchmarking is namespaces, cgroups, CoW filesystems and your underlying hardware."
"Also worth noting that compute benchmarks will be absolutely worthless as they -will- be 1:1 native performance"
I'm the author of the blog post.
I started the container with --volume in the case of "docker with no virtual I/O"
Which is the same as setting VOLUME in your Dockerfile.
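To illustrate the equivalence (the path is an example, not taken from the post):

```shell
# Declaring the volume at run time...
docker run -v /var/lib/postgresql/data postgres
# ...has the same effect as this instruction in the Dockerfile:
#   VOLUME /var/lib/postgresql/data
# Either way, writes to that path skip the CoW storage driver.
```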
I've updated the blog post to specify that since it was obviously unclear.
My point remains that Docker benchmarks are especially easy to spin any way you want, since usually they are really benchmarks of the underlying system configuration (with some parts glued together by Docker, but most parts outside of its control).
The numbers shown here are about what I would expect if something in the storage stack is doing more caching than normal. This likely has hidden costs, either in memory usage or robustness. It's worth digging down to find out what is happening, and how to validate Docker configurations.
Currently I build containers with supervisord as PID 1 and logrotate running inside the container, but with logs being saved to a bind-mounted volume.
Is this correct?
P.S. There are no docs, blog posts or articles on this topic. I'm a little puzzled as to whether people are just living with ephemeral logs.
It seems that the recently recommended setup is to create a container specifically to host volumes for both logs and other persistent data (i.e., database files). You then connect each container that needs to write to those volumes using the --volumes-from flag. This is explained in blog posts and included in the documentation.
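A minimal sketch of that pattern, with hypothetical container and image names:

```shell
# A dedicated container that only holds the volumes. It exits immediately,
# but the volumes persist for as long as the container exists.
docker run --name datastore \
    -v /var/log/app -v /var/lib/postgresql/data \
    busybox true

# Any container that needs those paths mounts them with --volumes-from:
docker run --volumes-from datastore --name app my-app-image
docker run --volumes-from datastore --name backup my-backup-image
```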
This stackoverflow has links to blogs and docs:
Other approaches are to have the app(s) in the container log to a VOLUME that is bind-mounted in from the underlying host (where you can access them directly); yet another approach is to bind mount syslog or other tools into the container and allow the process(es) inside the container to log to it. All work well.
Another option is syslog, which Postgres also supports, with which you would log to the host's syslog daemon. (You can mount the host's /dev/log if you don't want to deal with setting up the networking.)
Currently, I am bind mounting /dev/log, which means syslog is logging to the host system - but again, the same issues.
Also, processes in containers exist as processes on the host (they have different PIDs inside and outside the container due to PID namespacing), so logrotate should be able to send signals to containerized processes.
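For example (the container name is hypothetical), the host-side PID of a container's main process can be read with docker inspect, and a postrotate script on the host could signal it directly:

```shell
# Look up the container's main process as seen from the host...
PID=$(docker inspect --format '{{.State.Pid}}' my-postgres)
# ...and ask it to reopen its log files.
kill -HUP "$PID"
```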
I have rsyslog on the host and I'm sending logs to host syslog through bind mounting of /dev/log . Is this what you meant by sending out logs to host ? I'm having a lot of trouble figuring out how to do it any other way.
Could you also confirm that rsyslog on the host will be able to send appropriate signals to the container's syslog, which forwards them to the containerized process? I'm unable to find documentation for rsyslog (or syslog-ng) that describes this behavior, so I'm not sure.
SIGHUP is needed when processes (such as Postgres) write directly to log files. Those files stay open, and when you want to rotate one, you have to tell the process to close it so that it will start writing to a new file.
If you tell Postgres to log to rsyslog (or any other syslog daemon), the log data will be sent to rsyslog via UDP, TCP or a Unix socket. Postgres itself will not have any log files open, so there is no need to SIGHUP it.
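For reference, the relevant postgresql.conf settings look like this (the facility and ident values are examples, not requirements):

```
# postgresql.conf excerpt
log_destination = 'syslog'    # send log lines to the syslog daemon
syslog_facility = 'LOCAL0'    # example facility; pick one your daemon routes
syslog_ident    = 'postgres'  # tag prepended to the syslog messages
```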
You will have to SIGHUP rsyslog on the host system, though.
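On the host, logrotate can handle that; a sketch of a logrotate config, assuming rsyslog writes the Postgres logs to /var/log/postgres.log and keeps its PID in /var/run/rsyslogd.pid:

```
/var/log/postgres.log {
    daily
    rotate 7
    compress
    postrotate
        # Tell rsyslog to reopen its output files after rotation.
        kill -HUP "$(cat /var/run/rsyslogd.pid)"
    endscript
}
```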
Not by default:
# docker run -i -t ubuntu bash
root@48dbef418f4c:/# echo foo > /bar
# docker restart 48dbef418f4c
# docker attach 48dbef418f4c
root@48dbef418f4c:/# cat /bar
foo
The almost 50% more context switches for plain Postgres is very telling. If those had any disk implications, which is quite possible with only 512 MB of RAM, it could easily explain the discrepancy.
This benchmark is useless.
It's also the smallest instance type. If you had at least chosen the largest, you'd possibly have the entire box to yourself. With the smallest instance type, however, anyone with a larger instance than yours is going to steal cycles and I/O away from you the second they have any load.
What makes people think benchmarking on virtualized hardware is at all worthwhile? (Serious question) It's like writing a blog post about which way the wind was blowing for the last five minutes. This is all nonsense.
Also, this test wasn't meant to be scientific. I'm not selling docker, postgres or digital ocean, just satisfying a personal curiosity.
Benchmarks are touchy things in general, I understand that.
I published the results and I'd love to see someone publish conflicting results, or verify them. I've tried to be as transparent as possible.
It is not nonsense, because people really do run their applications (e.g., PostgreSQL) in virtualized environments these days (e.g., AWS, DO, Linode, Google Cloud, etc.). People still need to estimate the performance they will get in these situations, and these benchmarks are relevant to them.
Benchmarking is worthwhile. Blogging about any perceived objective results is indeed questionable.
The three Docker conditions were run with:
--volume /.../ --net=host
I'd say it's much more likely that the pre-configured Docker image Digital Ocean makes available runs on hardware that isn't as heavily used as a regular old Postgres configuration.
That is, there's more load in general on the Postgres machine than on the Docker machine; you don't get perfect isolation in those environments.
I could see it being possible that the pre-configured Digital Ocean machine somehow favors Docker processes over non-Docker ones.
First of all: Docker is -not- virtualisation. It's -containerisation-. Effectively namespacing and little else.
In Linux, namespaces are hierarchical: I can arrange them in a tree where, at the top, namespace A has two sub-namespaces B and C. A can see the process tables of B and C, but B and C can only see their own respective process tables.
This is called PID namespacing.
Full containerisation on Linux actually requires a bunch of different namespaces: first the PID namespaces above, but also UIDs, network (yes, in Docker your container has a fully namespaced network stack, including its own (multiple!) route tables and adapters) and also devices/filesystems etc.
The key takeaway here is that NOTHING is being virtualised; we are simply showing a section of the system to each process depending on its position in the namespace hierarchy. What this means is that Docker (well, really Linux; Docker is just glue that calls up this functionality) is just.. Linux. Nothing fancy and definitely nothing KVM/Xen/VMware would be able to influence.
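This is easy to see without Docker at all; unshare from util-linux calls the same kernel interfaces (requires root):

```shell
# Give a shell its own PID namespace. With --fork the child becomes
# PID 1 inside that namespace, and --mount-proc remounts /proc to match,
# so the shell sees itself as PID 1 and sees no other processes.
sudo unshare --pid --fork --mount-proc bash -c 'echo $$; ps ax'
```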
The droplet itself is virtualized.