
PostgreSQL Performance on Docker - porker
http://www.davidmkerr.com/2014/06/postgresql-performance-on-docker.html
======
shykes
EDIT: incredibly, I'm seeing people use this benchmark to argue both 1) that
Docker is bad for performance and 2) that Docker is magically faster (and
probably does something stupid to be faster). Talk about an illustration of
what's wrong with benchmarks! Neither of these statements is correct. It's
possible to configure Docker for negligible performance overhead. Docker
definitely does not do magic tricks to make things faster. Mostly it sets up
the storage backend of your choice and then gets out of the way. Whatever
you're seeing is probably a result of your particular host configuration
combined with your particular storage backend configuration.

My initial comment below is a detailed response to the "Docker is terrible for
performance" camp.

TLDR: "read the Docker manual before trying to benchmark it".

It looks like the guy didn't bother to mark /var/lib/postgresql as a volume,
which is the recommended way of running a database in Docker. It can be fixed
with the following line in his Dockerfile:

    VOLUME /var/lib/postgresql

This will make sure his persistent (and IO-critical) data is bind-mounted from
the underlying host filesystem, instead of depending on Docker's copy-on-write
backend (which could be btrfs, aufs or lvm/devmapper depending on his
configuration).

Read the docs, fix your configuration, and try again. Hint: the IO overhead
will be negligible. Also: it will end up being a benchmark of your underlying
host storage, not of Docker.
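
For anyone who wants to check which filesystem actually backs the data directory before benchmarking, here is a minimal sketch. It assumes a Linux host and the documented `/proc/self/mountinfo` layout; the `SAMPLE` text, the function names, and the `/some/host/dir` path are made up for illustration and are not taken from the benchmark.

```python
# Sketch only: find which mount (and filesystem type) backs a given path,
# using text in the /proc/self/mountinfo format. SAMPLE is hypothetical data.

SAMPLE = """\
21 1 0:19 / / rw,relatime - aufs none rw
35 21 253:1 /some/host/dir /var/lib/postgresql rw,relatime - ext4 /dev/vda1 rw,data=ordered
"""


def parse_mountinfo(text):
    """Return (mount_point, fs_type, source) tuples from mountinfo text."""
    mounts = []
    for line in text.splitlines():
        if " - " not in line:
            continue  # not a well-formed mountinfo line
        left, right = line.split(" - ", 1)
        mount_point = left.split()[4]          # fifth field is the mount point
        fs_type, source = right.split()[:2]    # first two fields after the separator
        mounts.append((mount_point, fs_type, source))
    return mounts


def backing_mount(path, mounts):
    """Pick the mount whose mount point is the longest prefix of path."""
    best = None
    wanted = path.rstrip("/") + "/"
    for mount in mounts:
        prefix = mount[0].rstrip("/") + "/"
        if wanted.startswith(prefix):
            # ">=" so that a later mount on the same point wins over an earlier one
            if best is None or len(mount[0]) >= len(best[0]):
                best = mount
    return best
```

Inside the container you would feed it the real file, e.g. `backing_mount("/var/lib/postgresql", parse_mountinfo(open("/proc/self/mountinfo").read()))`. If the filesystem type that comes back is the copy-on-write backend rather than the host filesystem, the volume is not in effect. Mount points containing spaces are escaped in the real file and are not handled here.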

~~~
jpgvm
It's worth mentioning when you benchmark Docker you are never really
benchmarking Docker unless what you are doing is measuring setup/teardown of
containers.

What you -are- benchmarking is namespaces, cgroups, CoW filesystems and your
underlying hardware.

As Solomon has already pointed out, as soon as you use a bind-mounted volume,
in Docker parlance, you are simply benchmarking your host Linux kernel + host
filesystem + host hardware, nothing more. I am unsure whether Docker is
configuring the cgroups cfq or blkio controllers yet, but that could also play
a part.

The TLDR is this: Docker doesn't have an impact on performance because it's
just glue code; none of Docker itself sits in any of the hot paths. You are
better off benchmarking the individual components in the actual hot path. Also
worth noting that compute benchmarks will be absolutely worthless as they
-will- be 1:1 native performance (because there are actually no additional
moving parts).

~~~
kerr23
Well, Docker is big news and people are interested in what the performance
impact of "using docker" is.

Whether it's accurate or not, people associate Docker with the functionality
it facilitates (i.e., cgroups, namespaces, etc.), so I think it's valid to
show the performance impact of running an application from within Docker.

Based on my data, they're not 1:1 native performance. I suggest you consider
the data before dismissing it.

~~~
coolj
> Based on my data, they're not 1:1 native performance. I suggest you consider
> the data before dismissing it.

I think you're conflating his separate points:

"What you -are- benchmarking is namespaces, cgroups, CoW filesystems and your
underlying hardware."

"Also worth noting that compute benchmarks will be absolutely worthless as
they -will- be 1:1 native performance"

------
casca
This kind of benchmarking should really be run on physical hardware where you
know that the underlying resources are not being switched out. The repeated
running somewhat mitigates this, but when you don't control the hardware,
decisions are being made that are beyond your visibility.

The almost 50% more context switches for normal Postgres is very telling. If
those had any disk implication, which is quite possible with the 512 MB of
RAM, it could easily explain the discrepancy.
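
One way to look at context switches per process instead of system-wide is the pair of counters in `/proc/<pid>/status`. A rough sketch, assuming Linux; the function names are mine, and nothing here says how the numbers in the post were collected:

```python
# Sketch only: read the voluntary / nonvoluntary context switch counters
# that Linux exposes in /proc/<pid>/status.

FIELDS = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")


def parse_ctxt_switches(status_text):
    """Extract the two context switch counters from status-file text."""
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in FIELDS:
            counts[key] = int(value.strip())
    return counts


def read_ctxt_switches(pid):
    """Read the counters for one live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_ctxt_switches(f.read())
```

Postgres runs one backend process per connection, so for a comparison you would sample every backend before and after a run and sum the differences. A jump in the nonvoluntary count would point at CPU contention rather than at the storage path.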

~~~
antocv
Indeed, this is benchmarking in an environment you don't control, which
probably runs other stuff on the side.

This benchmark is useless.

------
tbrock
You are running a benchmark in a virtualized environment with no guarantees
about access to the underlying hardware from one moment to the next.

It's also the smallest instance type. At least if you had chosen the largest
you'd possibly have the entire box. With the smallest instance type however
anyone with a larger instance than you is going to steal cycles and I/O away
the second they have any load.

What makes people think benchmarking on virtualized hardware is at all
worthwhile? (Serious question) It's like writing a blog post about which way
the wind was blowing for the last five minutes. This is all nonsense.

~~~
kerr23
I ran the benchmark multiple times and I got very consistent results. I know
that doesn't eliminate the VM factor, but it certainly minimizes it.

Also, this test wasn't meant to be scientific. I'm not selling docker,
postgres or digital ocean, just satisfying a personal curiosity.

Benchmarks are touchy things in general, I understand that.

I published the results and I'd love to see someone publish conflicting
results, or verify them. I've tried to be as transparent as possible.
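
For anyone trying to reproduce this, "consistent" can be put into a number by reporting the spread across repeated runs next to the mean. A small sketch; the function name and whatever scores you feed it are placeholders, not figures from the post:

```python
# Sketch only: summarize repeated benchmark scores so that run-to-run noise
# can be compared against the gap between two configurations.
import statistics


def summarize(scores):
    """Return mean, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores)  # needs at least two runs
    return {
        "mean": mean,
        "stdev": stdev,
        "cv_percent": 100.0 * stdev / mean,
    }
```

If the difference between the Docker and non-Docker means is not clearly larger than the spread within each condition, the comparison does not say much, on a VM or otherwise.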

------
themgt
If this is the Dockerfile [1] and he's not mounting a volume in, then he's
running the Postgres data dir on a btrfs snapshot vs. probably ext4 w/o Docker

[1]:
[https://bitbucket.org/davidkerr/docker/src/c23b2040321b65d38...](https://bitbucket.org/davidkerr/docker/src/c23b2040321b65d38665ede901b1be4edb1318c3/postgresql/9.3/ubuntu/Dockerfile?at=master)

~~~
kerr23
Mentioned above that I started the container with --volume.

The 3 Docker conditions were: no flags, --volume /.../, and --volume /.../
--net=host

------
felixgallo
I pray to god nobody uses docker and postgres or any other database until this
question is resolved. The most rational explanation for why postgres would be
'faster' under docker virtualization than it is raw-on-the-operating-system
would be that libcontainer/docker are skipping over some fundamental atomicity
guarantee.

~~~
jfoutz
I had a somewhat snarky answer earlier.

I'd say it's much more likely that the pre-configured Docker that DigitalOcean
makes available is on hardware that isn't as heavily used as a regular old
Postgres configuration.

That is, there's more load in general on the Postgres machine than on the
Docker machine; you don't get perfect isolation in those environments.

~~~
kerr23
I ran all of the benchmarks on the same machine.

I could see it being possible that the pre-configured DigitalOcean machine
favors Docker processes over non-Docker ones somehow.

------
ForHackernews
If this is running in a DigitalOcean droplet, isn't it already running inside
a virtualization layer? How valid are benchmarks in this circumstance?

~~~
chris_mahan
Could it be that the virtualization layer does something with docker?

~~~
jpgvm
No.

First of all, Docker is -not- virtualisation. It's -containerization-.
Effectively namespacing and little else.

In Linux, namespaces are hierarchical; I can arrange them in a tree, where at
the top namespace A can have 2 sub-namespaces B and C. A can see the process
table of B and C, but B and C can only see their respective process tables.
This is called PID namespacing.

Full containerisation on Linux actually requires a bunch of different
namespaces: firstly the PID namespaces above, but also UIDs, network (yes, in
Docker your container has a fully namespaced network stack, including its
own (multiple!) route tables and adapters), and also device/filesystem etc.

The key takeaway here is NOTHING is being virtualised; we are simply only
showing a section of the system to each process depending on its position in
the namespace hierarchy. What this means is Docker (well, really Linux; Docker
is just glue that calls this functionality up) is just.. Linux. Nothing fancy
and definitely nothing KVM/Xen/VMware would be able to influence.
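
You can see this from userspace: every process has symlinks under `/proc/<pid>/ns/`, and two processes are in the same namespace exactly when the link targets match. A minimal sketch, assuming Linux; the function names are mine, and reading another process's links usually needs the same user or root:

```python
# Sketch only: compare the namespaces of two processes via /proc/<pid>/ns/.
# Link targets look like "pid:[4026531836]".
import os
import re


def parse_ns_link(link):
    """Split a namespace link target into (kind, inode number)."""
    match = re.fullmatch(r"(\w+):\[(\d+)\]", link)
    if match is None:
        raise ValueError(f"unexpected namespace link: {link!r}")
    return match.group(1), int(match.group(2))


def namespace_id(pid, kind):
    """Return the namespace inode for a live process (Linux only)."""
    return parse_ns_link(os.readlink(f"/proc/{pid}/ns/{kind}"))[1]


def shares_namespace(pid_a, pid_b, kind):
    """True if both processes sit in the same namespace of the given kind."""
    return namespace_id(pid_a, kind) == namespace_id(pid_b, kind)
```

Run from the host, `shares_namespace(1, <container pid>, "pid")` should come back false for a containerized Postgres, while `"net"` should come back true if the container was started with --net=host.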

~~~
ticklemyelmo
> The key take away here is NOTHING is being virtualised

The droplet itself is virtualized.

~~~
jpgvm
Yes, I realize this; if you read my comment, I was addressing that KVM
couldn't have interfered with Docker if it tried.

------
keypusher
Until shown otherwise, I am partial to the idea that this is just a difference
between PG on Ubuntu vs PG on CentOS. It's possible docker is simply a red
herring, and that there are too many confounding variables for any of these
benchmarks to be meaningful. I would be interested to see more rigorous
benchmarks of virtualized environments though.

~~~
kerr23
I did a control benchmark between PG Ubuntu vs PG on CentOS, the results are
in the post.

------
scoopr
Just as an uninformed guess, wouldn't docker effectively be just moving the
postgres process to another cgroup task group, where it might be scheduled
more fairly when the benchmark process is hogging the system?
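
That guess is easy to check: `/proc/<pid>/cgroup` lists which cgroup a process belongs to for each controller. A sketch, assuming Linux and the documented `hierarchy-ID:controller-list:path` line format; the function name is mine and the container ID in the usage note is made up:

```python
# Sketch only: map each cgroup controller to the cgroup path a process is in,
# from text in the /proc/<pid>/cgroup format.


def parse_cgroup(text):
    """Return {controller: cgroup_path}; cgroup v2 shows up under the key ""."""
    membership = {}
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip anything that is not id:controllers:path
        _, controllers, path = parts
        for controller in controllers.split(","):
            membership[controller] = path
    return membership
```

For example, `parse_cgroup("4:cpu,cpuacct:/docker/abc123\n")` maps both `cpu` and `cpuacct` to `/docker/abc123`. If the Postgres backends and the benchmark client report different `cpu` paths in the Docker runs but the same one in the native runs, the scheduler is treating them as separate groups, which would fit this theory.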

