
Containers vs. Hypervisors: The Battle Has Just Begun - jonbaer
http://www.linux.com/news/enterprise/cloud-computing/785769-containers-vs-hypervisors-the-battle-has-just-begun/
======
jpgvm
This is at the core of the education issue with containerization.

By pushing wrappers like LXC and Docker we have been positioning containers as
a replacement for virtual machines, a "lightweight" VM if you will.

The issue is that this has led to a fundamental misunderstanding of how
containers work on Linux.

Other communities, like the BSD and Solaris communities, have understood
containerization for some time. Both BSD and Solaris have had network
virtualization/namespacing for significantly longer than Linux, and those
communities grew up using these facilities the way they were intended: as a
namespacing technology for processes.

Most people don't understand the underlying concepts that Docker and LXC rely
on, which leads to people asking questions like "What is the performance
impact of Docker?". If you knew that Docker was basically constructed on top
of the cgroup and namespace primitives, plus AUFS or Device Mapper, you would
probably not ask such a question, because the answer would already be obvious.
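You can see this for yourself without Docker installed at all: every process on a modern Linux kernel already lives in a set of namespaces, and `/proc` exposes them directly (a minimal sketch; exact entries vary by kernel version):

```shell
# A "container" is just a process whose namespace inodes differ from init's.
# Every process already has namespace memberships, visible under /proc:
ls -l /proc/self/ns
# Typical entries include: ipc, mnt, net, pid, user, uts
```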

For those that don't know, the overhead of cgroups and namespaces is mostly
constant (there are some exceptions, but I won't go into that); in fact all of
these systems are compiled in and operational in modern kernels even when the
cgroups fs is not mounted. As such the CPU overhead of running a process in
Docker should be close to nil.
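You can verify the "compiled in and operational" part directly, assuming a kernel built with cgroup support (which is the norm for distribution kernels):

```shell
# The cgroup subsystems are registered with the kernel whether or not anything
# mounts the cgroup filesystem; /proc/cgroups lists them:
cat /proc/cgroups
# Columns are: subsys_name, hierarchy, num_cgroups, enabled
```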

There are two other important factors, however: network and disk performance.
The Docker graph storage system relies on copy-on-write (CoW) semantics. Few
filesystems implement this and none of them are currently upstream; the
fastest and most robust is AUFS, which is the Docker default. AUFS does have
some overhead. It's by no means large, but I wouldn't go so far as to say it's
negligible either (especially when creating lots of files).

However, if you are running on a system that doesn't have AUFS available you
will be using the Docker Device Mapper graph driver. This driver does not
actually use a CoW filesystem; instead it uses a CoW block storage system,
Device Mapper thinp (thin provisioning). This has -many- more performance
implications. Namely, you can't utilize a block device directly: you need to
format it with a filesystem (Docker supports ext4 and XFS). Then, when you add
layers, it creates block-level overlays rather than file-level overlays like
AUFS does. The major difference here is the IO path, specifically in the case
where you are not using a dedicated DM pool for storage.

i.e. when using DM in its default loopback mode:

mounted layer -> dm-thinp device (funky stuff with thinp metadata device too)
-> loopback file -> host filesystem -> host block device
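To make that loopback indirection concrete: in default mode the thinp pool's data and metadata devices are backed by large sparse files rather than real block devices. A minimal sketch of what such a backing file looks like (`/tmp/pool-data` is an illustrative path, not Docker's actual location):

```shell
# Illustrative only: a sparse file like those the devicemapper driver uses as
# loopback backing store in its default (non-dedicated-pool) configuration.
truncate -s 100M /tmp/pool-data

stat -c 'apparent size: %s bytes' /tmp/pool-data  # what the loop device sees
du -k /tmp/pool-data                              # blocks actually allocated (~0)
```

Every write through the pool has to traverse the loop device into this file and then down through the host filesystem, which is where the extra latency comes from.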

I don't bring any of this up to bash Docker; I am just saying it's important
that you know how this stuff is constructed.

Now, all of that can be completely avoided if you use Docker volumes for all
IO-intensive workloads (and you definitely should do that), as they should
have close to no overhead: they are implemented as bind mounts. This is safe
due to the way filesystem namespacing works. It isn't quite as good as
Solaris zones, which use ZFS functionality to do all of this.
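As a sketch of the pattern (this assumes a running Docker daemon; `myorg/myapp` and `/srv/appdata` are placeholder names, not anything from a real deployment):

```shell
# Keep the write-heavy data directory on a volume (a bind mount from the
# host), so its IO bypasses the graph driver entirely.
docker run -d \
  -v /srv/appdata:/var/lib/app/data \
  myorg/myapp
```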

To quickly touch on networking: Docker relies on stateful iptables NAT. This
has significant performance implications and is something you should also
understand if you are using Docker in production.
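Two things worth knowing here, sketched below (both require root and a running Docker daemon; `myorg/myapp` is a placeholder image):

```shell
# Inspect the DNAT rules Docker installs for published ports:
iptables -t nat -L DOCKER -n

# For network-heavy services, skip the NAT path entirely by sharing the
# host's network namespace:
docker run -d --net=host myorg/myapp
```

With `--net=host` you give up network isolation, but the container's traffic no longer passes through connection tracking at all.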

You could argue that none of this matters for the way people are using
containers right now, and you would be right. But I love this technology and
I want it to be used in production and at scale. To do that we need to be
educating people on what this technology really is, how it works, and the
other ways you can put these building blocks together rather than just
mimicking full virtualization.

~~~
simoncion
Docker also has a Btrfs filesystem driver which makes use of Btrfs's snapshot
and CoW features. Btrfs has been in the mainline kernel for years, and I have
been using it for just about as long.

~~~
gizmo686
Btrfs is still considered experimental, but it certainly seems stable enough
for many uses.

~~~
simoncion
Yep. I've been using it on my desktop for a long time and my laptop for even
longer.

The primary reason for my comment to the OP was to remind him that there are
at least _three_ Docker storage systems, and that _two_ of them don't have any
substantial performance impact.

------
amenod
TL;DR: containers aren't (yet?) suitable for protecting the host from
privilege escalation.

In our case this is just fine... we are using Docker for app distribution (so
we don't have to rely on libraries installed on host systems but rather carry
our libraries within the container). So we are not replacing VMs with
containers, we are replacing _applications_ with containers.

------
MyDogHasFleas
Pavlicek's article takes the form of "Docker is a lightweight VM. It is not as
secure as hypervisor-based VMs. Hypervisor ecosystems can become, and are
becoming, more lightweight, and so maybe they'll win the battle."

Framing the discussion this way is of course a false dichotomy as jpgvm ably
points out.

However, I would not start out by pointing out the technical issues with the
article. I would start by viewing this article as competitive technical
marketing material, rather than as an attempt to have a serious technical
discussion about VMs and containers. Russell Pavlicek is the lead Xen
technical evangelist.

~~~
walterbell
Could you recommend an independent serious technical article which is written
by someone not involved in either the container or hypervisor community?

Edit: Previous comment from Docker maintainer,
[https://news.ycombinator.com/item?id=7910117](https://news.ycombinator.com/item?id=7910117)

 _" Hi all, I'm a maintainer of Docker. As others already indicated this
doesn't work on 1.0. But it could have. Please remember that at this time, we
don't claim Docker out-of-the-box is suitable for containing untrusted
programs with root privileges. So if you're thinking "pfew, good thing we
upgraded to 1.0 or we were toast", you need to change your underlying
configuration now. Add apparmor or selinux containment, map trust groups to
separate machines, or ideally don't grant root access to the application.
Docker will soon support user namespaces, which is a great additional security
layer but also not a silver bullet! When we feel comfortable saying that
Docker out-of-the-box can safely contain untrusted uid0 programs, we will say
so clearly."_

