Containers vs. Hypervisors: The Battle Has Just Begun (linux.com)
34 points by jonbaer on Aug 30, 2014 | 7 comments

TL;DR: containers aren't (yet?) suitable for protecting the host from privilege escalation.

In our case this is just fine: we use Docker for app distribution, so we don't have to rely on libraries installed on the host systems and instead carry our libraries within the container. So we are not replacing VMs with containers; we are using containers to package applications.

This is at the core of the education issue with containerisation.

By pushing wrappers like LXC and Docker, we have been positioning containers as a replacement for virtual machines, a "lightweight VM" if you will.

The issue is that this has led to a fundamental misunderstanding of how containers work on Linux.

Other communities, like BSD and Solaris, have understood containerization for some time. Both BSD and Solaris have had network virtualization/namespacing for significantly longer than Linux, and their communities grew up around using these features the way they were intended: as a namespacing technology for processes.

Most people don't understand the underlying concepts that Docker and LXC rely on, which leads to questions like "What is the performance impact of Docker?". If you knew that Docker is basically built on top of the cgroups and namespaces primitives plus AUFS or Device Mapper, you probably wouldn't ask, because the answer would already be obvious.

For those who don't know, the overhead of cgroups and namespaces is mostly constant (there are some exceptions, but I won't go into them); in fact, all of these subsystems are compiled in and operational in modern kernels even when the cgroup filesystem is not mounted. As such, the CPU overhead of running a process in Docker should be close to nil.
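
One way to see that these primitives are always present: every Linux process, containerized or not, has its namespaces listed under /proc/<pid>/ns and its cgroup membership listed in /proc/<pid>/cgroup, one `hierarchy-ID:controller-list:path` entry per line. A minimal Python sketch that parses that format; the function name and the sample lines in the usage are illustrative assumptions, not anything taken from Docker.

```python
import os
from typing import List, Tuple


def parse_proc_cgroup(text: str) -> List[Tuple[int, List[str], str]]:
    """Parse the contents of /proc/<pid>/cgroup.

    Each non-empty line looks like 'hierarchy-ID:controller-list:cgroup-path'.
    On the unified (v2) hierarchy the ID is 0 and the controller list is empty.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=2 keeps a path that contains ':' intact
        hier_id, controllers, path = line.split(":", 2)
        ctrl_list = controllers.split(",") if controllers else []
        entries.append((int(hier_id), ctrl_list, path))
    return entries


if __name__ == "__main__":
    # These entries exist for every process, whether or not it runs in a container.
    if os.path.exists("/proc/self/cgroup"):
        with open("/proc/self/cgroup") as f:
            for hier_id, ctrls, path in parse_proc_cgroup(f.read()):
                print(hier_id, ctrls or ["(unified)"], path)
    if os.path.isdir("/proc/self/ns"):
        print("namespaces:", sorted(os.listdir("/proc/self/ns")))
```

A line such as `4:cpu,cpuacct:/docker/abc123` (hypothetical container ID) parses to `(4, ['cpu', 'cpuacct'], '/docker/abc123')`; a container just sits at a different path in the same hierarchy.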

There are two other important factors, however: network and disk performance. The Docker graph storage system relies on copy-on-write (CoW) semantics. Few filesystems implement this and none of them are currently upstream; the fastest and most robust is AUFS, which is the Docker default. AUFS does have some overhead; it's by no means large, but I wouldn't go as far as to say it's negligible either (especially when creating lots of files).
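
The file-level CoW idea behind AUFS can be sketched as a stack of layers: reads search from the top layer down, and the first write to an existing file copies the whole file up into a single writable layer. The Python below is a toy in-memory model, assuming dict-based layers and a hypothetical `UnionView` class; real AUFS also handles directories, whiteouts for deleted files, and on-disk metadata.

```python
from typing import Dict, List, Optional


class UnionView:
    """Toy file-level copy-on-write union (the AUFS-style idea, not AUFS itself).

    image_layers are read-only and listed top-most first; every write goes to
    a separate writable top layer. Each layer maps a path to file contents.
    """

    def __init__(self, image_layers: List[Dict[str, bytes]]):
        self.top: Dict[str, bytes] = {}
        self.lower = image_layers  # searched top-down, never modified
        self.copy_ups = 0

    def read(self, path: str) -> Optional[bytes]:
        # Lookup walks the layers from the top down; the first hit wins.
        if path in self.top:
            return self.top[path]
        for layer in self.lower:
            if path in layer:
                return layer[path]
        return None

    def write(self, path: str, data: bytes, offset: int = 0) -> None:
        if path not in self.top:
            existing = self.read(path)
            if existing is not None:
                # Copy-up: the whole file lands in the writable layer
                # before it is modified, even for a one-byte change.
                self.copy_ups += 1
            self.top[path] = existing if existing is not None else b""
        buf = bytearray(self.top[path])
        end = offset + len(data)
        if end > len(buf):
            buf.extend(b"\0" * (end - len(buf)))
        buf[offset:end] = data
        self.top[path] = bytes(buf)
```

Because the granularity is the file, modifying one byte of a large file in a lower layer costs a full copy the first time, while files that are only read cost nothing beyond the layer lookup.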

However, if you are running on a system that doesn't have AUFS available, you will be using the Docker Device Mapper graph driver. This driver does not actually use a CoW filesystem; instead it uses a CoW block storage system called Device Mapper thinp (thin provisioning). This has many more performance implications. Namely, you can't use a block device directly; you need to format it with a filesystem (Docker supports ext4 and XFS). Then, when you add layers, it creates block-level overlays rather than file-level overlays like AUFS. The major difference here is the IO path, specifically when you are not using a dedicated DM pool for storage.
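
Block-level CoW works below the filesystem on fixed-size blocks: a snapshot starts out sharing every block with its origin, and a write to a shared block allocates a fresh block in the pool and remaps only that block. A toy Python model under stated assumptions: an in-memory pool, hypothetical `ThinPool`/`ThinDevice` names, tiny blocks, and no persisted metadata (real dm-thin keeps its mappings on a separate metadata device).

```python
from typing import Dict, List, Optional


class ThinPool:
    """Toy thin-provisioned pool: data blocks are allocated on demand."""

    def __init__(self, block_size: int = 4):
        self.block_size = block_size
        self.blocks: List[bytes] = []  # physical data blocks
        self.refcount: List[int] = []  # how many device mappings point at each block

    def allocate(self, data: bytes) -> int:
        self.blocks.append(data)
        self.refcount.append(1)
        return len(self.blocks) - 1


class ThinDevice:
    """A virtual block device: a map from virtual block number to pool block."""

    def __init__(self, pool: ThinPool, mapping: Optional[Dict[int, int]] = None):
        self.pool = pool
        self.mapping: Dict[int, int] = dict(mapping or {})

    def snapshot(self) -> "ThinDevice":
        # A snapshot copies only the mapping; the data blocks become shared.
        for phys in self.mapping.values():
            self.pool.refcount[phys] += 1
        return ThinDevice(self.pool, self.mapping)

    def read_block(self, vblock: int) -> bytes:
        phys = self.mapping.get(vblock)
        if phys is None:
            # Toy choice: unprovisioned blocks read back as zeros.
            return b"\0" * self.pool.block_size
        return self.pool.blocks[phys]

    def write_block(self, vblock: int, data: bytes) -> None:
        if len(data) != self.pool.block_size:
            raise ValueError("writes must be exactly one block")
        phys = self.mapping.get(vblock)
        if phys is not None and self.pool.refcount[phys] == 1:
            self.pool.blocks[phys] = data  # sole owner: overwrite in place
            return
        if phys is not None:
            self.pool.refcount[phys] -= 1  # break sharing with the other device
        # Unprovisioned or shared block: allocate a new one and remap it.
        self.mapping[vblock] = self.pool.allocate(data)
```

The filesystem on top (ext4 or XFS) knows nothing about this; it sees one block device, and every block it touches goes through the mapping lookup.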

i.e. when using DM in its default loopback mode, the IO path is:

mounted layer -> dm-thinp device (funky stuff with thinp metadata device too) -> loopback file -> host filesystem -> host block device

I don't bring any of this up to bash Docker, I am just saying it's important that you know how this stuff is constructed.

Now, all of that can be avoided entirely by using Docker volumes for all IO-intensive workloads (and you definitely should), since volumes are implemented as bind mounts and should have close to no overhead. This is safe because of the way filesystem namespacing works. It isn't quite as good as Solaris zones, which use ZFS functionality to do all of this.
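
For example, a database's data directory can be put on a volume so its writes bypass the graph driver entirely. A small Python helper that assembles such a `docker run` command line using the `-v host_path:container_path` bind-mount form; the helper name, image name, and paths are hypothetical placeholders.

```python
import shlex
from typing import Dict, List


def docker_run_args(image: str, volumes: Dict[str, str], detach: bool = True) -> List[str]:
    """Build a `docker run` argv that bind-mounts each host path into the container.

    volumes maps host_path -> container_path. IO under those container paths
    goes straight to the host filesystem instead of through AUFS/Device Mapper.
    """
    args = ["docker", "run"]
    if detach:
        args.append("-d")
    for host_path, container_path in sorted(volumes.items()):
        args += ["-v", f"{host_path}:{container_path}"]
    args.append(image)
    return args


if __name__ == "__main__":
    # Hypothetical image and paths: keep the database files on the host filesystem.
    argv = docker_run_args("example/db", {"/srv/db-data": "/var/lib/db"})
    print(shlex.join(argv))
    # To actually start it: subprocess.run(argv, check=True)
```

Only the image's read-only code and config then live on the CoW layers; the hot write path does not.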

To quickly touch on networking: Docker relies on stateful iptables NAT. This has significant performance implications and is something you should also understand if you use Docker in production.
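
"Stateful" means netfilter's connection tracking keeps an entry per flow: the NAT rules are evaluated for the first packet of a connection, and later packets in both directions are matched against the tracked entry and rewritten. The toy Python model below illustrates that per-connection state for a published port (DNAT from a host port to a container address); the class name, addresses, and ports are made up, and real conntrack also handles timeouts, TCP state, and table size limits.

```python
from typing import Dict, Optional, Tuple

# (protocol, src_ip, src_port, dst_ip, dst_port)
FiveTuple = Tuple[str, str, int, str, int]


class ToyNat:
    """Toy model of stateful DNAT: rules are consulted only for new flows."""

    def __init__(self, port_map: Dict[int, Tuple[str, int]]):
        # Published host port -> (container IP, container port).
        self.port_map = port_map
        # Tracked flows: original inbound tuple -> translated tuple.
        self.conntrack: Dict[FiveTuple, FiveTuple] = {}
        # Expected reply tuple -> tuple the client should see.
        self.replies: Dict[FiveTuple, FiveTuple] = {}
        self.rule_lookups = 0

    def inbound(self, pkt: FiveTuple) -> Optional[FiveTuple]:
        """Translate a packet arriving at the host; None if no rule matches."""
        if pkt in self.conntrack:
            return self.conntrack[pkt]  # established flow: no rule walk
        self.rule_lookups += 1  # new flow: evaluate the NAT rules
        proto, src_ip, src_port, dst_ip, dst_port = pkt
        target = self.port_map.get(dst_port)
        if target is None:
            return None
        ctr_ip, ctr_port = target
        translated = (proto, src_ip, src_port, ctr_ip, ctr_port)
        # Record state for both directions of this flow.
        self.conntrack[pkt] = translated
        self.replies[(proto, ctr_ip, ctr_port, src_ip, src_port)] = (
            proto, dst_ip, dst_port, src_ip, src_port)
        return translated

    def reply(self, pkt: FiveTuple) -> Optional[FiveTuple]:
        """Rewrite a container's reply so it appears to come from the host port."""
        return self.replies.get(pkt)
```

Every concurrent connection costs a table entry and every packet a lookup and rewrite, which is why NAT-heavy workloads deserve measurement before going to production.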

You could argue that none of this matters for the way people are using containers right now, and you would be right. But I love this technology and I want it to be used in production and at scale. To get there, we need to educate people on what this technology really is, how it works, and other ways to put these building blocks together rather than just mimicking full virtualization.

Docker also has a Btrfs storage driver, which makes use of Btrfs's snapshot and CoW features. Btrfs has been in the mainline kernel for years, and I have been using it for just about as long.

Btrfs is still considered experimental, but it certainly seems stable enough for many uses.

Yep. I've been using it on my desktop for a long time and my laptop for even longer.

The primary reason for my comment to the OP was to remind him that there are at least three Docker storage systems, and that two of them don't have any substantial performance impact.

Pavlicek's article takes the form of "Docker is a lightweight VM. It is not as secure as hypervisor-based VMs. Hypervisor ecosystems can get, and are getting, more lightweight, so maybe they'll win the battle."

Framing the discussion this way is of course a false dichotomy as jpgvm ably points out.

However, I would not start by pointing out the technical issues with the article. I would start by viewing it as competitive technical marketing material rather than as an attempt at a serious technical discussion about VMs and containers: Russell Pavlicek is the lead Xen technical evangelist.

Could you recommend an independent serious technical article which is written by someone not involved in either the container or hypervisor community?

Edit: here is a previous comment from a Docker maintainer (https://news.ycombinator.com/item?id=7910117):

"Hi all, I'm a maintainer of Docker. As others already indicated this doesn't work on 1.0. But it could have. Please remember that at this time, we don't claim Docker out-of-the-box is suitable for containing untrusted programs with root privileges. So if you're thinking "pfew, good thing we upgraded to 1.0 or we were toast", you need to change your underlying configuration now. Add apparmor or selinux containment, map trust groups to separate machines, or ideally don't grant root access to the application. Docker will soon support user namespaces, which is a great additional security layer but also not a silver bullet! When we feel comfortable saying that Docker out-of-the-box can safely contain untrusted uid0 programs, we will say so clearly."
