
Containers and Docker: how secure are they? - jpetazzo
http://blog.docker.io/2013/08/containers-docker-how-secure-are-they/
======
contingencies
I support docker in its efforts. However, docker is too cute, too hyped, and
too rapidly developed to trust with your security as yet. Quite frankly, you
have to understand a bit more than how to call an API to have faith in your
infrastructure's inherent security.

For example, in this article the author links to the 'list of dropped
capabilities in the Docker code'. As it happens, I wrote that list quite some
time ago, and wrote it for _lxc-gentoo_ , a guest-generation script for raw
LXC against an earlier kernel version with an earlier LXC userspace. Not only
is the list now out of date, it no longer uses the preferred approach. Why
is this? After some months of raising the issue, one of the LXC devs finally
added 'lxc.cap.keep' (i.e. "deny all, allow some") as an alternative to
explicit drop ("allow all, deny some"). The keep approach is architecturally
more secure against things like kernel upgrades that add or modify kernel
capabilities.
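
To make the drop-versus-keep distinction concrete, here is a minimal sketch in Python. It does not touch real LXC configuration; the capability sets are a small illustrative subset, and `some_new_cap` is a made-up name standing in for whatever a future kernel might add.

```python
# Illustrative subset of capability names, not the kernel's full list.
KERNEL_CAPS_OLD = {"chown", "net_bind_service", "sys_admin", "sys_module", "mknod"}
# Assumption: a kernel upgrade introduces one new (hypothetical) capability.
KERNEL_CAPS_NEW = KERNEL_CAPS_OLD | {"some_new_cap"}

DROP = {"sys_admin", "sys_module", "mknod"}   # "allow all, deny some"
KEEP = {"chown", "net_bind_service"}          # "deny all, allow some"


def allowed_with_drop_list(kernel_caps, drop):
    """Everything the kernel offers stays available unless explicitly dropped."""
    return set(kernel_caps) - set(drop)


def allowed_with_keep_list(kernel_caps, keep):
    """Only capabilities that are explicitly kept stay available."""
    return set(kernel_caps) & set(keep)


if __name__ == "__main__":
    for label, caps in (("old kernel", KERNEL_CAPS_OLD), ("new kernel", KERNEL_CAPS_NEW)):
        print(label, "drop list ->", sorted(allowed_with_drop_list(caps, DROP)))
        print(label, "keep list ->", sorted(allowed_with_keep_list(caps, KEEP)))
```

On the old kernel both lists give the container the same capabilities. After the upgrade, the drop list silently grants the new capability, while the keep list does not.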

Furthermore, the docker people only included this when I added
[https://github.com/dotcloud/docker/commits/v0.5.0/lxc_templa...](https://github.com/dotcloud/docker/commits/v0.5.0/lxc_template.go?author=globalcitizen)
... things as important as _WARNING: procfs is a known attack vector and
should probably be disabled if your userspace allows it. eg. see_
[http://blog.zx2c4.com/749](http://blog.zx2c4.com/749) and _WARNING: sysfs is
a known attack vector and should probably be disabled if your userspace allows
it. eg. see_ [http://bit.ly/T9CkqJ](http://bit.ly/T9CkqJ)

Again, I fully support docker's efforts but the article is ... misleading at
best.

~~~
shykes
Hi, docker maintainer here.

There's a reason we keep saying docker is not yet production-ready.

Right now our focus is on usability and stabilizing the management API to make
deployment-centric deployment awesome. You can be sure that before we tell
anyone that they can use docker to sandbox untrusted code in a shared
environment (which by the way is not the only use case of docker) we will be
locking down our default lxc configuration and doing a sweep of all pending
security issues.

For the record, we (dotCloud) have tens of thousands of lxc containers
currently running untrusted code _in production_ on shared infrastructure, and
have had to monitor and maintain them 24/7 for several years. Before that we
ran openvz. And before that, we ran vserver. So while docker itself may not
yet be ready for production (and indeed we don't use it in production at
dotcloud either), you don't need to worry about our stance on security. We
care about it just as much as you do.

~~~
duskwuff
Do you allow any of that untrusted code to run AS ROOT within a container,
though? (If so: what capabilities do you allow it to have?)

~~~
shykes
Good question - no, we don't. Developers can request that certain whitelisted
commands be executed within an environment that we know to be safe. For
example, you can specify a list of system packages, and dotcloud will install
them from the official LTS Ubuntu repository.

There's an ongoing discussion in the Docker community on the best way to make
this possible in a shared environment. One possibility is to add support for
OpenVZ, which has a better track record on that front (although it's not clear
how much of the perceived difference is just FUD). Another is to combine
namespaces with SELinux, so that even if you break out of the namespace,
you're stuck in a "limbo" context with no ability to do harm. Lastly, there's
the possibility of extra instrumentation around the container, to limit the
risk - for example you could allow root privileges only for a whitelist of
commands on a whitelist of base images. Or you could only authorize network
connectivity with a whitelist of remote hosts (keeping in mind most use cases
which require root access involve short-lived image building). Or you could
map containers with root privileges to dedicated virtual machines, separately
from the unprivileged containers. Etc.
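
As a rough illustration of the "whitelist of commands on a whitelist of base images" idea, a policy check might look like the sketch below. This is not dotCloud's or Docker's actual mechanism; the image name, the command prefix, and the function name are all hypothetical, and a real policy would also need to validate the remaining arguments.

```python
# Hypothetical policy data: which base images and command prefixes may run as root.
ALLOWED_IMAGES = {"ubuntu:12.04"}
ALLOWED_COMMANDS = {("apt-get", "install")}


def may_run_as_root(image, argv):
    """Allow root only when both the base image and the command prefix are whitelisted."""
    return image in ALLOWED_IMAGES and tuple(argv[:2]) in ALLOWED_COMMANDS


if __name__ == "__main__":
    print(may_run_as_root("ubuntu:12.04", ["apt-get", "install", "curl"]))
    print(may_run_as_root("ubuntu:12.04", ["bash"]))
```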

~~~
shykes
To clarify, I'm talking about our use of containers _at dotCloud_ , which is a
multi-tenant environment.

 _Docker_ _does_ allow running processes as root inside a container, and it
also allows dropping privileges to the uid of your choice. It all depends on
your particular use case.

------
dap
Good post, except that it's extremely misleading to use Solaris as the
canonical example of non-Linux containers and then say that non-Linux
containers "haven't had as much exposure" and "the source code isn't always
available for peer review and auditing". Solaris containers (in Solaris first,
and then illumos when Solaris became closed-source again) have been open
source since 2005 and running in hostile production environments that whole
time.

~~~
jpetazzo
True, I will update the blog post so that it feels less misleading. Source
code for Solaris zones is indeed available; but I wouldn't consider it widely
deployed. Of course, some people are using it in public hosting environments
(the most notable example is probably Joyent); but I don't think that it's
significant compared to the installed base of VServer, OpenVZ, or LXC out
there.

I mean — it's trivially easy to get access to a Linux VPS, for a ridiculously
low price (sometimes for free). Now compare with something equivalent based on
Solaris zones.

But yeah, I'll definitely update the blog post, thanks!

~~~
bcantrill
_Source code for Solaris zones is indeed available; but I wouldn't consider
it widely deployed._

Why would you not consider it widely deployed? If it needs to be said: just
because you don't use something doesn't mean that others aren't. Speaking only
for us (I work for Joyent), we have deployed hundreds of thousands of zones
into production over the years -- and Joyent was running with FreeBSD jails
before that. And that's just us; there are many others in the
illumos/SmartOS/OmniOS, Solaris and FreeBSD communities who have been running
this technology in production -- broadly -- for years. Perhaps OS
virtualization is a new technology for you, but understand that it's not new
for everyone; some of us have been doing this for a while -- widely deployed
and in production.

~~~
jpetazzo
I wouldn't consider it "widely deployed" compared to the Linux installed base.

Sure, Joyent (and others) have "deployed hundreds of thousands of zones into
production over the years", but you guys are the only well-known, large-scale,
public hosting service using zones (and you're damn good at that, no doubt
about it!)

Now compare with dotCloud, Heroku, Dreamhost, 1&1, Mediatemple, OVH, Amen
(just to name those I can remember without doing extensive research): those
guys have also "deployed hundreds of thousands of Linux-based VPS into
production", using VServer, OpenVZ, and more recently, LXC.

Don't get me wrong: I'm a big fan of Solaris (and its heritage); I have lots
of my marbles on ZFS; I hacked basic ZFS support in Docker just for fun a
while ago; and if I knew better, I would love to find a way to run sub-zones
and a ZFS pool in a Joyent SmartOS instance and port Docker to your platform.
But there are a helluva lot of Linux hosters out there.

I'll close with a lovely paraphrase: « Perhaps LXC containers are a new
technology for you, but understand that it's not new for everyone; some of us
have been doing this for a while -- widely deployed and in production. » ☺

~~~
spartango
Just because popular hosting companies are not using Solaris zones, that does
not mean there are not large Zones-based deployments elsewhere in the
industry. Particularly, some major corporations (including banks and telcos)
are using Zones in their production Solaris environments. These turn out to be
particularly large deployments, with hundreds of machines in data centers
across the US.

These users do not broadcast their use of Zones, but having worked with them,
they certainly do exist.

*Used to work at Sun on a project related to Zones and ZFS.

------
jpetazzo
By the way, if anyone knows of a documented exploit for LXC, I would love to
hear about it. People (generally advocating VMs, zones, jails, OpenVZ...) will
often say that "containers are not secure", but once you've taken some basic
steps (like locking down kernel caps and device access) it becomes difficult
to find an actual threat.

~~~
contingencies
_I would love to hear about it_

See [http://blog.zx2c4.com/749](http://blog.zx2c4.com/749) and
[http://bit.ly/T9CkqJ](http://bit.ly/T9CkqJ)

~~~
ak217
Neither of these exploits works on stock Ubuntu 12.04 LTS, with LXC or
otherwise (AppArmor kicks in).

Like jpetazzo, I would love to see a working LXC exploit. In my case,
"working" == "can get host root when given container root on Ubuntu 12.04 or
later".

~~~
liuw
The fact is that by the time you know about the "working" exploit it's already
been fixed. Unless you're a security researcher or engineer, you're not
very likely to get hold of a 0-day exploit.

------
SkyMarshal
_> Finally, if you run Docker on a server, it is recommended to run
exclusively Docker in the server, and move all other services within
containers controlled by Docker._

This looks like what CoreOS is providing, a stripped down barebones host, with
all other services not strictly necessary in the host moved to the containers.

 _> Capabilities turn the binary “root/non-root” dichotomy into a fine-grained
access control system. Processes (like web servers) that just need to bind on
a port below 1024 do not have to run as root: they can just be granted the
net_bind_service capability instead. And there are many other capabilities,
for almost all the specific areas where root privileges are usually needed._

This is awesome. It has been a personal pain point in the past, trying to get
a JVM running as non-root on Ubuntu Server. Theoretically it's easy with
iptables, but in practice it can be tricky to get working exactly right.
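
For anyone who wants to check whether a process actually holds net_bind_service, here is a minimal Python sketch. It assumes a Linux system where `/proc/self/status` exposes a `CapEff:` line as a hex bitmask, and it uses bit number 10 for CAP_NET_BIND_SERVICE as defined in `<linux/capability.h>`. The function names are my own.

```python
CAP_NET_BIND_SERVICE = 10  # bit number from <linux/capability.h>


def has_capability(cap_mask_hex, cap_bit):
    """Return True if bit `cap_bit` is set in a hex capability mask such as CapEff."""
    return bool((int(cap_mask_hex, 16) >> cap_bit) & 1)


def effective_caps_of_self():
    """Read this process's effective capability mask (Linux only); None if not found."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return line.split()[1]
    return None


if __name__ == "__main__":
    mask = effective_caps_of_self()
    if mask is not None:
        print("CapEff:", mask)
        print("can bind ports below 1024:", has_capability(mask, CAP_NET_BIND_SERVICE))
```

A non-root process that has been granted only this capability would show a mask with just bit 10 set, which is the fine-grained case the quoted passage describes.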

~~~
secstate
To your first point, CoreOS is SmartOS for Linux :)

------
pacala
Any 0-day Linux root vulnerability qualifies. Linux is a _large_ system, do
your own risk analysis.

~~~
jpetazzo
Are you implying that gaining root access inside an LXC container means that
you can escalate to the host system, or to sibling containers?

If yes, I would like to see an example of that (that works on systems with
very minimal lockdown, i.e. using the device control group and kernel
capabilities).

Otherwise, if you just mean that "0-day Linux root vulnerabilities can be used
to escalate from non-root to root in a Linux container", that's a truism, and
it also holds true for VMs or OpenVZ systems.

Just like 0-day vulnerabilities will help people to escalate from non-root to
root in a FreeBSD jail or Solaris zone.

~~~
JoshTriplett
Many (though not all) local kernel exploits that allow you to escalate to root
will also allow you to run arbitrary code in kernel-space, and there's only
one kernel, not one kernel per container.

VMs will always be more secure than containers, simply through defense in
depth; the only question is whether you want to trade away some performance
and flexibility to increase security.

~~~
jpetazzo
I disagree with this assertion that "VMs will always be more secure". Of
course, they bring an extra layer (or rather, a layer of different nature).

But check the number of Xen vulnerabilities (I kept up with those for a while
because I still run a Xen cluster): they are very real. And keep in mind that
Xen (at least in my case!) doesn't bring an extra layer of security: if you
are (e.g.) an IAAS provider using Xen to sell VMs, your customers can run
anything they like in their VMs, and Xen will be the only layer. Your
hypervisor will be "on the front line" if you see what I mean.

I would actually argue quite the contrary: exploits affecting containers
are likely to be exploits affecting _all_ Linux systems, meaning that they
will draw much more attention and scrutiny than exploits affecting
hypervisors, and they are likely to be fixed faster.

~~~
JoshTriplett
To clarify: VMs will always be more secure for sandboxing a non-root service.
In that case, untrusted code would have to get root first, then use that to
either replace or exploit the kernel, and _then_ exploit the VM.

In the case where you run untrusted root or kernel code, that code only needs
to exploit the VM, true. (On the other hand, many VMs have smaller attack
surfaces than the Linux kernel.)

------
AYBABTME
I'm very interested in all those things, but I clearly lack a trajectory for
learning them. Is there a reference I could read or a 'name' for that domain?
How does one become educated on these things?

So far I've grabbed knowledge by reading papers on operating systems (and
misunderstanding 80% of their content), reading man pages, reading Tanenbaum's
textbooks, etc. But still I don't feel like I know or understand.

They say that lacking words for things blinds you to your own ignorance.
Sometimes it's also that you just don't know what needs to be learnt.

------
gouggoug
"No exploit has been crafted yet to demonstrate this, but it will certainly
happen in the feature". But will it be considered a future? ;)

