
Docker container breakout? - eugeneionesco
http://stealth.openwall.net/xSports/shocker.c
======
shykes
Hi all, I'm a maintainer of Docker. As others already indicated this doesn't
work on 1.0. But _it could have_.

Please remember that at this time, we don't claim Docker out-of-the-box is
suitable for containing untrusted programs with root privileges. So if you're
thinking "phew, good thing we upgraded to 1.0 or we were toast", you need to
change your underlying configuration now. Add apparmor or selinux containment,
map trust groups to separate machines, or ideally don't grant root access to
the application.

Docker will soon support user namespaces, which is a great additional security
layer but also not a silver bullet!

When we feel comfortable saying that Docker out-of-the-box can safely contain
untrusted uid0 programs, we will say so clearly.

~~~
api
Don't worry. This is how things get more secure. Just stay the course and
whack the moles. :)

Docker is awesome by the way.

~~~
SixSigma
It is better to think "less insecure" than "more secure".

------
hapless
Here is your annual reminder: Use SELinux.

Nothing with a shared kernel is going to be very secure. That's just the
nature of the beast. It's why Docker supports SELinux. It's why RHEL and
CentOS ship with pre-written SELinux policy for common daemons.

If you intend to have more than zero services on the system, you want SELinux.

~~~
saosebastiao
I can't help but be skeptical about SELinux, given that it was written by the
NSA. What would make you choose SELinux over AppArmor?

~~~
nickstinemates
Isn't it for the NSA, but written by Red Hat? I may have my history mixed up.

~~~
citruspi
I believe that the National Security Agency was the original developer, but
that Red Hat is a significant contributor.

------
bastichelaar
Apparently this is already fixed in Docker 1.0:

      Its fixed in docker 1.0 since CAP_DAC_READ_SEARCH is no longer available.
    
      Other FS-related threats to container based VMM's that have been discussed:
    
      - subvolume related FS operations (snapshots etc)
      - FS ioctl's that accept FS-handles as well (XFS)
      - CAP_DAC_READ_SEARCH also defeats chroot and other
        bind-mount containers (privileged LXC)
      - CAP_MKNOD might be a problem too (still available in
        docker 1.0) depending on the drivers available in the kernel
    

Source: [http://seclists.org/oss-sec/2014/q2/565](http://seclists.org/oss-sec/2014/q2/565)

~~~
kvmosx
Confirmed not working in Docker 1.0:

      root@377a6f4ab0a4:/# history
      10  wget http://stealth.openwall.net/xSports/shocker.c  
      11  cc -Wall -std=c99 -O2 shocker.c -static
      12  apt-get install build-essential
      13  cc -Wall -std=c99 -O2 shocker.c -static
      14  cc -Wall -std=c99 -O2 shocker.c -static -Wno-unused-result
      15  ls
      16  ./shocker
      17  shocker
      18  nano a.out
      19  cat a.out
      20  ./a.out
      21  history
      root@377a6f4ab0a4:/# ./a.out
      [***] docker VMM-container breakout Po(C) 2014             [***]
      [***] The tea from the 90's kicks your sekurity again.     [***]
      [***] If you have pending sec consulting, I'll happily     [***]
      [***] forward to my friends who drink secury-tea too!      [***]
      <enter>
      [*] Resolving 'etc/shadow'
      [-] open_by_handle_at: Operation not permitted
      root@377a6f4ab0a4:/# uname -r
      3.14.1-tinycore64

------
valarauca1
I believe it was Theo de Raadt who once said, "Why does everyone think that
when it comes to writing VM/container software people suddenly gain superhuman
programming powers and no longer make the same mistakes they make writing
operating systems?" (Slightly paraphrasing.)

While the issue is fixed in the 0.12 and 1.0 versions, I doubt Docker is
completely bulletproof.

~~~
gtjay
This phrasing unfairly conflates VM/hypervisor technology and containers.
Containers, being a pure software technology, _do_ require near-superhuman
ability to secure, but VM/hypervisors can lean on chip-level separation.

People forget that in-chip memory protection didn't come about for security
reasons: memory errors were a particularly dangerous and particularly common
kind of bug, and the hardware was extended to help with memory isolation.
OS-session-ending memory errors have been almost unheard of since operating
systems started fully utilizing the on-chip protection. Programmers didn't
become "superhuman" at preventing these errors.

For similar reasons it's much easier for hardware-backed virtualization
programmers to protect you from malicious business inside a VM than it is for
OS or container programmers.

~~~
nhaehnle
You now rely on chip designers being superhuman.

The real truth is that the difficulty of containment is proportional to the
interface that is available to the contained process. You don't need VM or
hypervisor technology to build a virtually unbreakable container. You only
need to prevent the contained process from using any syscalls at all.

Hardware only seems better at this kind of stuff because (a) it's harder to
find errata in hardware and (b) the syscall interfaces of commonly used
operating systems are much larger than what the hardware offers, and were
developed without keeping containability in mind. It is a well known fact that
tacking on security features in hindsight is problematic.

~~~
gtjay
You don't need super-human chip designers because, as you say, "the difficulty
of containment is proportional to the interface that is available". Hardware
doesn't just seem better because "the syscall interfaces of commonly used
operating systems are much larger than what the hardware offers", it _is_
better. It is easier to analyse, has a more limited state-space, has more
provable behavior, etc.

You can't just argue away the fact that a certain class of error has been all
but eliminated by hardware-supported virtual memory. Multi-tasking as we know
it today would basically be impossible without it. The reliability of "just
get it right" systems like the early Macintosh isn't even comparable to, for
example, a modern Linux machine that uses the chip to trap large classes of
erroneous memory accesses.

Given that we have the above, a case of a class of error that programmers
seemed unable to eliminate (practically) eliminated, I'm not really sure what
you're arguing. Are you saying that hardware designers of the 80's _were_
superhuman?

Okay... maybe Jay Miner...

~~~
nhaehnle
My point is that there is no difference between software and hardware.

You don't need hardware to eliminate memory errors: software can do it as
well. Two examples of this are the Singularity system that Microsoft Research
built and Google's NaCl, where the system only loads code that can be verified
not to access memory incorrectly.

Your claim that hardware is easier to analyze is also incorrect. Modern
processors are extremely complex beasts and are not inherently simpler than
software. All processors have long lists of errata. You may be misled into
thinking hardware is easier to secure because (a) those errata are less
visible to userspace developers because the kernel shields you from them and
(b) hardware developers invest _much_ more resources into formal verification
than software developers out of necessity (you can't just patch silicon). If
software developers invested a similar amount of effort into formal
verification tools, your impression would be rather different.

Again, the point is that there is no inherent distinction between software and
hardware when it comes to securing systems. It is always and everywhere first
a question of how you design your systems and interfaces and second a question
of investment in development effort targeted at eliminating bugs.

~~~
gtjay
"My point is that there is no difference between software and hardware."

Okay, now I see where you're coming from. Theoretically I agree. However,
practically there are a number of things that make hardware different:

* Hardware has inherent "buy-in". The software systems you describe as also solving the memory access problem are basically opt-in frameworks. While you can make software frameworks hard to opt out of (e.g. through OS integration), by definition... software runs on hardware...

* Hardware solutions are often much more transparent. Again, your software examples require a great deal of re-tooling. One of the most elegant aspects of the classic 80's memory access solution was how transparent it was.

* There are far fewer hardware vendors than software vendors. Combine this with the fact that, as you point out, hardware is so expensive to retool, and you create an environment where it is much more likely that a single hardware solution will be "correct enough" to enforce a constraint on software than that the majority of software will properly opt in to a framework or code correctly.

------
gtjay
This vulnerability is a good example of how a security bug is _still a bug_.
That is: even if all the bad guys went away there would still be problems.
This issue was fixed (as far as I can tell) pre-1.0 for non-security reasons.
See this discussion:
[https://github.com/dotcloud/docker/issues/5661](https://github.com/dotcloud/docker/issues/5661).

Basically, docker used to have a "drop" list associated with each execdriver.
By default docker kept all kernel bestowed capabilities but would explicitly
drop those on the execdriver's drop list. This created issues with
compatibility. If an image was prepared on a kernel that _didn't_ have a
particular capability at all and was suddenly run on one that did, weird,
hard-to-diagnose behavioral differences could emerge. So now docker drops
everything by default and the execdrivers have a "keep" list. Also, there's
now a check for the kernel defining a capability before trying to mess with
it.

There are---of course---still some issues. For example, docker drops
everything and then tries to add capabilities back, which won't work for
the capabilities that can't be regained once they've been renounced.
Still, the situation is an interesting example of how a security bug is still
a bug and is likely to bite you in some way sooner or later. See also XSS
flaws that limit valid inputs.

~~~
contingencies
Yes. I pointed it out for LXC originally, filing the bug with LXC on
sourceforge and later re-filing on github. Nothing to do with docker.
Finally, they implemented it. That's multiple years of timeframe we're
Later I pointed it out for docker. Just sayin'. There's a lot of us
contributing here, and much of the work isn't under the docker banner.

~~~
gtjay
"There's a lot of us contributing here, and much of the work isn't under the
docker banner."

Of course! If my comment could be taken as part of the mass of prose that can
be read to imply otherwise, I apologize. I don't want to undermine what Docker
has done with providing a packaging format, UX, and history niceness. I also
don't want to undermine the years of hard work that's gone into kernel
namespacing etc. that makes it all possible.

------
notacoward
Just to be clear, this doesn't really seem like a problem with Docker
specifically. It looks like a problem with the kernel's namespace isolation,
affecting _any_ container-based solution. Yes, that's in the PPS, but probably
should be in the title.

~~~
kapilvt
That's true to some extent, but it also has to do with which namespaces and
isolation features a given container solution supports. For example, lxc has
seccomp syscall filtering support and user namespace support out of the box,
which would have reduced this attack surface to that of the unprivileged user
running the container (and covered the PS on kexec). In addition, LSM usage
(selinux, apparmor) can also limit the attack surface area.

------
peterwwillis
Linux capabilities are not a good security model in general. They're more
suitable for sysadmins who want to do very basic locking down of system
resources to prevent users from fucking things up, or to keep programs from
making basic accidental mistakes. A MAC or RBAC implementation is a lot more
robust and actually fulfills the qualifications for things like secret/top
secret computing systems.

~~~
contingencies
grsec + containers is nicer.

~~~
peterwwillis
From what I recall, grsec's rbac doesn't give you the same flexibility as
selinux's mac. You have to pick and choose whether you want advanced
heuristics to prevent different kinds of attack, or to get really fine-
grained with your system control. I prefer grsec personally, but only because
I'm lazy, and it's more than likely not certified for top secret systems.

~~~
contingencies
grsec's developer philosophy fully admits this pragmatic focus... basically,
if we are too lazy to use a custom policy because policy development is
painful, then essentially we are not going to use any policy and will instead
remain vulnerable. grsec makes it easier, therefore it actually gets used.

I studied SEL policies in depth back in 2000 or so, but have never once
deployed a custom policy. I suspect others are the same. Though common
daemons on common distributions have recently (~last 5 years) begun to ship
usable pre-supplied policies, standard services are so commodified these days
that they're often outsourced (email, chat, web, etc.), so the benefits of
this 'too-little-too-late' development are partly mitigated in practice.

------
xeroxmalf
"tested with docker 0.11"

Looks to be a few releases out past 0.11, notably 0.11.1, 0.12, and now 1.0.
Can anyone confirm this works on the later versions of docker?

~~~
ewindisch
The kernel version is just as important, if not more important. I've performed
tests that have given me non-specific breakout capabilities on certain
combinations of Linux and Docker. I haven't, however, as is done here,
reproduced and written an exploit.

As an employee of Docker, I feel it is more important for me to know whether
we can break out, and to patch those issues, than to write viable exploits
for them.

I have noticed that newer kernels and Docker versions (such as 1.0) are
currently more difficult to break out of than they were in earlier versions.
Again, however, it's highly dependent on the pairing.

What's important to recognize here is that even with breakout potential,
containers add a useful extra layer of security to break out of that wouldn't
otherwise exist. Containers should never remove security from your system;
they should only add to it. That said, deployers may find that removing
virtual machines weakens their security story. The "secure all the things"
approach would be to put Qemu in a container, then run containers inside
the VM.

Otherwise, security practices are as they _always_ have been: Don't leave
setuid binaries floating around, etc.

------
julien421
Docker's answer: [http://blog.docker.com/2014/06/docker-container-breakout-proof-of-concept-exploit/](http://blog.docker.com/2014/06/docker-container-breakout-proof-of-concept-exploit/)

------
throwaway9995
Relevant recent discussion on the security of containers vs full
virtualization:
[https://news.ycombinator.com/item?id=7834338](https://news.ycombinator.com/item?id=7834338)

------
floatboth
Now try breaking out of a FreeBSD jail ;-)

~~~
nisa
like this? [http://freebsd.1045724.n5.nabble.com/Thoughts-on-jail-privilege-FAQ-submission-td4219099.html](http://freebsd.1045724.n5.nabble.com/Thoughts-on-jail-privilege-FAQ-submission-td4219099.html)

~~~
profquail
That's dated Jan 15, 2009. Are there any newer exploits for breaking out of
jails on recent versions of FreeBSD? Or for breaking out of a Bhyve VM?

~~~
floatboth
And that's not even an exploit, that's the sysadmin doing it wrong: "If the
host system and the jail share the `john' user _and_ you are sharing
`/usr/local' as read-write between the host and the jail, then ``you are doing
it wrong!''."

------
mkent
Confirmed this works on lxc 0.7.5-3ubuntu69 (ships with Ubuntu 12.04 LTS).
Just change "/.dockerinit" to any file within a bind mount and run it.

To fix this you can shut down the container, edit the config in
/var/lib/lxc/<name>/config, and add dac_read_search to lxc.cap.drop. Voila.
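
Concretely, the mitigation is one line in that config file:

```
lxc.cap.drop = dac_read_search
```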

      [*] Resolving 'etc/shadow'
      [-] open_by_handle_at: Operation not permitted

------
gabrtv
Docker image to test if your host is vulnerable to this particular
open_by_handle_at() container breakout:
[https://registry.hub.docker.com/u/gabrtv/shocker/](https://registry.hub.docker.com/u/gabrtv/shocker/)

Remember to always audit the source code before running something like this.
Origin repo is here:
[https://github.com/gabrtv/shocker](https://github.com/gabrtv/shocker)

------
jrockway
> _" The tea from the 90's kicks your sekurity again. If you have pending sec
> consulting, I'll happily forward to my friends who drink secury-tea too!"_

Uh, what?

~~~
eugeneionesco
stealth is a legend of the '90s

[http://en.wikipedia.org/wiki/TESO](http://en.wikipedia.org/wiki/TESO)

~~~
contingencies
We never really got talking, but they sent me some friendly warez when I
released an ARP-based remote OS detection script in like 1999. As I recall
they were all Germanic. Doubt there's any 'reach out' from groups today; too
much profit/loss going on...

------
jhardcastle
With a technology used in production in a lot of places, does responsible
disclosure apply here? Did the author (OP?) notify Docker and/or the Linux
kernel team before distributing this for public consumption?

~~~
zorked
While the author called the exploit "shocker," I don't think anybody is
shocked by it. There are likely many other ways to break out of Docker. And
while I assume the Docker (and kernel!) people are closing them as they come
along, I don't think anyone is claiming Docker is inescapable, so it's no
surprise that it isn't.

~~~
zobzu
I think he might have been sarcastic.

