For example, in this article the author links to the 'list of dropped capabilities in the Docker code'. As it happens, I wrote that list quite some time ago, for lxc-gentoo, a guest-generation script for raw LXC, against an earlier kernel version and an earlier LXC userspace. Not only is the list now out of date, it no longer uses the preferred approach. Why? After some months of my raising the issue, one of the LXC devs finally added 'lxc.cap.keep' (i.e. "deny all, allow some") as an alternative to explicit drop ("allow all, deny some"). The keep approach is architecturally more secure against things like kernel upgrades that add or modify kernel capabilities.
Furthermore, the docker people only included things as important as the following warnings when I added them (https://github.com/dotcloud/docker/commits/v0.5.0/lxc_templa...): "WARNING: procfs is a known attack vector and should probably be disabled if your userspace allows it. eg. see http://blog.zx2c4.com/749" and "WARNING: sysfs is a known attack vector and should probably be disabled if your userspace allows it. eg. see http://bit.ly/T9CkqJ"
Again, I fully support docker's efforts but the article is ... misleading at best.
There's a reason we keep saying docker is not yet production-ready.
Right now our focus is on usability and stabilizing the management API to make deployment-centric deployment awesome. You can be sure that before we tell anyone that they can use docker to sandbox untrusted code in a shared environment (which by the way is not the only use case of docker) we will be locking down our default lxc configuration and doing a sweep of all pending security issues.
For the record, we (dotCloud) have tens of thousands of lxc containers currently running untrusted code in production on shared infrastructure, and have had to monitor and maintain them 24/7 for several years. Before that we ran openvz. And before that, we ran vserver. So while docker itself may not yet be ready for production (and indeed we don't use it in production at dotcloud either), you don't need to worry about our stance on security. We care about it just as much as you do.
If you have a minute and can share, I'd love to hear why you switched away from vserver (and then openvz but especially vserver). Or maybe you have those transitions written up somewhere?
There's a related discussion here: https://news.ycombinator.com/item?id=6227937
There's an ongoing discussion in the Docker community on the best way to make this possible in a shared environment. One possibility is to add support for OpenVZ, which has a better track record on that front (although it's not clear how much of the perceived difference is just FUD). Another is to combine namespaces with SELinux, so that even if you break out of the namespace, you're stuck in a "limbo" context with no ability to do harm. Lastly, there's the possibility of extra instrumentation around the container, to limit the risk: for example, you could allow root privileges only for a whitelist of commands on a whitelist of base images. Or you could only authorize network connectivity with a whitelist of remote hosts (keeping in mind most use cases which require root access involve short-lived image building). Or you could map containers with root privileges to dedicated virtual machines, separately from the unprivileged containers. Etc.
Docker _does_ allow running processes as root inside a container, and it also allows dropping privileges to the uid of your choice. It all depends on your particular use case.
Docker allows 'everything' minus the explicit list linked to in the article. What it should do is allow an explicit list, which recently became possible.
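To make the drop-versus-keep point concrete, here is a minimal Python sketch. It is not Docker or LXC code. The capability names are a small made-up subset, and `shiny_new_cap` is a hypothetical capability standing in for whatever a future kernel might add. The only thing it tries to show is what happens to each policy when the kernel's capability set grows.

```python
# Illustrative model only: capability names are a small subset, and
# "shiny_new_cap" is a hypothetical capability added by a kernel upgrade.
KERNEL_OLD = {"chown", "net_bind_service", "setuid", "sys_admin", "sys_module"}
KERNEL_NEW = KERNEL_OLD | {"shiny_new_cap"}

# Denylist policy: "allow all, deny some".
DROP_LIST = {"sys_admin", "sys_module"}

# Allowlist policy: "deny all, allow some".
KEEP_LIST = {"chown", "net_bind_service", "setuid"}


def effective_with_drop(kernel_caps, drop_list):
    """Container gets every capability the kernel knows, minus the drop list."""
    return set(kernel_caps) - set(drop_list)


def effective_with_keep(kernel_caps, keep_list):
    """Container gets only capabilities that are explicitly listed."""
    return set(kernel_caps) & set(keep_list)


if __name__ == "__main__":
    for label, kernel in (("old kernel", KERNEL_OLD), ("new kernel", KERNEL_NEW)):
        print(label, "drop:", sorted(effective_with_drop(kernel, DROP_LIST)))
        print(label, "keep:", sorted(effective_with_keep(kernel, KEEP_LIST)))
```

On the old kernel both policies grant the same set. After the upgrade, the drop policy silently grants the new capability, while the keep policy does not until someone adds it to the list on purpose.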
Just a heads-up: I know this isn't your fault, but docker.io does not say this on the front page, About, or FAQ that I can see. In fact, it currently says "same container that a developer builds and tests on a laptop can run at scale, in production".
Docker looks very interesting, thanks for your work.
First, I think it's a little disingenuous to say that your issue disappeared. No one is censoring the Docker issue list. If you could provide a bit more information (your github handle, the issue title, etc.) I'll be happy to investigate.
edit: the first point was addressed, thanks :)
Second, Docker is an open source project with a rich community and a large number of contributors for any project, let alone one less than 6 months old. People like yourself with clear passion can only make it better. I encourage you to continue your contributions by opening an issue and working with the maintainers to solve it.
Unfortunately I don't have time to run docker. Right now I am working on a broader-goaled system internally which supports arbitrary virtualization platforms and integrates concerns around platform integrity, host integrity, failover, automated scale-out, network topology specification and development/operations processes.
Docker apparently aims to make deployment really easy, and does this for some subset of cases, but in pursuing ease of use it sacrifices security for new users who cannot evaluate statements such as the comments I added to its template in the commits referenced above.
To be frank I am not sure this is a winning goal, and suspect that any attempt to criticize docker's place within broader concerns would more likely result in something close to negative feedback from the existing developer community rather than an abstract thoughtfest resulting in wins for everyone. Happy to discuss further by email.
At the same time, saying that Docker "sacrifices security" is untrue and unfair to the project. So yes, as long as you make these unfounded statements, you will meet resistance in the form of a constructive rebuttal by the community. Especially coming from someone who "doesn't have time" to contribute to the project or even use it.
People running things they don't understand means probable security issues for those users... and I think it's totally fair and in no way bad form to discuss this tradeoff in the context of docker and similar projects. Especially given the two attack vectors documented in the current codebase, and the fact that the article we are commenting on ignored them. What docker is attempting to do - apparently give people easy to use 100% portable containers for arbitrary code - is hard, and security for arbitrary code is one of the challenges.
Personally I wonder if perhaps taking some time out to consider the blurrier and more complex edge cases with regard to the project's overall goals and architecture, potentially considering a dalliance into integration with weightier operations + development process concerns, higher security deployment requirement concerns and other areas that container-based deployments may affect would be really valuable for docker at the moment.
That's unfortunate. Even in development of products/internal infrastructure with overlap, there may be some ideas that benefit each project. It might also provide a more thorough understanding of the goals / strengths of the Docker project.
I'm eager to learn more and continue our discussion. I will definitely take you up on your offer to email further.
So why does Docker still ship with lxc.cap.drop? Well, a large number of people are still using LXC 0.7, which doesn't support lxc.cap.keep, AFAIK. But it is very likely that Docker 1.0 will either require LXC 0.9, or totally get rid of LXC userland tools, or provide multiple implementations depending on what you have installed locally; and then lxc.cap.keep will definitely kick in.
Also, the initial security choices of Docker represent a middle ground between "lock down everything" and "allow anything to happen". It had to be secure enough so that people could run regular app servers with a reasonable level of trust; and permissive enough to allow e.g. normal package managers to run.
Moreover, Docker is evolving: we recently added the "-privileged" flag (available in the master branch, and very probably in 0.6.0, due in a few days), which lets you switch between a more secure configuration, suitable for e.g. public PAAS environments, and a more permissive configuration, suitable for private PAAS, continuous integration, that kind of thing. And this is just one step in that direction.
Err, where did you get that idea? I couldn't be less concerned about the fate of my docker 'contribution' of inline comments (which was simply given out of shock that nobody seemed to be considering these vectors, and was merely copied from lxc-gentoo).
My motivation in commenting here is to prevent people from getting the wrong idea about security and LXC, something the article, IMHO, failed to do. In fact, it came across as fairly misleading to my mind.
I mean — it's trivially easy to get access to a Linux VPS, for a ridiculously low price (sometimes for free). Now compare with something equivalent based on Solaris zones.
But yeah, I'll definitely update the blog post, thanks!
Why would you not consider it widely deployed? If it needs to be said: just because you don't use something doesn't mean that others aren't. Speaking only for us (I work for Joyent), we have deployed hundreds of thousands of zones into production over the years -- and Joyent was running with FreeBSD jails before that. And that's just us; there are many others in the illumos/SmartOS/OmniOS, Solaris and FreeBSD communities who have been running this technology in production -- broadly -- for years. Perhaps OS virtualization is a new technology for you, but understand that it's not new for everyone; some of us have been doing this for a while -- widely deployed and in production.
Sure, Joyent (and others) has "deployed hundreds of thousands of zones into production over the years", but you guys are the only well-known, large-scale, public hosting service using zones (and you're damn good at that, no doubt about it!)
Now compare with dotCloud, Heroku, Dreamhost, 1&1, Mediatemple, OVH, Amen (just to name those I can remember without doing extensive research): those guys have also "deployed hundreds of thousands of Linux-based VPS into production", using VServer, OpenVZ, and more recently, LXC.
Don't get me wrong: I'm a big fan of Solaris (and its heritage); I have lots of my marbles on ZFS; I hacked basic ZFS support in Docker just for fun a while ago; and if I knew better, I would love to find a way to run sub-zones and a ZFS pool in a Joyent SmartOS instance and port Docker to your platform. But there are a helluva lot of Linux hosters out there.
I'll close with a lovely paraphrase:
« Perhaps LXC containers are a new technology for you, but understand that it's not new for everyone; some of us have been doing this for a while -- widely deployed and in production. » ☺
These users do not broadcast their use of Zones, but having worked with them, they certainly do exist.
*Used to work at Sun on a project related to Zones and ZFS.
Your comment makes a fair point, but it's really diminished by your condescending tone.
See http://blog.zx2c4.com/749 and http://bit.ly/T9CkqJ
Like jpetazzo, I would love to see a working LXC exploit. In my case, "working" == "can get host root when given container root on Ubuntu 12.04 or later".
This looks like what CoreOS is providing, a stripped down barebones host, with all other services not strictly necessary in the host moved to the containers.
>Capabilities turn the binary “root/non-root” dichotomy into a fine-grained access control system. Processes (like web servers) that just need to bind on a port below 1024 do not have to run as root: they can just be granted the net_bind_service capability instead. And there are many other capabilities, for almost all the specific areas where root privileges are usually needed.
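For anyone who wants to check which capabilities a process actually holds, here is a small Python sketch. It assumes a Linux host where `/proc/<pid>/status` exposes a hexadecimal `CapEff:` line, and it uses bit number 10 for CAP_NET_BIND_SERVICE as defined in `linux/capability.h`. The helper names are mine, not part of any library.

```python
# Bit number of CAP_NET_BIND_SERVICE in the kernel's capability bitmask
# (value taken from linux/capability.h).
CAP_NET_BIND_SERVICE = 10


def has_capability(cap_mask_hex, cap_bit):
    """Return True if bit `cap_bit` is set in a hex mask such as the CapEff value."""
    return bool(int(cap_mask_hex, 16) & (1 << cap_bit))


def read_effective_mask(status_path="/proc/self/status"):
    """Return the CapEff hex string for a process, or None if the line is absent.

    Assumes a Linux procfs; the default path inspects the current process.
    """
    with open(status_path) as status_file:
        for line in status_file:
            if line.startswith("CapEff:"):
                return line.split()[1]
    return None
```

For example, `has_capability(read_effective_mask(), CAP_NET_BIND_SERVICE)` reports whether the current process may bind to ports below 1024 without being root. Inside a container it is a quick way to see what the drop or keep list actually left you with.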
This is awesome; it has been a personal pain point in the past, trying to get the JVM running as non-root on Ubuntu Server. Theoretically it's easy with iptables, but in practice it can be tricky to get working exactly right.
If yes, I would like to see an example of that (that works on systems with very minimal lockdown, i.e. using the device control group and kernel capabilities).
Otherwise, if you just mean that "0-day Linux root vulnerabilities can be used to escalate from non-root to root in a Linux container", that's a truism, and it also holds true for VMs or OpenVZ systems.
Just like 0-day vulnerabilities will help people to escalate from non-root to root in a FreeBSD jail or Solaris zone.
Plus, the user namespace functionality is fairly new and complex, and there have already been a few bugs found, e.g. . I assume all the known bugs have been fixed, but that doesn't ensure that more aren't lurking somewhere.
VMs will always be more secure than containers, simply through defense in depth; the only question is whether you want to trade away some performance and flexibility to increase security.
But check the number of Xen vulnerabilities (I kept up with those for a while because I still run a Xen cluster): they are very real. And keep in mind that Xen (at least in my case!) doesn't bring an extra layer of security: if you are (e.g.) an IAAS provider using Xen to sell VMs, your customers can run anything they like in their VMs, and Xen will be the only layer. Your hypervisor will be "on the front line" if you see what I mean.
I would actually argue quite the contrary, i.e. exploits affecting containers are likely to be exploits affecting all Linux systems, meaning that they will draw much more attention and scrutiny than exploits affecting hypervisors, and they are likely to be fixed faster.
Certainly Xen has its fair share of vulnerabilities, but vastly fewer than the kernel.
In the case where you run untrusted root or kernel code, that code only needs to exploit the VM, true. (On the other hand, many VMs have smaller attack surfaces than the Linux kernel.)
http://blog.zx2c4.com/749 and http://bit.ly/T9CkqJ
So far I've picked up knowledge by reading papers on operating systems (and misunderstanding 80% of their content), reading man pages, reading Tanenbaum's textbooks, etc. But still I don't feel like I know or understand.
They say a lack of words for things renders one blind to one's own ignorance. Sometimes it's also that you just don't know what needs to be learnt.