The script is available at https://github.com/Incubaid/jenkins-lxc (no docs for now, and several improvements are possible). It expects CI job scripts to be contained in the job repository, e.g. https://github.com/Incubaid/arakoon/tree/1.6/jenkins
At FrozenRidge.co, we work on our own CI server called Strider and have actually integrated Docker directly:
You can read about it at: http://blog.frozenridge.co/next-generation-continuous-integr...
We're still polishing a few rough edges. If you want early access add your github ID to this thread and we'll add you right away!
I've built a similar tool using Go. Wondering how you guys get around the lack of clone(2) in the stdlib :)
You have a lot of projects in your github repository and I was not able to identify it from scanning the list.
Send me an email if you want to discuss: firstname.lastname@example.org
How does this compare with warden? https://github.com/thoward/vagrant-warden
It should be public so you can view it even without a G+ account (I think)
I've been wondering when something like this would come around and if I'd have to try to write it myself. I've made smaller-scale, less isolated versions of this idea before, but this looks snazzy.
Thanks, that sounds interesting.
Looks awesome and crazy fast. Great work.
github id: natejenkins
Looks really useful.
Is this linux-vserver or openvz reimplemented with lxc and cgroups?
looked at the pycon demo ... looks awesome!
Thanks! This might help a lot for an idea that my friends and I are working on.
Built a similar system in-house at my workplace. Would definitely be interested in migrating/contributing to a wider effort!
If the security is almost on par and the isolation is good enough that one bad process can't bring the whole system down, might this be a good alternative to virtualization? I imagine it would definitely use fewer resources.
The main theoretical difference between hypervisor isolation and container isolation is one sits above the kernel, so a kernel level exploit only applies to a single virtual machine. With containers you're relying on the kernel to provide the isolation so you are still subject to (some) kernel level exploits.
Practically, Linux containers (the mainline implementation) have only provided full isolation in recent patches and probably shouldn't be considered fully shaken out for something like full in-the-wild, root-level multi-tenant access.
They are super for application isolation for delivery of multiple single tenant workloads on one machine though - something people use hypervisors for quite a bit. The resources used can be a small fraction of what you're committing to with a hypervisor.
DRAM isn't the only win, of course: for every other resource in the system (CPU, network, disk), OS-based virtualization offers tremendous (and insurmountable) efficiency advantages over hardware-based virtualization -- and it's great to see others make the same realization!
For more details on the relative performance of OS-based virtualization, hardware-based virtualization and para-virtualization, see my colleague Brendan Gregg's excellent blog post on the subject.
Recent patches DO NOT provide "full isolation" and never did. What they add is usermode containers. Those are broken weekly since the release. Seriously. Have a look at http://blog.gmane.org/gmane.comp.security.oss.general
Funny you should say that. The latest virtualization-related CVEs there are actually in KVM -- a trio including two host memory corruptions, which usually enables completely owning the host. http://permalink.gmane.org/gmane.comp.security.oss.general/9...
And on the other hand, I don't see any container-related CVEs at all from 2013 in the CVE database: http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel
(The KVM issues I mentioned don't show up yet either, because they're from today.) What vulnerabilities are you referring to?
Maybe you mean kernel vulnerabilities in general, some of which could be usable by a user inside a container. Everyone should stay on top of kernel updates in any event. If you hate the rebooting, Ksplice is free for Ubuntu (and Fedora.)
With virtualization, a buggy or malicious guest is still limited to its sandbox unless there's a flaw in the hypervisor itself. With containers/namespaces, the host and guest are just different sets of processes that see different "views" of the same kernel, so bugs are much more likely to be exploitable. Plus, if you enable user namespaces, some code paths (like on-demand filesystem module loading) that used to require root are now available to unprivileged users.
There's already been at least one local root exploit that almost made it into 3.9: https://lkml.org/lkml/2013/3/13/361
If I recall, Heroku uses cgroups (EDIT: and namespaces) exclusively for multitenant isolation (and by the looks of this, dotCloud does too), so that's two big votes in the "if it's good enough for them" category.
And as far as I'm aware (speaking as an interested non-expert, so please correct me if I'm wrong) cgroups have no effect on permissions, whereas UID namespaces required a lot of very invasive changes to the kernel.
Shameless plug: I work at dotCloud, and I wrote 4 blog posts explaining namespaces, cgroups, AUFS, GRSEC, and how they are relevant to "lightweight virtualization" and the particular case of PAAS. The articles have been grouped in a PDF that you can get here if you want a good technical read for your next plane/train/whatever travel ;-) http://blog.dotcloud.com/paas-under-the-hood-ebook
However if that code is trusted, or if you're running it as an unprivileged user, or if nothing else of importance is sharing the same host, then I would not hesitate to use them.
Containers are awesome because they represent a logical component in your software stack. You can also use them as a unit of hardware resource allocation, but you don't have to: you can map a container 1-to-1 to a physical box, for example. But the logical unit remains the same regardless of the underlying hardware, which is truly awesome.
Getting away from huge per-VM block devices is a step in the right direction.
Here are some of the technologies explained:
cgroups: Linux kernel feature that allows resource limiting and metering. It is usually paired with namespaces, a separate kernel feature that provides process isolation. The isolation is important because it prevents a process from seeing or terminating other running processes.
lxc: this is a utility that glues together cgroups and chroots to provide virtualization. It helps you easily set up a guest OS by downloading your favorite distro and unpacking it (kind of like debootstrap). It can then "boot" the guest OS by starting its "init" process. The init process runs in its own namespace, inside a chroot. This is why they call LXC a chroot on steroids. It does everything that chroot does, with full process isolation and metering.
aufs: this is sometimes called a "stacked" file system. It allows you to mount one file system on top of another. Why is this important? Because if you are managing a large number of virtual machines, each one with 1GB+ OS, it uses a lot of disk space. Also, the slowest part of creating a new container is copying the distro (can take up to 30 seconds). Using something like AUFS gives you much better performance.
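To make the "stacked" idea concrete, here is a toy model in Python. It is not how AUFS is implemented, just a sketch that treats each layer as a dict: reads fall through to the shared base layer, and writes land in a per-container top layer. The file names and contents are made up for illustration.

```python
from collections import ChainMap

# Read-only base layer shared by every container (hypothetical contents).
base_layer = {"/etc/hostname": "base", "/bin/sh": "<shell binary>"}

def new_container_view(base):
    """Return a view with an empty writable layer stacked on top of `base`."""
    return ChainMap({}, base)

a = new_container_view(base_layer)
b = new_container_view(base_layer)

# ChainMap sends writes to the first mapping only, like a copy-on-write top layer.
a["/etc/hostname"] = "container-a"

print(a["/etc/hostname"])  # container-a
print(b["/etc/hostname"])  # base
print(a.maps[0])           # {'/etc/hostname': 'container-a'}
```

Creating a new "container" here costs one empty dict rather than a copy of the whole base, which is the same reason the stacked approach saves disk space and startup time.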
So what about security? Well, like every (relatively) new technology LXC has its issues. If you use Ubuntu 12.04 they provide a set of Apparmor scripts to mitigate known security risks (like disabling reboot or shutdown commands inside containers, and write access to the /sys filesystem).
Those products work at two levels: using filtering drivers for registry and the filesystem, and hooking into the Windows operating system API.
This doesn't mean that containers are more secure than VMs either. Attacking VMs attracts more security researchers from what I've seen (but I may be wrong on that point). However, whether you're running a container or a virtual machine, you still need some shared processes (eg the 'ticks' of a system clock), and any sufficiently complicated code WILL have bugs that can potentially be exploited.
However the crux of the matter is regardless of whether you're running containers or full blown virtual machines, you cannot escape out of the sandbox without having elevated privileges on the guest to begin with. And if an attacker has that, then you've already lost - regardless of whether the attacker can or cannot escape the sandbox.
Lastly, I'm not sure if you're aware of this or not, but this is a Linux solution and has nothing to do with Windows (I only say this because your post seemed tailored towards Windows-hosted virtualisation)
Even being aware that this is a Linux solution I mentioned the Windows technologies that I know technically.
A bit of both, but mostly the former. In practical terms, they both have the same level of security. But, as with any software, something could be published tomorrow exposing some massive flaw that totally blows one or the other out of the water. However, neither offers any technical advantage over the other from a security standpoint, and from a practical perspective the real question of security is whether your guest OSs are locked down to begin with (eg it's no good arguing which home security system is the most effective if you leave the front door open to begin with).
> Even being aware that this is a Linux solution I mentioned the Windows technologies that I know technically.
That's fair enough and I had suspected that was the case. I just wanted to make sure that we were both talking about the same thing :)
It destroyed my entire file system. I have no clue how the hell it happened--it floors me that something like that would be possible--and I suspect it's probably simply the result of a newbie like myself somehow misusing userspace tools. But it was enough to turn me off of it for the time being.
Of course, you could bundle libssl with your app. But then the standardization is at the level of kernel/libc ABI. In which case the container is basically a full LXC guest.
But then why standardize an image format if you can create a small script which builds the image with lxc-create and installs whatever else is necessary for your app? That script will be much smaller than the full image; even a barebones ubuntu lxc guest (debootstrap quantal) is ~400MB.
Deploying from images should be much faster and less fragile (oops, is Github down?) than from scripts.
So you get clean separation of build and run, which is a hugely important part of reliable deployment.
You wouldn't need to save it as /sbin/init. You would just type:
$ docker run MYIMAGE /path/to/my/static/binary
Breaking out of a filesystem container is as easy as creating a root block device.
Breaking out of a network container is as easy as creating a network device.
And in all cases, you can just inject memory, load lkms, etc. That's without mentioning the amount of weekly CVEs for Linux namespaces.
This is out of date. As of Linux 3.8, or with out-of-tree patches in older kernels, LXC puts each container in its own user namespace, so that root in the container has no privileges outside. LXC also uses network namespaces, so the user inside the container can only do on the network what the admin allows them to do.
Because root inside a user namespace is unprivileged outside it, it can't scribble on memory or load modules, etc., either.
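For anyone who wants to see what "root inside, unprivileged outside" means mechanically: a user namespace carries a UID mapping, readable as lines of `inside_start outside_start length` in /proc/<pid>/uid_map. Below is a small Python sketch that translates UIDs through such a mapping. The mapping values are hypothetical, and the code only models the lookup; it does not create a namespace.

```python
def parse_uid_map(text):
    """Parse lines of 'inside_start outside_start length' (the uid_map format)."""
    ranges = []
    for line in text.strip().splitlines():
        inside, outside, length = (int(x) for x in line.split())
        ranges.append((inside, outside, length))
    return ranges

def host_uid(ranges, container_uid):
    """Translate a UID seen inside the namespace to the host UID, or None if unmapped."""
    for inside, outside, length in ranges:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None

# Hypothetical mapping: container UIDs 0..65535 map to host UIDs 100000..165535.
uid_map = parse_uid_map("0 100000 65536")

print(host_uid(uid_map, 0))      # 100000
print(host_uid(uid_map, 1000))   # 101000
print(host_uid(uid_map, 70000))  # None
```

With a mapping like this, UID 0 in the container is just UID 100000 on the host, which is an ordinary unprivileged user there.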
See https://wiki.ubuntu.com/LxcSecurity for a decent summary of the situation in Ubuntu's releases. Several Ubuntu contributors are also among the main drivers of LXC upstream.
It's true that user namespaces and other kernel features LXC relies on are beginning to get much more use than they used to, and probably still have flaws, though I think you exaggerate how many CVEs are actually being found. Ubuntu's LXC support also uses apparmor and seccomp to provide further isolation. Conservative users will probably wait a while more to see what bugs get shaken out.
Yes, you probably don't want to run untrusted code with root privileges inside a container if anything valuable is running on the same host.
Containers are awesome because they represent a logical component in your software stack. You can also use them as a unit of hardware resource allocation and multi-tenancy, but you don't have to: you can map a container 1-to-1 to a physical box, for example. But the logical unit remains the same regardless of the underlying hardware and multi-tenancy setup, which is truly awesome.
EDIT: details on multi-tenancy.
If you're sharing nothing of importance on the host, then you don't really need LXC, unless you don't know how to set up mysql with more than one database, nginx with more than one virtual host, yada yada.
Here's the trick: you CAN use LXC and SUPPLEMENT it by something providing security such as SELinux.
LXC lets you use cgroups, i.e. set up memory/cpu/IO limits per container. If you set up MySQL with more than one database, you can't do that.
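As a rough illustration of what a per-container limit looks like underneath, here is a Python sketch using the cgroup v1 file layout (`memory.limit_in_bytes` and `tasks` under the memory controller). The group name and the 512 MB figure are made up, and the demo writes into a scratch directory so it runs without root; on a real host the root would be the mounted cgroup hierarchy (commonly /sys/fs/cgroup) and the kernel would enforce the limit.

```python
import os
import tempfile

def limit_memory(cgroup_root, group, limit_bytes, pid):
    """Create a cgroup (v1 layout), cap its memory, and move `pid` into it."""
    path = os.path.join(cgroup_root, "memory", group)
    os.makedirs(path, exist_ok=True)
    # The memory controller reads the cap from this file.
    with open(os.path.join(path, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))
    # Writing a PID to `tasks` places that process in the group.
    with open(os.path.join(path, "tasks"), "w") as f:
        f.write(str(pid))
    return path

# Scratch directory stands in for the real cgroup mount point.
root = tempfile.mkdtemp()
group_path = limit_memory(root, "mysql-tenant-a", 512 * 1024 * 1024, os.getpid())

with open(os.path.join(group_path, "memory.limit_in_bytes")) as f:
    print(f.read())  # 536870912
```

One group per tenant gives each its own cap, which a single shared MySQL process cannot offer.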
Also, we DO use LXC and SUPPLEMENT it by something providing security such as GRSEC (in the current version in production at dotCloud) and AppArmor (with docker) :-)
if you do use apparmor and grsec (as in RBAC's part of grsec in particular) it's probably acceptable, but I haven't seen it mentioned on the website - and people figure, they'll just use lxc "and be safe".
When they were tired of seeing their patches rejected from the mainstream kernel, they decided to try a different approach, and that approach is LXC. In other words, LXC is a reimplementation of OpenVZ concepts by almost the same team.
LXC is actually more secure than OpenVZ, if only because it went through more scrutiny than OpenVZ.
It now looks like overlayfs will make it into the 3.10 mainline kernel, so it may be a better choice in the future. I think that the "Under the hood" stuff on docker's page is an implementation detail that can change, so a switch to overlayfs when that becomes more suitable could be possible. (Confirmation from the dotClouders present would be appreciated.)
In other words, Docker + Mesos is a killer combo. There is already experimentation underway to use Docker as an execution engine for Mesos.
Any ideas?
To answer the original question: Docker extends LXC with a higher-level API which operates at the process level. OpenVZ helps you create "mini-servers". Docker lets you forget about servers and manage processes.
# Run command which adds your payload
$ docker run base apt-get install curl
# Commit the result to a new image
$ docker commit 5b4a1ee8 nwg/base-with-curl
# Run a command from the new image. Your payload is available!
$ docker run nwg/base-with-curl curl http://www.google.com
I would argue that virtual machines at a hypervisor/hardware level were just a hack for OSs not living up to their isolation promises/obligations. Strong OS level isolation implementations (cgroups, namespaces etc) allow people to put isolation back where it belongs, the OS.
The job of the OS is to control the hardware; wrapping the OS in software to emulate hardware is ridiculous, and VMs generally have much more performance overhead than isolation containers.
If you couple a container with a CoW file system that supports snapshotting (eg ZFS or BtrFS), then you can have most of the features you'd expect from virtualisation but without as heavy a footprint.
Containers are an underrated and often forgotten solution in my opinion.
edit: it's too early, sorry this has nothing to do with your post... but I hope someone does correct me about hvm.
A PaaS host saying they supported Docker would imply that they'd be using, for example, SquashFS for container format, AuFS instead of OverlayFS for union-mounts, LXC instead of OpenVZ/Xen/KVM for isolation, and any other set of things your container might subtly rely upon.
The culmination of this, I imagine, would be a PaaS host allowing you to specify the "stuff" you want to run just by the URL of the container-image.
What if the namespace changes? What if AuFS changes? What if LXC changes? Independently or all together? ABI changes? Version changes? Feature changes? Are all the licenses compatible? Will it ever support platforms other than just certain versions of Linux? Or languages other than Go?
I don't see a standard. I see marketing for a product and a mailing list to collect potential customers. But maybe I'm missing something.
I do think there is a need for a standard way to package and share software at the filesystem and process level - we don't pretend to define that standard, but hopefully we can contribute to it by open-sourcing a real-world implementation.
1. every one of those attributes would be fixed against a given version of the (coming) Docker spec, and a given host would specify what version(s) of the spec they were compatible with.
2. Go is, I think, just the language the glue code is written in; not the language your own things-deployed-using-Docker must be written in.
3. It might support other Linux distros (Fedora, probably), but it won't support other OSes as hosts--because the whole point is to run things that need a POSIX-alike as their "outer runtime" (i.e. not Windows programs, etc.) The way to run these containers on another host will be to run Linux in a VM on that host, and run the containers in the VM--just like the way to play a Super Nintendo game "container" on your computer is to run them in a Super Nintendo VM. [Actually, come to think of it, game ROMs are a great analogy for precompiled SquashFS containers. I would adopt it if I were them :)]
We think Docker's API is a fundamental building block for running any process on the server.
Though! If you want to, you can think of this standard as specifying an "ABI format" for high-level, lightweight VMs that happens to run on a "Linux machine" instead of, say, an "IA32 machine."
I want a new instance of WEBSERVER.qcow2?
qemu-img create -f qcow2 -F qcow2 -b WEBSERVER.qcow2 WEBSERVER-$SERIALNUMBER.qcow2
And, importantly, Docker maintains a filesystem-level diff between versions of an image, and only needs to transmit each diff once. So you get tremendous bandwidth savings when transmitting multiple images created from the same base.
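Here is a toy Python model of why that saves bandwidth. It is only a sketch of the idea, not Docker's actual format or protocol: filesystems are dicts, a layer is the set of files that changed relative to its parent, and layers are identified by a hash of their content (an assumption made here so the remote can recognise a layer it already holds).

```python
import hashlib
import json

def diff(parent, child):
    """Files added or changed in `child` relative to `parent` (deletions ignored for brevity)."""
    return {path: data for path, data in child.items() if parent.get(path) != data}

def layer_id(layer):
    """Content-derived identifier, so identical layers get the same ID."""
    return hashlib.sha256(json.dumps(layer, sort_keys=True).encode()).hexdigest()[:12]

def push(image_layers, remote_store):
    """Send only the layers the remote does not already hold; return how many were sent."""
    sent = 0
    for layer in image_layers:
        lid = layer_id(layer)
        if lid not in remote_store:
            remote_store[lid] = layer
            sent += 1
    return sent

# Two hypothetical images built from the same base.
base = {"/bin/sh": "shell", "/etc/os-release": "ubuntu"}
with_curl = {**base, "/usr/bin/curl": "curl"}
with_nginx = {**base, "/usr/sbin/nginx": "nginx"}

remote = {}
print(push([base, diff(base, with_curl)], remote))   # 2
print(push([base, diff(base, with_nginx)], remote))  # 1
```

The second push sends one small layer instead of the whole image, because the base layer is already on the remote.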
I'm a copyright noob: would simply acknowledging the copyright owner be enough and fall under fair use, or should we not use it unless we get written permission?
This page has what is very likely the original image: http://www.peeron.com/scans/7823-1/ Peeron.com has special permission directly from Lego to display the images, so if you wanted to be extra careful you could email email@example.com and ask for permission to deep link to their picture (they'd probably say yes, the admin is a linux geek too). But honestly, a simple copyright disclaimer is probably fine. Lego won't reach out and swat you even if they do decide they don't like it, they'll just ask you to take it down.
(As a side-note, this is an example of an interesting bit of game theory: in a niche, the Majority Player will tend to keep their tech proprietary to stay ahead, while the Second String will tend to release everything OSS in order to remove the Majority Player's advantages. This one is dotCloud taking a stab at Heroku, but you can also think of, for example, Atlassian--who runs Github-competitor Bitbucket--poking at Github by releasing a generic Git GUI client, whereas Github released a Github client.)
I will add that our implementation predates Heroku's. Using a generic container layer early on (first OpenVZ-based prototypes in 2009) is what allowed us to launch multi-language support a year before any other paas. It's also how we operate both application servers and databases with the same underlying codebase, and the same ops team.
The reason why I give that description to begin with is because containers solve a lot of problems that often lead people towards the virtualisation route. And to the guest "OS", all the applications think they're running on a unique machine from the host. However containers differ from virtualisation in that it's just one OS (one shared kernel). This means you can only run one unique OS (though if you know what you're doing, you can run multiple different distros of Linux in different containers - but you couldn't run FreeBSD nor Windows inside a Linux container). Containers can also have their own resources and network interfaces (both virtual devices and dedicated hardware passed through).
Because you're not virtualising hardware with containers and because you're only running one OS, containers do have performance advantages over virtualisation while still being just as secure. So I personally think they're a massively underrated and underutilised solution.
If you're interested in investigating a little more into containers, Linux also has OpenVZ, and FreeBSD and Solaris have Jails and Zones (respectively). The wikipedia articles on each of them also offer some good details (despite the stigma attached to wikipedia entries).
I've not used Docker specifically, but I have used other containers in Linux and Solaris, so I'm happy to answer any other questions on those.
https://lwn.net/Articles/524952/ - Glauber Costa's talk on the state of containers at LinuxCon Europe 2012
https://lwn.net/Articles/536033/ - systemd containers; these follow on from a theme of containerizing whole distros rather than single apps
Basically, these are all ways of running programs with a partitioned subset of access to the OS - in some cases, it may even appear to be an entire installation under that subset.
This (and many of the other mentioned solutions) adds to that by being able to capture and deploy an environment for running a specific app.
Has anybody tried this?
Soon could be tomorrow, a week, a month, a year, or longer ...
Docker doesn't have an opinion about how you package things. It only cares about the resulting changes on the filesystem. So you are free to use the best tool for each job.
By the way this means you can use Docker and Nix together. I would love to see that :)