After Docker: Unikernels and Immutable Infrastructure (medium.com/darrenrush)
238 points by axelfontaine on Nov 14, 2014 | 69 comments



Rump kernels (http://rumpkernel.org/) are essentially unikernels for POSIX. I'm currently working on running unmodified application stacks (base firmware/"not-OS" + rump kernel + userland application) on Xen and, later, KVM and bare metal.

I'll be giving a talk on this at http://operatingsystems.io/ in London on November 25th.


How'd you get started with rump kernels? I tried poking around and there's not much documentation.


I was using rump kernels in an unrelated project for a client. In that project, the application I developed was essentially a Qt GUI for mastering custom UDF filesystem images.

You're right that the documentation for the rumprun-* stacks (running unmodified applications on rump kernels + Xen/bare metal) is lacking; however, it's early days for this work. New, extra fresh, hot off the press! We're working on it :-)

Note that rump kernels as such (sans the Xen and baremetal stacks) are well-proven and tested code, perfectly usable in production today.


While I can't speak for anyone else, the intent is that people would follow the "Getting started" tutorial linked from the front page of http://rumpkernel.org/ : http://wiki.rumpkernel.org/Tutorial:-Getting-started

If that is lacking, please let us know!


Read the docs that are there, just try stuff, and ask questions (mailing list, IRC). The docs are getting better, and it's getting easier.


I found that the original research paper is currently the best (only?) proper introduction to rump kernels.



How do rump kernels differ conceptually from recent academic work on "library OSes" such as Microsoft's Drawbridge, or Linux-based OSes such as Bascule or Graphene?


The main differences are perhaps practical rather than conceptual. Drawbridge and Bascule are closed source. Graphene attempts to implement a Linux-compatible OS from scratch, which is a huge task. Rump uses unmodified NetBSD code (kernel and libraries), so it is exactly compatible and is updated continually, since the code lives upstream in NetBSD.

Also, rump is BSD-licensed while Graphene is GPL; this matters because you have to link the kernel code into your application.


What's the architectural difference between Rump kernels and User Mode Linux? http://user-mode-linux.sourceforge.net/

IIRC Linode (or some similar company) used to use User Mode Linux, but switched to Xen for performance reasons.


A rump kernel is not an OS and it's not restricted to running in usermode.

see: http://wiki.rumpkernel.org/Info%3A-Comparison-of-rump-kernel...


That page doesn't say anything about User Mode Linux. I ask because User Mode Linux is basically a way of running the Linux kernel in user space on a Linux machine, which sounds like what Rump kernels are. From your response I'm not sure if you got that.


Can you share your presentation on HN? I'd be very interested in the talk.


The talks are being recorded (I am organizing the conference).


Why do you target Xen first? I haven't followed virtualization closely, but I thought KVM was going to be 'the future', since it was adopted into the Linux kernel and backed by Red Hat.


Xen is a Linux Foundation project, I believe, so it's not like it's particularly dead.


I can't believe no one has mentioned ZeroVM[1] yet. The project page is unfortunately non-descriptive, but Wikipedia has some important details[2]:

> The ZRT [ZeroVM RunTime] also replaces C date and time functions such as time() to give the program a fixed and deterministic environment. With fixed inputs, every execution is guaranteed to give the same result. Even non-functional programs become deterministic in this restricted environment. This makes programs easier to debug since their behavior is fixed.

I've had a play with it - there's a version of python that runs on it, and it's surprisingly usable.

[1] http://www.zerovm.org/

[2] https://en.wikipedia.org/wiki/ZeroVM


This reminds me a bit of how (AFAIK) Nix packages are patched to avoid timestamps and generate deterministic builds when needed.


> "It remains virtually impossible to create a Ruby or Python web server virtual machine image that DOESN’T include build tools (gcc), ssh, and multiple latent shell executables."

At work, our tech team has found an interesting way around this for our Python app. We build the virtualenv inside the Docker container, then run our Ansible-based deployments inside that same container. From there, the virtual environments are rsync'd to the app servers, so we avoid installing developer tools on them.
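Roughly, the flow looks like this; the image name, paths, and host names below are made up for illustration, and in practice the steps are driven by Ansible:

  # Build the virtualenv inside the container, at the same absolute path it will
  # occupy on the app servers (virtualenvs aren't relocatable).
  docker run --rm \
      -v "$PWD":/src \
      -v "$PWD/build/venv":/srv/ourapp/venv \
      ourbase:latest \
      bash -c 'virtualenv /srv/ourapp/venv && /srv/ourapp/venv/bin/pip install -r /src/requirements.txt'

  # Ship the ready-made environment; the app servers never need gcc or headers.
  rsync -az --delete build/venv/ appserver01:/srv/ourapp/venv/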


I'm ditching virtualenvs and going with good old Debian packaging and a private APT repository.

For VMs/containers that already run a single application, except for some weird edge cases, there's really no point in having a virtual environment in a virtual environment.

I've had initial success with a few simpler projects and am now looking into transitioning more complex ones. Not sure whether it'll go without any hassle, but it seems worth trying. At worst, I'll just waste my time and go back to virtualenvs.


You could also try virtualenvs inside Debian packages:

https://github.com/spotify/dh-virtualenv

One reason to keep virtualenvs is that the system Python (VM or container) includes extra Python packages that your app may or may not need. If you use a virtualenv, you exclude these system-installed packages and guarantee a clean starting point.
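For reference, the dh-virtualenv hookup is tiny. A minimal sketch, assuming debian/control already declares dh-virtualenv (and debhelper) as build dependencies:

  # debian/rules: three lines are enough to build the whole package as a virtualenv
  printf '#!/usr/bin/make -f\n%%:\n\tdh $@ --with python-virtualenv\n' > debian/rules
  chmod +x debian/rules

  # Then a normal package build; the app and all its pip dependencies land in one .deb
  dpkg-buildpackage -us -uc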


I find the problem with this approach comes when you want to ship a package (e.g. requests) that is newer than whatever is packaged for your OS. Then you have to repackage the OS packages yourself, taking on the corresponding duty to keep them patched. I'm using wheel files in production for this very reason.
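The wheel workflow itself is pleasantly small; something like this (directory names are arbitrary):

  # On a build host (this is where gcc lives): compile everything once into wheels
  pip wheel -r requirements.txt -w wheelhouse/

  # On the production hosts: install only from the local wheelhouse; no compiler,
  # and no reaching out to PyPI
  pip install --no-index --find-links=wheelhouse/ -r requirements.txt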


Same here. As a bonus, it makes it easy to create a bare-metal OS installation image that includes your app.

I'm running a script right now that generates an ISO that turns a brand new machine into a server running our app with a template DB in completely unattended fashion.


I'm not sure how to interpret that quote.

In the context of Docker it's inconvenient, but entirely straightforward to create images that don't include those elements.

It's a matter of creating an unwieldy chain of build steps to avoid committing intermediate containers.

If/when something like this [1] gets merged things will be greatly simplified.

1: https://github.com/docker/docker/pull/8021


I don't know what you consider an unwieldy chain of build steps, but for Ruby it's simply a matter of building the container, running it with your app directory (or a suitable build location) mounted as a volume, installing dev dependencies, executing "bundler install --standalone --path vendor", and then using the result as the basis for building the final container image.

You can make that cleaner by making the build steps into one Docker image, the final app into a second, and have them share a base image that contains all the basic dependencies.

For Ruby at least the intermediate build step would typically only need to be re-run whenever your Gemfile/Gemfile.lock changes.
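As a sketch of that (image and directory names invented), the two-image version is roughly:

  # 1. Build image: contains gcc, ruby headers, etc. Run it with the app mounted
  #    and vendor the gems into the source tree.
  docker build -t myapp-build ./build-image
  docker run --rm -v "$PWD":/app -w /app myapp-build \
      bundle install --standalone --path vendor

  # 2. Runtime image: same slim base, just adds the app plus vendor/; no compiler.
  docker build -t myapp .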


Python's new wheel (https://pypi.python.org/pypi/wheel) package format might help you avoid needing build tools (e.g. for numpy).


I've done some experiments packaging gunicorn plus a WSGI app with PEX[1] to get a standalone executable.

Seems to work pretty well, although I haven't tried it for anything production related.

[1] http://pex.readthedocs.org/en/latest/api/index.html
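For the curious, the experiment was roughly the following; `myapp` stands in for your own installable package (e.g. published on an internal index), and the flags are from memory, so check them against the pex docs:

  # Bundle the app and gunicorn, with gunicorn's console script as the entry point
  pex myapp gunicorn -c gunicorn -o myapp.pex

  # The result is a single self-contained file; arguments pass through to gunicorn
  ./myapp.pex myapp.wsgi:application --bind 0.0.0.0:8000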


I'm assuming your app servers are traditional VMs?


Yes, so the attack surface is still larger than with the other solutions outlined in the article, but we've managed to avoid installing things like gcc. It's also significantly cut down our deployment time, especially when updating libraries/requirements.


Forgive me if I'm totally clueless, but isn't the idea of the unikernel basically a throwback to the earliest, pre-OS days of computing when all programs needed routines to initialize the base hardware resources before they could perform tasks?

The idea of the unikernel, and of the libOS in general, where applications are linked with their bare-minimum OS runtime and packaged up, is certainly nifty, but it's kind of funny that people are so hyped about what sounds like a more advanced form of what was regularly done on mainframes 60 years ago.


Because over those 60 years the OS has accumulated layers of abstraction that are useless when it's used as a server OS with programming languages that come with batteries included.

If the programming language has a rich ecosystem with a runtime that is already taking care of hardware abstractions and scheduling, why replicate it a few times in lower layers?

How many schedulers or device drivers are needed to serve network requests?


It's only useless until you discover that you're reimplementing it from scratch on a full-time basis.


I assume you program in pure C without libc, using only syscalls for your OS of choice.


Because you don't use libraries?


And it's what's being done now on mainframes too. What was lacking in the commodity processor space was hardware support for mature hypervisors that stay out of the data plane (IBM got this right pretty close to the first try). We're almost there, but there's still some work IMO.


OS/400 with kernel JIT, quite a nice design.


Actually OS/400 isn't quite what I'm talking about (despite being really cool!). {AS|OS}/400 is more like a cross between a minicomputer and all of those promises of a Java OS we were given in the 90's.

I'm talking more about the mainframe-level stuff, like z/TPF on the software side and adding address-space tags to the channels on the hardware side. That last one is basically a better implementation of an IOMMU, where a device knows it's probably running under a hypervisor, can receive requests directly from multiple VMs without the hypervisor's intervention, and has those requests implicitly tagged with each VM's address space so it's still memory-safe. That is, a VM can't request DMA that would land outside its allocated space, but it can still ask the device for DMA without involving the hypervisor, and the device can service multiple VMs (that last bit is what isn't really present in current IOMMUs).


I used to start backup jobs on an AS/400 during a summer internship back in 1994; I just logged in and kicked off the respective job. Only more than a decade later did I bother to delve into how it actually works.

I was quite surprised to discover that the concepts behind Java, .NET, Android, and Windows Phone had already been successfully implemented in the market in such systems, so many years before, and that the majority of developers out there are unaware of them.

Thanks for the z/TPF overview.


The article is correct that there aren't yet best practices about building minimal and secure Docker images, but it seems like switching to unikernels would be much more work. Unikernels also suffer from the lack of VM resizing and minimum VM sizes being too big in many cases.


Agreed. The real problem is with the current tools for building images.

As a proof of concept, several months ago I built a few tiny Docker images using musl libc and no package manager. But I had to deviate from the normal image build process to do so.

http://mwcampbell.us/blog/tiny-docker-musl-images.html
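The same trick works for anything you can link statically. A minimal sketch (hello.c and the image name are placeholders):

  # Statically link against musl so the binary needs no libc, and therefore no
  # base distribution, at runtime
  musl-gcc -static -Os -o hello hello.c

  # An image containing exactly one file: start FROM scratch and add the binary
  { echo 'FROM scratch'
    echo 'ADD hello /hello'
    echo 'CMD ["/hello"]'
  } > Dockerfile
  docker build -t hello-musl .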


Ha, I wanted to do something similar to build tiny VirtualBox machines for network labs. I too ended up reading about Sabotage Linux. Great link!


I am a big fan of these nano-images and really want to support them better in Docker's builder. See "nested builds" and "image squashing". Basically, you should be able to define a Dockerfile for the build environment, then within that define how to produce the final image. Then you need squashing to avoid carrying the build layers.

I personally played with Aboriginal Linux, but I believe it's the same idea :)
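Until squashing lands in the builder, the usual workaround is to flatten a finished container by round-tripping it through export/import, which discards the layer history (and, with it, the build-only layers). A rough sketch, with invented names:

  # Create a container from the fat, many-layered image and flatten its filesystem
  # into a single-layer image. Note that import drops metadata such as CMD and ENV,
  # so those have to be set again afterwards.
  docker run --name throwaway myapp-fat true
  docker export throwaway | docker import - myapp:squashed
  docker rm throwaway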


<future>I'd like to see $CLOUD selling micro-VMs with burstable memory in the tens of MBs, billed by actual CPU time used, similar to what AWS Lambda does.</future>

As for lack of VM resizing, that is a hypervisor/Unikernel implementation detail.


The problem with micro-VMs in the current world is the overhead of the "control plane" eating most of the VM's capacity. I'm not referring to the "OS stuff" (in Linux, kernel daemons and so forth) but rather the software components any sensibly-engineered application needs if it's going to participate in a larger SOA architecture: logging, queuing, access control, service discovery, load monitoring, subprocess supervision, etc. These things (or at least stub clients for them) need to exist in each and every VM, and making the VM a single-purpose worker drone doesn't eliminate the need for them.

If you want to see what happens when you aim for a single-purpose isolated unikernel design, but then bake it fully for operational requirements, look at an (embedded release package of an) Erlang application. There's still a lot of "stuff" there—a lot of attack surface that has nothing to do with achieving the purpose of your app per se—but it's all necessary to keeping your app healthy and stable in the greater ecosystem of services it interacts with.

(This is presuming that you can't just shunt off these responsibilities to the hypervisor. If logging means "your unikernel writes to the console and Xen pipes it to rsyslog" then a lot of problems do go away.)


> Unikernels also suffer from the lack of VM resizing and minimum VM sizes being too big in many cases.

Could you clarify what you mean here? Is this specifically a Unikernel problem or an ecosystem problem (in terms of actually trying to deploy Unikernels in the wild)? If so, those seem like different issues and should be discussed separately.


It's a cloud provider problem, but I don't think a technology can be usefully separated from its ecosystem. The cloud can remain irrational longer than you can remain solvent.

On a technical level, VM memory hotplug is probably necessarily slower and flakier (ACPI anyone?) than changing one setting in a cgroup.
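For comparison, resizing a container's memory is a single write to the (v1) memory cgroup, assuming it's mounted in the usual place:

  # Raise a running cgroup's memory limit to 512 MB; it takes effect immediately,
  # with no hotplug and no reboot
  echo $((512*1024*1024)) | sudo tee /sys/fs/cgroup/memory/myjob/memory.limit_in_bytes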


KVM has a memory balloon driver that is pretty simple.


I think the biggest problem with unikernels that I haven't seen addressed is hypervisor inefficiency. Emulating any part of a kernel, or multiple kernels, will just be slower. You have 20 guests on your host? That's 20 probably-overlapping (uni)kernels running.

Sure you could optimize the heck out of the hypervisor, but now you've created a kernel. And your applications run on that kernel.

With containers, you have one kernel that won't have to instantiate 20 drivers for the disk subsystem. It can be smarter because it knows more about the loads. It's what kernels have been built to do since day 0.

My main concern with unikernels is that eventually the hypervisor will need to become a kernel to be optimized any further. I just worry it will become something of a self-defeating concept.


AWS, Rackspace et al. use hypervisors already, so we're trending toward 4 layers:

  hypervisor -> monolithic kernel -> containers -> application

Unikernels collapse it to:

  hypervisor -> unikernel/application

It's certainly more elegant, although I'm skeptical of the purported performance gains as well, simply because so many optimizations have been thrown into traditional kernels.


If you are using a more minimal hypervisor (see my other comment on the parent), then there do seem to be some measurable gains. I've seen a few papers in this style:

https://www.usenix.org/conference/osdi14/technical-sessions/...

> We describe the hardware and software changes needed to take advantage of this new abstraction, and we illustrate its power by showing improvements of 2-5x in latency and 9x in throughput for a popular persistent NoSQL store relative to a well-tuned Linux implementation.

That said, a simple application like memcached might be currently latency-bound by the kernel's network stack, but a more complex application that reads from disk (even SSD) won't be.


If you're running a service, then you can use a much trimmer hypervisor, e.g.

https://github.com/siemens/jailhouse

Since the guest unikernel isn't a full kernel, the hypervisor interface can be much more minimal, and the few host features it needs can be delegated to the CPU via VT-x (e.g. page-table mapping).

At least, that's the dream. (I've never actually used Jailhouse or tried any of the research projects attempting this.)


This is unavoidable, because the aim for unikernels is to run on the cloud as a platform. They don't get to do or see anything in the hypervisor (or between guests) because the hypervisor is owned and managed by someone else who very much does not want you to see what's going on.


The idea is to avoid overlapping and emulation. You can, for example, pass physical devices through to the unikernel.


There's so much wrong with this post, I don't know where to begin. The idea that security is about removing files rather than the holistic auditing and hardening of a system. The idea that you can't remove a compiler from a system image before packaging and deploying it (seriously? you don't know how to remove a file before you run a packager?). The idea that you have to ship an entire image to update a couple of files. The idea that the entire design of an operating system (which is designed to make it easier for programs to run and interact without having to be tailor-made) is obsolete. It's as if this guy has never held an operations job in his life, yet he's telling people how systems should be managed.


If you have an idempotent service, why do you need the accumulated cruft of 40 years of bad ideas to provide it?


So, UNIX boils down to "the accumulated cruft of 40 years of bad ideas"?

I'm not disagreeing on the principle of immutable servers, but that's a pretty bold claim.

I don't see getting rid of the "accumulated cruft" as being a particularly interesting reason for exploring the unikernel or immutable server concept. The benefit is in building for scale and redundancy. The lighter your image, the easier it becomes to replicate it and maintain it, generally speaking.

Further, there is an argument to be made that building your own cruft into your system is counter-productive compared to letting the operating system cruft handle it. The Linux developers are probably better at it than you or me; unless we understand our usage patterns dramatically better and the options for optimizing them, it may be best to trust the OS "cruft" to do the right thing.

In short, I'm not really taking a side on this one. I believe there is interesting research to be done, and probably useful outcomes to be found, in this direction. But, why dismiss 40 years of operating system refinement by some of the brightest minds in the world as "accumulated cruft...of bad ideas"?


Because that cruft has grown up in service of a timesharing model of computation that no longer holds. Why does my phone have a root user? Why do I have to escalate privileges to bind to ports below 1024? Why am I context switching at all?

I think it's a mistake to conflate path dependence and correctness.


> Actually, CoreOS is a platform designed for orchestration and management of Docker instances. It’s not intended to be used as a base image for Docker containers. Specifically, CoreOS is based on Gentoo Linux, but the recommended base Docker image is Debian.

Well I learned something...


You learned wrong. CoreOS is a derivative of Chromium OS, which uses Portage as its package manager; simply typing "emerge" into a window does not Gentoo make. It's a bummer, because it shows a lack of research on the part of the author (which really came out in other areas, too).

It's also largely irrelevant, because CoreOS should in practice be read-only once you boot it, and you're not really meant to be concerned with the details of how it's put together (of which its use of Portage is one).


Gentoo's Portage is incredibly powerful. Using it to build your own distro is one of its greatest strengths, which is why Google also uses it for building ChromeOS images.


The ideas here are all very interesting, but I don't think we need to even discuss issues with Docker to find the idea of an immutable server interesting.

I also don't find the "problems" with Docker overly problematic.

* The use of many images is probably(?) not an issue? Do people just use "any old base image" without further thought?

* An image of a few hundred megabytes isn't small, but it's not terribly large either.

Lastly, I see people's confusion over what CoreOS is as beside the point. What it is becomes pretty apparent after taking a look at coreos.com.

Overall I really like the idea of an immutable server though!


One thing to be careful about with these models is that you're moving the burden of maintaining libraries onto the application code.

So instead of updating packages and what not, you rely on the developer to update the libraries and reship.

Sure, it's not far from today's model if the dev has to ship the whole container, but it also makes things even harder. How do you know whether you have lib x or z when it's sometimes just dropped among a bunch of files? I think it's much worse: it hides the problem and makes it difficult to detect.

I suspect kernels will slowly converge toward Plan 9-like functionality instead. It makes more sense: it's faster, more efficient, and simpler.

The main barrier so far has been portability, but with more and more apps being written in very portable languages (Python, Go, C#, ...) it's becoming easier.


For Mirage OS, all libs are released as packages in OPAM [0], so it's really straightforward to find out which versions you're using (and manage/update/remove them). In fact, we just did a set of releases recently [1]. I'm not sure how it is for the other systems.

[0] http://opam.ocaml.org

[1] https://github.com/ocaml/opam-repository/pull/3028
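Concretely, checking and refreshing what a unikernel build pulls in is just ordinary opam usage, e.g.:

  # See which mirage-related packages opam knows about and which versions are installed
  opam list | grep mirage

  # Pull the latest package metadata and move to the new releases
  opam update && opam upgrade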


I've always been a proponent of autarchy in these matters, and so am intrigued by the idea of the unikernel. I'm also going to spend some time with OCaml, so Mirage looks like something that might turn out to be really fun.


Please do get involved! If you're learning OCaml, then http://realworldocaml.org is a great resource. When you start trying out Mirage, join the mailing list and let us know how you get on. Finally, to see where we can take this tech, have a look at http://nymote.org


Immutable systems are on the rise, and I'm glad that it is getting more developer mind share. I literally just wrote an article about this a couple days ago ( https://medium.com/@marknadal/rise-of-the-immutable-operatin... ) and since then I've seen like 3 other posts about it on top of HN.


> Heroku is a great example of immutable servers in action: every change to your application requires a ‘git push’ to overwrite the existing version.

Um, no it's not.

Heroku's buildpacks code caches a hell of a lot of stuff on each execution agent. Still more code has to recognise and try to repair various broken states. It's mutability, through and through.


How far away are unikernels from being adopted in production? Does it make sense to consider them for a project starting today?


So what do you think of things like Apache Mesos in this regard?



