Hacker News
Why unikernels might kill containers in five years (viethip.com)
76 points by ngrilly on Oct 9, 2015 | hide | past | favorite | 58 comments

I still don't get it.

Yep, unikernels can get away without the security boundary between user and kernel modes... except if you want any security at all in your installation, because at kernel mode people can break out of the container.

They can also get away without process scheduling and IPC, but if you want any multiprocessing on the application, you'll need to bring it either on the kernel or the userland. Ditto for user management.

They can't really get away without device drivers. Ok, except for devices that you won't use, just like any other kernel.

What is that kernel feature that people take away? Or are they just talking about a stripped down Linux without the device drivers (that fit a floppy disk)?

The same applies to the services. What are those services that people gain so much from removing from containers, and why don't people remove them from VMs and real machines too?

The advantage comes from unikernels being even more disposable than containers. Your kernel is built with an absolutely tiny number of options. You wouldn't even need things like the console driver, because if a unikernel crashes, you just restart it, with crash logs sent somewhere for analysis. You can remove a lot of your networking stack, because your HTTP server, that only ever speaks http, can get rid of base things like UDP support. You don't need a full-featured filesystem, because things are only written to at compile time. The number of syscalls you expose can equally be tiny. In short, you have an absolutely minuscule kernel.

From there, you can start stripping out other things. You don't need a huge init system because all you're doing is running a single application. Your libc can be made tiny, or removed completely, because you can strip it of all the calls your app doesn't need. You don't need to run an sshd, because there is no shell to get into. You can get rid of PAM because there's no accounts to access or authenticate against.

On real machines and VMs, these things are necessary so you can run tests, do cleanup, etc. Unikernels are centered entirely around the idea that those things are someone else's job, and if needed, can be shouted for across the network.

> because if a unikernel crashes, you just restart it

Because you've stripped out everything that could supply you with any other option.

Because they're just processes with a funky syscall interface that looks a little bit like hardware. Especially if you start comparing paravirtualized drivers and high performance interfaces like Netmap.

The idea is that your security boundary is at the VM. The VM hypervisor presents a virtual CPU; malicious code cannot jump out of the box.

Many common server frameworks build in multiprocessing to the language runtime: Node.js mandates an event-driven callback style, Go has goroutines, etc. So it already exists in userland. Unikernels avoid duplicating that functionality at the kernel level, so the two systems don't fight each other.
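As a toy illustration of that point (nothing unikernel-specific, just runtime-level concurrency), an event-loop runtime multiplexes tasks on a single OS thread with no kernel scheduling between them:

```python
import asyncio

# Two "requests" handled concurrently by the language runtime's event
# loop; the kernel sees only one thread, and all task switching happens
# in userland.
async def handle(name, delay):
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # Both coroutines make progress concurrently on one OS thread.
    return await asyncio.gather(handle("a", 0.01), handle("b", 0.01))

print(asyncio.run(main()))  # ['a done', 'b done']
```

A unikernel hosting such a runtime never needs a kernel-level scheduler of its own.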

The hypervisor proxies device drivers through from the host operating system (running in Dom0) to the guest operating systems.

The idea is to run eg. Node, or Postgres, or memcached on bare metal. So instead of write() trapping into the operating system, it becomes a library function that directly talks to the underlying device driver. The only process the VM runs would be that one application, and you'd rely on the hypervisor to schedule multiple applications on one physical box.
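A toy model of that library-OS idea (purely illustrative, and in Python rather than a systems language; every name here is made up): write() becomes an ordinary function call into an in-process device model instead of a trap into a separate kernel.

```python
# Toy sketch: the application links its "kernel" services in as plain
# code, so writing to storage is a function call, not a syscall trap.
class BlockDevice:
    def __init__(self, nblocks, blksize=512):
        self.blksize = blksize
        self.blocks = [bytes(blksize)] * nblocks

    def write_block(self, n, data):
        # Pad the payload out to a full block.
        self.blocks[n] = data.ljust(self.blksize, b"\0")

class LibFS:
    """Minimal append-only 'filesystem' living in the application's
    own address space."""
    def __init__(self, dev):
        self.dev = dev
        self.next_block = 0

    def write(self, data):
        # In a real unikernel this would poke a paravirtualized device
        # (e.g. a virtio ring) directly.
        self.dev.write_block(self.next_block, data)
        self.next_block += 1
        return len(data)

fs = LibFS(BlockDevice(16))
print(fs.write(b"hello"))  # 5
```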

Mildly interesting historical note -- a few years ago I was reading an old paper on operating systems from the 1960s. As I recall, it seemed to refer to a process as a "virtual machine".

(Sadly, I don't remember the title for a citation here.)

Didn't containers just create a sandbox for the code to run? If they get a full VM, what is the difference between containers and old style VMs?

Anyway, I don't have experience with Go or V8 JavaScript, but C pthreads, Haskell threads and Python threads all just encapsulate the OS functionality to add some conveniences. Java does come with a complete threading implementation, and gets no lack of criticism because of that. You may get away with running Node on a single-task kernel, but Postgres will simply refuse to load if it cannot fork some processes and talk to them through shared memory. Apache's behavior is similar.
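For reference, the fork-plus-shared-memory pattern Postgres depends on looks roughly like this (a POSIX-only sketch, not Postgres's actual code); it is exactly what a single-address-space unikernel cannot offer:

```python
import mmap
import os
import struct

def fork_and_share():
    # Anonymous shared mapping (MAP_SHARED by default): inherited by
    # the child after fork(), so parent and worker see the same bytes.
    shm = mmap.mmap(-1, 8)

    pid = os.fork()
    if pid == 0:
        # Worker process: deposit a result in shared memory and exit.
        shm.seek(0)
        shm.write(struct.pack("q", 42))
        os._exit(0)

    os.waitpid(pid, 0)  # parent waits for the worker
    shm.seek(0)
    return struct.unpack("q", shm.read(8))[0]

if __name__ == "__main__":
    print(fork_and_share())  # 42
```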

Containers share a kernel, but unikernels run right on the hypervisor. Basically your program is the kernel, and it's the only thing running in the VM.

Oh, thanks.

So unikernels are not about containers, and I've been mixing two different approaches. It makes more sense now.

I'm gonna stretch things a lot but it reminds me of the standalone webserver versus language embedded webservers.

Before, you had to rely on, say, Apache for URL rewriting and routing. Now the data and logic have been lifted into the languages themselves.

Unikernel frameworks like Mirage either provide all those features, or in the case of drivers, are just targeting virtio/veth/etc. paravirtualized devices offered by the hypervisor. So the domU doesn't have any abstraction to worry about other than just targeting a particular Hypervisor's representation of storage and networking.

Xen provides very basic (but safe) message/event channels between VMs. Basically this becomes your IPC, and the VMs become user processes.

With a new enough CPU with the correct memory virtualization features, xen can scale to many thousands of domUs. And with each domU only taking up slightly more resources than a typical user process, this is a completely reasonable approach.

Again, your Hypervisor becomes the OS which has to worry about the baremetal, and the VMs replace user processes.

I intend to play with MirageOS seriously myself next year.

Edit: some dependencies in an example MirageOS app: https://mirage.io/wiki/technical-background#ModularOSLibrari...

But if the hypervisor already has process separation, quota management, device drivers, etc. there's no point duplicating it all in the VM.

> They can't really get away without device drivers

I agree with you 100%, but it does make me think. Every Linux installation I've done in the last five or so years has been either under ESXi, or on AWS.

I'm sure I'm not alone in that, and that's a trend that's going to continue to grow.

How much of the kernel is drivers that are absolutely never going to be useful in that scenario?

How much room is there for distributions to start supplying a kernel configuration with half of it never built?

I'd point you to osv.io. I haven't really played around with it much, though the concept is pretty intriguing: being able to run the kernel directly on the hypervisor, containing only your app and what's needed to run it, thus obviating the need for device drivers, with everything running in the kernel address space. Also check out LING for the Erlang VM.

Now, I'm waiting for someone to discover that unikernels can be optimized if they have some sort of way of directly requesting the host operating system to do things for them.

A call to the system, you may say.

You joke but one nice use of them is a simplified system interface which can allow a traditional OS to give such processes significantly lower privileges, e.g. to sandbox something like mplayer. Unikernels make it possible to play with where you put a bunch of code that would otherwise need to be in the kernel or reimplemented.

Does this mean if I wait long enough, I won't ever need to figure out what exactly Docker is?

That's the case with most technologies.

I managed to skip the whole Angular/Mongo hype, and now that I'm actually getting back into frontend development, it's all about React. Can't say I missed it.

I remember telling my boss at my very first job "Yeah, I want to learn MFC and COM". His response was "Why? By the time you're out of college, everything will be .NET." It turned out he had underestimated: by the time I graduated from college, MFC & COM had been replaced by .NET, which had been replaced by webapps running under J2EE, which were just about to be replaced by Rails & Django.

Actually, J2EE (December 12, 1999, if the Great Wiki is to be believed) predates .NET (February 13, 2002, same source).

.NET and J2EE (or C# and Java) never really replaced each other--at best, .NET was MS's attempt to upgrade the MFC/COM stuff to the Java VM world after J++ fell apart. They're more or less the same model, one's just the Windows Server flavor while the other is the Linux/Unix flavor.

The hype cycle was later, though. Microsoft was the dominant tech company in the late 90s, and they liked to pre-announce software, and so everyone knew by 2000 or so that .NET was coming and was going to be the next big thing. In 2000 e-commerce was all the rage, but after the consumer Internet melt-down in 2001, a lot of the effort shifted to enterprise, which was where J2EE was marketed.

I'd say peak J2EE was 2002-2003 - it continued to be a factor up through 2007 or so, and probably still is now but the hype has long since passed it by. .NET had a peak hype cycle around 2000, when it was announced, and also continues to have many satisfied users today but ceased to gain major mindshare right around when Google started ascending in 2004-2005.

How do you stand on VanillaJS vs React these days? I remember some posts from you recommending that path.

I still believe every JS developer should know VanillaJS well. Without those fundamentals, you lack the ability to understand what the framework is doing for you.

React is a nice piece of software, I'm glad to see it introduce FRP to a mainstream audience, and it fixes certain problems around componentization that you will run into if you try to build a big-enough app in vanilla JS. I've been tempted to use it for a couple startup ideas, but then I remember that no app is "big enough" until it has actual users, and getting them is the hard part. So I remain largely agnostic about its usefulness for startups, while believing that it can be quite useful for more established teams.

Sure. With true AI on the horizon, if you wait long enough you won't have to figure anything out; it will do it for you.

I wonder when AIs will start outsourcing work to humans.

AI is smart enough to not do that.

s/Mechanical Turk/Mechanical Human/

Interesting, but I’m not convinced.

Size: Each unikernel image contains a kernel specially compiled for the application. A container image properly optimized contains just the application (the kernel is shared by all containers). It’s possible to create Docker containers under 10 MB: see http://blog.xebia.com/2014/07/04/create-the-smallest-possibl... and http://blog.xebia.com/2015/06/30/how-to-create-the-smallest-....
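The trick behind those sub-10 MB images is usually a statically linked binary in an otherwise empty image; roughly (a sketch, with "server" as a placeholder for your application):

```dockerfile
# Minimal-image sketch: no distro userland at all, just one statically
# linked binary.
FROM scratch
COPY server /server
ENTRYPOINT ["/server"]
```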

Security: Reusing a well-known and widely deployed Linux kernel (without all the surrounding software provided by the “larger” OS) sounds at least as secure as compiling your own specialized unikernel.

Compatibility: The compatibility story sounds better for containers because your app will work by default without doing anything.

The original poster says:

"Given that unikernels compile only which is necessary into the applications, the surface area is very small."

I think this is a somewhat strange statement. You don't lower the surface area by not compiling in dead code--unless you're offering a way to run arbitrary code, in which case the dead code can help implement the anti-features the attacker wants. But to me it seems the primary innovation of unikernels is that they are not implemented in C, but in a higher-level language like OCaml, Erlang, etc. Those languages, if their compilers and runtimes are implemented correctly, don't allow buffer overflows, and have other features to prevent arbitrary code execution (a type system, or secure process isolation). Those kernels have (given the same attention to quality) the potential of being more secure than the Linux kernel by using such languages. Also, by not having to be a general OS kernel, they are simpler to implement, hence reducing the number of bugs likely to be around.

It's not the "compiling into the application", it's the "implementation of the algorithms" that's smaller. Perhaps that's what the original poster meant, but it sounded strange to me.

As far as security is concerned a unikernel with lower attack surface will beat out a general kernel. A unikernel for AppX will have a completely different attack surface from AppY so most generalized attacks just won't work. Shell attacks like heartbleed won't work because there is no shell. Given that there's a single process, even if you did compromise the system there's not a whole lot you can do with it.

> A unikernel for AppX will have a completely different attack surface from AppY so most generalized attacks just won't work.

If unikernel systems do become popular, it's very likely this would not be true, because AppX and AppY would likely share a popular library that they've both been statically linked against (e.g. an HTTP library with TLS). Granted, the footprint of exploitable features would be considerably smaller.

I agree more with the latter point, that a compromised system would probably offer very little for an attacker to leverage.

I imagine that a unikernel management system would have all the modules -- http, tls, the user facing app, etc -- just as object files, so that an update of the TLS library would entail little more than relinking and restarting the new unikernels.
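Sketched as a hypothetical build rule (the module names and linker variables are made up), the update path would be a relink rather than a rebuild:

```make
# Hypothetical: the unikernel image is just a link of prebuilt module
# objects, so upgrading TLS means swapping in a new tls.o and relinking.
app.img: app.o http.o tls.o runtime.o
	$(LD) -o $@ $^ $(UNIKERNEL_LDFLAGS)
```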

I think you mean shellshock, but people were only vulnerable to shellshock if they were actually using bash for something. And anybody using bash is, by necessity, going to include it in their virtualunicontainerimage.

I think the poster's point is that a unikernel may have an overall lower attack surface, but it now has a different, unique attack surface.

That seems like it might be a legitimate concern.

There are also support considerations to take into account from a vendor perspective.

Can't you already achieve this with seccomp-bpf?

Apparently unikernel images can be as small as 200KB.


That might only be for niche cases, though; others seem to say it can be under 5MB for more general uses, too:


This is a nice talk on unikernels:


At Boxfuse (https://boxfuse.com) we decided to combine the best of both worlds: dynamically generated minimal Linux-based images. They are just a few MBs in size, generated in seconds and start almost instantly, yet retain the tried and true quality and compatibility of the Linux kernel.

We coupled that with a secure repository and blue/green AWS deployments for a great out of the box experience.

A unikernel can be pretty small, the sample one that comes with Xen builds to 230KB.

Some of the OpenMirage builds are around 25MB.

Regarding security, any distro install is going to have an attack surface several orders of magnitude higher than a unikernel. Does anyone know how the quantity of exploitable vulnerabilities varies with attack surface (ie. SLOC)?

Are we essentially witnessing the death of the operating system?

Docker builds what amounts to giant statically linked binaries including images of an OS as support, while unikernels relegate the OS to library status.

In both cases this is starting to look like the way you deploy to embedded devices: build image, flash image to device. I mean that is some serious full-circle going on there... we've been trying to get away from that kind of thing since computers were invented.

No, just realising the ones we have based on assumptions from the 70s don't match what we really want.

My phone has an operating system that mostly pretends that it's a mainframe. With just a little hardware, I could hook some circa 1978 VT100s up to it and a team of average *nix web backend developers could even be productive.

My "workstation" pretends to be several tens of machines doing all sorts of different things. These boundaries exist for no reason other than to map to organizational concerns -- my CPU does all kinds of computational gymnastics so I can "keep things tidy".

The model is just wrong. Things are immutable that shouldn't be. Things are mutable that shouldn't be. Things aren't distributed that should be. Things aren't durable that should be. Things are shared that shouldn't be. Things are isolated that shouldn't be. Config management, containers, orchestration, stuff like adding BPF to the kernel, Docker, Hadoop, Mesos -- all very cool, but all very much bandaids.

We're working on it but because we're human we can't go straight for the goalpost, we have to play politics, cheat by modeling the future using what we already have today and lead by example.

Interesting comment, but you're missing a critical reason we can't go straight for the goalpost - we don't know where it is! Yes, our computation architecture is sub-optimal for the situation we find ourselves in, but we only know that now because we're here, and these things take a long long time to make it from idea to usable hardware.

Sooner or later computation will have been around for long enough that everything will settle on a common boring architecture (how much steam engine innovation has there been recently?) but until then we'll always be using the systems designed to solve the problems of 10-15 years ago.

To paraphrase Rumsfeld, you don't build on the platform you want, you build on the platform you have.

Unikernels are essentially the far less interesting cousin of exokernels, in that they nominally preserve the libOS angle but are limited to application deployment under a bare-metal hypervisor.

Much of the motivation for microkernel research, too, was for applications to be able to control their workloads at a finer granularity while retaining a full OS environment. Scout and SPIN were such examples. Something as simple as making page replacement and eviction a userland server where each library or application can implement its own policy over the RPC interface, can yield great benefits.

> Are we essentially witnessing the death of the operating system?

It depends what you mean by "death". Chester's First Law applies. Some code, somewhere, is responsible for the services traditionally provided by an OS.

Definitely not. Where do the hypervisor calls of your unikernel end up? In the OS that is running in the Dom0.

If anything this is not the death of the OS, it is the death of POSIX as an application API.

Container images (like in Docker) contain an OS? That's news to me. I thought they avoided having an OS, and instead just have binaries that are run in a very isolated environment.

It sounds like unikernels are still based around hardware virtualization. That seems like a mistake to me. If your image includes the code to switch an x86 processor from real mode into protected mode, you're doing it wrong (IMO).

Container images don't have a kernel, but they do have userland portions of an OS (e.g. libc).

Won't each unikernel image running on a hyper visor also include redundant copies of libraries like libc?

It would depend on the choice of unikernel and how they do their stuff. osv.io, for example, basically runs the app in kernel space on top of a hypervisor, and is inherently as hardware-agnostic as possible.

AFAIU the whole thing runs in ring 0.

I don't know...the big win (IMO) with containers is all of the nifty orchestration software in the ecosystem. I wonder about the possibility of "hybrid" approach by launching a unikernel (via kvm/xen?) inside of a docker container, orchestrated by kubernetes/mesos/etc.

This page is a direct copy of https://gigaom.com/2015/10/09/why-unikernels-will-kill-conta... , fix the link?

I like the idea of the small attack surface and minimal image size. However, having a different kernel image for each service seems likely to trigger some extremely hard-to-diagnose bugs, which will surely include performance problems.

At least with a conventional OS you have a chance to get in and figure out what's going on. With unikernels it sounds as if you may end up with either a core dump (how?) or a misbehaving service you can't access to diagnose problems.

Mirage can be configured to write trace data to a ring buffer shared with another VM, where you can run a visualisation tool to explore it. For an example, see:


Of course, having such a small (and legacy-free) OS makes debugging vastly easier in the first place.

You are right about simplification, and I considered that point. Yet the fact that each application is developed on a general-purpose platform, then compiled down to a unique production OS image, seems to open up a lot of new corner cases involving long-tail bugs. We had a similar phase during Java's transition to JIT in the late 1990s, though that's obviously an antithetical technology approach. Either way you get a lot of ugly crashes.

The key for industrial use is extremely robust dev and management tools. Unikernels are thought-provoking but on their own do not appear to be anywhere near enough to solve the problem they are taking on.

I think the reduced attack surface and faster startup times will have a big impact eventually.

At the moment language choice is a blocker for me. I don't want to have to learn OCaml to play with unikernels. LING looks interesting since it's based on Erlang. If I could develop in Elixir that would make a big difference.

I'm torn between a programming language that is safe enough to allow a complete ring-0 OS, and the beauty that is an exokernel. I predict that within 10-20 years syscall overhead will dominate everything, to the point that mitigation measures will have to be developed: batching, etc.

> If you look at the size of a container instance – hundreds of megabytes, if not gigabytes, in size

Or... tens of megabytes.

I think the author is confusing containers and full virtualization which runs a separate kernel.
