Unikernels: Rise of the Virtual Library Operating System (acm.org)
44 points by tizoc 1468 days ago | 22 comments

Is the idea that you'd compile a virtualized OS in with your application to produce one really streamlined VM appliance? I get that you'd avoid the overhead of the OS in the VM, but you'd effectively be making the VM a single application again.

Is this any better than the Docker method of reusing the same base-OS and compartmentalizing the applications? Is there that much to be gained in avoiding kernel/user-space transitions?

I think the real problem with this sort of idea is that, in the end, you're just reinventing processes. There's already a way to write an isolated single purpose application and run it on a server: fork() and then exec().
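
For readers who haven't used the fork-then-exec pattern mentioned here, a minimal sketch follows. It is written in Python rather than C for brevity, and `run_isolated` is a hypothetical helper name, not part of any library. It only shows the process-launch step; it applies none of the jail or cgroup restrictions discussed in the thread.

```python
import os
import sys


def run_isolated(argv):
    """Fork, exec argv in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the target program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; never fall back into parent code.
            os._exit(127)
    # Parent: block until the child terminates.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Child was killed by a signal; report it as a negative number.
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    code = run_isolated([sys.executable, "-c", "raise SystemExit(3)"])
    print("child exit code:", code)
```

The child shares nothing mutable with the parent after the exec, which is the isolation baseline the comment is pointing at.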

If you want to bring the isolation level of that process down to just absolutely what it needs to run we've got things like jails and cgroups. You could probably run a Go app with no access to the filesystem since everything is linked in statically anyways.

I think it misses the reasons people are excited about virtualization. Reproducibility and uniformity of environment have a higher value than isolation to most software developers. The priorities may be inverted on the sysadmin side, but I don't think so far as to justify this kind of approach.

My understanding is that the goal is not isolation, but performance. You can remove large chunks of the OS which you don't need. You also don't have any overhead from system calls since all code runs at the same privilege level. This is possible because (in theory) you can't execute arbitrary code. All the executable code is baked into the kernel at compile time and the page tables are sealed so no new code can be loaded.

You can achieve the isolation with jails and cgroups, but not the performance improvements.

As mentioned in a sibling you still have the hypercalls, and you definitely need those to still be present if you're running at ring 0 since, essentially, direct access to the hardware is probably an opportunity to attack the whole physical system (since hardware often has arbitrary bus access). Never mind the need to arbitrate access between multiple VMs.

And this is what I mean when I say that taken to its conclusion you're just reinventing processes.

I think this kind of performance claim needs to be solidly proven by something at least vaguely like a real running application to be taken as a given.

Fair point. More benchmarks need to happen before it's obvious this is really a win. A real application would be nice. I'm biased because I worked on a similar idea myself and I've been waiting for this to come. I think it's a potential win now for running on public clouds.

However, as Docker PaaS gains popularity, that may be a better alternative. Only benchmarks will tell :)

What did you work on?

A similar yet much simpler idea of porting some simple application code to MiniOS. Although I never ended up with anything of value.

You can use I/O virtualization to allow direct hardware access in a safe fashion, assuming that your CPU and peripherals support it.

This isn't the attack I'm referring to. The peripherals themselves have, potentially at least, complete access to the bus through DMA, so being able to convince them to, say, write to an inappropriate physical address (say the hypervisor's kernel), could lead to a significant breach of the security model. As far as I know, no processor-level features actually protect against this.

You've eliminated system calls but you have hypercalls; it's not clear whether this is faster than a container-based system that has system calls but no hypercalls.

True. Of course one other advantage is that you can run a unikernel on a public cloud. You can of course run an OS to serve as a host for containers on a public cloud, but then you have an additional layer of overhead.

I wish there was more experimentation on cloud architecture; VMs aren't the be-all and end-all of the cloud IMO.

> Reproducibility and uniformity of environment (...)

Shouldn't the unikernel approach actually improve reproducibility greatly? You build your application and all its dependencies together, and that should run exactly the same locally or on your Xen cloud.

That's exactly right. Since everything's a library, the normal OCaml dependency analysis pulls in a complete manifest of everything that goes into the final output. In the Xen output mode, this implies that the manifest contains everything. In the Unix backend, you still need to package up the kernel and library dependencies.

An example of this is the Mirage website itself, where all the kernel outputs that are live are stored in GitHub at https://github.com/mirage/mirage-www-deployment -- an explanation of the Travis CI workflow is at http://www.openmirage.org/wiki/deploying-via-ci
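
The "complete manifest" idea can be pictured as a transitive closure over a dependency graph. The sketch below is a toy illustration in Python, not how the OCaml or Mirage tooling actually works; the `manifest` function and every library name in `DEPS` are made up for the example.

```python
def manifest(root, deps):
    """Return every library reachable from root, sorted for stable output."""
    seen = set()
    stack = [root]
    while stack:
        lib = stack.pop()
        if lib in seen:
            continue
        seen.add(lib)
        # Follow this library's own dependencies, if it has any.
        stack.extend(deps.get(lib, ()))
    return sorted(seen)


# Hypothetical dependency graph; the library names are illustrative only.
DEPS = {
    "app": ["http", "storage"],
    "http": ["tcpip"],
    "tcpip": ["netif"],
    "storage": ["block"],
    "console": [],  # available but unused, so it never enters the image
}

if __name__ == "__main__":
    print(manifest("app", DEPS))
```

Anything not reachable from the application, such as `console` here, is simply absent from the result, which is the point being made about leaving out unneeded parts of the OS.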

They're not on the same scale. It does improve both, but it improves isolation to a much much larger degree. It also makes the world inside significantly different from the world outside, so it takes a different knowledge base to be able to program effectively. Particularly if the inside is an OCaml program, tbh.

> Is there that much to be gained in avoiding kernel/user-space transitions?

Yes, there's at least an order-of-magnitude improvement in packet processing, for example, if you bypass the kernel.

Our mobile app runs WebSocket-like connections over UDP with libsodium for crypto, and we're moving our stack off of Node.js for exactly that reason.

I don't get this. Maybe you can win big if you start with an inefficient system, but a run-of-the-mill Linux box has been able to saturate gigabit links without breaking a sweat for a decade, if not more.

Things like file descriptor limits or numbers of connections are more of a pain in modern times, not necessarily a context-switch caused problem.

Linux (and Unix in general) is "control plane" software. It was literally developed to replace the people that used to patch phone calls using physical cables.

That thing they were patching? That's the "data plane". It was incredibly high bandwidth relative to the control plane, since it was completely optimized for moving data.

Back to Unix. Unix is designed for the control plane. It is not designed for rapidly moving large amounts of data with maximum efficiency. Why do we use it today for what are arguably "data plane" tasks? Well, when all you have is a hammer...

Nowadays, you can have both on the same machine: run Linux on the first core or two, and reserve the remaining cores for your app. Along with huge page allocations to reduce TLB impact, you can literally own all CPU activity on those cores, and lock all of your RAM too.

That app is a normal Linux app, but it runs on the raw hardware—like not having an operating system at all. When you also give your app complete control over the network hardware, you've completely bypassed the kernel.
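
A rough sketch of the core-pinning half of this setup, in Python and Linux-only. `pin_to_cores` is a hypothetical helper; `os.sched_getaffinity` and `os.sched_setaffinity` are the standard library calls. Pinning alone does not keep other tasks off those cores (that is usually done at boot, for example with the kernel's `isolcpus` parameter), and the huge-page and memory-locking steps are not shown.

```python
import os


def pin_to_cores(cores):
    """Restrict the calling process to the given CPU cores (Linux only)."""
    allowed = os.sched_getaffinity(0)
    # Ignore cores this process is not allowed to use in the first place.
    wanted = set(cores) & allowed
    if not wanted:
        raise ValueError("none of the requested cores are available")
    os.sched_setaffinity(0, wanted)
    # Report the affinity mask the kernel actually applied.
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Assumption for the demo: keep the lowest core for Linux, take the rest.
    available = sorted(os.sched_getaffinity(0))
    app_cores = available[1:] or available
    print("running on cores:", sorted(pin_to_cores(app_cores)))
```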

With UDP, you don't even need a networking stack, making this approach particularly attractive. Another poster mentioned saturating a 10Gb link. How about saturating four 10Gb links on a single machine? It can be done with the E5 processors and the software architecture I described above.

We're shooting for 10 million packets processed per second on a single, ~$20K machine. That's pretty sweet if you ask me, and a hell of a lot more than Node.js can do.
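
To put the 10 million packets per second figure next to the link speeds mentioned, here is a back-of-the-envelope calculation in Python. It assumes standard Ethernet framing (64-byte minimum and 1518-byte maximum frames, plus 20 bytes of preamble and inter-frame gap on the wire); the function name is made up for the example.

```python
WIRE_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap
TEN_GBIT = 10_000_000_000  # bits per second


def max_packets_per_second(link_bps, frame_bytes):
    """Upper bound on frames per second for one link at a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps // bits_on_wire


if __name__ == "__main__":
    print("min-size frames, one 10Gb link:", max_packets_per_second(TEN_GBIT, 64))
    print("max-size frames, one 10Gb link:", max_packets_per_second(TEN_GBIT, 1518))
    # Time available per packet at the 10 Mpps target, in nanoseconds.
    print("budget per packet at 10 Mpps:", 1_000_000_000 // 10_000_000, "ns")
```

Under these assumptions a single 10Gb link tops out at roughly 14.9 million minimum-size frames per second, so 10 million packets per second leaves about 100 ns of processing time per packet.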

Gigabit is slow. Saturating 10Gb+ is where the sweat might or might not be broken. See e.g. netmap https://github.com/aarrpp/netmap which is in FreeBSD.

No, the idea is a bit broader. By structuring your application in a modular set of libraries, you can break up the "ambient dependencies" (e.g. the monolithic kernel) into what your application actually needs.

Once that's done, the Xen backend is just a matter of filling in the missing kernel components with OCaml libraries (or, in the case of OSv, with C libraries, or in HalVM's, Haskell libraries).

In the case of MirageOS though, we're using this fine-grained dependency control to implement other backends too. For instance, compiling the same source code to run as a FreeBSD kernel module or as a JavaScript library. There's already a Unix backend, so nothing special needs to happen to run it under Docker.

Database storage managers could get some advantages by having direct access to the memory page maps.

Those are great questions, I was wondering about the latter myself.
