Unikraft – Fast, Specialized Unikernels (arxiv.org)
138 points by nderjung on April 27, 2021 | 72 comments

Unikernels are probably best compared to the RTOSes used in embedded systems. They dramatically reduce the guest operating system overhead for virtual machines.

As the weight of the isolation technology gets smaller, we are going to see an increasing use of it as a form of security, and scalability in application designs. We've seen the first wave with microservices, but we'll see more.

I wonder how much time will pass until application compilers start outputting unikernels

Couldn't you make a real time unikernel? Basically you should be able to modify an existing unikernel implementation to make it real time.

> Basically you should be able to modify an existing unikernel implementation to make it real time.

It's very very difficult to hack in real time properties after the fact. Most successful versions of this end up hacking in a virtualization layer to run the code that wasn't designed to be real time in the first place rather than trying to make that code play nice cooperatively with a real time system.

RTLinux is a good example which uses Linux to bootstrap a microkernel which then virtualizes the Linux kernel that booted it.

What kind of guarantees, mechanisms and instrumentation can unikernels provide in this space for real-time execution of code for mission critical systems?

I think it should be able to provide the same guarantees as an RTOS if you implement them. But then, would you still call it a unikernel or an RTOS?

You can and there are a few use-cases we've looked at - v2v && autonomous driving.

At John Deere they compile various "applications" into their home-grown JDOS (or rather for a given controller, they compile several apps and the OS into a single 'executable'). I.e., the OS is a library, which I think is the definition of a unikernel, although perhaps not since there are often several apps on a controller?

Typically unikernels have a single application in them because then you can fully specialize the entire image to the needs of that single application. Having said that, it is entirely possible to place multiple applications in one unikernel, each in its own thread; we have done a few of these in the past.

I'm not an expert, but from my beginner's view, unikernels feel like a hack to make hypervisors work as an app orchestration platform. Maybe this is useful regardless :)

In the MS-DOS days, I used to play games on the PC, and arguably, these were all unikernels. Typically, the game had its own set of drivers and was basically running in ring 0, and immediately after start it bypassed MSDOS and did most of the HW stuff itself. This was usually good for the game because it allowed it to squeeze every last bit of performance from the machine, being completely in control.

So that's definitely a possible use case.

What is a late '80s kid with a box of 5¼" floppies if not an app orchestration platform?

It's similar for console games up until the sixth generation; also, many earlier systems didn't even have a BIOS.

Unfortunately, today most graphics hardware relies on drivers for specific versions of specific operating systems.

> It's similar for console games up until the sixth generation; also, many earlier systems didn't even have a BIOS.

Hmm? Every console generation, including the earlier ones, has a BIOS; even the Atari 2600 has one.

Unless I am misunderstanding your comment?

Are you sure it was MSDOS and not BIOS, or to be even more specific, BIOS video services?

MS-DOS is about files -- and AFAIK most games did not re-implement that part, nor did they re-implement BIOS drive handling, which was sometimes machine-specific. I certainly remember playing games from a Novell NetWare shared drive, which would be impossible if they had re-implemented filesystem support.

And none of the games I know was taking over the boot process as unikernels do. It was always: start DOS, configure system with MSDOS's CONFIG.SYS / AUTOEXEC.BAT, and only then start the game. Not very unikernel-y at all.

Now BIOS video services were very common. BIOS "int 10" was great at setting video mode, but it sucked for everything else -- even in the text mode, we often read/wrote to framebuffer directly.

Not sure about keyboard -- the hardware is fairly simple, so it would be easy to reprogram... but also BIOS did a satisfactory job handling it, so it was OK as long as you keep interrupts enabled.

And as for mouse and sound card, those did not have any DOS drivers to begin with.

Some games from the '80s didn't even try to retain MS-DOS in memory. The only way to "exit" the game was to reboot.

Atari 8-bit games loaded from the boot sector so 'DOS' never even got loaded to begin with. There was still 'bios' ROM support for basic I/O (read sector, etc).

Kind of, but is that really a problem? Virtualization provides a layer of security and isolation that containers still haven't (and maybe never will). For a long time just the fact you could run multiple apps on a single server was a breakthrough and for the most part people weren't all that concerned with performance.

Now that it's matured, people are back to optimizing for performance, and this is one of the ways to do that while maintaining the security that virtualization provides. One thing that seems to remain true is that the pendulum of flexibility <> performance is always swinging back and forth in this industry.

Could you elaborate on why virtualization provides a layer of security and isolation that containers still haven't (and maybe never will)?

1) VMs have hardware backed isolation - containers do not.

2) Containers share the guest kernel. To elaborate, many/most container users are already deployed on top of VMs to begin with - even those in private clouds/private datacenters such as OpenStack will deploy on top, since there is so much more existing software to manage them at scale.

3) Platforms like k8s extend the attack surface beyond one server. If you break out of a container you potentially have access to everything across the cluster (eg: many servers) vs breaking into a vm you just have the vm itself. While you might be inside a privileged network and you might get lucky by finding some db creds or something inside a conf file generally speaking you have more work ahead of you to own more hosts.

4) While there are vm escapes they are incredibly rare compared to container breakouts. Put it this way - the entire public cloud is built on virtual machines. If vm escapes were as prevalent as container escapes no one would be using AWS at all.

I agree. An argument for 4 is the fact that the hypervisor attack surface can be scaled up and down by adding/removing virtual devices. Only a small set stays permanently, like the 30+ hypercalls on Xen. Overall, compared to a standard OS interface (Linux has in the range of 350+ syscalls), this is still very little. The Solo5 VMM project even tried out another extreme by reducing the hypercalls to fewer than 10, if I remember correctly.

It's also worth mentioning that a hypervisor's API, like Xen's, is much more stable; the Linux one is constantly growing.

Very true. And we also did not speak about the heavily multiplexed system calls like `ioctl`.

> the entire public cloud is built on virtual machines

Some cloud providers will trust containers to isolate different customers' code running on a shared kernel, but it's not the norm. I think Heroku might be one such. There's at least one other provider too, but frustratingly I'm unable to recall the name. Edit: found it, it was Joyent, who offer SmartOS Zones. [0]

[0] https://news.ycombinator.com/item?id=25838037

Hence why everyone is now turning containers into micro-VMs, thus the sales pitch from containers is kind of waning.

I fully agree with the statement "to make hypervisors work as an app orchestration platform", although imo the real hack is using two full blown general purpose operating systems to run a single application which is precisely what most people are doing when deploying to the various clouds today.

Unikernels have many advantages and one is removing the complexity of managing the guest vm. People really haven't had the option of removing this complexity since creating a new guest os is time-consuming, expensive and highly technical work. When you have a guest that knows it is virtualized and can concentrate on just that one workload you can get a much better experience.

Agree that one of the major downsides of most unikernel projects so far has been how difficult/time consuming it was to create each unikernel -- oftentimes for each application. That's one of the explicit goals of Unikraft, to make it much easier, or seamless, to run a wide range of mainstream applications and languages.

VM machinery has been hardware optimized for a decade. Also, hypervisors have much reduced attack surface compared to a shared Linux kernel. They're better for long-running production systems than containers.

Not necessarily; there can also just be performance and security benefits that result from not having to include all the random stuff you need for a more general-purpose OS.

So much spin to unpack in that statement.

Aren't containers a hack to prevent issues in libc and shared libs from impacting application deployment?

Guest kernels are already "the application", unikernels remove overhead, everything else is basically the same abstraction applied to itself.

"hypervisors as an app orchestration platform" sounds like a good description of Kubernetes too (if we ignore the difference between OS-level virtualization and full VMs).

Well, I'm pretty sure this wasn't their intended purpose :). To some extent, they come from the observation that the hypervisor already provides strong isolation, and so having things like multiple memory address spaces, syscalls, etc. just brings overhead. The second point is specialization: unlike a general-purpose OS like Linux, each unikernel image can be tailored, at compile time, to meet the needs of a single (or a few) target applications. As a result of all this, it is entirely possible to have unikernels that beat Linux in terms of boot times, memory consumption, and even throughput, as explained in the paper (they can even beat bare-metal Linux).

How does this differ from MirageOS? ( https://mirage.io/ )

Our evaluation using off-the-shelf applications such as nginx, SQLite, and Redis shows that running them on Unikraft results in a 1.7x-2.7x performance improvement compared to Linux guests

AFAIK MirageOS only supports OCaml and thus not any of those applications. That is an intentional design decision to get dead code elimination, global program optimization, etc.

(And FWIW I think "monoglot" OSes are a mistake for this reason; Unix is polyglot and that's a feature, not a bug. Even in contexts where a unikernel would make sense, you may need to inspect the node dynamically.)

It is true that MirageOS has its focus on OCaml as a type-safe language. Its libraries are implemented in OCaml. However, even with having most parts written in C in Unikraft, we are also able to apply dead-code elimination and link-time optimizations with the compiler. In principle you should even be able to combine libraries that are written with different languages.

Regarding inspecting: The design does not exclude adding libraries to enable inspection and monitoring. We even think this is quite important for today's deployments: imagine an integration with Prometheus or a little embedded SSH shell.

The difference to Unix and what we want to achieve is to utilize specialization. Unix/Linux/etc. are general purpose OSes, that are very good fits for cases where you do not know beforehand which applications you are going to run on it (e.g., end-devices like desktops, smartphones). Unikraft wants to optimize the cases where you know beforehand what you are going to run on it and where you are going to run it. It is optimizing the kernel layers and optionally also the application for the use case.

Thanks for the clarification. That does make sense but I guess I have a hard time coming up with use cases for it. That doesn't mean they don't exist of course!

In my mind unikernels in the cloud specifically don't seem that different from Unix, because you're running on a hypervisor anyway. If you zoom out, there could be 64 OCaml unikernels running on a machine, or maybe you have 10 OCaml unikernels, 20 Java unikernels, and 30 C++ unikernels.

That looks like a Unix machine to me, except the interface is the VMM rather than the kernel interface. (I know there have been some papers arguing about this; I only remember them vaguely, but this is how I see it.)

It was very logical to use the VMM interface for a while, because AWS EC2 was dominant, and it is a stronger security boundary than the Linux kernel. But I do think the kernel interface is actually better for most developers and most languages.

And platforms like Fly are running containers securely: https://fly.io/docs/reference/architecture/

Apparently they use Firecracker VMs which are derived from what AWS lambda uses internally. So I can see the container/kernel interface becoming more popular than the VMM interface. (And I hope it does).

To me, it makes sense for the functionality of the kernel to live on the side of the service provider rather than being something that the application developer deploys every time. Though Nginx, Redis, and SQLite are interesting use cases ... I'd guess they're the minority of cloud use cases, as opposed to apps in high-level languages. But that doesn't mean they're not useful as part of an ensemble, most of which are NOT unikernels.

From the paper:

"MirageOS [40] is an OCaml-specific unikernel focusing on type-safety, so does not support mainstream applications, and its performance, as shown in our evaluation, is sub-par with respect to other unikernel projects"

This approach competes with an easier and more widely used method of running a regular process on an "isolated core", and using "kernel bypass" libraries. On such a system, the process starts up with full access to system resources while it reads config files, allocates, maps, and initializes memory, and sets up devices, and then enters its main loop where it operates solely on memory-mapped buffers and, possibly, I/O registers, running with no system calls or interruptions by the OS, potentially for months after.

For example, it might use an OpenOnload library to watch a ring buffer being pushed packets by a NIC via DMA, processing the packets as they appear and updating a mapped/shared memory region with the results. It might, further, construct packet images based on its results, or what it finds in memory mapped from another process, and use the same library to trigger the NIC to send the packet images.

Regular, scheduled processes may watch the mapped memory and do less timing-constrained work, such as logging events, performing file system operations, or even operating a UI, and maybe queue requests into more mapped memory polled by the isolated process.

This mode of operation is quite common at high-speed stock trading firms, or to control specialized equipment with stringent timing requirements, without need to deploy a realtime OS.

Usually these systems make heavy use of ring buffers to keep processes decoupled, placed in "hugetlb" 2MB-sized memory pages to minimize or even eliminate memory-map cache contention, by mapping files in /dev/hugepages.
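A minimal sketch of the ring-buffer pattern described above, written in Python only for readability; real systems of this kind would be C or C++ on top of a kernel-bypass library. Assumptions: an anonymous `mmap` region stands in for a file mapped under /dev/hugepages, the class and method names (`SpscRing`, `push`, `pop`) are hypothetical, and the memory barriers a real single-producer/single-consumer ring needs between writing a slot and publishing the index are not modeled.

```python
import mmap
import struct


class SpscRing:
    """Single-producer/single-consumer ring of fixed-size slots in a mapped region."""

    HEAD_OFF = 0   # index of the next slot the producer will write
    TAIL_OFF = 8   # index of the next slot the consumer will read
    DATA_OFF = 16  # slots start after the two 64-bit indices

    def __init__(self, slot_size=64, slots=8):
        self.slot_size = slot_size
        self.slots = slots
        # Anonymous, zero-filled mapping; a real deployment would map a
        # hugepage-backed file so other processes can attach to it.
        self.buf = mmap.mmap(-1, self.DATA_OFF + slot_size * slots)

    def _load(self, off):
        return struct.unpack_from("<Q", self.buf, off)[0]

    def _store(self, off, value):
        struct.pack_into("<Q", self.buf, off, value)

    def push(self, payload):
        """Producer side: returns False instead of blocking when the ring is full."""
        head, tail = self._load(self.HEAD_OFF), self._load(self.TAIL_OFF)
        if head - tail == self.slots:
            return False
        if len(payload) > self.slot_size - 2:
            raise ValueError("payload too large for slot")
        off = self.DATA_OFF + (head % self.slots) * self.slot_size
        struct.pack_into("<H", self.buf, off, len(payload))  # 2-byte length prefix
        self.buf[off + 2:off + 2 + len(payload)] = payload
        self._store(self.HEAD_OFF, head + 1)  # publish only after the slot is written
        return True

    def pop(self):
        """Consumer side: returns None when the ring is empty (the caller polls)."""
        head, tail = self._load(self.HEAD_OFF), self._load(self.TAIL_OFF)
        if head == tail:
            return None
        off = self.DATA_OFF + (tail % self.slots) * self.slot_size
        (length,) = struct.unpack_from("<H", self.buf, off)
        payload = bytes(self.buf[off + 2:off + 2 + length])
        self._store(self.TAIL_OFF, tail + 1)  # release the slot to the producer
        return payload
```

Each side writes only its own index (the producer `head`, the consumer `tail`), which is what lets the isolated process and the regular scheduled processes stay decoupled without locks or system calls in the hot path.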

All the regular system facilities still work. The program is built with the ordinary linker. You can attach your regular debugger to the process, at startup or after it has been running for a month.

Can anybody ELI15 what a Unikraft image contains compared to Docker running a Linux distro? Can it install pip (Python) packages with compiled C modules, create and read temporary files, and do other container things?

You can create a unikernel with Python - it's one of Unikraft's examples, but you won't be able to pip install modules out of the box.

Unikraft is good if you want to run untrusted code, giving a way for developers to extend functionality on your platform in Python (or JavaScript if you use Duktape, also in their examples) - kind of like AWS Lambda and the likes. Besides the added security from running in a virtual machine, unikernels boot extremely fast. Without applying any optimizations, I have a QuickJS unikernel that boots, runs JavaScript and shuts down within 400ms (running in Windows -> VMware -> Linux -> KVM+QEMU). This performance would be hard to get with Docker and a Linux distro.

This does not mean you cannot still use pip to manage Python dependencies. Simply package all the required additional libraries from pip through a virtualenv/venv, and package this in the filesystem that the python3 Unikraft unikernel reads on boot.

For more details, check out the library: https://github.com/unikraft/lib-python3

That's awesome, I stand corrected!

I don't know about Unikraft, but as far as I know, a unikernel is an application specific OS.

Think of it as if Node.js could be directly installed on a PC and no Linux is needed.

Yes, correct, Unikraft generates an OS that is customized to the needs of particular applications.

Oh, this looks really cool! I was just researching the other day how I could make an RPi running some simple LED-blinking app boot as fast as possible; this looks quite a bit faster than trying to strip down Linux.

For someone who doesn't know that much about unikernels, is it possible (or a good idea) to run a multi threaded / multi process application on there? I'm thinking of something like a python app + nginx reverse proxy, or a Go App using the built in http server?

Hey, yes, you could and in some cases it might make sense (e.g., you need to do some packet processing as one app but you also want the control plane as another app in the same unikernel). Generally speaking unikernels have a single memory address space so don't support (or don't easily support) processes, so the apps would run on separate threads.
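As a rough analogy for the "two apps as threads in one address space" setup described above, here is a small Python sketch. Everything in it is illustrative: the function names are hypothetical, and Python threads on a host OS only stand in for threads inside a unikernel. The point is that the control-plane "app" and the packet-processing "app" share a plain in-memory table rather than talking over IPC.

```python
import queue
import threading


def control_plane(rules, commands):
    """'App' 1: applies rule updates to a table shared with the data path."""
    while True:
        cmd = commands.get()
        if cmd is None:  # sentinel: shut down
            break
        port, action = cmd
        rules[port] = action


def packet_processor(rules, packets, results):
    """'App' 2: looks up each packet's port in the same in-memory table."""
    while True:
        port = packets.get()
        if port is None:  # sentinel: shut down
            break
        results.append((port, rules.get(port, "drop")))


rules = {}  # shared state: same address space, no serialization needed
commands, packets, results = queue.Queue(), queue.Queue(), []

ctl = threading.Thread(target=control_plane, args=(rules, commands))
ctl.start()
commands.put((80, "allow"))
commands.put(None)
ctl.join()  # finish configuring before traffic starts, to keep the demo deterministic

dp = threading.Thread(target=packet_processor, args=(rules, packets, results))
dp.start()
for port in (80, 22):
    packets.put(port)
packets.put(None)
dp.join()
```

The flip side of sharing an address space is that nothing isolates the two apps from each other: a memory bug in one can corrupt the other, which is part of why a single application per unikernel is the typical choice.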

Looks promising!

How do you avoid the problem of creating a unikernel that appears to work, until you exercise some different code path at runtime that needs some modular OS feature you didn't think to include in the unikernel?

(I have no idea if this is a real problem, just seemed like it could be from skim reading the basics)

You could try fuzzing. But if you do not include a modular OS feature (e.g., a network stack), the unikernel would not compile since it's missing symbols.

Yes, that's right. If you do not include a module that could potentially be used, you'd be missing that symbol and it would not compile.
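To illustrate why a missing library shows up at build time rather than at runtime, here is a toy model of the symbol check a linker effectively performs. This is not Unikraft's build system; the library names and symbol sets below are made up for illustration.

```python
def undefined_symbols(required, libraries):
    """Return the symbols the application references that no selected library defines."""
    provided = set()
    for symbols in libraries.values():
        provided |= set(symbols)
    return sorted(set(required) - provided)


# Hypothetical library -> exported symbols map (illustrative names only).
available = {
    "libc": {"malloc", "free", "printf"},
    "netstack": {"socket", "bind", "listen", "accept"},
    "vfs": {"open", "read", "close"},
}

# Symbols a hypothetical application references.
app_needs = {"malloc", "printf", "socket", "bind"}

# Build configuration that forgot the network stack.
selected = {name: available[name] for name in ("libc", "vfs")}
missing = undefined_symbols(app_needs, selected)

if missing:
    print("link would fail, undefined symbols:", ", ".join(missing))
```

A check like this only covers symbols that are referenced statically. Behavior that depends on runtime configuration or data (for example, a file the image was built without) would still surface only when that code path runs, which is where the fuzzing suggestion helps.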

Ah good point, thanks

A somewhat related question: are the major cloud providers showing any interest in unikernels?

I believe there is growing interest in providing leaner, "trimmed" runtimes for services deployed to the cloud. Today, this is seen largely in efforts to specialize the Linux kernel, for example for container services[0] or in general[1], as much as that is possible (the paper above covers this problem in greater detail). But unikernels themselves are not yet widely adopted. This is the space Unikraft is aiming to enter, providing the ultimate level of specialization for a target application.

It's clear that bigger players, such as Red Hat[2], are interested in the topic of unikernels, and that cloud providers are preparing for this future too[3].

[0]: https://github.com/linuxkit/linuxkit

[1]: https://github.com/hckuo/Lupine-Linux

[2]: https://dl.acm.org/doi/10.1145/3317550.3321445

[3]: https://firecracker-microvm.github.io

Indeed, Red Hat is sponsoring students at Boston University to investigate turning Linux itself into a unikernel, with myself, Larry Woodman and Ulrich Drepper mentoring them (see link [2] in posting above). You can already run some smaller programs this way, either in qemu or on baremetal.

Note that this is a VMM and not a unikernel project. We can create bindings with this VMM for faster boot times and access to fast IO paths.

If unikernels offer excellent performance, faster boot times, better throughput and lower memory consumption, I'm curious what the benefits of a VMM (Firecracker) are in contrast to a unikernel. Aren't those two comparable?

They're improving different aspects that contribute to VM overhead:

> ...and boot in around 1ms on top of the VMM time (total boot time 3ms-40ms).

Firecracker is trying to minimize "VMM time", while the unikernel is minimizing guest overhead.

You can think of the VMM as the toolstack that executes the moment you send a command to start a VM up. Once it's done it hands over execution to the actual VM (i.e., the OS within the VM). Thus the total boot time is the sum of the VMM plus the actual VM boot time. In the past, both of these took a long time. With the advent of unikernels, which can boot in as little as a few milliseconds, the focus has also been on reducing VMM time.
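A small worked example of that sum, using the figures quoted above ("around 1ms" of guest boot, "total boot time 3ms-40ms"). The VMM times of 2 ms and 39 ms are simply inferred here by subtracting the guest time from the quoted totals; they are not measurements of any particular VMM.

```python
def total_boot_ms(vmm_ms, guest_ms):
    """Total boot time is VMM setup time plus the guest's own boot time."""
    return vmm_ms + guest_ms


def vmm_share(vmm_ms, guest_ms):
    """Fraction of the total boot time spent in the VMM."""
    return vmm_ms / total_boot_ms(vmm_ms, guest_ms)


GUEST_MS = 1.0  # "around 1ms" unikernel boot, as quoted above

# VMM times inferred from the quoted 3ms-40ms totals (assumption: total minus guest).
for vmm_ms in (2.0, 39.0):
    total = total_boot_ms(vmm_ms, GUEST_MS)
    share = vmm_share(vmm_ms, GUEST_MS)
    print(f"VMM {vmm_ms:.0f} ms + guest {GUEST_MS:.0f} ms = {total:.0f} ms "
          f"({share:.0%} spent in the VMM)")
```

With the guest down to about a millisecond, most of the remaining boot time sits in the VMM, which is why the two efforts complement each other rather than compete.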

They're always showing interest in lighter-weight sandboxing mechanisms. For example, AWS has Firecracker, which they use for serverless, Cloudflare can run WASM workloads in V8, etc.

In general, the clouds want the unit of deployable compute to be as small as possible, because it means they can make more money packing more customers into the same machines.

What would that look like?

AFAIK, today, you can already build your unikernel app into an AMI and run it on EC2. No?

Is there a service which would be more useful for unikernels than this?

One thing would be for there to be an official AWS unikernel SDK - basically, a rump operating system that has everything you need to run nicely on EC2, and nothing else.

Yes, we have first experiments running on AWS [1], and we are currently upstreaming the remaining pieces so that everyone can try it themselves. From my point of view, a main difference to rump is the finer-grained modularity of our libraries. In theory every library (these implement OS primitives like thread schedulers and heap management, as well as APIs/ABIs, e.g., the Linux syscall ABI) can be individually selected and replaced. This follows our specialization vision: take only the components that you need and choose the best-fitting ones for your use case. This could mean that for a virtual network appliance, you may end up writing code as close as possible to the virtual NIC drivers. Basically you won't use a standard network stack or a VFS; you may even want to get rid of any noise caused by a guest-OS scheduler.

[1] https://www.linux.com/news/cut-your-cloud-computing-costs-by...

How maintainable do you think this is in the long run in comparison to e.g. using a Linux based unikernel? Do you think you can keep pace with the speed at which features are added to Linux?

This is my dream. Embed all my templates, configuration files and everything into a single static Go binary. Deploy it onto a unikernel that has basically zero attack surface because it just runs this one binary and nothing else. Connect it to a database service (via a config setting with the connection string embedded in the binary).

I can almost do this on AWS at the moment, I think, though I haven't tried yet and it looks like a big learning cliff from here. Something to make this easier would be a huge win.

For example app services for language runtimes like Java and .NET.

Are there TCP/IP network stacks available for Java or .NET ?

With Unikraft being a librarized unikernel system, you can actually choose whether the OS layer should provide a network stack (likely written in C/C++) for your runtime or whether you prefer doing it in a higher-level language. Similarly, the MirageOS folks developed a network stack in OCaml.

Sure, there are even bare metal deployments available for embedded development.

PTC and Aicas in Java's case, for example, or nanoFramework for .NET.

This is true, there are a number of embedded frameworks that you could use as well, and you could even run them as virtual machines too. In contrast, we want to make it as seamless as possible by still providing you the Linux-like OS layers if you need them. The goal is that an app previously developed for Linux should be seamless to port. The OS interfaces in the higher-level language should be the same as on Linux, so no code changes are needed.

So is this something I can FROM in Dockerfile? How does it work in regards to Docker/K8S?

Not quite. To get started building Unikraft unikernels, you can use the kraft[0] tool, which builds a bootable image. At the moment, this is the entry point into the ecosystem and allows you to build, for example, NGINX for KVM, Xen or bare metal.

Having said this, we have on-going work to integrate Unikraft into Kubernetes. We are very excited about this work and will be making noise and providing extra details about this very soon :)

[0]: https://github.com/unikraft/kraft
