

My First Unikernel - the_eradicator
http://roscidus.com/blog/blog/2014/07/28/my-first-unikernel/

======
enduser
If it helps to add clarity to what this is about, Xen still requires a host
(dom0) operating system, usually Linux but also NetBSD and OpenSolaris. Mirage
OS (the basis for building your own unikernel) allows building applications as
domU kernels. This is advantageous because it minimizes context-switching and
allows all of the code in a Xen VM (domU) to be written in a safer language
than C.

So the dom0 OS (Linux) still determines which devices are exposed to the
Unikernel. The Unikernel implementation is simple relative to a full "bare
metal" OS because it only needs to support the Xen interfaces to block
devices, the network, etc., and does not have to deal with disk drives,
ethernet controllers, etc.

If you are running your own hardware, you're probably better off using
something like FreeBSD + Jails or Linux + LXC (Docker). The Unikernel approach
is more appropriate for situations where you want to deploy applications on
Xen-based cloud infrastructure (Amazon EC2, Rackspace Cloud Servers) and do
not want to waste resources or increase security risk by running a full OS.
The physical servers at Amazon and Rackspace are already running a full dom0
OS (probably Linux), so running _another_ full OS on top of that just to run
your app is inefficient.

~~~
amirmc
Even if you have your own hardware, there are scenarios where you might want
stronger isolation, increased density, and/or heterogeneous deployments, all
of which are achievable with unikernels running on Xen.

~~~
mwcampbell
How would unikernels running on Xen give you better density than containers
(which are really just namespaced processes) running on a shared Unix kernel
on bare metal? Wouldn't the Xen-based approach have more overhead?

~~~
avsm
This isn't supported yet, but unikernels on Xen already avoid one cost: they
don't currently switch address spaces (no cr3 reloads on x86). We're thinking
about how to increase density by supporting process switching on Xen, which
isn't hard, but needs to be done carefully.

Xen also supports memory sharing among VMs when running in an HVM container,
and Hwanju Kim recently added "PVH" mode support to MiniOS, which unlocks this
functionality in Mirage. Early days on this work, but fun ones, since we
aren't bound by the compatibility constraints of Linux containers, but have
access to the same hardware resources.

------
616c
So I have been following this stuff with a lot of interest, but I am not very
familiar with OCaml and have tried looking at the Mirage docs during some
limited downtime.

If OCaml can only handle one core and does not do SMP, how does it do in the
cloud? Does this mean Mirage unikernels handle only one processor/core on
Amazon and elsewhere?

~~~
avsm
Two answers: scaling through structured distributed-systems abstractions is a
key aim in Mirage. We deliberately want each VM to be predictable and
single-vCPU, and to scale via multiple VMs. It's far more efficient to scale
via lots of small VMs that are scheduled independently than having a few big
multi-CPU VMs (which have a lot of overhead, since the CPU sync also needs to
be virtualised). See our ASPLOS 2013 paper for some simple workloads on this
topic.

We are building this library to simplify this style of distributed
programming: [http://openmirage.org/blog/introducing-irmin](http://openmirage.org/blog/introducing-irmin)

Other answer: we will be talking about our multicore OCaml implementation at
ICFP in September. I still don't want to see it in Mirage, though :-)

~~~
wmf
_It's far more efficient to scale via lots of small VMs that are scheduled
independently than having a few big multi-CPU VMs (which have a lot of
overhead, since the CPU sync also needs to be virtualised)._

Are you assuming that vCPUs are scheduled or pinned? It wasn't clear from
skimming the paper. I would agree that two levels of scheduling are bad but if
that's the problem pinning should fix it.

The situation may change if you eliminate the hypervisor. My intuition is that
a single multi-threaded process with work-stealing will be faster than
separate processes or VMs due to better load balance (see PX vs. 1X dynos) and
faster inter-thread communication (if your app has any communication).

~~~
avsm
If you eliminate the hypervisor, you don't have vCPUs at all, so I'm not sure
how this is a useful comparison.

The question of IPC performance is an interesting one. We've been building up
a database of open source results that show wildly diverging results on
different architectures. Surf through [http://fable.io](http://fable.io) for
it, or read this work in progress:

[http://anil.recoil.org/papers/drafts/2012-usenix-ipc-draft1.pdf](http://anil.recoil.org/papers/drafts/2012-usenix-ipc-draft1.pdf)
[http://anil.recoil.org/talks/fosdem-io-2012.pdf](http://anil.recoil.org/talks/fosdem-io-2012.pdf) (FOSDEM slides)

The TL;DR of these numbers is that it's very hard to make firm performance
hypotheses about IPC across architectures, NUMA and hypervisors.

------
paulasmuth
This sounds like an awesome and fun project to hack on!

However, optimizing around context switches and task preemptions is something
you would usually do if your application is actually bound by IO/context
switching, is extremely latency sensitive or when you are trying to squeeze
the last bits of performance out of a machine. Why did you choose to build
such a microoptimized system in a garbage collected language? Doesn't this
defeat the purpose of the whole exercise?

I feel the need for more powerful types and built-in/standardized exception
handling too, but since performance seems to be one of your major goals,
wouldn't something like C++ be a better fit here? You'd get proper error
handling and a good type system (with some tradeoffs) without a significant
performance penalty.

On a side note, I agree that code written in a functional language with a
strong type system tends to be easier to get right than bare C, but this
doesn't imply that all low-level code is bug-ridden and unsafe. In fact, the
Linux kernel is one of the most stable and reliable pieces of software I've
had the pleasure to work with so far. Suggesting there is a problem with the
Linux kernel because it contains "a large amount of C code in
security-critical places" seems a bit dishonest.

~~~
andolanra
Don't think of it as optimizing around context switches—think of it as just
omitting what you don't need, and losing context-switches as a side benefit. A
multi-user OS has a lot of stuff which might not be necessary for a single
virtualized service (various security mechanisms, lots of file system
niceties, running other services, &c), so writing a unikernel like this allows
you to select exactly as much as you need for your particular service.

OCaml isn't all that much slower than C++. To use the Programming Language
Shootout as a rough estimation[^1], it can even come close to matching C++ in
certain programs, and is rarely more than three times as slow. And of
course—your type system will catch more errors and your resulting code will be
much shorter (and, in my opinion at least, easier to understand).
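
As a small illustration of the kind of error the type system catches (a
generic OCaml sketch of my own, not Mirage code):

```ocaml
(* The compiler checks pattern matches for exhaustiveness: if a new
   constructor is added to [state] and a branch below is forgotten,
   compilation warns about the missing case instead of failing at
   runtime. *)
type state = Idle | Running | Done

let describe = function
  | Idle    -> "idle"
  | Running -> "running"
  | Done    -> "done"

let () = print_endline (describe Running)
```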

Finally: the article didn't say "the Linux kernel"—it said "Ubuntu." The
kernel itself might be secure and reliable, but a running Linux system is
much, much more than just the kernel. And while C can be security-audited,
many of the properties that are important to verify in a C program come
_entirely for free_ from something like OCaml—e.g., an arbitrary piece of C
code _might not_ segfault given certain input, but a given piece of OCaml code
_definitely won't_. So maybe a running Linux system is "secure enough", but a
unikernel like this will have a much smaller attack surface and stronger
inherent security properties with basically no extra work.
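
To make the segfault point concrete, here's a minimal sketch of my own (plain
OCaml, nothing Mirage-specific): an out-of-bounds array read raises a
catchable exception rather than touching arbitrary memory.

```ocaml
(* In C, reading index 10 of a 3-element array is undefined behaviour:
   it may segfault or silently return garbage. The OCaml runtime
   bounds-checks every array access and raises Invalid_argument. *)
let () =
  let a = [| 1; 2; 3 |] in
  try
    Printf.printf "%d\n" a.(10)
  with Invalid_argument _ ->
    print_endline "caught: index out of bounds"
```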

Disclaimer: I'm not the original author, I'm just speaking generally.

[^1]:
[http://benchmarksgame.alioth.debian.org/u64/benchmark.php?te...](http://benchmarksgame.alioth.debian.org/u64/benchmark.php?test=all&lang=ocaml&lang2=gpp&data=u64)

~~~
paulasmuth
> _so writing a unikernel like this allows you to select exactly as much as
> you need for your particular service._

I agree this is a big upside of the author's approach. Fewer dependencies lead
to fewer problems caused by external/upstream changes.

> _OCaml isn't all that much slower than C++. To use the Programming Language
> Shootout as a rough estimation[^1], it can even come close to matching C++
> in certain programs, and is rarely more than three times as slow._

I wasn't only referring to raw execution performance but also to GC pauses and
GC overhead, which I think are the bigger issue. The benchmark you linked tests
compute-bound workloads, so this doesn't really show up. Anecdotally, in most
real-world apps I have worked on, the GC was a limiting factor.

> _Finally: the article didn't say "the Linux kernel"—it said "Ubuntu." The
> kernel itself might be secure and reliable, but a running Linux system is
> much, much more than just the kernel._

That's the beauty of having a kernel though. If one of those userland
processes is broken it won't affect the whole system.

> _an arbitrary piece of C code might not segfault given certain input, but a
> given piece of OCaml code definitely won't._

This assumes that the OCaml compiler/interpreter and the hardware are free of
bugs...

~~~
avsm
Regarding GC pauses, remember that you already have them if your application
stack is written in Scala, Java, Go, OCaml, or Haskell. But you also have
other manual memory management going on everywhere!

With Mirage, it's all amortised in one consistent, fast GC! To give you a
sense of the malloc vs OCaml GC trade-off, see
[http://anil.recoil.org/papers/2007-eurosys-melange.pdf](http://anil.recoil.org/papers/2007-eurosys-melange.pdf)

Malloc and free-list management is remarkably complex compared to a fast,
simple GC. It would be interesting to build an OCaml runtime in Rust to
experiment with these trade-offs in a more controlled fashion.
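
To illustrate the style of allocation being described (a generic OCaml sketch
of my own, not the Melange benchmark): short-lived values are bump-allocated
on the minor heap and reclaimed wholesale by the GC, with no free-list
bookkeeping in user code.

```ocaml
(* Each iteration heap-allocates a tuple; nothing is ever freed by
   hand. Minor-heap allocation is a pointer bump, and the GC reclaims
   the dead tuples in bulk on the next minor collection. *)
let sum_of_squares n =
  let rec loop i acc =
    if i > n then acc
    else
      let pair = (i, i * i) in  (* short-lived heap allocation *)
      loop (i + 1) (acc + snd pair)
  in
  loop 1 0

let () = Printf.printf "%d\n" (sum_of_squares 3)  (* 1 + 4 + 9 = 14 *)
```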

------
dkarapetyan
I'm a little confused about the direction things are going in. I like
high-level languages, and I like that the OS manages certain resources so I
don't have to. This guy is writing directly to block devices from OCaml. Don't
get me wrong, it's all pretty cool, but there is some kind of dissonance there
that I can't reconcile. Is Xen the new OS now?

~~~
derefr
Note, though, that he is indeed writing to a block device--which is much
higher-level than writing to a _disk_.

Remember back in the 80s, when the BIOS was actually an effective hardware
abstraction layer--giving you a defined interrupt to ask the BIOS to, say,
write to a disk--and the OS was just for module loading and scheduling and
policy-based security? (Not that DOS did either of the latter.)

Well, Xen isn't the new OS; instead, the domU is the new BIOS, and hypercalls
are the new BIOS interrupts.

I really hope to see Linux redone (or another *nix created) in this
"unikernel" style, where everything hardware-like or HAL-like is taken out,
and instead things like filesystem drivers are implemented directly in terms
of hypercalls.

I also hope to see UEFI reimplemented as a resident domU, such that a plain
old desktop or notebook computer could treat its user OS as a container-image
to be slung around, rather than having it "own" the hardware. UEFI actually
already supports this mode of operation--allowing you to boot "UEFI
applications" that keep UEFI around to provide BIOS-like functionality--but I
don't know of a single OS that makes use of that, rather than overwriting the
processor interrupt vectors and claiming all of physical memory for itself.

~~~
xxpor
>I also hope to see UEFI reimplemented as a resident domU, such that a plain
old desktop or notebook computer could treat its user OS as a container-image
to be slung around, rather than having it "own" the hardware. UEFI actually
already supports this mode of operation--allowing you to boot "UEFI
applications" that keep UEFI around to provide BIOS-like functionality--but I
don't know of a single OS that makes use of that, rather than overwriting the
processor interrupt vectors and claiming all of physical memory for itself.

This is really the opposite direction things are going, especially on x86_64.
PV is horrendously inefficient in 64-bit mode, because long mode dropped the
segment-limit protection that rings 1 and 2 relied on, making them unusable
for isolating a PV guest kernel.

[http://wiki.xen.org/wiki/Virtualization_Spectrum#Problems_wi...](http://wiki.xen.org/wiki/Virtualization_Spectrum#Problems_with_paravirtualization:_AMD_and_x86-64)

HVM allows PCI passthrough if your CPU and chipset support it, which means
domU now has direct access to the hardware, with no dom0/qemu layer to get in
the way and slow things down.

------
j_s
See also this discussion of MirageOS from 2.5 months ago:

[https://news.ycombinator.com/item?id=7726748](https://news.ycombinator.com/item?id=7726748)

------
callumprentice
As someone who knows nothing about lower level OS type stuff, I found this
article very easy to understand, interesting and well written.

Sounds like a lot of fun.

Thank you.

------
shyknee
Mirage was just featured on FLOSS Weekly: [http://twit.tv/show/floss-weekly/302](http://twit.tv/show/floss-weekly/302)

------
guilloche
Since a unikernel still needs to run on a hypervisor, I believe the
performance is worse than Docker's.

Am I right?

------
n0body
Interesting, although the old mantra of "hardware is cheap, developers are
expensive" is still true. You could hire 10 Perl/C/C++/JavaScript/PHP/etc.
devs with ease for your project, but struggle to find one OCaml dev. And even
then your OCaml dev will need to know the Mirage library and Xen, and have
good knowledge of way more stuff than your project scope.

That said, it's still really cool, but it's not something I'd use, and
especially not in production.

~~~
rjsw
There is nothing stopping people from implementing something similar in other
languages.

The developer using this doesn't need to know anything about Xen, they just
see a single address space system that runs their code.

~~~
n0body
I didn't say there was, but it's still complicating the task.

I'm not saying it isn't cool, but it's a very roundabout way of doing
something, and as such it becomes more expensive in development time and skill
required.

~~~
amirmc
More roundabout than the current paradigm of duplicate Linux stacks and
supporting services for DevOps just to get stuff deployed? That doesn't make
sense to me.

Your argument is actually "It seems very new. Not enough people know/use it.
Therefore, I won't use it." \-- which is fine, but please don't
mischaracterise it in terms of increased complexity or expense.

~~~
n0body
My argument is that you'll need better developers who cost more money. Not
only that, but you're making the project more complicated: i.e., if the
project is to create x, then you have to also create y first, and it's harder
to debug. And you've also got the added overhead of running Xen.

All in all it's a nice trick, but it's not very practical, because if it were
practical we'd all be using DOS for our VMs, since you get the bare metal and
a little bit of an environment to bootstrap from.

~~~
lmm
> My argument is that you'll need better developers who cost more money.

Frankly I wouldn't want a developer who's too dumb to learn OCaml anywhere
near my production code. Is this thing new? Yes. Will developers take time
(=your money) to get up to speed on it? Yes. But do you need "better"
developers, long-term? I don't think so.

> _Not only that, but you're making the project more complicated: i.e., if the
> project is to create x, then you have to also create y first, and it's
> harder to debug._

Maybe a valid concern, but I remember very similar arguments from C++
programmers in the early days of the JVM. Turns out the JVM is rock-solid and
nowadays has better debugging tools than those for C++. There's no reason that
couldn't be true for this approach. Or if it's easy to make a multi-target
project that builds both a linux binary and a unikernel image, then debugging
would be no harder than it is for existing OCaml code.

> and you've also got the added overhead of running xen.

If you're already running Linux-on-Xen then this is reducing overhead. Even if
you're not, it could still improve overall performance by reducing context
switching, in the same way as user-mode networking stacks.

> _it's a nice trick, but it's not very practical, because if it were
> practical we'd all be using DOS for our VMs_

This sounds rather like "this can't be a good idea because if it was we'd have
it already". It's only in the last few years that xen and the "cloud" approach
have become so popular, so a lot of new ideas and approaches are still being
found.

