
Unikernels: The Next Stage of Linux’s Dominance (2019) - todsacerdoti
https://dl.acm.org/doi/abs/10.1145/3317550.3321445
======
neilwilson
Seems to me that a unikernel database should be the first application.
Databases tend to bypass practically all the facilities of a kernel anyway.
It’s often surprised me that they haven’t merged before now.

~~~
nick_kline
I think this is the paper:
[https://www.cs.bu.edu/~jappavoo/Resources/Papers/unikernel-h...](https://www.cs.bu.edu/~jappavoo/Resources/Papers/unikernel-hotos19.pdf).
There's a great summary by cormacrelf below of some of the details. This kind
of thing comes and goes across the decades. Back in the 90s it was common for
databases to implement their own file systems or memory management, but
gradually the OSs of the day added features that made this unnecessary. As we
found then, you lose so much flexibility (modifying the OS via config changes,
adding new monitoring, or whatever) if the DB is directly connected to the OS.
I'd hate to give that up for more efficiency. There's a lot of 'strength' in
an OS separating various things from the database itself, giving that
flexibility.

A modern DB needs efficient interaction with the OS of course, but I'd say,
based on experience at many companies, that a bigger challenge than raw
efficiency is the ability to implement changes in the DB itself. I've worked
on 4 or 5 major databases, and we can always identify many ways to improve
execution, plan selection, and various other aspects of the DB; it's the giant
challenge of altering a big code base that blocks improvements, more than OS
layers. Improving plan choice can make queries thousands of times faster, but
you have to be able to implement it.

~~~
DrScump

      Back in the 90s it was common for databases to implement their own file systems or memory management
    

Informix in the mid/late 1980s initially had shared-memory implementations on
the UNIX platforms that supported it. Then the Turbo/OnLine/IDS servers added
the option of using raw disk partitions for database spaces, avoiding
filesystems altogether (and doing raw unbuffered I/O).
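
That trick survives today as O_DIRECT. Below is a minimal sketch of the idea;
the device path and sizes are assumptions for the example, not Informix
specifics:

    /* Open a raw partition and bypass the page cache with O_DIRECT.
       O_DIRECT requires sector-aligned buffers and transfer sizes. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/dev/sdb1", O_RDWR | O_DIRECT); /* hypothetical partition */
        if (fd < 0) return 1;

        void *buf;
        if (posix_memalign(&buf, 4096, 4096) != 0) return 1; /* aligned buffer */

        /* Read the first 4 KiB page of the database's own on-disk layout,
           with no kernel buffering in between. */
        if (pread(fd, buf, 4096, 0) != 4096) return 1;

        free(buf);
        close(fd);
        return 0;
    }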

------
MrBuddyCasino
> Unikernels have demonstrated enormous advantages over Linux in many
> important domains

By "domains", do they mean "actually in use in certain sectors of the
industry" or "a prototype has shown that"?

> causing some to propose that the days of Linux's dominance may be coming to
> an end

Who exactly would make that claim?

So, _besides performance_: what actual, real-life problems does this solve? I
think there is some overlap with containers, and at this point, replacing them
will require something a lot better.

~~~
microcolonel
> _So, besides performance: what actual, real-life problems does this solve?_

Well, don't go jumping to "besides performance". Dennard scaling is dead, and
the things that can interrupt your program in a full preemptive kernel system
are myriad.

Furthermore, even if your application properly manages and produces
backpressure, the OS can introduce buffering where you don't want it, often by
necessity (to avoid massive context switch costs). Now, if you just want to
manage backpressure over the network, an application-hosted networking stack
is probably a fine solution, but if you want to accurately translate disk
backpressure to network backpressure it gets more complicated.
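
To make the buffering point concrete, here is a minimal sketch of a
disk-to-network relay that propagates backpressure the obvious way (the
descriptors and buffer size are illustrative). Even this loop only feels the
remote peer slowing down once the kernel's SO_SNDBUF-sized socket buffer
fills, which is exactly the buffering described above:

    #include <unistd.h>

    /* Relay disk -> network, reading no more from disk than the socket has
       accepted. diskfd and sockfd are assumed to be open, blocking fds. */
    int relay(int diskfd, int sockfd) {
        char buf[4096];
        ssize_t n;
        while ((n = read(diskfd, buf, sizeof buf)) > 0) {
            ssize_t off = 0;
            while (off < n) {
                /* Blocks when the peer is slow -- but only once the kernel's
                   socket send buffer is full, so the backpressure this code
                   observes is delayed by buffering it doesn't control. */
                ssize_t w = write(sockfd, buf + off, n - off);
                if (w < 0) return -1;
                off += w;
            }
        }
        return n < 0 ? -1 : 0;
    }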

There are heaps and heaps of ordinary programs written for preemptive systems
that would see noticeable, tangible benefits to users if they were run instead
on unikernels; but the main thing halting adoption is the inconvenience of
adapting programs. I tried getting Capstan/OSv to work the other day and the
documentation dragged me through several apparently-outdated methods of
achieving the same thing, all of which failed in incomprehensible ways.
Tooling for these things could use a lot of work.

~~~
MrBuddyCasino
So, a niche technology for a handful of HFT firms, betting platforms and
hyper-scale cloud vendors. Those already use non-mainstream tech (e.g. the
LMAX Disruptor) to achieve maximum performance.

This is all nice and good, but the unikernel crowd has been claiming for 20
years that it's the next big revolutionary thing. In the real world, the
evolutionary approach using containers has turned out to solve the problems
most people care about.

Don't get me wrong, this is fascinating stuff. But there is a difference
between "changes mainstream computing" and "cool tech to shave off another
millisecond of latency".

~~~
throwaway894345
Unikernels could also be the next phase of cloud orchestration technologies.
They would be truly lightweight VMs and would obviate the need for containers
(collapsing the VM orchestration and container orchestration layers into a
single orchestration layer). This would offer better security and better
performance without hurting developer/operator friendliness. And a side note
about improved performance: in the cloud space, being able to boot in ten ms
means you can start a VM within the scope of a single request, which means
you can scale up dynamically without keeping a bunch of VMs running idle in
the background just in case your traffic spikes. That is real, significant
cost savings for cloud providers and customers.

~~~
rbanffy
Debugging a single-executable container is quite unpleasant. I assume
debugging unikernels or thin VMs (with a kernel and a single program) would be
a similar experience.

~~~
MrBuddyCasino
Yep, dev tooling matters. Containers are already annoying, but unikernels are
a different universe, and it is not going to be pleasant.

~~~
microcolonel
Depends on what is meant by debugging. If you're talking about running a
debugger on your program, you can attach GDB to your VM any time you want.
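
For example, assuming QEMU/KVM as the hypervisor (other VMMs have similar
facilities), the built-in gdbstub makes this a two-step affair; the image
names are placeholders:

    # Boot the unikernel with a gdbstub on :1234 (-s), paused at startup (-S)
    qemu-system-x86_64 -kernel my-unikernel.img -s -S

    # In another terminal, attach with symbols from the unikernel build
    gdb my-unikernel.elf -ex 'target remote :1234'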

------
x3blah
"Also known as rump kernels a name inspired by the infamous purge of royalists
from Parliament following the English Civil War this process involves creating
a fork of an existing kernel codebase and manually purging it of the
components deemed unnecessary to the target unikernel."

That appears to come from the FAQ on Github for "rumprun" dated 2015.

Rump kernels were introduced in NetBSD in 2009. It was an described then as an
acronym for runnable userspace meta programs.

[https://blog.netbsd.org/tnf/entry/runnable_userspace_meta_pr...](https://blog.netbsd.org/tnf/entry/runnable_userspace_meta_programs_in)

I have not used rumprun for Linux but I still use the rump utilities included
with NetBSD.

------
rwmj
I'm one of the authors on this paper, so ask away if you have questions.

~~~
MichaelMoser123
Isn't it hard to develop/debug/monitor an application with this approach? I
mean you can't just run a debugger as another process and attach to the thing;
you can't run tcpdump or strace, nothing of that sort. Also, every bad pointer
access will require a reboot, won't it? I mean, how do you develop an
application with this approach?

~~~
rwmj
All good questions without a clear answer at the moment. Bad pointers in the
application can overwrite kernel data structures because everything runs in a
single address space.

~~~
Someone
The unikernel idea is 20-ish years old. I think those good questions cannot be
much younger.

Because of that, I think “without a clear answer at the moment” is worrisome.
Are there partial answers to these questions?

~~~
rwmj
I mean "without a clear answer" for UKL which is only just over a year old and
still in active and early development. There will probably be a way to attach
gdb at some point.

------
perbu
Unikernels are typically statically linked. Using copyleft code means the
whole resulting binary is subject to copyleft.

Free software is great, but not everyone is in the position where they can
release all of their code all the time.

For a unikernel to be viable it can't have copyleft code in it.

~~~
lmm
Depends what you're using it for. If you're just running it as a server then
(non-A) GPL code is fine since you're not distributing the program.

~~~
dathinab
But what about the cloud and auto-scaling? Wouldn't that count as distribution
(though for internal use only)?

~~~
rwmj
It's an interesting point. I wonder if sending a binary to AWS counts as
distribution to a third party?

(Of course I work for Red Hat and the cult thinks all code should be free :-)

~~~
freedomben
Yes all code _should_ be free!

(I work for Red Hat also :-)

I was originally just being silly but figured I would add something of
substance here. I think the GPL goes a little too far but I like the spirit of
it. As a user if I pay for an application, I should get a copy of the code for
personal use as well. I don't think that gives _me_ the right to distribute it
because it's still somebody's property, but applications should be distributed
to customers with their source IMHO.

------
rectang
Can a unikernel be used to create a sound/music application which is not
afflicted by the latency problems which bedevil such applications when running
inside traditional operating systems?

For example, could it be used for a softsynth which takes in midi input and
emits digital audio with the OS only contributing guaranteed sub-millisecond
latency?

~~~
EnigmaCurry
Is that not what the Linux realtime patches do?
[https://wiki.archlinux.org/index.php/Realtime_kernel_patchse...](https://wiki.archlinux.org/index.php/Realtime_kernel_patchset)

~~~
rectang
I've been researching this for a long time, and it's unclear to me whether
Linux with PREEMPT_RT patches would meet that sub-ms requirement. I see
numbers all over the place from various sources, from sub-millisecond to more
than 10 ms.

10 milliseconds is often considered the threshold of human perception with
regards to audio latency. (It's possible that it's lower under some
circumstances, but 10 ms is a good enough approximation for my purposes.) My
goal is to create a sound/music tool which runs lots of DSP and which has a
total system latency of under 10 milliseconds, including OS, application
(including intrinsic latency of DSP procedures), hardware, and sound in air
(about 3 ms per meter).

When I read about latency, I often see "we're at 6-9 ms, that's good enough
because it's not perceptible". Unfortunately, that's _not_ good enough if
there are several components which contribute to total system latency and they
are all pushing 10 ms. Hence, my sub-ms requirement for the OS.
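
A back-of-the-envelope illustration of that summing effect, with every number
an assumption chosen only to make the arithmetic concrete:

    #include <stdio.h>

    int main(void) {
        double adc_dac = 2 * 1.0;                /* converters, ~1 ms each way */
        double buffers = 2 * 64.0 / 48000 * 1e3; /* two 64-frame periods @ 48 kHz */
        double sched   = 0.5;                    /* OS scheduling (the sub-ms budget) */
        double dsp     = 2.0;                    /* intrinsic algorithm delay */
        double air     = 1.0 / 343.0 * 1e3;      /* 1 m of air at ~343 m/s */

        printf("total = %.1f ms\n", adc_dac + buffers + sched + dsp + air);
        /* = 2.0 + 2.7 + 0.5 + 2.0 + 2.9 = 10.1 ms: each term sounds harmless
           on its own, but the sum already crosses the 10 ms threshold. */
        return 0;
    }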

Committing to a platform will be a costly choice. I don't want to invest in
writing for realtime Linux only to find that I really need to run a hardcore
RTOS, or run dedicated DSP chips, etc.

That's why I'm interested in whether running a unikernel can offer stronger
guarantees.

~~~
PaulDavisThe1st
I write realtime audio software for Linux, and have done so for 20 years.

While your goal is admirable - it certainly is possible to come up with
scenarios where "sub-ms" latency is desirable - it's really not relevant.

Your stated goal ("...lots of DSP ... under 10 msec") is already entirely
achievable on Linux, assuming you're close enough to your speakers (or wearing
headphones).

But sub-msec can only make sense here if it describes _scheduler latency_,
since there's no audio hardware that can function in the sub-msec range. The
Linux scheduler is way, way below that threshold, and SCHED_FIFO threads will
see that performance barring hardware issues (cf.
[https://manual.ardour.org/setting-up-your-system/the-right-c...](https://manual.ardour.org/setting-up-your-system/the-right-computer-system-for-digital-audio/))
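
For reference, a minimal sketch of putting a thread under SCHED_FIFO (the
priority value is an arbitrary example; this needs CAP_SYS_NICE or an
appropriate rtprio rlimit):

    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    /* Ask the kernel to schedule the calling thread with SCHED_FIFO. */
    static void make_realtime(int priority) {
        struct sched_param sp;
        memset(&sp, 0, sizeof sp);
        sp.sched_priority = priority; /* 1..99; audio threads sit well below 99 */

        int err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
        if (err != 0)
            fprintf(stderr, "SCHED_FIFO: %s\n", strerror(err));
    }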

Finally .... writing audio software is a lot of fun. But please don't just
jump in without seeing if you can instead contribute to an existing project
first. The field is littered with the dead and half-dead corpses representing
the discarded work of developers who thought it would be fun, and then moved
on.

~~~
rectang
As for contributing to existing projects, I hear you — I've been highly active
in Open Source for well over a decade. Conceptually, the closest thing to what
I want to do is Pure Data — and I have in fact contributed to it.

But I'm also extremely motivated and willing to go all the way down and write
the entire thing from scratch if I have to. Or to learn enough about every
last step in the chain that I can actually control for latency — and it may be
that it takes about the same amount of work. Finding numbers I can trust on
latency seems hopeless. Everybody fudges rather than fails.

~~~
PaulDavisThe1st
The word "latency" covers many things, many subtly different from each other.
One of its meanings, unrelated to the one you're talking about, is the delay
in signal flow caused by algorithms (many digital filters, for example).
Compensating for it is often called "plugin delay compensation", or more
generically "latency compensation".

Let me just point out that it has taken 20 years and a guy whose PhD thesis
was about latency compensation in a DAW to finally "fix" this sort of
"latency" in Ardour. This is a massively harder problem from a design
perspective than the scheduling latency issues you've referred to.

[ EDIT: to be fair, the actual correct solution to latency compensation didn't
take 20 years to implement, more like a year or so when taking place within
the context of a large existing code base. ]

~~~
rectang
The complexity of solving such problems generally is actually what drives me
to start from scratch: rather than solve an intractable general problem,
instead limit the scope of where the project must run and what it must do.

My perception is that I am not capable of reliably tuning a general purpose OS
for a total system latency of under 10 msec. I can't give the total system a
hard number and believe that it will obey; instead, I need to perform a lot of
esoteric tweakery of subsystems I probably don't understand. The system won't
alert me reliably when it fails, but will instead either drop out or just give
me more latency than I asked for — and there are innumerable factors outside
my control that could cause it to fail.

However, the composition tool I want to create has a pretty small set of
requirements, if I accept that it need only serve my particular use case. So
how about I ensure that my app is the only thing running on the hardware, via
unikernel, or RTOS, or even bare metal?

Implementing all of my compositional requirements is probably easier and
certainly more rewarding than tweaking latency parameters without having
confidence that my results will be enduring or predictable.

If it turns out that the knowledge I gain during that exercise allows me to
control latency well enough and I can return to mainline operating systems and
contribute to existing projects, all the better. I don't really want to go
down this path, but I'm not willing to accept a tool that maybe-kinda-sorta-
sometimes meets my absolute requirements.

~~~
PaulDavisThe1st
You still won't be safe against several/most of the causes outlined in the
page from the Ardour manual that I linked to.

SMIs, random hardware that locks the bus for too long ... out of the kernel's
control.

If you really want 10msec finger->ear (a goal that I would say is reasonable,
though given many live performers' and mixing engineers' normal physical
setups, probably excessive) and you want to guarantee it, it's not just the OS
but the hardware you'll have to change. You cannot guarantee the required
scheduling on general purpose (intel/amd) motherboards unless you take great
care with the selection, and even then ... I've heard that financial
services/investment firms are the main reason you can still buy mobos
_without_ SMIs, because their "latency" requirements would be broken by these
interrupts.

On the other hand, the "not-guaranteed" case with a reasonable mobo, sensible
bus connected devices, an RT kernel and properly written user space software
is going to work almost all of the time. Just no guarantees.

~~~
rectang
Paul,

Thank you very much — for your ongoing work in Open Source audio, for being
willing to engage at length in this thread, and for being straightforward
about what the system can deliver.

> _If you really want 10msec finger->ear (a goal that I would say is
> reasonable, though given many live performers' and mixing engineers' normal
> physical setups, probably excessive)_

I worked in a recording studio for 6 years, including two years as a mastering
engineer. Most of the sonic adjustments I would make during mastering and
mixing fell below the threshold of perception — but when added together they
would produce something well above the threshold of perception.

There's nothing magic about this. I don't have "golden ears" (although,
because I've trained, I can identify certain patterns more quickly than
people who haven't).

The point is simply that an aggregation of imperceptible changes can sum to a
perceptible result. It's akin to why you perform intermediate processing in
both video and audio at a higher resolution than the final delivery medium:
otherwise an accumulation of small, possibly imperceptible degradations will
cause perceptible degradation of the finished product.

And so, I dispute the idea that just because there are other sources of
latency, we should resign ourselves and accept substantial contributors to
latency which fall below perceptual threshold. The only number that matters is
the final sum of all latencies.

> _You cannot guarantee the required scheduling on general purpose
> (intel/amd) motherboards unless you take great care with the selection, and
> even then ... I've heard that financial services/investment firms are the
> main reason you can still buy mobos without SMIs, because their "latency"
> requirements would be broken by these interrupts._

With this in mind, I will set aside one possibility I'd considered: writing
for general purpose CPUs (e.g. multicore x86_64) outside of mainstream
operating systems.

Instead, while I'll continue prototyping the project on mainstream operating
systems, I'll probably look more deeply into dedicated outboard DSP boards.

> _On the other hand, the "not-guaranteed" case with a reasonable mobo,
> sensible bus connected devices, an RT kernel and properly written user space
> software is going to work almost all of the time. Just no guarantees._

I appreciate how hard you've worked to achieve that.

My question, then, is how can I be confident that I'm actually meeting these
"almost-all-of-the-time" latency requirements?

In my experience, most systems recommend that you lower the latency until you
hear clicks and pops. That convention leaves me... dissatisfied. A dropout is
a detectable event, and the monitoring system should surface it.

Just as boggling, when you have exacting standards, is when the system falls
back and delivers something subtly degraded without telling you, like changing
the latency without warning because the system would otherwise go down. I
understand why systems are designed to prefer degradation over failure, but
for my purposes I need to know when it happens. Expecting me to monitor
continuously for an effect at the threshold of perception, such as subtly
increased latency, is draining, and ultimately unreasonable.

We have meters for noise floors and red lights indicating that clipping
occurred. What facilities exist to help me understand when the latency
behaviors of my rig are not meeting my requirements?

~~~
PaulDavisThe1st
By "probably excessive" I wasn't referring to psycho-acoustics. I merely meant
that given your willingess to include speaker->ear latency, many people work
on music in scenarios where that measure alone is already close to or above
10msec. The worst case scenario is likely a pipe organ player, who deals with
latencies measured in units of seconds. Humans can deal with this without much
difficulty - it is jitter that makes it hard (or impossible) to perform, not
latency. Long-standing drummer & bass player duos generally report being able
to deal with about 10 msec when performing live.

On the flip side, you have people arguing convincingly that the comb filtering
caused by phased reflections inside almost every listening scenario is
responsible for the overwhelming majority of what people hear as "different".
Move your head 1ft ... lose entire frequency bands ... move it again, get them
all back and then some!

Regarding latency deadlines: well, the device driver can tell you (and does,
if you ask it). If you use JACK, it will call back into your client every time
there is an xrun reported by the audio hardware driver. This in turn has a
quite simple definition: user space has not advanced the relevant buffer
pointer before the next interrupt. There are circumstances where this actually
isn't a problem (because the data has already been handled), but it is a
fairly solid way of knowing whether the software is keeping up with the
hardware. Something using ALSA directly can determine this in the same way
that JACK does.
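
A minimal sketch of hooking that report with the JACK C API
(jack_set_xrun_callback is the real entry point; the client name is
arbitrary):

    #include <jack/jack.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Invoked by JACK whenever the driver reports an xrun: user space did
       not keep up with the hardware for at least one period. */
    static int on_xrun(void *arg) {
        jack_client_t *client = arg;
        fprintf(stderr, "xrun: ~%.2f ms late\n",
                jack_get_xrun_delayed_usecs(client) / 1000.0);
        return 0;
    }

    int main(void) {
        jack_client_t *client = jack_client_open("xrun-monitor", JackNullOption, NULL);
        if (!client) return 1;
        jack_set_xrun_callback(client, on_xrun, client);
        jack_activate(client);
        for (;;) sleep(1); /* keep the client alive; JACK calls us back */
    }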

For audio, there is no other measurement of this that really matters. Using
some sort of system clock to try to check timing, while likely to be
kinda-sorta accurate enough, ignores the fact that the only clock that matters
is the sample clock. If you're operating with huge margins of safety, some
other clock measuring time is good enough, but as you begin to inch closer to
problem territory, it really isn't. For reference, we generally find that when
"CPU loads" (variously measured) get close to 80% on macOS and Linux,
scheduling deadlines start failing.

Nothing on linux will automatically "fallback" to less demanding latency
requirements. If the system can't meet the requirements of the audio
interface, it will continue to fail. This is actually part of the reason why
Ardour tends not to deactivate plugins - the user can expect the DSP/CPU load
to be more or less constant no matter what they do, rather than being low and
then climbing through a threshold that causes problems as they do stuff.

------
cm2187
The site has just the right number of modal windows hiding the content: a
cookie consent form, a COVID-19 notice at the bottom, some recommendations on
the right, and, if you scroll back up, the site banner...

~~~
sigwinch28
The old dl.acm was no amazing bit of modern design, but it was definitely much
better than this mess.

Yet again the ACM is showing how out of touch it can be with its members and
patrons.

~~~
ancarda
Eh, sounds like every other website to me.

Wouldn't people say ACM are out of touch if they don't adopt all the common
junk you see on websites? Their website is so out of date it doesn't have a
cookie banner! Best I get my information from somewhere that keeps up with the
times ... and the law!

------
reubensutton
My gut instinct is always that using traditional kernels in a unikernel way is
a bit suboptimal because it doesn’t become a “library operating system” in the
same way that Mirage does.

~~~
rwmj
Only the bits of Linux which are used are linked in, the same as when you link
together any program. The big advantage of using Linux is driver support: you
can run a UKL application on bare metal, linking in the drivers needed for the
target hardware.

~~~
reubensutton
Oh! That’s very cool

------
ncmncm
When I make a system where a unikernel could do the job, I usually want a full
OS for program setup and initialization, and then an isolated core for the
main loop, sharing (single-writer) memory pages with other, less performance-
critical processes for logging, stats reporting, and any needed file system
activities.

The makers of top-performing NICs have been quite good at providing direct
user-space access to their hardware, typically by exposing a ring buffer in
shared memory, and maybe mapping device registers too, so that the process on
the isolated core never does another system call until shutdown weeks later.
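
A sketch of that single-writer shared-memory pattern in its simplest
single-producer/single-consumer form (sizes and layout are illustrative, not
any vendor's actual ABI):

    #include <stdatomic.h>
    #include <stdint.h>
    #include <string.h>

    /* SPSC ring shared (e.g. via /dev/shm) between the isolated core
       (writer) and a logging process (reader). */
    #define SLOTS 1024
    struct ring {
        _Atomic uint64_t head;      /* advanced by the producer only */
        _Atomic uint64_t tail;      /* advanced by the consumer only */
        char slots[SLOTS][256];
    };

    /* Producer side: never blocks and never makes a system call. */
    static int ring_push(struct ring *r, const char *msg, size_t len) {
        uint64_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
        uint64_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
        if (h - t == SLOTS) return -1; /* full: drop rather than stall */
        memcpy(r->slots[h % SLOTS], msg, len < 256 ? len : 256);
        atomic_store_explicit(&r->head, h + 1, memory_order_release);
        return 0;
    }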

It is some hassle to get customers to add boot flags (isolcpus=, nohz_full=,
rcu_nocbs=, rcu_nocb_poll, hugepages=, etc.), and to put any mapped files in
/dev/shm or /dev/hugepages so the kernel won't invent excuses to block the
process, and to direct irqs to other cores; but unikernel setup is probably
not simpler.
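
For concreteness, such a command line might look like the following, with
illustrative CPU numbers (here cores 2-3 are set aside; the right values
depend entirely on the machine):

    # /etc/default/grub -- illustrative values, not a recommendation
    GRUB_CMDLINE_LINUX="isolcpus=2,3 nohz_full=2,3 rcu_nocbs=2,3 rcu_nocb_poll hugepages=512"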

So, I'm not sure what a unikernel would get me. Portability, or independence
from proprietary drivers?

~~~
bitcharmer
I don't think the tuning you refer to is sufficient to bring the platform
noise down to the levels required for some workloads. You will still see a lot
of system call interrupts, TLB shootdowns, timer events, etc. Funnily enough,
yesterday I published an article tangential to this problem field.

[http://bitcharmer.blogspot.com/2020/05/t_84.html](http://bitcharmer.blogspot.com/2020/05/t_84.html)

I'm not an expert on unikernels but my assumption is that you will see none of
that OS jitter.

~~~
xfs
Good writeup, but is there any mitigation for TLB shootdowns?

~~~
ncmncm
TLB shootdowns are a product of multithreading, and of unmapping memory. Avoid
one, the other, or both, and TLB shootdowns fade out of the picture.

You still have cores to isolate, busybody kernel threads to suppress, and
hardware interrupts to direct elsewhere, but TLB shootdown paranoia is largely
a product of the current fashion favoring multi-threading over running
separate processes with carefully chosen sharing.
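
A sketch of the "avoid unmapping" half on Linux (the mmap flags are real; the
arena idea is an illustration): map once at startup, pre-fault and lock the
pages, and never munmap them, so no shootdown IPIs are ever needed for them:

    #define _GNU_SOURCE
    #include <sys/mman.h>

    /* One long-lived, locked, pre-faulted arena: pages that are never
       unmapped never trigger TLB-shootdown IPIs to the isolated core. */
    void *arena_init(size_t len) {
        return mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE | MAP_LOCKED,
                    -1, 0);
    }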

------
steeve
Obligatory plug for Elias Naur's unik, a unikernel written almost entirely in
Go, which can link with and run unmodified Go binaries:
[https://git.sr.ht/~eliasnaur/unik](https://git.sr.ht/~eliasnaur/unik)

Demo with virtio-gpu support:
[https://twitter.com/eliasnaur/status/1249765031299952646](https://twitter.com/eliasnaur/status/1249765031299952646)

------
gandalfgeek
If you're too lazy to read the whole paper, I've made a quick explainer video:

[https://youtu.be/3NWUgBsEXiU](https://youtu.be/3NWUgBsEXiU)

------
edjrage
So apparently you need cookies to show text.

"We use cookies to ensure that we give you the best experience on our website.

It seems your browser doesn't support them and this affects the site
functionality."

------
sureshv
Wonder how this differs from exokernels:
[https://pdos.csail.mit.edu/archive/exo/](https://pdos.csail.mit.edu/archive/exo/)

~~~
GregarianChild
Exokernels are orthogonal to unikernels.

Unikernels run exactly one application, but can be based on an arbitrarily
complicated OS specialised to that one application. Exokernels can run
arbitrarily many applications, but OS functionality is minimal and limited to
ensuring protection and multiplexing of resources.

Specialising a rich OS (such as Linux) to a single application _might_ yield
OS functions that are as restricted as those you find in an exokernel.

------
axegon_
Full disclosure: I have only briefly fiddled with unikernels. And in all
honesty, I think there could be a good real life application for them. But the
biggest problem I see is that even though they have been around for a while,
building them is still incredibly complex and time consuming, and there isn't
a core community behind them. Looking at it, it seems like they are starting
to suffer from the JavaScript syndrome: hundreds of single-maintainer or
micro-community projects doing their own thing and putting a sticker
("unikernel" in this case) on top. As I said, I haven't paid a whole lot of
attention to them, and I'm hoping someone is addressing these issues.

~~~
heavenlyblue
Isn’t that’s what this whole thread about? Making it easier by providing Linux
as a compilation target.

------
jibanes
This is excellent. Where is the code? Do you have examples?

~~~
rwmj
[https://github.com/unikernelLinux](https://github.com/unikernelLinux)

We have memcached compiled for UKL but for some reason Ali has made that repo
private (it's under the same namespace as above). I will ask him if he can
make the other repos public this week.

------
moonbug
three objections: tooling, tooling, and tooling.

------
ncmncm
Full article text:

[http://sci-hub.tw/10.1145/3317550.3321445](http://sci-hub.tw/10.1145/3317550.3321445)

Maybe link the top-level post there?

~~~
x3blah
There is a link to the PDF on the page. No ACM membership required.

[https://dl.acm.org/doi/pdf/10.1145/3317550.3321445](https://dl.acm.org/doi/pdf/10.1145/3317550.3321445)

~~~
ncmncm
The link was not obvious to me. What is supposed to be wrong with using
Sci-hub?

