
Critique of Microkernel Architectures – Is Linus Right? (2004) [pdf] - vezzy-fnord
https://www.cse.unsw.edu.au/~cs9242/04/lectures/lect05b.pdf
======
KMag
Seeing that their L4/Alpha implementation was written in PAL and C, it seems
that they made L4 run as the Alpha firmware.

For those not familiar with the DEC Alpha, it always ran what was basically a
hypervisor/microkernel in firmware (called PAL code). The OS kernel would
actually run in normal user mode and make upcalls to the PAL code, and the PAL
code could emulate an arbitrary number of protection rings (two for Ultrix,
several for VMS). VMS and Ultrix required different versions of the PAL code
to be loaded, and Linux on Alpha used the Ultrix version of the firmware.
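A rough sketch of what an upcall into PAL code looks like from the kernel's side, in the style of the Linux/alpha sources. The function number and register conventions below are those of the OSF/1-flavoured PALcode, quoted from memory, so treat them as illustrative:

```c
/* Illustrative sketch: invoking a PALcode service from an OS kernel on
 * Alpha.  The CALL_PAL instruction traps into the firmware layer; the
 * immediate selects the service.  PAL_swpipl (swap interrupt priority
 * level) is 53 in the OSF/1-flavoured PALcode used by Linux/alpha. */
#define PAL_swpipl 53

static inline unsigned long swpipl(unsigned long newipl)
{
    register unsigned long v0 __asm__("$0");            /* result   */
    register unsigned long a0 __asm__("$16") = newipl;  /* argument */

    __asm__ __volatile__(
        "call_pal %2 # PAL_swpipl"     /* trap into the PAL code */
        : "=r"(v0), "=r"(a0)
        : "i"(PAL_swpipl), "1"(a0)
        : "$1", "$22", "$23", "$24", "$25", "memory");

    return v0;                         /* previous IPL */
}
```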

It seems to me a shame that the DEC Alpha PAL code isn't the standard way
firmware is done... ship a nanokernel/hypervisor and some basic drivers in
ROM, and have some upcalls to replace drivers (or even the
nanokernel/hypervisor). The DEC Alpha was a powerhouse in its day, and seems
to have suffered very little from the PAL code abstraction.

~~~
gnufx
> For those not familiar with the DEC Alpha, it always ran what was basically
> a hypervisor/microkernel in firmware (called PAL code).

Gosh, I never realized, as a former Keeper of Alphas for scientific work,
previously dependent on the GEC 4000 series
[https://en.wikipedia.org/wiki/GEC_4000](https://en.wikipedia.org/wiki/GEC_4000)
for physics. I wonder how it compared with OS4000 Nucleus.

The 4000s made competing VAXen look a bit silly for our sort of data
acquisition and analysis. Since they were pretty important for my publication
record, it always amuses me to see microkernel-ish systems rejected out of
hand. A colleague disproved the conventional wisdom that you needed a
"spectrum database" to manage the data by just trusting the filesystem to
manage directories. More recently I recall userspace filesystems generally
being called toys (?), notwithstanding huge, high-performance PVFS2
installations, for instance.

> The DEC Alpha was a powerhouse in its day

Yes, and tragically under-appreciated in my sphere. I tried in vain to get the
maintainer of the principal data reduction program in a different field to
replace the bottleneck of the disk-based sort (written for a PDP-11). Each
image would normally fit in cache and the whole dataset largely in memory, but
that somehow wasn't relevant, contributing to "the computer is slow". (I think
the sizes were 10MB and 1GB.)

------
Animats
The problem with monolithic kernels is their tendency to grow without bound.
The Linux kernel passed 15 million lines a few years back. Linus's arguments
for a monolithic kernel were stronger when it wasn't such a monolith. Anything
with an attack surface that big is hopeless from a security standpoint.

There are now a few secure microkernels. It really is possible to fix the Mess
at the Bottom. If the kernel isn't secure, nothing else can be. Patch-and-
release security just doesn't work any more. The serious attacks today are
from organized crime and governments, not script kiddies, and they develop
their own zero-day exploits.

~~~
pjc50
So, suppose we deploy seL4. It needs some device drivers, and crucially its
proof of security depends on DMA not being exploited. The user-mode drivers
then become the attack surface: if you can persuade the target graphics card
to DMA over the secure kernel space, you win. (ARM TrustZone actually
prevents this.)

I wonder how feasible a modern, graphics-capable POSIX OS built on a core of
seL4 would be.

~~~
xj9
I don't have anything worth sharing at the moment, but this is something that
I'm currently exploring. I'm not sure I'm entirely sold on POSIX (I'd much
rather build a Plan 9 system), but I do see the value in interoperability.

~~~
pjc50
It depends on whether you want to run software on it or attempt to build your
own ecosystem entirely, which only works if you're already a giant vendor.

------
tildeleb
"As a comparison, Mach’s micro-kernel without device drivers has 25530 lines
of C code (calculated, we’re told, by counting semi-colons). By the same
metric our minimal kernel is only 4622 lines long, less than 1/5 the size. In
fact, our kernel with every file system included is still less than half the
size of their micro-kernel."

"Plan 9, A Distributed System", Dave Presotto, Rob Pike, Ken Thompson, Howard
Trickey, 1991 (I think)

'nuff said
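
For the curious, the semicolon metric is trivial to reproduce. A throwaway version, with the caveat that the Plan 9 authors' exact method isn't recorded beyond that parenthetical:

```c
/* Crude "lines of C" metric: count the semicolons in a source file,
 * as the Plan 9 paper describes.  Purely illustrative. */
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *f = argc > 1 ? fopen(argv[1], "r") : stdin;
    if (!f)
        return 1;

    long count = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == ';')
            count++;

    printf("%ld\n", count);   /* the "semicolon SLOC" figure */
    return 0;
}
```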

~~~
renox
Except that it is again a comparison with Mach. What's the size of L4?

~~~
vezzy-fnord
It's a red herring, anyway. Microkernels aren't so much about size as they are
about separation of concerns. Otherwise we'd have to conclude V7 Unix and DOS
are both microkernels, which is preposterous.

------
iofj
All of these optimizations for microkernels would accelerate monolithic
kernels as well. What seems to be very much missing from presentations like
this is a reason these optimizations would accelerate microkernels more than
they would accelerate monolithic kernels.

I do wonder if we could do the opposite of microkernels. A microkernel would
be free if it ran a (pre/recompiled?) VM, and things like memory
mapping/protection/... were compiled into user programs before execution.
These could then run without any actual user/kernel space isolation,
eliminating system call overhead entirely.
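
For what it's worth, this idea has a name: software fault isolation, where the compiler instruments memory accesses so they can't escape the process's sandbox and hardware protection becomes unnecessary. A minimal sketch of the classic address-masking form (all constants invented for illustration):

```c
/* Sketch of software fault isolation (SFI): instead of relying on
 * hardware user/kernel separation, the compiler (or a binary rewriter)
 * instruments every store so it can only land inside the program's own
 * sandbox region.  Constants below are illustrative, not from any real
 * system. */
#include <stdint.h>

#define SANDBOX_BASE 0x200000000ULL   /* sandbox starts here (example)  */
#define SANDBOX_MASK 0x0000FFFFFULL   /* 1 MiB sandbox (example size)   */

/* Every store the "compiler" emits goes through this: the address is
 * masked into the sandbox, so no hardware trap is ever needed. */
static inline void sfi_store32(uintptr_t addr, uint32_t value)
{
    uintptr_t safe = SANDBOX_BASE | (addr & SANDBOX_MASK);
    *(volatile uint32_t *)safe = value;
}
```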

~~~
carussell
[https://en.wikipedia.org/wiki/Singularity_%28operating_syste...](https://en.wikipedia.org/wiki/Singularity_%28operating_system%29#Security_design)

~~~
iofj
Heh, I just realized I unwittingly made the "sufficiently smart compiler"
argument. The only reason a library OS would outperform a non-library OS is
that software can do things faster than hardware. This compiler/VM would have
to outperform the cpu at executing ... cpu instructions, which is never going
to happen, or it would have to optimize better than the hardware can.

In theory this is possible; in practice, for a low-level, optimized piece of
software like the kernel, it is never true. This just can't work, at least
for the next few decades, and potentially forever.

------
byuu
I understand the article was from 2004, when PCs were about 2-4x slower.

But to me, a microkernel isn't about performance. It's a complement to
monolithic kernels. When I am writing source code, or typing up an important
document, or archiving files for preservation ... I want to be doing this on a
microkernel OS.

I have had my FreeBSD system crash completely due to the video card driver
(from nVidia, of course) accessing unmapped memory. That's completely
illogical. Drop me to a text terminal, or hell, force me to SSH in to unmount
and reboot cleanly. But don't drop the entire system. Yet with a monolithic
kernel, how can you be sure the video driver didn't corrupt some other part of
the kernel space?

Likewise, if I want to play the latest video game, or I'm using it as a render
farm, or to mine bitcoins, or I need to run a web server that needs
scalability more than stability ... then I want to be doing that on a
monolithic kernel.

Most OSes focus on raw performance; then we have NetBSD, which focuses on
portability, and OpenBSD, which focuses on security. Where's our (non-toy)
OS whose primary focus is on _stability_? So far, Minix 3 looks like the
most promising option, but it has no enterprise filesystem like ZFS, and is
severely lacking in manpower.

~~~
KMag
> Likewise, if I want to play the latest video game, or I'm using it as a
> render farm, or to mine bitcoins, or I need to run a web server that needs
> scalability more than stability ... then I want to be doing that on a
> monolithic kernel.

For the latest video game, with a microkernel (and an IOMMU keeping the video
card from trashing memory), when switching to fullscreen mode, your OpenGL
library could request that your video game process become the video driver,
with no loss of security or stability. That gives you lower overhead access to
the video card than a monolithic kernel.

For compute-bound tasks such as bitcoin mining, a small realtime microkernel
will give you better instruction cache utilization. (Back in my university
system architecture class, I ran cache benchmarks under Linux, QNX, and
Win2k on my triple-boot desktop; QNX gave me the best cache performance.)

If you need to run a high-performance web server and need absolute
performance, run it in-kernel. If not, a microkernel (and a properly
configured IOMMU) allows you to safely make your process the network driver,
as well as the filesystem and disk drivers for your dedicated www data
drive. Now, you'll want a second network port with its own chipset (perhaps
on its own card) so you can ssh in if the webserver crashes, but you'll have
higher performance than a monolithic kernel.

In short, microkernels along with appropriate hardware protections allow you
to safely make your device drivers just ordinary libraries that run inside
your critical processes, giving you higher performance without sacrificing
security or stability.
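
A minimal sketch of the "driver as an ordinary library" idea, using Linux's UIO interface purely as a familiar stand-in for whatever mapping primitive a microkernel would provide (the device path and register layout are hypothetical):

```c
/* Sketch: a user-space "driver" that is just code in the application's
 * own address space.  The process maps the device's register window and
 * pokes it directly; no kernel driver sits in the data path. */
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/uio0", O_RDWR);   /* hypothetical device node */
    if (fd < 0)
        return 1;

    /* Map the device's register window straight into this process. */
    volatile uint32_t *regs =
        mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (regs == MAP_FAILED)
        return 1;

    regs[0] = 1;                /* poke a hypothetical "start" register  */
    uint32_t status = regs[1];  /* read a hypothetical "status" register */

    munmap((void *)regs, 4096);
    close(fd);
    return status != 0;
}
```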

EDIT: I should also point out that Oracle, Postgres, and other high-
performance RDBMSes go to a lot of work to do "kernel bypass". A
microkernel with a minimal cache footprint would better get out of their way
and more safely turn over more functionality to the application.
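
A minimal sketch of the mildest form of kernel bypass, direct I/O, which lets a database read into its own buffer pool without the kernel's page cache in the way (the file name is hypothetical, and O_DIRECT's alignment rules vary by filesystem):

```c
/* Sketch: opening a data file with O_DIRECT so reads skip the kernel
 * page cache and land in the application's own buffer pool. */
#define _GNU_SOURCE          /* O_DIRECT is a Linux extension */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fd = open("datafile.db", O_RDONLY | O_DIRECT); /* hypothetical file */
    if (fd < 0)
        return 1;

    /* O_DIRECT needs sector-aligned buffers, offsets, and sizes;
     * 4096 is a typical (but not universal) alignment. */
    void *buf;
    if (posix_memalign(&buf, 4096, 4096) != 0)
        return 1;

    ssize_t n = pread(fd, buf, 4096, 0);  /* bypasses the page cache */

    free(buf);
    close(fd);
    return n == 4096 ? 0 : 1;
}
```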

~~~
chousuke
> Now, you'll want a second network port with its own chipset (perhaps on its
> own card) so you can ssh in if the webserver crashes

I don't think this is necessary with modern NICs and SR-IOV. I imagine you'd
just configure one virtual function for the web server, and one for
management. It's pretty common already.
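
For illustration, on Linux the carving is a single sysfs write. A sketch, with the interface name as a placeholder:

```c
/* Sketch: splitting an SR-IOV NIC into virtual functions on Linux by
 * writing to the standard sriov_numvfs sysfs attribute.  Each VF then
 * shows up as its own network device (one for the web server, one for
 * management).  "eth0" is a placeholder for the physical function. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/class/net/eth0/device/sriov_numvfs", "w");
    if (!f)
        return 1;
    fprintf(f, "2\n");          /* create two virtual functions */
    return fclose(f) != 0;
}
```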

Similarly, you might have a RAID controller capable of LVM-like
"partitioning" of the disk array, so that you can present two virtual disks
dedicated to whatever purpose you need. I don't know if these actually exist
yet, though.

------
pjmlp
The proof that Linus was wrong is that, apart from the majority of UNIX
clones, all modern OSes are micro-kernels (L4, high-integrity RTOSes,
Symbian, Minix), hypervisor-based (e.g. mainframes, unikernels), or hybrid
(Mac OS X, iOS, Windows).

Linux's success is related to the ease with which people could copy stuff
from commercial UNIXes into GNU/Linux, not to the kernel architecture.

~~~
zurn
So are the cases of "hybrid" kernels a victory for the microkernel side of
the debate, or the monolithic side?

Neither iOS/OS X nor NT manages to get the robustness, security, or
compartmentalization advantages of microkernels. I guess they get modularity
and dynamic loading, but so does Linux. So Linux, OS X, and NT are all
mongrels, with iOS/OS X having microkernel roots, Linux having monolithic
roots, and NT being a ground-up designed mongrel.

~~~
pjmlp
I love how Windows hate runs deep on HN.

Both Mac OS X and iOS get it partially, by having kernel-level RPC to
communicate between modules and by having moved a great part of their driver
infrastructure to user space.

Let me know when graphics card crashes no longer require a reboot on
GNU/Linux.

~~~
zurn
No Windows hate intended; I called all of them mongrels.

So is there a mechanism in OS X and/or Windows that prevents graphics card
hardware or a driver from corrupting OS state and enables robust reset of
the card and driver? Or if it's a 90% solution, is it significantly better
because of some microkernelesque features missing from Linux?

~~~
pjmlp
Yes, since Windows Vista, Microsoft has kind of returned to the original NT
model and introduced the User-Mode Driver Framework (UMDF), which is a
requirement for graphics drivers, thus moving them out of the kernel again.

[https://msdn.microsoft.com/en-us/library/windows/hardware/dn...](https://msdn.microsoft.com/en-us/library/windows/hardware/dn384105%28v=vs.85%29.aspx)

[https://msdn.microsoft.com/en-us/library/windows/hardware/ff...](https://msdn.microsoft.com/en-us/library/windows/hardware/ff570114%28v=vs.85%29.aspx)

A driver crash will just force it to be reloaded.

Likewise, using kernel-level RPC adds extra validation layers, via data
marshaling, that a simple function call would not.
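
A minimal sketch of where that validation layer lives: the receiving side of an RPC must parse and bounds-check the message before acting on it, which a plain in-kernel function call never forces (the message layout is invented for illustration):

```c
/* Sketch: unmarshaling an RPC message.  The receiver cannot trust the
 * sender, so every field is checked before use; a direct function call
 * between kernel modules skips all of this. */
#include <stdint.h>
#include <string.h>

struct rpc_msg {
    uint32_t opcode;
    uint32_t payload_len;
    uint8_t  payload[256];
};

/* Returns 0 if the wire bytes form a well-formed message. */
int unmarshal(const uint8_t *wire, size_t wire_len, struct rpc_msg *out)
{
    if (wire_len < 8 || wire_len > sizeof(*out))
        return -1;                    /* truncated or oversized */
    memcpy(out, wire, wire_len);
    if (out->payload_len > wire_len - 8)
        return -1;                    /* length field lies */
    return 0;
}
```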

As for OS X, only drivers for disks, network controllers, and keyboards are
required to be in the kernel, and they use mostly Mach calls, not BSD ones.
Anything else can be exposed to user space via so-called nubs.

[https://developer.apple.com/library/mac/documentation/Device...](https://developer.apple.com/library/mac/documentation/DeviceDrivers/Conceptual/IOKitFundamentals/ArchitectOverview/ArchitectOverview.html#//apple_ref/doc/uid/TP0000013-TPXREF107)

[https://developer.apple.com/library/mac/documentation/Device...](https://developer.apple.com/library/mac/documentation/DeviceDrivers/Conceptual/IOKitFundamentals/ArchitectOverview/ArchitectOverview.html#//apple_ref/doc/uid/TP0000013-BEHEGHEG)

~~~
zurn
Thanks. I didn't see explanations of the graphics driver restart in those
MSDN links, but I found one here:
[https://msdn.microsoft.com/en-us/library/windows/hardware/ff...](https://msdn.microsoft.com/en-us/library/windows/hardware/ff570087%28v=vs.85%29.aspx)

And third-party articles:
[https://www.blackhat.com/docs/us-14/materials/us-14-vanSprun...](https://www.blackhat.com/docs/us-14/materials/us-14-vanSprundel-Windows-Kernel-Graphics-Driver-Attack-Surface.pdf)
[http://bsodtutorials.blogspot.fi/2013/12/timeout-detection-a...](http://bsodtutorials.blogspot.fi/2013/12/timeout-detection-and-recovery-stop.html)

So there's a user-space part to the GPU driver and a kernel-space part, much
like on Linux.

For the recovery functionality, it sounds like the graphics card's
kernel-side GPU driver just registers a callback that Windows invokes when
it thinks the GPU or driver is stuck, but there aren't any special
arrangements to make this robust against the driver corrupting OS state or
other hardware. The same kind of mechanism could be implemented in Linux.

I'll save those OS X links for later when I have time to look into that one!

~~~
pjmlp
Of course one could eventually migrate Linux to a hybrid kernel, but the
will to do so isn't there.

Especially since it requires a stable kernel ABI.

~~~
zurn
I'm not sure we are on the same page. I was trying to explain that the
functionality in question is not related to kernel architecture differences,
and that Windows is unable to implement it reliably.

------
ben_bai
I feel like those benchmarks should be redone with modern hardware (x86_64)
and a modern microkernel (Minix 3.3).

CPU cycles are cheap these days, while cache misses are ever more expensive
(need to touch main memory? please wait 300 cycles).
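
A minimal sketch of the kind of pointer-chasing microbenchmark behind numbers like that: a dependent load chain through a buffer larger than the last-level cache makes nearly every access a main-memory round trip (the sizes and the 300-cycle figure are illustrative):

```c
/* Sketch: measure average load latency with a random pointer chase.
 * Each load depends on the previous one, so the CPU can't prefetch or
 * overlap the misses. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024 / sizeof(size_t))  /* ~64 MiB, beats most LLCs */
#define STEPS 10000000L

int main(void)
{
    size_t *next = malloc(N * sizeof(size_t));
    if (!next)
        return 1;

    /* Build one random cycle through the array (Sattolo's algorithm:
     * j < i at every step guarantees a single big cycle). */
    for (size_t i = 0; i < N; i++)
        next[i] = i;
    srand(1);
    for (size_t i = N - 1; i > 0; i--) {
        /* combine two rand() calls so the range covers the whole array */
        size_t j = (((size_t)rand() << 16) ^ (size_t)rand()) % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (long s = 0; s < STEPS; s++)
        p = next[p];                   /* dependent load chain */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per load (p=%zu)\n", ns / STEPS, p); /* p defeats DCE */
    free(next);
    return 0;
}
```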

