
Examining the Legendary Hurd Kernel (2008) - vezzy-fnord
http://www.informit.com/articles/printerfriendly/1180992
======
Animats
It's sad. Neither Hurd nor Mach reached the performance level of QNX. L4
finally did, although L4 now uses shared memory too much. (While it simplifies
the kernel to do message passing via shared memory, it makes it easier for one
process to mess up another. Queues in shared memory have to be managed very
carefully.)

The real issues in microkernel performance are subtle. Message passing and CPU
dispatching have to be well-integrated. Otherwise, when you pass a message
from one process to another in a subroutine/service like way and get a
response, you go to the end of the line for CPU time twice. A good test of a
message passing system is to write a client/server which does some trivial
computation on the server and then returns a reply. Benchmark it. Then repeat
the benchmark with the system CPU-bound. If performance drops by orders of
magnitude under load, the message passing / CPU dispatch relation has been
botched.
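
The test described above can be sketched in a few lines. This is a hypothetical benchmark, not QNX code: a trivial echo server is timed over a pipe, first on an idle system, then with CPU-bound spinner processes running. All names and parameters here are illustrative.

```python
# Sketch of the client/server round-trip benchmark described above.
# POSIX-only (uses the "fork" start method to keep the sketch simple).
import multiprocessing as mp
import time

ctx = mp.get_context("fork")

def echo_server(conn):
    # Trivial "service": receive a message, do a trivial computation, reply.
    while True:
        msg = conn.recv()
        if msg is None:
            break
        conn.send(msg + 1)

def spin():
    # CPU-bound load process.
    while True:
        pass

def round_trip_benchmark(n=300):
    # Mean seconds per send/receive round trip to the echo server.
    parent, child = ctx.Pipe()
    server = ctx.Process(target=echo_server, args=(child,))
    server.start()
    start = time.perf_counter()
    for i in range(n):
        parent.send(i)
        assert parent.recv() == i + 1
    elapsed = time.perf_counter() - start
    parent.send(None)
    server.join()
    return elapsed / n

if __name__ == "__main__":
    idle = round_trip_benchmark()
    spinners = [ctx.Process(target=spin, daemon=True) for _ in range(2)]
    for p in spinners:
        p.start()
    busy = round_trip_benchmark()
    for p in spinners:
        p.terminate()
    print(f"idle: {idle * 1e6:.1f} us/rt, busy: {busy * 1e6:.1f} us/rt, "
          f"ratio: {busy / idle:.1f}x")
```

The interesting number is the ratio: on a system whose message passing and CPU dispatching are well-integrated, the busy run should be within a small factor of the idle one, not orders of magnitude slower.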

On the security front, the "critique" says _" programs are assumed to
represent the interests of the user and, as such, are run with the user’s
authority."_ This, of course, comes from the UNIX/Linux model and still gives
us trouble. We still have trouble running something like Flash or a browser on
a desktop without giving it far too much authority.

(Of course, if you do have a more granular authority model, apps ask for too
much, as the mobile world has learned. Grumble of the week: Firefox on Android
now demands access to my phone book. Not because the browser needs it, but
because their ancillary products, Sync or Pocket, might.)

~~~
betaby
I keep hearing these stories of "performance level of QNX" since at least
2000. Perhaps it's even true (I didn't see it during the QNX 4.x era, but I
blame my lack of experience back then). I've even seen various synthetic tests
of QNX vs L4 a couple of years ago. And the QNX source code was released about
5 years ago, if I'm not mistaken. So what was that super-performing sauce of
QNX which other microkernels lack? What are those concepts/ideas? Why weren't
they re-implemented on L4/Hurd (you can't just borrow, QNX is NOT free
software)?

~~~
nickpsecurity
I'd be interested in hearing Animats' opinion. My guess is that it stayed
simpler than Mach, but not too simple. The right tradeoffs in the kernel,
along with fitting into the tiniest of caches and [typical of RTOSes] a smart
choice of instructions for predictable timing. Here's a paper on an older
version of it:

[http://cseweb.ucsd.edu/~voelker/cse221/papers/qnx-paper92.pdf](http://cseweb.ucsd.edu/~voelker/cse221/papers/qnx-paper92.pdf)

In 1992, it could execute in 8KB of cache with the microkernel and interrupt
handler. That by itself should buy it some performance. :)

~~~
Animats
The most important operation in QNX is MsgSend, which works like an
interprocess subroutine call. It sends a byte array to another process and
waits for a byte array reply and a status code. All I/O and network requests
do a MsgSend. The C/C++ libraries handle that and simulate POSIX semantics.
The design of the OS is optimized to make MsgSend fast.

A MsgSend is to another service process, hopefully waiting on a MsgReceive.
For the case where the service process is idle, waiting on a MsgReceive, there
is a fast path where the sending thread is blocked, the receiving thread is
unblocked, and control is immediately transferred without a trip through the
scheduler. The receiving process inherits the sender's priority and CPU
quantum. When the service process does a MsgReply, control is transferred back
in a similar way.

This fast path offers some big advantages. There's no scheduling delay; the
control transfer happens immediately, almost like a coroutine. There's no CPU
switch, so the data that's being sent is in the cache the service process will
need. This minimizes the penalty for data copying; the message being copied is
usually in the highest level cache.

Inheriting the sender's priority avoids priority inversions, where a high-
priority process calls a lower-priority one and stalls. QNX is a real-time
system, and priorities are taken very seriously. MsgSend/Receive is priority
based; higher priorities preempt lower ones. This gives QNX the unusual
property that file system and network access are also priority based. I've run
hard real time programs while doing compiles and web browsing on the same
machine. The real-time code wasn't slowed by that. (Sadly, with the latest
release, QNX is discontinuing support for self-hosted development. QNX is
mostly being used for auto dashboards and mobile devices now, so everybody is
cross-developing. The IDE is Eclipse, by the way.)

Inheriting the sender's CPU quantum (time left before another task at the same
priority gets to run) means that calling a server neither puts you at the end
of the line for CPU nor puts you at the head of the line. It's just like a
subroutine call for scheduling purposes.

MsgReceive returns an ID for replying to the message; that's used in the
MsgReply. So one server can serve many clients. You can have multiple threads
in MsgReceive/process/MsgReply loops, so you can have multiple servers running
in parallel for concurrency.
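
As a rough illustration, the MsgSend/MsgReceive/MsgReply contract can be modeled with threads. This is a toy model, not the QNX implementation (all names below are hypothetical), and it omits the scheduler fast path that makes the real thing fast: the sender blocks until its reply arrives, and the receiver gets an id naming the blocked sender to use in the reply.

```python
# Toy model of the MsgSend/MsgReceive/MsgReply rendezvous (not QNX code).
import itertools
import queue
import threading

class Channel:
    def __init__(self):
        self._pending = queue.Queue()   # messages waiting for a MsgReceive
        self._blocked = {}              # rcvid -> (wakeup event, reply slot)
        self._ids = itertools.count(1)

    def msg_send(self, msg):
        # Block the caller until the server replies to *this* message.
        rcvid = next(self._ids)
        done = threading.Event()
        slot = {}
        self._blocked[rcvid] = (done, slot)
        self._pending.put((rcvid, msg))
        done.wait()
        del self._blocked[rcvid]
        return slot["reply"]

    def msg_receive(self):
        # Block the server until a message arrives; returns (rcvid, msg).
        return self._pending.get()

    def msg_reply(self, rcvid, reply):
        # Unblock exactly the sender identified by rcvid, with its reply.
        done, slot = self._blocked[rcvid]
        slot["reply"] = reply
        done.set()

def server_loop(chan, n):
    # One server thread can serve many clients via the rcvid.
    for _ in range(n):
        rcvid, msg = chan.msg_receive()
        chan.msg_reply(rcvid, msg * 2)   # trivial service

if __name__ == "__main__":
    chan = Channel()
    threading.Thread(target=server_loop, args=(chan, 3), daemon=True).start()
    print([chan.msg_send(i) for i in (1, 2, 3)])   # [2, 4, 6]
```

A user-level model like this still goes through the host scheduler on every wakeup; the QNX fast path additionally hands the CPU, priority, and quantum directly to the receiving thread.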

This isn't that hard to implement. It's not a secret; it's in the QNX
documentation. But few OSs work that way. Most OSs (Unix-domain sockets,
System V messaging) have unidirectional message passing, so when the caller
sends, the receiver is unblocked, and the sender continues to run. The sender
then typically reads from a channel for a reply, which blocks it. This
approach means several trips through the CPU scheduler and behaves badly under
heavy CPU load. Most of those systems don't support the many-one or many-many
case.

Somebody really should write a microkernel like this in Rust. The actual QNX
kernel occupies only about 60K bytes on an IA-32 machine, plus a process
called "proc" which does various privileged functions but runs as a user
process. So it's not a huge job.

All drivers are user processes. There is no such thing as a kernel driver in
QNX. Boot images can contain user processes to be started at boot time, which
is how initial drivers get loaded. Almost everything is an optional component,
including the file system. Code is ROMable, and for small embedded devices,
all the code may be in ROM. On the other hand, QNX can be configured as a web
server or a desktop system, although this is rarely done.

There's no paging or swapping. This is real-time, and there may not even be a
disk. (Paging can be supported within a process, and that's done for gcc, but
not much else.) This makes for a nicely responsive desktop system.

~~~
anon4
This sounds a lot like Android's binder mechanism for inter-process
communication.

~~~
Animats
Binder was created to solve the same problem (Linux IPC was too slow) but uses
somewhat different approaches.

------
sergiolp
Some years ago, I spent a lot of time studying GNU Mach and Hurd (I also made
some small contributions). I think I can say that I know both pretty well.
I even started a project to preserve OSF Mach + MkLinux source code
([https://github.com/slp/mkunity](https://github.com/slp/mkunity)), a very
cool project for its time (circa 1998).

These days I prefer to do my kernel hacking on monolithic kernels, mainly
NetBSD. I've stopped working on Mach, Hurd and other experimental microkernels
(there're a bunch out there) because it was becoming increasingly frustrating.

If you asked me to define the problem with microkernels in one word, it would
be "complexity". And it's a kind of complexity that impacts everything:

\- Debugging is hard: On monolithic kernels, you have a single image, with
both code and state. Hunting a bug is just a matter of jumping into the
internal debugger (or attaching an external one, or generating a dump, or...)
and looking around. On Hurd, the state is spread among Mach and the servers,
so you'll have to look at each one trying to follow the trail left by the bug.

\- Managing resources is hard: Mach knows everything about the machine, but
nothing about the user. The server knows everything about the user, but
nothing about the machine. And keeping them in sync is too expensive. Go
figure.

\- Obtaining reasonable performance is har... impossible: You want to read() a
couple of bytes from disk? Good: prepare a message, call into Mach, yield a
little while the server is scheduled, copy the message, unmarshal it, process
the request, prepare another message to Mach to read from disk, call into
Mach, yield waiting for rescheduling, obtain the data, prepare the answer,
call into Mach, yield waiting for rescheduling, obtain your 2 bytes. Easy!
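
To put rough numbers on the path just enumerated, here is some back-of-the-envelope arithmetic. Every figure is an assumption chosen for illustration (not a measurement), and the flat per-step cost is a deliberate oversimplification:

```python
# Crude step-count comparison for a 2-byte read() (illustrative only).
def path_cost_us(steps, per_step_us):
    # Total cost assuming every step costs the same (a crude assumption).
    return len(steps) * per_step_us

# Monolithic kernel servicing read(): trap in, do the work, copy out.
monolithic = ["syscall trap", "read + copy to user", "return"]

# Mach + filesystem server, roughly as enumerated above.
multiserver = [
    "marshal request", "IPC into Mach", "schedule the FS server",
    "copy + unmarshal request", "FS server asks Mach for the disk I/O",
    "schedule again, obtain data", "marshal reply", "IPC into Mach",
    "schedule the client", "copy reply out",
]

PER_STEP_US = 5  # assumed flat cost per step, purely illustrative

print(path_cost_us(monolithic, PER_STEP_US),
      path_cost_us(multiserver, PER_STEP_US))  # 15 50
```

Whatever the real per-step cost is, the multi-server path has several times as many scheduling and copying points, and each scheduling point is a place where a loaded system can insert a delay.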

In the end, Torvalds was right. The user doesn't want to work with the OS; he
wants to work with his application. This means the OS should be as invisible
as possible, and fulfill userland requests by the shortest path. Microkernels
don't comply with this requirement, so from a user's perspective, they fail
natural selection.

That said, if you're into kernels, microkernels are different and fun! Don't
miss the opportunity to do some hacking on one of them. Just don't be a fool
like me: avoid becoming obsessed with trying to achieve the impossible.

~~~
userbinator
It's like the argument about excessive modularity in software design in
general: you can split a system into so many little pieces that each one of
them becomes very (deceptively) simple, but in doing so you've also introduced
a significant amount of extra complexity in the communication between those
pieces.

Personally, I think modularity is good up to the extent that it reduces
complexity by removing duplication, but beyond that it's an unnecessary
abstraction that obfuscates more than simplifies.

~~~
nickpsecurity
The communication would've happened anyway. Now it just happens through a
common mechanism with strong isolation. That all the most resilient systems,
especially in safety-critical space, are microkernels speaks for itself. For
instance, MINIX 3 is already quite robust for a system that's had hardly any
work at all on it. Windows and UNIX systems each took around a decade to
approach that. Just the driver isolation by itself goes a long way.

Now, I'd prefer an architecture where we can use regular programming languages
and function calls. A number of past and present hardware architectures are
designed to protect things such as pointers or control flow. Those in
_production_ are not, but have MMU's & at least two rings. So, apps on them
will both get breached due to inherently broken architecture and can be
isolated through microkernel architecture with interface protections, too. So,
it's really a kludgey solution to a problem caused by stupid hardware.

Still hasn't been a single monolithic system to match their reliability,
security, and maintenance without clustering, though.

~~~
coldtea
> _For instance, MINIX 3 is already quite robust for a system that's had
> hardly any work at all on it. Windows and UNIX systems each took around a
> decade to approach that. Just the driver isolation by itself goes a long
> way._

MINIX3 also has hardly any work done WITH it, so I don't think we can compare
it to Windows and UNIX systems regarding robustness, unless we submit it to
the same wide range of scenarios, use cases and workloads...

~~~
nickpsecurity
I'd like to see a battery of tests to see where it's truly at. Yet, there's
still not a MINIX Hater's Handbook or something similar. That's more than
UNIX's beginnings can say. ;)

------
krylon
This makes me sad in three ways.

1\. Hurd is still not here, yet.

2\. With Duke Nukem Forever actually finished, and with Perl 6 looking like it
might get finished this year, we've kind of run out of good jokes to crack
about the Hurd. What will be the next killer-app that will run on Hurd out of
the box, once it's finished?

3\. Unless I've been misinformed, there are multi-server microkernel systems
out there that _do_ work. So it's not like the Hurd was a totally misguided
idea. And yet, it's still out there, like cold fusion, artificial intelligence
and hypo-allergenic kittens, it's tantalizingly out of reach...

Well, one day

~~~
vezzy-fnord
As tired as it may sound, I think there just _might_ be a convenient time for
the Hurd to rise, albeit in an unconventional way.

You already have a de facto GNU OS at the moment in the form of the Guix
System Distribution (GuixSD): it lets you configure the entire system in
Scheme, perform transactional upgrades and system state rollbacks, and it has
its own init daemon and service scripts in Scheme, etc.

At the same time, we've been seeing some efforts in creating tiny builds of
the Linux kernel [1] that go as far as to reduce syscalls, VMM algorithms,
capabilities, character devices and so forth. There's also the recent Linux
libOS effort to create a network stack in userspace as a shared library,
though that seems about the extent of it. NetBSD then gives you a userland
driver framework with rump kernels.

So what one can do is build a really stripped down Linux kernel, write a Mach
IPC compatibility layer, run the Hurd servers on top of it and plug in the
resulting product with GuixSD. And you now have the complete GNU system. As
far as you'll ever get, anyway.

This has already been discussed before [2], but it's not something that has
ever been given any priority. With the current climate, it might be worth a
shot.

As to why anyone would want to do this... well, for one you get a full OS that
is entirely configurable in Scheme with all of its services running as
userland file servers (translators) in a sort of Plan 9-ish way, all the while
supporting the Linux API. That's still pretty far ahead of GNU/Linux. The Hurd
also has features like running multiple instances of itself in user mode
(subhurds) that can serve a similar purpose to containers.

It's obvious no one's going to port the Hurd servers away from GNU Mach, so
this is the best shot anyone has. Any Hurd or Guix devs in here to comment on
this?

[1] [https://tiny.wiki.kernel.org/](https://tiny.wiki.kernel.org/)

[2]
[https://www.gnu.org/software/hurd/open_issues/linux_as_the_kernel.html](https://www.gnu.org/software/hurd/open_issues/linux_as_the_kernel.html)

~~~
nickpsecurity
I'm not a Hurd supporter, but that's an interesting idea. I'd say replace
Scheme with Python or something mainstream that's similarly powerful. Adding
Lisp to stuff usually kills it off, unfortunately, with Clojure a freak
occurrence. The transactional upgrades and rollbacks feature by itself would
be compelling to some audiences.

~~~
rekado
"Adding LISP to stuff usually kills it off"

What? Oh, you mean like Emacs. Scheme is a beautiful language; it's flexible
and trivial to learn. Replacing it with Python would be silly. Besides, Guile
Scheme is the designated extension language of the GNU system. Gnucash, dmd
(the init system), the Gimp, Guix, gEDA and many more GNU applications all can
be hacked in or extended with Guile. In the GNU system Guile Scheme _is_
"mainstream".

~~~
nickpsecurity
"Oh, you mean like Emacs."

No, I mean like... surveys the market... nearly everything else. The Lisp
field is almost non-existent outside of academia and a few companies. Almost
nothing mainstream in proprietary or FOSS is built with it. Plus, most
programmers hate it. So, using something like it in a FOSS app intended to go
mainstream is quite a risk.

"Besides, Guile Scheme is the designated extension language of the GNU
system."

Oh, OK. That they're pushing people to use an unpopular language at least
explains why the apps you cited use it.

"In the GNU system Guile Scheme _is_ "mainstream"."

Nah, most programmers haven't heard of it. Of those that have, I'd bet only a
subset even use it. I'd say it's used by a tiny set of developers, especially
extension builders, and for tools that each represent a small number of
developers or users.

You're really making my point for me by redefining the word mainstream to be
"barely known software barely making it with a few exceptions." They should
consider something that actually is mainstream as an experiment in increasing
adoption. Probably some other things to reconsider while they're at it.

~~~
rurban
You mean like AutoCAD was killed by all the other CAD packages with a more
conventional language? I'd like to see such a counterexample. Lisp is only
killed by managers who are afraid to hire cheap labour. Like Yahoo and their
Store.

~~~
nickpsecurity
What language is AutoCAD written in? And what percentage of successful
products and projects use Lisp? That's what I mean: it's the status quo of
where Lisp is in the marketplace. It's you that would have to provide dozens
to thousands of counterexamples to even illustrate a trend. They're not there.

Whereas companies adopting PHP, Python, Ruby, Lua, and so on have no trouble
finding help, libraries, or customers. Because those languages are mainstream
and attract such people. See the difference now?

------
InTheArena
I sit on our company's architecture review board, and we do a pass on every
product that the company does. There are three links that I regularly have
grounds to give:

The "Linux is obsolete" debate:
[http://www.oreilly.com/openbook/opensources/book/appa.html](http://www.oreilly.com/openbook/opensources/book/appa.html)

Taligent's Guide to Designing Object Oriented Software:
[http://www.amazon.com/Taligents-Guide-Designing-Programs-Object-Oriented/dp/0201408880](http://www.amazon.com/Taligents-Guide-Designing-Programs-Object-Oriented/dp/0201408880)

And "The Critique":
[http://walfield.org/papers/200707-walfield-critique-of-the-GNU-Hurd.pdf](http://walfield.org/papers/200707-walfield-critique-of-the-GNU-Hurd.pdf)

~~~
nickpsecurity
And yet, the monolithic systems (esp Linux) proved to have all the problems
predicted. Shapiro did a nice job showing how ridiculous Linus's claim is in
Round 2 of the debate, which you left off:

[http://www.coyotos.org/docs/misc/linus-rebuttal.html](http://www.coyotos.org/docs/misc/linus-rebuttal.html)

Can't help but repeat a main point: there are many high assurance (reliability
or security) microkernel-based systems that have been fielded but _not a
single one_ has achieved this based on the monolithic model Linus loves. QED.

------
vander_elst
It would be nice to see Hurd on L4se now that its source is freely available.

~~~
lelf
[https://www.gnu.org/software/hurd/history/port_to_another_microkernel.html](https://www.gnu.org/software/hurd/history/port_to_another_microkernel.html)

~~~
vander_elst
I meant the mathematically proved fork of L4

~~~
coupdejarnac
Looks like your homemade mouse needs its buttons debounced.

------
tomcam
_Firefox... fills up available RAM with page caches. Often, swapping these
pages back in from disk can be slower than re-fetching from the Internet._

WTF? I'm not a kernel guy and would like to know why this is. Very, very
counterintuitive to me.

~~~
icebraining
Non-SSD disks are really slow for random accesses, like you might need if the
cached files are not contiguous, and often congested due to background
processes competing for their use. Meanwhile, the server most likely has those
files in memory, ready to push them.

~~~
MichaelGG
Disks are what, 10ms random seek? You'd need a good, nearby server to service
in under that time. And that assumes you've got a connection already
established. If not, then add another roundtrip. I'm not sure "often" is
appropriate here.

~~~
TheLoneWolfling
That's best-case. (Well, not really. Best case is that your disk isn't doing
anything, the data is in one contiguous region, the head is in the right
position, and the data is about to pass under the head.)

In actuality, it can be a _lot_ worse. I've seen latencies of 500ms+ before,
when the disk is in "seek hell". And right now my laptop has a median (!)
latency of ~100ms. Mind you I'm listening to music and copying files in the
background. But still.
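
Some back-of-the-envelope arithmetic shows how the two paths can end up in the same ballpark. Every figure below is an assumption chosen for illustration, not a measurement:

```python
# Illustrative comparison: swapping a fragmented page cache in from a
# spinning disk vs. re-fetching the same data from a nearby server.
SEEK_MS = 10            # assumed random seek on a spinning disk
PAGES = 50              # assumed number of 4 KiB pages to bring back in
SEEK_FRACTION = 0.5     # assumed fraction of pages scattered enough
                        # to need their own seek
RTT_MS = 20             # assumed round trip to a nearby server,
                        # connection already established
LINK_KB_PER_MS = 1.0    # assumed ~8 Mbit/s effective throughput

swap_in_ms = PAGES * SEEK_FRACTION * SEEK_MS
refetch_ms = RTT_MS + (PAGES * 4) / LINK_KB_PER_MS

print(f"swap-in ~{swap_in_ms:.0f} ms, re-fetch ~{refetch_ms:.0f} ms")
```

Under these (debatable) assumptions the two are within tens of milliseconds of each other, and the balance tips toward re-fetching as soon as the disk is congested or in "seek hell"; with an SSD or an uncongested disk it tips the other way.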

------
octatoan
> Since those days, HURD has acquired the same reputation in operating system
> circles that Duke Nukem Forever has among gamers.

What reputation does Duke Nukem Forever have among gamers?

~~~
detaro
Vaporware. Announced in 1997, restarted/largely rebuilt multiple times, and
finally released in 2011.

------
chatman
It’s possible to have a complete and usable system running nothing other than
GNU code.

