
Mach kernel - thatguyagain
https://en.wikipedia.org/wiki/Mach_(kernel)
======
mcguire
I worked at IBM Austin on the performance team when the Mach-based IBM
Microkernel/Workplace OS was ongoing. (Ask me anything! :-))

Performance was the big problem. (At one point, a disk read was CPU-bound.)

1\. The Mach Interface Generator (mig) generated code with wildly different
performance, with no obvious relationship to the mig spec.

2\. Context switching is always the big topic, but it's a red herring. Our
problem was primarily data transfer; copy-on-write took so long to set up that
it was frequently cheaper to just do the copy.

3\. Making a syscall to get the current PID is a stupid idea.

~~~
Animats
_copy-on-write took so long to set up that it was frequently cheaper to just
do the copy._

Right. QNX just copied for interprocess communication. Mach did lots of
messing with the MMU to create temporary shared pages. That seemed to be a
lose.
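
A rough way to see why the plain copy can win: copy-on-write pays a fixed setup cost plus some MMU work per page, while a copy pays only per byte. The sketch below is a toy cost model, not a measurement of Mach or QNX; every function name and parameter in it is a hypothetical assumption.

```python
import math


def copy_cost_us(nbytes, copy_bytes_per_us):
    """Cost of simply copying the message, in microseconds."""
    return nbytes / copy_bytes_per_us


def cow_cost_us(nbytes, page_size, setup_us, per_page_us):
    """Cost of sharing pages copy-on-write: fixed setup plus per-page MMU work."""
    pages = math.ceil(nbytes / page_size)
    return setup_us + pages * per_page_us


def cheaper_to_copy(nbytes, page_size, setup_us, per_page_us, copy_bytes_per_us):
    """True when the plain copy costs no more than the copy-on-write setup."""
    return copy_cost_us(nbytes, copy_bytes_per_us) <= cow_cost_us(
        nbytes, page_size, setup_us, per_page_us
    )
```

In this model, any nonzero setup cost makes small messages cheaper to copy; copy-on-write only pays off once the message is large enough to amortize the setup.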

* What you want for interprocess communication in a microkernel is something that works like a function call - send data, wait for reply. If you have to build that out of unidirectional messages, it means more trips through the CPU dispatcher, and probably going to the end of the line waiting for a turn at the CPU.

Call-like approach: Process A calls process B, control transfers immediately from A to B, B returns to A, control transfers immediately to process A. No need to look for the next task to run; who runs next is a no-brainer.

Pipe-like approach: A sends to B, both are active for a moment, A soon blocks on a lock, B wakes up and does its thing, B sends to A, both are active for a moment, B soon blocks on a lock. A and B fight for the CPU.

The "going to the end of the line" effect is that when other tasks want the
CPU, the pipe-like approach means other tasks get a chance to run during the
handoff. The effect is that message passing performance falls off a cliff when
you're CPU-bound. QNX used a call-like approach (MsgSend/MsgReceive), while
Mach used something more like pipe I/O.

* Mach started from the BSD code base. Trying to build a microkernel by hacking on a macrokernel didn't end well. Microkernels are all about getting the key primitives at the bottom working very fast and reliably. There was an eventual rewrite, but I gather that BSD code remained.
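
The "end of the line" effect described above can be sketched with a toy round-robin run queue. This is a minimal simulation under stated assumptions (one CPU, every bystander task is CPU-bound and always ready, a woken task is appended to the back of the queue); the function names are hypothetical and this is not how either Mach or QNX actually schedules.

```python
from collections import deque


def pipe_like_round_trip(n_other_tasks):
    """Count time slices other tasks get during one A -> B -> A exchange
    when each wakeup puts the woken task at the back of the run queue."""
    run_queue = deque(f"other{i}" for i in range(n_other_tasks))
    slices_lost = 0
    for woken in ("B", "A"):          # A's send wakes B, then B's reply wakes A
        run_queue.append(woken)       # the woken task goes to the end of the line
        while True:
            task = run_queue.popleft()
            if task == woken:
                break                 # the woken task finally gets the CPU
            slices_lost += 1          # a CPU-bound bystander ran instead
            run_queue.append(task)    # it is still ready, so it requeues
    return slices_lost


def call_like_round_trip(n_other_tasks):
    """Direct handoff: control passes A -> B -> A without consulting the queue."""
    return 0
```

With N bystanders the pipe-like exchange waits out 2N slices per round trip in this model, while the call-like exchange waits out none, which is the cliff that shows up once the machine is CPU-bound.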

~~~
a1369209993
Technically there's no reason why write can't context switch directly to a
process that's blocked on read (or, less commonly, vice versa). That only
solves half the problem though; if you want pure pipe-like IO, you'd also need
a single system call that combines write with select/poll/equivalent so the
caller can immediately become blocked waiting for the return message.

~~~
Animats
That runs into a problem on a quick reply. Process A does the write on pipe
AB, then gives up the CPU to B. A is still in ready to run state. B quickly
generates a reply and writes it to pipe BA. But there's no read pending on
pipe BA yet. So both processes are now in ready to run state, contending for
the CPU, along with anything else that needed it.

There's an alternative - unbuffered pipes. This is like a Go channel of length
0 - all writes block until the read empties the channel. This is better from a
CPU dispatching perspective, in that a write to a channel implies an immediate
transfer of control to the receiver. Of course, Go is doing this in one
address space, not across a protection boundary.
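
For readers who want the blocking behaviour spelled out, here is a minimal sketch of a zero-length channel using Python threads. `RendezvousChannel` and `demo` are hypothetical names, and this only models "send blocks until the receiver has taken the value"; it does not hand the CPU directly to the receiver the way a kernel dispatcher could.

```python
import queue
import threading


class RendezvousChannel:
    """Toy unbuffered channel: send() returns only after recv() took the item."""

    def __init__(self):
        self._slot = queue.Queue(maxsize=1)
        self._send_lock = threading.Lock()  # one sender in the slot at a time

    def send(self, item):
        with self._send_lock:
            self._slot.put(item)
            self._slot.join()        # wait until the receiver calls task_done()

    def recv(self):
        item = self._slot.get()      # blocks until a sender has put something
        self._slot.task_done()       # releases the sender blocked in join()
        return item


def demo():
    channel = RendezvousChannel()
    received = []
    consumer = threading.Thread(target=lambda: received.append(channel.recv()))
    consumer.start()
    channel.send("ping")             # blocks until the consumer has received
    consumer.join()
    return received
```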

The QNX approach worked well for service-type requests. You could set up a
service usable by multiple processes. Each request contained the info needed
to send the reply back to the caller. The service didn't have to open a pipe
to the requestor. This even understood process priority, so high priority
requests were serviced first, an essential feature in a hard real time system.
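
The shape of that service model can be sketched as follows. This is a toy illustration, not the QNX API: `PriorityService`, `submit`, and `serve_next` are hypothetical names, and a per-request mailbox stands in for the reply information that each request carries.

```python
import heapq
import itertools
import queue


class PriorityService:
    """Toy request queue: each request carries its own reply mailbox."""

    def __init__(self):
        self._pending = []
        self._order = itertools.count()  # tie-breaker keeps equal priorities FIFO

    def submit(self, priority, payload):
        """Queue a request and return the mailbox its reply will arrive in."""
        reply_box = queue.SimpleQueue()
        entry = (-priority, next(self._order), payload, reply_box)
        heapq.heappush(self._pending, entry)
        return reply_box

    def serve_next(self, handler):
        """Serve the highest-priority pending request and reply to its sender."""
        _, _, payload, reply_box = heapq.heappop(self._pending)
        reply_box.put(handler(payload))
        return payload
```

The service never opens a channel back to a client; it just answers into whatever reply mailbox came with the request, and higher-priority requests jump the queue.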

~~~
a1369209993
> That runs into a problem on a quick reply.

Read the second sentence; process A does the write+select on AB and [BA,...],
yields to B, is _not_ ready to run; B generates a reply and writes it to BA,
which has a read (technically select) already pending.

~~~
Animats
That's the advantage of a combined read and write. But if you're going to have
that, it's more convenient to explicitly package it as a request/reply. Less
trouble with things like having two messages in the pipe and such.

~~~
a1369209993
You might be waiting (via select) for _multiple_ replies; you don't know
whether B will (say) grab something out of disk cache and chuck it back to
you, or if it'll fire off a seek command to the spinning rust and take a nap
just as your network round-trip completes. Having two (or more) messages in
(separate) pipes is the point of using select.
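
A minimal sketch of waiting on several reply pipes at once, assuming a POSIX system (Python's `select.select` only accepts pipes there). Threads stand in for the separate server processes, and `serve_once` and `scatter_gather` are hypothetical names. Note that this is write followed by select as two separate calls, not the single combined system call proposed earlier in the thread.

```python
import os
import select
import threading


def serve_once(request_r, reply_w, tag):
    """Hypothetical server: read one request, send back one tagged reply."""
    request = os.read(request_r, 4096)
    os.write(reply_w, tag + b":" + request)


def scatter_gather(payload, tags):
    """Send payload to one server per tag, then select() across all reply pipes."""
    tag_by_reply_fd = {}
    workers = []
    to_close = []
    for tag in tags:
        request_r, request_w = os.pipe()
        reply_r, reply_w = os.pipe()
        to_close += [request_r, request_w, reply_r, reply_w]
        worker = threading.Thread(target=serve_once, args=(request_r, reply_w, tag))
        worker.start()
        workers.append(worker)
        os.write(request_w, payload)
        tag_by_reply_fd[reply_r] = tag

    replies = {}
    pending = set(tag_by_reply_fd)
    while pending:
        # Block until at least one reply is readable; arrival order is unknown.
        ready, _, _ = select.select(list(pending), [], [])
        for fd in ready:
            replies[tag_by_reply_fd[fd]] = os.read(fd, 4096)
            pending.discard(fd)

    for worker in workers:
        worker.join()
    for fd in to_close:
        os.close(fd)
    return replies
```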

------
drewg123
_Given a syscall that does nothing, a full round-trip under BSD would require
about 40μs, whereas on a user-space Mach system it would take just under
500μs_

This is still true, but to a lesser extent, on MacOSX (or at least was, a
decade ago). I was writing HPC drivers for a cluster interconnect. So
performance was critical. We had been using the BSD ioctl system to
communicate with our drivers because we used ioctls in all our other drivers
(Linux, Solaris, FreeBSD, Windows). I did some microbenchmarks and noticed
that it was far slower than FreeBSD or Linux ioctls & complained. Apple
suggested that I re-write the app/driver communication using IOKit, which is
Mach based. The result was something that was twice as slow.

~~~
logicchains
>Given a syscall that does nothing, a full round-trip under BSD would require
about 40μs, whereas on a user-space Mach system it would take just under 500μs

Point of reference: I work in HFT, and it's possible to process a market data
update from an exchange, reprice a complex financial instrument, decide
whether to place an order, and send that order back towards the exchange all
in under 5us. Another point of reference: raising one float to the power of
another (a^b) could be done over 100,000 times in the same time it takes for
Mach to do a single syscall.

~~~
gok
That 500 microsecond number is from a 1993 paper on 1991 hardware (a 50 MHz
486DX-50). Your HFT code wouldn't load on it, and in 500usec it would be lucky
to do a single float^float power operation.

~~~
AnimalMuppet
> in 500usec it would be lucky to do a single float^float power operation.

The rest of your post seems OK, but this claim is almost certainly off. 2000
float^float operations take a full second? Even on a 486DX, no way.

~~~
gok
Ok I mixed up the DX and SX; at least the DX has the fp hardware. Still, many
of the x87 instructions you'd want to use to implement pow() take hundreds of
cycles on a 486, so I'd bet it would take tens of usec to do each pow().
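
The arithmetic behind these figures is easy to check. The sketch below uses only numbers quoted in the thread (500 usec, 40 usec, 50 MHz); the helper names are hypothetical, and any cycle count for pow() is an assumption rather than a measured value.

```python
def cycles_in(duration_us, clock_mhz):
    """Clock cycles that elapse in duration_us (1 MHz = 1 cycle per microsecond)."""
    return duration_us * clock_mhz


def duration_us(cycles, clock_mhz):
    """Microseconds needed for the given number of cycles at clock_mhz."""
    return cycles / clock_mhz


def ops_per_second(us_per_op):
    """Operations completed per second if each one takes us_per_op microseconds."""
    return 1_000_000 / us_per_op
```

At 50 MHz, `cycles_in(500, 50)` is 25,000 cycles for the Mach round trip and `cycles_in(40, 50)` is 2,000 for BSD. `ops_per_second(500)` gives the 2,000 per second that AnimalMuppet objects to, and a pow() costing a hypothetical 1,000 cycles would take `duration_us(1000, 50)`, or 20 usec, which is in the "tens of usec" range.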

------
insulanian
> _Mach's name Mach evolved in a euphemization spiral: While the developers,
> once during the naming phase, had to bike to lunch through rainy
> Pittsburgh's mud puddles, Tevanian joked the word muck could serve as a
> backronym for their Multi-User [or Multiprocessor Universal] Communication
> Kernel. Italian CMU engineer Dario Giuse later asked project leader Rick
> Rashid about the project's current title and received "MUCK" as the answer,
> though not spelled out but just pronounced as IPA: [mʌk] which he, according
> to the Italian alphabet, wrote as Mach. Rashid liked Giuse's spelling "Mach"
> so much that it prevailed._

It's fascinating how some things get their names :)

~~~
Hitton
I always thought it was named after Mach number because of its speed. Being
named after muck couldn't be farther from that.

~~~
mcguire
Speed was not one of Mach's attributes.

------
blank_pattern
Apple even open sources their XNU kernel:
[https://opensource.apple.com/](https://opensource.apple.com/)

There's some delay between when a new macOS version comes out and when sources
get published, but it's great to see how they use the Mach kernel in practice.

~~~
ksec
Why do they update the macOS XNU Kernel and not iOS?

~~~
blank_pattern
They seem really big on keeping iOS things secret-ish, but the XNU kernel for
both iOS and macOS seems to be built from the same code base.

Up until a couple years ago, they'd strip out ARM specific things from the
released macOS XNU code. Then they started leaving that stuff in!

~~~
vbezhenar
What about their embedded OS? Like one they're running on T2 chip or inside
Airpods? Is it still based on XNU kernel?

~~~
my123
T2 is a variant of A10 and runs XNU for the main CPU.

Firmware cores and AirPods run Apple RTKit.

------
Austin_Conlon
Oral History of Avie Tevanian starting with the Mach segment:
[https://youtu.be/vwCdKU9uYnE?t=3995](https://youtu.be/vwCdKU9uYnE?t=3995).

------
wolfspider
Actually stumbled across this the other day- had no idea the Mach kernel was
being used on Apple hardware before the Intel transition. MachTen was a paid-
for Mach kernel based around BSD4.4
[https://en.wikipedia.org/wiki/MachTen](https://en.wikipedia.org/wiki/MachTen)

~~~
pram
Technically they were anyway, since XNU/OSX were on PowerPC ;P

------
self_awareness
After being integrated into XNU through the OSFMK kernel, it's not a
microkernel anymore.

~~~
pjmlp
It will become one again with the new user space drivers being introduced in
Catalina.

As announced at WWDC, it will be a progressive transition: for every driver
model that becomes supported as a user space driver, the related kernel space
APIs will be automatically deprecated and then removed in the following OS
release a year later.

~~~
favorited
Are _all_ drivers going to be userspace in Catalina, or just the user-
installed kernel extensions? I just assumed Apple's OS drivers would still be
in kernelspace.

~~~
nineteen999
Maybe I'm wrong, but I always thought that the "classic" description of a
microkernel was one that implemented not only device drivers, but filesystem
drivers and possibly other key components in userspace (memory manager?) as
well. At least that is how I remember MINIX 1.0 as described in the Tanenbaum
book.

I appreciate there is a lot of grey area with microkernels, and a lot of
hybrid designs these days, since as was the case with Mach/XNU/Windows NT,
"pure" microkernel designs have often shown less than optimal performance due
to additional context switching.

------
cpeterso
Google is developing Fuchsia as a possible replacement for Android's Linux
kernel. Are there any reports or rumors about Apple developing a next-
generation kernel (perhaps with a safe language like Swift and optimized for
mobile devices) to eventually replace Mach/XNU and its historical baggage?

~~~
pjmlp
They are moving all drivers to user space.

All the ones corresponding to the former IO Kit do require C++; all the
remaining categories are going to be supported from Swift as well.

This is planned to take place across several releases, at the end of which
kernel drivers will no longer be allowed.

~~~
andrekandre
> They are moving all drivers to user space.

probably you mean 3rd-party drivers? or even apple internally?

[edit] i couldn’t tell that from the slides mentioned below, but maybe i
missed something

> All the ones corresponding to the former IO Kit do require C++

yea, which leads me to believe if apple was to rewrite the kernel, they
probably would go with c++ ... or maybe it’s just that swift doesn’t have its
embedded chops up to snuff yet...

relevant slides:

[https://devstreaming-cdn.apple.com/videos/wwdc/2019/702vygot...](https://devstreaming-cdn.apple.com/videos/wwdc/2019/702vygott3n041/702/702_system_extensions_and_driverkit.pdf?dl=1)

~~~
pjmlp
Watch the presentation, as it contains more information than the slides.

The long term roadmap is as follows:

1 - surface kernel APIs for a specific driver model as userspace API

2 - deprecate for the respective OS release the kernel entry points related to
the newly surfaced driver model

3 - remove the kernel api on the following OS release

4 - rinse and repeat until there aren't any kernel driver APIs left

The ones being released with Catalina are just the first wave.

~~~
andrekandre
thanks!

i’ll watch the video then, very interesting

~~~
cpeterso
Here's the video of that "System Extensions and DriverKit" session from WWDC
2019:

[https://developer.apple.com/videos/play/wwdc2019/702/](https://developer.apple.com/videos/play/wwdc2019/702/)

~~~
andrekandre
thanks, watching it now ^^

