
Arrakis: The Operating System Is the Control Plane - jcr
https://www.usenix.org/conference/osdi14/technical-sessions/presentation/peter
======
Animats
_We have designed and implemented a new operating system, Arrakis, that splits
the traditional role of the kernel in two. Applications have direct access to
virtualized I/O devices, allowing most I/O operations to skip the kernel
entirely, while the kernel is re-engineered to provide network and disk
protection without kernel mediation of every operation._

That's the way IBM mainframes have worked since IBM VM, 1972.

There's a lot to be said for this. For historical reasons, most microprocessor
systems have an I/O architecture that's far too much like that of a 1970s
minicomputer - device registers with meaning determined by the peripheral.
Minicomputers had that because they couldn't afford enough transistors to do
mainframe-type channels. That problem went away a long time ago, but the
legacy architecture remains.

You need both channels with access control and peripherals designed for
channels to make this work without OS intervention. Peripherals may have
privileged functions - user programs may be restricted to an address range on
a disk, or an IP and MAC address on a network controller.
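
To make the split between privileged and unprivileged functions concrete, here is a minimal sketch of a device that enforces a per-application block range on its own. Everything in it is hypothetical (`BlockExtent`, `ModelDevice`, `grant`, and `submit` are illustrative names, not any real controller's interface): the kernel would call `grant` once, and the application would then call `submit` directly without kernel involvement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockExtent:
    """Hypothetical per-application grant: a contiguous run of disk blocks."""
    first_block: int
    block_count: int

    def allows(self, start: int, count: int) -> bool:
        # A request is allowed only if it lies entirely inside the extent.
        if start < 0 or count <= 0:
            return False
        end = start + count  # exclusive end of the request
        return (self.first_block <= start
                and end <= self.first_block + self.block_count)


class ModelDevice:
    """Toy model of a peripheral that checks grants itself (an assumption)."""

    def __init__(self) -> None:
        self._grants = {}  # maps app_id -> BlockExtent

    def grant(self, app_id: int, extent: BlockExtent) -> None:
        # Privileged (control-plane) operation: only the kernel would do this.
        self._grants[app_id] = extent

    def submit(self, app_id: int, start: int, count: int) -> bool:
        # Unprivileged (data-plane) operation: issued by the application.
        # Returns True if the device would accept the request.
        extent = self._grants.get(app_id)
        return extent is not None and extent.allows(start, count)
```

The point of the sketch is only where the check lives: in the device model, not in a per-request kernel path.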

~~~
justincormack
SR-IOV is pretty much designed like this. The master PCI function has all the
privileged ops while the slave devices have none.

The Intel network cards (all the 10g ones, some 1g) at least have IP and MAC
filtering, and support at least 64 virtual network cards for each physical
port.

It has taken longer for storage, but NVMe has SR-IOV, which means you can
split out virtual drives without the OS having to check block ranges. Not
widely available yet, although Google cloud now has support[1].

[1] [https://cloud.google.com/compute/docs/local-ssd](https://cloud.google.com/compute/docs/local-ssd)

------
stevelaz
This is pretty awesome! In the past I worked on a project in which we implemented
a control-plane/data-plane separation architecture within Linux. The system
had two CPUs and we had Linux running on one with configuration apps and all
network related IO ran on the other CPU. The problem was that this was
implemented at the kernel level and whenever an application needed to share
data across the CPUs it was slow. The implementation could have probably been
better, but this was a long time ago and I can't recall everything that was
done. Regardless, Arrakis looks like a great project with a lot of potential.

Imagine this type of stack being used in an embedded system. I've worked on
embedded projects that achieved high throughput but in most cases there were
FPGAs and DSPs doing a lot of work to help. Userspace-to-kernel context switch
delays have always been a latency issue with any Embedded Linux system I've
worked on.

Arrakis looks like one would be able to achieve high performance without the
need for an FPGA and/or DSP (depending on the use case, of course).

Side note: Cool, I noticed they're using lwip from Adam Dunkels. He's an
amazing programmer.

------
walterbell
This is using hardware isolation (IOMMU, SR-IOV) to reduce the need for
software (including kernel) isolation.

See also Intel SGX
([https://www.virusbtn.com/virusbulletin/archive/2014/01/vb201...](https://www.virusbtn.com/virusbulletin/archive/2014/01/vb201401-SGX)),
disaggregation platforms (seL4, Qubes, Genode) and userspace networking (Intel
DPDK, [https://01.org/packet-processing](https://01.org/packet-processing)).

------
jarcane
The main website:
[https://arrakis.cs.washington.edu/](https://arrakis.cs.washington.edu/)
Github:
[https://github.com/UWNetworksLab/arrakis](https://github.com/UWNetworksLab/arrakis)

Intriguing.

~~~
shmerl
Are there any bootable images to play with? Or is it all build-from-source at
this stage?

------
marknadal
This seems like genuinely good stuff. I'm not a systems guy, so question: How
long would it theoretically take for somebody to hack NodeJS to use this? What
about getting NodeJS to use zero-copy buffers, as mentioned, by somehow
overriding JSON.stringify? Thanks.

~~~
Animats
Don't get too excited about zero-copy. Copying is cheap if the data was
recently referenced and is in cache. Conversely, if the data was put in memory
by a peripheral device, it costs almost as much as a copy to get it into the
CPU's caches.

The bookkeeping associated with zero-copy often exceeds the copying cost. This
was the curse of the original message-passing Mach implementation, which gave
microkernels a bad name.

~~~
marknadal
What are your thoughts on
[http://kentonv.github.io/capnproto/](http://kentonv.github.io/capnproto/) ?

~~~
Animats
That the sender can probably crash the receiver with a malformed offset in a
message.

I'd like to see marshalling as a language feature. It's compilable, done
often, and has an effect on performance. Many marshalling systems, from
OpenRPC to protocol buffers, use a precompiler. But that adds another level of
language.

~~~
geofft
> That the sender can probably crash the receiver with a malformed offset in a
> message.

Errr, that would be a major bug in capnproto. While it's definitely possible
the software has bugs, it's certainly a design constraint that the sender
absolutely cannot crash the receiver.

[http://kentonv.github.io/capnproto/faq.html#arent-messages-t...](http://kentonv.github.io/capnproto/faq.html#arent-messages-that-contain-pointers-a-huge-security-problem)
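
For readers unfamiliar with the concern, the following is a minimal sketch of the kind of check a receiver needs before following an offset supplied by the sender. The wire layout here (a little-endian `u32` offset and `u32` length at the start of the message) is an assumption made up for illustration; it is not Cap'n Proto's encoding or its implementation, and `read_blob` and `MalformedMessage` are hypothetical names. Python would not crash on an out-of-range slice anyway, so the code stands in for the explicit validation a C or C++ receiver would have to perform.

```python
import struct


class MalformedMessage(ValueError):
    """Raised when an offset or length does not fit inside the message."""


def read_blob(message: bytes, header_pos: int = 0) -> bytes:
    """Read an (offset, length) header and return the bytes it points to.

    Hypothetical layout, little-endian: u32 offset, u32 length, both
    relative to the start of the message.
    """
    header_size = struct.calcsize("<II")
    if header_pos < 0 or header_pos + header_size > len(message):
        raise MalformedMessage("header lies outside the message")
    offset, length = struct.unpack_from("<II", message, header_pos)
    # Validate before touching the payload: the sender controls both fields.
    if offset > len(message) or length > len(message) - offset:
        raise MalformedMessage("offset/length point outside the message")
    return message[offset:offset + length]
```

A well-formed message such as `struct.pack("<II", 8, 5) + b"hello"` yields `b"hello"`, while the same payload with an offset of 4096 is rejected with `MalformedMessage` instead of being dereferenced.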

------
arstneio
The main enabler seems to be the fact that isolation is now redundantly
enforced in two places - both by the kernel, and by IO device drivers. In
return for making the hardware support non-optional, Arrakis eliminates the
kernel's involvement.

~~~
rbanffy
> The main enabler seems to be the fact that isolation is now redundantly
> enforced in two places

Can the kernel realize it and, if the hardware can manage isolation with OS
semantics, step aside?

This isolation support in hardware reminds me of the "Software on Silicon"
thing Oracle has shown on their new SPARC. Is offloading more and more
application and OS level logic to hardware going to explode?

------
MrDom
Darn, I thought this was going to be about audio consoles[1].

[1]: [http://arrakis-systems.com](http://arrakis-systems.com)

------
digi_owl
So in essence this is akin to running DOS inside a hardware assisted VM?

