
Demikernel: A library OS for kernel-bypass devices, now with Rust TCP/IP stack - drkp
https://github.com/iyzhang/demikernel
======
ncmncm
There is a huge amount of work around this, with a huge number of names for
what is always pretty similar in conception: parakernels, exokernels,
isokernels, now demikernels.

The common thread is that allowing the OS to interpose itself in the data path
makes it impossible to operate devices at full speed, so applications need to
bypass the OS. At the same time, you need some control over hardware access
and permissions, and some degree of hardware abstraction. How to draw the line
between application performance, on one side, and OS abstraction and hardware
sharing, on the other, is endlessly negotiated.

Those of us actually operating hardware at maximum rates, today, often use
proprietary vendor libraries to bypass the kernel. These tend to have an OS
module to map hardware resources into user process space, and a library to
operate hardware resources without adding overhead, typically involving a ring
buffer and the spin-loop polling on isolated cores that we were taught in
school indicated primitive system design. The result is that our applications
have a hundred or six lines of custom code for each vendor's gadget, that has
to be added to as new vendors enter and old ones retire.
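
As a rough illustration of the ring-buffer-plus-spin-polling pattern described above, here is a minimal sketch in safe Rust. Everything in it is an assumption for illustration: `Ring`, `try_push`, `try_pop` and `run_poll_loop` are hypothetical names, a producer thread stands in for the device, and real vendor libraries map actual hardware descriptor rings rather than a `Vec` of atomics.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Power of two, so an index can be masked instead of taking a modulo.
const RING_SIZE: usize = 8;

/// Hypothetical single-producer, single-consumer ring shared between a
/// simulated "device" thread and the application thread.
struct Ring {
    slots: Vec<AtomicU64>,
    head: AtomicUsize, // next slot the producer will write
    tail: AtomicUsize, // next slot the consumer will read
}

impl Ring {
    fn new() -> Self {
        Ring {
            slots: (0..RING_SIZE).map(|_| AtomicU64::new(0)).collect(),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side: returns false when the ring is full.
    fn try_push(&self, value: u64) -> bool {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head.wrapping_sub(tail) == RING_SIZE {
            return false;
        }
        self.slots[head & (RING_SIZE - 1)].store(value, Ordering::Relaxed);
        // Release publishes the slot write before the new head is visible.
        self.head.store(head.wrapping_add(1), Ordering::Release);
        true
    }

    /// Consumer side: returns None when the ring is empty.
    fn try_pop(&self) -> Option<u64> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail == head {
            return None;
        }
        let value = self.slots[tail & (RING_SIZE - 1)].load(Ordering::Relaxed);
        // Release hands the slot back to the producer for reuse.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        Some(value)
    }
}

/// Push `count` sequence numbers from a producer thread and busy-poll them
/// out on the calling thread, with no blocking or syscalls on the data path.
fn run_poll_loop(count: u64) -> Vec<u64> {
    let ring = Arc::new(Ring::new());
    let producer_ring = Arc::clone(&ring);
    let producer = thread::spawn(move || {
        for seq in 1..=count {
            while !producer_ring.try_push(seq) {
                std::hint::spin_loop();
            }
        }
    });
    let mut received = Vec::with_capacity(count as usize);
    while (received.len() as u64) < count {
        match ring.try_pop() {
            Some(v) => received.push(v),
            None => std::hint::spin_loop(), // spin instead of sleeping
        }
    }
    producer.join().unwrap();
    received
}
```

Calling `run_poll_loop(1000)` should hand back the sequence numbers in the order they were pushed. The consumer never sleeps, which is why this style is normally paired with a dedicated, isolated core.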

eBPF access to devices is one interesting wrinkle on this, holding out a hope
of mainstream portability without compromising performance, running user code
directly on target hardware.

------
peter_d_sherman
http://irenezhang.net/papers/demikernel-hotos19.pdf

Excerpt:

"Researchers have long predicted the demise of the operating system [21, 26,
41]. As datacenter servers increasingly incorporate I/O devices that let
applications bypass the OS kernel (e.g., RDMA [12] and DPDK [15] network
devices or SPDK storage devices), this prediction may finally come true. While
kernel-bypass devices do eliminate the OS kernel from the I/O path, they do
not handle the kernel’s most important job: offering higher-level
abstractions. This paper argues for a new high-level, device-agnostic I/O
abstraction for kernel-bypass devices. _We propose the Demikernel, a new
library OS architecture for kernel-bypass devices._"

That's the WHY of the Demikernel...

~~~
easytiger
> While kernel-bypass devices do eliminate the OS kernel from the I/O path,
> they do not handle the kernel’s most important job: offering higher-level
> abstractions

Well, Solarflare's OpenOnload lets you bypass the kernel while changing literally
none of your code, via LD_PRELOAD.
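
To make the preload mechanism concrete, here is a small sketch of launching an unmodified binary with an interposing library, so the dynamic linker resolves the socket calls to the bypass stack first. The helper names `with_preload` and `preload_value` are hypothetical, `./my_app` is a placeholder, and `libonload.so` is assumed to be the library name on your install.

```rust
use std::process::Command;

/// Build a command that runs `program` unchanged, with `library` listed in
/// LD_PRELOAD so its symbols take precedence over libc's socket calls.
fn with_preload(program: &str, library: &str) -> Command {
    let mut cmd = Command::new(program);
    cmd.env("LD_PRELOAD", library);
    cmd
}

/// Read back the LD_PRELOAD value configured on a command, if any.
fn preload_value(cmd: &Command) -> Option<String> {
    cmd.get_envs()
        .find(|(key, _)| key.to_str() == Some("LD_PRELOAD"))
        .and_then(|(_, value)| value)
        .map(|value| value.to_string_lossy().into_owned())
}
```

Something like `with_preload("./my_app", "libonload.so").status()` would then start the application. The application itself is not recompiled or relinked, which is the whole appeal.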

~~~
ncmncm
That can work, but if you want maximum performance, you need to use the ef_vi
library in OpenOnload, which needs (a fair bit of) custom code. Exalink has
libexanic, Napatech has their thing. libexanic is surprisingly elegant, and
you can do a lot of the work (e.g. extracting a timestamp) while the rest of a
packet arrives. Netronome has an eBPF way to allow you to run packet-handling
code right in the NIC, maybe even freeing up a core.

Solarflare has ruled the roost, but Xilinx bought them out, and the future of
their NICs is cloudy. Mellanox used to be a big deal; now they are part of
NVIDIA. Mellanox and Solarflare (like Napatech) have spent a great deal of
effort to make kernel bypass work for clients running in VMs.

~~~
easytiger
Yea. Funnily enough I wrote a library in ef_vi for UDP sending with mixed
results. Just UDP/MC dispatch and it was a little like grappling with an
underdeveloped library. Pretty fast but I'm sure I hit some kind of memory
barrier bug at one point.

The thing with onload is the sheer simplicity which makes it an easy sell.
Also handy to tweak socket options with env vars. They also have a direct tcp
lib if you need some extra nanos. Templated sends too.

Not heard of netronome and never used libexanic.

These approaches mean you have to invest more in external passive monitoring
tools, as OS stats are of little help.

------
easytiger
What devices is it compatible with?

