
The Case for a High-Level Kernel-Bypass I/O Abstraction - matt_d
http://irenezhang.net/blog/2019/05/21/demikernel.html
======
AstralStorm
What does it even mean by high level?

Once the API is high-level enough, it becomes unusable for its major users:
high-end networking, GPU and graphics libraries, and low-latency sound.

Nobody else truly needs to bypass the kernel. Even low latency can work with
good real-time task handling, which leaves exactly two kinds of users, both of
which already have special DMA handling in hardware. If it means introducing
special-case kernel bypasses for high-scale computing, that's already done, and
the low-level APIs just get wrapped.

And the Achilles' heel is security.

If it's arguing for making all hardware a kernel-free fabric, it's essentially
a move of everything to firmware. Worst case, we get zero memory protection
and unfixable bugs.

~~~
iyzhang
High-level means not exposing hardware limitations to the application. The
primary target applications are datacenter services, which spend much of their
time processing network I/O. As network latencies drop to a few microseconds,
datacenter applications like Redis will need kernel-bypass because the kernel
will become too expensive for them. In our experiments with a 25Gb network,
the Linux kernel and POSIX interface cost Redis 60% of its latency.

~~~
yourbandsucks
So what's the API look like?

You said you don't want to make users deal with flow control and hardware
details. Does that imply a userspace bypass library which does that stuff for
us? Does it look posixy?

~~~
iyzhang
It looks POSIX-like but uses high-level queues and fixes some issues with
epoll. The lack of an atomic data unit and the overhead of the poor epoll
interface cost too much to retain for kernel-bypass. Take a look at the paper
for more details.
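
To make "POSIX-like, but with queues and an atomic data unit" a bit more concrete, here is a minimal in-memory sketch. It is not the Demikernel API: the names `IOQueue`, `push`, `pop` and `try_wait` are hypothetical and chosen only for illustration. The assumption is that each queue element is a whole message, and that an operation hands back a token whose completion yields the data directly, rather than a readiness notification followed by a separate read.

```python
from collections import deque
from itertools import count


class IOQueue:
    """Hypothetical queue whose elements are whole messages (illustration only)."""

    def __init__(self):
        self._messages = deque()      # complete messages, never partial byte runs
        self._pending_pops = deque()  # tokens of pops still waiting for data
        self._results = {}            # token -> completed result
        self._tokens = count(1)

    def push(self, message: bytes) -> int:
        """Enqueue one whole message and return a completion token."""
        token = next(self._tokens)
        self._messages.append(bytes(message))
        self._results[token] = len(message)  # a push completes immediately here
        self._match()
        return token

    def pop(self) -> int:
        """Ask for one whole message; returns a token and never blocks."""
        token = next(self._tokens)
        self._pending_pops.append(token)
        self._match()
        return token

    def _match(self):
        # Complete pending pops in FIFO order while messages are available.
        while self._pending_pops and self._messages:
            token = self._pending_pops.popleft()
            self._results[token] = self._messages.popleft()

    def try_wait(self, token: int):
        """Return the result for a token, or None if it has not completed."""
        return self._results.pop(token, None)


q = IOQueue()
t = q.pop()                 # issued before any data has arrived
print(q.try_wait(t))        # None: nothing to deliver yet
q.push(b"GET key1")
q.push(b"GET key2")
print(q.try_wait(t))        # b'GET key1': one whole message, not a byte stream
```

In this sketch the caller never sees half a request or two requests fused together, which is the property a byte-stream `read` cannot promise.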

~~~
amluto
Where’s the paper? After looking at your site, it’s not obvious to me what
paper to look at.

~~~
iyzhang
Paper can be found here:
[http://irenezhang.net/papers/demikernel-hotos19.pdf](http://irenezhang.net/papers/demikernel-hotos19.pdf)

------
truth_seeker
> we found that 30% of the cost of the Linux kernel comes from its interface.
> This overhead is just too much to carry around while using kernel-bypass
> devices.

One third of the cost is actually expensive!

Also, the ScyllaDB NoSQL database (a C++ clone of Cassandra) uses the Seastar
framework to achieve high I/O throughput.

[http://seastar.io/networking/](http://seastar.io/networking/)

~~~
iyzhang
I've updated the blog post with our experimental results from the Redis
benchmark. Here is a link to the graph:
[http://irenezhang.net/img/demikernel-redis-exp.jpg](http://irenezhang.net/img/demikernel-redis-exp.jpg)

------
iyzhang
BTW, the Demikernel will be open-sourced shortly, as soon as I return from
giving a talk at KubeCon Europe.

------
justicezyx
I am surprised and disappointed that the original paper and the blog post have
_zero_ references to unikernel research, despite the fact that the unikernel
is pretty much the encompassing idea.

I am wondering whether this is an omission or a different understanding of
the concept.

Edit: Sorry I did not really get the difference between library OS and
unikernels.

It's still a missing reference, considering their connections.

~~~
iyzhang
The Demikernel is not a unikernel. It is a library OS compiled as a series of
shared libraries. It is not compiled together with the application and doesn’t
take into account what features the application uses. It is designed to work
with kernel-bypass hardware, like DPDK.

------
_pmf_
What about UIO?

~~~
iyzhang
It depends on the interface for the drivers to the application. However, UIO
doesn't seem to support DMA, which is a non-starter.

RDMA and DPDK both use user-space drivers, which is necessary for kernel-
bypass. I'm not advocating for a particular kernel-bypass solution. I'm
arguing that if we use kernel-bypass for I/O, we should have a common,
efficient, high-level interface for it.
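
As a rough sketch of what "a common, high-level interface" over different kernel-bypass devices could look like, the example below defines one abstract interface and two in-memory stand-ins. Nothing here binds to DPDK or RDMA, and `BypassBackend`, `FramedLoopback`, `WholeMessageLoopback` and `round_trip` are hypothetical names invented for illustration. The assumption is that a device limit (here a tiny pretend maximum frame size) is handled inside the backend, so the application sees whole messages either way.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class BypassBackend(ABC):
    """Hypothetical device-neutral interface; not any real library's API."""

    @abstractmethod
    def send(self, message: bytes) -> None: ...

    @abstractmethod
    def recv(self) -> Optional[bytes]: ...


class FramedLoopback(BypassBackend):
    """Stand-in for a device with a small maximum frame size."""

    def __init__(self, max_frame: int = 4):
        self._max_frame = max_frame
        self._frames = deque()  # entries are (is_last_frame, chunk)

    def send(self, message: bytes) -> None:
        # Fragment the message to fit the pretend hardware limit.
        step = self._max_frame
        chunks = [message[i:i + step] for i in range(0, len(message), step)] or [b""]
        for i, chunk in enumerate(chunks):
            self._frames.append((i == len(chunks) - 1, chunk))

    def recv(self) -> Optional[bytes]:
        # Reassemble frames so the caller only ever sees whole messages.
        if not self._frames:
            return None
        parts = []
        while True:
            is_last, chunk = self._frames.popleft()
            parts.append(chunk)
            if is_last:
                return b"".join(parts)


class WholeMessageLoopback(BypassBackend):
    """Stand-in for a device that already transfers whole messages."""

    def __init__(self):
        self._messages = deque()

    def send(self, message: bytes) -> None:
        self._messages.append(bytes(message))

    def recv(self) -> Optional[bytes]:
        return self._messages.popleft() if self._messages else None


def round_trip(backend: BypassBackend, message: bytes) -> Optional[bytes]:
    """Application code written once against the common interface."""
    backend.send(message)
    return backend.recv()


for backend in (FramedLoopback(), WholeMessageLoopback()):
    print(type(backend).__name__, round_trip(backend, b"SET key value"))
```

Both stand-ins return the same bytes to `round_trip`, even though only one of them fragments and reassembles internally.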

