
FUSE performance improvements with eBPF [pdf] - riyakhanna1983
https://www.usenix.org/system/files/atc19-bijlani.pdf
======
zenlibs
Fuchsia has an interesting take on filesystems [1]. One can write a
filesystem completely in user space, avoiding expensive kernel<->user-space
switching. An additional benefit of storage sandboxing comes for free, as each
app can implement its own fs, with the rest of the system unaware of its
existence.

I wish such a fully-user-space option existed for Linux. This work is
philosophically in the opposite direction, moving more functionality into
kernel space for perf benefits.

[1]: [https://fuchsia.dev/fuchsia-src/the-book/filesystems.md](https://fuchsia.dev/fuchsia-src/the-book/filesystems.md)

~~~
comex
That document describes filesystems which are accessed over IPC, not
filesystem-as-library like you seem to be describing. In fact, it's the same
basic idea as FUSE. One user process (accessing the filesystem) makes an IPC
call to another user process (server that implements the filesystem), which
necessarily passes through the kernel and performs a context switch in each
direction. On the other hand, it's quite possible that Fuchsia's IPC is better
optimized than FUSE, so it might have better performance in practice.

~~~
zenlibs
libfs [1] is a userspace library offered by Fuchsia that abstracts the
traditional VFS (virtual filesystem) interface, allowing the fs to exist
wholly in userspace, without a kernel component.

Quoting:

> Unlike more common monolithic kernels, Fuchsia’s filesystems live entirely
within userspace. They are not linked nor loaded with the kernel; they are
simply userspace processes which implement servers that can appear as
filesystems.

[1]: [https://fuchsia.googlesource.com/fuchsia/+/master/zircon/sys...](https://fuchsia.googlesource.com/fuchsia/+/master/zircon/system/ulib/fs/)

~~~
comex
> which implement servers

A "server" is an IPC mechanism; this is describing a way for one userspace
process to serve filesystems to other userspace processes.

It sounds like the kernel has no built-in notion of a "filesystem", and
filesystems just take advantage of the kernel's generic IPC mechanism, which
is also used by a lot of other things. That's great – but it's still true that
IPC must go through the kernel, and switching from one user process (the
client) to another (the server) is a context switch.

It may be that the code also supports locating the client and server within
the same process – I have not looked at it. But that's not what the
documentation describes, so it's at least not the main intended operating
mode.

~~~
zenlibs
A userspace program can completely avoid kernel IPC if it has no intention of
exposing the fs to other processes. Client and server code can exist within
the same "app", without IPC, in the same process.

~~~
geofft
There are plenty of existing libraries that do exactly that. This isn't novel
to Fuchsia. A good example is GNOME's GVfs
[https://en.wikipedia.org/wiki/GVfs](https://en.wikipedia.org/wiki/GVfs) ,
which is basically a plugin architecture to the standard GLib I/O routines.
(Although as it happens, it still places the mounts in separate daemon
processes.)

Other things that come to mind are SQLite's VFS layer
[https://www.sqlite.org/vfs.html](https://www.sqlite.org/vfs.html) , Apache
Commons VFS for Java
[https://commons.apache.org/proper/commons-vfs/](https://commons.apache.org/proper/commons-vfs/) ,
glibc's fopencookie(3), which lets you provide a custom, in-process
implementation of a FILE *
[http://man7.org/linux/man-pages/man3/fopencookie.3.html](http://man7.org/linux/man-pages/man3/fopencookie.3.html) ,
libnfs, which even comes with an LD_PRELOAD library
[https://github.com/sahlberg/libnfs](https://github.com/sahlberg/libnfs) ,
etc.

(And as others have pointed out, while client and server code can exist
without IPC, as the names "client" and "server" would imply, that isn't the
primary intention. The docs you link say, "To open a file, Fuchsia programs
(clients) send RPC requests to filesystem servers ...." And even the
terminology of a file system as a "server" isn't novel to Fuchsia; that's the
approach the HURD and Plan 9 both take for filesystems, for instance.)

~~~
batbomb
And Parrot VFS

[http://ccl.cse.nd.edu/software/parrot/](http://ccl.cse.nd.edu/software/parrot/)

------
cyphar
I saw the authors' talk at Linux Conf last year. It seems like an awesome
improvement but I'm actually far more interested in the "future work" which
can be done. Namely, this system could be expanded to data "caching" whereby
the kernel could route read(2) and write(2) to an underlying "struct file"
in-kernel. This would allow for effectively zero-overhead FUSE-based overlay
filesystems (which would be super useful for container runtimes -- especially
once OCIv2 is usable).

~~~
ashishbijlani
Author here. I’ve implemented this already and the kernel changes are
available on GitHub. Please read section 5.2 in the Usenix paper for details.

~~~
cyphar
Awesome! I hadn't worked through the entire paper before commenting. I will
definitely make use of this for container runtimes (I think we talked about
this after your talk). I believe you said you were working on your PhD at the
time, I hope it's going well for you. :D

------
Scaevolus
Here's a slide deck from the same authors:
[https://events.linuxfoundation.org/wp-content/uploads/2017/1...](https://events.linuxfoundation.org/wp-content/uploads/2017/11/When-eBPF-Meets-FUSE-Improving-Performance-of-User-File-Systems-Ashish-Bijlani-Georgia-Tech.pdf)

tl;dr: perform metadata caching in the kernel using eBPF, avoiding context
switches for common operations like listdir() and getattr(), and reduce FUSE
overhead from ~18% to ~6%.

------
jakegold
FUSE is particularly useful for writing virtual file systems. Unlike
traditional file systems that essentially work with data on mass storage,
virtual filesystems don't actually store data themselves. They act as a view
or translation of an existing file system or storage device.

In principle, any resource available to a FUSE implementation can be exported
as a file system.

