
libfs [1] is a userspace library offered by Fuchsia that abstracts the traditional VFS (virtual filesystem) interface, allowing the filesystem to exist wholly in userspace, without a kernel component.

Quoting: > Unlike more common monolithic kernels, Fuchsia’s filesystems live entirely within userspace. They are not linked nor loaded with the kernel; they are simply userspace processes which implement servers that can appear as filesystems

[1]: https://fuchsia.googlesource.com/fuchsia/+/master/zircon/sys...




> which implement servers

A "server" is an IPC mechanism; this is describing a way for one userspace process to serve filesystems to other userspace processes.

It sounds like the kernel has no built-in notion of a "filesystem", and filesystems just take advantage of the kernel's generic IPC mechanism, which is also used by a lot of other things. That's great – but it's still true that IPC must go through the kernel, and switching from one user process (the client) to another (the server) is a context switch.

It may be that the code also supports locating the client and server within the same process – I have not looked at it. But that's not what the documentation describes, so it's at least not the main intended operating mode.


A userspace program can completely avoid kernel IPC if it has no intention of exposing the fs to other processes. Client and server code can exist within the same "app", without IPC, in the same process.
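To make that concrete, here's an illustrative sketch in plain C (not Fuchsia's actual libfs API): when the "server" lives in the same address space as the client, dispatch is just a function call through a vtable, with no channel and no context switch.

    /* Illustrative sketch, not Fuchsia's libfs API: the "filesystem" is a
       struct of function pointers, and the client calls it directly. */
    #include <stdio.h>
    #include <string.h>

    struct fs_ops {                       /* hypothetical in-process interface */
        int (*read_file)(const char *path, char *buf, size_t len);
    };

    /* "Server" side: a toy filesystem holding one hard-coded file. */
    static int ramfs_read(const char *path, char *buf, size_t len) {
        if (strcmp(path, "/hello.txt") != 0) return -1;
        snprintf(buf, len, "contents served in-process\n");
        return 0;
    }
    static const struct fs_ops ramfs = { .read_file = ramfs_read };

    /* "Client" side: same address space, so this is a plain call, not IPC. */
    int main(void) {
        char buf[64];
        if (ramfs.read_file("/hello.txt", buf, sizeof buf) == 0)
            fputs(buf, stdout);
        return 0;
    }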


There are plenty of existing libraries that do exactly that. This isn't novel to Fuchsia. A good example is GNOME's GVfs https://en.wikipedia.org/wiki/GVfs , which is basically a plugin architecture to the standard GLib I/O routines. (Although as it happens, it still places the mounts in separate daemon processes.)

Other things that come to mind are SQLite's VFS layer https://www.sqlite.org/vfs.html , Apache Commons VFS for Java https://commons.apache.org/proper/commons-vfs/ , glibc's fopencookie(3), which lets you provide a custom, in-process implementation of a FILE * http://man7.org/linux/man-pages/man3/fopencookie.3.html , libnfs, which even comes with an LD_PRELOAD wrapper https://github.com/sahlberg/libnfs , etc.
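Since fopencookie(3) came up, a hedged sketch of the idea (struct and function names here are made up): a read-only FILE * backed by an in-memory buffer, so reads go through stdio without ever touching the kernel's VFS.

    /* Sketch: a read-only FILE * served from a memory buffer via glibc's
       fopencookie(3); membuf/mem_read are illustrative names. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>

    struct membuf { const char *data; size_t len, pos; };

    static ssize_t mem_read(void *cookie, char *buf, size_t size) {
        struct membuf *m = cookie;
        size_t left = m->len - m->pos;
        if (size > left) size = left;
        memcpy(buf, m->data + m->pos, size);
        m->pos += size;
        return (ssize_t)size;
    }

    int main(void) {
        const char *msg = "hello from an in-process \"filesystem\"\n";
        struct membuf m = { msg, strlen(msg), 0 };
        cookie_io_functions_t io = { .read = mem_read };  /* no write/seek/close */
        FILE *fp = fopencookie(&m, "r", io);
        char line[128];
        if (fp && fgets(line, sizeof line, fp))
            fputs(line, stdout);   /* data flows through stdio, not the kernel VFS */
        if (fp) fclose(fp);
        return 0;
    }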

(And as others have pointed out, while client and server code can exist without IPC, as the names "client" and "server" would imply, that isn't the primary intention. The docs you link say, "To open a file, Fuchsia programs (clients) send RPC requests to filesystem servers ...." And even the terminology of a file system as a "server" isn't novel to Fuchsia; that's the approach the HURD and Plan 9 both take for filesystems, for instance.)



And, if I remember correctly, Minix 1.0 was already talking about a filesystem server.


If you have the capabilities to the whole device.


You said, "avoiding expensive kernel<-->user-space switching", which is wrong. Filesystems are implemented entirely in user space, just not in the same user space processes. Consumers exist in separate processes from the producers--plural, because the underlying block device storage may be managed by processes separate from the processes managing VFS state. Context switches are a necessary part of having separate user space processes, and context switching through kernel space (or at least some protected, privileged context) is necessary in order to authenticate messaging capabilities.

Note that there are ways to minimize the amount of time spent in privileged contexts. Shared memory can be used to pass data directly, but unless you want all your CPUs pegged at 100% utilization, the kernel must still be involved somehow to avoid busy-polling for IPC. In any event, the same strategies can be used for in-kernel VFS services, so it's not a useful distinction.
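As a rough illustration of that trade-off using generic POSIX pieces (an anonymous shared mapping plus a process-shared semaphore, not Fuchsia's actual primitives): the payload moves through shared memory, but the consumer still sleeps in the kernel waiting for a wakeup instead of spinning.

    /* Sketch: data is passed through shared memory, but a process-shared
       semaphore (which blocks in the kernel) is used so the consumer
       doesn't have to busy-poll the buffer. */
    #include <semaphore.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    struct shared {
        sem_t ready;            /* signalled when buf holds a message */
        char  buf[128];         /* the payload area, written in place */
    };

    int main(void) {
        struct shared *sh = mmap(NULL, sizeof *sh, PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (sh == MAP_FAILED) { perror("mmap"); return 1; }
        sem_init(&sh->ready, /*pshared=*/1, 0);

        if (fork() == 0) {                /* consumer: sleeps in the kernel... */
            sem_wait(&sh->ready);         /* ...rather than spinning on buf */
            printf("consumer got: %s\n", sh->buf);
            _exit(0);
        }

        strcpy(sh->buf, "block contents");   /* producer writes directly */
        sem_post(&sh->ready);                /* kernel-mediated wakeup */
        wait(NULL);
        return 0;
    }

(On older glibc you'd link with -pthread for the sem_* calls.)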


The only reason to use IPC or go through the kernel is to expose the fs to the rest of the OS. If an app doesn't intend to expose the fs, the entirety of the fs can exist within the app process.

Quoting: > "Unlike many other operating systems, the notion of “mounted filesystems” does not live in a globally accessible table. Instead, the question “what mountpoints exist?” can only be answered on a filesystem-specific basis -- an arbitrary filesystem may not have access to the information about what mountpoints exist elsewhere."


libfs is an abstraction layer around some of the VFS bits. An analogous Unix approach would be shifting the burden of compact file descriptor allocation (where Unix open(2) must return the lowest numbered free descriptor) to the process rather than the kernel. (IIRC this is also actually done in Fuchsia as part of its POSIX personality library.)
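For illustration, a toy version of that userspace bookkeeping might look like the following; the table and function names are hypothetical, not Fuchsia's actual POSIX layer.

    /* Sketch: "return the lowest free descriptor" kept in the process
       instead of the kernel. Names and sizes are made up. */
    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_FD 64
    static bool fd_used[MAX_FD];          /* per-process descriptor table */

    static int alloc_lowest_fd(void) {
        for (int fd = 0; fd < MAX_FD; fd++) {
            if (!fd_used[fd]) { fd_used[fd] = true; return fd; }
        }
        return -1;                        /* table full, like EMFILE */
    }

    static void release_fd(int fd) {
        if (fd >= 0 && fd < MAX_FD) fd_used[fd] = false;
    }

    int main(void) {
        int a = alloc_lowest_fd();        /* 0 */
        int b = alloc_lowest_fd();        /* 1 */
        release_fd(a);
        int c = alloc_lowest_fd();        /* reuses 0, the lowest free slot */
        printf("%d %d %d\n", a, b, c);    /* prints: 0 1 0 */
        return 0;
    }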

Notice in the above that it's implied that the actual filesystem server (e.g. one that manages ext4 state on a block device) is in another process altogether. And so for every meaningful open, read, write, and close there's some sort of IPC involved.

A process accessing a block device directly without any IPC is something that can already be done in Unix. For example, you can open a block device in read-write mode directly and mmap it into your VM space. Also, see BSD funopen(3) and GNU fopencookie(3)[1], which are realizations of similar consumer-side state management, except for ISO C stdio FILE handles; they're simpler because ISO C doesn't provide an interface for hierarchical FS management.
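A minimal sketch of the block-device case, assuming a hypothetical device path and the permissions to open it:

    /* Sketch: mapping a block device directly from an ordinary process, no
       filesystem server involved. The device path is hypothetical; this
       opens read-write (per the point above) but only reads one byte. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        const char *dev = "/dev/loop0";           /* hypothetical device */
        size_t len = 4096;                        /* map just the first block */
        int fd = open(dev, O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        unsigned char *blk = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
        if (blk == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        printf("first byte: 0x%02x\n", blk[0]);   /* on-disk bytes, in-process */

        munmap(blk, len);
        close(fd);
        return 0;
    }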

There's no denying that Fuchsia's approach is more flexible and the C interface architecture more easily adaptable to monolithic, single-process solutions. But it stems from its microkernel approach, which has the side effect of forcing VFS code to be implemented as a reusable library. There's no reason a Unix process couldn't reuse the Linux kernel's VFS and ext4 code directly except that it was written for a monolithic code base that assumes a privileged context. Contrast NetBSD's rump kernel architecture, where you can more easily repurpose kernel code for user space solutions; in terms of traditional C library composition, NetBSD's subsystem implementations have always fallen somewhere between Linux and traditional microkernels and so were naturally more adaptable to the rump kernel architecture.

[1] See also the more limited fmemopen and open_memstream interfaces adopted by POSIX.


The catch is, if you want to securely mediate access to a shared resource, you need to have something outside your protection boundary do it, be it a kernel or user-space server.


Cool, but isn't that just an embedded database that looks a bit like a filesystem? It's nothing all that new to use an mmapped embedded database in an application, and every major operating system with a GUI includes (or tends to be distributed with) a copy of sqlite.

Also, if the point is that you want to expose the filesystem to other applications, but have it local to your own, then why not expose it through FUSE, even if it's in your own process?



