
Except you ultimately want your buffer cache, virtual memory, and vfs tightly coupled because they're three sides of the same coin.

IMO, the ideal combo looks something like this FUSE/BPF work combined with XOK's capability-based buffer cache, rather than trying to split everything out into user mode.

This all gets back to the microkernel debates. As you say, it's far easier to implement such an architecture by stuffing a lot of the most important bits into a monolithic kernel. But various microkernel projects have shown this isn't necessary, and doing things this way has proven brittle and insecure.

So-called safe languages don't help, either, because the whole point of doing this in a monolithic, shared-memory context is precisely that it's easier to move fast and break things with unsafe optimizations (e.g., circular, direct pointer references), unburdened by careful, formal constraints. If Linux were written in Rust, every other line would be wrapped in unsafe {}. Most of the parts that needn't be would be better moved into user space anyway.
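To make the "circular, direct pointer references" point concrete, here is a hypothetical sketch (names invented for illustration) of the kind of doubly linked structure that kernels use everywhere, written in Rust. Creating the raw pointers is allowed in safe code, but actually traversing them forces an `unsafe` block, because the compiler cannot track ownership or lifetimes through mutually referencing pointers:

```rust
// Two nodes that point at each other, kernel-list style. Safe Rust's
// ownership rules cannot express this aliasing, so raw pointers are used.
struct Node {
    value: u32,
    next: *mut Node, // raw pointer: no ownership, no lifetime tracking
    prev: *mut Node,
}

// Linking is just pointer assignment and needs no `unsafe`;
// the proof obligation is deferred to whoever dereferences.
fn link(a: &mut Node, b: &mut Node) {
    a.next = b as *mut Node;
    b.prev = a as *mut Node;
}

fn linked_pair() -> u32 {
    let mut a = Node { value: 1, next: std::ptr::null_mut(), prev: std::ptr::null_mut() };
    let mut b = Node { value: 2, next: std::ptr::null_mut(), prev: std::ptr::null_mut() };
    link(&mut a, &mut b);
    // Dereferencing a raw pointer is always unsafe: the programmer,
    // not the compiler, guarantees `a.next` is valid here.
    unsafe { (*a.next).value }
}

fn main() {
    println!("{}", linked_pair()); // prints 2
}
```

The point isn't that this can't be written in Rust; it's that every hot path built this way carries a manual `unsafe` proof obligation, which is exactly the discipline the monolithic style was chosen to avoid.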

There is no uKernel out there that has anything on Linux wrt FS perf. The uKernels have not shown that they've solved the FS problem as well as monolithic kernels have.

That isn't because Linux is inherently faster: a capability-based system that exposed a region of storage to an FS library would basically be app -> hardware, versus app -> expensive context switch -> VFS -> FS -> hardware.

Right, the claim is that no one has actually demonstrated that, though. It sounds like it should be possible in theory but no microkernel has actually done so.

One big reason, I suspect, is that very little of the work in the VFS layer touches hardware. Reads of directory structure, metadata, and data are all handled from cache whenever possible, and writes are buffered. When reading, blocks are prefetched so nearby data is in memory, and when flushing writes, the kernel optimizes and orders them to be most efficient. A library that always handles reads and writes from hardware will be slower because it actually goes to hardware, no matter how many context switches it saves. And if you can't share a read cache and write buffers across processes, you're effectively going to hardware for all initial reads and when the program exits.
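The read-caching half of the argument above can be sketched in a few lines. This is a toy model (all names hypothetical, the "device read" simulated), showing why a shared buffer cache means only the first read of a block touches hardware, while a per-process library cache pays that device read again in every process:

```rust
use std::collections::HashMap;

// Toy read-through buffer cache: blocks are fetched from the "device"
// once, then served from memory on every subsequent read.
struct BufferCache {
    blocks: HashMap<u64, Vec<u8>>,
    device_reads: u64, // counts how often we actually touch "hardware"
}

impl BufferCache {
    fn new() -> Self {
        BufferCache { blocks: HashMap::new(), device_reads: 0 }
    }

    fn read_block(&mut self, lba: u64) -> &Vec<u8> {
        if !self.blocks.contains_key(&lba) {
            // Cache miss: this is where a real kernel would issue I/O.
            self.device_reads += 1;
            self.blocks.insert(lba, vec![0u8; 4096]);
        }
        &self.blocks[&lba]
    }
}

fn main() {
    let mut cache = BufferCache::new();
    for _ in 0..1000 {
        cache.read_block(42); // same block, read repeatedly
    }
    // Only the first read reached the device.
    println!("device reads: {}", cache.device_reads); // prints 1
}
```

With the cache shared kernel-wide, those 999 saved device reads accrue across every process on the machine; with a per-process FS library, each process starts cold.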

(Also, context switches aren't that expensive. They're not free, sure, but it's easily possible for software costs to outweigh it.)

The one system I've seen do that in a way that still lets you multiplex the device and the buffer cache was XOK. But sharing both securely required a very custom filesystem, and a lot of work remained to fully validate the concept.
