> It doesn't really depend on NVMe, that's just the OS sucking. I've spent so mu...

Ericson2314 · 2025-09-02T14:15:44 1756822544

We're still talking about different things here. I'm saying the entire "VCS scans file system to sync state" approach is the wrong algorithm. It's unnecessary work that only exists because there are two sources of truth.

Forget the constant factors of FUSE, and imagine an in-kernel git implementation. If you have a Merkle CoW filesystem, then when you modify child files (ignoring journals) you need to update the parent directories on disk anyway, so this is a great time to recompute VCS hashes too.

"git status" is, if the journal is flushed and hashes are up to date, always an O(1) operation.
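To make the idea concrete, here is a minimal in-memory sketch, not a real filesystem and not git's actual object format. The `Node` class, the `is_clean` helper, and the SHA-256 hashing scheme are all assumptions made for illustration. Each write rehashes the path from the modified file up to the root, so checking for changes is a single comparison of root digests.

```python
import hashlib


class Node:
    """Hypothetical tree node: a file (data is bytes) or a directory (data is None)."""

    def __init__(self, name, parent=None, data=None):
        self.name = name
        self.parent = parent
        self.data = data
        self.children = {} if data is None else None
        self.digest = None
        if parent is not None:
            parent.children[name] = self
        self._rehash_upward()

    def _compute(self):
        h = hashlib.sha256()
        if self.children is None:
            # File: hash the contents.
            h.update(b"blob\0" + self.data)
        else:
            # Directory: hash the sorted (name, child digest) pairs.
            h.update(b"tree\0")
            for name in sorted(self.children):
                h.update(name.encode() + b"\0" + self.children[name].digest)
        return h.digest()

    def _rehash_upward(self):
        # Cost is proportional to tree depth and directory width,
        # and is paid at write time rather than at status time.
        node = self
        while node is not None:
            node.digest = node._compute()
            node = node.parent

    def write(self, data):
        self.data = data
        self._rehash_upward()


def is_clean(root, committed_digest):
    """Status check: one fixed-size comparison, independent of repo size."""
    return root.digest == committed_digest
```

Usage under the same assumptions: build `root = Node("")`, add `src = Node("src", root)` and `f = Node("main.c", src, data=b"...")`, record `root.digest` as the committed state, and `is_clean(root, committed)` flips to `False` as soon as `f.write(...)` changes the contents. Finding which files changed would still require descending into the subtrees whose digests differ.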

sunshowers · 2025-09-03T00:53:32 1756860812

You might be interested in how this problem was solved by our team at Meta, in EdenFS (https://github.com/facebook/sapling/blob/main/eden/fs/docs/O...) and Watchman: https://github.com/facebook/watchman.

What you're describing is reasonably similar to EdenFS, except EdenFS runs in userspace.

Watchman layers a consistent view of file metadata on top of inotify (etc), as well as providing stateless queries on top of EdenFS. It acts as a unified interface over regular filesystems as well as Eden that provides file lstat info and hashes over a Unix domain socket.

Back in the day, Watchman sped up status queries by over 5x for a repo with hundreds of thousands of files: https://engineering.fb.com/2014/01/07/core-infra/scaling-mer... I worked directly on this and co-wrote this blog post.

In truth, getting these two components working to the standard expected by developers was a very difficult systems problem with a ton of event ordering and cache invalidation concerns. (With EdenFS, in particular, I believe there was machine learning involved to detect prefetch patterns.) For smaller repos, it is much simpler to do linear scans. Since a linear scan is really fast on modern hardware anyway, it is also the right thing to do, following the maxim of doing the simplest thing that works.
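For comparison, the linear-scan approach can be sketched in a few lines. This is an illustrative sketch, not how git or Sapling actually implement status: the `snapshot` and `status` helpers are hypothetical names, and comparing only size and mtime is a simplifying assumption (real tools also consult hashes and other stat fields).

```python
import os


def snapshot(root):
    """Map each relative file path to (size, mtime_ns) with one linear walk."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def status(old, new):
    """Compare two snapshots and return (added, removed, modified) path lists."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, modified
```

There is no daemon, no event ordering, and no cache to invalidate here: every call rebuilds the state from scratch, which is the trade of O(n) work per query for simplicity.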