> It doesn't really depend on NVMe, that's just the OS sucking.
I've spent so much of my professional career profiling source control access patterns. With a hot cache, performance tends to be bound by the OS VFS layer, but the moment you hit disk, disk latency dominates, unless the disk is NVMe (or, back in the day, PCIe flash storage). Further compounding this is the use of a naive LRU cache on some OSes, which means that once the cache size is exceeded, linear scans absolutely destroy performance.
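To illustrate the LRU point, here is a toy simulation, not a model of any real OS page or dentry cache. The function name `lru_hit_rate` and the sizes are made up for the example. It repeatedly scans a set of files in the same order through a strict LRU cache and reports the hit rate.

```python
from collections import OrderedDict


def lru_hit_rate(cache_size, num_files, passes):
    """Scan files 0..num_files-1 in order, `passes` times, through a strict LRU."""
    cache = OrderedDict()  # key order tracks recency: oldest first
    hits = accesses = 0
    for _ in range(passes):
        for f in range(num_files):
            accesses += 1
            if f in cache:
                hits += 1
                cache.move_to_end(f)  # mark as most recently used
            else:
                cache[f] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
    return hits / accesses


if __name__ == "__main__":
    # Working set fits: only the first pass misses.
    print(lru_hit_rate(cache_size=100, num_files=100, passes=3))
    # Working set is one entry too big: every access misses.
    print(lru_hit_rate(cache_size=100, num_files=101, passes=3))
```

With a strict LRU, going one entry over the cache size is enough to turn a repeated scan into all misses, because each entry is evicted just before the scan comes back around to it.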
> FUSE
So you might think that, but FUSE turns out to be very hard to do correctly and performantly. I was on the source control team at Facebook, and EdenFS took many years to become stable and performant enough. (It was solving a harder problem though, which was to fetch files lazily.)
I believe Microsoft tried using a FUSE equivalent for the Windows repo for a while, but gave up at some point.
We're still talking about different things here. I'm saying the entire "VCS scans file system to sync state" is the wrong algorithm. It's unnecessary work because there are two sources of truth.
Forget the constant factors of FUSE, and imagine an in-kernel git implementation. If you have a Merkle CoW filesystem, then when you modify child files (ignoring journals), you need to update parent directories on disk anyway, so this is a great time to recompute VCS hashes too.
"git status" is, if the journal is flushed and hashes are up to date, always an O(1) operation.
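A rough in-memory sketch of what I mean, assuming nothing about any real filesystem. `Node`, `Tree`, `write` and `is_clean` are hypothetical names, and the hashes are not real git object hashes. Every write rehashes the path back up to the root, so the "is anything dirty?" check is a single comparison of root hashes.

```python
import hashlib


class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}  # stays empty for files
        self.data = None    # bytes for files, None for directories
        self.digest = b""

    def rehash(self):
        h = hashlib.sha1()
        if self.data is not None:
            h.update(b"blob\0" + self.data)
        else:
            h.update(b"tree\0")
            # Directory hash covers child names and child hashes (Merkle style).
            for name in sorted(self.children):
                h.update(name.encode() + b"\0" + self.children[name].digest)
        self.digest = h.digest()


class Tree:
    def __init__(self):
        self.root = Node("")
        self.root.rehash()

    def write(self, path, data):
        node = self.root
        for part in path.split("/"):
            child = node.children.get(part)
            if child is None:
                child = Node(part, parent=node)
                node.children[part] = child
            node = child
        node.data = data
        # Walk back up to the root, rehashing each ancestor: O(depth) per write.
        while node is not None:
            node.rehash()
            node = node.parent


def is_clean(tree, committed_root_digest):
    # One comparison, independent of how many files the tree holds.
    return tree.root.digest == committed_root_digest


if __name__ == "__main__":
    t = Tree()
    t.write("src/a.c", b"one")
    t.write("src/b.c", b"two")
    head = t.root.digest              # pretend this is the committed state
    t.write("src/a.c", b"changed")
    print(is_clean(t, head))
```

Listing *which* files changed is not O(1), but it only needs to descend into subtrees whose hashes differ from the committed tree, so the cost scales with the size of the change rather than the size of the repo.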
What you're describing is reasonably similar to EdenFS, except EdenFS runs in userspace.
Watchman layers a consistent view of file metadata on top of inotify (etc), as well as providing stateless queries on top of EdenFS. It acts as a unified interface over regular filesystems as well as Eden that provides file lstat info and hashes over a Unix domain socket.
In truth, getting these two components working to the standard expected by developers was a very difficult systems problem with a ton of event ordering and cache invalidation concerns. (With EdenFS, in particular, I believe there was machine learning involved to detect prefetch patterns.) For smaller repos, it is much simpler to do linear scans. Since those are really fast on modern hardware anyway, they are also the right thing to do, following the maxim of doing the simplest thing that works.