Copy-on-write performance and debugging (microsoft.com)
80 points by meysamazad 4 days ago | 34 comments





I had to read an earlier blog post to figure out what it was:

https://devblogs.microsoft.com/engineering-at-microsoft/dev-...

"Copy-on-write (CoW) linking, also known as block cloning in the Windows API documentation, avoids fully copying a file by creating a metadata reference to the original data on-disk. CoW links are like hardlinks but are safe to write to, as the filesystem lazily copies the original data into the link as needed when opened for append or random-access write. With a CoW link you save disk space and time since the link consists of a small amount of metadata and they write fast."

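For the curious, the Windows API the blog refers to as "block cloning" is the FSCTL_DUPLICATE_EXTENTS_TO_FILE ioctl. A minimal sketch of cloning one file into another, assuming both live on the same ReFS/Dev Drive volume (filenames are placeholders, most error handling omitted; needs a Windows 10+ SDK):

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE src = CreateFileW(L"src.bin", GENERIC_READ, FILE_SHARE_READ,
                                 NULL, OPEN_EXISTING, 0, NULL);
        HANDLE dst = CreateFileW(L"dst.bin", GENERIC_READ | GENERIC_WRITE, 0,
                                 NULL, CREATE_ALWAYS, 0, NULL);

        LARGE_INTEGER size;
        GetFileSizeEx(src, &size);

        /* The destination must already be large enough to hold the clone. */
        FILE_END_OF_FILE_INFO eof = { size };
        SetFileInformationByHandle(dst, FileEndOfFileInfo, &eof, sizeof(eof));

        /* Ask the filesystem to map src's extents into dst: no data is
           copied, only metadata. Offsets/lengths must be cluster aligned. */
        DUPLICATE_EXTENTS_DATA clone = { 0 };
        clone.FileHandle = src;
        clone.SourceFileOffset.QuadPart = 0;
        clone.TargetFileOffset.QuadPart = 0;
        clone.ByteCount = size;

        DWORD bytes;
        if (!DeviceIoControl(dst, FSCTL_DUPLICATE_EXTENTS_TO_FILE,
                             &clone, sizeof(clone), NULL, 0, &bytes, NULL))
            fprintf(stderr, "clone failed: %lu\n", GetLastError());

        CloseHandle(src);
        CloseHandle(dst);
        return 0;
    }
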
It seems there is a macOS implementation: https://github.com/dotnet/runtime/pull/79243

But it seems that this is .NET specific and not something that would speed up other build systems? It's unclear whether this can apply to build technologies other than .NET. Can it speed up TypeScript/JavaScript builds? Can it speed up Rust builds? And what are the speedups like on other platforms such as macOS and Linux?

Is this something that all build systems and all OSes would benefit from?

I guess this blog post for me raises more questions than it answers.


The block cloning feature in newer versions of Windows is enabled by copy-on-write filesystems. macOS ships with one by default - APFS. Linux also has BTRFS, and before that there was the ZFS-on-Linux project. Microsoft is now shipping a CoW filesystem for Windows that appears to be a ReFS derivative.

This is .NET specific inasmuch as this is getting MSBuild to take advantage of ReFS features, but there's no particular reason why other build systems couldn't take advantage of ReFS in the same way MSBuild does, or of APFS on macOS. The main question is whether the build system needs to make lots of copies of files that may never be updated. I imagine anything that does dependency fetching (especially Node/NPM) would benefit.


XFS on Linux also gained copy-on-write support a few years ago.

Initially I was not aware of this, and I was surprised when I copied a directory with a total size greater than 50 GB and the copy was instantaneous. At first I believed that I had typed a wrong command, but then I searched the XFS documentation and saw that this was a new feature at the time.


Volume Shadow Copies (Windows Server 2003) on NTFS were Microsoft's first implementation of CoW. But it was limited to VSS snapshots, so not useful for day-to-day storage.

DevDrive is not a derivative of ReFS; it is ReFS with some file system filter bits turned off, among a couple of other things. DevDrive is a collection of features centered around ReFS for the purpose of speeding up file reads/writes (think node modules).


I know that npm now has a per-user cache in ~/.npm:

https://docs.npmjs.com/cli/v7/commands/npm-cache

I am not sure if it uses CoW to bring those packages into each project. If it did, that would be efficient and speed up "npm install" if the cache was warm.


Language package managers don't need copy-on-write, because there's no "write" — the files that make up dependencies are immutable from the perspective of the projects that they get installed into. There's no advantage to using CoW to "deploy" such files into work trees, over using plain-old hard links to do so. (And hard-linking these files is indeed what all the Node package managers — other than NPM — already do.)
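
At the syscall level that deploy step is just link(2); a hypothetical sketch (paths are made up), with the usual caveat that hard links can't cross filesystems:

    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Link one immutable package file from a shared store into a project
       tree. Same inode, zero extra data on disk. */
    int deploy(const char *store_path, const char *project_path)
    {
        if (link(store_path, project_path) == 0)
            return 0;
        if (errno == EXDEV)  /* store and project on different filesystems */
            fprintf(stderr, "cross-device: fall back to a copy or reflink\n");
        return -1;
    }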

Not npm, but pnpm has used it for a while.

https://github.com/pnpm/pnpm/issues/1505


Certain filesystems like XFS do support CoW copying, and ZFS also does chunk-based deduplication. You'd typically use it through `cp --reflink` and similar.
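
On Linux, `cp --reflink` boils down to a single ioctl; a minimal sketch of a CoW copy on Btrfs or reflink-enabled XFS (filenames are placeholders):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>   /* FICLONE */
    #include <unistd.h>

    int main(void)
    {
        int src = open("big-input", O_RDONLY);
        int dst = open("cheap-copy", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        /* Instant regardless of size: only extent metadata is written. */
        if (ioctl(dst, FICLONE, src) != 0) {
            perror("FICLONE");  /* e.g. EOPNOTSUPP on ext4 */
            return 1;
        }
        close(src);
        close(dst);
        return 0;
    }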

It could speed up other build systems, but the .NET build system (MSBuild) has a particular design issue: by default it copies dependencies local to each project that uses them (Copy Local). This leads to assemblies being copied multiple times across the filesystem over the course of a build.

The article talks about CoW as a feature of ReFS, while the linked PR in dotnet/runtime is about adjusting the way the File API issues calls on macOS so they take advantage of APFS's CoW instead.

Sorry, I was confused trying to decipher exactly what was going on. I apologize.

Have been running ReFS on a drive on my Windows 10 workstation for about three years, and for the past two months have been using a dev drive equivalent on Windows 10. Our Unreal Engine project is quite large, 600+ GB straight from the P4 depot before building. I need to keep a few separate workspaces around: one for current development work, one for swarm reviews, one for "let me test out a thing that might break", because as we know, branching in Perforce can be quite painful, especially on large depots. At one point I needed to have dozens of workspaces synced to specific changelists whilst we hunted down a bug in one of our levels.

ReFS, with block de-duplication and LZ4 compression, has reduced the per-workspace footprint to around 10% of what it was previously. Deploying the MSBuild SDK CopyOnWrite decreased build times by around 5% and decreased archive, stage and package times by about 80%. I also moved the DDC onto the VHDX where the project resides, which has further reduced the project's footprint.

Windows 11's canary channel (still canary-only, I think) has a modified Win32 that supports CoW in CopyFileEx. You can get similar gains by other means on Win10 and Win11 by using ReFS CoW-aware utilities.

Have used XFS, BTRFS, APFS and others extensively over the years, so I am glad that Windows is finally getting in on the action.


> We ran into this problem on one machine that had run continuous CoW builds for weeks under a prerelease CoW-in-Win32 implementation, so we don’t expect this to appear in the wild very often.

That's not exactly confidence inspiring.


I'm hoping they mean the prerelease implementation was only creating leaks due to bugs that have been fixed, so a machine that runs the release implementation for the same amount of time wouldn't see such behavior.

You know, I've always been kinda amused that something very simple like "cat a b >c" or even "fa = open("a", O_WRONLY); lseek(fa, 0, SEEK_END); fb = open("b", O_RDONLY); sendfile(fa, fb, NULL, 0x7ffff000);" has neither a user-visible specialized API nor under-the-hood speedups in the FS implementations. It's just gluing two files together; it's got to be a very popular operation, about as popular as "prepend the contents of file A to file B". But you can't do it in place, which is kinda annoying when you have to preserve the existing files' attributes.
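
(The lseek is because sendfile(2) refuses an out_fd opened with O_APPEND.) A fuller sketch of the same append using copy_file_range(2), which reflink-capable filesystems can accelerate when the offsets happen to be block-aligned; a real version would loop on short copies:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fa = open("a", O_WRONLY);   /* destination, appended to */
        int fb = open("b", O_RDONLY);   /* source */

        struct stat sb;
        fstat(fb, &sb);
        lseek(fa, 0, SEEK_END);         /* append position, sans O_APPEND */

        /* In-kernel copy; no data passes through userspace. */
        if (copy_file_range(fb, NULL, fa, NULL, sb.st_size, 0) < 0) {
            perror("copy_file_range");
            return 1;
        }
        close(fa);
        close(fb);
        return 0;
    }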

> about as popular as "prepend the contents of file A to file B"

I've never understood why filesystems don't easily support prepending data, or truncating the start of a file.

It should be, as far as I can see, about as trivial to support as appending and truncating at the end, and it's something that comes up quite often in application code. Even if it is a bit trickier, I think the benefits would be great in the cases where it's needed.

Instead applications are left having to rewrite the contents.


A queue is simpler to implement than a deque and the same is true of a file system: supporting growing the file in both directions is more complicated than supporting growing the file in one direction. In practice, append is much more common than prepend, so the extra bookkeeping and code doesn't seem to be worth it in general.

That's much less of a concern ever since everybody switched to extent-based filesystems.

The real concern is block alignment.


On some Linux filesystems you can do this if the chunk to be inserted is a multiple of the block size.

See `FALLOC_FL_COLLAPSE_RANGE` and `FALLOC_FL_INSERT_RANGE` in `fallocate(2)`
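
A quick sketch of a block-aligned prepend with FALLOC_FL_INSERT_RANGE, assuming 4 KiB filesystem blocks (the filename is a placeholder; works on e.g. ext4 and XFS):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/falloc.h>   /* FALLOC_FL_INSERT_RANGE */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDWR);
        const off_t blk = 4096;

        /* Shift the whole file right by one block without rewriting it;
           fails with EINVAL if offset/length aren't block multiples. */
        if (fallocate(fd, FALLOC_FL_INSERT_RANGE, 0, blk) != 0) {
            perror("fallocate");
            return 1;
        }

        /* Fill the newly inserted first block with the prepended data. */
        char header[4096] = "new header, zero-padded to the block size";
        pwrite(fd, header, sizeof(header), 0);

        close(fd);
        return 0;
    }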


> It's just gluing two files together, it's got to be a very popular operation, about as popular as "prepend the contents of file A to file B".

Realistically I don't think that's going to happen very often. What would be the use case? The only case I can think of is something like tar where you pack up a bunch of files as a single file, and you usually do compression at the same time in that case.


> What would be the use case?

Every time you need to extend the length of a field in the middle of your data (e.g., increase a variable-length field in the header). Many file formats resort to trailer chunks specifically because of that (see ID3v2 in MP3 files, or ZIP). Or think about how SQLite's on-disk file format uses a freelist of file pages: it's literally an embryonic file system in itself, built on top of the existing one.

Or hey — what about linking object files together? It's kind of like a merge sort, since you need to re-slice text/data/misc sections and then re-glue them. Traditional linkers would create several temporary files for the combined sections and then merge those files together.


Yes, in theory, any filesystem could trivially add a feature of "ref-counted immutable extents" — where a special syscall equivalent to `cat a b c > d` could be implemented that creates an inode d that consists of references to the existing extents of a+b+c.

(The shared extents have to be immutable, because on non-CoW filesystems, filesystem locks apply to "byte ranges of inodes", not to extents or slices thereof; so extents could only be safely shared between inodes if they forced the inodes referencing them to act as if they were always reader-locked.)

You could even implement this on e.g. Linux ext4 today — you could consider extents immutable if they're part of an immutable (chattr +i) file that has no additional hard links; and you could prevent any files that are "sharing" immutable extents from being made non-immutable (where in the above, the syscall would create a file that is immediately immutable.)

This would basically result in the same semantics + efficiencies that you get with "composite uploads" in an object store.

---

Given a CoW filesystem, you could probably extend this concept to allow arbitrary CoW blocks to be explicitly referenced from file A into file B without any need for immutability — it'd just be an explicit "partial" reflink. (This is already possible for the simple A->B case, by starting with a CoW clone and then overwriting the blocks that shouldn't be shared. More complex cases, like A+B+C->D above, or placing the shared blocks at a different position in the clone than in the original, need explicit filesystem support; see the sketch below.)

It wouldn't quite work like you're imagining with sendfile(2), though, because the CoW sharing could only occur at filesystem-block boundaries. You still wouldn't be able to use partial reflinking to optimize the operation of e.g. adding three bytes of header to a file (unless you also added BLKSZ-3 bytes of padding.)
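
On Linux that explicit partial reflink exists today as the FICLONERANGE ioctl (Btrfs, reflink-enabled XFS), with exactly the block-boundary restriction described above. A sketch of the a+b -> d case, assuming a's size is block-aligned (filenames are placeholders):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <linux/fs.h>   /* FICLONERANGE, struct file_clone_range */
    #include <unistd.h>

    /* Share src's extents into dst starting at dst_off; metadata only. */
    static int clone_into(int dst, int src, __u64 dst_off)
    {
        struct file_clone_range r = {
            .src_fd = src,
            .src_offset = 0,
            .src_length = 0,   /* 0 = through end of source file */
            .dest_offset = dst_off,
        };
        return ioctl(dst, FICLONERANGE, &r);
    }

    int main(void)
    {
        int a = open("a", O_RDONLY);
        int b = open("b", O_RDONLY);
        int d = open("d", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        struct stat sa;
        fstat(a, &sa);

        if (clone_into(d, a, 0) != 0 ||
            clone_into(d, b, (__u64)sa.st_size) != 0) {
            perror("FICLONERANGE");
            return 1;
        }
        close(a);
        close(b);
        close(d);
        return 0;
    }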


So if I prepend 17 bytes to a file, where are they stored? And if I prepend another 47 bytes, etc.? How would this be tracked?

Same as going forward: you'd grab a free block and simply fill it backwards from the end instead of forwards from the beginning. But the file system would have to support file data starting mid-block and new blocks getting added at the head of the file. The problem is that there'd be more bookkeeping data to store, more code to implement it, and more edge cases to handle for concurrent writes.

That'd be hell for mmap, too.

Unironically, I think it'd be easier for a purely mmap-based API to support such scenarios.

What attributes would be worth preserving that you wouldn't otherwise be able to preserve?

Having the same permissions and owner would be nice, which is a bit annoying to pull off with the "write to a temporary file, then rename it over the original" approach. Also mtime/atime. And the xattrs, of course.

I think they use a lot of extra words to say that ReFS will support the equivalent of cp --reflink.

Oh man. My dream Git Successor combines a Virtual File System with a Copy-on-Write cache to allow repos to trivially commit all their dependencies including compiler toolchains.

Windows having CoW makes my far-fetched dream a possibility.


Does anyone know if there is a way to convert an entire Windows installation and all its files into the Dev Drive format, without losing any data or corruption?

>Dev Drive was released

I tried that Dev Drive thing and I haven't seen a perf improvement when building C++ code, sadly.


And with all that said, WSL2 still buffers file transfers in RAM...

Crazy to see Microsoft talking about performance like they have any expertise in the matter.


