
Show HN: Deduplicating NTFS file systems (fsdup) - binwiederhier
https://github.com/binwiederhier/fsdup
======
binwiederhier
There's a write-up here with some details on the why/how and a few experiments
(deduping >56T in 125 raw disks):
[https://blog.heckel.io/2019/07/22/deduplicating-ntfs-file-sy...](https://blog.heckel.io/2019/07/22/deduplicating-ntfs-file-systems-fsdup/)

~~~
tiernano
Nice write-up, and an interesting idea. Very cool. One question, though: I
wonder what ZFS + dedup or Windows Storage Spaces + dedup would do with the
same data, just out of curiosity.

~~~
binwiederhier
The idea here is to do this at very large scale. My employer runs a >500 PB
cloud of (mainly Windows) backups. Right now everything is on ZFS, but we're
thinking about moving to a block store (like Ceph or Swift) with this dedup
idea.
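
The general shape of that idea can be sketched as content-addressed, fixed-size chunking: split each disk image into chunks, hash each chunk, and store a chunk in the block store only if its digest is new. This is a minimal illustration under those assumptions, not fsdup's actual implementation (chunk size and hashing details here are made up):

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB chunks

def dedup(data: bytes):
    """Split data into fixed-size chunks, store each unique chunk once
    under its SHA-256 digest, and return the manifest of digests."""
    store = {}      # digest -> chunk bytes (stand-in for a block store)
    manifest = []   # ordered digests; enough to reassemble the image
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # skip chunks already stored
        manifest.append(digest)
    return store, manifest

# Two "disks" with mostly identical content dedup well: eight chunks
# referenced, but only two distinct chunks actually stored.
disk = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
store, manifest = dedup(disk + disk)
print(len(manifest), "chunks referenced,", len(store), "stored")
```

The per-image manifest is what makes this work on a dumb block store: the store only ever sees immutable, hash-named chunks, and reassembly is just reading the manifest in order.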

~~~
binwiederhier
That said, block-level dedup on NTFS will lead to better dedup ratios, but at
the cost of a lot more metadata and a very high metadata-to-data ratio.
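
That trade-off can be put into rough numbers. Assuming each referenced chunk needs one manifest entry of a fixed size (the 48-byte entry below is an assumption for illustration, not fsdup's on-disk format), the metadata overhead scales inversely with chunk size:

```python
# Back-of-the-envelope metadata-to-data ratio for fixed-size chunking.
ENTRY_BYTES = 32 + 16  # assumed: 32-byte digest + offset/length fields

def metadata_ratio(disk_bytes: int, chunk_bytes: int) -> float:
    """Fraction of the original data size consumed by chunk metadata."""
    entries = disk_bytes // chunk_bytes
    return (entries * ENTRY_BYTES) / disk_bytes

TB = 10**12
for chunk in (4096, 128 * 1024, 4 * 1024 * 1024):
    ratio = metadata_ratio(56 * TB, chunk)
    print(f"{chunk:>9} B chunks -> {ratio:.6%} metadata overhead")
```

Smaller chunks (e.g. the 4 KiB NTFS cluster size) catch more duplicates but multiply the number of index and manifest entries a thousandfold compared to multi-megabyte chunks, which is exactly the metadata pressure described above.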

~~~
tiernano
That makes more sense, then. Plus, the larger the data set, the greater the
chance (in theory) of duplicates, especially Windows system files, etc.

