
There's a write-up here with some details on the why/how and a few experiments (deduping >56T in 125 raw disks): https://blog.heckel.io/2019/07/22/deduplicating-ntfs-file-sy...
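The write-up goes into the actual NTFS-aware approach; purely to illustrate the bookkeeping, here is a minimal Go sketch of blind fixed-size chunk-and-hash dedup over a raw image (the chunk size and the image path are made-up values, not figures from the write-up):

    package main

    import (
        "crypto/sha256"
        "fmt"
        "io"
        "os"
    )

    const chunkSize = 128 * 1024 // assumed chunk size, not from the write-up

    func main() {
        f, err := os.Open("disk.img") // hypothetical raw image path
        if err != nil {
            panic(err)
        }
        defer f.Close()

        seen := map[[32]byte]bool{} // set of unique chunk hashes
        buf := make([]byte, chunkSize)
        var chunks, totalBytes, uniqueBytes int64

        for {
            n, err := io.ReadFull(f, buf)
            if n > 0 {
                sum := sha256.Sum256(buf[:n])
                chunks++
                totalBytes += int64(n)
                if !seen[sum] {
                    // first time we see this chunk content
                    seen[sum] = true
                    uniqueBytes += int64(n)
                }
            }
            if err == io.EOF || err == io.ErrUnexpectedEOF {
                break
            } else if err != nil {
                panic(err)
            }
        }
        fmt.Printf("chunks=%d unique=%d dedup ratio=%.2f\n",
            chunks, len(seen), float64(totalBytes)/float64(uniqueBytes))
    }

Walking the NTFS structures instead of hashing blindly (as the write-up does) mainly changes where the chunk boundaries fall; the hash-and-index part stays the same.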



(Side note: I can't believe that the day I post this on HN, my hosting provider decides to do a 10-hour maintenance...)


Nice write-up, and an interesting idea. One question, though, just out of curiosity: I wonder what ZFS + dedup or Windows Storage Spaces + dedup would do with the same data.


The idea here is to do this at very large scale. My employer has a >500 PB cloud of (mainly Windows) backups. Right now everything is on ZFS, but we're thinking about moving to a block store (like Ceph or Swift) with this dedup idea.
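For the block-store variant, the gist would be content addressing: hash each chunk and use the hash as the object key, so a chunk that already exists in the store is never uploaded twice. A minimal sketch, assuming a generic key/value interface in front of Ceph/Swift (none of the names below are a real client API):

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
    )

    // BlockStore is a stand-in for whatever object store ends up behind it
    // (Ceph, Swift, ...). Only existence checks and puts matter for dedup.
    type BlockStore interface {
        Exists(key string) (bool, error)
        Put(key string, data []byte) error
    }

    // StoreChunk uploads a chunk only if no object with the same content
    // hash is already present, and returns the content-addressed key.
    func StoreChunk(s BlockStore, chunk []byte) (key string, uploaded bool, err error) {
        sum := sha256.Sum256(chunk)
        key = hex.EncodeToString(sum[:])
        ok, err := s.Exists(key)
        if err != nil || ok {
            return key, false, err
        }
        return key, true, s.Put(key, chunk)
    }

    // memStore is an in-memory stand-in so the sketch runs without a cluster.
    type memStore struct{ m map[string][]byte }

    func (s *memStore) Exists(key string) (bool, error) { _, ok := s.m[key]; return ok, nil }
    func (s *memStore) Put(key string, data []byte) error { s.m[key] = data; return nil }

    func main() {
        s := &memStore{m: map[string][]byte{}}
        for _, c := range [][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("aaaa")} {
            key, uploaded, _ := StoreChunk(s, c)
            fmt.Println(key[:8], uploaded) // the second "aaaa" chunk is skipped
        }
    }

The nice property is that dedup then happens across every machine writing into the same store, not just within one backup.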


That said, block-level dedup on NTFS will lead to better dedup ratios, but at the cost of a lot more metadata and a much higher metadata-to-data ratio.
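Back of the envelope (the per-entry size is an assumption, not a figure from the write-up): with roughly a 32-byte hash plus offset/length per chunk entry, the index shrinks about linearly with chunk size, which is the metadata-to-data trade-off in a nutshell:

    package main

    import "fmt"

    func main() {
        // Assumed per-chunk index entry: 32-byte SHA-256 + 16 bytes offset/length.
        // Real indexes carry more overhead (refcounts, structure, replication).
        const entryBytes = 32 + 16
        for _, chunk := range []int64{4 << 10, 64 << 10, 4 << 20} {
            ratio := float64(entryBytes) / float64(chunk) * 100
            fmt.Printf("chunk=%8d B  metadata/data ≈ %.4f%%\n", chunk, ratio)
        }
    }

At 4 KiB (the default NTFS cluster size) the index alone is on the order of 1% of the data, which at 500 PB means several petabytes of metadata; with larger chunks the index becomes a rounding error, but so do many of the dedup hits.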


That makes more sense then. Plus, the larger the data set, the better the chance (in theory) of finding duplicates, especially Windows system files, etc.



