Show HN: Deduplicating NTFS file systems (fsdup) (github.com)
3 points by binwiederhier 13 days ago | 6 comments

There's a write-up here with some details on the why/how and a few experiments (deduping >56T in 125 raw disks): https://blog.heckel.io/2019/07/22/deduplicating-ntfs-file-sy...

(Side note: I can't believe that the day I post this on HN my hosting provider is deciding to do a 10 hour maintenance...)

Nice write-up, and an interesting idea. One question, though: I wonder what ZFS + dedup or Windows Storage Spaces + dedup would do with the same data, just out of curiosity.

The idea here is to do this at a very large scale. My employer has a >500 PB cloud of (mainly Windows) backups. Right now everything is on ZFS, but we're thinking about moving to a block store (like Ceph or Swift) with this dedup idea.

That said, block-level dedup on NTFS leads to better dedup ratios, but at the cost of much more metadata and a very high metadata-to-data ratio.
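The metadata-vs-ratio trade-off above can be sketched roughly. This is a hypothetical illustration (not fsdup's actual implementation, and the block size and hash choice are assumptions): fixed-size block-level dedup with content addressing, where every block, duplicate or not, still costs one hash entry of metadata.

```python
import hashlib

def dedup(data: bytes, block_size: int):
    """Split data into fixed-size blocks; store each unique block once."""
    store = {}     # hash -> block contents (unique data actually kept)
    manifest = []  # ordered list of block hashes (the metadata)
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        h = hashlib.sha256(block).hexdigest()
        store.setdefault(h, block)  # only first occurrence is stored
        manifest.append(h)          # every block costs a manifest entry
    return store, manifest

# Highly repetitive data dedups very well, but the manifest still grows
# linearly with the logical size: 200 blocks -> 200 hash entries.
data = b"A" * 4096 * 100 + b"B" * 4096 * 100
store, manifest = dedup(data, 4096)
unique = sum(len(b) for b in store.values())
metadata = len(manifest) * 32  # 32 bytes per raw SHA-256 digest
print(f"stored {unique} of {len(data)} bytes, metadata {metadata} bytes")
# -> stored 8192 of 819200 bytes, metadata 6400 bytes
```

Shrinking the block size finds more duplicates but multiplies the manifest entries, which is why the metadata-to-data ratio climbs at scale.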

That makes more sense, then. Plus, the larger the data set, the greater the chance (in theory) of duplicates, especially Windows system files, etc.
