
There's a write-up here with some details on the why/how and a few experiments (deduping >56T in 125 raw disks): https://blog.heckel.io/2019/07/22/deduplicating-ntfs-file-sy...
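The write-up goes into the actual NTFS-aware approach; purely to illustrate the bookkeeping, here is a minimal Go sketch of blind fixed-size chunk-and-hash dedup over a raw image (the chunk size and the image path are made-up values, not figures from the write-up):

    package main

    import (
        "crypto/sha256"
        "fmt"
        "io"
        "os"
    )

    const chunkSize = 128 * 1024 // assumed chunk size, not from the write-up

    func main() {
        f, err := os.Open("disk.img") // hypothetical raw image path
        if err != nil {
            panic(err)
        }
        defer f.Close()

        seen := map[[32]byte]bool{} // set of unique chunk hashes
        buf := make([]byte, chunkSize)
        var chunks, totalBytes, uniqueBytes int64

        for {
            n, err := io.ReadFull(f, buf)
            if n > 0 {
                sum := sha256.Sum256(buf[:n])
                chunks++
                totalBytes += int64(n)
                if !seen[sum] {
                    // first time we see this chunk content
                    seen[sum] = true
                    uniqueBytes += int64(n)
                }
            }
            if err == io.EOF || err == io.ErrUnexpectedEOF {
                break
            } else if err != nil {
                panic(err)
            }
        }
        fmt.Printf("chunks=%d unique=%d dedup ratio=%.2f\n",
            chunks, len(seen), float64(totalBytes)/float64(uniqueBytes))
    }

Walking the NTFS structures instead of hashing blindly (as the write-up does) mainly changes where the chunk boundaries fall; the hash-and-index part stays the same.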



(Side note: I can't believe that the day I post this on HN, my hosting provider decides to do a 10-hour maintenance...)


Nice write-up, and an interesting idea. One question, though, just out of curiosity: I wonder what ZFS + dedup or Windows Storage Spaces + dedup would do with the same data.


The idea here is to do this at very large scale. My employer has a >500 PB cloud of (mainly Windows) backups. Right now everything is on ZFS, but we're thinking about moving to a block store (like Ceph or Swift) with this dedup idea.
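For the block-store variant, the gist would be content addressing: hash each chunk and use the hash as the object key, so a chunk that already exists in the store is never uploaded twice. A minimal sketch, assuming a generic key/value interface in front of Ceph/Swift (none of the names below are a real client API):

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
    )

    // BlockStore is a stand-in for whatever object store ends up behind it
    // (Ceph, Swift, ...). Only existence checks and puts matter for dedup.
    type BlockStore interface {
        Exists(key string) (bool, error)
        Put(key string, data []byte) error
    }

    // StoreChunk uploads a chunk only if no object with the same content
    // hash is already present, and returns the content-addressed key.
    func StoreChunk(s BlockStore, chunk []byte) (key string, uploaded bool, err error) {
        sum := sha256.Sum256(chunk)
        key = hex.EncodeToString(sum[:])
        ok, err := s.Exists(key)
        if err != nil || ok {
            return key, false, err
        }
        return key, true, s.Put(key, chunk)
    }

    // memStore is an in-memory stand-in so the sketch runs without a cluster.
    type memStore struct{ m map[string][]byte }

    func (s *memStore) Exists(key string) (bool, error) { _, ok := s.m[key]; return ok, nil }
    func (s *memStore) Put(key string, data []byte) error { s.m[key] = data; return nil }

    func main() {
        s := &memStore{m: map[string][]byte{}}
        for _, c := range [][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("aaaa")} {
            key, uploaded, _ := StoreChunk(s, c)
            fmt.Println(key[:8], uploaded) // the second "aaaa" chunk is skipped
        }
    }

The nice property is that dedup then happens across every machine writing into the same store, not just within one backup.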


That said, block-level dedup on NTFS will lead to better dedup ratios, but at the cost of a lot more metadata and a much higher metadata-to-data ratio.
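Back of the envelope (the per-entry size is an assumption, not a figure from the write-up): with roughly a 32-byte hash plus offset/length per chunk entry, the index shrinks about linearly with chunk size, which is the metadata-to-data trade-off in a nutshell:

    package main

    import "fmt"

    func main() {
        // Assumed per-chunk index entry: 32-byte SHA-256 + 16 bytes offset/length.
        // Real indexes carry more overhead (refcounts, structure, replication).
        const entryBytes = 32 + 16
        for _, chunk := range []int64{4 << 10, 64 << 10, 4 << 20} {
            ratio := float64(entryBytes) / float64(chunk) * 100
            fmt.Printf("chunk=%8d B  metadata/data ≈ %.4f%%\n", chunk, ratio)
        }
    }

At 4 KiB (the default NTFS cluster size) the index alone is on the order of 1% of the data, which at 500 PB means several petabytes of metadata; with larger chunks the index becomes a rounding error, but so do many of the dedup hits.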


That makes more sense then. Plus, the larger the data set, the better the chance (in theory) of finding duplicates, especially Windows system files, etc.



