
Very useful for identifying files that may need to be deduplicated or that can be removed entirely. Unfortunately, I don't think this will also find identical directories.

If deleting files isn't what you want, I'd suggest looking into deduplicating tools.

ZFS has its own deduplicator built in, which is nice. It should just deduplicate files and individual extents of files by itself once you enable it. Probably not a good idea on very write-heavy disks, but it's an option.

Other file systems with extent-level deduplication can use https://github.com/markfasheh/duperemove to deduplicate not only whole files but also individual extents. This can be very useful for file systems that store a lot of duplicate content, like different WINE prefixes. For filesystems without extent deduplication, duperemove should try hard linking files to make them take up practically no disk space.




Yes

Hardlinking files would be a dangerous idea, in my opinion

What is good about the FIDEDUPERANGE ioctl is that everything is transparent from the userspace point of view: whatever the files are, whatever they are used for, nothing has changed

When you hardlink, you see that, at the current moment, the files are the same, but then you assume that userspace wants those files to stay the same

To me, this sounds like a recipe for disaster

This ioctl lives at the VFS layer. As of now, btrfs and xfs have implemented it. There is a merge request lurking around for ZFS, and I have yet to check what bcachefs' status is on this. The only top player left is ext4 :/

(disclosure: I work on duperemove)
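
For the curious, here is a rough sketch of how the ioctl gets called (not duperemove's actual code; the helper name and the single-destination setup are just for illustration). You point it at a source range and one or more destination ranges, and the kernel only shares extents after verifying the bytes really are identical:

    /*
     * Minimal sketch: ask the kernel to share the first `len` bytes of
     * src_fd into dst_fd.  The kernel compares the two ranges itself and
     * only links extents if they match, so userspace never sees a change.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>

    int dedupe_range(int src_fd, int dst_fd, __u64 len)
    {
        struct file_dedupe_range *arg =
            calloc(1, sizeof(*arg) + sizeof(struct file_dedupe_range_info));
        if (!arg)
            return -1;

        arg->src_offset = 0;
        arg->src_length = len;
        arg->dest_count = 1;               /* one destination range here */
        arg->info[0].dest_fd = dst_fd;
        arg->info[0].dest_offset = 0;

        int ret = ioctl(src_fd, FIDEDUPERANGE, arg);
        if (ret == 0 && arg->info[0].status == FILE_DEDUPE_RANGE_SAME)
            printf("deduped %llu bytes\n",
                   (unsigned long long)arg->info[0].bytes_deduped);

        free(arg);
        return ret;
    }

duperemove batches many destination ranges into one call via dest_count; this shows the simplest single-destination case.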


> Unfortunately, I don't think this will also find identical directories.

Generate a hash over the list of hashes for a directory's content. That would allow you to detect identical directories. That directory hash is rather volatile and would need regeneration every so often, but that shouldn't be a major problem.
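
A toy sketch of that idea (all names are made up, and FNV-1a just stands in for whatever hash you already compute per file): sort the children's content hashes so the result doesn't depend on directory order, then hash the concatenation.

    #include <stdint.h>
    #include <stdlib.h>

    static int cmp_u64(const void *a, const void *b)
    {
        uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
        return (x > y) - (x < y);
    }

    /* child_hashes: one content hash per file in the directory */
    uint64_t dir_hash(uint64_t *child_hashes, size_t n)
    {
        /* sort so the fingerprint doesn't depend on readdir() order */
        qsort(child_hashes, n, sizeof(uint64_t), cmp_u64);

        uint64_t h = 14695981039346656037ULL;       /* FNV-1a offset basis */
        const unsigned char *p = (const unsigned char *)child_hashes;
        for (size_t i = 0; i < n * sizeof(uint64_t); i++) {
            h ^= p[i];
            h *= 1099511628211ULL;                   /* FNV prime */
        }
        return h;
    }

Two directories with the same dir_hash are candidates for being identical, and a subdirectory can feed its own dir_hash in alongside the file hashes, so the scheme recurses.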


> ZFS has its own deduplicator built in, which is nice. It should just deduplicate files and individual extents of files by itself once you enable it. Probably not a good idea on very write-heavy disks, but it's an option.

It's also a memory hog, and I feel like there were other caveats but I don't remember for sure


I sent a pull request to ZFS that adds support for FIDEDUPERANGE, which makes all these tools work without having to turn on the large-scale online deduplication. Instead it uses block cloning. Test it if you are willing!


You can now dedicate an SSD or something fast to holding the deduplication tables so it's not a memory hog anymore


Huge memory impact that never goes away: if you turn off dedup, you still need the massive deduplication map in RAM for things that were already deduplicated.


Actually, you can avoid that if you do offline deduplication, because memory can be replaced by storage


Interesting! I did not know that.



