
"Basically never" is an overstatement, but it's true to the extent of "never, unless you already know why I said 'basically never'."

It boils down to the fact that ZFS maintains a mapping from block hashes to LBNs (logical block numbers). This allows write-time deduplication, as opposed to a scrubber that runs periodically and retroactively deduplicates blocks that have already been written. The catch is that this table is consulted on every write, which makes it memory-intensive. For smaller ZFS pools you can get away with just having lots of RAM (and with or without dedup, ZFS performs better the more RAM you have). For larger ones, you can add an SSD as an L2ARC to act as an additional cache.
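To make the write-time vs. retroactive distinction concrete, here is a toy sketch of inline dedup in Python. It is not how ZFS implements its dedup table; the class name, the use of SHA-256, and the list standing in for a disk are all assumptions for illustration. The point it shows is that every single write does a lookup in the hash table, which is why that table wants to live in RAM.

```python
import hashlib


class InlineDedupStore:
    """Toy write-time dedup (hypothetical, not ZFS's actual DDT).

    Maps checksum -> [block number, refcount] and consults that map
    on every write, before anything touches the simulated disk.
    """

    def __init__(self):
        self.blocks = []  # simulated disk: index is the block number
        self.ddt = {}     # dedup table: checksum -> [block number, refcount]

    def write(self, data: bytes) -> int:
        """Store one block and return the block number it lives at."""
        key = hashlib.sha256(data).digest()
        entry = self.ddt.get(key)  # lookup happens on EVERY write
        if entry is not None:
            # Duplicate content: bump the refcount instead of writing again.
            entry[1] += 1
            return entry[0]
        # New content: write it and record where it went.
        lbn = len(self.blocks)
        self.blocks.append(data)
        self.ddt[key] = [lbn, 1]
        return lbn


if __name__ == "__main__":
    store = InlineDedupStore()
    a = store.write(b"x" * 4096)
    b = store.write(b"y" * 4096)
    c = store.write(b"x" * 4096)  # same content as a, so no new block
    print(a, b, c, len(store.blocks))  # 0 1 0 2
```

A retroactive scrubber would instead let all three writes land on disk and later walk the blocks to merge duplicates; it avoids the per-write lookup but temporarily uses the extra space.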

Here's a quick description of that setup:


Note that this example already had 128GB of RAM for a 17TB pool; the L2ARC was there to augment that. In general, ZFS was designed around a much higher RAM-to-disk ratio than a typical workstation has.
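For a rough sense of why a 17TB pool wants that much RAM, here is a back-of-the-envelope sketch. The ~320 bytes per dedup-table entry is a commonly quoted rule of thumb, and the 64 KiB average block size is purely an assumption; real pools vary a lot. It also treats every block as unique, so it is an upper bound on table size rather than a measurement.

```python
TB = 10**12
GB = 10**9
KiB = 1024


def ddt_ram_estimate(pool_bytes: int, avg_block_bytes: int, entry_bytes: int) -> int:
    """Upper-bound dedup table size: one entry per block, all blocks unique.

    entry_bytes and avg_block_bytes are assumptions supplied by the caller,
    not values read from a real pool.
    """
    return pool_bytes // avg_block_bytes * entry_bytes


if __name__ == "__main__":
    # Assumed: 64 KiB average block, ~320 bytes per table entry.
    est = ddt_ram_estimate(17 * TB, 64 * KiB, 320)
    print(f"DDT estimate: {est / GB:.1f} GB")           # DDT estimate: 83.0 GB
    print(f"RAM per TB in the example: {128 / 17:.1f} GB")  # RAM per TB in the example: 7.5 GB
```

Under those assumptions the table alone would eat most of the example's 128GB, which is why spilling it to an SSD-backed L2ARC helps. Smaller average blocks make it proportionally worse.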

ZFS is also very far from the state of the art in online dedup. For instance, http://users.soe.ucsc.edu/~avani/wildani-icde13dedup.pdf describes a theoretical dedup regime that needs only 1% of the RAM to get 90% of the benefit.
