
ZFS Deduplication - bensummers
http://blogs.sun.com/bonwick/entry/zfs_dedup
======
gstar
That's excellent - dedupe makes a huge difference in most storage
environments.

Shame ZFS has a slightly indeterminate future.

~~~
nailer
Sun's storage team would work really well either as an autonomous unit within
Oracle or a separate company (unfortunately the latter is unlikely unless the
staff themselves do a 'JRuby').

Solaris is doing poorly as a general purpose Unix OS to sell Sun support /
hardware for, but as an embedded hardware/software appliance Sun kit is both
better and cheaper than its counterparts, and they have some extremely
talented engineers (though they've bled 27% of their staff since last year).

~~~
gaius
_a separate company_

One word: 3Par.

(Longer version: 3Par was founded by ex-Sun types who wanted to make storage
appliances based on Solaris. Sun responded by charging them outrageous
licensing fees. They went with Linux and now their business is doing just
fine...)

~~~
bensummers
Now that Solaris is open source as OpenSolaris, it's possible for someone other
than Sun to build a storage appliance without having to worry about license
fees.

As an example, here are some people doing it right now:
<http://www.nexenta.com/>

------
dcurtis
Why would this be very useful? How often do you store two copies of the same
data on one disk?

Edit: let me be more clear. When do you have situations where duplicate data
is stored on the same disk and the best way to deal with it is through the
filesystem?

If you're talking about a webapp with lots of users uploading the same photo
or something, isn't that better handled before you hit the filesystem, so that
you have dedupe over a number of independent disks/locations?

~~~
a2tech
All the time. One of Netapp's HUGE selling points is dedupe support.

~~~
nailer
And in the SAN space too - Hitachi make a lot of money on dedupe products.

------
m_eiman
Does anyone have real-world data for the likelihood of SHA256 hash collisions
for actual user data (music, movies, documents, source code, etc)?

In my toy backup app I'm doing pretty much what they're doing - I assume that
each 1MB block of data will have a unique hash. I haven't tested it on
serious amounts of data though, and so I'm very curious to know how likely
this scheme is to survive an encounter with the real world.
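
For the curious, the core of it is roughly this (a minimal sketch, not my
actual code - the block size, names and paths are just illustrative):

    import hashlib, os

    BLOCK_SIZE = 1024 * 1024  # 1MB blocks, the granularity the app dedupes at

    def store_file(path, block_dir):
        """Split a file into blocks and store each block under its SHA-256 digest.
        Identical blocks collapse into one copy; an undetected collision would
        silently alias two different blocks, which is the risk I'm asking about."""
        refs = []
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                digest = hashlib.sha256(block).hexdigest()
                target = os.path.join(block_dir, digest)
                if not os.path.exists(target):   # first time this content is seen
                    with open(target, "wb") as out:
                        out.write(block)
                refs.append(digest)
        return refs  # the file is now just an ordered list of block hashes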

~~~
viraptor
I don't like the way they present the result of how unlikely hash collisions
are... The number (2^-256) gives much more comfort than saying:

If you hash 4KiB-long blocks, then every possible block will share the hash
value with (on average) 128 different 4KiB blocks. And on a standard 200GiB
disk you can fit (more or less) 52,400,000 blocks.
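
For the record, the block count is just:

    # 200 GiB disk, 4 KiB blocks
    blocks_on_disk = 200 * 2**30 // 4096
    print(blocks_on_disk)   # 52428800 -- the ~52.4 million above
    print(2**256)           # number of distinct SHA-256 values, for scale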

This explanation is a bit less reassuring. Now consider the fact that your
data is never random and you hit the same patterns all the time (loads of
zeros / ASCII letters / x86 code).

~~~
bensummers
The same-patterns concern is mitigated by the property that changing one bit of
the input will (by design) change, on average, half of the bits in the hash.

You can also be reassured that there's lots of research going on about how
likely these collisions are and how to find them. People are actively trying
to break these hash algorithms, so it's not just in theory.
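
A quick way to see the avalanche behaviour (just one pair of inputs, not a
proof of anything):

    import hashlib

    block   = b"\x00" * 4096            # a 4 KiB block of zeros
    tweaked = b"\x01" + block[1:]       # the same block with a single bit flipped

    a = int.from_bytes(hashlib.sha256(block).digest(), "big")
    b = int.from_bytes(hashlib.sha256(tweaked).digest(), "big")
    print(bin(a ^ b).count("1"))        # typically around 128 of the 256 bits differ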

------
atamyrat
Compress the folder with your favorite zip program and the compression ratio
will be a very good approximation of the gain you will get out of ZFS
dedup.
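
Something like this, roughly (a sketch - the directory and archive paths are
just placeholders):

    import os, tarfile

    def compression_ratio(directory, archive="/tmp/sample.tar.gz"):
        """tar+gzip a directory and compare the raw size of its files
        to the size of the compressed archive."""
        raw = 0
        with tarfile.open(archive, "w:gz") as tar:
            for root, _, files in os.walk(directory):
                for name in files:
                    path = os.path.join(root, name)
                    raw += os.path.getsize(path)
                    tar.add(path)
        return raw / os.path.getsize(archive)

    print(compression_ratio("/srv/data"))   # placeholder directory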

Web archiving is another application that can benefit from this. You can crawl
the same website 10 times a day and just store all of the files as they are.

~~~
bensummers
Compression only finds matches within relatively short windows of a file,
whereas dedup works across huge amounts of storage. Plus, if you use zip,
you're only looking at similarities within individual files. I don't think the
results would be a useful approximation.
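
To illustrate: gzip's DEFLATE window is only 32KB, so it can't even see a
repeat that is a megabyte away:

    import os, zlib

    chunk   = os.urandom(1024 * 1024)        # 1 MB of incompressible data
    doubled = chunk + chunk                  # an exact duplicate, 1 MB apart

    print(len(zlib.compress(chunk)))         # ~1 MB: random data doesn't compress
    print(len(zlib.compress(doubled)))       # ~2 MB: the 32 KB window never sees the
                                             # repeat, while block-level dedup could
                                             # store the second copy for free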

Incidentally, ZFS also has an option to compress the data. You have a choice
of a fast but not so wonderful algorithm, or gzip levels 1 to 9. Since dedup
is at the block level, I believe you can combine the two.

~~~
atamyrat
Why wouldn't it be useful?

While that might be true for some compression formats/programs, it is not the
case for .tar.*, as the directory is archived into a single file first (tar)
and then compressed. So if you have similarities between two different files,
they will be exploited.

I think making something that already does a good job on "small chunks" work
for "large blocks" is just a matter of finding the right values for the
parameters of the compression algorithm used.

~~~
bensummers
It's not a useful measure because it's not the same thing at all. Dedup only
finds a duplicate if the data is aligned to the ZFS block size. Compression
will find matches with no particular alignment.
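
A toy illustration of the alignment problem (with a made-up 4KB block size
standing in for the ZFS record size):

    import hashlib, os

    BLOCK   = 4096
    payload = os.urandom(4 * BLOCK)          # 16 KB of data

    aligned = payload + payload              # second copy starts on a block boundary
    shifted = payload + b"X" + payload       # second copy is off by one byte

    def unique_blocks(buf):
        return len({hashlib.sha256(buf[i:i + BLOCK]).hexdigest()
                    for i in range(0, len(buf), BLOCK)})

    print(unique_blocks(aligned))   # 4 of 8 blocks unique: dedup stores half
    print(unique_blocks(shifted))   # 9 of 9 blocks unique: dedup saves nothing,
                                    # but a compressor would still find the repeat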

~~~
atamyrat
If they were the same thing, you'd get the exact result, not an approximation.

.tar.* will compress chunks of data that are smaller than the ZFS block size.
ZFS dedup + ZFS compression will compress them as well, so what's the problem?

------
rbanffy
Let me guess: SPARC CPUs are very fast at computing SHA256 hashes, right?

And, BTW, I am by no means implying this is bad. Quite the contrary - I would
love to have an inexpensive SPARC-based desktop. If it existed.

~~~
Andys
New Intel CPUs now have a CRC32 instruction, which I am hoping Sun engineers
will take advantage of soon and add as a choice for ZFS.

~~~
rbanffy
CRC32? Compared to SHA256? Seriously?

~~~
Andys
I can't bring myself to trust dedupe without verification after every hash
match, so I plan to use it with a fast hash algorithm with full verify
enabled.

CRC32 is faster than ZFS's default of Fletcher2 and has less frequent
collisions.
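
What I mean by full verify, as a rough sketch (nothing like ZFS's actual dedup
table, just the idea):

    import zlib

    table = {}   # weak checksum -> blocks already stored under that checksum

    def write_block(block):
        """Dedup on a fast, weak checksum, but verify byte-for-byte on a match."""
        key = zlib.crc32(block)
        for existing in table.get(key, []):
            if existing == block:          # a checksum match alone is never trusted
                return existing            # genuine duplicate: reference the stored copy
        table.setdefault(key, []).append(block)
        return block                       # new data (or a harmless checksum collision)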

