
Performance of Solaris' ZFS LZ4 Compression - d0vs
http://jomasoftmarcel.blogspot.com/2017/04/is-there-performance-impact-when-using.html
======
DiabloD3
Please note: Oracle does not implement the same version of ZFS as everyone
else does. Sun chose the OpenZFS project as the steward of ZFS, and Oracle
chose never to integrate OpenZFS upstream into their version of Solaris
(which is itself an incompatible fork of the actual Solaris steward project,
Illumos née OpenSolaris).

Since OpenZFS already implements LZ4 compression (and has for quite some
time), this is yet another feature that, once enabled, will stop you from
importing your incompatible pool into anything that actually implements ZFS.

~~~
throwaway2048
I think it's a bit much to pretend that Oracle somehow doesn't have "real"
ZFS and Solaris, even if you don't like what they have done with them and
they are incompatible.

~~~
al452
Meh. Whatever one calls "real", "incompatible with everyone else" was the real
point, and it's a strong one.

------
CJefferson
This really is too brief a study (although it's obviously fine for someone to
write a quick blog post about whatever they want).

Most importantly, how fast is the disk? I suspect (but would benchmark if I
really needed to know) that the effects of compression will be very different
on an older 7,200 rpm spinning disk vs. a modern SSD.

~~~
lathiat
It's a very good question, because his copies are stupidly slow: only
15 MB/s. You could probably compress that in real time on a Raspberry Pi!

It's a very poor test.

------
herf
Most people say lz4+ZFS is a net win and you should usually enable it by
default.

The big "gap" is probably between lz4 and gzip. e.g., for compressing logs,
where gzip compresses a _lot_ more but is terribly slow.

I hope zstd can be used for this case someday:
[http://facebook.github.io/zstd/](http://facebook.github.io/zstd/)
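
Rough numbers are easy to get at home. Here is a quick sketch (Python; it
assumes the third-party lz4 and zstandard packages are installed, plus any
large-ish file as input) comparing ratio and throughput:

    # Quick comparison of gzip vs lz4 vs zstd on a sample file.
    # pip install lz4 zstandard
    import gzip, time
    import lz4.frame
    import zstandard

    def bench(name, compress, data):
        start = time.perf_counter()
        out = compress(data)
        secs = time.perf_counter() - start
        print(f"{name}: ratio {len(data) / len(out):.2f}, "
              f"{len(data) / secs / 1e6:.1f} MB/s")

    data = open("/var/log/syslog", "rb").read()  # any big text file works
    bench("gzip", lambda d: gzip.compress(d, compresslevel=6), data)
    bench("lz4",  lz4.frame.compress, data)
    bench("zstd", zstandard.ZstdCompressor(level=3).compress, data)

On typical log data you should see gzip winning on ratio and lz4 winning on
speed by a wide margin, with zstd landing somewhere between the two.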

~~~
dmit
I imagine zstd's license will hamper its corporate adoption, especially among
the big players.

      The license granted hereunder will terminate,
      automatically and without notice, if you (or any
      of your subsidiaries, corporate affiliates or
      agents) initiate directly or indirectly, or take
      a direct financial interest in, any Patent
      Assertion: (i) against Facebook or any of its
      subsidiaries or corporate affiliates...
    

[https://github.com/facebook/zstd/blob/dev/PATENTS](https://github.com/facebook/zstd/blob/dev/PATENTS)

------
gtirloni
It must be fine on a small test system, with CPU idling, etc.

I've worked with a few "ZFS appliances" from Sun (256-512 TB range, NFS/iSCSI
shares, 1-2k clients) and would never enable any advanced features on those
(compression, dedup, etc.). They were awfully unstable when we did that.

Granted, that was 5 years ago, but I don't see any indication this technology
has evolved significantly, given all the drama surrounding Oracle, licensing,
forks, etc. Just not worth the trouble these days, IMHO.

~~~
feld
Compression is fine. Dedup has always been the problem because it was rushed.

~~~
rincebrain
I don't think it's dedup being "rushed" that's the problem. Dedup is often
implemented "offline" (as in NTFS's implementation, or btrfs): the data gets
written as unique at first, and then eventually something runs through, finds
duplicates, and rewrites history to point all the duplicate instances at one
copy.

But ZFS deeply hardcodes assumptions that mean you don't get to rewrite
history like that, so it has to do dedup synchronously (and keep all the
ever-growing data structures required for this in memory for every write).
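
To make "synchronously" concrete: conceptually, every write has to consult a
table like this before it can complete (a toy Python sketch, not ZFS's actual
DDT; all names here are made up):

    # Toy model of inline dedup: every write consults the dedup table
    # first, and the table only ever grows while writes happen.
    import hashlib

    ddt = {}       # checksum -> (block_id, refcount); in RAM, like ZFS's DDT
    storage = {}   # block_id -> data; stands in for the disk
    next_id = 0

    def write_block(data):
        global next_id
        key = hashlib.sha256(data).digest()
        if key in ddt:                    # duplicate: bump refcount, no I/O
            block_id, refs = ddt[key]
            ddt[key] = (block_id, refs + 1)
            return block_id
        block_id = next_id                # unique: allocate and remember it
        next_id += 1
        storage[block_id] = data
        ddt[key] = (block_id, 1)
        return block_id

    a = write_block(b"hello" * 1024)
    b = write_block(b"hello" * 1024)      # dedup hit: same block returned
    assert a == b and len(storage) == 1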

I don't think an arbitrarily large amount of time or money behind it would
have permitted a better implementation, short of a ZFS2 and an in-place
migration tool.

~~~
gigatexal
Dedup can be done right if the system has enough RAM.
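
For a sense of what "enough" means, a back-of-envelope estimate (Python; the
~320 bytes per DDT entry figure is the commonly quoted rule of thumb, not an
exact number):

    # Rough DDT memory estimate: unique blocks x ~320 bytes per entry.
    pool_bytes = 10 * 2**40        # 10 TiB of unique data
    recordsize = 128 * 2**10       # 128 KiB, the ZFS default recordsize
    entries = pool_bytes // recordsize
    print(f"{entries * 320 / 2**30:.1f} GiB of DDT")  # -> 25.0 GiB

So a 10 TiB pool at the default recordsize wants roughly 25 GiB of RAM just
for the dedup table, and smaller records mean proportionally more entries.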

~~~
dom0
I don't know much about ZFS's deduplication; I've just heard that it requires
a lot of memory, in a "hard minimum amount" way. That suggests, to me, that at
least one design element of their deduplication engine is poor.

Efficient deduplication is design-wise a rather difficult problem with many
trade-offs and issues which can blow your lower torso clean off when done
wrong.

I don't think there is a system (beyond sheer coincidence, which seems rather
unlikely given the complexity of the problem space) that can support good
deduplication in an "added on later" way.

E.g. ext4 and btrfs have extent sharing, which _does work_, but is completely
inefficient (in time). ZFS seems to be inefficient as well (in space).

Off the cuff, I'm not aware of an open source deduplicating filesystem that
does not have these issues. There are the deduplicating archivers (borg,
restic, some others), but these are neither meant nor trying to be general-
purpose filesystems (although borg offers a read-only FUSE filesystem with
satisfactory performance).
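
For what it's worth, those archivers get their dedup from content-defined
chunking rather than from filesystem blocks. A toy illustration (Python; the
rolling cut condition below is deliberately simplistic, nothing like borg's
real buzhash):

    # Toy content-defined chunking: cut wherever a cheap rolling value of
    # the trailing bytes hits a boundary, so identical data yields identical
    # chunks even when surrounding bytes shift around.
    import hashlib

    def chunks(data, window=16, mask=0x3FF):
        start = 0
        for i in range(window, len(data)):
            if sum(data[i - window:i]) & mask == 0:  # ~1 KiB average chunk
                yield data[start:i]
                start = i
        yield data[start:]

    store = {}  # chunk hash -> chunk; the dedup store
    for c in chunks(open("/var/log/syslog", "rb").read()):
        store.setdefault(hashlib.sha256(c).digest(), c)
    print(len(store), "unique chunks")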

~~~
wolfy
DragonFlyBSD's HAMMER filesystem seems to fit the bill nicely. There's even
an option to limit the maximum amount of memory used for deduplication; look
up memlimit in the manual page:
[https://leaf.dragonflybsd.org/cgi/web-man?command=hammer&sec...](https://leaf.dragonflybsd.org/cgi/web-man?command=hammer&section=8)

