
Introduce ZSTD compression to ZFS - c0d3z3r0
https://github.com/openzfs/zfs/pull/10278
======
throw0101a
See BSDCan 2018 "Implementing ZSTD in OpenZFS on FreeBSD" by Allan Jude:

* [https://www.bsdcan.org/2018/schedule/events/947.en.html](https://www.bsdcan.org/2018/schedule/events/947.en.html)

* [https://twitter.com/allanjude](https://twitter.com/allanjude)

~~~
c0d3z3r0
This _is_ an iteration of Allan's code

~~~
DogRunner
It looks pretty mature. Why has this compression not been merged yet? I
checked out the ZFS YouTube channel, but it is not mentioned anywhere in the
years of monthly sync-ups.

------
d33
There's one thing I don't understand. Each time a new compression algorithm is
introduced, it's the Next Big Thing. Why isn't the implementation of the
algorithm as simple as linking in the related library, assuming they'd all
have a similar interface? After all, it seems like what you need is a header
and a function that converts a compressed block to a decompressed one and the
other way round. Where's the complexity from the API/implementation
perspective given that a library is already there?
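
To make it concrete, something like the hypothetical sketch below (made-up C
names, not the actual ZFS code) is roughly the tiny interface I have in mind:
a name plus a compress callback and a decompress callback per algorithm.

    /* Hypothetical per-block compression interface, purely illustrative;
     * these names are made up and are NOT the actual ZFS API. */
    #include <stddef.h>

    typedef size_t (*block_compress_fn)(const void *src, void *dst,
                                        size_t src_len, size_t dst_cap,
                                        int level);
    typedef int (*block_decompress_fn)(const void *src, void *dst,
                                       size_t src_len, size_t dst_cap);

    struct block_compressor {
        const char          *name;       /* e.g. "lz4", "zstd" */
        block_compress_fn    compress;   /* returns compressed size, 0 = store raw */
        block_decompress_fn  decompress;
        int                  level;      /* default level for this entry */
    };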

~~~
tobias3
In this case it could use the Linux crypto API, which already has ZSTD support
and provides compress/decompress functions. But those exports are GPL-only, so
ZFS needs to do its own thing. And I don't know if that API is sufficient
w.r.t. e.g. workspace management/reuse.

W.r.t. ZSTD: the usual approach is to provide a zlib-compatible API, which
ZSTD does (
[https://github.com/facebook/zstd/tree/dev/zlibWrapper](https://github.com/facebook/zstd/tree/dev/zlibWrapper)
). But the ZSTD zlib compatibility layer costs performance, so it is better to
use ZSTD directly. Maybe all future compression algorithms can provide a
ZSTD-compatible API, so that we have less work to do in the future?
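
For illustration, here is roughly what using ZSTD directly looks like, both the
one-shot helper and the explicit-context variant that allows workspace reuse.
This is only a user-space sketch against the public libzstd API, not the
kernel/ZFS integration:

    /* User-space sketch of the plain libzstd API (not the kernel/ZFS port). */
    #include <stdlib.h>
    #include <zstd.h>

    int compress_block(const void *src, size_t src_len, int level)
    {
        size_t cap = ZSTD_compressBound(src_len);
        void  *dst = malloc(cap);
        if (!dst)
            return -1;

        /* One-shot helper: allocates and frees its working memory internally. */
        size_t n  = ZSTD_compress(dst, cap, src, src_len, level);
        int    ok = !ZSTD_isError(n);

        /* Explicit context: the working buffers live in the ZSTD_CCtx and can
         * be reused for many blocks, which is the "workspace reuse" concern. */
        ZSTD_CCtx *cctx = ZSTD_createCCtx();
        if (cctx) {
            size_t n2 = ZSTD_compressCCtx(cctx, dst, cap, src, src_len, level);
            ok = ok && !ZSTD_isError(n2);
            ZSTD_freeCCtx(cctx);
        }

        free(dst);
        return ok ? 0 : -1;
    }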

~~~
glandium
Zstd is dual-licensed BSD and GPL. It is kind of weird that the kernel limits
it to GPL modules.

------
foepys
lz4 performs pretty well, better than most zstd-fast settings. It will not be
obsolete after ZSTD is added and will most likely remain the go-to algorithm if
you are unsure what to choose.

~~~
rakoo
FYI, lz4 and zstd are made by the same guy, so it is no surprise that they
complement each other. lz4 targets the (de)compress-as-fast-as-possible domain,
while zstd covers the rest (good (de)compression at slower speed).

For something that is write-once read-many-times, like a filesystem, a good
compression algorithm might be more interesting in the future. lz4 is more
targeted at write-once read-once, like file transfers for example.

~~~
jfkebwjsbx
Isn't it the opposite? If LZ4 is optimized for decompression speed as you say,
then you would want to use it when you read the same file many times and want
those reads to be fast.

From a quick read, ZSTD looks to be more about saving space while keeping
reasonable speeds for both writes and reads.

And I'd assume there are other algorithms that focus only on size, trading
speed for it.

~~~
rakoo
Depends what "read many times" means.

\- If you mean that the same archive will be distributed to many peers, as is
the case in package distribution, then in practice archives will be read only
once by each process, so one "slow" compression translates into significant
aggregate gains from faster decompression. That's the reason Arch Linux
switched to zstd for its packages
([https://www.archlinux.org/news/now-using-zstandard-instead-of-xz-for-package-compression/](https://www.archlinux.org/news/now-using-zstandard-instead-of-xz-for-package-compression/))

\- If you mean that the same archive will be read multiple times by the same
machine, I don't really know what kind of scenario that is; I'd extract the
archive into its original representation once and then let processes access
that folder directly. Note that zstd claims it isn't that much slower at
decompression than its competitors, so even if you always use compressed
archives the difference will be minimal.

zstd was built more or less to "replace" all formats that favor compression
ratio over speed. From their benchmarks (for what they are worth), whatever
compression/speed trade-off you want, zstd is going to be better than all of
them, with the hard exception of extremely high speed, which is still the
kingdom of lz4.
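
If you want to check that claim on your own data instead of trusting the
published benchmarks, a tiny harness like this sketch is enough to compare
size and wall-clock time (user space, link with -llz4 -lzstd; error handling
kept minimal):

    /* Minimal sketch: compare lz4 and zstd size/time on one buffer. */
    #include <lz4.h>
    #include <zstd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double now(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        /* Compressible sample input; use real data for a meaningful test. */
        size_t len = 16 * 1024 * 1024;
        char *src = malloc(len);
        for (size_t i = 0; i < len; i++)
            src[i] = (char)(i % 64);

        size_t cap_lz4  = (size_t)LZ4_compressBound((int)len);
        size_t cap_zstd = ZSTD_compressBound(len);
        size_t cap = cap_lz4 > cap_zstd ? cap_lz4 : cap_zstd;
        char *dst = malloc(cap);

        double t0 = now();
        int lz4_size = LZ4_compress_default(src, dst, (int)len, (int)cap);
        double t1 = now();
        size_t zstd_size = ZSTD_compress(dst, cap, src, len, 3); /* level 3 */
        double t2 = now();

        printf("lz4 : %d bytes in %.3fs\n", lz4_size, t1 - t0);
        printf("zstd: %zu bytes in %.3fs\n", zstd_size, t2 - t1);

        free(src);
        free(dst);
        return 0;
    }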

~~~
microcolonel
It also depends on your workload and the speed of your disks.

If your disks are faster than your decompression algorithm while that
algorithm is running alongside the rest of your workload (generally not the
case), then it can make sense to use the faster decompressor (lz4). In my
understanding of zstd's tradeoffs, though, having used it recently in an
application, chances are you have a free hardware thread that can saturate
your disk without affecting your compute workload.

~~~
jfkebwjsbx
Considering that nowadays many people have an SSD, for boot files that would
mean LZ4 is best.

But perhaps for your data files that you don't open often, ZSTD is best
because you save space on the SSD.

~~~
microcolonel
> _Considering that nowadays many people have an SSD, for boot files that
> would mean LZ4 is best._

That depends on how concurrent boot is, and how fast your CPU and memory are.
It may be true on a Celeron, but maybe not on a ThreadRipper.

Furthermore, if you look at the performance testing, sequential read
performance in ZFS was almost always better with zstd than with lz4, and
zstd-fast was about as fast as lz4 for sequential writes. This may be a matter
of their specific integration of lz4, but nonetheless it pays to look at the
actual numbers before drawing conclusions.

------
tpetry
An option in the future to write data at a fast compression level, so
everything is speedy, and then recompress blocks that have not changed in some
time at a more efficient level would be really great. You would have almost no
performance penalty when writing data and a very high compression ratio for
old data.
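
ZFS does not offer that today as far as I know, but the idea itself is easy to
sketch in user space with zstd's levels: write with a cheap level, then later
re-encode cold blocks at an expensive level. A hypothetical helper, not ZFS
code:

    /* Hypothetical "write fast, recompress later" helper using zstd levels;
     * purely an illustration of the idea, not how ZFS works today. */
    #include <stdlib.h>
    #include <zstd.h>

    /* Re-encode an already-compressed block at a higher level.
     * Returns the new compressed size, or 0 on error. */
    size_t recompress_block(const void *fast_blob, size_t fast_len,
                            size_t raw_len, void *out, size_t out_cap)
    {
        void *raw = malloc(raw_len);
        if (!raw)
            return 0;

        /* Decompress the block that was originally written at a fast level. */
        size_t got = ZSTD_decompress(raw, raw_len, fast_blob, fast_len);
        if (ZSTD_isError(got)) {
            free(raw);
            return 0;
        }

        /* Recompress at a slow, high-ratio level (19 here) for cold data. */
        size_t n = ZSTD_compress(out, out_cap, raw, got, 19);
        free(raw);
        return ZSTD_isError(n) ? 0 : n;
    }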

~~~
cmurf
Fast compression levels of a given algorithm mean a lower compression ratio.

I don't know if ZFS supports variable compression levels (maybe per dataset),
but Btrfs ZSTD support uses a mount option, e.g. mount -o compress=zstd:[1-15]

Thus it's possible to use a higher level (higher compression ratio, slower
speed, more CPU and RAM) for e.g. an initial archive, and later use a lower
level (or even no compression) when doing updates. Writes use the compression
algorithm and level set at mount time, and it's possible to change them while
staying mounted, using -o remount.

~~~
namibj
Btrfs is soon getting support for specifying the level on a per-file basis:
[https://github.com/kdave/btrfs-progs/issues/184](https://github.com/kdave/btrfs-progs/issues/184)

