
Improving compression at scale with Zstandard - felixhandte
https://code.fb.com/core-data/zstandard/
======
karavelov
> Two years ago, Facebook open-sourced Zstandard v1.0...

Bullshit, Zstd was open source from the very beginning; they just hired Yann
and moved the project under the Facebook org. How do I know? I have
maintained the JVM bindings [1] since v0.1, and they are now used by Spark,
Kafka, etc.

EDIT: Actually, my initial bindings were against v0.0.2 [2]

Kudos to FB for hiring him and helping get Zstd production-ready. This is
just a false PR claim.

[1] [https://github.com/luben/zstd-jni](https://github.com/luben/zstd-jni)

[2] [https://github.com/luben/zstd-jni/commit/3dfe760cbb8cc46da32...](https://github.com/luben/zstd-jni/commit/3dfe760cbb8cc46da3268af6aa73dce6014298ef)

~~~
felixhandte
Given that it was Yann himself who wrote that sentence, I think that's a
needlessly uncharitable interpretation. Maybe a better wording would have been
"Two years ago, we released Zstandard v1.0, an open source ...". But I don't
think we anticipated anyone would read that much into it.

~~~
jedisct1
Blog posts from companies are always reviewed by the marketing team, which
frequently changes what the author initially wrote, or adds random stuff to
make it sound more like a company effort.

~~~
mikorym
This also seemed to me like one of those surreptitious "team effort", "good
job, guys" edits.

------
IvanK_net
My browser loaded that website with a header: accept-encoding: gzip, deflate,
br ("br" means Brotli by Google)

The response had a header: content-encoding: gzip

Zstandard looks like an improvement on DEFLATE (= gzip = zlib), and its
specification is only 3x longer, even though it was introduced 22 years later:
[https://tools.ietf.org/html/rfc8478](https://tools.ietf.org/html/rfc8478)

Since Zstandard is so simple and efficient, I thought it would get into
browsers very quickly. Then it could make sense to compress even PNG or JPG
images, which usually cannot be compressed further with DEFLATE.

~~~
JyrkiAlakuijala
With internet speeds significantly above the decompression speed, zstd is
favorable. With internet speeds below the decompression speed, brotli is
favorable because fewer bytes need to be transmitted.

Typical users have internet speeds of 10 MB/s or so, and brotli remains more
favorable up to around 200 MB/s (2 Gbps) internet speeds. It is also not just
about speed: mobile users pay less with brotli, as fewer bytes are
transferred.
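
(To make that concrete with round, made-up numbers: on a 10 MB/s link, a 1.0
MB zstd payload takes 100 ms to transfer versus 95 ms for an equivalent 0.95
MB brotli payload, while decompressing ~1 MB takes on the order of a
millisecond or two with either codec. Only once link speed approaches
decompression speed does the slower decoder start to dominate.)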

Further (I'm not an expert on this, so somewhat speculative), the streaming
properties of brotli are likely slightly better, i.e., fewer bytes need to be
held in a buffer before output can be decoded. This may allow the browser to
issue new fetches for URLs in an HTML document earlier with brotli than with
zstd.

~~~
IvanK_net
From what you say, it sounds like you think that Brotli has a better
compression ratio than Zstandard.

According to this chart, Zstandard has a better compression ratio, while
compressing and decompressing faster than Brotli:
[https://facebook.github.io/zstd/](https://facebook.github.io/zstd/)

~~~
JyrkiAlakuijala
It is a cherry-picked data point, and not representative of brotli. There,
Facebook shows numbers at quality 0. However, when they compress for the web,
they use brotli at quality 5 for dynamic content and quality 9 (or 11?) for
static content.

At higher quality settings brotli wins in compression density, and
particularly so with quality 11 or with short files.

Quality 0 is an irrelevant corner case of maximal compression speed, and is
completely impractical for most uses.

If you plot the whole compression speed/density curve, brotli's density is
above zstd's, when both algorithms are used with the same backward-reference
window size.

------
m0zg
I hope they pay greater attention to the low and high end of their compression
ratio spectrum. On the low end, it'd be great if it could exceed lz4 in terms
of speed and memory savings. On the high end it'd be great to exceed XZ/LZMA.

Right now it's impressive "in the middle", but I find myself in a lot of
situations where I care about the extremes. E.g., for something that will be
transferred a lot, or cold-stored, I want maximum compression, CPU/RAM usage
be damned (within reason), so I tend to use LZMA there if the files aren't
too large. For realtime/network RPC scenarios I want minimum RAM/CPU usage
and Pareto-optimality on multi-GbE networks; this is where I use LZ4 (and
used to use Snappy/Zippy).

At their scale, though, FB is surely saving many millions of dollars thanks to
deploying this, both in human/machine time savings and storage savings.

~~~
terrelln
We recently added negative compression levels that extend the fast end of the
spectrum significantly. We are also working on incrementally improving the
strong end of the compression-ratio spectrum. We don't expect plain zstd to
compress more strongly than xz, but we hope to close the gap some.
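
(If you want to try them: a minimal sketch using the single-shot C API with a
negative level; the payload here is hypothetical, and on the CLI the
equivalent is `zstd --fast=3`.)

    #include <stdio.h>
    #include <zstd.h>

    int main(void) {
        /* Hypothetical payload; the repetition makes it compressible. */
        const char src[] = "payload payload payload payload payload";
        char dst[512];  /* comfortably above ZSTD_compressBound(sizeof(src)) */

        /* Negative levels (here -3) trade ratio for speed. */
        size_t const cSize = ZSTD_compress(dst, sizeof(dst), src, sizeof(src), -3);
        if (ZSTD_isError(cSize)) {
            fprintf(stderr, "error: %s\n", ZSTD_getErrorName(cSize));
            return 1;
        }
        printf("%zu -> %zu bytes\n", sizeof(src), cSize);
        return 0;
    }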

~~~
m0zg
Ah, I didn't know, thanks! I'll have to re-test.

~~~
JyrkiAlakuijala
With zstd you end up on average 5-6% worse than LZMA in density on a variety
of large test corpora, but it decodes 8x faster. Brotli with --large_window
30 can get within 0.6% of LZMA, and decodes 5x faster than LZMA.

[https://encode.ru/threads/2947-large-window-brotli-results-a...](https://encode.ru/threads/2947-large-window-brotli-results-available-for-a-several-corpora)

------
valarauca1
I'd really like to thank Cyan for their contributions. `zstd` and `lz4` are
great. I'm pretty much exclusively using `zstd` for my tarball needs these
days, as it beats the pants off `gzip`, and for plain-text code (most of what
I compress) it performs amazingly. (Shameless self-promotion:) I wrote my own
tar clone to make use of it [1].

It is nice to have disk IO be the limiting factor on decompression even when
you are using NVMe drives.

[1] [https://github.com/valarauca/car](https://github.com/valarauca/car)

------
stochastic_monk
The best thing about zstd is its zlibWrapper, which lets you write code as if
you’re consuming zlib-compressed files while transparently working with zlib-,
zstd-, or uncompressed files. I build several of my tools with zstd for this
reason.
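
For anyone curious what that looks like: a minimal sketch of the transparent
decompression side (buffers are hypothetical; as I understand the zlibWrapper
README, you include its header in place of <zlib.h> and auto-detection of
zstd input is on by default, but do check the README).

    #include "zstd_zlibwrapper.h"  /* drop-in stand-in for <zlib.h> */

    /* Decompress `src`, whether it holds a zlib or a zstd frame, using the
     * ordinary zlib streaming API. */
    static int decompress_any(const void* src, size_t srcSize,
                              void* dst, size_t dstCap, size_t* written) {
        z_stream strm = {0};
        if (inflateInit(&strm) != Z_OK) return -1;
        strm.next_in  = (Bytef*)src;  strm.avail_in  = (uInt)srcSize;
        strm.next_out = (Bytef*)dst;  strm.avail_out = (uInt)dstCap;
        int const ret = inflate(&strm, Z_FINISH);
        *written = dstCap - strm.avail_out;
        inflateEnd(&strm);
        return ret == Z_STREAM_END ? 0 : -1;
    }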

~~~
loeg
zlibWrapper seems useful if you're specifically using the low level zlib APIs
already and don't want to change your code very much. But if you just use
zopen() or similar, or are willing to make minor changes, I don't see much
benefit. (Especially given the performance gap vs native zstd APIs).

I have seen some fopencookie(3)-based zstd/zlib/xz/etc FILE-object wrappers
floating around that make it pretty easy to work with any compression
library's streaming APIs.

~~~
stochastic_monk
zlib is standard in my field, so being compatible is a big plus. I should do
some testing with the zstd API, though. Thanks for the performance discrepancy
heads-up!

------
golergka
I used Zstandard to compress messages in a P2P multiplayer game engine, and,
trained on our real-life packets, it got us a 2x-5x improvement. Awesome
library; I will use it in any similar project from now on.

------
josephg
How does zstd compare with brotli? Would it be a better compression standard
for http responses?

~~~
felixhandte
They compare pretty closely. The squash benchmark is a nice interactive
comparison tool that usually agrees with my own benchmarks. See for example
[1]. (Note that you should experiment with different input texts to get a
sense for the variability in relative performance.)

I am working on that very question! Zstd's support for creating and using
custom dictionaries opens the door to significant efficiencies. As described
in the post though, dictionaries make compression a more complicated thing to
use, and so there are lots of questions about how to apply that to the public
internet in a way that's cross-compatible and secure. In short: it's something
we're actively exploring.

[1] [http://quixdb.github.io/squash-benchmark/unstable/?dataset=e...](http://quixdb.github.io/squash-benchmark/unstable/?dataset=enwik8&machine=s-desktop&speed-scale=logarithmic&visible-plugins=zstd,brotli,lz4,lzma,zlib)

~~~
pmarreck
Browsers could cache dictionaries per MIME type per domain which are provided
by the server. Would be super cool, if a bit more complicated.

~~~
felixhandte
Yep, that's in the vicinity of the solutions we're thinking about. There are a
few proposals out there, and prior art like SDCH [1].

The hard part is that compression is already an attack vector for the web
(e.g., CRIME [2], BREACH [3], et al.). We want to make sure that we're not
eroding or unduly complicating that situation [4].

[1] [https://tools.ietf.org/html/draft-lee-sdch-spec-00](https://tools.ietf.org/html/draft-lee-sdch-spec-00)
[2] [https://en.wikipedia.org/wiki/CRIME](https://en.wikipedia.org/wiki/CRIME)
[3] [https://en.wikipedia.org/wiki/BREACH](https://en.wikipedia.org/wiki/BREACH)
[4] [https://tools.ietf.org/html/draft-kucherawy-httpbis-dict-sec...](https://tools.ietf.org/html/draft-kucherawy-httpbis-dict-sec-00)

------
koolba
> And the zstd binary itself includes a dictionary trainer (zstd --train).
> Building a pipeline for handling compression dictionaries can therefore be
> reduced to being a matter of gluing these building blocks together in
> relatively straightforward ways.
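
(For readers who haven't seen the API side, the programmatic equivalents of
those building blocks are ZDICT_trainFromBuffer() and the *_usingDict()
functions; a rough sketch, with all buffers hypothetical:)

    #include <zstd.h>
    #include <zdict.h>

    /* Train a dictionary from concatenated samples, then compress with it.
     * `samples` holds the samples back to back; sampleSizes[i] is the length
     * of sample i. Returns a zstd error code on failure. */
    size_t train_and_compress(void* dict, size_t dictCap,
                              const void* samples, const size_t* sampleSizes,
                              unsigned nbSamples,
                              void* dst, size_t dstCap,
                              const void* src, size_t srcSize) {
        size_t const dictSize =
            ZDICT_trainFromBuffer(dict, dictCap, samples, sampleSizes, nbSamples);
        if (ZDICT_isError(dictSize)) return dictSize;

        ZSTD_CCtx* const cctx = ZSTD_createCCtx();
        size_t const cSize = ZSTD_compress_usingDict(cctx, dst, dstCap,
                                                     src, srcSize,
                                                     dict, dictSize, 3);
        ZSTD_freeCCtx(cctx);
        return cSize;  /* check with ZSTD_isError() */
    }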

What happens if your dictionary, trained on user data, ends up storing user
data and you receive a GDPR destruction request?

~~~
bowmessage
_standard GDPR cop-out disclaimer that's something about needing to keep the
data for technical reasons_

I wish I knew more about it, but that's what I keep hearing.

~~~
karavelov
The dictionary data is not personally identifiable, so no need to do
anything. At least this is my understanding of GDPR; I am not a lawyer.

------
jclay
I find the first chart so hard to understand. The axes need labels, and the
color scheme is not ideal. They should use different line styles, and add a
caption below summarizing the findings. There's a reason journals often
require graphs to be formatted this way.

This is a resource I've found helpful:
[https://www3.nd.edu/~pkamat/pdf/graphs.pdf](https://www3.nd.edu/~pkamat/pdf/graphs.pdf)

"Consider readers with color blindness or deficiencies"

"Avoid colors that are difficult to distinguish"

~~~
jclay
That being said, can anyone who is able to decode this share some cases in
which this would be better suited than zlib?

~~~
terrelln
The x axis is compression speed, and the y axis is compression ratio.

Zstandard outperforms zlib in compression ratio, compression speed, and
decompression speed (not shown). The only reason to stick with zlib is for
compatibility with systems that expect zlib.

~~~
jclay
That sounds fantastic. What is the porting process generally like? Is there
any possibility of creating an API-compatible wrapper to make it a drop-in
replacement for zlib?

~~~
terrelln
Porting is generally very easy.

* If you already have the compression algorithm tagged, through a file extension, or a field, then you can use that to dispatch to the right decompression algorithm.

* Zlib, gzip, xz, zstd, ... all have headers. If you are using zlib and switching to zstd, you simply have to check the first 4 bytes for the zstd header using ZSTD_isFrame() [0] (sketched below), or attempt to decompress with zstd and, if that fails, fall back to the previous decompression algorithm.

* The zstd CLI can decompress both zstd and zlib/gzip if compiled with zlib support.

* Zstd provides a wrapper around the zlib API so you could transparently switch to zstd. [1]

[0]
[https://github.com/facebook/zstd/blob/dev/lib/zstd.h#L1409](https://github.com/facebook/zstd/blob/dev/lib/zstd.h#L1409)
[1]
[https://github.com/facebook/zstd/tree/dev/zlibWrapper](https://github.com/facebook/zstd/tree/dev/zlibWrapper)
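
To illustrate the header-dispatch bullet above with a minimal sketch
(decompress_zlib() is a hypothetical stand-in for an existing zlib path):

    #include <zstd.h>

    /* Hypothetical: your existing zlib decompression routine. */
    size_t decompress_zlib(void* dst, size_t dstCap,
                           const void* src, size_t srcSize);

    /* Zstd frames start with a magic number that ZSTD_isFrame() recognizes;
     * anything else falls through to the old zlib path. */
    size_t decompress_any(void* dst, size_t dstCap,
                          const void* src, size_t srcSize) {
        if (ZSTD_isFrame(src, srcSize))
            return ZSTD_decompress(dst, dstCap, src, srcSize);
        return decompress_zlib(dst, dstCap, src, srcSize);
    }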

~~~
jclay
Great, thanks! Last question.

On the blog you mention you are underway porting the internal code to replace
Zlib with Zstd. Is there a reason you decided not to use the wrapper as a
first pass to migrate all uses of Zlib to Zstd across the entire codebase?

~~~
terrelln
There are a few reasons we haven't used the wrapper.

* The larger services require tuning to get the best performance out of zstd, and we use some advanced options.

* We have a "Managed Compression" library that does zstd dictionary compression, which doesn't work through the wrapper.

* We have our own automatic decompression framework that handles many algorithms [0].

* A lot of use cases switched over from algorithms other than zlib.

* A lot of use cases switched over to zstd organically, without our involvement, since it was such a clear win.

[0]
[https://github.com/facebook/folly/blob/master/folly/compress...](https://github.com/facebook/folly/blob/master/folly/compression/Compression.h#L522)

------
jzawodn
Am I the only one getting sick of "at scale"?

~~~
erikb
It's enterprise slang for "we are big, so we assume everything we do works
better at big scale, but please don't check if it's true, just trust us".
(In this thread it might be true; usually it isn't, though.)

