
Zstandard v1.4.0 - terrelln
https://github.com/facebook/zstd/releases/tag/v1.4.0
======
_wmd
Zstandard is awesome; there is nothing else close to it that I've seen. There
are plenty of codecs that give you either obscene ratios or low CPU usage, but
none that support both in combination. Zstandard on its highest setting easily
competes with xz for anything I've thrown at it. Decompression throughput is
unaffected by the compression setting -- higher settings just burn more memory.
So you can get xz-like ratios with decompression throughput approaching 1 GB/sec.

At lower settings it trades away compression ratio (gzip-level or worse) for
much lower CPU usage, and you can mix the settings within a single file. This
means you can e.g. use the lowest setting (or no compression at all - it
supports that too) to append to a file, while occasionally recompressing recent
appends into a single block using the highest setting -- so the cost of
compression can be amortized.
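
This works because a zstd file is just a sequence of independent frames, and
the CLI decompresses concatenated frames as a single stream. A minimal sketch
(file names are placeholders):

    zstd -1 -c new-chunk >> log.zst       # cheap append as its own frame
    zstd -19 -c merged-chunks >> log.zst  # occasional high-ratio frame
    zstd -dc log.zst > log                # all frames decompress in order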

The only petty annoyance with it is ecosystem support - e.g. GNU tar has no
option for it, so it's slightly more painful to work with.

~~~
curlypaul924
In my experience, for the best balance between compression speed and
compression ratio, nothing beats 7zip with the right options:

    
    
      -mmt=$(nproc) # use all available cores
      -ms=off       # disable solid archives (compress each file separately)
      -m0=lzma2     # lzma2 has better threading than lzma1
      -md=64m       # dictionary size
      -ma=0         # "fast" mode
      -mmf=hc4      # hash chain match finder
      -mfb=64       # number of "fast bits"
      -mf=off       # disable filters
    

The biggest gains are: 1) using all available cores, 2) setting the match
finder (the binary tree match finders are terribly slow; I haven't played much
with the newer patricia tree match finders), 3) disabling solid archives (this
seems to cause 7zip to distribute the work more evenly between cores, though
it still may only use a few cores if there are many small files), 4) using
"fast" mode (whatever that is, it gives a noticeable performance boost and
doesn't seem to affect compression ratio much).
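
Assembled into a single invocation, the options above look something like this
(a sketch; the archive and input names are placeholders):

    7z a -mmt=$(nproc) -ms=off -m0=lzma2 -md=64m -ma=0 -mmf=hc4 -mfb=64 -mf=off out.7z data/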

Every few years I try zstd and others, and for the data I work with (primarily
a mix of json and fixed-width-field binary data), lots of tools beat 7zip out
of the box, but they fall short of 7zip with the above command-line options.

~~~
terrelln
A comparable zstd call that uses a 64 MB window size and all cores is:

    
    
        zstd --long=26 -T0
    

From there you can tune the compression level, or increase the window size up
to 2 GB (--long=31). zstd won't beat the compression of xz, but it can
compress much faster if you trade off some space.
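
One caveat: window sizes beyond the decoder's default limit (128 MB) also have
to be allowed explicitly when decompressing. A sketch with a placeholder file
name:

    zstd --long=31 -T0 big.tar        # compress with a 2 GB window
    zstd -d --long=31 big.tar.zst     # the decoder must opt in to the large window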

------
drej
There are basically three scenarios in which I choose a compression algorithm
(a few exceptions aside):

- maximum compatibility (while tolerating low performance) - gzip

- great performance (while tolerating larger files) - snappy

- very good performance with good (not best) compression ratios - zstd

I don't really want to use Any New Shiny Algo to compress data that might
outlive this piece of software; that's why I use gzip very often, because I
know I'll always be able to decompress it. But I've been increasingly adopting
zstd and snappy for a single reason - they are becoming widely supported
within the ecosystems I work in (data processing).

That, to me, is more important than compression ratios and decompression
speeds.

~~~
d33
How does snappy compare to lz4?

Also, for gzip, you might want to consider pigz, a parallel implementation
that can use multiple cores.
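
A quick sketch (file names are placeholders; pigz emits standard gzip output,
so plain gunzip can read it):

    pigz -p $(nproc) big.file    # parallel compression across all cores
    pigz -d big.file.gz          # decompression (largely single-threaded)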

~~~
terrelln
lz4 compresses and decompresses faster than snappy, and compresses similarly.
You can see some comparisons in the README on GitHub:
[https://github.com/lz4/lz4](https://github.com/lz4/lz4).

------
znedw
I've been using zstd compression on btrfs for a while now and it's excellent.
Most of my stuff is already compressed (movies, music), but my home directory
(which is mostly text files) has shrunk greatly.
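
For reference, it's enabled with a mount option (a sketch; the per-level
zstd:3 syntax needs a recent kernel, around 5.1):

    mount -o compress=zstd /dev/sdb1 /mnt
    # or in /etc/fstab:
    # /dev/sdb1  /mnt  btrfs  compress=zstd:3  0  0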

~~~
terrelln
The next GRUB release (grub-2.04) includes my patch to add support for zstd
compressed BtrFS filesystems, which should solve one of the major pain points
of Zstd BtrFS compression.

~~~
cmurf
Cool! Looking forward to that.

As a temporary workaround in the meantime, I've used `chattr +C` on the
directories I want to be exempt from zstd compression, so that grub can read
those files.
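
That works because +C (NOCOW) also disables compression on btrfs, and it only
takes effect for files created after the flag is set, so it's best applied to
a still-empty directory. A sketch, assuming /boot is what GRUB needs to read:

    chattr +C /boot    # new files under /boot stay uncompressed
    lsattr -d /boot    # verify the flag is set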

------
usefulcat
We have many terabytes of large files that are currently compressed using xz
(well, pixz actually). In terms of compression speed and ratio, zstd is pretty
comparable, and for single-threaded decompression it's faster. At this point
the only thing stopping us from using it instead of xz/pixz is the fact that
multi-threaded pixz decompression is faster. Are there any plans to add MT
decompression to zstd?

~~~
terrelln
There aren't any technical limitations to adding multithreaded decompression
to zstd. We just need a compelling enough use case to justify the work it
would take to add it.

pzstd is now obsoleted by zstd -T0 for compression, but it still offers
multithreaded decompression for files compressed by pzstd (decompression
remains single-threaded for files compressed by zstd).
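
A sketch of the round trip (the file name is a placeholder):

    pzstd -p 8 big.file           # compress with 8 threads
    pzstd -d -p 8 big.file.zst    # parallel decompression (pzstd-compressed input only)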

~~~
usefulcat
Thanks, I didn't know about pzstd, will check it out. See my reply below for
more details of my particular use case.

------
jasonhansel
How widely is this being adopted compared to (say) Brotli?

~~~
terrelln
Disclaimer: I'm a maintainer of zstd, so I'm biased.

Brotli dominates HTTP compression. Zstd just got its RFC approved a few months
ago, but Brotli has been present in browsers for years.

However, zstd is more widely adopted everywhere else, especially in lower
level systems. Zstd is present in compressed file systems (BtrFS, SquashFS,
and ZFS), Mercurial, databases, caches, tar, libarchive, package managers (rpm
and soon pacman). There is a pretty complete list here
[https://facebook.github.io/zstd/](https://facebook.github.io/zstd/).

Again, I'm biased: I know almost everywhere zstd is deployed, but not
everywhere that Brotli is.

~~~
Svenstaro
With the zstd support in pacman, Arch will soon switch all packages over to
zstd.

~~~
jasonhansel
Out of curiosity: Why doesn't pacman just use HTTP's built-in compression? It
could cache packages in gzipped form, but there's no reason to re-compress
them over the wire.

~~~
Svenstaro
Because mirrors aren't guaranteed to use HTTP gzip compression, and gzip also
compresses much worse than xz.
------
albertzeyer
Older discussion:
[https://news.ycombinator.com/item?id=16226923](https://news.ycombinator.com/item?id=16226923)

Some of the obvious competitors are brotli and snappy.

Here are some comparisons:

[https://quixdb.github.io/squash-benchmark/](https://quixdb.github.io/squash-benchmark/)

[http://www.mattmahoney.net/dc/text.html](http://www.mattmahoney.net/dc/text.html)
(I wonder if anyone has an updated picture with the Pareto front for these
numbers.)

~~~
JyrkiAlakuijala
Mahoney's benchmark is missing the large-window brotli numbers, which are
about 5% better than those of zstd and 10% better than those of small-window
brotli.

Brotli with a large window can do around 199M for the 1G text corpus:
[https://groups.google.com/forum/m/#!topic/brotli/aq9f-x_fSY4](https://groups.google.com/forum/m/#!topic/brotli/aq9f-x_fSY4)

Here is an aggregate view of the large-window results:
[https://encode.ru/threads/2947-large-window-brotli-results-a...](https://encode.ru/threads/2947-large-window-brotli-results-available-for-a-several-corpora)

------
ac29
Anyone know a decent Windows implementation?

There's a 7zip fork that includes Zstd support, but it can only put Zstd
inside a .7z container, which doesn't appear to work with any other tools.

~~~
svnpenn
The linked site has Windows builds...

~~~
ac29
True! I suppose what I actually want is a Windows utility that can make a
.tar.zst archive, ideally from a GUI.

In the Windows world, archiving and compression are usually combined in a
single file format (.zip, .rar, .7z). Zstd follows the Unix style: it can't
directly compress folders of files; they need to be put in a tar (or other
archive format) first. This isn't really an issue on Linux, since Zstd support
is built into tar, which ships on pretty much every system.
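
On Linux that looks like this (a sketch; requires GNU tar 1.31 or newer, and
the names are placeholders):

    tar --zstd -cf archive.tar.zst some-folder/    # create
    tar --zstd -xf archive.tar.zst                 # extract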

~~~
chungy
There haven't yet been any extensions to zip or 7z for zstd support. There is
a branch of wimlib that has experimental zstd support, though it's unlikely it
will ever be merged into the master branch.

You could make uncompressed zip or 7z files and compress them independently as
a zst file, but that's a bit baroque compared to just using tar. :)
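
The baroque route would look something like this (a sketch; -mx=0 stores files
without compression, and the names are placeholders):

    7z a -mx=0 archive.7z some-folder/    # store-only container
    zstd archive.7z                       # produces archive.7z.zst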

7-Zip does seem bent on supporting everything; I imagine some day it'll
support zstd, at least as an independent archive format, if not as an
extension to the Zip and 7z formats as well.

~~~
mappu
GP mentioned this fork of 7-Zip with zstd support:

- [https://github.com/mcmilk/7-Zip-zstd](https://github.com/mcmilk/7-Zip-zstd)

The changes to the 7-Zip file format were discussed upstream, and upstream
agreed not to tread on the magic values:

- [https://sourceforge.net/p/sevenzip/discussion/45797/thread/a...](https://sourceforge.net/p/sevenzip/discussion/45797/thread/a7e4f3f3/?limit=25&page=2)

But ultimately the patches were not upstreamed. The fork is now on its second
developer:

- [https://sourceforge.net/p/sevenzip/discussion/45797/thread/6...](https://sourceforge.net/p/sevenzip/discussion/45797/thread/6db98beb/?limit=25&page=0)

------
vbtechguy
Posted my zstd 1.4.0 compression benchmarks
[https://community.centminmod.com/threads/round-3-compression...](https://community.centminmod.com/threads/round-3-compression-comparison-benchmarks-zstd-vs-brotli-vs-pigz-vs-bzip2-vs-xz-etc.17259/)
- looking good :)

------
ryacko
HTTP compression should be disabled by default; 99% of web content downloaded
is already compressed.

~~~
algorithmsRcool
But this is flatly untrue.

Images and video, sure, but HTML, JS, CSS, and SVG aren't already compressed.

In fact, compression is critical to modern SPA frameworks to keep initial
download times low.

Also, this post has little to do with HTTP compression. Zstd is used in many
other circumstances.

~~~
deathanatos
And for resources that don't compress well, web servers like nginx (and I
presume others) support listing which mimetypes to compress, so they won't
double-compress those things.
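
In nginx, for instance, that's the gzip_types directive (a config sketch;
text/html is always included once gzip is on):

    gzip on;
    gzip_types text/css application/javascript application/json image/svg+xml;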

