This would have been very handy about 17 years ago when I was doing a virtual CD-ROM driver, and needed to store compressed images, and the client wanted a wide range of speed/compression options for the user--much wider than zlib offered.
I ended up doing a hack that was either the most disgusting thing I have ever done, or was brilliant. I have not been able to decide which.
My hack: I gave them a compression/speed slider that went from 0 to 100. If the slider was set to N, I would apply zlib at maximum compression to N consecutive blocks, and then apply no compression to 100-N consecutive blocks. Repeat until the whole image is stored.
The client loved it.
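For the curious, the scheme boils down to something like this (just a sketch: BLOCK_SIZE and emit_block() are made-up names, error checks are omitted, and the real driver also had to record per block whether it was compressed so the reader could undo it):

    /* Slider value N (0..100): zlib at maximum compression for the first N
     * blocks of every 100, no compression for the remaining 100-N. */
    #include <stdlib.h>
    #include <zlib.h>

    #define BLOCK_SIZE 65536                    /* assumed block size */

    /* Hypothetical callback that writes one block to the image file,
     * flagged as compressed or stored raw. */
    void emit_block(int compressed, const unsigned char *data, unsigned long len);

    void store_image(const unsigned char *src, size_t num_blocks, int slider)
    {
        unsigned long cap = compressBound(BLOCK_SIZE);
        unsigned char *out = malloc(cap);

        for (size_t i = 0; i < num_blocks; i++) {
            const unsigned char *block = src + i * BLOCK_SIZE;
            if ((int)(i % 100) < slider) {
                /* "N" blocks of every 100: zlib at maximum compression */
                unsigned long out_len = cap;
                compress2(out, &out_len, block, BLOCK_SIZE, Z_BEST_COMPRESSION);
                emit_block(1, out, out_len);
            } else {
                /* remaining "100-N" blocks: stored as-is */
                emit_block(0, block, BLOCK_SIZE);
            }
        }
        free(out);
    }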
Zlib is actually highly configurable - it has a huge number of tweaks beyond just the single compression-level flag.
DEFAULT_STRATEGY is just one of several possible strategies.
You can tweak the LZ77, RLE and Huffman separately to get different compression performance (the levels correspond to different LZ77 lookups and deflate_fast vs deflate_slow).
In my work, I use different zlib strategies for different data (e.g. already bit-packed integers vs. raw strings), which works out pretty well.
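Concretely, the strategy is picked via deflateInit2() instead of plain deflateInit(); a minimal sketch (windowBits and memLevel here are just the usual defaults):

    #include <string.h>
    #include <zlib.h>

    /* Z_FILTERED biases toward Huffman coding (meant for data coming out of a
     * filter/predictor, e.g. small numeric deltas or bit-packed integers),
     * Z_RLE restricts matching to run lengths, Z_HUFFMAN_ONLY disables LZ77
     * matching entirely, Z_DEFAULT_STRATEGY is the normal LZ77+Huffman mix. */
    int open_deflate(z_stream *strm, int strategy)
    {
        memset(strm, 0, sizeof(*strm));          /* zalloc/zfree/opaque = Z_NULL */
        return deflateInit2(strm,
                            Z_BEST_COMPRESSION,  /* level 0..9 */
                            Z_DEFLATED,          /* the only supported method */
                            15,                  /* windowBits: 32 KB window */
                            8,                   /* memLevel */
                            strategy);           /* e.g. Z_FILTERED or Z_RLE */
    }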
My current complaint is that Zlib doesn't quite detect record boundaries when building its dictionary. All of this only matters on the encoder side, because the output is eventually just a DEFLATE stream (Zopfli is an example of an encoder-only Zlib-compatible library).
Zstandard has a much wider set of options to use already built-in (fast, greedy, lazy, btlazy, btopt, btultra) etc.
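With a recent libzstd those map directly onto the advanced parameter API, roughly like this (a sketch; the wrapper function name is made up):

    #include <zstd.h>

    /* ZSTD_c_strategy takes ZSTD_fast, ZSTD_dfast, ZSTD_greedy, ZSTD_lazy,
     * ZSTD_lazy2, ZSTD_btlazy2, ZSTD_btopt, ZSTD_btultra, ... and can be set
     * independently of the numeric compression level. */
    ZSTD_CCtx *make_btopt_cctx(void)
    {
        ZSTD_CCtx *cctx = ZSTD_createCCtx();
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_strategy, ZSTD_btopt);
        return cctx;
    }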
Small proposal for improvement: Instead of
N * "gzip -9" + (100-N) * "cat"
use
N * "gzip -9" + (100-N) * "gzip -1"
(Yes, I'm aware that zlib's output is slightly different from gzip's output, as gzip adds some more metadata on top of zlib, but this is irrelevant in terms of size and speed.)
In my experience, -9 is considerably slower than -7 but the gains in compression ratio are negligible. It's almost never a good trade-off.
I don't know the numbers, but there exists a frequency and duty cycle at which humans perceive the light to be brighter even though there's less energy used.
That is PWM, pulse width modulation.
Having seen some of the things my co-workers have implemented, however, I now disagree completely.
I imagine compression could be a real performance win when reading from optical media, too.
Anyone here have any experience developing games for optical-media-based games consoles? I imagine high-performance on-the-fly decompression could have been useful on the PS2, for instance. Or perhaps I'm wrong and the weak CPU and slow seek times dominate things.
Vaguely related game-data madness: How Naughty Dog Fit Crash Bandicoot into 2MB of RAM on the PS1, https://news.ycombinator.com/item?id=9737156
Edit: I see compression is indeed used in those contexts - https://news.ycombinator.com/item?id=16228840
Some CD-ROM games would actually store more than one copy of assets on the disc. It's much faster to read all your level data in one swoop than to use performance-killing seeks.
The client, a major Japanese software distributor who was going to be selling copies of the software to end users, wanted a way to let the user decide how they wanted to trade off ripping speed with disk usage.
That was a perfectly reasonable thing for the client to want, given his extensive knowledge of the hardware characteristics of the computers that were in widespread use among his potential customers, and his knowledge of Japanese usage patterns.
What I gave him did that.
So how exactly is he a moron for (1) asking for a reasonable feature, and (2) being happy that he received that feature?
I tried zlib's compression level settings. Maybe it's different in modern zlib implementations, on today's hardware, but a couple of decades ago on typical hardware of the day the 10 compression levels of zlib really were clustered around only two or three performance levels.
I'm the one who decided on the skipping block approach, and decided to give him a percentage slider. I didn't actually expect him to accept that as is. I expected he'd test it at various settings, and pick out 4 or 5 specific percentages and ask me to replace the slider with radio buttons that selected those, and give them descriptive names.
But a lot of users like to have more knobs and tweaks on their software, especially their utility software. My guess is that, based on his understanding of the market, he thought his customers were that kind of user, so he accepted the slider.
It is, and I do.
But if they still want it anyway, unless it poses a security problem or other ethical issue or would reflect badly on us (as doing something silly very publicly could do), then it comes down to "we get paid for it or someone else does".
For example, if you know you're dealing with text you can use snappy; for images, webp; for videos, x264 (or x265 if you only care about decode speed and encoded size); etc. Then fall back to zstd only when you don't have a specific compressor for the chosen file type.
On enwik8 (100MB of Wikipedia XML encoded articles, mostly just text), zstd gets you to ~36MB, Snappy gets you to ~58MB, while gzip will get you to 36MB. If you turn up the compression dials on zstd, you can get down to 27MB - though instead of 2 seconds to compress it takes 52 seconds on my laptop. Decompression takes ~0.3 seconds for the low or high compression rate.
<Article>... [30 kb of text] ...</Article>
In my experience, if compression time is not a factor, for text (non-random letters and numbers), lzip is the best. I recently had to redistribute internally the data from python nltk, and tried to compress/decompress with different tools, this was my result (picked lzip again):
    tool                  compress time   size      decompress time
    gzip -9               10 m            503 MiB   31 s
    zstd -19              29 m            360 MiB   29 s
    7za a -si             26 m            348 MiB   s
    lzip -9               78 m            310 MiB   50 s
    lrzip -z -L 9 (ZPAQ)  125 m           253 MiB   95 m
* 7za -m0=PPMd produced the smallest file while being faster than bzip2
* bzip2 turned out to be way faster than both lzip (684%) and xz (644%) and produced a smaller file
* xz is marginally faster than lz, compressed sizes are about the same with the xz file being a tad smaller
* without any switches 7za produces an archive a bit bigger than xz and lzip in about the same amount of time
* gzip and zstd produce about the same compressed size, but zstd is a lot faster (517%) than gzip
The 7z file was produced using the -m0=PPMd switch. For the other files no command line switches were supplied. Here are the file sizes:
Was bzip2 slightly or considerably slower than zstd?
If I remember correctly: zstd = 0.2s, gzip = 0.8s, 7zip (PPMd) = 2.1s, bzip2 = 2.7s, lzip, xz, 7zip (lzma) = 15..16s. This is CPU time from memory, so it might not be fully accurate.
I'd say zstd and gzip is better suited for general use, while bzip2 and 7zip (PPMd) are better suited for high compression of text files.
An example of close number sequences is just simple graphs: your CPU temperature is 78 degrees, and most likely it'll be 77, 78 or 79 the next tick, so the values stay close and the deltas will usually be 0s and 1s.
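In code the trick is just to store deltas instead of absolute values before compressing (a sketch; the sample values are made up):

    #include <stddef.h>
    #include <stdint.h>

    /* Turn a slowly-changing series into deltas before handing it to a
     * general-purpose compressor: 78, 78, 79, 77 becomes 78, 0, 1, -2,
     * which compresses far better than the raw readings. */
    void delta_encode(const int32_t *in, int32_t *out, size_t n)
    {
        int32_t prev = 0;
        for (size_t i = 0; i < n; i++) {
            out[i] = in[i] - prev;
            prev = in[i];
        }
    }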
The correct term from information theory is that it approximates a "universal" compressor with respect to observable Markov or FSMX sources.
I haven't investigated either approach, but this is one of my next projects I'm working on, so I'll be figuring it out either way. I was wondering, does Zstd provide any ability to serialize/restore and continue where it left off?
(Context: Disk compression. Due to long-term factors beyond my control I've had severe disk space issues for over a decade (with frequently only tens of MBs free). Complex story ending in "cannot work". I recently realized that some of my disks contained data I didn't immediately need, and began wondering about ways I might be able to compress this data "out of the way" so I could carefully partition the remaining space. This would need to be done in-place as I would have nowhere with enough space to hold the compressed data, at least not to begin with.)
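To make the question concrete: I'd want something like zstd's streaming loop below, plus the ability to save the compressor state between runs (a sketch with error checks omitted; I haven't verified the details):

    #include <stdio.h>
    #include <zstd.h>

    /* Push a file through zstd's streaming compressor one chunk at a time.
     * This only shows the incremental part; persisting the ZSTD_CCtx state
     * across program runs is the piece I'm asking about. */
    void stream_compress(FILE *in, FILE *out, int level)
    {
        ZSTD_CCtx *cctx = ZSTD_createCCtx();
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);

        char inbuf[64 * 1024], outbuf[64 * 1024];
        int last;
        do {
            size_t n = fread(inbuf, 1, sizeof inbuf, in);
            last = (n < sizeof inbuf);            /* short read => end of input */
            ZSTD_inBuffer input = { inbuf, n, 0 };
            int finished;
            do {
                ZSTD_outBuffer output = { outbuf, sizeof outbuf, 0 };
                size_t remaining = ZSTD_compressStream2(
                    cctx, &output, &input, last ? ZSTD_e_end : ZSTD_e_continue);
                fwrite(outbuf, 1, output.pos, out);
                /* on the last chunk, keep flushing until the frame is done */
                finished = last ? (remaining == 0) : (input.pos == input.size);
            } while (!finished);
        } while (!last);
        ZSTD_freeCCtx(cctx);
    }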
The last time someone gave me an old empty 320GB HDD they weren't using, one of my own disks started clicking about a week later and I was able to save everything on it. I still shake my head at that perfect timing.
Of course, this meant I lost all the additional free space, haha. One step forward, two steps back...
Heavy mmap usage (VMWare sometimes uses an mmap to hold a guest's memory, for example) doesn't show up as memory usage in any system monitoring tools, and the system starts to thrash long before apparent memory usage gets high. Maybe someone here has a solution to that?
The long range matcher has a configurable match size, where it will only look for matches that are at least that large. By default it is 64 B, but by making it larger, say 4 KB, you can ensure that if you are going to force a page fault, you get enough benefit.
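In terms of the advanced parameters, that tuning would look roughly like this (a sketch; the values are illustrative):

    #include <zstd.h>

    /* Enable the long-distance matcher and raise its minimum match size so a
     * long-range match is only taken when it covers at least 4 KB, i.e. when
     * forcing a page fault is likely to pay for itself.  4096 is the maximum
     * ZSTD_c_ldmMinMatch accepts; the default is 64 bytes. */
    void enable_ldm(ZSTD_CCtx *cctx)
    {
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_ldmMinMatch, 4096);
    }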
That said, I'm not sure I'd call it a "real-time" compression algorithm. It's still a factor of 2x slower than lz4 at compression and a factor of 4x slower at decompression.
I can't see how that applies to Zstandard either.
With only a small delay, can I compress and decompress at least one second of data per second, indefinitely?
Think live TV versus film.
TL;DR: Your optimal compression algorithm will vary based on these parameters. There's no universal answer.
I'm not kidding. I couldn't believe it myself, but subsequent testing stubbornly bore it out—on one file that was 15 MB after compression, and on a mix of smaller files. I tried compiling gzip from source, using the same compiler I used for zstd, and the results were the same. strace seemed to show zstd read and wrote in chunks 2x the size, but the number of syscalls didn't seem to be nearly enough to explain the difference. It seems to have this "zlibWrapper" subdirectory; its README has some benchmark numbers that I don't fully understand, but some of them seem to match the 2/3 factor I got.
I'm wondering if this is a clever tactic to drive adoption—get people to use the zstd executable even when they're still using gzipped files. ;-)
Also, the fact that it has support for reading (and, apparently, writing) gzip, lz4, and xz on top of its own format really makes "z standard" an appropriate name.
Edit: Also in squashfs. Here's the git pull request which includes some benchmarks.
Compression speed, for example, doesn't really matter. Sure, it should be something reasonable (e.g. not like PAQ), but if you can bump the above metric by 10 % while compression takes 50 % longer: probably worth it.
In any case - thanks for releasing this, it's been very helpful to me.
Say I have 1000 files. I want to compress them and let the cron rsync do its thing.
Next day, if only one file has changed, rsync should pick up only the differential instead of the whole archive.
Or is there a better way of doing it?
Does anyone know any better? It seems like we could use a better alternative to Snappy.
ORC has a reserved future enum for ZSTD  (my guess is that FB already uses it).
The original headache was the patent licensing scheme for the actual codebase, but now that hadoop has ZStandard built into libhadoop.so, this is much easier to ship to Apache.
My branch is so far behind now that I probably would implement it faster if I started over & used the libhadoop.so impl as the basis - https://github.com/t3rmin4t0r/orc/tree/zstd
 - https://issues.apache.org/jira/browse/ORC-46
 - https://github.com/facebook/zstd/issues/775
I guess general data compression works on audio and video, but most of the time you either pick a compressor specific to text, audio, or video, or you create a file format that indexes your data.
Also, you may want to play with borg options and use lz4