Deflate combines two compression techniques: Huffman coding to replace each value with a string of bits whose length depends on the frequency of that value, and backreferences of the form "go back N bytes and repeat M bytes from there". (Glossing over a pile of fun interactions between those two, such as Huffman-coding both literals and backreferences, and Huffman-coding the Huffman tables. Not to mention the ability to have M > N, repeating the earlier parts of the thing you backreferenced. See RFC1951, https://www.ietf.org/rfc/rfc1951 , for the full details.)
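The M > N case is less mysterious than it sounds: if the decoder copies one byte at a time, bytes it just wrote become sources for the bytes that follow. A minimal sketch (my own toy code, not from any real decoder):

```python
def apply_backref(out: bytearray, dist: int, length: int) -> None:
    """Copy `length` bytes starting `dist` bytes back, one byte at a time.

    Copying byte-by-byte is exactly what makes length > distance work:
    bytes written earlier in this same copy feed the later ones.
    """
    for _ in range(length):
        out.append(out[-dist])

buf = bytearray(b"ab")
apply_backref(buf, dist=2, length=6)  # "go back 2, repeat 6" with 6 > 2
print(buf.decode())                   # abababab
```

So a two-byte seed plus one backreference expands to eight bytes, which is how deflate encodes runs cheaply.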
Existing deflate implementations use various heuristics to decide whether the next bit of data should be encoded as a literal symbol or as a backreference, such as a hash table of candidate strings to backreference; the compression level (-1 through -9) tunes those heuristics to take more or less time, most notably by changing the lengths of the strings stored and searched for in the hash table.
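A toy version of that hash-table heuristic (illustrative only, not zlib's actual code; `MIN_MATCH` and the greedy loop are my own simplifications): hash the next few bytes, look up earlier positions with the same prefix, and take the longest match found.

```python
MIN_MATCH = 3  # shortest backreference worth emitting (assumed)

def find_match(data, pos, table):
    """Return (distance, length) for the best earlier match, or None."""
    if pos + MIN_MATCH > len(data):
        return None
    key = data[pos:pos + MIN_MATCH]
    best = None
    for cand in table.get(key, []):
        length = 0
        while pos + length < len(data) and data[cand + length] == data[pos + length]:
            length += 1
        if best is None or length > best[1]:
            best = (pos - cand, length)
    table.setdefault(key, []).append(pos)  # remember this position too
    return best

# Greedy tokenization: longest match now, or a literal byte.
data = b"abcabcabcx"
table, tokens, pos = {}, [], 0
while pos < len(data):
    m = find_match(data, pos, table)
    if m:
        tokens.append(m)           # (distance, length) backreference
        pos += m[1]
    else:
        tokens.append(data[pos])   # literal byte
        pos += 1
print(tokens)  # [97, 98, 99, (3, 6), 120]
```

The greedy choice here is exactly the kind of decision the heuristics tune: real implementations also try "lazy" matching, where they peek one byte ahead before committing to a match.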
Zopfli says it uses "more exhaustive compression techniques" based on "iterating entropy modeling and a shortest path search algorithm to find a low bit cost path through the graph of all possible deflate representations." The entropy modeling allows Zopfli to estimate the effectiveness of a given approach to deflate. The path-search algorithm effectively treats the space of all possible deflate representations as a single-player game, with "moves" that change the representation in some incremental way, and then uses standard search algorithms to find a near-optimal representation.
(Without reading the source in detail, I don't know whether the search space includes the choice of Huffman tables, the set of possible backreferences, or both. I can see how either one could map onto a search space, though.)
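The "entropy modeling" half at least is easy to sketch: given a candidate stream of symbols, its frequencies imply a Shannon bit cost, which lower-bounds what the Huffman stage can achieve, so a search can compare candidate encodings by estimated cost without actually building the Huffman tables. (My own illustration, not Zopfli's code.)

```python
import math
from collections import Counter

def estimated_bits(symbols):
    """Shannon-entropy estimate of the bits needed to code `symbols`.

    Each symbol s costs about -log2(freq(s) / total) bits; summing over
    the stream gives a cost a search can use to rank candidate encodings.
    """
    counts = Counter(symbols)
    total = len(symbols)
    return sum(-math.log2(counts[s] / total) for s in symbols)

# A skewed stream is much cheaper than a uniform one of the same length:
skewed = list("aaaaaaab")
uniform = list("abcdefgh")
print(round(estimated_bits(skewed), 2), round(estimated_bits(uniform), 2))
```

That asymmetry is why re-tokenizing the input (trading literals for backreferences or vice versa) can shrink the output even when the token count goes up: skewing the symbol distribution makes the Huffman stage cheaper.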
git clone https://code.google.com/p/zopfli/
cd zopfli && make
chmod +x zopfli
Optional: copy it to /usr/local/bin (or your favorite location on $PATH)
Original file: 268381 bytes
Zopfli (with -i1000): 75730 bytes in 950ms
Gzip (with -9): 79388 bytes in 30ms
It's really slow, but the size difference seems significant enough. (They have more data here: https://zopfli.googlecode.com/files/Data_compression_using_Z...)
For comparison on my machine:
zopfli -i1000 takes [real, sys]=[1069ms, 21ms], 75730 bytes.
xz -9 takes [148ms, 15ms], 69476 bytes.
(Though, as mentioned as a point in zopfli's favor, xz would lose on decompression speed for a larger file.)
If you control both the compressor and the decompressor, then applying new tricks to producing a Deflate stream is far less interesting: just use a better compressor.
I've not used xz, and I'm sure it's good, but the point isn't really which compression algorithm is the best in the world; it's which works best under the (depressingly limited) real-world constraints.
LZMA compression has gained a lot of traction over the past few years.
32653 gzip -9
31763 7z -tgzip -si -mx=9 -mfb=258 -mpass=16
31691 zopfli -i1000
time       size     format
n/a        987640   uncompressed
18.44      210896   gzip
281.44     143128   xz
3604.21    200640   zopfli
Not worth it.
gzip 76070 (32KB dictionary, 128 word size)
PNG seems like a good candidate for this style of compression. A one-time compression cost in exchange for recurring savings in bytes transmitted seems like an excellent use of CPU time to me.
As an aside, I ran the Zopfli program with 10000 iterations on jquery-1.9.1.js:
Wish more people understood this aspect of web scale! Well said :)
32666 (35.26%) gzip -9
31691 (34.22%) zopfli with standard compress
31688 (34.20%) zopfli with max compress
31783 gzip with 32KB dictionary and 128 word size
31764 above with filename stripped
So in this case it's
I guess the real question is what the decode-time penalty is, if any.
bzip2 can get it to 65536 bytes, which is a whopping 10k smaller than zopfli.
Oh wait, I see that zopfli is deflate compatible so that means browsers can still decode it. Fascinating.
On-demand xz will probably never be worth it, however; the encode time just takes too long. You would have physically transferred the data by the time it had finished compressing.
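That claim reduces to a simple break-even: compressing on demand only pays off if compress time plus transfer of the smaller file beats transferring the original. All the numbers below are assumed for illustration, not measurements.

```python
def worth_compressing(size_bytes, ratio, compress_secs, link_bytes_per_sec):
    """True if (compress, then send the smaller file) beats sending raw."""
    send_raw = size_bytes / link_bytes_per_sec
    send_compressed = compress_secs + (size_bytes * ratio) / link_bytes_per_sec
    return send_compressed < send_raw

# 1 MB file, xz-like 0.27 ratio, 0.3 s to compress, 10 MB/s link:
print(worth_compressing(1_000_000, 0.27, 0.3, 10_000_000))  # False
# Same file on a slow 100 KB/s link, where the bytes saved dominate:
print(worth_compressing(1_000_000, 0.27, 0.3, 100_000))     # True
```

On a fast link the fixed encode time swamps the savings, which is the commenter's point; on slow links the arithmetic flips, which is why on-demand compression still makes sense for constrained clients.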
Does anyone know if there are programs out there that allow recompressing zip or png format files with this library?
They also tend to use forked versions of zlib, often to expose some more internal knobs to trial-and-error on, so a zlib-compatible wrapper around zopfli wouldn't necessarily just drop in.
But I'm sure support for zopfli will appear sooner rather than later :)
Btw. I'm surprised that you've found the ü character. Not even US customs/border protection let me enter my German name in the ESTA form.