
Compress data more densely with Zopfli - speeder
http://googledevelopers.blogspot.com/2013/02/compress-data-more-densely-with-zopfli.html
======
JoshTriplett
A quick summary of the difference between this and existing implementations of
deflate such as gzip or zlib:

Deflate combines two compression techniques: Huffman coding to replace each
value with a string of bits whose length depends on the frequency of that
value, and backreferences of the form "go back N bytes and repeat M bytes from
there". (Glossing over a pile of fun interactions between those two, such as
Huffman-coding both literals and backreferences, and Huffman-coding the
Huffman tables. Not to mention the ability to have M > N, repeating the
earlier parts of the thing you backreferenced. See RFC1951,
<https://www.ietf.org/rfc/rfc1951> , for the full details.)
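
(To make the backreference mechanics concrete, here is a minimal sketch in
Python of how a decoder resolves a <distance, length> pair, including the
overlapping case where M > N; the code and names are mine, not zlib's or
Zopfli's.)

    # Resolve a Deflate backreference <distance, length>; when
    # length > distance, output produced by this same copy is reused.
    def copy_backreference(output: bytearray, distance: int, length: int) -> None:
        start = len(output) - distance
        for i in range(length):
            # Copy one byte at a time so bytes appended by earlier
            # iterations are available to later ones.
            output.append(output[start + i])

    buf = bytearray(b"abc")
    copy_backreference(buf, distance=1, length=5)  # "go back 1, repeat 5"
    print(buf)  # bytearray(b'abcccccc')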

Existing deflate implementations use various heuristics to decide whether to
encode the next bit of data as a literal symbol or a backreference,
such as a hash table of possible strings to backreference; the compression
level (-1 through -9) tunes those heuristics to take more or less time, most
notably by changing the lengths of strings stored and searched for in the
hash table.
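
A rough sketch of that kind of hash-based, greedy match finder, with the
chain length standing in for the -1/-9 knob (the names and simplifications
are mine, not zlib's actual internals; real Deflate also caps distances at
32 KiB and match lengths at 258):

    MIN_MATCH = 3  # Deflate's minimum backreference length

    def find_match(data: bytes, pos: int, head: dict, max_chain: int):
        # Greedily return (distance, length) of the longest match among
        # recent positions sharing the next MIN_MATCH bytes, or (0, 0).
        key = data[pos:pos + MIN_MATCH]
        best = (0, 0)
        # max_chain plays the role of the compression level: higher levels
        # examine more candidate positions before giving up.
        for candidate in head.get(key, [])[-max_chain:]:
            length = 0
            while (pos + length < len(data)
                   and data[candidate + length] == data[pos + length]):
                length += 1
            if length >= MIN_MATCH and length > best[1]:
                best = (pos - candidate, length)
        head.setdefault(key, []).append(pos)
        return best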

Zopfli says it uses "more exhaustive compression techniques" based on
"iterating entropy modeling and a shortest path search algorithm to find a low
bit cost path through the graph of all possible deflate representations." The
entropy modeling allows Zopfli to estimate the effectiveness of a given
approach to deflate. The path-search algorithm effectively treats the space of
all possible deflate representations as a single-player game, with "moves"
that change the representation in some incremental way, and then uses standard
search algorithms to find a near-optimal representation.

(Without reading the source in detail, I don't know whether the search space
includes the choice of Huffman tables, the set of possible backreferences, or
both. I can see how either one could map onto a search space, though.)
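
Here's a toy version of that shortest-path idea as I read it (my own sketch
of the general technique, not Zopfli's actual code): treat each byte position
as a node, treat each literal or available backreference as an edge weighted
by an estimated bit cost from the current entropy model, find the cheapest
parse with dynamic programming, then re-derive the entropy model from that
parse and repeat.

    def cheapest_parse(n, literal_cost, matches, match_cost):
        # matches[i]: list of (length, distance) backreferences usable at
        # position i; literal_cost/match_cost return estimated bit costs
        # under the current entropy model.
        INF = float("inf")
        cost = [0.0] + [INF] * n
        choice = [None] * (n + 1)
        for i in range(n):
            if cost[i] == INF:
                continue
            # Edge 1: emit byte i as a literal symbol.
            if cost[i] + literal_cost(i) < cost[i + 1]:
                cost[i + 1] = cost[i] + literal_cost(i)
                choice[i + 1] = ("literal", 1)
            # Edge 2: emit a backreference covering bytes i..i+length-1.
            for length, distance in matches[i]:
                c = cost[i] + match_cost(length, distance)
                if c < cost[i + length]:
                    cost[i + length] = c
                    choice[i + length] = ("ref", length, distance)
        # Walk `choice` backwards from n to recover the chosen parse.
        return cost[n], choice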

------
kevinconroy
In case you're wondering how to try it, the site lacks a simple
getting-started guide. Sure, it only takes a few minutes to figure out, but
for anyone looking to try it with <30 seconds of work, do this:

      git clone https://code.google.com/p/zopfli/
      cd zopfli
      make
      chmod +x zopfli
      ./zopfli <FILENAME>
      # Optional: copy zopfli to /usr/local/bin (or your favorite path location)

------
eknkc
Not a scientific experiment or anything like that, I just tried it on
uncompressed jQuery for fun:

Original file: 268381 bytes

Zopfli (with -i1000) 75730 bytes in 950ms

Gzip (with -9) 79388 bytes in 30ms

It's really slow but the difference seems significant enough. (They have more
data here:
<https://zopfli.googlecode.com/files/Data_compression_using_Zopfli.pdf>)

~~~
Jach
As I suspected, XZ performs better, at least for jQuery. I was disappointed it
was missing from the benchmarks.

For comparison on my machine:

zopfli -i1000 takes [real, sys]=[1069ms, 21ms], 75730 bytes.

xz -9 takes [148ms, 15ms], 69476 bytes.

(Though as mentioned as a pro of zopfli, xz would lose on decompression speed
for a larger file.)

~~~
andrewf
Yeah but the point here is to produce a .gz file which is compatible with
standard tools like gunzip, PNG decoders, your browser's handling of "Content-
Encoding: gzip", etc.

If you control both the compressor and the decompressor, then applying new
tricks to producing a Deflate stream is far less interesting; just use a
better compressor.

~~~
Jach
Can we get browsers to support LZMA? :) (Hey, a bug report!
<https://bugzilla.mozilla.org/show_bug.cgi?id=366559>) But yeah, zopfli does
seem like an excellent fit when you can't force a particular decompressor.

------
moonboots
Zopfli can be used with nginx, without code changes, via the static gzip
module [1].

[1] <http://wiki.nginx.org/NginxHttpGzipStaticModule>

~~~
magnetikonline
Absolutely love this Nginx module - use it all the time in production. For me,
dropping Zopfli into the "pre-compress asset phase" will be trivial.
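
For anyone wanting to wire that up, a minimal pre-compression sketch (assumes
the zopfli binary is on PATH and that nginx has gzip_static enabled for the
asset location; the directory and extension list are placeholders):

    import pathlib
    import subprocess

    ASSET_DIR = pathlib.Path("public")             # placeholder asset directory
    EXTENSIONS = {".css", ".js", ".html", ".svg"}  # placeholder extension list

    for path in ASSET_DIR.rglob("*"):
        if path.suffix in EXTENSIONS:
            # zopfli writes <file>.gz next to the input by default,
            # which is what the gzip_static module looks for.
            subprocess.run(["zopfli", str(path)], check=True)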

------
ck2
You can already get 3% more compression out of zlib by giving it a bigger
dictionary like 7zip can optionally do.

code.jquery.com/jquery-1.9.1.js

      original  268381
      gzip       76070      (32KB dictionary, 128 word size)
      Zopfli     75730

340 bytes is not worth a 20x time increase for compression.

~~~
afhof
340 bytes is not worth a 20x time increase for a single download, but a 0.4%
decrease in file size IS worth the time when you compress once and serve the
file many times. If you can measure the time it takes to download those last
340 bytes, you can count how many downloads it takes before it's worth it. (If
download speed were constant, it would be about 224 downloads.)

PNG seems like a good candidate for this style of compression. A one-time
initial cost in exchange for recurring savings in bytes transmitted seems like
an excellent use of CPU time to me.

As an aside, I ran the Zopfli program with 10000 iterations on
jquery-1.9.1.js:

      268381 jquery-1.9.1.js
       75622 jquery-1.9.1.js.gz

~~~
CJefferson
A fairer (in my opinion) comparison, with minimised jquery:

        92629 jquery-1.9.1.min.js
        32666 (35.26%) gzip -9
        31691 (34.22%) zopfli with standard compress
        31688 (34.20%) zopfli with max compress

~~~
ck2

         31783 gzip with 32KB dictionary and 128 word size
         31764 above with filename stripped

I am too lazy to strip out the filename, so there's a few bytes of penalty;
with it stripped it's more like 31764.

So in this case it's

        31688
          vs
        31764

76 bytes

I guess the real question is what is the decode penalty time if any?

------
csense
Apparently only gzip and zlib compatibility is included in the code.

Does anyone know if there are programs out there that allow recompressing zip
or png format files with this library?

~~~
andrewf
The "PNG crunch" programs out there - eg OptiPNG - actually do a bit more
work. They'll try palette-ising a truecolor image, applying all the PNG
filters (<http://www.w3.org/TR/PNG-Filters.html>), etc.

They also tend to use forked versions of zlib, often to expose some more
internal knobs to trial-and-error on, so a zlib-compatible wrapper around
zopfli wouldn't necessarily just drop in.

But I'm sure support for zopfli will appear sooner rather than later :)

------
ArchD
It would be good to see some comparison with bzip2, which typically has better
compression ratio than gzip.

~~~
wmf
FYI, bzip2 has been obsoleted by xz, which is both faster and compresses
better. But as others have said, Google doesn't care because browsers do
gzip.

~~~
pjscott
In the future, browsers could also do xz; they'd just need to advertise it in
their Accept-Encoding header. Google could put this in Chrome any time they
like, and nothing would break.

~~~
dbaupp
As Jach points out above, Mozilla has had a bug[1] open about it for 6 years.

[1]: <https://bugzilla.mozilla.org/show_bug.cgi?id=366559>

------
DeepDuh
As a Swiss I'm curious how the name has come to be. Is the team related to ETH
Zurich?

~~~
Scaevolus
Lode Vandevenne wrote the blog post and the code. LinkedIn says he works at
Google Zürich.

~~~
DeepDuh
Thanks a lot. I'm still reluctant to use LinkedIn since much of it feels like
extortion to me.

Btw. I'm surprised that you've found the ü character. Not even US
customs/border protection let me enter my German name in the ESTA form.

~~~
roel_v
It's actually right in the post that he's based in Zurich.

~~~
DeepDuh
I could be wrong, but I think it was edited in. Otherwise Scaevolus would
probably have pointed it out instead of hunting through LinkedIn.

~~~
mhroth
I guess Zopfli might be Zöpfli, which is a kind of braided Swiss bread.
<http://en.wikipedia.org/wiki/Zopf>

------
supervillain
Zopfli sounds like a pharmaceutical drug.

