
Google compression algorithm Zopfli may lead to faster Internet - paulschlacter
http://news.cnet.com/8301-1023_3-57571982-93/google-compression-algorithm-zopfli-may-lead-to-faster-internet/
======
Negitivefrags
So we distribute our game client over HTTP and it's something like 4GB
compressed using gzip.

When I saw Zopfli, my immediate reaction was that it should save a non-trivial
amount of money for us while giving our users a slightly faster download. A
100-fold increase in compression time doesn't sound like much compared to the
number of times the files are downloaded, right?

The last incremental patch we did to the game took 79 seconds in our build
system to compress with the usual gzip. This is only compressing the files
that were changed since the last patch, not the entire game client, so larger
patches would take longer. If we changed to zopfli, it would take 2.2 _hours_.

That would put a serious dent in our agility. When we are trying to deploy a
new patch, adding 2.2 hours (or more) somewhere in the middle of it really
isn't viable. (It's already maddening when the rest of the process takes ~40
minutes with all the automated tests that we run).

So then let's look at the savings this would buy us. We distributed about 30TB
yesterday, and bandwidth costs 2c per gig. A roughly 5% size reduction on 30TB
is about 1.5TB, which at 2c per gig works out to about 30 dollars a day saved.

So much for that!

~~~
csense
4GB is really pushing the boundaries of what you should be using HTTP for.
It's a nontrivial download for users with slower connection speeds, so you
should really have better failure recovery than a simple HTTP download
provides.

You should distribute it in pieces, or use bittorrent or rsync or some other
protocol which allows correcting a bad block in transit. Users should download
a small downloader program which speaks whatever protocol, then the downloader
downloads the actual game. (Think about the way Blizzard does WoW updates.)

Checking blocks should theoretically be possible using HTTP transfers with byte
ranges, if you have a separate index file with checksums for each block. That
way you get failure recovery while still serving your content from Really
Efficient Webservers like nginx or Amazon S3. Anyone know of any projects that
do this?
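
The byte-range idea is simple enough to prototype. Here's a minimal Python
sketch, assuming a hypothetical sidecar index of (offset, length, sha256)
entries published next to the file; only a block that fails its checksum gets
re-fetched:

    import hashlib
    import urllib.request

    def fetch_block(url, offset, length):
        """Fetch one block of a remote file via an HTTP Range request."""
        req = urllib.request.Request(
            url, headers={"Range": "bytes=%d-%d" % (offset, offset + length - 1)})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    def download_with_repair(url, index, out_path):
        """index is a list of (offset, length, sha256_hex) tuples -- the
        hypothetical sidecar file generated at publish time."""
        with open(out_path, "wb") as out:
            for offset, length, digest in index:
                for attempt in range(3):  # re-fetch only the bad block
                    block = fetch_block(url, offset, length)
                    if hashlib.sha256(block).hexdigest() == digest:
                        out.write(block)
                        break
                else:
                    raise IOError("block at offset %d failed verification" % offset)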

You should also consider using a more modern compression algorithm like bz2 or
lzma, if you have control over the application running on both ends of the
connection. I'm guessing you'll see these results with Zopfli:

    
    
       gzip -c9 > zopfli -i1000 > bzip2 > bzip2 -c9 > xz   # compressed size
       gzip -c9 < bzip2 < bzip2 -c9 < xz, zopfli -i1000    # compression time
    

Not really sure about xz vs. zopfli for compression speed, but I'm sure that
xz's LZMA algorithm gets way better compression than gzip for most data, due
to fundamental limitations in the design of gzip's format.
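
If you want to sanity-check that ordering yourself, the Python standard library
already ships zlib, bz2, and lzma, so a rough size/time comparison is only a
few lines (Zopfli has no stdlib binding, so it's left as a note here; the test
file is just whatever you have handy):

    import bz2, lzma, time, zlib

    def bench(name, compress, data):
        t0 = time.time()
        out = compress(data)
        print("%-10s %8d bytes  %.2fs" % (name, len(out), time.time() - t0))

    data = open("jquery-1.9.1.min.js", "rb").read()  # any test file will do

    bench("gzip -9", lambda d: zlib.compress(d, 9), data)
    bench("bzip2 -9", lambda d: bz2.compress(d, 9), data)
    bench("xz", lambda d: lzma.compress(d), data)
    # Zopfli itself would need the zopfli CLI or a third-party binding.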

The only real reason to use Zopfli is when the client only knows gzip
compression and you can't change it (for example, many, many people have
static webpages, CSS, JS, etc., which need to support several different web
browsers as clients).

~~~
Negitivefrags
I absolutely agree that improvements could be made to our patching system, so
here are a few more details as to why we do things how we do them.

1) It is actually working right now. This is the biggest deal at a startup.
It's good enough. We have lots of other features to work on that will give us
higher bang for our buck.

2) We wanted to be able to use an "off the shelf" CDN for distribution. It's
nice to be able to farm out parts of your infrastructure you don't want to
deal with.

3) We don't want to have to deal with versions in the patching system. Our
patching system's first use is right in our office. The same deployment
infrastructure that deploys the game to production runs in our office with
every commit from a programmer or artist. When someone runs the game client
(or one of the in-house tools) the patching system syncs up the files to the
latest version.

For real production clients, there can be several versions between the last
time someone ran the game client and the next time, so a versionless system is
conceptually easier to deal with.

4) The client consists of over 100,000 files. Each file is generally quite
small, which limits the damage of not using an actual diffing solution. The
larger files like textures and audio don't diff very well when artists change
them anyway, due to the nature of the data formats.

5) It was really fast to develop. It's just libcurl. Once again, we are a
small startup so we have lots of things we need to be focusing on that are not
our patching system.

All the files are checked against hashes after download so we can correct
errors. This is especially important due to the surprisingly large number of
buggy caching proxies that some users seem to have in front of them.
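
The actual patcher is libcurl-based, but the check-and-retry idea looks the
same in any language. A minimal Python sketch (the function name, hash choice,
and retry count are my assumptions, not theirs):

    import hashlib
    import urllib.request

    def fetch_verified(url, expected_sha256, dest, retries=3):
        """Download a file and verify it against a known hash, re-downloading
        on a mismatch (e.g. when a broken caching proxy serves stale bytes)."""
        for attempt in range(retries):
            data = urllib.request.urlopen(url).read()
            if hashlib.sha256(data).hexdigest() == expected_sha256:
                with open(dest, "wb") as f:
                    f.write(data)
                return
        raise IOError("checksum mismatch after %d attempts: %s" % (retries, url))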

One day I would love to make the best possible patching system. Unfortunately
it's just not as high on the list as so many of the other things we need to be
doing!

~~~
glomph
From a user perspective I wish you guys would switch to something better like
BitTorrent (or at least offer an alternative torrent download).

I almost gave up before playing the game when an interrupted download meant
that I had to start again, and it is pretty frustrating not even maxing out my
slow internet connection at peak times for patches.

The fact that you don't have any kind of diffing solution seems like absolute
madness to me.

~~~
Negitivefrags
You don't have to start again; it's just that the progress bar starts from
zero again. If you initially download 2GB of the 4GB and then stop, when you
restart it will show 0% of the remaining 2GB to download.

It's effectively a diffing system where the block size is on average the size
of a file (about 42KB).
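
That behaviour falls out naturally from a per-file manifest: on restart,
anything whose local hash already matches is skipped, so only the missing
files are fetched again. A hypothetical sketch of that selection step (the
manifest format is made up):

    import hashlib
    import os

    def files_to_fetch(manifest, install_dir):
        """manifest maps relative path -> expected sha256 (hypothetical format).
        Returns only files that are missing or changed, so an interrupted
        download effectively resumes where it left off."""
        todo = []
        for rel_path, expected in manifest.items():
            local = os.path.join(install_dir, rel_path)
            if not os.path.exists(local):
                todo.append(rel_path)
                continue
            with open(local, "rb") as f:
                if hashlib.sha256(f.read()).hexdigest() != expected:
                    todo.append(rel_path)
        return todo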

------
rtkwe
> Due to the amount of CPU time required — 2 to 3 orders of magnitude more
> than zlib at maximum quality — Zopfli is best suited for applications where
> data is compressed once and sent over a network many times, for example,
> static content for the web.

So it's not a magic bullet, and it only produces output 3-8 percent smaller
than zlib at maximum compression. What is interesting is that the output can
be decompressed by existing deflate decoders, which means it can be rolled out
immediately instead of waiting for enough browsers to adopt it to make the
compression-time investment worthwhile.
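
The compatibility claim is easy to check: the zopfli CLI writes an ordinary
.gz file by default, and the stock zlib-backed decoders read it unchanged. A
quick Python sketch (it assumes the zopfli binary is on your PATH and uses a
made-up filename):

    import gzip
    import subprocess

    # Compress with Zopfli (by default it writes index.html.gz next to the input).
    subprocess.run(["zopfli", "--i1000", "index.html"], check=True)

    # Decompress with Python's ordinary zlib-backed gzip module -- no new
    # decoder needed on the receiving end.
    with gzip.open("index.html.gz", "rb") as f:
        assert f.read() == open("index.html", "rb").read()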

Google's blog post is here:
[http://googledevelopers.blogspot.com.es/2013/02/compress-
dat...](http://googledevelopers.blogspot.com.es/2013/02/compress-data-more-
densely-with-zopfli.html)

~~~
gillianseed
Well, it uses the same deflate format that zlib, gzip, etc. use.

There are many more capable compression algorithms which will get much better
results with a lot less CPU time (LZMA comes to mind); however, the big deal
is that deflate is supported by all browsers and is thus a standardized means
of compressing content.

Given that, being able to squeeze 3-8% more out of the web content you already
compress is likely attractive if it makes up a decent part of your bandwidth
usage.

------
pilif
Nitpicking: the thing should be called Zöpfli (note the umlaut). The
diminutive in Swiss German often also changes the vowel in addition to adding
the -li suffix.

A Zöpfli is a small Zopf which is the bread the algorithm is named after. To
make matters worse: in the process of being diminished, the word even changes
gender from masculine to neutral at least in the dialect I use. Pro tip: never
waste your time learning Swiss German unless you know it by growing up with it
:p

I can't really add a lot from the technical perspective, aside from the fact
that it's really cool to be able to invest more time into compression such
that decompression still uses the unaltered old decoders, takes no more time,
and the compressed data is still significantly smaller.

~~~
csense
> the thing should be called Zöpfli (note the umlaut)

No, it shouldn't.

A few years ago I downloaded a tarball of Java code. Every file had one of
those "ö" characters, because the author spelled his name that way, and his
name was in a comment at the top of every file as part of his license notice.

The project would not compile.

I tried compiling with Eclipse. I tried the included Ant script. I tried
compiling directly by typing a javac invocation into the command line. I tried
every permutation of the locale environment variables that Google knew about.
My environment was very conservative, an official Sun Java toolchain from apt
on a recent Ubuntu.

Finally, I wrote a script to delete that line from every file. It compiled
just fine after that!

Putting anything but ASCII in source code will give everyone who ever tries to
build the code endless nightmares of toolchain breakage.

~~~
X-Istence
So what you are saying is that Java was broken. Not that using characters
containing an umlaut is a terrible idea.

The world is no longer ASCII; Unicode is here to stay. I don't see that
disappearing anytime soon, so get used to it.

Speaking of which, code I write in Python that deals with ØMQ has the Ø in it
... I've found some editors that don't deal with it correctly, but that is the
fault of the editor, not the file.

~~~
csense
> The world is no longer ASCII, unicode is here to stay.

I wouldn't have a problem with it, if Everything Just Worked. If everything
international was UTF-8 and Just Worked, I'd be cool with that. But:

1\. > some editors that don't deal with it correctly

2\. You need fonts installed, otherwise you get boxes everywhere.

3\. You need to specify the locale somewhere, and this is currently sometimes
manual: applications can't automatically agree on a locale, because not
everybody uses UTF-8, and not every format that can be in multiple encodings
records which one it uses.

4\. Linux insists on "generating locales." Now _that's_ a mess.

Now people who want to show how edgy they are by insisting on using the empty
set symbol are wasting my time, because when I install $COOLAPP that depends
on $AWESOMELIB which requires $YOURPYTHONZEROMQLIB, then I discover after
hours of Googling cryptic error messages and browsing multiple forums for
these different projects (plus the OS(es) and programming language(s)
involved), that your Python was compiled with wide-character support, and I
have to bootstrap a wide-character-compatible gcc in order to build wide-
character-compatible Python that can run $COOLAPP, but I have to also create
more locales and change the LC_ALL environment variable, which (when I finally
do it) will of course cause another obscure error message, and it turns out
that $OTHERCOMPONENT really doesn't want to use wide characters because it
only knows UTF-8, and there's really no way to have multiple processes in the
same address space because different character widths can only run in
different chroots because they're like amd64 and i386 before Ubuntu grew
multilib support...

I haven't had this exact breakage, but the point is that this is the sort of
rabbit hole you can find yourself falling down, all because you wanted two
dots over your 'o', or a slash through it, just to show how edgy and hip and
modern you are.

Fortunately you have contact information in your profile, so when your code
does this and I end up wasting 10 hours of my life with this garbage, I know
where to send the invoice.

------
ck2
Deja vu - this was just on HN 48 hours ago.

<http://news.ycombinator.com/item?id=5301688>

7-Zip does it in a fraction of the time and gets within 100 bytes:

    
    
       92629 bytes  jquery-1.9.1.min.js 
       31783 bytes  7z.exe -mx9 -tgzip a "jquery-1.9.1.min.js.gz" "jquery-1.9.1.min.js"
       31688 bytes  zopfli.exe -v --i1000 "jquery-1.9.1.min.js"
    
    

BTW, if you need a Windows binary, MinGW compiles it without any trouble.

------
mmastrac
I've been wondering how this compares to straight-up AdvanceComp. I've used
that for pre-compressing gzipped files for a while now.

------
WestCoastJustin
You can browse the code here [1]. You can also find the PDF paper here [2].

[1] <https://code.google.com/p/zopfli/source/browse/>

[2]
[https://zopfli.googlecode.com/files/Data_compression_using_Z...](https://zopfli.googlecode.com/files/Data_compression_using_Zopfli.pdf)

------
jmspring
Compression is always a trade-off between time, memory, and size. If Zopfli
keeps compatibility with deflate, there will be those who find a use for it.

The poster who suggested serving the zlib-compressed version until the Zopfli
output is ready has the right idea for dynamic content that comes in and
awaits compression.
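
A minimal Python sketch of that idea, under my own assumptions (an in-memory
cache, the zopfli CLI on PATH, made-up function names): compress new content
quickly with zlib so it can be served right away, and queue a Zopfli pass that
swaps in the smaller copy later.

    import gzip
    import subprocess
    import tempfile
    import threading

    cache = {}  # URL path -> gzip-compressed body (hypothetical in-memory cache)

    def recompress_with_zopfli(path, data):
        """Background pass: replace the quick zlib result with Zopfli output."""
        with tempfile.NamedTemporaryFile(delete=False) as tmp:
            tmp.write(data)
        subprocess.run(["zopfli", "--i1000", tmp.name], check=True)
        with open(tmp.name + ".gz", "rb") as f:
            cache[path] = f.read()  # same gzip format, just a few percent smaller

    def publish(path, data):
        """Serve immediately with cheap zlib compression, improve in the background."""
        cache[path] = gzip.compress(data, compresslevel=6)
        threading.Thread(target=recompress_with_zopfli, args=(path, data)).start()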

------
allerratio
What would be nice is a library that you can load with LD_PRELOAD and that
replaces zlib's compression. That way I could easily use e.g. OptiPNG with
this new algorithm.

~~~
lucian1900
I don't think that would be of much use, since it only deals with compression.

~~~
allerratio
It only needs to override the compression functions.

~~~
lucian1900
Right, but compressors are easy to change, it's decompressors that are in
hard-to-update software.

~~~
allerratio
It would also cover the compression used inside libpng, for example.

------
mrinterweb
Compression advantage over zlib =~ 3-8%

CPU time over zlib's maximum compression =~ 20x-30x

Definitely would not want to use Zopfli compression on demand unless it is
going into a long-term cache.

~~~
pjscott
... Which is exactly Zopfli's intended purpose. Many things are read a lot
more often than they're written.

------
kunil
Do we really need a new compression algorithm to use compression with HTTP?

~~~
lucian1900
It is the same format as gzip (deflate).

------
goggles99
_up to 8 percent smaller than zlib._

Always with the spin, because if you say "on average 3%" people yawn even more
than they're already yawning at 8%.

Wake me up when there is a stable/reproducible quantum-parallelism means of
compression (and I don't mean Schumacher compression either), not a tweak to
an existing compression algorithm. Sorry to sound grumpy, but I just don't see
this as newsworthy or anything to get excited about. Many of us did this with
similar results in grad school or in our spare time as a hobby. You can even
find some of them on SF or GH if you don't believe me.

This is one of those stories where something is dressed up to sound really
good and a couple of years later you wonder to yourself, "What ever happened
to that?" You go look it up and find a dead end, usually because the original
concept was over-hyped or unachievable.

After enough false optimism, the sane human will start to sour on
"breakthrough" news and become cynical - because "doing the same thing over
and over again and expecting different results" (Einstein) is the very
definition of insanity, right?

