
A new compression algorithm to make Google Chrome updates small - l0stman
http://blog.chromium.org/2009/07/smaller-is-faster-and-safer-too.html
======
bkz
The code has been available in their public SVN repository for quite some
time:

<http://src.chromium.org/viewvc/chrome/trunk/src/courgette/>

BTW, I've read about 40% of the Chrome code and I have to say it is the single
best open-source project I've come to study so far. Incredibly clean code; it
really shows how much properly done code reviews can benefit a project in the
long run.

~~~
javanix
Well, the "cleanliness" is probably because it hasn't been open source for
long - most of the original work was done by a dedicated team of software
engineers, most likely operating under fairly strict coding guidelines.

~~~
bluefish
As someone who works on a project with a team of dedicated engineers under
fairly strict coding guidelines, including code reviews, I can tell you from
experience that none of these factors guarantees "code cleanliness" (ours is
not clean, at least in my opinion).

------
scott_s
bsdiff was written by a HN regular, and I can't think of a better person to
provide insight on this. (Hint.)

~~~
cperciva
Did someone summon me?

I don't know all the details behind Courgette, but it sounds very similar to
Exediff. In my thesis I show that bsdiff (well, a slightly improved version of
bsdiff which I never got around to releasing publicly) performed on par with
or slightly better than Exediff; so I'm surprised that Courgette is getting
this far ahead of bsdiff.

Based on the numbers they've published (a 10 line source code patch to a 10MB
executable resulting in a 700kB bsdiff patch), it sounds like there's
something weird going on which is breaking bsdiff -- the FreeBSD kernel is
roughly the same size as Chrome, but normally for a small (e.g., 10 line)
source code patch I would see a bsdiff patch of between 50kB and 100kB.

Beyond that, I really can't offer any more insight -- unless someone wants to
hire me to spend a week looking at the old and new binaries and figuring out
where bsdiff is going wrong. :-)

------
herf
Does this generalize to a moving "fixed offset" in the compressor? I.e.,
couldn't it be done without the disassembler? (Assuming they're parsing x86?)

x86 is variable length, so this isn't trivial, but surely it is possible. I
guess I don't like the "assembler" step because, hmm, SSE7 might not look like
x86.

Also, if you were willing to have the loader do the work, it sounds like you
could do a -fPIC sort of thing and load most of the code as a DLL.

Very cool that they got it working, though.
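
For what it's worth, here is a toy sketch of why the "assembler" step pays
off so much. It only illustrates the pointer-to-label idea described in the
blog post, not Courgette's actual code; the instruction format and helper
names are made up:

```python
import difflib

def labelize(code):
    """Replace each distinct branch target with a stable label index."""
    labels = {}
    out = []
    for ins in code:
        if len(ins) == 1:                      # no address operand (e.g. nop)
            out.append(ins)
        else:
            op, target = ins
            out.append((op, labels.setdefault(target, len(labels))))
    return out

def n_changed(a, b):
    """Count instructions a generic differ has to re-emit."""
    sm = difflib.SequenceMatcher(a=a, b=b)
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal")

old = [("call", 0x1000), ("jmp", 0x2000), ("call", 0x1000), ("jmp", 0x3000)]
# New build: one instruction inserted, so every later address shifts by 5
# and every raw instruction now differs from its old counterpart.
new = [("call", 0x1005), ("nop",), ("jmp", 0x2005),
       ("call", 0x1005), ("jmp", 0x3005)]

print(n_changed(old, new))                      # 5: everything changed
print(n_changed(labelize(old), labelize(new)))  # 1: only the inserted nop
```

Addresses ripple through the whole binary after any insertion, so a raw diff
sees changes everywhere; once targets are renamed to labels, the two versions
are nearly identical and an ordinary differ does fine.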

------
req2
Without engaging conspiracy theories, I think the most interesting part is
here:

The small size in combination with Google Chrome's silent update means we can
update as often as necessary to keep users safe.

[An extra tidbit: "courgette" is a summer squash. Silly Google...]

~~~
mquander
I personally found an order-of-magnitude increase in the compression ratio for
their executables to be more interesting than that.

------
mace
Technically, it's not a new compression algorithm. It's just a more efficient
update scheme which uses a better-suited binary diffing algorithm.

~~~
cperciva
What you call "binary diffing" is called "delta compression" by people in the
field -- and delta compression is considered to be a type of compression (just
like image compression, audio compression, video compression, et cetera).

~~~
DarkShikari
Perhaps what he meant is that it's not a new _entropy coding_ algorithm; it's
a new prediction algorithm that feeds data to an old entropy coding algorithm.
Overall, it's still a new compression scheme though.
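
A toy illustration of that split (a hypothetical example, nothing to do with
what Courgette actually does): predict each byte from the previous one and
feed only the residuals to an unchanged entropy/LZ coder:

```python
import random
import zlib

# A random walk: the bytes drift slowly, but the raw values look noisy.
rng = random.Random(0)
walk, x = [], 128
for _ in range(20000):
    x = (x + rng.choice((-1, 0, 1))) % 256
    walk.append(x)
raw = bytes(walk)

# Prediction step: model "next byte = previous byte", keep the residuals.
# The residuals take only three values (-1, 0, +1 mod 256).
resid = bytes([raw[0]] + [(raw[i] - raw[i - 1]) % 256
                          for i in range(1, len(raw))])

# The same old entropy coder (zlib), fed better-predicted input, does
# noticeably better.
print(len(zlib.compress(raw)), len(zlib.compress(resid)))

# The transform is lossless: the decoder simply undoes the prediction.
rec = bytearray([resid[0]])
for d in resid[1:]:
    rec.append((rec[-1] + d) % 256)
assert bytes(rec) == raw
```

That is the sense in which a scheme can be "new" without a new entropy coder:
all the improvement comes from the modeling/prediction stage in front of it.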

------
aminuit
It's a very impressive achievement, but I'm not sure I buy their motivation for
doing the work. 70k vs 700k? It's a blink of an eye on today's Internet.
Sounds more like a very bright engineer wanted to work on a cool project, then
reverse-reasoned his way back to this "smaller is faster" idea.

~~~
txxxxd
"If the update is a tenth of the size, we can push ten times as many per unit
of bandwidth. We have enough users that this means more users will be
protected earlier."

~~~
aminuit
Yes, I read the article as well. The issue is that Google probably has as close
as you can get to an infinite amount of bandwidth, which makes this sort of a
low priority item. I don't want to trivialize the technology; it's obviously
very cool stuff, but it's the solution to a problem I doubt they ever had.

"Hi everyone, welcome to the Monday morning Chrome staff meeting. Um... so
we've gotten some push back from management. They -- guys you're not going to
believe this -- they think we might be using too much bandwidth. I tried to
make a snarky joke about YouTube, but they weren't having any of it. So, um...
we're gonna go ahead and pull two or three guys from the [INSERT IMPORTANT
BROWSER COMPONENT THAT WILL LEAD TO A BETTER BROWSER HERE] team and have them
work on making the updates smaller."

