
Intern Impact: Brotli compression for Play Store app downloads - abhikandoi2000
https://students.googleblog.com/2017/02/intern-impact-brotli-compression-for.html
======
cdnsteve
NGINX:
[https://github.com/google/ngx_brotli](https://github.com/google/ngx_brotli)

Apache: [https://lyncd.com/2015/11/brotli-support-apache/](https://lyncd.com/2015/11/brotli-support-apache/)

Node.js: [https://hacks.mozilla.org/2015/11/better-than-gzip-compressi...](https://hacks.mozilla.org/2015/11/better-than-gzip-compression-with-brotli/)

------
orliesaurus
So wait, if I understand this article correctly, she applied the compression
because someone told her to? Or did she research it herself and apply the whole
thing? I agree this is a bit like a "we're hiring interns" post.

~~~
arenaninja
I would be excited if I had taken an internship and someone had told me to do
this :)

~~~
orliesaurus
yeah sure nothing wrong with that - it's just that the article is a tad
clickbait

------
dajohnson89
The amount of negativity in the comments section here is astounding. How could
you not be excited and happy for this promising young woman's achievement? No,
her work will not put her on the shortlist for a Turing Award. But it is
something any engineer should be proud of, and has real impact for millions of
users.

You have a right to be unimpressed, but if you're taking the time to say "So
what?" or "This is just a recruiting ad" then you should probably rethink. I
never thought I'd say this, but the negativity here really indicates the kind
of latent discrimination that so many URMs & women in tech complain about. I
have literally no other explanation for it -- a senior engineer at Google
could have implemented this compression and it would still be HN worthy, and
nobody would be calling the blog article a fluffy PR piece.

~~~
djrogers
> a senior engineer at Google could have implemented this compression and it
> would still be HN worthy,

No, I doubt it would be. How many of the hundreds of little features in the
google play store have been posted on HN with an article about the person who
implemented them?

Also, I find it more than a little presumptuous of you to assume that any
scoffing is due to sexism. I see the exact same cynicism and lack of awe in
the posts below that I have come to expect from HN - regardless of gender or
color of the person involved.

~~~
dajohnson89
1.5 petabytes of savings in data usage per day is not HN worthy? I've seen far
less significant improvements get voted to the front page with hundreds of
comments. This is more than a small tweak to the Play store.

Agreed on the high level of cynicism here -- we've also come to expect that.
Moreover I _never_ come to HN (my account is five years old) to point out
sexism/racism -- it's just too sensitive and difficult of a subject and to be
honest I'd rather just read/talk about topical things without getting
political. But again, what I pointed out is stark and I have no other
explanation for it (believe me, I want one).

Let me point out the title of the blog: "Google Student Blog: Google news and
updates especially for students". Of course there is PR going on, and of
course the achievements of an intern will often be on a smaller scale. But
this particular achievement is high-impact and the intern deserves credit on
that blog for her work. If you're not impressed, then just move on.

~~~
ThrowawayR2
> _I have no other explanation for it (believe me, I want one)_

Um, the fact that her work saved that much bandwidth is a happy accident
of the fact that she was at Google and assigned to a project that had a high
user volume. Other than passing the intern interview loop, that took
absolutely no merit or effort on her part whatsoever.

Moreover, the blog post describes the work as "to add support for Brotli for
both new app installs and app update." I mean, hundreds of thousands of
developers use third-party libraries to add functionality every single day of
the year. Some "achievement".

Please take off the X-ism colored glasses, dajohnson89. I promise that it
makes the world look like a better place.

~~~
skybrian
It's of course true that at Google you have a chance to make a bigger impact.
(Not guaranteed - it depends on what project the intern is given, and that's
kind of random.) And interns do get lots of support.

But, to say that there is "no merit or effort on her part" is an insult to all
the good work interns do while they're here. They're not coasting.

Seems like you're so eager to tear this down that you'll say anything.

~~~
ThrowawayR2
Given the fact that your first paragraph says the same thing my first
paragraph does, it's unclear to me what point you're trying to make.

The intern in question had no part in making Google the size it is and was
most likely not offered much choice in the way of team or project assigned, so
no merit or effort was involved in either of the two. Which part would you
like to dispute?

~~~
skybrian
I dispute the part where you say she shouldn't get any credit for her work
because she did the work at Google.

If you're going to make that argument, nobody at Google deserves any credit
for anything we do. That's not how we normally measure impact. We all stand on
the shoulders of giants, but putting those resources to work effectively still
counts.

------
jdcarter
> her work resulted in saving users an expected 1.5 petabytes (that's 1.5
> million gigabytes) of data each day.

I'm guessing this is not a measure of data at rest, but data transferred over
the network. The couple of samples listed on the page ranged from a 2.5%
improvement to 20.3% (vs. zlib), so I guess they're extrapolating that out to
all app downloads and updates across the world. Nicely done.

More generally, we've seen some great advances in compression lately. I've
been using Facebook's Zstandard [1] for compression in a product I'm currently
working on, and I've been extremely pleased with both its speed and
compression ratio. The days of "just use zlib" are coming to a close.

[1]: [https://github.com/facebook/zstd](https://github.com/facebook/zstd)

~~~
rdtsc
Are you worried at all about their patent stance? Currently I think it says
that if you litigate against Facebook you lose the license. Otherwise I agree
zstd is looking like a very nice improvement in an area where most people think
nothing happens. I especially like the dictionary compression bit.

~~~
jontro
Looks like they're using the BSD license in this project,
[https://github.com/facebook/zstd/blob/dev/LICENSE](https://github.com/facebook/zstd/blob/dev/LICENSE)

So no need to worry about the patent clause in this case right?

~~~
nitrogen
AIUI BSD only covers copyright, unlike Apache and GPL3 for example.

------
arenaninja
Pretty cool that an intern was given this level of confidence. Less data for
updating/installing applications is good no matter how you slice it

------
mbesto
I've worked with a fair number of people who graduated from Mathematics
and Informatics at Babeș-Bolyai University. I'm generally very impressed by
them, and this is just another data point for the areas of the world that get
overlooked.

------
Syzygies
Can we get her to work for Dropbox? Every time my iPad GoodReader syncs my
1,000+ papers, it has to check every file. It boggles the mind that they don't
support some version of change records.

~~~
falloutx
I don't know what the config of your computer is, but Dropbox works like a
charm for me. I have more than 600 gigs of data synced between Dropbox and my
computer and it works pretty nicely. I never had to manually check whether a
file has been transferred or not.

------
bhouston
I bet switching to LZMA would have saved even more. LZMA beats Brotli nearly
every time. Zstandard would likely have worked better as well. Brotli is very
slow to compress.

~~~
Someone1234
That doesn't appear to be true:

[https://cran.r-project.org/web/packages/brotli/vignettes/bro...](https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf)

I'm sure you can use those results to argue that LZMA is superior in some way
(e.g. compression speed) but it definitely isn't clear cut superior in other
important ways (compressed size and decompression speed are inferior).

I can see why, given those results, they would use Brotli over LZMA.

~~~
bhouston
The independent tests I did here:
[https://github.com/google/brotli/issues/165](https://github.com/google/brotli/issues/165)

And also those here: [https://www.percona.com/blog/2016/03/09/evaluating-database-...](https://www.percona.com/blog/2016/03/09/evaluating-database-compression-methods/)

Suggest that LZMA compresses better than Brotli except in the case of text
documents.
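
For anyone who wants to repeat this kind of comparison on their own files, here is a minimal sketch using only codecs from the Python standard library (zlib and LZMA). Brotli is not in the standard library, so it is left out here; the sample input and the `compressed_sizes` helper are hypothetical, and results on real app packages will differ from results on text.

```python
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Return the compressed size in bytes for each codec at its highest preset."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "lzma": len(lzma.compress(data, preset=9)),
    }


# Hypothetical sample input: plain repetitive text. Swap in the bytes of a
# real file to get numbers that mean something for your workload.
sample = b"The quick brown fox jumps over the lazy dog.\n" * 2000

sizes = compressed_sizes(sample)
for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size / len(sample):.2%} of original)")
```

Compression ratio is only one axis; compression and decompression time would need to be measured separately.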

------
okreallywtf
There isn't much information but this reads more like an advertisement for
google internships than anything else. Not to denigrate her work, she could
very well be brilliant and have gone above and beyond, but from how it reads
they could be blowing it up to make it seem like every intern has a huge
impact and you could too! Either way good for her, but not sure why this is so
high up on HN.

~~~
bobdole1234
It's almost like Google might be recruiting.

~~~
whyileft
Yup. Voting on HN is very suspect in general and easy to manipulate because
the votes are hidden.

~~~
grzm
If you've got concerns about voting rings or brigades or other irregularities,
your best option is to contact the mods via the Contact link in the footer.

------
iamleppert
This compression technique seems to be based on the fact that they have a
previous installation of an app that can be diffed and patched, so it wouldn't
receive any benefit from first installations, only updates. But it still might
be worth it for many applications. I remember I investigated a way to send and
apply diffs of JavaScript code (using a JS version of patch) and store them in
the browser using localStorage. However, at the time the performance wasn't
good enough when compared in an end-to-end benchmark.

However, this has got me wondering as a general corollary for application
delivery...would it just make more sense to use something like a well-pruned
and compact git repo, and make the connections over HTTP with gzip
compression? I'm not sure how space efficient the git repo is, but it seems
like an interesting project. I'm wary of using any Google technology, open
source or not, if it can be done yourself in an afternoon.

Does such a thing even exist?

~~~
captn3m0
What you want is
[http://www.daemonology.net/bsdiff/](http://www.daemonology.net/bsdiff/)

All objects in git packfiles are already compressed, so you aren't gaining
much by adding another layer of compression.
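
A rough way to see this, as a sketch rather than a measurement of git itself: compress some text with zlib (the same family of compression git uses for objects), then compress the result again. The vocabulary and sizes below are made up for illustration.

```python
import random
import zlib

# Hypothetical stand-in for repository content: pseudo-random words, so the
# text is compressible without being one pattern repeated over and over.
rng = random.Random(0)
vocabulary = [
    "commit", "tree", "blob", "patch", "delta", "object", "index", "branch",
    "merge", "rebase", "remote", "fetch", "push", "clone", "stash", "tag",
]
original = " ".join(rng.choice(vocabulary) for _ in range(50_000)).encode()

once = zlib.compress(original, 9)   # first layer, like an already-packed object
twice = zlib.compress(once, 9)      # a second layer on top, like HTTP gzip

print(f"original:         {len(original)} bytes")
print(f"compressed once:  {len(once)} bytes")
print(f"compressed twice: {len(twice)} bytes")
```

The first pass removes most of the redundancy, so the second pass has very little left to work with and the size barely changes.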

------
jknoepfler
The phrasing makes it clear that this is not intended to wow a tech audience.
It's a Google ad to parents, or something.

------
Yuioup
Mathematicians are the true programmers. I wish I was one.

~~~
treehau5
Even mathematicians rely on abstractions every day to do their jobs. Don't
sell yourself short.

~~~
chanandler_bong
[https://xkcd.com/435/](https://xkcd.com/435/)

------
jedc
"Google Student Blog" // "Google news and updates especially for students"

Important context for this blog post and the comments in this thread.

------
bluedino
On the other end of the spectrum, how much more energy has been used by the
millions of Android phones uncompressing the app, applying the patch, and
re-compressing the data?

------
bricss
Take a look at LZ5 algorithm ->
[https://github.com/inikep/lz5](https://github.com/inikep/lz5)

------
sp332
This page is consistently crashing my Firefox content process. I'm running
51.0.1 64-bit on Win10. Anyone else having this problem?

~~~
gcp
Look in about:crashes, you can click the links and see if there is already a
bug filed for it. Given that you're on release and this is a Google blog
rather than an obscure site, it's unlikely to be a problem in Firefox itself.
Check gfx drivers, plugins, etc.

------
jfasi
I've seen replies about how this is a "simple library swap" and so doesn't
deserve the attention and recognition it has received. As someone who works at
Google but not anywhere remotely near this project, but with experience in
similar projects, I'd like to shed some light on why this isn't a simple
library swap, and seems from far away to have been both a tremendous
accomplishment and a wonderful learning experience.

First off, there is no such thing as a library swap at Google. Our codebase is
quite large. Like shockingly overwhelmingly large. Executing a change like
this is almost certainly not a case of "swapping out one configuration line
for another." It requires writing new code, testing it appropriately, updating
any integration tests, updating documentation, etc. But the real fun starts
when you're done coding...

There's the issue of frontend and backend. Serving Brotli-compressed data is
great, but what if your app doesn't support it? If you're lucky, this will
be handled by the underlying network layer but then you have to deal with...

Rollout. I don't know how many servers are dedicated to app updates, but I
imagine it's a lot. I also imagine they're distributed geographically, across
regions and probably even continents. Getting all those servers to support new
features is a delicate, time consuming process where any misstep _will_ result
in users noticing. It's not coding, but that's why it's called "software
engineering" and not "coding engineering." But then once your servers are
all up and running you have to deal with...

Versioning. Updating backend servers is bad enough, but at least you control
them. What about that zoo of Android versions out in the wild? How do you
ensure they all support this change? Short answer: you don't. You design a
strategy that will allow the rollout to happen gradually over a period of
time, and closely monitor it to make sure nothing unintended is happening.

Then how do you turn down the old feature? When do you turn it down? You need
to build and properly use instrumentation to determine the safest time to kill
off the old feature. Or you could never kill it and commit to paying the cost
in perpetuity. That's a design decision, and not a trivial one.

But, odds are you're not the only feature being rolled out. You have to
anticipate/deal with potential interactions with other features, rollbacks of
other people's work, etc.

I could go on, but I think I've already demonstrated why this is by no means a
trivial accomplishment, even for a full time engineer. Add to this the fact
that every intern has to race against the clock to get ramped up on their
project, making something of this complexity and with this large an impact
happen deserves applause.

I should add, I'm speaking as myself here and not representing Google in any
way.

------
MtL
Makes you wonder how much they'd save by using Courgette, like the Chrome team
does.

------
mnml_
That's like 50 million dollars a year (in egress cost).
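
As a sanity check on that figure, here is a small sketch that works backwards from the two numbers in this thread (1.5 PB saved per day, and the 50 million dollar estimate) to the per-GB price the estimate implies. The real egress cost Google pays is not public, so this only shows what the estimate assumes.

```python
saved_gb_per_day = 1_500_000      # 1.5 PB/day in decimal GB, from the article
claimed_annual_cost = 50_000_000  # dollars per year, the estimate above

saved_gb_per_year = saved_gb_per_day * 365
implied_price_per_gb = claimed_annual_cost / saved_gb_per_year

print(f"{saved_gb_per_year:,} GB/year")           # 547,500,000 GB/year
print(f"${implied_price_per_gb:.3f} per GB")      # $0.091 per GB
```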

------
jordache
she didn't create a compression algorithm.

More akin to enabling GZIP in IIS...

~~~
16bytes
If you had an intern that was responsible for turning GZIP on in IIS for a
website that had 1B users it starts to become much more of an accomplishment.

Even small changes at that scale require careful analysis and coordination.

~~~
jordache
true.

However, my response was more to the clickbaity title giving the automatic
impression that an intern came up with an innovative approach that netted a
tremendous result.

------
jonatron
1.5M GB = 1.5 PB?

------
PedroBatista
Please fix the title, it's 1.5 PB, not GB

~~~
sp332
It says 1.5M GB == 1,500,000 GB == 1.5 PB
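
Spelled out as arithmetic, assuming decimal (SI) units where 1 GB is 10^9 bytes and 1 PB is 10^15 bytes:

```python
GB = 10**9   # decimal (SI) gigabyte, in bytes
PB = 10**15  # decimal (SI) petabyte, in bytes

saved_gb_per_day = 1_500_000                 # "1.5M GB" from the title
saved_bytes_per_day = saved_gb_per_day * GB  # convert to bytes
saved_pb_per_day = saved_bytes_per_day / PB  # convert bytes to petabytes

print(saved_pb_per_day)  # 1.5
```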

~~~
Someone1234
That's confusing.

To use an analogy, imagine if someone wrote: "$1.5M B" instead of 1.5
quadrillion or 1,500,000,000,000,000. You'd be confused, and rightly so. A lot
of people would mistakenly read it either as $1.5M or $1.5B, neither of which
is right.

In this case a lot of people are misreading it as 1.5 GB/Day instead of 1.5
PB/Day.

PS - The way Google uses it in the Blog post is pretty clear, they're
describing what a petabyte is. My issue is with the HN title only.

~~~
leephillips
I'm sure I'm strange, but I find "$1.5M B" clearer than talk of
"quadrillion"s, because I'm not confident I know what power of 10 a
quadrillion is, but when I see M B I just add 6+9. Even then, the British
sometimes say things like "thousand million", because billion over there used
to mean 10^12, but now means 10^9 (which they used to call a milliard); as
with other wordy number-words they've succumbed to American usage¹.

¹[https://en.oxforddictionaries.com/explore/how-many-is-a-bill...](https://en.oxforddictionaries.com/explore/how-many-is-a-billion)

------
demonshalo
So she did this for free? :D please tell me this is a paid internship!

~~~
sp332
Unpaid internships are illegal, if the intern is doing real work for the
company.
[https://www.dol.gov/whd/regs/compliance/whdfs71.htm](https://www.dol.gov/whd/regs/compliance/whdfs71.htm)

~~~
fs111
Newsflash: There are more countries than the US of A and your laws do not
apply here.

~~~
sp332
Yeah but Google is mostly a US company so it was an easy slip to make.

------
diimdeep
Click bait.

~~~
adtac
No it's not. That's literally what I would summarise the entire article as:
Google Intern's work saves over 1.5M GB per day.

------
divbit
I don't know much about gigabytes, but that seems like a lot

edit: (I'm guessing the downvotes are because I phrased it like a meme, but to
clarify, this was a genuine compliment in response to a 'look what this person
did' type post- it's inspiring stuff)

------
painted
So she used a compression algorithm developed by other googlers? So what?

Don't get me wrong, I'm sure she did a lot of work for it, but looks like a
lot of people would have been able to do that, there is nothing innovative in
what she did, right?

~~~
geodel
I think it is mostly a recruitment ad for young IT workers, with a 'Women in
technology' angle as well.

I personally wouldn't be too happy if my modest contributions as a novice were
described as a 'massive improvement for millions of our users'.

~~~
softawre
You wouldn't be happy if somebody gave you a virtual high five for your work
when you were early career?

~~~
GunlogAlm
I'd be happy, but happier if it came without the hyperbole (personally).

------
4twilight
Why doesn't she wear a Pied Piper t-shirt? However, I'm more interested in
whether Erick's position is still vacant in the venture. (Jin Yang's will work
as well for the first 2 years, I suppose.)

