

Flywheel: Google's Data Compression Proxy for the Mobile Web - kid0m4n
http://research.google.com/pubs/pub43447.html

======
tyho
The paper states that although data use is reduced by 58% on average, the page
load time is increased by 6%. Smaller pages are typically slower to load
through Flywheel; however, large pages actually load faster.

They do lots of interesting things beyond gzip. They convert images to WebP
and scale the resolution of the image based upon the resolution of the host
device. They also minify CSS and HTML. What I noticed which I thought was
quite cool was they add ETag headers to improve the client's ability to cache
resources.

They rewrite the response headers to lower case to increase the effectiveness
of compression, and of course everything is served over SPDY (except in special
cases where that fails).

~~~
AceJohnny2
> Smaller pages are typically slower to load through Flywheel; however, large
> pages actually load faster.

I was assuming that in this post-Web 2.0 internet, most webpages were heavy
(like >1MB), but if I read Figure 3 from the paper correctly, I see that
actually about 90% of webpages are <= 500kb.

Hm, considering metered data for many (most?) mobile plans, it makes sense to
save data for the user, but my initial assumption that this would make things
faster for most people turns out to be wrong, as demonstrated by the median
(not average!) load time increasing by 6%, as you point out.

~~~
GeneralMayhem
>median (not average!)

Median is a type of average, and in this case is by far the most important
one.
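
To illustrate why the median is the more telling figure for load times, here
is a small sketch. The load times are made-up numbers for illustration only,
not data from the paper: a single slow page pulls the mean far away from what
a typical page load looks like, while the median barely moves.

```python
import statistics

# Hypothetical page load times in seconds (NOT from the paper);
# one slow outlier skews the mean.
load_times_s = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 9.7]

mean_s = statistics.mean(load_times_s)      # pulled up by the 9.7 s outlier
median_s = statistics.median(load_times_s)  # the middle value of the sorted list

print(f"mean   = {mean_s:.2f} s")
print(f"median = {median_s:.2f} s")
```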

------
denysonique
A compressing proxy is something fantastic for low bandwidth and/or high
latency mobile browsing.

Is the source code available so I can run the proxy on my own private server?

If not, here are some open source alternatives that I have used:

[http://en.wikipedia.org/wiki/Ziproxy](http://en.wikipedia.org/wiki/Ziproxy)

[https://wiki.mozilla.org/Mobile/Janus](https://wiki.mozilla.org/Mobile/Janus)
(written in Node.js)

~~~
13
I did this for a while, and then realized my use of SSL Everywhere combined
with an HTTP-only compression proxy was self-defeating. The only way to get
decent coverage is to have the proxy also bust SSL and then repackage it,
which is completely undesirable because you lose fine-grained control over
certificates.

~~~
deweerdt
Provided you own the proxy, you let the proxy implement the certificate policy
you decide upon? You would then have the browser only trust the certs that the
proxy creates.

------
buro9
It is a pain that papers themselves
[https://static.googleusercontent.com/media/research.google.c...](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43447.pdf)
no longer carry the date of publication.

------
jamescun
Interesting they felt the need to add:

> The majority of Flywheel code is written in Go, a fact we mention only to
> dispel any remaining notion that Go is not a robust, production-ready
> language and runtime environment.

~~~
djhworld
It says in the paper that the system has been running for three years, so maybe
this comment was written at a time when Go was just hitting 1.0.

Or they're affirming the fact that the system has been running successfully
for a long time, which is a good advert for Go as a production ready runtime.

~~~
nostrademons
Yeah, Flywheel was very much an early adopter of Go within Google and was
written at a time when many teams were uncertain about whether to switch.
Since then there've been a lot of commercial success stories both inside of
Google and out (Docker, Kubernetes, Cloudflare, everything that Sawzall used
to be used for inside Google, and a number of backend parts of YouTube), so
using Go isn't nearly as controversial. It was quite a leap when the Flywheel
work was done in 2011.

------
therealmarv
Using it even on desktop Chrome while using a data limited surfstick
[https://twitter.com/therealmarv/status/592303381093425152](https://twitter.com/therealmarv/status/592303381093425152)
Chrome extension: [https://chrome.google.com/webstore/detail/data-compression-p...](https://chrome.google.com/webstore/detail/data-compression-proxy/ajfiodhbiellfpcjjedhmmmpeeaebmep?hl=de)

~~~
motoboi
Official (beta) data saver extension from Google itself:
[https://chrome.google.com/webstore/detail/data-saver-beta/pf...](https://chrome.google.com/webstore/detail/data-saver-beta/pfmgfdlgomnbgkofeojodiodmgpgmkac)

The Google proxy has been built into desktop Chrome for a while; this extension
just enables it (and shows some statistics).

Interestingly, this extension is the only one Chrome allows to use the API
(dataReductionProxy permission on manifest).

~~~
therealmarv
Thanks for pointing to the official beta from Google itself. Did not know it
existed :)

------
fsiefken
I think this is comparable to Opera Turbo, which also runs on iOS and Android.
Opera Mini's stripping gives even better compression but breaks JS apps and
complex sites. There is a Russian reverse-engineered Firefox plugin which
enables Opera Mini support on desktop operating systems. The Opera Mini server
was not commercially available or open source last time I checked. I wonder if
and how a PPM-based compression proxy would be doable, and whether it would
improve on existing browser compression support like Opera Turbo, Opera Mini
or the Google Chrome solution.

I was once somewhere with only an absolutely minimal wifi connection, with
high latency and very low bandwidth (< 1 KB/s). Opera Mini was still slow but
worked wonders where a regular browser was basically unusable.

~~~
motoboi
From the paper:

> The proxy service with the closest design to ours is Opera Turbo [9].
> Although Opera has not published the details of their optimizations or
> operation, we performed a point comparison of Flywheel and Turbo’s data
> reduction gains, and found that Flywheel provides comparable data
> reduction.

~~~
niutech
Here is the comparison between Flywheel, Opera Turbo and Mozilla Janus:
[http://browsingthenet.blogspot.com/2014/09/chrome-data-compr...](http://browsingthenet.blogspot.com/2014/09/chrome-data-compression-proxy-vs-mozilla-janus-vs-opera-turbo.html)

------
xnull2guest
50% savings is not enough, for me, to justify the MITM. I would rather have
data sovereignty than data bandwidth at this tradeoff.

~~~
cbhl
The great thing is, even if you use Chrome, you can turn "Data Saver" off by
going into Settings on Chrome for Android/iOS.

(I'm biased, but I'd personally rather have Google MITM me than a mobile
carrier like Verizon or T-mobile.)

~~~
rjvs
You seem to be conflating the choice; really, it's a choice of adding Google
to the list of MITM or not. This system can't remove your mobile carrier from
the circuit or stop them playing with your data.

~~~
mdwelsh
This is not strictly true. In most cases, Chrome uses an encrypted (HTTP/2)
connection to the Flywheel proxy, which bypasses ISP-side middleboxes.
However, some carriers are downgrading the Flywheel connection to HTTP for the
purposes of implementing adult content filtering; see
[https://support.google.com/chrome/answer/3517349](https://support.google.com/chrome/answer/3517349).
So we do "remove the mobile carrier from the circuit" in most cases.

------
motoboi
I suppose the paper would have shown a substantial latency reduction had they
selected high-latency networks for sampling.

I guess that on a satellite network (~600ms), latency would be greatly reduced
as there is only one (and persistent) TCP connection to the proxy, with HTTP2
(or SPDY) over that (using http2 multiplexing capabilities).

This one TCP connection will be fully established, with a full TCP window,
basically the whole time. TLS will be fully negotiated and set up, leaving
the client with an optimized tunnel to Google's servers.

Considering that most latency on the path is on the satellite part, and Google's
servers are on the other side of it, TCP handshakes and SSL setups to
destination servers will occur on that low-latency side. Google will just push
the optimized content over the tunnel to clients.

Looks like paradise.

TL;DR: Satellite clients won't open 4 connections to servers, wait for 4
TCP handshakes and slowly open those 4 connections' TCP windows for every site
visited. They will open one connection to a Google server (on the other side of
the satellite link) and let Google do the hard work. High-latency paradise!

------
falcolas
Seems remarkably similar to what Opera was doing with the Opera Mini, though
not quite as intrusive.

I remember really appreciating how quickly pages rendered on that browser,
compared to the stock chrome when I was using android.

~~~
andreastt
It seems more similar in nature to Opera Turbo, which does a subset of the
things described here.

------
bootload
_" HTTP proxy service that extends the life of mobile data plans by
compressing responses in-flight between origin servers and client browsers."_

so this is part of the google mobile strategy.

 _" Flywheel is integrated with the Chrome web browser and reduces the size of
proxied web pages by 50% for a median user."_

is there any implementation of this outside chrome, OS?

~~~
mdwelsh
I am the tech lead on the Flywheel proxy. The feature is currently launched
for Chrome on Android, iOS, and desktop (including ChromeOS). There is no plan
to implement this outside of Chrome.

~~~
bootload
_" The feature is currently launched for Chrome on Android, iOS, and desktop
(including ChromeOS)."_

thx Matt, reading for background: 'Making the mobile web fast' ~ [http://matt-welsh.blogspot.com.au/2011/05/what-im-working-on...](http://matt-welsh.blogspot.com.au/2011/05/what-im-working-on-at-google-making.html)

------
pdknsk
> Flywheel compresses all text responses using GZip.

Since this is for Chrome only, I'm surprised they don't use the other possible
compression format: zlib (confusingly named deflate in the RFC). It's 12 bytes
less and uses a faster to compute ADLER32 checksum, compared to CRC32 in gzip.
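
The 12-byte difference can be checked with Python's `zlib` module, which can
wrap the same DEFLATE stream in either container depending on the `wbits`
argument. This is only a sketch: the sample payload and the helper name
`deflate_with` are made up for illustration.

```python
import zlib


def deflate_with(wbits, data, level=6):
    """Compress data with the container format selected by wbits."""
    comp = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()


# Made-up HTML-like payload, just to have something compressible.
sample = b"<html><body>" + b"<p>hello mobile web</p>" * 200 + b"</body></html>"

raw = deflate_with(-15, sample)  # raw DEFLATE stream, no container
zl = deflate_with(15, sample)    # zlib: 2-byte header + 4-byte Adler-32
gz = deflate_with(31, sample)    # gzip: 10-byte header + 4-byte CRC-32 + 4-byte length

print("zlib overhead:", len(zl) - len(raw), "bytes")
print("gzip overhead:", len(gz) - len(raw), "bytes")
print("gzip - zlib:  ", len(gz) - len(zl), "bytes")
```

The saving is a fixed per-response cost, so it only matters relative to the
body size for very small responses.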

~~~
grrowl
> For example, measurements of Flywheel’s workload show that 42% of HTML bytes
> on the web that would benefit from compression are uncompressed, despite
> GZip being universally supported in modern web browsers [13]. This is in
> part because GZip is not enabled by default on most web servers, yet only
> a single-line change to the server configuration is needed.

So really, it's a gzipping proxy written in Go. Hopefully they extend the
protocol to better compression in future.

~~~
mdwelsh
The proxy does much more than just gzip; this paragraph refers to only the
impact of gzip on the Web (which was surprising to us given how widespread we
assumed it would be).

------
djhworld
I went to an interview a year or so ago at a startup that was doing this. They
sold it as a service to corporations to help cut the cost of employees using
too much mobile data, along with other security features etc.

Interesting idea I guess.

------
allcentury
but what's their Weissman score?

~~~
theunixbeard
Approaching the theoretical limit of 2.89 if I'm not mistaken.

------
jeltz
Are there still that many websites which have not enabled gzip to make this
kind of proxy worth it? Enabling gzip is trivial.

~~~
buro9
[http://www.bbc.co.uk/](http://www.bbc.co.uk/)

You'd be surprised how many sites still do not enable gzip.

~~~
jacquesm
If you're CPU-bound it might make sense not to gzip. The same goes if you send
a lot of small files (in that case you would probably do better to avoid
sending so many small files, but in general a single web page requests tens if
not hundreds of small resources these days).

~~~
buro9
BBC site headers show heavy use of cache, they could have easily cached the
compressed version and VARY'd on the ACCEPTability preference of the user-
agent to handle the compressed version.
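
A minimal sketch of that idea: compress once, cache both variants, and pick
one per request based on `Accept-Encoding` while sending `Vary:
Accept-Encoding` so downstream caches keep the variants apart. The function
names and the simplified header parsing are my own assumptions, and this is
not how the BBC or any particular server does it.

```python
import gzip


def build_cache(body):
    """Precompute both variants once, keyed by content encoding."""
    return {"identity": body, "gzip": gzip.compress(body)}


def accepts_gzip(accept_encoding):
    """Simplified Accept-Encoding check: True if gzip is listed with q > 0.

    Deliberately ignores wildcards ("*") and other subtleties of the header.
    """
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def respond(cache, accept_encoding):
    """Return (headers, body) for one request, using the cached variants."""
    # Vary goes on every response so caches key on the request's Accept-Encoding.
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        return headers, cache["gzip"]
    return headers, cache["identity"]


page = b"<html>" + b"<p>news</p>" * 500 + b"</html>"  # made-up page body
cache = build_cache(page)
headers, body = respond(cache, "gzip, deflate")
print(headers, len(body), "bytes instead of", len(page))
```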

~~~
acdha
That's how it should work but until relatively recently it had some annoying
hitches with clients which claimed to support gzip but were buggy[1], which
meant you had to maintain a list of clients to never enable compression for.
One of those clients is IE6 so I'm not terribly surprised that they put it off
– for a site as widely visited as the BBC, even a small percentage of visitors
means a potentially large number of complaints.

I still don't think that excuses not implementing it by now but I'd bet the
explanation starts with some engineer having a bad week and not wanting to
relive the experience.

1\. Some major CDNs and caching proxies like nginx also haven't bothered to
implement Vary but that doesn't matter for this particular scenario since they
don't appear to be using any of them.

------
manigandham
For similar products today: CloudFlare and Instart Logic have capabilities to
do this.

------
maximveksler
Sure does feel like [https://www.onavo.com](https://www.onavo.com) was at
least taken as inspiration for designing this system.

------
dheera
Or is this simply a service to spy on what people are browsing so that they
can improve PageRank?

~~~
xnull2guest
Not simply, but yes it will be used for that. And due to the third party
doctrine it is information that is valid to request for Total Information
Awareness.

~~~
mdwelsh
I am the tech lead on the Flywheel proxy. We do not use the data passing
through the proxy for any purpose other than compression. This is covered by
the Chrome privacy policy which you can read here:

[https://www.google.com/chrome/browser/privacy/](https://www.google.com/chrome/browser/privacy/)

~~~
xnull2guest
I hope this is and remains the case, but it is not in your control.
Decisions to include this information in personalization of search, or to give
law enforcement access to it, are not technical decisions, and they will
be considered in the future after the feature has been broadly adopted.
Similarly, any EULA or privacy terms are subject to change.

If you did not serve up this information on request even though you had access
to it, your company WOULD be breaking the law - this has already been settled
in court. It is my conclusion that, as law-abiding citizens and a law-abiding
company, you would serve such requests. I also tangentially believe that if you
felt the information would be useful to create a better product (and/or make
more money) you would do this, as you are also compelled by law to maximize
profit for shareholders and are incentivized to do so by financial
compensation. I see no reason why your company would break the law to restrict
the scope of this feature.

Furthermore, I can not tell where it says on the privacy policy that you do
not or will not use this information for personalization of search or in
response to compulsion of law. I saw a mention in the privacy policy of the
compression proxy, but nothing relevant to Google not using the information or
being able to provide it to a third party upon compulsion. Could you point me
to that?

~~~
mdwelsh
This line: "You do not need to provide any personally identifying information
in order to use Chrome" is intended to cover pretty much all of the cases you
describe. I'm not going to speculate on legal issues.

~~~
xnull2guest
I read that line a very different way and on its face it does mean something
very different. The (very good) lawyers at Google may want to look at how this
is phrased if it is intended to mean more than what it says plainly.

No need to speculate as appellate courts (and Google's own recent trials) have
made the law quite clear.

