Flywheel: Google's Data Compression Proxy for the Mobile Web (research.google.com)
125 points by kid0m4n 963 days ago | 60 comments



The paper states that although data use is reduced by 58% on average, page load time increases by 6%. Smaller pages are typically slower to load through Flywheel; large pages, however, actually load faster.

They do lots of interesting things beyond gzip. They convert images to WebP and scale the resolution of the image based upon the resolution of the host device. They also minify CSS and HTML. What I noticed which I thought was quite cool was they add ETag headers to improve the client's ability to cache resources.
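
To illustrate the ETag point, here is a minimal sketch of how a validator lets a client skip re-downloading an unchanged resource. The hashing scheme and the function names (`make_etag`, `respond`) are assumptions made up for this example, not anything described in the paper.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the body (hypothetical scheme for this sketch)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes,
            if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload), honoring an If-None-Match header.

    Simplified: ignores the "*" wildcard and weak (W/) validators.
    """
    etag = make_etag(body)
    headers = {"etag": etag}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates:
            # The client's cached copy is still current: send no body.
            return 304, headers, b""
    return 200, headers, body
```

A first request gets a 200 with the `etag` header; a repeat request that echoes the tag back in If-None-Match gets a bodiless 304, which is where the data saving comes from.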

They rewrite the response headers to lower case to increase the effectiveness of compression and of course everything is served over SPDY (except in special cases where that fails)
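
A rough sketch of why lower-casing header names can help: a compressor primed with a dictionary of lower-case names finds longer matches when the input uses the same case. The tiny dictionary `ZDICT` and the sample headers are assumptions for illustration only; SPDY defines its own, much larger dictionary.

```python
import zlib

# Toy preset dictionary of lower-case header names (assumed for this sketch).
ZDICT = b"content-type: content-length: cache-control: last-modified: "

HEADERS = [
    ("Content-Type", "text/html"),
    ("Content-Length", "5120"),
    ("Cache-Control", "max-age=3600"),
    ("Last-Modified", "Tue, 05 May 2015 10:00:00 GMT"),
]


def lower_names(headers):
    """Lower-case the header names, leaving values untouched."""
    return [(name.lower(), value) for name, value in headers]


def compressed_size(headers):
    """Size in bytes of the header block after deflate with the preset dictionary."""
    raw = "".join(f"{name}: {value}\r\n" for name, value in headers).encode("ascii")
    compressor = zlib.compressobj(level=9, zdict=ZDICT)
    return len(compressor.compress(raw) + compressor.flush())


print(compressed_size(HEADERS), compressed_size(lower_names(HEADERS)))
```

The second number printed should be the smaller one, since every lower-cased name matches the dictionary in a single run instead of being split at each capital letter.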


> Smaller pages are typically slower to load through Flywheel; large pages, however, actually load faster.

I was assuming that in this post-Web 2.0 internet, most webpages were heavy (like >1MB), but if I read Figure 3 from the paper correctly, I see that actually about 90% of webpages are <= 500kb.

Hm, considering metered data for many (most?) mobile plans, it makes sense to save data for the user, but my initial assumption that this would make things faster for most people turns out to be wrong, as demonstrated by the median (not average!) load time increasing by 6%, as you point out.


>median (not average!)

Median is a type of average, and in this case is by far the most important one.
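
A quick illustration with made-up numbers (the load times below are hypothetical, not from the paper): one slow outlier drags the mean far away from what a typical user sees, while the median stays put.

```python
from statistics import mean, median

# Hypothetical page load times in milliseconds, with one slow outlier.
load_times_ms = [800, 900, 1000, 1100, 6200]

print(mean(load_times_ms))    # 2000
print(median(load_times_ms))  # 1000
```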


So basically the PageSpeed optimizations?


I'm the tech lead for the Flywheel proxy. The original version of Flywheel actually started with PageSpeed (with a bunch of customizations to focus on compression). You can indeed emulate Flywheel's optimizations with the appropriate configuration options for PageSpeed. We ended up rewriting Flywheel (in Go) in part because we didn't need all of the complexity that PageSpeed provides, but also to streamline the process of running the service in Google's datacenters, rather than in an Apache or NGINX environment.


A compressing proxy is something fantastic for low bandwidth and/or high latency mobile browsing.

Is the source code available so I can run the proxy on my own private server?

If not, here are some open source alternatives that I have used:

http://en.wikipedia.org/wiki/Ziproxy

https://wiki.mozilla.org/Mobile/Janus (written in Node.js)


I couldn't find any compression proxy with man in the middle support so I hacked something together a while ago for personal use:

https://github.com/barnacs/compy

If anyone got motivated by the paper or feels like improving it, contributions are welcome.


I did this for a while, and then realized my use of SSL Everywhere combined with an HTTP-only compression proxy was self-defeating. The only way to get decent coverage is to have the proxy also bust SSL and then repackage it, which is completely undesirable because you lose fine-grained control over certificates.


Provided you own the proxy, you let the proxy implement the certificate policy you decide upon? You would then have the browser only trust the certs that the proxy creates.


I also want to build a self-hosted data compression proxy of my own. But I think the more web sites support SSL connections, the less impact such a proxy will have on data use in the future. The web is moving toward an all-encrypted internet, so is this kind of technology only a temporary thing? Or is there a way to reduce data transfer under the SSL protocol?


You can MITM yourself with no problem. Firefox has an option to ignore pinned certificates if you have a valid user-installed certificate as a trusted root.


More of a "best practices" proxy, if you will, than a compression proxy: https://developers.google.com/speed/pagespeed/module/filters


It is a pain that papers themselves https://static.googleusercontent.com/media/research.google.c... no longer carry the date of publication.


Interesting they felt the need to add:

> The majority of Flywheel code is written in Go, a fact we mention only to dispel any remaining notion that Go is not a robust, production-ready language and runtime environment.


It says in the paper that the system has been running for three years; maybe this comment was written at a time when Go was just hitting 1.0.

Or they're affirming the fact that the system has been running successfully for a long time, which is a good advert for Go as a production ready runtime.


Yeah, Flywheel was very much an early adopter of Go within Google and was written at a time when many teams were uncertain about whether to switch. Since then there've been a lot of commercial success stories both inside of Google and out (Docker, Kubernetes, Cloudflare, everything that Sawzall used to be used for inside Google, and a number of backend parts of YouTube), so using Go isn't nearly as controversial. It was quite a leap when the Flywheel work was done in 2011.


Using it even on desktop Chrome while using a data limited surfstick https://twitter.com/therealmarv/status/592303381093425152 Chrome extension: https://chrome.google.com/webstore/detail/data-compression-p...


Official (beta) data saver extension from Google itself: https://chrome.google.com/webstore/detail/data-saver-beta/pf...

The Google proxy has been built into desktop Chrome for a while; this extension just enables it (and shows some statistics).

Interestingly, this extension is the only one Chrome allows to use the API (dataReductionProxy permission on manifest).


Thanks for pointing to the official beta from Google itself. Did not know it existed :)


I think this is comparable to Opera Turbo, which also runs on iOS and Android. Opera Mini's stripping gives even better compression but breaks JS apps and complex sites. There is a Russian reverse-engineered Firefox plugin which enables Opera Mini support on desktop operating systems. The Opera Mini server was not commercially available or open source last time I checked. I wonder if and how a PPM-based compression proxy would be doable, and whether it would improve on existing browser compression support like Opera Turbo, Opera Mini or the Google Chrome solution.

I was once somewhere with only an absolutely minimal wifi connection, with high latency and very low bandwidth (< 1 KB/s). Opera Mini was still slow but worked wonders where a regular browser was basically unusable.


From the paper:

> The proxy service with the closest design to ours is Opera Turbo [9]. Although Opera has not published the details of their optimizations or operation, we performed a point comparison of Flywheel and Turbo’s data reduction gains, and found that Flywheel provides comparable data reduction.


Here is the comparison between Flywheel, Opera Turbo and Mozilla Janus: http://browsingthenet.blogspot.com/2014/09/chrome-data-compr...


50% savings is not enough, for me, to justify the MITM. I would rather have data sovereignty than data bandwidth at this tradeoff.


I am the tech lead on the Flywheel proxy.

Flywheel does not MITM SSL connections; it does not proxy SSL at all. If you just mean "MITM" in a generic sense, the fact that you believe you have "data sovereignty" is interesting since we're talking about unencrypted HTTP here. Nearly all ISPs and mobile carriers undertake proxying and extensive analysis and manipulation of in-the-clear HTTP traffic. We agree that users need to opt into this feature -- since you have to trust Google to proxy your traffic, after all -- but it's important to keep in mind that many other parties on the path between you and a website already do transparently proxy your unencrypted traffic.


I do mean MITM in a generic sense.

Regarding the Third Party Doctrine, the fewer third parties with access to the information, the more sovereign over the data the remaining parties are. But I agree with you that ISPs and mobile carriers are other companies who intercept, profile, sell and partner away plaintext information - and indeed in the case of at least some mobile carriers encrypted communications too.

I agree with the assessment that this is an opt in feature. I have spoken about the reasons I won't be opting in. (In my opinion it is a very bad trade.)

Definitely agreed that many other parties have access to the data. I disagree that this is an argument to add another.

[By the way, thank you very much for taking the time to speak on HN about Flywheel from your position. :)]


I absolutely agree with your rationale for not wanting to opt in; we appreciate that this is a highly privacy-sensitive topic and one that users need to make up their own minds about.


The great thing is, even if you use Chrome, you can turn "Data Saver" off by going into Settings on Chrome for Android/iOS.

(I'm biased, but I'd personally rather have Google MITM me than a mobile carrier like Verizon or T-mobile.)


You seem to be conflating the choice; really, it's a choice of adding Google to the list of MITM or not. This system can't remove your mobile carrier from the circuit or stop them playing with your data.


This is not strictly true. In most cases, Chrome uses an encrypted (HTTP/2) connection to the Flywheel proxy, which bypasses ISP-side middleboxes. However, some carriers are downgrading the Flywheel connection to HTTP for the purposes of implementing adult content filtering; see https://support.google.com/chrome/answer/3517349. So we do "remove the mobile carrier from the circuit" in most cases.


Why do you trust an advertising company more than a telecom?


For me, it is because there is an expectation that my data will pass through the telecom. Not only is it expected, but as the end user I have little choice in the matter. If Eve wanted to inspect my traffic, that is a one-stop shop.

Flywheel ends up masquerading some of that traffic, so if for no other reason, it is atypical. I also perceive that it would be in Google's best interest not to abuse that privilege since advertisements are how they make their money. If they abuse that privilege, consumers will go elsewhere and they will lose their market advantage.

You need only look at the cookie tracking the telecoms are doing right now to see that their oligopoly gives them little incentive to respect consumers' privacy.


> their oligopoly gives them little incentive to respect consumers' privacy

It's true they may not have much incentive to protect your privacy (besides perhaps competition from "better" companies and/or legislation).

But also keep in mind that Google has a huge incentive to breach your privacy, and has been taken to court over it numerous times.


They mentioned that usage (it is an opt-in feature) was far higher in developing countries, where data bills can be 11-25% of income.


I suppose the paper would show a substantial latency reduction if they had selected high-latency networks for sampling.

I guess that on a satellite network (~600ms), latency would be greatly reduced as there is only one (and persistent) TCP connection to the proxy, with HTTP2 (or SPDY) over that (using http2 multiplexing capabilities).

This one TCP connection will be fully established, with a full TCP window, basically the whole time. TLS will be fully negotiated and set up, leaving the client with an optimized tunnel to Google's servers.

Considering that most of the latency on the path is on the satellite part, and Google's servers are on the other side of it, TCP handshakes and SSL setups to destination servers will occur on that low-latency side. Google will just push the optimized content over the tunnel to clients.

Looks like paradise.

TL;DR: Satellite clients won't open 4 connections to servers, wait for 4 TCP handshakes and slowly open those 4 connections' TCP windows for every site visited. They will open one connection to a Google server (on the other side of the satellite link) and let Google do the hard work. High-latency paradise!
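
A back-of-the-envelope version of that argument, as a sketch. The 600 ms satellite round trip is the figure from the comment; the 20 ms proxy-to-origin round trip, and the handshake costs (1 round trip for TCP, 2 more for a full TLS handshake), are assumptions. It ignores DNS, slow start and the request itself.

```python
def handshake_ms(rtt_ms, tls=False):
    """Delay before the first request can be sent on a fresh connection.

    Assumes 1 round trip for the TCP handshake and, if tls is set,
    2 more for a full TLS handshake.
    """
    round_trips = 1 + (2 if tls else 0)
    return rtt_ms * round_trips


SATELLITE_RTT_MS = 600        # figure used in the comment above
PROXY_TO_ORIGIN_RTT_MS = 20   # assumed for illustration

# Direct: every new origin connection pays its handshake over the satellite link.
direct_http = handshake_ms(SATELLITE_RTT_MS)
direct_tls = handshake_ms(SATELLITE_RTT_MS, tls=True)

# Via proxy: the client-to-proxy tunnel is already open, so only the
# proxy-to-origin handshake remains, on the low-latency side.
via_proxy_http = handshake_ms(PROXY_TO_ORIGIN_RTT_MS)
via_proxy_tls = handshake_ms(PROXY_TO_ORIGIN_RTT_MS, tls=True)

print(direct_http, via_proxy_http)  # 600 20
print(direct_tls, via_proxy_tls)    # 1800 60
```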


Seems remarkably similar to what Opera was doing with the Opera Mini, though not quite as intrusive.

I remember really appreciating how quickly pages rendered on that browser, compared to the stock chrome when I was using android.


It seems more similar in nature to Opera Turbo, which does a subset of the things described here.


"HTTP proxy service that extends the life of mobile data plans by compressing responses in-flight between origin servers and client browsers."

so this is part of the google mobile strategy.

"Flywheel is integrated with the Chrome web browser and reduces the size of proxied web pages by 50% for a median user."

is there any implementation of this outside chrome, OS?


I am the tech lead on the Flywheel proxy. The feature is currently launched for Chrome on Android, iOS, and desktop (including ChromeOS). There is no plan to implement this outside of Chrome.


"The feature is currently launched for Chrome on Android, iOS, and desktop (including ChromeOS)."

thx Matt, reading for background: 'Making the mobile web fast' ~ http://matt-welsh.blogspot.com.au/2011/05/what-im-working-on...


> Flywheel compresses all text responses using GZip.

Since this is for Chrome only, I'm surprised they don't use the other possible compression format: zlib (confusingly named deflate in the RFC). It's 12 bytes less and uses a faster to compute ADLER32 checksum, compared to CRC32 in gzip.
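
The 12-byte figure can be checked with Python's `zlib` module, which can emit the same DEFLATE stream bare, in a zlib wrapper, or in a gzip wrapper depending on `wbits`. The sample payload is made up; this is only a sketch of the framing overhead, not a benchmark of checksum speed.

```python
import zlib


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress data; wbits selects the wrapper around the DEFLATE stream."""
    compressor = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=wbits)
    return compressor.compress(data) + compressor.flush()


data = b"<html><body>" + b"hello flywheel " * 50 + b"</body></html>"

raw = deflate(data, -15)  # raw DEFLATE, no header or trailer
zl = deflate(data, 15)    # zlib wrapper: 2-byte header + 4-byte Adler-32
gz = deflate(data, 31)    # gzip wrapper: 10-byte header + CRC-32 + 4-byte length

print(len(zl) - len(raw))  # 6
print(len(gz) - len(raw))  # 18
print(len(gz) - len(zl))   # 12
```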


> For example, measurements of Flywheel’s workload show that 42% of HTML bytes on the web that would benefit from compression are uncompressed, despite GZip being universally supported in modern web browsers [13]. This is in part because GZip is not enabled by default on most web servers, yet only a single-line change to the server configuration is needed.

So really, it's a gzipping proxy written in Go. Hopefully they extend the protocol to better compression in future.


The proxy does much more than just gzip; this paragraph refers to only the impact of gzip on the Web (which was surprising to us given how widespread we assumed it would be).


I interviewed at a startup a year or so ago that was doing this; they sold it as a service to corporations to help cut the cost of employees using too much mobile data, along with other security features.

Interesting idea I guess.


but what's their Weissman score?


Approaching the theoretical limit of 2.89 if I'm not mistaken.


Are there still that many websites which have not enabled gzip to make this kind of proxy worth it? Enabling gzip is trivial.


http://www.bbc.co.uk/

You'd be surprised how many sites still do not enable gzip.


If you're CPU-bound it might make sense not to gzip. The same goes if you send a lot of small files (in that case you would probably do better to avoid sending many small files, but in general a single web page requests tens if not hundreds of small resources these days).


BBC site headers show heavy use of caching; they could easily have cached the compressed version and sent Vary: Accept-Encoding so that it is only served to user agents that accept gzip.
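
A minimal sketch of that idea: a cache that keeps one variant per URL and content coding, and labels responses with Vary so downstream caches do the same. The class and method names are hypothetical, and q-values in Accept-Encoding are ignored for brevity.

```python
import gzip


class VaryingCache:
    """Toy cache storing one variant per (url, content-coding) pair."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _coding(accept_encoding: str) -> str:
        # Simplification: ignores q-values such as "gzip;q=0".
        tokens = [t.split(";")[0].strip().lower() for t in accept_encoding.split(",")]
        return "gzip" if "gzip" in tokens else "identity"

    def get(self, url, accept_encoding, fetch):
        """Return (headers, body), compressing once per variant on a cache miss."""
        coding = self._coding(accept_encoding)
        key = (url, coding)
        if key not in self._store:
            body = fetch(url)
            if coding == "gzip":
                body = gzip.compress(body)
            self._store[key] = body
        headers = {"vary": "Accept-Encoding"}
        if coding == "gzip":
            headers["content-encoding"] = "gzip"
        return headers, self._store[key]
```

Clients that send `Accept-Encoding: gzip` share one compressed copy, and clients that do not get the identity variant, so the compression cost is paid once rather than per request.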


That's how it should work but until relatively recently it had some annoying hitches with clients which claimed to support gzip but were buggy[1], which meant you had to maintain a list of clients to never enable compression for. One of those clients is IE6 so I'm not terribly surprised that they put it off – for a site as widely visited as the BBC, even a small percentage of visitors means a potentially large number of complaints.

I still don't think that excuses not implementing it by now but I'd bet the explanation starts with some engineer having a bad week and not wanting to relive the experience.

1. Some major CDNs and caching proxies like nginx also haven't bothered to implement Vary but that doesn't matter for this particular scenario since they don't appear to be using any of them.


For similar products today: CloudFlare and Instart Logic have capabilities to do this.


Sure does feel like https://www.onavo.com was at least taken as inspiration for designing this system.


Or is this simply a service to spy on what people are browsing so that they can improve PageRank?


Not simply, but yes it will be used for that. And due to the third party doctrine it is information that is valid to request for Total Information Awareness.


I am the tech lead on the Flywheel proxy. We do not use the data passing through the proxy for any purpose other than compression. This is covered by the Chrome privacy policy which you can read here:

https://www.google.com/chrome/browser/privacy/


I hope this is and remains the case, but it is not in your control. Decisions to include this information in personalization of search, or to give law enforcement access to it, are not technical decisions, and they are something that will be considered in the future after the feature has been broadly adopted. Similarly, any EULA or privacy terms are subject to change.

If you did not serve up this information on request even though you had access to it, your company WOULD be breaking the law - this has already been settled in court. It is my conclusion that, as law-abiding citizens and a law-abiding company, you would serve such requests. I also tangentially believe that if you felt the information would be useful to create a better product (and/or make more money) you would do this, as you are also compelled by law to maximize profit for shareholders and are incentivized to do so by financial compensation. I see no reason why your company would break the law to restrict the scope of this feature.

Furthermore, I can not tell where it says on the privacy policy that you do not or will not use this information for personalization of search or in response to compulsion of law. I saw a mention in the privacy policy of the compression proxy, but nothing relevant to Google not using the information or being able to provide it to a third party upon compulsion. Could you point me to that?


This line: "You do not need to provide any personally identifying information in order to use Chrome" is intended to cover pretty much all of the cases you describe. I'm not going to speculate on legal issues.


I read that line a very different way and on its face it does mean something very different. The (very good) lawyers at Google may want to look at how this is phrased if it is intended to mean more than what it says plainly.

No need to speculate as appellate courts (and Google's own recent trials) have made the law quite clear.


Just another vector for google to collect user data...


Note that use of the proxy is completely anonymous and we have no way of tying the traffic to a particular user.



