How to Combine GZip + CDN for Fastest Page Loads (alfajango.com)
28 points by JangoSteve on June 11, 2010 | 28 comments

Amazon CloudFront does support gzip, just not dynamically. So it requires uploading two versions of every CDN file, one uncompressed and one compressed, being careful to set the Content-Encoding and Content-Type headers correctly. Then all the pages on your site (assuming they're all dynamically generated, of course) need to decide whether the client can handle gzip and link to the appropriate gzip/non-gzip version in the Amazon CDN.
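
For anyone who wants to try the two-versions approach, here is a minimal sketch in Python (not the Perl mentioned in this thread) of preparing both objects and their headers. The `.gz` naming scheme and the `build_variants` helper are assumptions made for illustration, and the actual upload call to S3 is left out.

```python
import gzip
import mimetypes
import posixpath


def gzip_key(key: str) -> str:
    """Hypothetical naming scheme: css/site.css -> css/site.gz.css."""
    root, ext = posixpath.splitext(key)
    return f"{root}.gz{ext}"


def build_variants(key: str, body: bytes) -> list:
    """Return the plain and the gzipped object to upload, with their headers."""
    content_type = mimetypes.guess_type(key)[0] or "application/octet-stream"
    plain = {
        "key": key,
        "body": body,
        "headers": {"Content-Type": content_type},
    }
    zipped = {
        "key": gzip_key(key),
        "body": gzip.compress(body),
        "headers": {
            # Same media type as the original file.
            "Content-Type": content_type,
            # Tells the browser to decompress before using the body.
            "Content-Encoding": "gzip",
        },
    }
    return [plain, zipped]
```

Each dict would then be handed to whatever S3 client you use, with the headers set as object metadata at upload time.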

The real "fun" begins when you have CSS or javascript files that reference other files in the CDN. You have to fix them to link to the correct gzip/non-gzip version of the files.
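
A rough sketch of that fix-up step for CSS, again in Python rather than Perl. It assumes the hypothetical `.gz` naming scheme (`site.css` becomes `site.gz.css`) and that only text assets have a gzip twin, so image references are left alone. You would run it over the copy of the stylesheet that is about to be gzipped.

```python
import posixpath
import re

# Assumption: only these text formats were uploaded in a gzipped variant.
COMPRESSIBLE = {".css", ".js", ".svg"}

# Matches url(path), url('path') and url("path").
URL_REF = re.compile(r"""url\(\s*(['"]?)([^'")\s]+)\1\s*\)""")


def to_gzip_refs(css: str) -> str:
    """Point url(...) references at the gzipped twin of each text asset."""

    def swap(match):
        quote, path = match.group(1), match.group(2)
        root, ext = posixpath.splitext(path)
        if ext.lower() not in COMPRESSIBLE:
            return match.group(0)  # e.g. images: keep the reference as it is
        return f"url({quote}{root}.gz{ext}{quote})"

    return URL_REF.sub(swap, css)
```

References that carry a query string or that are built dynamically in JavaScript are not handled here and would need their own treatment.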

With some Perl and trial-and-error it can be made to work. There are some Amazon S3 modules in CPAN, which is a big help.

The big win for Amazon CDN is the low cost to experiment: Pay as you go. Our bill for the first 3 months of playing around with S3 and the CDN (this is just for internal testing mind you) was less than a dollar. Looking at SimpleCDN, they want a minimum of $500 for a month of service before you can even access the full dashboard.

All the other CDNs I've looked at require a phone call to a sales and marketing droid to set up an account, when all I really want is a trial account and some API docs.

Amazon CloudFront supports serving gzipped files; it does not support reading the incoming request to see if the client accepts gzipped files. If you read the first part of the original article, it explains why that doesn't work. It only works when you dynamically generate the page on EVERY request, which makes caching impossible. Hence, it's never been a viable option for us, which is what led to this article.

No, I understand. I implemented the first suggested "fix" in your article that you say "won't work."

You seem to think caching two copies of a file is unreasonable (it all depends on how big your site is I guess), but in our case all pages are dynamic and uncachable so this requirement doesn't apply to us.

Ah, yes, if your site is uncachable anyway, then that really is the way to go. Caching two copies isn't unreasonable I guess, but in the context of Rails (which is the unstated context of this solution), it's more of a pain in the ass than it's worth.

[Edit: I updated the post to be clear that this solution does work if your page is uncachable anyway.]

This is a big win when you use ESI, so your actual page content is cached separately from your header information. You can even push the decision about which asset links to use out to the edge. Goooooo Varnish!


"I won’t go into detail about how to actually accomplish this, because the truth is, this won’t work either. " => the truth is.. you fail ;)

But yeah, nice loading graph and +1 for the mathematical/logical alternative solution.

Good article

EdgeCast CDN offers dynamic gzipping. You can also force a cache miss/refresh by altering the query string, which is really helpful if you are changing content. They are a bit more expensive though (about 25% more). I also had to go through a sales/marketing person to get a trial, which is a shame because their product was actually better than CloudFront, but I almost didn't try it out because I hate going through sales people.

Will someone please correct me if I'm wrong. But why is everyone worrying? Which browsers DON'T actually support gzip compression?

http://schroepl.net/projekte/mod_gzip/browser.htm This seems to suggest that all modern browsers support it.

IE. From your own link.

IE 5.5 and 6.0. Neither of which is a modern browser.

slightly ironic: that site is loading terribly slow.

Yes I apologize, the reason the site is so slow is simple: I haven't implemented any of this on that site yet. I'll be sure to do that before the next article though. If you'd like to see the effect of this actually implemented, check out http://www.ratemystudentrental.com

I haven't used their service, but SimpleCDN advertises their support for gzip right on the front page, in big letters. That might be a good alternate option to consider, if you have a bunch of text files that need CDN hosting.

Awesome, thanks! I actually meant to link to a couple CDNs that support it, but completely forgot to add those links to the article.

The problem with most of the CDNs that support gzipping is that they are much more expensive than Amazon CloudFront, and usually not pay-as-you-go. For instance, according to SimpleCDN's pricing page, the lowest tier for HotCache (their service that supports gzipping) is $500/mo. It's cheaper than CloudFront at the large scales it's meant for, but for most people it's overkill and too expensive.

I would think that any CDN would support gzipping, assuming you can set the content headers explicitly.

Amazon Cloudfront doesn't, but there are workarounds like this: http://blog.kenweiner.com/2009/08/serving-gzipped-javascript...

I wouldn't call it a workaround. You gzip it and set the content-encoding


I explained why this doesn't work well in the part of the OP article entitled "solutions that don't work".

You discussed server-side rewriting, but this is client-side. If the user's browser supports gzip, then the gzipped javascript file gets read properly and sets the flag, and further requests get directed to the gzipped versions. This doesn't defeat the speed improvement of the CDN, and doesn't add much complexity to the filesystem or code.

This is quite different from reading the request encoding in PHP, Python, etc, on the server side that you mention.

No, that solution requires that the client be able to accept gzip encoding. The problem is, the server needs to be able to read the incoming HTTP request to see if "gzip" is contained in the HTTP:Accept-encoding header and only serve the gzipped version if the answer is YES (and otherwise serve the non-gzipped version). Apache and most other servers do this easily. Amazon S3 and CloudFront do not.

That article ignores this fact, and just always serves the gzipped version with the gzip header encoding, whether the client accepts it or not. This works for probably 90-95% of the time (or more), but for anyone with a non-trivial app, that's not good enough.
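
For reference, the header check that Apache does and S3/CloudFront don't looks roughly like this. It is a simplified Python sketch, not a full content-negotiation implementation; `accepts_gzip` and `pick_key` are illustrative names, and the `.gz` key scheme is an assumption.

```python
import posixpath


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if an Accept-Encoding header value allows gzip."""
    wildcard = None
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        coding = token.strip().lower()
        params = params.strip().lower()
        q = 1.0  # a coding without a q-value is fully acceptable
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        if coding in ("gzip", "x-gzip"):
            return q > 0  # an explicit gzip entry wins over "*"
        if coding == "*":
            wildcard = q > 0
    return bool(wildcard)


def pick_key(key: str, accept_encoding: str) -> str:
    """Choose between the plain object and its (hypothetical) .gz twin."""
    if not accepts_gzip(accept_encoding):
        return key
    root, ext = posixpath.splitext(key)
    return f"{root}.gz{ext}"
```

A server doing this per request should also send `Vary: Accept-Encoding`, so caches in between keep the two variants apart.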

the server needs to be able to read the incoming HTTP request to see if "gzip" is contained in the HTTP:Accept-encoding header and only serve the gzipped version if the answer is YES

You're wrong...this can be accomplished on the client side with Javascript, without ever involving server-side code:

I upload a gzipped test file to S3/Cloudfront containing a single line that sets a flag (e.g. supportsGzip = true).

I then include that script at the top of my HTML page. If the browser supports gzip, then that file gets read correctly, and the supportsGzip variable gets properly set to true. If the browser does not support gzip, the file is gibberish, and the flag does not get set.

Throughout the rest of the file, I use that supportsGzip variable to determine which versions of other static files to load (e.g. if(supportsGzip) {document.write(script tag src = gzip path)})
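
The probe file from the steps above could be generated along these lines. This is a Python sketch of the upload-side preparation only; the key name `js/gzip-probe.js` and the helper names are made up for illustration, and the page-side `if (supportsGzip)` logic stays in JavaScript as described.

```python
import gzip

# The single line the probe script contains once the browser inflates it.
PROBE_SOURCE = b"var supportsGzip = true;"


def build_probe() -> dict:
    """Build the gzipped probe object that is stored with gzip headers."""
    return {
        "key": "js/gzip-probe.js",  # hypothetical object key
        "body": gzip.compress(PROBE_SOURCE),
        "headers": {
            "Content-Type": "application/javascript",
            "Content-Encoding": "gzip",
        },
    }


def asset_url(plain_url: str, gzip_url: str, supports_gzip: bool) -> str:
    """Mirror of the page-side decision: pick the variant matching the flag."""
    return gzip_url if supports_gzip else plain_url
```

A client that ignores Content-Encoding sees raw bytes starting with the gzip magic number `1f 8b`, which is not valid JavaScript, so the flag never gets set for it.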

Ah, I actually mis-read the article. Sorry, I'm wrong, you're right, that would work ;-)

I guess the only situation where this wouldn't work is if the user's browser supports gzipping but doesn't have javascript enabled. But then it'll just serve everything unzipped by default, so not a big deal at all.

Also, it seems kind of a pain to have to do that if-statement throughout all of your javascripts, html, and stylesheets. But I'm sure you could probably create a javascript 1-liner that runs after everything else that just goes through and changes all the sources for you, so it might end up being even easier than our method.

I'd be willing to bet that Amazon starts offering on-the-fly gzip within a year's time

this increases latency.

What I don't get is why anyone who wanted "fastest page loads" would touch CloudFront with a ten foot pole. Check the testing; they're by far the slowest.

This would be so much more helpful if it detailed more common CDNs like Akamai.

I'm not sure I understand. What do you mean detailed more common CDNs? If a given CDN does not support gzipping, just read the article inserting your CDN in place of "Amazon CloudFront". If the CDN does support gzipping, then this solution is unneeded, as you can just use that CDN normally.

Or, just use the www.instacdn.com API instead. It takes care of all the messy stuff.
