
"""1) You can't always count on users having cached it from the CDN."""

No, but you can always count on the MAJORITY of users having it cached from the CDN.

You need to be careful with your wording here. The "MAJORITY" of the top 1m websites (based on Alexa rankings) don't use Google's CDNs (i.e. load a resource from googleapis.com):


...and that's across all versions of all libraries.

Quite how that correlates with how many of your first-time visitors will already have the library cached (because they happen to have recently visited another site that uses the same version of the same library) depends on how much your visitor demographic intersects with the sites that use the CDN. You then need to weigh that against the extra DNS lookup the rest of your visitors have to pay, to work out whether loading the file from Google's CDN makes sense.
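The trade-off can be sketched as a back-of-the-envelope expected-value calculation. This is a hypothetical model, not measured data: it assumes a cache hit costs nothing, a miss costs a DNS lookup for the CDN hostname plus the download (any TCP/TLS setup is folded into `cdn_fetch_ms`), and it ignores any parallelism benefit of a second hostname. The function names and all numbers are illustrative.

```python
def expected_cdn_ms(p_cached: float, dns_ms: float, cdn_fetch_ms: float) -> float:
    """Expected latency of the CDN copy for a first-time visitor.

    Hypothetical model: a cache hit costs 0 ms; a miss costs a DNS lookup
    for the CDN hostname plus the download itself.
    """
    return (1.0 - p_cached) * (dns_ms + cdn_fetch_ms)


def cdn_is_worth_it(p_cached: float, dns_ms: float,
                    cdn_fetch_ms: float, own_fetch_ms: float) -> bool:
    """Compare with self-hosting, which reuses the main page's DNS lookup."""
    return expected_cdn_ms(p_cached, dns_ms, cdn_fetch_ms) < own_fetch_ms


print(cdn_is_worth_it(0.03, 50, 80, 100))  # False: ~126 ms expected vs 100 ms
print(cdn_is_worth_it(0.50, 50, 80, 100))  # True: 65 ms expected vs 100 ms
```

With a 3% chance of a cache hit, a 50ms lookup and an 80ms CDN download, the expected CDN cost is about 126ms, so self-hosting at 100ms wins; raise the hit rate to 50% and the CDN comes out ahead.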

If you're talking about repeat visitors, it doesn't matter where the file was served from, so long as you apply the correct cache-controlling headers.
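For repeat visitors, what matters is the caching headers on whichever server delivers the file. A minimal sketch using Python's standard-library `http.server`, assuming assets have versioned filenames so that a one-year `max-age` (31,536,000 seconds) is safe; the handler name is hypothetical and the header value is one common choice, not the only one:

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class LongCacheHandler(SimpleHTTPRequestHandler):
    """Static file handler that marks 200 responses as cacheable for a year.

    Assumes versioned filenames (e.g. a hypothetical jquery-<version>.min.js),
    so a changed file always gets a new URL and stale copies are never reused.
    """

    def send_response(self, code, message=None):
        # Remember the status so end_headers() can skip errors and 304s.
        self._status = code
        super().send_response(code, message)

    def end_headers(self):
        if getattr(self, "_status", None) == 200:
            self.send_header("Cache-Control", "public, max-age=31536000, immutable")
        super().end_headers()
```

To try it locally, something like `HTTPServer(("127.0.0.1", 8000), LongCacheHandler).serve_forever()` serves the current directory. Error responses deliberately don't get the long-lived header, so a temporarily missing file isn't cached for a year.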

17% of the top million sites using them (and rising) is practically a guarantee that the user will have it. It's not like people reset their browser cache every few days. Also, if their ISP does caching, it will most definitely be there too.

Besides, even when they don't have it, Google's CDN is better than a hit on your servers in terms of IO load, parallelism, delivery speed, etc.

Yahoo found that 40-60% of users were hitting the site with an empty cache at least once per day: http://www.stevesouders.com/blog/2010/04/26/call-to-improve-...

There's some research suggesting that, with so many versions of jQuery in use, the chances of finding the version your site uses already in cache are quite small. (I can't find the article at the moment.)

Components don't seem to stay in cache for very long these days, because browser caches are only 50MB at most (much smaller on phones), and with a bit of surfing it's easy to reach the point where components get evicted.
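To see how quickly a shared library can be pushed out, here is a toy simulation. It assumes a plain byte-capacity LRU policy (real browsers use more elaborate eviction heuristics), a 50MB cache, a hypothetical ~90KB jQuery file at a hypothetical URL, and hypothetical 100KB assets from other sites:

```python
from collections import OrderedDict


class LRUByteCache:
    """Toy cache with a byte capacity and least-recently-used eviction."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # url -> size in bytes, oldest first

    def fetch(self, url: str, size: int) -> bool:
        """Request a URL; return True on a hit, False on a miss (then store it)."""
        if url in self.entries:
            self.entries.move_to_end(url)  # mark as most recently used
            return True
        self.entries[url] = size
        self.used += size
        while self.used > self.capacity:
            _, evicted_size = self.entries.popitem(last=False)  # drop the oldest
            self.used -= evicted_size
        return False


cache = LRUByteCache(50_000_000)
jquery = "https://cdn.example.com/jquery.min.js"  # hypothetical URL
print(cache.fetch(jquery, 90_000))  # False: first visit, miss
print(cache.fetch(jquery, 90_000))  # True: still cached
for i in range(600):                # ~60MB of other sites' assets
    cache.fetch(f"https://site{i}.example/asset.js", 100_000)
print(cache.fetch(jquery, 90_000))  # False: evicted in the meantime
```

Under these assumptions, about 50MB of unrelated browsing is enough to evict the library, however popular the CDN URL is.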

Also, there's no guarantee that retrieving it from Google's CDN is faster than retrieving it from your own servers: there's DNS resolution, TCP connections to be set up, etc., some of which will already have been done for the main site.

I've just run my own analysis on the HTTP Archive data, and the fragmentation issues are very real. The most popular URL used to load jQuery was:


...and it was used by just 2.7% (945) of the 35,204 pages in the dataset. Note that it's not just version fragmentation - you have to take protocol into account too as browsers cache HTTP and HTTPS separately.

The next most popular was:


...used by 1.3% (460) of pages, followed by:


...used by 0.8% (285) of pages.
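The percentages follow directly from the counts quoted above; summing them also shows that even the three most popular URLs together cover under 5% of the pages in the dataset:

```python
TOTAL_PAGES = 35_204            # pages in the HTTP Archive dataset quoted above
PAGE_COUNTS = [945, 460, 285]   # pages using the 1st, 2nd and 3rd most popular URL


def share(pages: int, total: int = TOTAL_PAGES) -> float:
    """Fraction of pages in the dataset that load a given URL."""
    return pages / total


for rank, n in enumerate(PAGE_COUNTS, start=1):
    print(f"URL #{rank}: {share(n):.1%}")
print(f"Top three combined: {share(sum(PAGE_COUNTS)):.1%}")
# URL #1: 2.7%
# URL #2: 1.3%
# URL #3: 0.8%
# Top three combined: 4.8%
```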

At this point there really isn't much of a debate; unless you have evidence to the contrary (e.g. all our visitors come from Facebook, and Facebook use the same version of jQuery as we do) using Google's CDN to load jQuery isn't likely to benefit the majority of your first-time visitors.

For those interested, I delved a little further and wrote up my findings here:

