

6,953 reasons why I still let Google host jQuery for me - Encosia
http://encosia.com/2010/09/15/6953-reasons-why-i-still-let-google-host-jquery-for-me/

======
nphase
I used to let Google host jQuery for me. And then one day their CDN went down
for a large chunk of the midwest (where I live), and I noticed the render time
of my site (and a few others) jump up anywhere between 5x and 100x.

That was the moment I resolved never to put critical, blocking elements of
any site I run in the hands of others, no matter how well known or reliable
they are. (FWIW, this also includes ad network invocation scripts and similar,
which always seem to be notoriously slow to load).

~~~
Encosia
I decided it's outside the scope of that post (which was already laboriously
long-winded), but you can/should use a fallback technique like this to
mitigate the potential for Google downtime:
<http://weblogs.asp.net/jgalloway/archive/2010/01/21/using-cdn-hosted-jquery-with-a-local-fall-back-copy.aspx>

Also, HTML5 Boilerplate has that built-in: <http://html5boilerplate.com/>
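
In a nutshell, the pattern is roughly this (the local filename is just an
example):

      <script src="//ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
      <script>
        // If the CDN copy failed to load, window.jQuery is undefined,
        // so document.write a self-hosted copy instead.
        window.jQuery || document.write('<script src="/js/jquery-1.4.2.min.js"><\/script>');
      </script>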

~~~
points
The fallback technique does nothing to mitigate the "It can connect but
loading takes forever" case surely.

~~~
sounddust
I've thought to myself, "I wish there was a timeout attribute on the <script>
tag," about once every few months for the past 10 years. Is there any good
reason you can't manually specify how long you want the browser to wait for an
external file to load?

~~~
stellar678
There's no reason you couldn't code this up in JavaScript:

    
    
      load jquery from google
        - onload, set flag jquery_loaded
      settimeout load_locally_if_not_flag 5sec

~~~
toolate
Don't forget that JavaScript is blocking. You'd need to load the jQuery from
Google dynamically by inserting script tags.

By the time you've added all that, plus the timeouts and fallbacks, the amount
of inline JS would make hosting jQuery on a CDN pointless.
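
To make that concrete, by that point you're writing something like this
(untested sketch; the 5-second cutoff and the local path are arbitrary):

      <script>
        // Inject the Google-hosted copy without a blocking <script src> tag.
        var cdn = document.createElement('script');
        cdn.src = '//ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js';
        document.getElementsByTagName('head')[0].appendChild(cdn);

        // If jQuery still hasn't shown up after 5 seconds, load a local copy.
        setTimeout(function () {
          if (!window.jQuery) {
            var local = document.createElement('script');
            local.src = '/js/jquery-1.4.2.min.js'; // hypothetical self-hosted path
            document.getElementsByTagName('head')[0].appendChild(local);
          }
        }, 5000);
      </script>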

------
spokey
It seems to me that unless the likelihood of a cache miss is fairly small, you
need to balance the probability of a cache hit against the expense of an extra
HTTP call, as opposed to bundling the jQuery library directly with your custom
JS via some JavaScript minification trickery (and two extra HTTP calls if
you're using both jQuery and jQuery UI).

I have no doubt that the likelihood of a cache hit here is growing, but I
wonder what the likelihood of an actual hit is. These data show that 4.7% of
the top 1000 Alexa sites reference _some version_ of jQuery on the Google CDN.
What you'd need to consider is the likelihood that your visitor has (a)
visited one of those 47 sites, (b) that the site uses the same version of
jQuery as you do, and (c) done so recently enough that the (relatively large)
files are still locally cached. I suspect that for most sites that works out
to much more than 4.7%, but is it more than 50%? If not, aren't half of your
users getting a slower response as a result?

(Moreover, and I don't know if or how this affects the jQuery CDN, but
doesn't it seem like many sites drag because of delays in loading the Google
Analytics JavaScript files? Wouldn't this pose an even greater problem if
you're using Google to serve jQuery, since your UI depends upon it?)

~~~
photon_off
I started off using Google CDN to host my jQuery file, then later ditched it
because about 20% of the time there would be a noticeable delay in retrieving
it (if I'd cleared my cache).

There's really no reason not to just host jQuery yourself. Use GZip, and set a
far-future expires header. Ensure the jQuery file is named by version, so that
if you update the version the cached filename will be different. That's all
you need to do, really.
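
For example, assuming Apache, the server side of that is just a couple of
directives (the one-year lifetime is safe only because the version is in the
filename, so a new release gets a new URL):

      <IfModule mod_deflate.c>
        AddOutputFilterByType DEFLATE application/javascript text/javascript
      </IfModule>
      <IfModule mod_expires.c>
        ExpiresActive On
        ExpiresByType application/javascript "access plus 1 year"
        ExpiresByType text/javascript "access plus 1 year"
      </IfModule>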

One last note: The benefit of putting script tags at the bottom of the body is
very similar to having the scripts cached in the first place. Just in case you
didn't know, putting script includes at the bottom of the page lets the
browser render the page progressively as it retrieves the HTML text [generally
very, very quickly]. Scripts in the HEAD block rendering, as the browser needs
to load each script file sequentially, in case there are dependencies.
[Note: not exactly true, it will grab several in parallel and execute them in
order, but there's still a delay.]

Whether or not the scripts are cached, very fast page rendering will make the
page appear to have loaded quickly. In all likelihood, the user won't need the
JavaScript before the scripts finish loading anyway, even if they aren't
already cached.

~~~
_harry
You can also speed up load time using LAB.js <http://www.labjs.com>

It loads and executes all scripts in parallel. Just be careful to hide any
elements that rely on JavaScript until the scripts finish loading.
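
Usage looks something like this (going from memory of the LABjs API; the file
paths are hypothetical):

      <script src="/js/LAB.js"></script>
      <script>
        // Both files download in parallel; .wait() preserves execution order,
        // so plugins.js can assume jQuery is already defined.
        $LAB.script('//ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js')
            .wait()
            .script('/js/plugins.js')
            .wait(function () {
              // Safe to un-hide the JavaScript-dependent elements here.
            });
      </script>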

------
mike-cardwell
I'm not willing to hand over the security of my websites and the privacy of
my users to a third party in exchange for making the first page load
fractionally shorter for a small number of my visitors.

Google's jQuery hosting is now a _highly_ desirable target, and I don't want
to be among the victims if it does get attacked. We learnt earlier this year
how Google can be hacked.

~~~
Encosia
If Google's CDN were hacked (as unlikely as that is), it's almost certain that
you'd find out about it _far_ sooner than if your own server were hacked.
There would be a huge controversy and then it would be quickly fixed, probably
in the course of hours or minutes, just like with the Twitter CSRF issue this
morning.

Conversely, the Internet is absolutely littered with compromised sites that
have been modified to inject malicious scripts.

The situation is similar to Linus' Law.

~~~
mike-cardwell
Without using Google's CDN, they have to hack my website.

With Google's CDN, they have to hack _either_ my website _or_ Google's CDN.

Whichever you choose, it won't make any difference to how quickly you notice a
local hack. Increasing the attack surface doesn't make you more secure.

~~~
mikemol
That's not a very high cost. They don't _choose_ to hit you; they let their
scripts and botnets look around for old and vulnerable software.

Have you looked at your raw httpd logs? When I look at mine, and grep away
known-cookies, I see that I'm frequently scanned by hundreds of IPs looking
for vulnerabilities in common software packages.

And that's just the stuff that shows up in logged HTTP queries. I don't want
to think about how likely it is that tools like Nessus are constantly being
run against IP ranges that I sit within.

OK, sure, you can believe you're going to be more on top of keeping your site
secure than a high-value target like Google. I don't know how high the target
value of your site is, but I doubt it's as high as that of the server the
jQuery library you're afraid of pulling remotely sits on--and you can bet that
Google knows they have high-target-value, externally-facing assets, and are
watching them even harder and with more eyes than you would.

~~~
mike-cardwell
The thing we're discussing here is whether jquery.js is stored on my server
with the rest of my website, or on some other third-party server. I'm not sure
how the things you've said above apply to this discussion?

~~~
mikemol
You were critiquing the security cost of hosting on your own server versus
that other server. It was pointed out to you that the admins of that other
server would likely learn of (and react to) a breach on their end at a lower
latency than you would for your server.

You implied that the security cost for hosting on your server was actually
lower, because you weren't as much of a target. My reply was an attempt to
point out to you at a technical level why that was a specious argument; your
servers are likely being scanned by the same botnets that are scanning mine
with automated exploit attempts against old and vulnerable software, and
common errors in securing a server.

It's going to be far easier and cheaper for them to take a shotgun-scanner
approach against a large class of average systems than to apply manual,
concerted effort against a small set of high-value targets like CDN nodes.

The cost to the attacker to attack your system with automated tools is near
nil. They'll attack, and if they get in, that's gravy. Using "we're not a
target" as a security model makes about as much sense as putting an unpatched
Windows box in your home router's DMZ.

~~~
mike-cardwell
I'm already hosting my own website on my own server. That attack surface
already exists. You seem to be misunderstanding this.

We're only talking about moving one of my files from my current website to an
entirely different third-party service over which I have no control...

Do you not understand this? Spreading my website over multiple services
controlled by multiple people decreases the security... Obviously...

~~~
mikemol
I think the part I may have misunderstood was where you said, "With Google's
CDN, they have to hack _either_ my website _or_ Google's CDN," and I
interpreted that as an exclusive condition rather than an inclusive one.
Probably the "either" that did that.

With that misunderstanding corrected, I believe you're generally correct on
the security argument. There's still some plausible variation in terms of
server security policy and the implementation of things like intrusion
detection (is it safer to keep all your money in your home, or to keep most of
it in a safe deposit box at a bank?), but that's not the key problem I thought
I noticed in your argument, and not one worth devoting energy to.

------
_delirium
One thing that doesn't seem to have numbers, though I think the data would be
sufficient to give them, is what the caching probability is like _after_
taking the fragmentation into account. If I reference the Google CDN URL for
jQuery 1.4.2, how many of the top 200,000 sites reference that? I assume it's
rather less than the 6,953 that reference any version, but how much less?

~~~
Encosia
The split is about 50/50 right now. I've run the crawler three times in the
last ~5 months and observed the transition from 1.3.2 to 1.4.2 moving along
quite nicely though. After my first run, 1.4 adoption was so anemic that I was
worried 1.3.2 was going to be jQuery's IE6. At this rate, it looks like 1.3.x
should be a small minority by the time 1.5 rolls around.

------
CWIZO
_HTTP errors – About 10% of the URLs I requested were unresolvable,
unreachable, or otherwise refused my connection. A big part of that is due to
Alexa basing its rankings on domains, not specific hosts. Even if a site only
responds to www.domain.com, Alexa lists it as domain.com and my request to
domain.com went unanswered._

 _At first, that may seem like an awful lot of potential error. However, the
one thing all of these inaccuracies have in common is that none of them favor
the case for using a public CDN._

I would have to disagree with the last paragraph there. I think that if
someone is so incompetent that their page is not available without the "www.",
there is a very strong chance that such a person hasn't heard of a CDN. So
domains that are not working without the "www." are, in my opinion, favouring
the non-CDN way.

~~~
estel
I've started to see some best-practice-if-you-ignore-user-expectation guides
out there which say that allowing the domain to ignore the www is Not A Good
Idea. I don't really know why this is the case, though.

~~~
eli
I've seen people worry that if incoming links are split between foo.com and
www.foo.com, it will affect Google's ranking of your site.

I don't think this is true and, anyway, the right solution would be to
redirect one to the other.

~~~
mike-cardwell
Google treats <http://www.example.com/foo.html> as a different page to
<http://example.com/foo.html> and it is completely right to do so because they
could be different pages.

You can use rel="canonical" to get around this, or an HTTP redirect.
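
For example (with example.com standing in for the real domain, and assuming
Apache with mod_rewrite for the redirect option):

      <!-- On http://www.example.com/foo.html -->
      <link rel="canonical" href="http://example.com/foo.html">

      # Or 301 the www host to the bare domain in .htaccess:
      RewriteEngine On
      RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
      RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]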

------
misterbwong
The one reason we don't use Google's CDN for our public website: Some of our
business users block sites by domain or IP, so they allow our site but block
Google's CDN. It's a PITA to get a rule added to a client's security setup.

~~~
joshuacc
Do you know why they block Google's CDN?

~~~
misterbwong
I don't know their institutional reasoning, but I believe they are on a
whitelist-based system. They're not so much blocking Google as they are
allowing us through.

------
redstripe
I was playing around with the Google Maps API one day when it started throwing
very strange errors. Upon further investigation, I found the library URL was
returning an HTML captcha page - not very useful to a browser expecting a
JavaScript file.

Even Google screws up simple stuff sometimes. So I think I'll pass on using
their CDN for something as small as the jQuery library. You're optimizing the
wrong thing if you're worried about this.

------
endlessvoid94
I ran into a problem with HTTPS -- will Google allow me to securely reference
their CDN?

~~~
seiji
Works for me: <https://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.js>

~~~
endlessvoid94
damn, my solution was to host it locally. how did i blatantly not try this?

~~~
seiji
Everybody could use protocol-relative URLs with the Google CDN to simplify
things:

      <script src="//ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.js"></script>

~~~
dasil003
holy shit! Is that universally supported?

~~~
ktsmith
Yes, but there are bugs when it's used with stylesheets in IE7 & IE8:

<http://www.stevesouders.com/blog/2010/02/10/5a-missing-schema-double-download/>

------
jread
Another option for using a CDN that will let you maintain better control and
outage visibility is to sign up for a pay-as-you-go CDN account yourself.
GoGrid and Speedyrails resell Edgecast CDN, and Softlayer resells Internap
CDN; both are very good-performing CDNs with both origin and POP pull models.
The cost for these services would only be about $0.57 per 100k jQuery hits
(assuming the 24 KB minified version is used).
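
(Back-of-the-envelope: 100,000 hits × 24 KB is roughly 2.4 GB of transfer, so
that figure works out to about $0.24 per GB.)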

~~~
Encosia
One of the biggest underlying benefits of using a shared, public CDN for this
is that you can take advantage of cross-site caching. As more and more people
use it, the potential for that is greater and greater.

However, it only works if sites are referencing exactly the same URL; just
referencing the same file is unfortunately not good enough. So private CDNs
like those don't confer quite the same benefit (though they're a great idea
for hosting site-specific assets, of course).

------
Encosia
My original post about using Google's CDN to host jQuery generated a lot of
discussion here, so I thought you might be interested in this one too.

------
il
I'm interested in the technology behind your crawler; you could potentially
use it to discover many more things, like which sites use popular APIs, etc.
What language is it written in? What do you use for the backend/DB? How fast
is it? I'm working on a project involving similar large-scale crawling and I
would love to know more.

~~~
spokey
You might be interested in <http://trends.builtwith.com/>

It doesn't give you a crawler or API, but they have done a lot of the analysis
you're talking about and put it together in an explorable interface.

~~~
il
That site is nice, but they also want to charge me close to $2000 for a list
of sites using a single technology; I could definitely do this myself for much
less.

Anyone want to build a free version with me?

~~~
spokey
To be honest, I didn't dig into that site deeply enough to notice that. I
pulled this out of an email I was sent just yesterday.

Sure, I'd be interested in collaborating if there's not one out there already.

The spider part is easy: you just need a web client (e.g. Ruby's or Python's
Mechanize, Java's HTTPClient, or even just wget or curl) coupled with an HTML
parser (Hpricot, Nokogiri, Tidy, etc., or even some basic regular
expressions). One can readily hack something rough together in an hour or two.
Gabriel might have a lot of the data, and certainly the code, needed to
produce DuckDuckGo, but he may have good reasons to keep that private.

The harder part, and the part that I wonder whether builtwith is doing
correctly, is the technology detection. Things like JavaScript libraries or
CSS frameworks might be fairly easy to detect, but it is not trivial to
reliably detect some of the server-side technologies. I recently put together
a script to survey the operating system and web server in use at a large
number of domains from Alexa's top million list (similar to what Netcraft
does), and there are plenty of servers that make even that difficult, let
alone determining whether a site is built with Ruby, Java or PHP. There are
HTTP headers that could tell you, but not everyone uses them. There are
certain signatures that give a pretty good clue, but those aren't always
present and can be downright misleading. (I've seen sites that migrated from
ASP to Java Servlets, for example, that kept .aspx URLs to avoid breaking
links.)
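
To make the easy parts concrete, here's a rough sketch (assuming Node.js, with
a hypothetical host) that fetches a homepage, checks whether it references the
Google-hosted jQuery URL, and prints the Server / X-Powered-By headers when
present; as noted above, those headers are often absent or misleading:

      // Rough, untested sketch: survey one host for Google-CDN jQuery usage
      // and whatever server-identifying headers it happens to send.
      var http = require('http');

      function survey(host) {
        http.get({ host: host, path: '/' }, function (res) {
          var body = '';
          res.on('data', function (chunk) { body += chunk; });
          res.on('end', function () {
            console.log(host, {
              googleCdnJquery: /ajax\.googleapis\.com\/ajax\/libs\/jquery/.test(body),
              server: res.headers['server'],
              poweredBy: res.headers['x-powered-by']
            });
          });
        }).on('error', function (err) {
          console.log(host, 'error:', err.message);
        });
      }

      survey('example.com'); // hypothetical host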

If I remember correctly, someone posted a JavaScript framework survey based
on a similar spidering approach on HN a while back; you might be able to find
it at searchyc.

