

How we hash our Javascript for better caching and less breakage on updates - __david__
http://blog.greenfelt.net/2009/09/01/caching-javascript-safely/

======
NateLawson
The hashing approach is fine, but I don't get why they do this at runtime
instead of at site-push time. There are so many other things you want to do
before making the content live, like running regression tests and validators,
getting a commit message that explains the change, etc.

It seems like you will always have a script in there somewhere. Why not have
it do the hash tag replacement at the same time and then the content is
static?

I may just be a curmudgeon. I go to extremes to turn any runtime code into
periodically-generated static code. I once started a mini project that was
email-based blogging software. To post a comment, each page and existing
comment had a mailto: link with an embedded ID string. Replies would hit
procmail, which would pull out the ID and embed the comment in the HTML of the
original page at the right place. No Javascript or CGI present, only procmail
and static HTML. I wonder if Posterous does some of this?

~~~
IsaacSchlueter
Yes. You're doing it right.

Run-time works fine if you're loading all your JS and CSS from the same
machine as your markup. But that's not an approach that scales quite as well.

~~~
onedognight
I would assume the files would be in both places; you just happen to point to
the CDN for speed, so at least the frontend could still use the same scheme.

------
jim_lawless
I have seen some success in controlling caching of JS files by appending a
query-string (including something such as a timestamp of the JS file) to the
end of the URL:

    <script src="/whatever.js?x=09_02_2009_08_15_AM"></script>

Various versions of IE and Mozilla Firefox seem to treat this as a unique URL
because of the query-string, even if whatever.js is in the cache. The caching
proxies that I've dealt with seem to handle this, but I'm not sure that all of
them would ... so the author's approach is much more reliable.
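This is easy to wire up as a template helper. A minimal sketch in Python (the helper name is made up; the date format just mirrors the example above):

```python
import os
import time

def script_tag(path, url):
    """Build a <script> tag whose query-string embeds the file's
    modification time, so a changed file yields a new URL."""
    mtime = os.path.getmtime(path)  # seconds since the epoch
    stamp = time.strftime("%m_%d_%Y_%I_%M_%p", time.localtime(mtime))
    return '<script src="%s?x=%s"></script>' % (url, stamp)
```

A coarser timestamp granularity (to the minute, as here) means two edits within the same minute can share a URL, which is one reason a content hash is more reliable.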

~~~
Derrek
I've had similar success by appending the site version to the query string:
src="/foo.js?v=2.1". This has worked for me across all major browsers and
required very little setup and maintenance work.

~~~
onedognight
This requires all API changes to touch the version (to be pedantic), which
then becomes a major source of version-control conflicts (in my experience).

------
pilif
What I don't like is that their solution is still dependent on the file date
of the script and all its dependencies. While usually accurate, file dates can
be misleading, not to mention the performance hit of doing stat() on all the
related files.

I'm using much the same thing here, but I do the compilation step manually,
and I also combine all the JS files into one big minified one.

The templating engine always uses that combined file if it's available (one
stat call). If not, it uses the non-combined versions (useful for development).
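A rough sketch of that fallback logic (Python; the `all.min.js` name and helper function are hypothetical, not pilif's actual setup):

```python
import os

def js_includes(static_root, files, combined="all.min.js"):
    """Prefer the single combined/minified bundle if it exists
    (one stat call); otherwise fall back to the individual files,
    which is handier during development."""
    if os.path.exists(os.path.join(static_root, combined)):
        return ['<script src="/%s"></script>' % combined]
    return ['<script src="/%s"></script>' % f for f in files]
```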

~~~
__david__
> what I don't like is that their solution is still dependent of the file date
> of the script and all its dependencies. While usually accurate, file dates
> can be misleading

I'm curious what you mean by that. File dates are what "make" and all the
other builders are built upon. If they weren't to be trusted then you could
never compile programs correctly...

> not to mention the performance hit of doing stat() on all the related files.

The thing is, stats are really, really fast. Check onedognight's benchmark above
(<http://news.ycombinator.com/item?id=800778>). On my server I get 1.6 million
stats per second.
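If you want to check the number on your own box, a quick way to measure it (a rough Python timing sketch, not the benchmark linked above):

```python
import os
import time

def stats_per_second(path, n=100_000):
    """Time n os.stat() calls on one path and report the rate."""
    start = time.perf_counter()
    for _ in range(n):
        os.stat(path)
    elapsed = time.perf_counter() - start
    return n / elapsed
```

On a warm filesystem cache the kernel serves these from memory, which is why rates in the millions per second are plausible.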

------
jacquesm
What goes for .js goes for any content type, static html, png, jpeg, gif
files, videos and so on. If there is a caching issue then you will not see
updates on the client side.

I can see this method has advantages, but it is a band-aid over a caching
system that wasn't functioning in the first place.

It makes you wonder what caused the original problem: the underlying issue
(some browsers not fetching updated javascript files) remains unsolved.

~~~
jerf
Caching is Hard (TM). If you take the time to fully understand the HTTP
caching system, it will make sense, and it will become clear that problems
with stale content are because you are using it wrong, usually because you set
an expires time in the future for content that will change. It's really easy
to not understand the caching system, or default to using a framework, and
that's where the problem usually lies.

A framework will often choose to default to setting some expires time, often
because the framework authors don't really get caching either and don't fully
understand why that's not really going to work. (It is natural that they end
up here. ETags are a much safer default, but making them efficient requires
more work from the framework user; why that is so is a bit more than I'd like
to get into in this message.)

If content is going to change, you should be using ETags, not expiration
times. This includes content that you may not think is going to change, like
Javascript files, but that in fact does sometimes. The best solution in this
case is actually to keep creating new URLs and set Expires into the far
future, so that if the browser needs "something_8ef38b.js", which had its
Expires set in 2030, it knows it doesn't even have to hit the server, and if
you "change" the file, your new web pages will actually reference
"something_f32190.js", a new file. The timestamp-on-the-end trick is the same
idea. I wish more frameworks built this idea in better; it's generally useful.

(Drawing parallels with functional programming's idea of "value" is left as an
exercise for the reader.)
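A minimal sketch of that build step (Python; the function name and the short 6-hex-digit hash are assumptions for illustration, not the post's actual code):

```python
import hashlib
import os
import shutil

def hashed_copy(src, outdir):
    """Copy src to outdir under a name that embeds a hash of its
    *contents*, e.g. something.js -> something_8ef38b.js.  A changed
    file gets a new hash, hence a new URL, so the old copy can be
    served with a far-future Expires header and never go stale."""
    with open(src, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()[:6]
    base, ext = os.path.splitext(os.path.basename(src))
    name = "%s_%s%s" % (base, digest, ext)
    shutil.copyfile(src, os.path.join(outdir, name))
    return name
```

Because the name is a pure function of the contents, re-running the step on an unchanged file reproduces the same URL, which is exactly the "value" parallel above.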

~~~
onedognight
Funny, your best solution is exactly what the post described, with the
addition of creating the filenames and cache on the fly for ease of use.

~~~
jerf
I didn't pretend it was new. I was wrapping more context around it, so that it
was less "follow this magical recipe" and more "here's why you should do this"
(and also "here's a bit on why so many people end up doing this wrong thing"),
though a full accounting of HTTP caching would be much longer. I also pointed
out similar other things in the comments made by other people.

------
amix
I have also used this method for some time. An advantage the author does not
mention is that this method is required if you want to host your files on a
CDN like Amazon's CloudFront. Another point is that you should host your CSS
and image files using this technique too (as the author also notes): you
don't have to rely on a browser's cache expiration, and you can cache stuff
aggressively (like setting an expiration date 10 years from now).

------
pyre
Why bother with hashing? Why not just append 'version numbers' to the
JavaScript files? (e.g. sht.1.js, sht.2.js, sht.3.js)

This way you have a 'better' history because the numbers are in order and you
can immediately tell which JS file was the previous one. While you're at it,
why not just make this a commit/push hook rather than needing to generate it
on the first page load? It would still be automatic for you.
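A sketch of what such a hook step might do (Python; the helper name and naming scheme are made up to match the example above):

```python
import os
import re
import shutil

def next_versioned_copy(src, outdir):
    """Copy e.g. sht.js into outdir as sht.1.js, sht.2.js, ...,
    picking the number after the highest version already present."""
    base, ext = os.path.splitext(os.path.basename(src))
    pattern = re.compile(r"^%s\.(\d+)%s$" % (re.escape(base), re.escape(ext)))
    versions = []
    for f in os.listdir(outdir):
        m = pattern.match(f)
        if m:
            versions.append(int(m.group(1)))
    n = (max(versions) if versions else 0) + 1
    name = "%s.%d%s" % (base, n, ext)
    shutil.copyfile(src, os.path.join(outdir, name))
    return name
```

One trade-off versus hashing: sequential numbers need coordination (the hook has to see every prior version), whereas a content hash is reproducible from the file alone.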

------
sh1mmer
You should really read Isaac's comments on the blog:

<http://blog.greenfelt.net/2009/09/01/caching-javascript-safely/#comments>

They are really insightful and expose the knowledge gained from a lot of work
he's done at Yahoo! on this subject.

------
chrisb
Interestingly, this is the same solution that is used in GWT - for all the
same reasons given in this post.

------
jokull
Django-compress does most of this.

