

Static Asset Compilation - autoref
http://autoref.com/blog/2012/09/08/the-tech-behind-autoref-part-2static-asset-compilation/

======
coenhyde
Fairly standard stuff. If you're going to have a title with "you're doing it
wrong", you should have some unique insight to support your dramatised title.

~~~
jmtulloss
I felt the same way. I was hoping the article would be about a radically
different approach, but it's mostly just best practices.

------
voidfiles
With such a complicated system I think you are missing out on the most
significant speed optimization technique: reducing HTTP requests.

For reference:
[http://developer.yahoo.com/blogs/ydn/posts/2007/04/rule_1_ma...](http://developer.yahoo.com/blogs/ydn/posts/2007/04/rule_1_make_few/)

It's laudable that you are paying attention to caching, but you don't compile
all your files into one file. It seems like you could pick up a lot of ground
here by at least concatenating all CSS into one file and all JS into another.

Also, you could load jQuery from the Google AJAX API endpoint. That way users
have a higher chance of already having jQuery cached.

Also using the same CSS/JS products across multiple pages would help.

~~~
autoref
An excellent point, but you have to consider warm cache vs cold cache
optimizations. For a cold cache, it's better to combine assets and reduce HTTP
requests. We do that on our homepage.

For a warm cache, it's better to split assets up so they are cached in finer
chunks. If I compiled jQuery into every page's JS, there would be fewer HTTP
requests, but every page load would pull jQuery down again, making the payload
much larger. There's a balance. I'll write another post about warm vs. cold
cache optimizations soon.

"using the same CSS/JS products across multiple pages would help."

Definitely. Using jQuery on half your site and YUI on the other half is pretty
bad from all angles.

"you could load jquery from the Google AJAX API endpoint."

Yeah. Two reasons we don't: 1. I'm in security and trust no one. 2. HTTPS
connection reuse vs. negotiating with another host. I have yet another post in
the pipe about SSL optimizations.

~~~
codeka
I usually have three "chunks" of CSS/JS per page: one for CSS/JS that's shared
across all pages of the whole site, one that's common among a "group" of
pages, and one that's unique to just that page.

Of course, not all pages get all three chunks, but I find it's a reasonable
tradeoff between reducing the number of requests and not just including
_everything_ on every page.

------
eli
You don't need to rename the files. As of a few months ago, you can configure
CloudFront to take query strings into account when caching, so you can simply
link to the file as normal but append "?<your_hash_here>" to the URL. (I
actually prefer using the last-modified timestamp over a hash.) IMHO, this is
better because it requires less magic on the origin server. And even _ancient_
references to a file (a logo someone hotlinked, for example) will still render
rather than 404, so long as the name hasn't changed. No need to keep tons of
old revisions of files around.
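
A rough sketch of the timestamp variant (the paths and CloudFront domain here
are made up):

    import os

    STATIC_ROOT = "/var/www/static"              # assumed local asset root
    CDN_BASE = "https://example.cloudfront.net"  # hypothetical distribution

    def asset_url(path):
        """Cache-busting URL built from the file's last-modified time.

        The file keeps its real name, so even stale references still
        resolve; only the query string changes when the file does.
        """
        mtime = int(os.path.getmtime(os.path.join(STATIC_ROOT, path)))
        return f"{CDN_BASE}/{path}?{mtime}"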

~~~
autoref
This isn't recommended since many browsers and proxies do not cache resources
that are referenced using a query string, even if a cache-control or expires
header is set appropriately. Google says Squid up to 3.0 will not do so:

[https://developers.google.com/speed/docs/best-
practices/cach...](https://developers.google.com/speed/docs/best-
practices/caching)

~~~
latchkey
Squid 3.0 was released in 2007. It can be argued that this is an out-of-date
recommendation. My experience using query strings has been fine.

~~~
eli
To be fair, I think it was still the current version up through 2011.

The more important point is that the failure mode is simply that the assets
load from the CDN as if there were no Squid proxy. This is not ideal, but it's
not so bad either.

------
captn3m0
Could someone say whether using HttpGzipStaticModule really helps? Gzipping
small static resources on the fly shouldn't cost much CPU.

Surely a nice thing to have, but does it help?
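
As I understand it, the module serves files you gzipped ahead of time, so the
compression cost moves entirely to build time. Something like this (paths are
made up):

    import gzip
    import os
    import shutil

    STATIC_ROOT = "static"  # assumed asset directory
    COMPRESSIBLE = (".css", ".js", ".html", ".svg")

    # Write a .gz sibling next to each compressible asset so the server
    # can send the precompressed copy instead of gzipping per request.
    for dirpath, _dirs, files in os.walk(STATIC_ROOT):
        for name in files:
            if name.endswith(COMPRESSIBLE):
                src = os.path.join(dirpath, name)
                with open(src, "rb") as f_in, \
                     gzip.open(src + ".gz", "wb", compresslevel=9) as f_out:
                    shutil.copyfileobj(f_in, f_out)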

------
howardr
I found that renaming CSS files using the hash of their contents does not
always work, because changes to dependencies (e.g. images) won't always bubble
up to the CSS. I forget all of the reasons why it didn't always work, but I
think it had to do with CDN invalidation for files that I could not rename
(e.g. index.html).

The process I use computes the hash of every file and builds a dependency map;
then I use the hash of a file's contents combined with the hashes of its
dependencies to rename the file.
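
Roughly this (a hand-written map here for illustration; the real one is
computed):

    import hashlib

    # Illustrative dependency map: the CSS depends on the images it references.
    DEPS = {
        "css/site.css": ["img/logo.png", "img/sprite.png"],
        "img/logo.png": [],
        "img/sprite.png": [],
    }

    def combined_hash(path):
        """Hash a file's contents together with its dependencies' hashes,
        so a changed image bubbles up into the CSS file's new name."""
        h = hashlib.sha1()
        with open(path, "rb") as f:
            h.update(f.read())
        for dep in DEPS.get(path, []):
            h.update(combined_hash(dep).encode())
        return h.hexdigest()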

~~~
autoref
Right. Images and fonts have to be written and hashed first, then used in the
template rendering of the CSS file. The CSS references the assets with hashes
in the filenames.
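
Something like this two-pass order (the filenames and the substitution scheme
are illustrative, not our actual templating):

    import hashlib
    import shutil

    def digest(path, chars=10):
        with open(path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()[:chars]

    # Pass 1: copy each image to a hashed filename, recording the mapping.
    hashed = {}
    for img in ["img/logo.png", "img/sprite.png"]:
        stem, ext = img.rsplit(".", 1)
        hashed[img] = f"{stem}-{digest(img)}.{ext}"
        shutil.copyfile(img, hashed[img])

    # Pass 2: substitute the hashed names into the CSS, then hash the
    # rendered CSS itself and write it out under its own hashed name.
    with open("css/site.css") as f:
        css = f.read()
    for plain, versioned in hashed.items():
        css = css.replace(plain, versioned)  # naive substitution, for show
    sha = hashlib.sha1(css.encode()).hexdigest()[:10]
    with open(f"css/site-{sha}.css", "w") as f:
        f.write(css)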

------
nestlequ1k
Can someone with knowledge of both this and Rails 3.1 explain the difference?
They seem very similar.

~~~
amalag
Yes, this article is written for people not using Rails and Sprockets. It's
pretty amazing which best practices the Rails asset pipeline enforces. It will
also concatenate the JS and CSS files to reduce HTTP requests, and it writes
.gz files to disk automatically. When used with the asset_sync gem, it can
also push these to S3 or your CDN, bypassing your web server altogether.

~~~
sirn
In the Python world we have webassets [1], which does something similar (to
Jammit, anyway). It is a little more complicated to use than Sprockets, but
I'd argue that it is also a bit more flexible, thanks to filter chaining
(e.g. compile Sass, merge the results, add vendor prefixes, optimize,
compress, then gzip, all as a single chain).

[1] <http://elsdoerfer.name/docs/webassets/>
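
A small example of what a chain looks like (filter availability varies by
webassets version, so treat the filter names as illustrative):

    from webassets import Bundle, Environment

    env = Environment("static", "/static")  # asset directory, URL prefix

    # One bundle, chained filters: compile Sass, then minify the merged
    # result. The %(version)s placeholder becomes a content hash.
    css = Bundle(
        "sass/site.scss",
        filters="scss,cssmin",
        output="gen/site-%(version)s.css",
    )
    env.register("site_css", css)

In a Jinja2 template, the {% assets "site_css" %} tag then emits the versioned
URL.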

------
moonboots
Good tips. I've found that <http://pngquant.org> generates smaller PNGs than
optipng, but the former is lossy (reduced color palette). I can't tell the
difference, though.

~~~
tetravus
You can have lossy compression that results in zero difference to the final
image on a pixel-by-pixel basis.

E.g., if a PNG is 32-bit with a full color palette but is filled with a single
8-bit color, you could safely, if "lossily", convert the PNG to 8-bit and
replace the entire color palette with the single entry for the color that is
actually used.

That said, PNGQuant uses dithering, so changes will often be apparent if you
perform a pixel-by-pixel comparison in code.

Just like you, I can't visually identify the difference between a PNGQuant
image and the 'raw' PNG that was used to create it (at least not on any images
that I've seen so far).

~~~
ciniglio
To nitpick a little: if your source and output are the same, I don't think you
can call it lossy, by definition.

------
rthprog
Aside from minifying JavaScript, you should probably also consider using
Google's Closure Compiler in advanced compilation mode. I believe it does a
much better job than traditional minification.

[https://developers.google.com/closure/compiler/docs/api-
tuto...](https://developers.google.com/closure/compiler/docs/api-tutorial3)
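
If you run the compiler locally instead of through the API, the invocation is
roughly this (paths assumed; ADVANCED mode renames aggressively, so your code
needs externs/exports to survive it):

    import subprocess

    # Assumes compiler.jar was downloaded from the Closure Compiler project.
    subprocess.run([
        "java", "-jar", "compiler.jar",
        "--compilation_level", "ADVANCED_OPTIMIZATIONS",
        "--js", "src/app.js",
        "--js_output_file", "build/app.min.js",
    ], check=True)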

------
brown9-2
Is putting hash digests in filenames really easier than sending
_Last-Modified_ headers in the response, parsing _If-Modified-Since_ headers
and returning 304 when applicable, and/or using ETags?

I would have thought that most web frameworks do all these things for you
automatically by now.

~~~
jmtulloss
Putting the hash in the filename allows the browser to skip even the request
that would result in a 304 response. It also works behind badly behaved
proxies and caches that don't properly respect cache headers.
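
Concretely, a hashed asset can ship with a far-future header like this (the
value is the usual convention, not from the article):

    # With a content hash in the filename, the response can promise a long
    # lifetime; the browser then reuses the file with no request at all.
    HASHED_ASSET_HEADERS = {
        "Cache-Control": "public, max-age=31536000",  # one year
    }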

~~~
lotyrin
It also allows pages that were generated and cached before a change to keep
working resources, as well as other cases where you might have divergent sets
of resources (split tests, rolling deployments, etc.).

------
malyk
We use the git commit hash of the checkin that is pushed to production as part
of the folder structure for our assets. It has worked really well for us.

It does mean we use more space on S3, but it guarantees we won't miss
re-seeding any of the files.

~~~
autoref
The bad part is no file is cached between pushes, right?

------
ryetoasthumor
Interesting series: [http://autoref.com/blog/2012/09/07/the-tech-behind-
autoref-p...](http://autoref.com/blog/2012/09/07/the-tech-behind-autoref-
part-1-our-stack/)

Full disclosure: I do bizdev at AutoRef.

------
cbhl
Why is it safe to include a subset of the SHA1 digest instead of the whole
digest? What's the reasoning behind this? Would it make sense to use a shorter
hash (e.g. CRC32) instead if your filenames have to be that short?

~~~
patio11
Because SHA1 tends to have every byte of the digest change if so much as one
byte of the message changes (if you can disprove that, you have a much more
important result than "oops, our caching is slightly borked"). Accordingly, 10
hex digits are sufficient to guarantee that a change breaks the old cache
1 - 1/2^40 of the time. You wouldn't be at risk of birthday-paradoxing your
caches even with billions of files in your site's history.
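
In code, the truncation is just this (a sketch; each hex digit carries 4 bits,
so 10 characters keep 40 bits):

    import hashlib

    def short_digest(path, chars=10):
        """First `chars` hex digits of the file's SHA-1 digest."""
        with open(path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()[:chars]

    # Two versions of the same file share a 10-character prefix with
    # probability 1/2**40, i.e. about one in a trillion.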

------
csense
You could probably shave a few bytes off your URLs, while achieving the same
collision resistance (or, alternatively, increase the collision resistance in
the same number of bytes), if you base64-encoded the hash.
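
For example (using the urlsafe alphabet, since '+' and '/' are awkward in
filenames):

    import base64
    import hashlib

    raw = hashlib.sha1(b"body { color: #333 }").digest()  # stand-in contents

    hex_name = raw.hex()[:10]                              # 10 chars = 40 bits
    b64_name = base64.urlsafe_b64encode(raw)[:7].decode()  # 7 chars = 42 bits

    # base64 packs 6 bits per character versus hex's 4, so the same
    # collision resistance fits in fewer URL bytes.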

------
dazbradbury
If you're using .NET, RequestReduce [1] is an excellent tool for managing your
static assets.

[1] - <https://github.com/mwrock/RequestReduce>

~~~
andrewdavey
Also check out Cassette <http://getcassette.net/>

------
kmfrk
Is ImageOptim still a good choice to go with? I really like the simplicity of
the GUI.

~~~
kevinconroy
Yes. ImageOptim is a wrapper around optipng and several other programs. It
tries them all and goes with whichever gives the best compression for that
specific file. It also supports JPEG and GIF files.

