

How To Optimize Your Site With HTTP Caching - zerop
http://betterexplained.com/articles/how-to-optimize-your-site-with-http-caching/

======
bitops
This is a good writeup, though I'm surprised no-one has mentioned the holy
grail of caching with HTTP. That of course is good old RFC 2616:
<http://www.ietf.org/rfc/rfc2616.txt>

There's an entire section in there devoted _just_ to caching in HTTP. Very
well worth reading in its entirety.

~~~
d0mine
<http://tools.ietf.org/html/rfc2616> adds errata, RFCs that update it and
hyperlinks (it is a web after all)

Direct link to Caching in HTTP <http://tools.ietf.org/html/rfc2616#section-13>

------
ceol
This is a great article for an introduction to HTTP caching. It's well-written
and even covers how to set up caching in Apache.

~~~
kalid
Thanks for the kind words (I'm the author). My main, ever-evolving goal when
writing tutorials is to try to write what I'd like to see:

1) Explain the underlying concept

2) Show variations

3) Explain how to do it yourself

4) Show how to verify you did it correctly

5) Meta: be as concise as possible, maximize bang for the buck

~~~
raghus
I've always enjoyed your writing, kalid. I especially liked
<http://betterexplained.com/articles/a-visual-intuitive-guide-to-imaginary-numbers/>.
Thanks so much for your site!

~~~
kalid
Thanks, I appreciate it!

------
js2
Be careful with ETags if you're serving content from a web-farm -
<http://developer.yahoo.com/performance/rules.html#etags>
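A rough sketch of the problem that rule describes, assuming the server builds its ETag from per-machine file metadata (inode, size, modification time). The function names here are hypothetical, not any server's real API:

```python
import hashlib


def metadata_etag(inode: int, size: int, mtime: int) -> str:
    """ETag built from file metadata; the inode differs from machine to machine."""
    return f'"{inode:x}-{size:x}-{mtime:x}"'


def content_etag(body: bytes) -> str:
    """ETag built only from the bytes served, so every machine computes the same value."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


# The same file deployed to two servers usually lands on different inodes.
server_a = metadata_etag(inode=101, size=5, mtime=1000)
server_b = metadata_etag(inode=202, size=5, mtime=1000)
print(server_a == server_b)  # the tags disagree, so revalidation fails needlessly

# A content-derived tag is identical wherever the file is served from.
print(content_etag(b"hello") == content_etag(b"hello"))
```

If a load balancer sends the revalidation request to a different machine than the one that served the original response, the metadata-based tag will not match and the full body is sent again.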

~~~
riledhel
In the same spirit, there's a lot to gain and very little to lose if you
activate ETags _and_ serve content from just one source.

------
js4all
A pretty basic write-up. Static caching is standard these days. The article
doesn't help with speeding up today's dynamic sites.

------
ehc
Article from 2007

~~~
kalid
Yep, I was surprised too to see it on HN :).

I think one of the meta-takeaways is that understanding the fundamentals of web
caching can help with your general CS knowledge ("There are only two hard
problems in Computer Science: cache invalidation and naming things." -- Phil
Karlton).

Looking at Apache, we see a few strategies:

    
    
      * Include last-modified metadata (Last-Modified)
      * Include content metadata (ETag, e.g. an md5 of the content)
      * Include an explicit expiration date (Expires)
      * Include a max-age
      * Include metadata about who can cache (public/private/no-cache; e.g. private means users can cache but proxies cannot)
    

These approaches could be used when designing data flows with Memcache, Redis,
etc.
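
As a minimal sketch of how those strategies map onto response headers (the function name and parameters are made up for illustration; this is not how Apache itself is configured):

```python
import hashlib
from email.utils import formatdate


def cache_headers(body: bytes, mtime: float, max_age: int,
                  shared: bool, now: float) -> dict:
    """Build the caching-related response headers for one resource."""
    # "public" lets proxies cache too; "private" limits caching to the user's browser.
    scope = "public" if shared else "private"
    return {
        # Last-modified metadata
        "Last-Modified": formatdate(mtime, usegmt=True),
        # Content metadata: a quoted md5 of the body
        "ETag": '"' + hashlib.md5(body).hexdigest() + '"',
        # Explicit expiration date
        "Expires": formatdate(now + max_age, usegmt=True),
        # max-age plus who may cache
        "Cache-Control": f"{scope}, max-age={max_age}",
    }


if __name__ == "__main__":
    import time
    for name, value in cache_headers(b"hello", time.time() - 86400,
                                     3600, False, time.time()).items():
        print(f"{name}: {value}")
```

The same fields (a timestamp, a content hash, an expiry and a scope) are roughly what you would store next to a value in Memcache or Redis to decide when to refresh it.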

~~~
SquareWheel
One thing I don't understand. If the server has asked the client to cache an
image for a year, and the image is indeed updated in that time, is there some
way of telling the client to download that image anyway?

I'd take it to Google, but I have no idea how I'd ask that in Google query
form.

~~~
mkchandler
This is actually referenced in the article. The client sends the Last-Modified
date back with its next request (If-Modified-Since), and the server returns
either a 304 (Not Modified) or the modified image if it is newer.
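
A rough sketch of that server-side check, assuming the client echoes the date back in an If-Modified-Since header (the `respond` function is hypothetical, not a real framework API):

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def respond(if_modified_since, mtime, body):
    """Return (status, headers, body) for a GET that may carry If-Modified-Since."""
    headers = {"Last-Modified": formatdate(mtime, usegmt=True)}
    if if_modified_since is not None:
        try:
            parsed = parsedate_to_datetime(if_modified_since)
            if parsed.tzinfo is None:
                # Treat a date without a usable zone as UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
            client_time = parsed.timestamp()
        except (TypeError, ValueError):
            # Unparseable header: ignore it and send the full response.
            client_time = None
        # HTTP dates have one-second resolution, so compare whole seconds.
        if client_time is not None and int(mtime) <= client_time:
            return 304, headers, b""
    return 200, headers, body


# First visit: no header, so the full image is sent.
status, headers, _ = respond(None, 1000000000, b"image-bytes")
print(status)
# Revisit with the date the server gave us: nothing changed, so no body is sent.
status, _, _ = respond(headers["Last-Modified"], 1000000000, b"image-bytes")
print(status)
```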

~~~
SquareWheel
I read that, but if you say "this image won't change for exactly one year" and
the client doesn't even request that resource from the server any more, how do
you start that dialogue again?

pork suggested adding a junk parameter to the end of the GET request, which
should bypass the cache; I'll need to read into this. I'm interested in
optimizing web speed as much as possible, and caching has always been
something I've understood poorly.

~~~
kalid
Yep, that's the problem with long expiration dates -- the client may never
check again (that's what we wanted, right?). The workaround is to reference
the file under a new URL, which restarts the process.
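
One way to get a new URL automatically is to derive the extra parameter from the file's content instead of using a random junk value. A minimal sketch (the `versioned_url` helper and the `v` parameter name are made up for illustration):

```python
import hashlib


def versioned_url(path: str, content: bytes) -> str:
    """Append a short content hash so the URL changes only when the file changes."""
    digest = hashlib.md5(content).hexdigest()[:8]
    separator = "&" if "?" in path else "?"
    return f"{path}{separator}v={digest}"


old = versioned_url("/img/logo.png", b"original image bytes")
new = versioned_url("/img/logo.png", b"updated image bytes")
print(old)
print(new)
print(old != new)  # a changed file yields a URL the client has never cached
```

The page's HTML would be served with a short expiry and would reference the versioned URL, so the image itself can keep its one-year expiry.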

Separately, the easiest way to get started with all these optimizations is to
run the page speed check online:

<https://developers.google.com/pagespeed/>

and follow the recommendations, most important to least.

~~~
SquareWheel
I've actually been playing around with this stuff all day, pretty much since
my last comment above. I've enabled smarter caching on my website, replaced
multiple image requests with a single spritesheet, optimized my images, and
cleaned up my CSS file to remove unused code. Google's PageSpeed has been an
invaluable tool, as well as webpagetest.org which breaks down the data in an
intelligent way.

Turns out Google Analytics is actually doubling my page load time, but the
data is too valuable to give up.

Anyway, thanks for the tips.

