
Cache-Control: immutable - chadaustin
https://bitsup.blogspot.com/2016/05/cache-control-immutable.html
======
briansmith
This Google Chrome document is worth reading:
[https://docs.google.com/document/d/1vwx8WiUASKyC2I-j2smNhaJa...](https://docs.google.com/document/d/1vwx8WiUASKyC2I-j2smNhaJaQQhcWREh7PC3HiIAQCo/edit#)

In particular, it gets into the heart of the matter: What does the user want
to happen when they click the Refresh button?

It does seem worthwhile to try to change the default behavior of the Refresh
button to mean "refresh the page" instead of "fix the page" (what it currently
does), which would make this "immutable" proposal unnecessary, AFAICT.

~~~
developer2
IIRC this is exactly what the reload button _used_ to do. You had to hold down
what I believe was the Control key while pressing the reload button to do a
"force refresh". Now it would seem it's the default behaviour. That, or maybe
a normal refresh does the revalidation checks (which return 304), while a
Control-refresh does a full download of all resources?

~~~
SwellJoe
Generally speaking, browsers behave the same as they always have (though
they've acquired additional nuance since the HTTP/1.0 and earlier days). IE
had a cache-control bug for many years that made it impossible to force a
reload in some circumstances, but it was fixed in IE 6.

The change is on the server side, not in the browser. Modern single-page
applications do all kinds of janky things, and a lot of them break caching,
either explicitly (with cache-control headers) or accidentally (with
uncacheable URLs).

As far as I know, every major browser has a standards-compliant cache-control
implementation, and all have some way to force a full reload.

Source: I worked on cache-control browser compatibility in Squid many years
ago. The browsers took a while, but did get it right eventually.

------
manigandham
Judging by the comments here, there seems to be some confusion.

This is exactly like long-lived cache settings today. Right now browsers send
a request on basic reloads and get back a 304 from the server which states
that nothing has changed. All this setting does is let the server tell the
browser to skip that check/roundtrip instead of wasting the time/bandwidth on
confirming with a 304 after the initial load.
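
To make that concrete, here's roughly what the two cases look like on the
wire (header values illustrative). Today, a basic reload revalidates each
cached resource:

      GET /app.js HTTP/1.1
      If-None-Match: "abc123"

      HTTP/1.1 304 Not Modified

With the proposed directive, the server opts the resource out of that
roundtrip for as long as the entry is fresh:

      HTTP/1.1 200 OK
      Cache-Control: max-age=31536000, immutable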

The browser is still completely in control here: it can do a full reload, or
keep revalidating on every load, if it wants to. Web scrapers and other HTTP
clients are unaffected.

~~~
Klathmon
Which is fantastic for "modern" web apps.

We use webpack, and in production builds all filenames are just SHA hashes of
the file contents. There is no need for the browser to ever ask anything
about that file again (unless it's purged from the cache...).
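
For example, a minimal sketch of the relevant webpack output setting
(assuming webpack 4+, where `[contenthash]` is available; older versions used
`[chunkhash]`):

      // webpack.config.js -- minimal sketch; entry points and loaders omitted
      module.exports = {
        output: {
          filename: '[name].[contenthash].js',
        },
      };

Anything named this way is a natural fit for immutable, since changed content
always gets a new URL.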

~~~
mitchtbaum
Why use hashes? Wouldn't UUIDs make more sense?

~~~
STRiDEX
Frontend builds sometimes destroy the dist folder and rebuild all assets. A
hash yields the same filename when the content is unchanged, so existing
cache entries stay valid; a fresh UUID would not.

------
0x0
What happens if a random WordPress blog's frontpage (/) is compromised and
has malware injected, setting the immutable keyword? Cloudflare and Let's
Encrypt mean most sites will be HTTPS sooner rather than later, so the HTTPS
part will be "taken care of". (At least that's better than nothing; imagine
the power granted to captive wifi portals otherwise!)

~~~
JoshTriplett
What happens if a random site gets compromised and serves an HPKP header
pinning a bad key for ten years?

~~~
lstamour
There's max-age support, the ability to preload pins in the browser, and
certificate transparency to work around this; see section 4.5:
[https://tools.ietf.org/html/rfc7469#page-21](https://tools.ietf.org/html/rfc7469#page-21)
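
For a sense of the mechanics, a pin header with a deliberately short max-age
limits the damage window (pin values here are placeholders, not real keys):

      Public-Key-Pins: pin-sha256="base64+primary=="; pin-sha256="base64+backup=="; max-age=2592000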

As to the original point, it would be best if this didn't apply to the
address bar URL / main document request. But it's a good point, worth
considering. Perhaps the UA should set a timer, and two or three refreshes in
a row would be equivalent to the prior refresh behaviour.

------
kazinator
"Immutable" with "max age" is an oxymoron. If it expires it isn't immutable.
Use another word.

What you need is an absolute date and time in the cache header which says "we
promise this page does not change before this date and time". This could be
treated as a "lease" and automatically extended in some configurable
intervals. For instance, if it is 30 days, then the file is good for 30 days
since its modification time stamp. When that time passes, this is renewed
automatically: it is now good for 60 days since its modification time stamp.
Basically, it is always good for N*30 days since its modification time stamp,
where N is the smallest N required for that time to be in the future.
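
A quick sketch of that renewal rule in JavaScript (the function name and the
30-day period are illustrative):

      // Good-until is always mtime + N*period, for the smallest N that
      // puts the expiry in the future.
      function leaseExpiry(mtimeMs, periodMs, nowMs) {
        const n = Math.max(1, Math.floor((nowMs - mtimeMs) / periodMs) + 1);
        return new Date(mtimeMs + n * periodMs);
      }

      const THIRTY_DAYS = 30 * 24 * 60 * 60 * 1000;
      leaseExpiry(Date.parse('2016-04-15'), THIRTY_DAYS, Date.now());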

When the webmaster publishes a new version of the file, he or she knows
precisely when browsers that have cached the previous version will start
picking up the new one. Changes can be coordinated with the expiry time to
minimize the refresh lag: the time between when the earliest client sees the
new page and the last old client stops seeing the old one. If we know that a
page expires for everyone on June 1, 2016 at noon, we can update that page in
the morning on June 1. By afternoon, everyone sees the new one.

~~~
jakub_g
Yes, I think it would still be a good idea to require an expiry date for
"immutable" content, just as a safety net in case something is misconfigured
somewhere. Then, when you fix a bug, you will know the precise time at which
the old version will be gone for everyone (hopefully the expiry date was not
set to 10 years).

However, I wonder what the typical cache lifetime of resources is on the
current web. IIRC someone on HN posted about a week ago that, per their
study, it's rather short: stuff is evicted from the cache quite rapidly if
not used, so fast that getting a cache hit for jQuery from a CDN is fairly
unlikely.

------
rwmj
Half-related to this, we need content-addressable web proxies:

[https://rwmj.wordpress.com/2013/09/09/half-baked-idea-conten...](https://rwmj.wordpress.com/2013/09/09/half-baked-idea-content-addressable-web-proxy/#content)

With these, you fetch data by its hash. You provide a primary URL where the
item is known to exist, but the browser is free to fetch the data from any
proxy (or local cache) with the same hash.

This could replace package mirroring, git clones, parts of bittorrent, CDNs
and more.

It does assume that you use a hash with enough bits that collisions are
extremely unlikely, and also that your hash is cryptographically strong (else
a rogue proxy can inject data).
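
The verification step is the crux of that idea. A browser-side sketch, where
fetchByHash and the mirror-URL argument are invented for illustration:

      // Fetch a resource from any mirror/proxy, and accept the bytes only
      // if their SHA-256 matches the hash we asked for.
      async function fetchByHash(expectedHex, mirrorUrl) {
        const body = await (await fetch(mirrorUrl)).arrayBuffer();
        const digest = await crypto.subtle.digest('SHA-256', body);
        const hex = Array.from(new Uint8Array(digest))
          .map(b => b.toString(16).padStart(2, '0'))
          .join('');
        if (hex !== expectedHex) throw new Error('hash mismatch: rogue proxy?');
        return body;
      }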

~~~
ianopolous
Exactly! IPFS already has a web proxy to their content-addressed network
(e.g.
[https://ipfs.io/ipfs/QmXZnH2WVmFoiE7tRJQk9QstLGhSKpVyEQ4Rywx...](https://ipfs.io/ipfs/QmXZnH2WVmFoiE7tRJQk9QstLGhSKpVyEQ4RywxqJKkG5A/cap.png)
), and hopefully browsers will learn to speak the protocol natively so that
there's no need for an HTTP proxy at all.

------
ars
I've learned to press Enter on the URL line instead of reload for exactly this
reason.

~~~
combatentropy

      > I've learned to press Enter on the URL line
      > instead of reload for exactly this reason.
    

Yes, this is what I do too.

The browser can do one of three things:

1. Serve the file from cache. This is what happens when you put the cursor in
the address bar and hit Enter. Well, at least for ancillary files: it likely
still will ask whether the main HTML document has changed, but it will load
CSS and JavaScript files from its local cache --- if the webmaster properly
set the HTTP headers, like Expires, to tell the browser that it can cache the
files.

2. Ask the server whether the file has changed. This is what happens when you
click the Reload button, and it is the area of dispute. The article is saying
it would probably be better if the browser acted just like it does when you
put the cursor in the address bar and hit Enter. Instead, the browser seems
to check not only the main document but also every single CSS, JavaScript,
and image file. It doesn't redownload them all; it sends an If-Modified-Since
header to ask whether each has changed, and then requests the whole file only
for the ones the server says have changed (a roundtrip sketched after this
list). The payload back and forth is usually just a few hundred bytes for
files that have not changed, but the network requests take a noticeable slice
of time, because it's one request per file.

3. Ask the server for the whole file, regardless. This is usually what
happens when you hit Shift and Reload.
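
For reference, each per-file check in case 2 is a roundtrip like this (values
illustrative):

      GET /css/site.css HTTP/1.1
      If-Modified-Since: Fri, 20 May 2016 12:00:00 GMT

      HTTP/1.1 304 Not Modified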

------
eloff
I apologize if someone already mentioned this, but there's a way to eliminate
the penalty to the user without changing HTTP at all. Browsers can simply
check all the non-expired resources _after_ everything else. The latency is
then the same, but we still do the checks, just after everything else, while
we're already rendering the page. Only if one of those resources actually
changed do we re-render the page.

The immutable solution is cleaner, and doesn't load the server as much, but
it's not backward compatible and requires the people who run the server to
know what they're doing. Maybe the two solutions could be combined?

The biggest potential drawback I see is that maybe most resources, including
the HTML, don't expire, so every page will be rendered and then re-rendered,
giving little benefit and making rendering choppier. Some of that could maybe
be mitigated by starting the rendering in the background and not displaying
it until a certain percentage of requests return, or by special-casing the
"page" itself as opposed to page resources.
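
For what it's worth, something close to this can be sketched today with a
Service Worker: answer from cache immediately, revalidate behind the scenes,
and signal a re-render only on an actual change (the cache name and message
shape here are invented for illustration):

      self.addEventListener('fetch', event => {
        event.respondWith(caches.open('assets').then(async cache => {
          const cached = await cache.match(event.request);
          if (!cached) return fetch(event.request);
          // Clone now, before the page consumes the cached body.
          const cachedCopy = cached.clone();
          // Answer from cache immediately; check for changes in the background.
          event.waitUntil(fetch(event.request).then(async fresh => {
            const freshCopy = fresh.clone();
            const [oldBody, newBody] =
              await Promise.all([cachedCopy.text(), freshCopy.text()]);
            if (oldBody !== newBody) {
              await cache.put(event.request, fresh);
              // Tell open pages so they can re-render with the new version.
              (await self.clients.matchAll()).forEach(c =>
                c.postMessage({ type: 'resource-changed', url: event.request.url }));
            }
          }));
          return cached;
        }));
      });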

~~~
mediumdeviation
What you're suggesting won't work. The problem is that a lot of pages require
their resources to be loaded in a specific order. The C in CSS stands for
cascading, meaning that rules loaded later override earlier ones (when
selector specificity matches). The same goes for JS, since later scripts
might depend on the frameworks or libraries loaded earlier, unless the script
has the async attribute. And then there is the content loaded by the CSS and
JS themselves, which in most modern web apps makes up the majority of the
page.

~~~
eloff
Nonsense. You have the CSS and JavaScript files; you just aren't certain that
they're the most recent versions. So you go ahead and render the page, using
either the version you have or, failing that, one requested from the server,
still doing everything in order. Then you validate in the background your
assumption that the stale versions you used are still current. If your
assumption is right (and mostly it will be), nothing happens. If it's wrong,
you re-render the page, again all in order.

~~~
jasonkester
Sense. It makes a difference if you run an old JavaScript file before loading
its new replacement and running that. As in,

Original script:

      <script>
        location.href="http://google.com";
      </script>

New script:

      <script>
        // oops!  forgot to remove this
        // location.href="http://google.com";
      </script>

You'll want to make sure you have the right version in place before you try to
run either of them.

~~~
eloff
You have a point. What about CSS? At first I thought it would just render
funny, but some JavaScript actually interacts with the CSS, e.g. jQuery
selectors based on style classes, so it really has the same problem.

Images would be safe.

Or an alternate plan: start rendering, but do not run the JavaScript until
the revalidation checks for CSS and JS files return. It would slow things
down some more, but not as much as waiting for everything.

~~~
jasonkester
Before:

      body
      {
        background:blue;
      }

After:

      body {}

It would play out similar to the Flash of Unstyled Content issue, but
substituting "unpredictably wacky" for "unstyled".

------
brianwawok
Is there a danger here of getting a corrupt resource, and then no matter how
many times you mash reload it never gets fixed? What do we have to stop
this... I don't think CSS files have a SHA checksum header by default, do
they?

~~~
detaro
from the article:

 _Correcting possible corruption (e.g. shift reload in Firefox) never uses
conditional revalidation and still makes sense to do with immutable objects
if you're concerned they are corrupted._

Also, there is Subresource Integrity, which adds a hash to the including tags
and, if integrated correctly with the caching logic, could catch this:
[https://developer.mozilla.org/en-US/docs/Web/Security/Subres...](https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity)

~~~
Matthias247
How do I do shift-reload on my smartphone browser? And how do I even know
what shift-reload is (most people won't), or that the site is corrupted (it
still shows something - how do I know it isn't the most recent stuff)?

And for my general understanding of this proposal: even if the current domain
owner guarantees that the content never changes, the domain can switch to
another owner who might reuse the same paths but of course wants to put
different content there. Is this somehow covered?

~~~
manigandham
You can just empty your cache completely - not ideal, but still easily done
on mobile.

Your 2nd question is confusing - are you asking what happens if you have the
exact same path but from another domain? Then it depends on what that server
responds with. This is just an HTTP response header, nothing more.

~~~
Matthias247
Yeah, I could clear the cache. But most people won't know how - or what a
cache even is.

The second question was about my expectation that the new server won't even
get queried, because the immutable caching policy from the old server
prevents it, and so it doesn't get a chance to signal that its content
changed.

~~~
tobz
The potential for a domain to transfer ownership and still use the same paths,
yet have different content seems incredibly unlikely. Like, it feels like you
were trying to come up with potential issues for the sake of finding a way to
say "see, this won't work!" :P

The biggest reason to use this is for versioned resources. Things that will
_never_ change. Say I create a minified JavaScript file. Its MD5 hash is
123456789abcdef...., and so in the output file, the filename is
"foo.123456789abcdef.js". If the file changes, the hash changes. If I request
the version of the file with "123456789abcdef", I should get that one.
Ignoring the unlikely potential for hash collisions, everything in this
scenario is working as intended. There is no conceivable reason to ever want
to change the content while keeping the same hash.
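
A minimal Node sketch of that naming scheme (filenames illustrative; MD5 as
in the example above, though any strong hash works the same way):

      // Name the build artifact after the hash of its contents.
      const crypto = require('crypto');
      const fs = require('fs');

      const source = fs.readFileSync('foo.min.js');
      const hash = crypto.createHash('md5').update(source).digest('hex');
      fs.writeFileSync(`foo.${hash}.js`, source);  // e.g. foo.123456789abcdef.js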

Now, let's say that file, somehow, gets corrupted AND cached in your browser.
I can't say I've ever seen something like this happen, but I suppose it's
possible? I'd be very interested to hear if something like this is possible,
to be honest. It seems like between TCP retransmissions and Content-Length,
you would need some sort of subtle corruption that flips a bit and isn't
corrected?

EDIT: As Klathmon points out, Subresource Integrity is probably a better
solution to "corrupted file in cache" scenarios. As it stands, if a file was
corrupted on disk, let's say, but the ETag and/or the Last-Modified values
were accurate, the origin would only ever respond back saying "nope, no
changes! you seem to have the latest copy" and you'd still be stuck with the
corrupted file. Only a hard reload/cache clear solves that.

~~~
Matthias247
I don't come up with potential issues just for the sake of it. But it's my
job as an engineer to think about all potential issues and to avoid them as
far as possible. I'm not directly involved in this topic or in the web at
large, but I read this, wondered whether it is fully thought through, and
therefore asked.

Of course domain changes are unlikely, but they are nevertheless possible in
our system, and we have to cope with them. I just googled Subresource
Integrity, and it doesn't seem like an appropriate solution for this
scenario: a new domain owner would need to generate those hashes for ALL of
his links, just to be sure that the previous site didn't mess anything up.
That means extra work up front, and second, you wouldn't even know for how
long you'd need it (until all previous users have visited the new site).

There would even be the possibility of major annoyances if a previous site
owner put that feature on things like index.html before the ownership
change, just to keep visitors from seeing the new page for as long as
possible.

~~~
tobz
I mean, you could say the same thing about HSTS and key pinning. Domain
changes hands, but "oops", HSTS was set and the old keys were pinned.

Is that actually a problem? No, it's not. Similarly, as the owner of a new
domain, why would I want the _old_ content? The only reason I can think of is
that I bought a company outright, or something. In that situation, if I don't
want to change the content, everything still works. If I want to change it,
and they did something stupid -- like unversioned paths using this proposed
flag -- then yeah, I'm in a weird spot. That seems like the most trivial and
unlikely of scenarios, though. It requires such a complex chain of events to
occur.

I think it's safe to say that malicious usage of the flag is entirely out of
scope when considering the validity of it, again, because it requires a
contrived situation.

------
mnarayan01
I'm pretty sure that, in the past at least, Firefox would skip the request
even on a refresh for resources that had the appropriate Cache-Control
headers and did not have any of the various conditional-GET-related headers
(e.g. ETag, Date). Did this change?

------
zedr
> Facebook, like many sites, uses versioned URLs - these URLs are never
> updated to have different content and instead the site changes the
> subresource URL itself when the content changes. This is a common design
> pattern...

Why not use the ETag header instead?
[https://www.w3.org/Provider/Style/URI.html](https://www.w3.org/Provider/Style/URI.html)

------
return0
What about overriding the reload page event with javascript?

~~~
duskwuff
There is no "reload page event". Even if there were, it wouldn't be
overridable.

~~~
return0
What if they made it so?

------
ams6110
TLDR, our web pages are so bloated we want a new HTTP standard to deal with
it.

