
Using Immutable Caching to Speed Up the Web - discreditable
https://hacks.mozilla.org/2017/01/using-immutable-caching-to-speed-up-the-web/
======
achairapart
Maybe it's time for browsers to go beyond the cache concept and implement a
common standard package manager. Download once, stay forever. Truly immutable.

As developers, we try every day to squeeze out the last byte and optimize
things. We all know how important performance is.

So why download the same assets for every website: React, jQuery, libraries,
CSS utilities, you name it? What a waste!

~~~
benjaminjackman
We don't need anything as complex as a package manager. It would be much
easier to just link to these libraries (and any other resources) by the hash
of their content.

I'm not really sure why this isn't already in place.

edit: The reason I'm not sure is that multiple threads on this post are all
suggesting basically the same simple idea: the ability to serve a file by its
hash (either by the hash alone, or by a URL plus the hash). Personally, I
think whatever form these URLs take, they ought to be backward compatible,
which I believe is possible.

~~~
Perseids
> I'm not really sure why this isn't already in place.

Everything we need _is_ already in place, except for a tweak in the caching
strategy of the browsers[1]. With Subresource Integrity [2] you provide a
cryptographic hash for the file you include, e.g.

    <script src="https://example.com/example-framework.js"
            integrity="sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
            crossorigin="anonymous"></script>

As it is, browsers first download the file and then verify it. But you could
also switch this around and build a content-addressable cache in the browser,
where it retrieves files by their hash and only issues a network request as a
fallback, should the file not already be in the cache. Combine this with a CDN
that also serves its files via
[https://domain.com/$hash.js](https://domain.com/$hash.js) [3] and you have
everything you need for a pretty nice browserify alternative, without any new
web standardization necessary.
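
For example, the include in [3] might look something like this (hypothetical
CDN domain, and the hash in the src is shortened; a real scheme would need a
URL-safe encoding of the digest):

    <script src="https://cdn.example.com/sha384-oqVuAfXRKap7fdgc....js"
            integrity="sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
            crossorigin="anonymous"></script>

A browser with a content-addressable cache could satisfy this from local
storage without touching the network, regardless of which site first fetched
the file.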

[1] And lots of optimization to minimize cache eviction and handle privacy
concerns, but those are different questions.

[2] [https://developer.mozilla.org/en-US/docs/Web/Security/Subres...](https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity)

[3] Imagine if some CDN would work together with NPM, so every package in NPM
would already be present in the CDN.

~~~
bugmen0t
Folks in W3C webappsec are interested, but the cross-origin security problems
are hard. We'd love feedback from developers as to what is still useful
without breaking the web. Read this doc and reach out!
[https://hillbrad.github.io/sri-addressable-caching/sri-addre...](https://hillbrad.github.io/sri-addressable-caching/sri-addressable-caching.html)

~~~
EE84M3i
These problems are really hairy. Thankfully, all the privacy issues are only
one-bit leakages (and there are TONS of one-bit leakages in web browsers), but
the CSP bypass with SRI attack is really cool.

One thing that I've found incredibly disappointing about SRI is that it
requires CORS. There's some more information here:
[https://github.com/w3c/webappsec/issues/418](https://github.com/w3c/webappsec/issues/418)
but it essentially means that you can't SRI-pin content on a
sketchy/untrustworthy CDN without them putting in work to enable CORS (which,
if they're sketchy and untrustworthy, they probably won't do).

The attack the authors lay out to justify SRI requiring CORS is legitimate,
but incredibly silly: a site could use SRI as an oracle to check the hash of
cross-domain content. You could theoretically use this to brute-force secrets
on pages, but it's kind of silly because SRI only works with CSS and
JavaScript anyway.
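
To sketch the oracle (hypothetical device URL and handler names; this is
exactly what the CORS requirement blocks): if integrity checks were allowed
on non-CORS responses, a page could guess at the contents of a cross-origin
file and watch which script tag loads:

    <script src="https://192.168.1.1/config.js"
            integrity="sha256-[hash of a guessed file body]"
            onload="guessMatched()"
            onerror="guessFailed()"></script>

Each attempt leaks one bit — whether the cross-origin file's hash matches the
guess — which is enough to brute-force short secrets embedded in scripts.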

~~~
bugmen0t
As someone who worked on the SRI spec, I find this incredibly disappointing
as well. We tried to reduce the requirement to "must be publicly cacheable",
but attacks have proven us wrong.

And unfortunately, there are too many hosts that make the attack you mention
entirely _credible_:

It is not uncommon that the JavaScript served by home routers contains
dynamically inserted credentials. And the JSON response from your API is valid
JavaScript.

------
btilly
What I really want is the exact opposite: a flush-before header that tells a
particular web page NOT to pull older static resources.

The reason is simple. Websites have lots of static content that seldom
changes, but you don't know in advance when it is going to change; you only
know after the fact that it did. So you either set long expiry times and deal
with weird behavior and obscure bugs after a website update, or set short
ones and generate extra load and slowness for yourself.

Instead, I'd like the main document to declare that it does not want static
resources older than a certain age. That header could be set on the server to
the time of your last code release, and a wide variety of problems involving
new code and stale JS, CSS, etc. would go away.
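
Hypothetical syntax — no such header exists today — sent on the main
document's response:

    Flush-Before: Tue, 24 Jan 2017 09:00:00 GMT

Any cached subresource fetched before that timestamp would be revalidated
once; anything fetched after it could be served straight from cache.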

~~~
rakoo
Then you should do what Facebook does
([https://code.facebook.com/posts/557147474482256/this-
browser...](https://code.facebook.com/posts/557147474482256/this-browser-
tweak-saved-60-of-requests-to-facebook)): content-addressed resources. Have a
resource be available not at /resources/something.js, but at
/resources/<sha1(something.js)>; whenever there is a new version of
something.js, the sha1 changes and is put in your root document, and the
browser will never again try to reach the older version, which you know is
invalid now.
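
A sketch with a made-up digest:

    <!-- mutable URL: must be revalidated -->
    <script src="/resources/something.js"></script>

    <!-- content-addressed URL: cacheable forever -->
    <script src="/resources/8843d7f92416211de9ebb963ff4ce28125932878.js"></script>

When something.js changes, the build step computes a new sha1, rewrites the
reference in the root document, and the old URL simply stops being referenced.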

~~~
jjoe
But what happens when you need to cache the actual page, not just static
resources? That's the case the content-addressed resources trick doesn't
address.

~~~
cortesoft
You cache the actual page with a much shorter cache timeout.

------
zokier
I'd be very wary of using any HTTP headers with permanent effects; they seem
like an easy way to get burned by accident. For immutable caching in
particular, I'd probably try to use some variation of content-based
addressing, e.g. having the URL contain a hash of the content.

See also
[http://jacquesmattheij.com/301-redirects-a-dangerous-one-way...](http://jacquesmattheij.com/301-redirects-a-dangerous-one-way-street)
and the related HN thread, which has good discussion.

~~~
mmastrac
It's not permanent if you don't make it permanent. Like the other cache-
control directives, you can specify the amount of time that a resource is
considered immutable. All this does is change the time period during which
the browser skips revalidation (the conditional requests that would normally
come back as 304s).
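
For example, combined with a bounded lifetime:

    Cache-Control: max-age=31536000, immutable

The browser treats the resource as immutable only while max-age keeps it
fresh; after the year is up, normal revalidation rules apply again.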

~~~
Confiks
This might be considered similar to the DoS vector that was also present in
the HTTP Public Key Pinning proposal [1]: an adversary seizing temporary
control of the server can convince clients not to request a response from the
server for a certain period, or, in the case of HPKP, to request a response
that cannot possibly be fulfilled.

It seems much more mundane in this Cache-Control case, but it is the same
vector nevertheless.

[1] [https://blog.qualys.com/ssllabs/2016/09/06/is-http-public-ke...](https://blog.qualys.com/ssllabs/2016/09/06/is-http-public-key-pinning-dead)

------
georgeaf99
The concept of immutable assets and links is at the core of IPFS, a
distributed alternative to HTTP. Since Firefox now implements the concept of
immutable assets, it would be totally reasonable to load these assets in the
browser peer-to-peer (see WebRTC and webtorrent). I think this would be a
great way to retrofit some decentralization into webpages!

~~~
lclarkmichalek
I'm not sure it would be reasonable. Doing that without deanonymizing people
would be an awful lot of work, and if in the end it led to faster loads, that
would seem ripe for timing attacks, e.g. figuring out whether anyone on your
LAN is browsing stonewall.org.

~~~
ianopolous
OpenBazaar are using IPFS over a Tor transport.

You could have a fast lane direct from the (HTTP) server, plus a slow P2P
lane over Tor, both using the same content-addressed protocol and caching
results locally for instant reloads.

------
whack
Sorry if this is a dumb question. How is immutable caching any different from
cache-control headers with a max age of 100 years?

~~~
discreditable
When you click refresh, your browser will revalidate those 100-year-lifespan
resources. If they are immutable, it won't revalidate them.
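
Concretely, a refresh normally triggers a conditional request per cached
resource (hypothetical path and ETag):

    GET /static/app.js HTTP/1.1
    Host: example.com
    If-None-Match: "abc123"

which usually comes back 304 Not Modified. With immutable, that round trip is
skipped entirely while the resource is fresh.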

~~~
whack
According to this link, the browser won't perform any revalidation for the
duration of the max-age?

[https://developers.google.com/web/fundamentals/performance/o...](https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching#max-age)

 _However, what if you want to update or invalidate a cached response? For
example, suppose you've told your visitors to cache a CSS stylesheet for up
to 24 hours (max-age=86400), but your designer has just committed an update
that you'd like to make available to all users. How do you notify all the
visitors who have what is now a "stale" cached copy of your CSS to update
their caches? You can't, at least not without changing the URL of the
resource.

After the browser caches the response, the cached version is used until it's
no longer fresh, as determined by max-age or expires, or until it is evicted
from cache for some other reason — for example, the user clearing their
browser cache. As a result, different users might end up using different
versions of the file when the page is constructed: users who just fetched the
resource use the new version, while users who cached an earlier (but still
valid) copy use an older version of its response._

~~~
tempestn
That's all true, but it applies to regular page loads, not the case where the
user explicitly presses the reload button.

------
grizzles
As an engineer, I've always resented the unnecessary time I spend waiting for
data in web browsers, so the bad state of caching in web browsers is an issue
that's been on my mind for a long time.

There are a few ways to improve things:

1. Predictive modelling of user resource demand in the browser (e.g.
preloading data; see the sketch after this list). Very easy to do nowadays
with great accuracy.

2. Better cache control / eviction algorithms to keep the cache ultra hot.

3. (This.) Immutable caching is one of the major ways we could improve
things. I'm not a fan of the parent article's way of doing it, though,
because if widely implemented it will break the web in subtle ways,
especially for small companies and people that don't have Facebook's
resources and engineering talent. It doesn't take usability issues into
account and therefore leaves too much room for user error.
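
As a quick illustration of point 1, browsers already expose a simple
primitive for this: the standard prefetch link relation (the URL here is a
made-up example):

    <link rel="prefetch" href="/assets/likely-next-page.js">

A predictive model could emit hints like this for the resources a user is
most likely to need next; the browser fetches them at idle priority so
they're already warm in the cache.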

I've written up a very simple 11-line spec here that addresses this issue:
[https://gist.github.com/ericbets/5c1569856c2ad050771ec0c866f...](https://gist.github.com/ericbets/5c1569856c2ad050771ec0c866f531f3)

I'll throw out a challenge to HN: if someone here knows Chromium internals
well enough to expose the cache and asset loader to Electron, within a few
months I'll release an open-source ML-powered browser that speeds up your web
browsing experience by somewhere between 10x and 100x. I feel like this
should have been part of the web 10 years ago.

~~~
brlewis
The size of your URLs will be an issue for pages with lots of assets. So much
so that I bet if everything's warm in the cache except the page itself, then
this proposal would be slower.

I'd rather see stale-while-revalidate implemented in browsers.

~~~
grizzles
Publishers could simply use regular urls for their tiny files.

~~~
brlewis
If everything's warm in the cache they're effectively all tiny files.

------
nicolaslem
What is the difference between an immutable resource and setting the resource
to expire in 10 years?

Many websites already do that: they change the URL each time the content
changes.

~~~
Klathmon
I'm not sure if this is in use anywhere, but it may lead to better cache-
eviction policies. Saying something is immutable has different implications
than saying "I want it cached for 10 years".

The former would allow a browser to apply some heuristics and, for example,
evict something from the cache when it hasn't been referenced on a page for
the last 5 visits (the assumption being that it was changed and a new
resource has taken over in its place).

That can lead to things being held in the cache longer without resorting to a
straight "oldest evicted first" method, which can cause thrashing if the
cache is too small or some resources are too big.

~~~
jonknee
> Saying something is immutable has different implications than saying "I want
> it cached for 10 years".

Only if you have a system that you keep for more than 10 years. If a browser
dumps stuff that it was told it can keep for a decade that means it can also
dump stuff that it was told will never change. If the cache is full of things
that are immutable something still has to go.

~~~
Klathmon
Oh definitely, I meant more in the sense of being able to evict things more
"intelligently".

If stuff needs to go, it needs to go; it's going to happen. But if you have
to choose between a file that was told to be cached for 10 years and one that
was labeled "immutable" but hasn't been referenced by its page for the last
10 loads, it might be a better choice to evict the immutable one.

~~~
jacobsenscott
So I can build a site that stuffs your cache full of "immutable" content, and
all your frequently used "cache for 10 years" stuff gets evicted?

~~~
Klathmon
You can already do that...

Generally browser caches evict the least recently used entries first, so if
you start stuffing the cache with a bunch of stuff you can already push the
oldest entries out. And if you fill the whole thing, you'll evict everything
else.

It was actually quite a problem on mobile browsers for a while. IIRC iOS had
something like a 10MB cache for the longest time; a few heavy websites and
your whole cache would cycle through. Android had a 5MB cache at one point as
well! A shitty news site can already fill that single-handedly.

But my idea here (which I gave about 5 minutes of thought...) was that
immutable content would be evicted _before_ the "10 year" stuff, the thinking
being that if something replaced an immutable resource, the old one could be
evicted much sooner.

I'm not sure if it would really help at all, just kind of throwing out some
ideas.

------
roddux
This is related to the Chrome caching update discussed here:
[https://news.ycombinator.com/item?id=13492483](https://news.ycombinator.com/item?id=13492483)

Two wholly different strategies, which have ultimately split how the browsers
handle caching.

------
mixedbit
I just realized that until HTTP is completely replaced with HTTPS, private
mode should always be used to browse the Internet on an untrusted Wi-Fi
network. Otherwise, malicious content injected on such a network can be
cached and reused by the browser forever.

~~~
re
> In Firefox, immutable is only honored on [https://](https://) transactions.

[https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Ca...](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control)

> Clients should ignore immutable for resources that are not part of a secure
> context

[https://tools.ietf.org/html/draft-mcmanus-immutable-00](https://tools.ietf.org/html/draft-mcmanus-immutable-00)

------
jjoe
With WiFi hotspots dropping connections more often than not, how many people
would know they need to Ctrl-F5 to "fix" a broken page/image/JS/CSS?

I just hope the draft expires as-is and never makes it to an RFC.

~~~
twoodfin
Could you clarify your objection? Presumably clients would never immutably
cache a resource they couldn't validate.

~~~
jjoe
How does validation work in the case of a 200 status and no Content-Length
header?

~~~
scrollaway
I'd also like to know. I hope Subresource Integrity can be implemented
alongside the cache control and prevent caching a bad file.

[https://developer.mozilla.org/en-US/docs/Web/Security/Subres...](https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity)
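
A sketch of how the two might combine (hypothetical URL; the integrity value
is the illustrative hash from the SRI docs): the response is marked immutable
while the markup pins its hash, so a truncated or corrupted download would
fail the integrity check rather than being cached and served forever.

    HTTP/1.1 200 OK
    Cache-Control: max-age=31536000, immutable

    <script src="https://cdn.example.com/app-v42.js"
            integrity="sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
            crossorigin="anonymous"></script>

Whether browsers actually couple the two checks this way is exactly the open
question here.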

------
romaniv
HTTP caching is a mess. I wonder why no one has proposed a properly
redesigned and negotiable protocol that covers all the edge cases. (And maybe
supports partial caching / partial revalidation of pages.)

~~~
bhldr
On my phone, so unfortunately no reference, but there is an HTTP 2 spec
underway that allows a client to send a cache manifest frame. A server can
then push only the resources that are newer. Pretty much exactly what's
needed.

~~~
romaniv
Unless we're talking about different things, the cache manifest is an HTML 5
feature designed to let websites work offline. That's quite different from
HTTP-level caching, which applies to any files/resources and is designed
primarily with performance and bandwidth savings in mind. I might be unaware
of some relevant HTTP 2 features, though.

~~~
bhldr
Definitely a different thing. This is an HTTP 2 frame, sent by a client.

~~~
romaniv
Could you post a link to the relevant part of the spec or some article dealing
with this feature?

------
Udo
What happens if the immutable file is borked in transfer, leaving a partial
file sitting in the cache? Will this lead to a new class of problem that can
only be solved by nuking the entire browser cache? It seems to me the right
way to do this would have included a checksum of the content.

~~~
danarmak
A hard refresh (Ctrl+F5) of the page will refresh immutable resources too, so
you won't need to clear the whole browser cache.

------
mnarayan01
I wonder if this will bring back the "hard refresh".

~~~
cpeterso
I'm curious, too. The blog post says refreshing the page won't revalidate the
immutable resources, but doesn't say what happens for a CTRL+SHIFT+R hard
refresh.

The Firefox bug mentions hard refresh, but doesn't say what was implemented:

[https://bugzilla.mozilla.org/show_bug.cgi?id=1267474](https://bugzilla.mozilla.org/show_bug.cgi?id=1267474)

~~~
cpeterso
Patrick McManus, the Firefox developer of this feature, confirmed that hard
reload will load immutable resources from scratch, so the user always has a
nuclear option for fixing cache corruption. :)

------
niftich
What a mess, but perhaps a happy ending. I made two other comments prior to
this one in this thread, but then I read the Bugzilla thread [1] opened by
Facebook that laid out the issue and Mozilla's defense. It's a _highly_
enlightening read; I can't recommend it enough.

To summarize: Facebook was seeing a higher rate of cache validation requests
than they'd expect, and looked into it. Chrome produced an updated chart
documenting different refresh behaviors [2], the spiritual successor of a
now-outdated Stack Overflow answer from 2010 [3], and, in response to
Facebook's requests, re-evaluated some of their refresh logic.

In this thread, Firefox was asked to do the same, but they pushed back on
adding yet another heuristic and in turn proposed a cache-control extension.
Meanwhile, Facebook proposed the same thing on the IETF httpbis list, where
the response was not enthusiastic [4]: the feeling was largely that this is
metadata about the content rather than a prescriptive cache behavior, and
that the HTTP spec already accounts for freshness with age. One of Mark
Nottingham's responses [5]:

 _(...) From time to time, we've had people ask for "Cache-Control: Infinity-
I-really-will-never-change-this." I suspect that often they don't understand
how caches work, and that assigning a one-year lifetime is more than adequate
for this purpose, but nevertheless, we could define that so that it worked and
gave you the semantics you want too.

To keep it backwards compatible, you'd need something like:

Cache-Control: max-age=31536000, static

(or whatever we call it)_

[1] [https://bugzilla.mozilla.org/show_bug.cgi?id=1267474](https://bugzilla.mozilla.org/show_bug.cgi?id=1267474)

[2] [https://docs.google.com/document/d/1vwx8WiUASKyC2I-j2smNhaJa...](https://docs.google.com/document/d/1vwx8WiUASKyC2I-j2smNhaJaQQhcWREh7PC3HiIAQCo/edit)

[3] [http://stackoverflow.com/questions/385367/what-requests-do-b...](http://stackoverflow.com/questions/385367/what-requests-do-browsers-f5-and-ctrl-f5-refreshes-generate)

[4] [https://www.ietf.org/mail-archive/web/httpbisa/current/msg25...](https://www.ietf.org/mail-archive/web/httpbisa/current/msg25463.html)

[5] [https://www.ietf.org/mail-archive/web/httpbisa/current/msg25...](https://www.ietf.org/mail-archive/web/httpbisa/current/msg25505.html)

------
anon1253
We should really try to integrate something like
[https://ipfs.io/](https://ipfs.io/) in browsers

------
hpagey
From what I understand, browsers don't really adhere to far-future expiry
headers when the user manually reloads a page; instead they re-request every
resource to check that it really hasn't changed. For resources the server
flags as immutable (upon first request), the browser won't make those
requests, but will instead instantly load the elements from its local cache.

------
Animats
This should be done using subresource integrity. Then, you _know_ it hasn't
changed. There should be some convention for encoding the hash into the URL,
so that any later change to an "immutable" resource will be detected.

With subresource integrity hashes, you don't have to encrypt public content.
Less time wasted in TLS handshakes.

------
nachtigall
The last image in the post, about the Squid proxy, is too small to read; here
it is in a readable size:
[https://hacks.mozilla.org/files/2016/12/sq.png](https://hacks.mozilla.org/files/2016/12/sq.png)

------
EdSharkey
A very creditable action, cheers to Mozilla!

This increases the democratization of the web and allows small fries to have a
disproportionately larger footprint.

------
z3t4
How do you update the URL, and everywhere it's used, once an asset (an image
or a script) changes? With an automated script, or manually?

------
stockkid
> The page’s javascript, fonts, and stylesheets do not change between reloads

So is this like Rails' Turbolinks, but built into the browser?

------
spectrum1234
Wow, I'm out of touch with front-end dev. Someone please explain to me why
this hasn't been standard already.

------
kingkool68
What's the difference between Cache-Control: immutable and setting expires
headers really far into the future?

~~~
imaginenore
When you do a normal refresh (F5 / Cmd + R), your cached resources get
revalidated, and immutable resources skip that revalidation because they are
immutable. (A hard refresh, Ctrl + F5, re-downloads everything, immutable or
not.)

~~~
cakoose
Oh, interesting. Is that the only difference?

How often do people actually do a hard refresh? I would guess most people
don't know about that key combination.

