And speaking of privacy... if everyone across the web is loading resources from one CDN, that seems like an interesting stream of data for that CDN.
It doesn't help that, relative to everything else, the churn in websites is immense, which makes it more likely you'll have to re-download things. And "relative to everything else" is quite a statement, since churn in software generally is already pervasive.
EDIT: that is, I'm just complaining, not claiming the status quo (or what was before) was better, obviously.
The other side was that people notice slow performance more than fast performance, and the failure modes always outweighed the savings: some fraction of connections would take, say, 2 seconds to connect to Google’s CDN even though their time to your servers was much better. You don’t have an easy option for those slow clients hitting your property, but you can at least cut your dependencies down by that one service.
For example, <script src="/jquery-3.4.1.min.js" try-shared="https://code.jquery.com/jquery-3.4.1.min.js" try-shared="another-src"></script>
@src can be locally hosted. If it's not already in the cache, the browser can check each @try-shared attr against the cache (without actually fetching the resource from the CDN). If there's no match, the browser downloads @src from your own domain.
Of course, this doesn't solve the shared-cache issue raised by the article. I suppose the only way to solve that would be to require resources to be added to the shared cache explicitly. The most effective way (I assume) would be a header provided by the CDN on a shared resource, e.g. X-Shared-Cache: true, that a browser would recognize... Then @src/@try-shared could still get the benefits of the shared cache and developers wouldn't have to worry about it.
Note the proposed scheme doesn't prevent anyone from using a CDN as the src while also trying other CDNs to increase the chances of a cache hit, which has its own benefits.
For example, <script src="https://code.jquery.com/jquery-3.4.1.min.js" try-shared="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.mi... try-shared="another-src">
That way, you can safely reuse any cached copy of a given version, regardless of where it was originally downloaded from.
Content digests are already used in the `integrity` HTML attribute; the digest could be used as a cache key too.
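For reference, the attribute looks like this today; if the shared cache were keyed on the digest instead of the URL, identical bytes could hit the cache no matter which CDN served them (the hash below is a placeholder, not the real digest for this file):

    <script src="https://code.jquery.com/jquery-3.4.1.min.js"
            integrity="sha384-PLACEHOLDERDIGEST"
            crossorigin="anonymous"></script>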
It still has the version fragmentation problem, but you don't have to worry about picking a popular CDN.
And if big brother isn't balls deep in CloudFlare I'll eat my hat.
For the vast majority of people, the negative effects of Google tracking them are probably more concerning than the government tracking them.
For users on the other hand...
Maybe I'm missing something, but the obvious solution to me would be more cache-control headers.
The only notable case where a shared cache is useful is resources on public CDNs hosting libraries and other common assets. These could just send a "cache-control: shared" header, or "cache-sharing: true" if adding new values to existing headers breaks too many existing implementations. That puts them in a shared cache; everything else gets a segmented cache.
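As a sketch, a CDN response opting in might look something like this (the "shared" directive is the hypothetical one proposed above, not part of any current Cache-Control spec):

    Cache-Control: public, max-age=31536000, shared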
There is some potential for leakage with uncommon assets. Maybe only a handful of websites use JQuery 1.2.65 or Helvetiroma Slab in font weight 100. It's a less severe vector than just testing whether someforum.example/admin.css is cached, but it still leaks data. The CDN could mitigate that by only sending a cache-sharable header on sufficiently popular assets, but depending on others to go out of their way to preserve privacy is probably a bad idea.
Your webmail provider has a search box, and the content that is returned is styled with Roboto. If the search finds nothing, then Roboto isn't loaded. The attacker forces Roboto out of the cache with a specially formatted fetch() request, then loads an iframe of the search. Then the attacker checks whether Roboto is in the cache or not. This allows the attacker to essentially read your email inbox.
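The "is it in the cache" step is basically a timing check. A minimal sketch of just that probe (not the eviction step), assuming a shared cache; the URL and threshold are purely illustrative:

    // Time a cache-preferring fetch; an answer within a few ms suggests a cache hit.
    async function probablyCached(url) {
      const t0 = performance.now();
      await fetch(url, { mode: 'no-cors', cache: 'force-cache' });
      return performance.now() - t0 < 20; // threshold is illustrative
    }

    // If Roboto is cached, the victim's search probably returned results.
    probablyCached('https://fonts.example/roboto-regular.woff2')
      .then(hit => console.log(hit ? 'search had results' : 'no results'));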
<script shared src="//jquery.com/jquery.js"></script>
But I suspect this will be unnecessary; even the bandwidth-constrained use case is getting to be more bandwidth every year.
In my experience, websites are piling on bullshit faster than my mobile internet is getting faster.
The threat is that when you navigate to creepy website, it loads some library and tracks the timing. They use that to infer that you've accessed some resource from a sensitive site.
None of the workarounds with extra attributes are going to help, because they rely on the web developer to
1. know about the attack
2. know that some library or asset is a realistic candidate for the attack, and take appropriate action.
Neither one is that realistic. We developers are just too lazy to get stuff like that right, even if we know about it. Cargo culting is the rule.
As for the effects, I suspect this will have a modest effect on the average website. The sources I've encountered seem to cast doubt on the effectiveness of the shared cache (https://justinblank.com/notebooks/browsercacheeffectiveness....). I poked around the mod_pagespeed docs and project, and couldn't find any indication of how they'd measured the impact when they implemented the canonicalization feature.
I wonder if you'll see a big impact on companies like Squarespace and Wix, where there are a lot of custom domains that are all built using the same stack.
One way is for the requester to specify if the asset is shared. A new 'shared' attribute on html tags and XMLHttpRequest would do this. Browsers enforce cache isolation _unless_ the shared attribute is set, in which case it comes from a 'shared' cache.
So if the attacker requests a www.forum.example/moderators/header.css from the _shared_ cache, but the forum software itself didn't specify it was shared so it never got loaded into the shared cache, then nothing is leaked.
And as it would only make sense to opt to share stuff like jquery.js from a CDN, the forum wouldn't naturally share that css file and so on.
The other approach is for the response to specify sharing, e.g. new cache control headers. Only the big CDNs would bother to return these new headers, and most programmers wouldn't have to change anything to regain the speedup they just lost from going to isolated caches once the CDNs catch up and return the header.
In either case, sharing can _still_ be an information channel if the shared resource is sufficiently rare, e.g. the forum admin page is pretty much the only software stuck on version x.y of lib z. The attacker can see if it's in the cache, and infer whether the victim is a logged-in admin or not. Etc.
A long time ago PHK wrote some very salient comments about HTTP 2.0 efforts https://varnish-cache.org/docs/trunk/phk/http20.html https://queue.acm.org/detail.cfm?id=2716278 etc. He puts forward the case for a browser-picked client-session-id instead of a server-supplied cookie.
It's not that the developer is the enemy.
Pretend I create a website called "Democratic Underground: how to foster democracy under a repressive regime." I'm naive, or I want it to load quickly, or I accidentally include a framework built by someone who was either of those two. One way or another, some library versions end up in the shared cache.
Now, the EvilGov includes cache-detection scripting on its "pay your taxes here" webpage. Despite my salutary goals, shared caching leaks some subset of my readers to the government.
Browsers have always allowed cross-domain requests, which have been tolerated until now, but that has meant all of us being aware of XSS and CSRF issues, or suffering the consequences.
Removing shared cache is the beginning of the end for cross domain requests by default. The other obvious use these days is ad networks, but they also get used for integrations like SSO and shared services like Apple Pay and presumably PayPal? And other collaborations between companies.
But those could also be opt in.
a) The origin sharing the resource must serve a .well-known/static_resource file.
b) The presence of .well-known/static_resource prevents any request to this origin from sending cookies, and any Set-Cookie header is ignored.
c) The document that includes a resource from this sharing origin must use a subresource integrity attribute when loading it.
d) The resource cannot be cached unless the Cache-Control header is public with a lifetime of at least 1 hour.
This guarantees that the resource is always requested without cookies, and that the resource can't vary per request (otherwise the subresource integrity check would fail).
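Concretely, under this proposal the embedding page and the shared response might look something like the following (the sharing origin, the placeholder hash, and the .well-known requirement are all the commenter's hypothetical, not an existing standard):

    <script src="https://cdn.example/jquery-3.4.1.min.js"
            integrity="sha384-PLACEHOLDERDIGEST"
            crossorigin="anonymous"></script>

    Cache-Control: public, max-age=3600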
> "early experimental results in canary/dev channels show the cache hit rate drops by about 4% but changes to first contentful paint aren’t statistically significant and the overall fraction of bytes loaded from the cache only drops from 39.1% to 37.8%."
What about exceptions for loading common JS libraries from a shared CDN? I'm looking at the Google Chrome design doc and don't see how one gets around this. Maybe I'm just missing something, but if not, it seems like they need to dig more into performance at the slower end of the distribution, where this could make a big difference.
Besides, Webpack and similar bundlers with tree-shaking abilities make it practical to load just a subset of a large library.
And last (but certainly not least) there is the security angle. Imagine someone managed to sneak malicious code onto CDNJS or Bootstrap CDN: how many nasty things might they be able to get up to, even if everyone remembered to set crossorigin="anonymous" on their shared assets?
The SRI spec GitHub project has an issue about a shared cache that seems to be converging on the consensus that there will not be a shared cache for SRI:
> "it seems rather unlikely that we can ever consider a shared cache"
Of course you get an artificial delay on first load, but it still saves network bandwidth while preventing information leakage.
It's an arms race where the browser would ultimately have to simulate every consequence of actually downloading every resource over the slowest link in the network. You're making the problem (and its solution) more complex but not completely solving it.
Although if the network hasn't changed much, the true and simulated timings should be very similar, so how would you really know whether it's a real or simulated request?
If it saves network bandwidth, then you just have to measure the network bandwidth, like a speedtest page does. As Spectre and friends have shown, even the tiniest difference can be used for an information leak.
Consider, as an example, an HTTP resource that contains a text string representing the current time and which is updated once a minute. Its cache lifetime is set to 1 minute. A page fetches it at 9:01:30AM and gets the string "9:01AM". This goes into the cache. At 9:02:15AM (45 seconds later), an attacker loads it, you give the cached data which is still the string "9:01AM". To cross-check the data, the attacker hits another server (say, its own proxy that it runs, which fetches the resource and forwards it on), so it can tell that the data it should have gotten is "9:02AM".
In other words, if you give it stale data slowly, it might be able to detect the staleness instead of trying to detect the slow loading time.
Perhaps you could fix this by validating the freshness of the cache, using ETags or something. You'd hit the server, validate what's in your cache is fresh, and then still delay it, thus giving a more complete illusion to the attacker.
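i.e. something like a conditional revalidation behind the scenes (URL and ETag values made up):

    GET /time.txt HTTP/1.1
    If-None-Match: "v-0901"

A 304 Not Modified would mean the cached "9:01AM" is still correct; a 200 with new content means the browser refreshes its entry but still hands it to the page after the artificial delay.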
I'm not sure if HTTP allows a page to access the TTL of cached data, but if so, you might want to fake that too. If you give the real TTL numbers, then some of the time it's going to look like you just loaded it but it's about to expire.
Where they come apart, I think it’s common to say latency is more important than bandwidth. It certainly is to me, though if you’re on a metered connection, you could certainly view things differently.
It's behind a flag, browser.cache.cache_isolation: https://hg.mozilla.org/mozilla-central/file/tip/modules/libp...
Similarly Chrome has a bunch of feature flags, I'm not sure if they can be enabled from the UI: https://cs.chromium.org/chromium/src/net/base/features.h?typ...
(but I can't find this one yet)
Still can't find this option as a flag though, must be compile time only.
From what I can tell a.example.com, b.example.com, and example.com would all have their own caches, correct?
We have multiple (sub)domains a|b|c.xxx.example.com that share a template, and therefore resources (we're a .edu). If we're now looking at an initial load hit for all of them, that may impact how we've been setting up landing pages for campaigns.
I can't see us completely moving away from a CDN because of the other benefits they provide.
You would expect fewer requests to www.forum.example/moderators/private/ than to, for example, www.forum.example/public. If you look at caching from the server-load angle vis-à-vis security, it could be inexpensive not to cache www.forum.example/moderators/header.css, so you would simply not allow browsers to cache that resource.
If site A thinks that allowing the user's browser to cache a certain resource puts them at a security risk, then this resource should be treated as not-public.
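e.g. by shipping the sensitive asset with something like

    Cache-Control: no-store

so browsers never cache it at all.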
<script src='jquery.js' allowcache></script>
That way we can specifically say which items we're willing to share with other sites and which ones we want an independent copy of.
If your plan to fix this situation is "trust that developers are competent and benevolent", we can achieve the same result by not doing anything.
"allowcache" would cause developers to do something stupid like put it on all images, but "multisite-shared" may cause developers to make reasonable choices.
A change that makes small sites a bit slower to load things like fonts becomes another brick in the wall of walled gardens.
The other is people who want things to go faster and flip a lot of switches that sound fast without really understanding what they do, and then not turning off the useless ones because they're not doing any real benchmarking or real-world performance profiling. This group will get little or no benefit but open up security holes.
Given the declining usefulness of shared caching (faster connections, cheaper storage, explosion of libraries and versions), I expect the second group to be one or two orders of magnitude larger than the first.
> I can imagine a future where library payloads will increase significantly.
TBH I see the opposite; to use the focus of the article, jQuery was obviated by browser improvements, the pace of which is not really slowing down.
This sounds to me no different than a developer wanting to opt-out of memory protection, on the basis that it will be a little faster -- and my program doesn't have any bugs or viruses!
For a lib hosted on a CDN, who cares?! However, if someone wants to track if you've been to myservice.com, they could try and load myservice.com/logo.png - if it's from the cache, then bingo, you've been there. That's a leak.
Maybe I've misunderstood; could you explain your timing attack mechanism in more detail please?
That's the example used in the article (www.forum.example/moderators/header.css).
And maybe they made sense for things like JS in the 2000s but many super-cheap hosting providers provide unmetered bandwidth nowadays. (and OF COURSE the privacy/security things)
Browsers throttled the number of requests per domain because parallelism was expensive for the servers. Loading from another domain could happen simultaneously. If you had a fast internet connection you’d see a reduction in page load time. You’d also see that to a lesser extent if your connection was shared with others.
Exception - the analytics stuff I'm obliged to add.
If you'll never benefit from eg. a shared JQuery library on a CDN anyway, might as well include a (reduced) version of it in your bundle.
If you're using the same framework-x.y.z library for months at a time, but doing daily/weekly code changes and pushes, you're losing out on the cachability of the library.
But if your project is only being updated as frequently as the third party libraries it uses, maybe it makes sense.
Browser vendors could choose to bundle some popular fonts and libraries, but that comes with its own set of problems.
Besides, this was only ever a concern for bad devs loading tons of tracking scripts and hacking together sites via copy-paste anyway. If you're really concerned about performance, you should be building, tree-shaking, and minifying all of your JS into a single file.
If the cache control headers say it expires in the future, the browser will not usually make any request, just load it from disk. Hence the typical practice of setting an expiration date very far in the future, and just changing the URL when the resource is updated (thereby forcing the browser to request the new representation).
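In practice that looks something like this (file name and hash are illustrative):

    Cache-Control: public, max-age=31536000, immutable

    <script src="/static/app.3f9c2b.js"></script>

When app.js changes, the build emits a new hashed file name, so the stale cached copy is simply never requested again.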
So self-host those JS files, and also use fewer of them if possible.
Basically the website says "I need com.google.angularjs:2.0.1" and browser grabs and caches the package for all future usages? It seems to work very well for Java... why hasn't there been any such initiative for the web?
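Something like this, presumably (the attribute and coordinate syntax are entirely made up):

    <script package="com.google.angularjs:2.0.1"></script>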
But I guess it could work too
Correction: not only moderators, but anyone who visits the author's probing page, which loads header.css for everybody, and anyone who visits any other page doing the same speculative probe.
We've known forever that the cache can be used for fingerprinting. This change won't be that bad if it encourages greater adoption of web standards.
CDNs, which are usually the use case for global caches, are also kind of critical when it comes to the GDPR and other privacy laws.
Having no global cache may kill off the usefulness of CDNs (which is already somewhat doubtful given the sheer amount of stuff available).
But you are not allowed to use them anyway unless the site is plastered with some allow-all-the-things-popup.
Afaik this is the main use case of CDNs.
I am pretty sure that there are way more pages with Google Fonts than with Cloudflare protection.
And even for the sole purpose of DDoS prevention, the privacy issue still holds. Sadly that means popups, redirects or other user-unfriendly crap on the pages.
Why can your page know if a certain resource came from cache? Can't that hole be plugged, instead?
For example there's nothing preventing someone from timing all my keyboard events for keystroke biometrics: https://en.wikipedia.org/wiki/Keystroke_dynamics
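A page only needs a couple of lines for that (a sketch; the analysis/thresholding is omitted):

    // Collect inter-key intervals, the raw material for keystroke biometrics.
    const intervals = [];
    let last = 0;
    document.addEventListener('keydown', () => {
      const now = performance.now();
      if (last) intervals.push(now - last);
      last = now;
    });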
What do you do in the case where a ton of websites use this API for legitimate animations?