Do not let your CDN betray you: Use Subresource Integrity (hacks.mozilla.org)
313 points by Sami_Lehtinen on Oct 1, 2015 | 176 comments

This could also be used to remove the need for a CDN for common libraries like jQuery and similar resources. If the browser knows it has a file in its cache with the same properties (hash, size, name), even if it came from a different site, it can be pretty sure that the content is the same so no new request is needed.

So your site could use the cached copy of jquery (for instance) that was originally brought down to serve my site, or vice versa.
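In other words, the idea amounts to a cache keyed by content hash alone. A toy sketch in Python (purely illustrative; class and variable names are invented here, not from any spec):

```python
import hashlib

class ContentAddressedCache:
    """Hypothetical shared cache keyed by hash alone, as proposed above."""
    def __init__(self):
        self.store = {}  # sha384 hex digest -> file bytes

    def put(self, body):
        digest = hashlib.sha384(body).hexdigest()
        self.store[digest] = body
        return digest

    def get(self, digest):
        # Any page that names this hash gets the cached bytes,
        # no matter which site originally served them.
        return self.store.get(digest)

cache = ContentAddressedCache()
key = cache.put(b"window.jQuery = {};")      # cached while visiting site A
assert cache.get(key) == b"window.jQuery = {};"  # reused on site B, request skipped
```

The appeal is clear: one download of jQuery serves every site that names the same hash. The rest of the thread is about why this exact scheme is unsafe.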

We've been toying with this idea in earlier revisions of the spec, basically using the hash as a cache key and not loading the same file from websiteB if it has already been loaded from websiteA.

Unfortunately, this could be used as a cache poisoning attack to bypass Content Security Policy.

See the section about "Content addressable storage" at <https://frederik-braun.com/subresource-integrity.html>.

(If you can come up with a magical solution to this problem, join the W3C web application security group mailing list and send us an email.)

Cache poisoning doesn't make sense when you are using hashes. If someone can generate sha384 collisions in a way that allows them to substitute malicious files in the place of jQuery, we have bigger problems.

> Content injection (XSS)

If we assume XSS, an attacker could simply inject whatever they want. The cache isn't needed. This still wouldn't poison any legitimate cache keys.

> The client still has to find out if the server really hosts this file.

So use (URL, hash) as the key in the permanent cache. This removes most of the bandwidth, and using a CDN allows for one GET per file across many sites.

So what exactly is the attack? I'm really not seeing how someone could attack a permanent cache without first breaking the hashing functions that we already have to trust.

edit: after reading https://news.ycombinator.com/item?id=10311555

This would work in the cases where we allow XSS (which is already a compromised scenario). Simply adding the URL (or maybe even just the hostname) prevents this entirely, and we still get almost all of the benefits for local resources, and we get all of the benefits when using a CDN.
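The (URL, hash) keying suggested here can be sketched in Python (all URLs and payloads are illustrative):

```python
import hashlib

permanent_cache = {}  # (url, sha384 digest) -> body

def store(url, body):
    digest = hashlib.sha384(body).hexdigest()
    permanent_cache[(url, digest)] = body
    return digest

def lookup(url, digest):
    # Keyed on (URL, hash): bytes cached from evil.com can never
    # satisfy a request that claims to come from victim.com.
    return permanent_cache.get((url, digest))

h = store("https://evil.com/evil.js", b"steal(document.cookie)")
assert lookup("https://victim.com/evil.js", h) is None    # planted URL misses
assert lookup("https://evil.com/evil.js", h) is not None  # normal reuse works
```

Because the URL is part of the key, the cache can never vouch for a file existing at an origin that never served it, which is exactly the property the attack below exploits.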


There are two issues being discussed. 1) Is the file we loaded from a (possibly 3rd party) site correct? 2) Did we ask for the correct file(s)?

Cache poisoning is when you can fool #1, while XSS attacks manipulate #2.

The idea behind content-security policy is that it allows scripts to come only from whitelisted domains. You can't inline evil scripts and you can't link them from any domain. So, in the case of XSS, the attacker CAN'T just do whatever they want. They need to make the browser think that the script is being hosted on a whitelisted domain.

Hence, the attack here is making the victim load that keyed script on a different page, then redirecting them to an XSS hole that links to that script as 'hosted' by a whitelisted domain. Since it seems to be on a whitelisted domain and match the original script's hash, it will execute on the page, which is not ordinarily possible on a page which is running CSP.

I hope this encourages you to not immediately assume that large groups of people working on technically complicated problems are stupid in the future.

This took me a second to understand:

The scenario in question has Protected Site A vulnerable to an XSS attack, but protected from it due to their CSP not allowing scripts from foreign domains (only trusting scripts from `trusted.example.com`). This is what CSP is for: it's not for what you expect to serve, it's for protecting against what you don't expect to serve.

In the theoretical attack content-addressable scripting could open up, the user visits Malicious Site B, which loads a malicious script with the hash `abad1dea`. The owners of Malicious Site B then use their XSS attack to insert the (simplified) HTML `<script src="https://trusted.example.com/payload.js" hash="abad1dea">`. If Malicious Site B tried to insert a direct link to their payload at `malicious.example.com/payload.js`, it would be blocked by the site's CSP. However, if the browser treated having seen `abad1dea` from `malicious.example.com` as evidence that it could get the script from `trusted.example.com`, that would open up a vector allowing Malicious Site B to run the `abad1dea` payload in a way the CSP could not block. This is why the UA still has to make the request, even though it already has the content.

With the behavior that's been specced, a request will be made to `trusted.example.com` which will either 404 or give a different script, causing the XSS attack to be blocked by the page's CSP.
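The specced behavior can be sketched like this (illustrative Python; `fetch` stands in for the network layer and is not a real API here):

```python
import hashlib

def load_script(url, integrity, fetch):
    """Always hit the network; only execute if the response matches."""
    status, body = fetch(url)
    if status != 200:
        return None  # trusted.example.com 404s the planted URL: attack blocked
    digest = "sha384-" + hashlib.sha384(body).hexdigest()
    return body if digest == integrity else None  # CSP/SRI reject mismatches

def victim_fetch(url):
    return (404, b"")  # the trusted origin never hosted the payload

evil = b"steal(document.cookie)"
evil_hash = "sha384-" + hashlib.sha384(evil).hexdigest()
assert load_script("https://trusted.example.com/payload.js",
                   evil_hash, victim_fetch) is None
```

The key point is that the request is never elided: a correct hash is necessary but not sufficient, because the trusted origin must actually answer 200 with matching bytes.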

CSP already has a mechanism for hash-based whitelisting - if this is the only limitation, it'd be just as easy to allow cache-sharing whenever CSP is absent and/or the specific hash is explicitly white-listed.

Exactly, thanks for explaining it a lot more clearly than I did.

If you add the domain, then how is it any different from existing caching? If using a CDN you're already all set; the CDN can return the file with cache forever headers.

> If we assume XSS, an attacker could simply inject whatever they want.

An attacker can inject whatever they want, but they can't run whatever they want. That's the purpose of a Content Security Policy: the problem isn't the content of the script being run, it's the context in which that script is being considered.

Because different scripts are given different permissions (eg. access to cookies) based on their domain of origin, the existence of said content must be verified to be true in the context in which it asserts its presence.

It's not a cache-poisoning attack so much as it is a cache-use attack, but it is a legitimate attack.

>Is the file we loaded from a (possibly 3rd party) site correct?

But there are also parts important to interpreting the file that aren't part of the hash, like the mime type. I think this problem is a lot more complicated than you're saying.

How about an additional attribute named "global", "shared", "public", "use-global-cache", share-with="*", etc. that the developer can use to opt in to the behavior?

A site operator would only opt in to the behavior for assets that are not unique to the site.

A second idea would be to wait until several unique domains had requested the asset before turning on the behavior for that asset. (By unique domain I specifically mean the part of the domain that's written in black text in the URL bar, excluding subdomains that are in gray.)

These are two easiest-to-implement solutions I can think of.

  How about an additional attribute named "global", 
  "shared", "public", "use-global-cache", share-with="*", 
  etc. that the developer can use to opt in to the behavior?
Allowing people to opt-in to a cache poisoning vector seems like a bad idea.

  A second idea would be to wait until several unique domains 
  had requested the asset before turning on the behavior for 
  that asset.
This just raises the bar to a cache poisoning attack from "owns one domain name" to "owns a couple". Some gTLDs are $0.99 per year, or free. (The user would only have to visit a single page, which has a dozen other sites open in invisible iframes)

Could someone explain how cache poisoning would work here? The hash is already being verified, I assume that you would not cache a file if the hash doesn't match, the same way that you would reject a file from the CDN if the hash doesn't match.

There's no hash to verify. Bad site preloads a bad script with hash=123. On good site, XSS injects a script src=bad.js hash=123. The browser makes a request: GET https://good.com/bad.js. BUT! The hash-cache jumps in and says "Wait, this was requested from a script tag with hash=123. I already have that file. No need to send the request over the network." bad.js now executes in the context of good.com.

If the hash-cache wasn't there, then good.com would have returned a 404. There's no hash collision because the request is completely elided (which is a large part of the perf attractiveness).
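To make the elided request concrete, a toy simulation (Python; all names and payloads are hypothetical):

```python
import hashlib

hash_cache = {}  # digest -> body: the flawed hash-only cache

def visit_bad_site():
    bad_js = b"send(document.cookie)"
    digest = hashlib.sha384(bad_js).hexdigest()
    hash_cache[digest] = bad_js  # preloaded, hash genuinely matches the file
    return digest

def load_on_good_site(url, digest, server_files):
    if digest in hash_cache:
        # Request elided: the browser never learns that good.com
        # would have answered 404 for this URL.
        return hash_cache[digest]
    return server_files.get(url)  # the real network request

digest = visit_bad_site()
good_com = {"https://good.com/app.js": b"/* real app */"}
# bad.js now "executes" in the context of good.com, a URL that 404s:
assert load_on_good_site("https://good.com/bad.js", digest, good_com) is not None
```

Note there is no hash forgery anywhere; the cache is simply answering for a URL it has no right to answer for.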

Thanks for spelling it out so nicely. I was having a bit of trouble coming up with the scenario too.

And as for the "collisions are unreasonable to expect people to generate", remember the use case: these are going to be extremely long-lived hashes.

With the cache poisoning, once you find a collision against jQuery 2.1.1 (to beat the example horse), you can continue to use it against all requests for jQuery 2.1.1. And we know how such widely applicable targets of cryptographic opportunity typically fare against adversaries with substantial brute-force processing resources...

Once SHA-2 is broken, browsers can simply no longer treat those hashes as safe. The spec suggests browsers not use anything weaker than SHA-384, let alone MD5.

The impact of SHA-2 failing would be far, far, larger than poisoning jQuery.

Possibly this could be worked around by having the good site serve its own idea of what the hash should be. This would be much smaller than the actual resource.

Well that's what you do, via the HTML. But the concern is that CSP treats the HTML served as untrusted. You could fix it by asking the server, but at that point, you're making a request to the server for the resource, killing some of the point of caching. And it seems wrong to have "if-not-hash" headers as part of this proposal; that'd be better off as an improvement to the HTTP caching stuff overall. But putting the verified hashes in the HTTP headers is fine as CSP already relies on the headers having integrity.

Hi! I'm a bit confused by this.

What would the attacker here be doing, and how? I read the piece on your site, and it's not clear to me what the attacker would be updating, and what effect it would have. Can you explain?

0. evil.com hosts evil.js, <script src=evil.js integrity=foo>.

1. you visit evil.com and the browser stores evil.js with the cache key "foo".

2. you visit victim.com, which has an XSS vulnerability, but victim.com thinks it is safe because it uses Content Security Policy and does not allow inline scripts or scripts from evil domains.

3. the XSS attack is loading <script src=www.victim.com/evil.js hash=foo>

4. the browser detects that "foo" is a known hash key and loads evil.js from cache, thinking the file is hosted on victim.com, when the file is in fact not even present there.

5. the evil.js script executes in the context of victim.com, even though they use a Content Security Policy to prevent XSS from being exploitable.

I still don't see how it's a problem either way. The browser should check that the hash matches before storing the file into the cache, so if evil.com/evil.js doesn't have the right hash, it won't be stored. If they can craft a SHA-256 collision then we have other problems anyway and SHA-256 should be deprecated. If the hashes are the same, then the files are the same.

The hash would be correct. The JS file is the same. The key to the attack is: "the XSS attack is loading <script src=www.victim.com/evil.js hash=foo>". So victim.com was never hosting evil.js and never intended to serve it. The visitor to victim.com gets it because of an XSS vulnerability.

victim.com should be protected because its Content Security Policy tells the browser not to run scripts from evil.com, but the browser thinks that evil.js came from victim.com, even though victim.com doesn't host evil.js and the browser's cache got evil.js from evil.com.

But if the scenario is an attacker who can inject HTML tags, why wouldn't they simply run their script directly via <script>do_evil();</script>?

Because a properly configured content security policy will block any inlined js (and external js files on non whitelisted domains)

The problem is that www.victim.com/evil.js doesn't exist, and never did, but your browser won't know that if it is in its cache -- this gives you a way of faking files existing on other servers at URLs of your choice, and as long as they are in the cache you'll get away with it.

Oh, I see! I actually did not consider this case. It indeed requires more thought than I initially gave it.

Edit: But if we only make the global cache work on the same domain, this problem should disappear completely (it's obviously not as powerful then, but still a massive improvement to the current system)

> still a massive improvement to the current system

How so? Browsers have had caches for consistent URIs for some time now.

Yes sure it works but it's still domain based. The browser is still downloading the same copy of jquery 100s of times during a browsing session.

No, the attacker decides the hash, since they inject the <script> tag using XSS.

Maybe use ETag or a similar mechanism? This way we'll need to contact the server, but we save bandwidth that would be spent resending the resource, which is still faster than what we have.

Given something like this:

    <script src="https://code.jquery.com/....." integrity="sha384-R4/....." shared>
(where 'shared' invokes the caching mechanism)

The browser sends a request for it with If-None-Match: "sha384-R4/....." header set.

I think this solves 99% of the problem:

If the integrity tag doesn't match the ETag of the resource, the server interprets it as out-of-date cache and responds with content of that resource. If the integrity tag matches the ETag of the resource, it will respond with '304 Not Modified'.

And that's the remaining 1% of the attack surface: basically the attacker wins iff the site can be tricked into serving a resource with the same ETag as the hash of his payload. We don't need to worry about collisions: even if someone uses ETags that match the form of subresource integrity tags without intending to, the attacker would still need to generate a collision, which is just as hard as finding a collision with any other hash. But if there are servers out there that will serve files with externally-set ETags then they'd be exploitable.
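The server-side half of this scheme might look like the following sketch (it assumes the server adopts the convention of using the SRI digest as its ETag, which is this commenter's proposal, not anything standardized):

```python
import hashlib

def handle_request(body, if_none_match=None):
    """Answer 304 when the client's integrity hash is still current."""
    etag = '"sha384-' + hashlib.sha384(body).hexdigest() + '"'
    if if_none_match == etag:
        return 304, None   # cached copy is valid; no body sent
    return 200, body       # cache out of date (or absent): full resource

jquery = b"/* jquery 2.1.1 */"
etag = '"sha384-' + hashlib.sha384(jquery).hexdigest() + '"'
assert handle_request(jquery, etag) == (304, None)
assert handle_request(jquery, '"sha384-stale"') == (200, jquery)
```

Because the origin server, not the cache, is the one confirming the hash, the planted-URL attack fails: victim.com would answer 404 (or 200 with different bytes) rather than 304.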

I agree with the person who suggested the HEAD request. If you couple that with an "integrity-hash" header that mirrors the attribute on the script tag, then you can compare the hashes.

What do you think are the downsides to something like that?

That's a clever attack. Possible solution: Have two tables, hash -> file and hash -> "set of domains we have verified host the file". If victim.com uses a CSP, then we look in the second table. We see that so far we only know that evil.com has the file. We therefore request www.victim.com/evil.js and hash it. If it matches, we add it to the set. If it doesn't, we bail.

EDIT: Although I guess the current URL based cache may already dedupe, in which case my solution would be roughly equivalent to just turning off hash-based caching for domains with CSP.
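The two-table bookkeeping could look roughly like this (toy Python; in practice `verified` would be populated by actually re-fetching and hashing the file, as described above):

```python
files = {}     # digest -> body
verified = {}  # digest -> set of origins confirmed to actually host the file

def can_serve_from_cache(origin, digest, page_has_csp):
    if digest not in files:
        return False
    if not page_has_csp:
        return True
    # Under CSP, reuse only if this origin was verified to serve the file.
    return origin in verified.get(digest, set())

files["foo"] = b"evil()"
verified["foo"] = {"evil.com"}
assert can_serve_from_cache("evil.com", "foo", page_has_csp=True)
assert not can_serve_from_cache("victim.com", "foo", page_has_csp=True)  # must re-fetch
```

As the EDIT notes, the verification fetch is essentially the same request a URL-keyed cache would make anyway, so under CSP this degenerates to ordinary caching.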

What if at least a HEAD request is required?

The browser could submit a special query to get the hash for victim.com/evil.js before running it. If victim.com returns the same hash, it's clean, if it doesn't respond, or responds with a different hash, fail in the same way as if a CDN had modified it.


The original poster said to also use the size. If you include that, my understanding is that crafting a hash collision is into the realms of impossibility.

Am I wrong?


Turns out I was completely misunderstanding. I now do. Thanks!

I'm not seeing how size would prevent the attack. If I understand bugmen0t correctly, in the attack outlined, the size (and hash and contents) of "evil.js" are completely controlled by the attacker. If you did need to specify size, the attack would simply change to:

3. the XSS attack is loading <script src=www.victim.com/evil.js hash=foo size=123>

> If you include that, my understanding is that crafting a hash collision is into the realms of impossibility.

Crafting a hash collision is already in the realm of impossibility. (They're using cryptographic hashes: if you can make SHA256 collide, we have bigger problems.) The attack here isn't that you're getting the wrong file, it's that you're getting a file the webserver does not have, at all. Step 4 is where we go wrong: we load the file from cache, while we should instead request it from the server, which will 404 the request because it does not have the file.

(And it's JS: even if size did matter, you can just add spaces to the end…)

How about you'd do a request saying "hey, I’d like to have victim.com/evil.js, I have a file with hash 93987590837309, is that still up-to-date?", and then you’d hope for 304 Not Modified (and load from evil.com/evil.js), or you’d get a 404 or 200 or whatever.

Sorry, my brain clearly imploded while I was reading that. On re-reading it makes perfect sense. Thanks!

There is no hash collision happening.

I see. Genuine question: I haven't thought this through but why not just execute the file in the context of the original file?

Because if the file was legitimate, the site might need it to run under its own context.

And what if you add filesize checking?

Just do a request and hope that the server returns you a 304 Not-Modified.

This should prevent the issue, right?

Isn't the solution to only allow caching using the key if it's over https (to stop modification) AND in the original HTML (i.e. not added afterwards by JS). Limiting, but would cover above.

If you can inject/alter data over the wire, all protections are usually moot - e.g. you can simply omit the CSP header and inject your code.

Most attackers don't have that capacity, though; XSS is usually done by tricking the page into running your own JS code (for example, by finding a publicly editable text area which doesn't properly escape HTML). Those restrictions wouldn't stop this attack.

6. you just need a collision where evil.js would generate the same sha384 sum "foo"

Let a resource have one hash and potentially many source domains. Define a CSP whitelist consisting of trusted domains. This list is applied to resource loading in the browser, and it is also applied to cached resources in the following way:

    cache = {
        'b0af301': {domains: ['evil.com', 'airbnb.com'], content: '...'},
        '3fc9b21': {domains: ['javajosh.com'], content: '...'}
    }
You go to secure.com, but a malicious user has put the b0af301 script in your path. CSP's whitelist for secure.com is [secure.com, javajosh.com]. The browser dereferences the hash, checks against the associated domains, and rejects the script if a whitelisted domain isn't in that list. Your browser running secure.com would reject the b0af301 script.

(Something I personally would like would be for orgs like EFF.org to post known-good hashes, so I can always add the EFF hashes to my site's CSP whitelist and have a warm-and-fuzzy feeling.)

Possible solutions:

1. Add an If-Hash-Mismatch header so you don't need to transfer the body

2. Add a list of hashes to accept to the content security policy headers

3. Add a list of public keys to accept to the content security policy, and allow the content if it's signed by one of those (this requires some standard way of signing things, maybe PGP/MIME or a dedicated HTTP header)

4. Only allow this from <script> and <style> tags that are in the <head>, or that are at "end" of <body> (meaning there are no tags other than <script> or <style> afterwards), or resources referenced from CSS and JavaScript files loaded that way.

EDIT: 5. Add an ECC public key (Curve25519?) to the content security policy, and accept hashes where an extra attribute is specified providing an inline signature of the hash with the key

The idea behind option 4 is that XSS would usually happen in the middle of the body and not in the head or footer.

That said, you can XSS with inline script, so it seems this only mitigates XSS vulnerability with length limitations on the payload (EDIT: nope, CSP blocks inline script).

CSP allows blocking inline scripts right?

But 1. should exist regardless to complement the existing caching options. It shouldn't be sent by default to avoid adding another tracking method, but if the source page specifies a hash, and you have that hash, then If-Hash-Mismatch is perfect.

2. Bingo, winner.

How about this:

There's a warehouse 'owner' internal to the browser (and not exposed to pages/extension), who 'remembers' the resources the browser has, and the times it took to access them (for commonly-accessed resources). When a page requests the cached resource, the owner 'returns' the resource with a delay of whatever the original access time was, fudged around by some noise.

A weakness of this would be that websites would be able to 'communicate' with each other by engineering response times to your browser, and then checking how long it takes your browser to access that. But this is a different scenario than random websites trying to figure out where else you've been: here the pages need to be in collusion with each other.

The servers might try to use fancy algorithms to try to figure out if you're using cached versions by hitting different distributed servers and figuring out if the resource load time is an outlier. But that's prone to a lot of noise and other issues, and lesser of an issue than the original concern. Right?

This obviously won't help with page load speeds, but will help network load for bot users and servers. One possible issue might be: if you've been in a slow connection previously, all your connections will seem slow even after you are in a faster connection. For that, you can just purge the cache and force the browser to reload the resources.

Edited for formatting.

The danger is not that malicious websites figure out where you've been, the danger is that a malicious website could poison the cache for a critical piece of JS used in, for example, gmail. Visit a malicious site, boom, Russians can read your email.

I think both of these things (history snooping, XSS), plus the Dropbox problem of injecting a hash without ever actually having the file, will need to be addressed.

This would require generating collisions for the hash (e.g. SHA-384). We can trust the hash, because SRI already assumes the hash function in the integrity="" attribute works.

What about using HTTP headers? So websiteB wants to load something by hash, it would have to whitelist it with "AllowedCAS: hash1 hash2 hash3". In fact, CSP already does this for inline scripts. So add another attribute to the CSP header, like 'cache-src' or something, listing good hashes. XSS can't modify the CSP header, so isn't this safe?
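A toy check for such a directive (the `cache-src` name is this comment's suggestion, not part of any spec):

```python
def allowed_by_cache_src(csp_header, digest):
    """Allow hash-cache loads only for digests whitelisted in the CSP header."""
    for directive in csp_header.split(";"):
        parts = directive.split()
        if parts and parts[0] == "cache-src" and digest in parts[1:]:
            return True
    return False

csp = "script-src 'self'; cache-src sha384-aaa sha384-bbb"
assert allowed_by_cache_src(csp, "sha384-aaa")
assert not allowed_by_cache_src(csp, "sha384-evil")
```

Since the CSP header comes from the server rather than the page body, an XSS payload can inject a `<script>` tag but cannot add its own hash to this list.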

If you submit a request with Etag: <integrity>, the server can validate with 304, or deny with 4xx/5xx

I think this would also allow the server to "pre-validate" with HTTP2 push.

And what if you’d just check the filesize? In the same way as you check for modified resources with existing caching methods?

Couldn't you just store the URL and hash together, or salt the hash with the URL?

This would be a security problem. By measuring resource timing of a script copied from another site, one could determine if the browser has visited that other site.

(Resource timing is disallowed for cross-domain resources for exactly this reason).

You can do that at the moment.

I made a site where I 'guessed' if a user was an employee of a certain company by embedding an <img> from an employee portal (from the login screen) and timing how long it took for onLoad to trigger. Times under a certain threshold indicated that the image was probably cached and they work there.

Of course, once you load it once it's cached, so you need to persist the first result... but you get the drift.

The <img> was from a different domain?

Yes. Images work cross domain and they still fire onLoad correctly.

CORRECTION - resource timing is not disallowed entirely; only network timing details (to calculate latency, etc.) are disallowed.

Sharing resources across pages based on hash values would render this restriction moot.

From http://www.w3.org/TR/resource-timing/:

"Statistical fingerprinting is a privacy concern where a malicious web site may determine whether a user has visited a third-party web site by measuring the timing of cache hits and misses of resources in the third-party web site. Though the PerformanceResourceTiming interface gives timing information for resources in a document, the cross-origin restrictions prevent making this privacy concern any worse than it is today using the load event on resources to measure timing to determine cache hits and misses."

One could have sites opt in to it, e.g. via an HTTP header? CDNs and other common resources could allow it, but identifying files would still be protected.

Can you further explain? I can't see how this would work.

You can request to load a particular script that you know comes from a given site. If it loads instantly (for some value of "instantly"), then the request was almost certainly served from cache.

> size, name

I don't see why it would be necessary to provide a name and a size. If the hash is the same, we can be pretty sure that's the same file.

I'm indifferent about name, but size has been shown to provide additional security.

Generating two files where their hashes collide is extremely difficult. Generating two files where their hashes collide at the same size is near impossible, even after you break the hash function itself (e.g. with MD5 it requires much more compute power to generate two files with matching hashes and sizes than just hashes alone, since you're effectively looking for a subset of all collisions).

I feel like you're just effectively adding more bits to the hash length by appending the resource size; bits that might be better used by just adding more length to the hash.

All MD5 hash collisions I've seen have been the same length: http://stackoverflow.com/questions/1224113/examples-of-hash-.... I assume the common algorithm for making MD5 collisions requires you to start with a single value and mutate it in specific ways, while keeping the same length.

An additional layer of security does not hurt. Forging a hash collision at the exact same filesize is harder than at an arbitrary size.

That's not really true. In fact, practical collision attacks against MD5 preserved the size -- which, if you think about it, makes sense. Unless a hash function is utterly terrible, it's not unreasonable to assume it's easier to find a few correlated bits than it is to append bits. After all, to append bits you'd need to somehow ensure that the internal fixed-size state of the hash algorithm cascades into the same state, or nevertheless gives the same output; and since it's trivial to have a very high-period state machine, that may mean appending huge numbers of bits.

If safety were the argument, you'd add an extra, unrelated hash function. E.g. even MD5 is likely much harder to break if you also have the CRC32, even though CRC is a thoroughly insecure hash (and of course, you wouldn't use an insecure hash, now would you?)

Thank you!

You are right that it would not be necessary at all technically, but some people might be more reassured by the extra check.


Different sites usually use different versions of libraries. Also, nowadays sites tend to set caching headers on all their resources, so the cache quickly gets filled. I think there would be a low cache hit ratio.

> it can be pretty sure that the content is the same so no new request is needed

I'm really glad that the SHA-2 family of hash functions has held up thus far.

I don't imagine any browser vendor implementing this at all. A hash collision seems unlikely, but if one can be constructed, the results are potentially disastrous. No one but me should be able to decide where the code on my site is loaded from.

"pretty sure" isn't really good enough.

Then developers who feel the same can simply not add any hash to their <script> tags, no?

Yes. And then they don't get the security benefits that subresource integrity is intended to provide.

HTTPS, code and binary signing, etc. all depend on the fact that producing a malicious message whose hash collides with another arbitrary message is all but impossible. If you don't trust that, you're pretty much unable to trust anything transmitted over the 'net, and this potential hole is not significant in that context.

That's a fair point. But it still increases the attack surface for questionable gain. You're not just relying on the attacker being unable to create a hash collision. You're also relying on all browsers to provide a correct implementation of this code sharing mechanism.

Allowing this sort of code sharing provides another attack vector. If a malicious site is able to exploit a browser vulnerability that allows it to populate the shared cache, then you've suddenly enabled code injection on any site using subresource integrity.

Not sure why this is being downvoted. It's a reasonable point.

It's not. If browser vendors thought that the potential for breaking a decent hash function was a real threat, they would also not implement HTTPS, signed updates, etc, all of which depend on the trust in hash based signatures.

I, for one, welcome EdDSA signing of Javascript resources.

One thing about SRI that's also great - even beyond the security concerns with 3rd party scripts - is the benefit of stability. You know whoever controls the other end of the src attribute on your 3rd party <script> tag won't go changing things (even with the best of intentions) quietly that break your site. It's an out-and-out win and I hope all browsers support it soon.

I'd guess that the more annoying 3rd party scripts (looking at you, addthis) would simply tell people not to use this attribute as it will "break compatibility and hinder our ability to deploy critical, potentially security impacting, fixes". In fact, if I were a 3rd party script provider, I'd want to make sure people don't do this to my scripts if I haven't "opted in" to keeping compat (i.e. never modifying the content at a URL). In addition to what you say, it'll also happen in reverse: Well-meaning webmasters will add this tag to improve security, then end up with a broken site.

I'd be surprised if 3rd party providers don't start intentionally adding a random byte on each request (or every hour or something) to make sure that webdevs don't take a dependency on the contents of their files.

What is "addthis" anyway? I always wondered if the domain was targeted at NoScript users. It's telling me to add it -- maybe I should whitelist it!

I think it's a vile "sharing" widget thingy. Nothing of value lost by blocking it.

Any idea if there is a catchable event to detect an integrity check failure?

"On a failed integrity check, an error event is thrown. Developers wishing to provide a canonical fallback resource (e.g., a resource not served from a CDN, perhaps from a secondary, trusted, but slower source) can catch this error event and provide an appropriate handler to replace the failed resource with a different one."

Source: https://w3c.github.io/webappsec/specs/subresourceintegrity/#...

Just have JavaScript code check for variables that would show up in the load, no?

You could probably make an addon that tries to fetch the data from IPFS instead of the CDN. Might be a nice way to bridge the existing web.

Thanks for that reference. https://ipfs.io/ is very interesting!

This was my first thought as well. Why go through the trouble of redundantly fetching a resource by URI and verifying its checksum when you could just use a system designed for this in the first place?

> An important side note is that for Subresource Integrity to work, the CDN must support Cross-Origin Resource Sharing (CORS).

If the CDN doesn't support CORS and the browser does support subresource integrity, subresource integrity is ignored (bad, since an attacker can disable CORS before changing the js) or enforced, thus refusing to execute the js (good)?

spec co-editor here.

SRI returns false (i.e. non-matching integrity) for scripts (or stylesheets) that do not enable CORS and are not same-origin [1]. Otherwise, an attacker could just disable CORS to bypass SRI.

flies away

[1] https://w3c.github.io/webappsec/specs/subresourceintegrity/#...

If the CDN doesn't support CORS then presumably the user won't even get the script back, but if they do then the browser shouldn't execute it due to the lack of CORS authorization, regardless of the integrity issue.

This could have been used to stop the Github DDOS from China, if I'm understanding it correctly

No. They injected code into some visitor analytics script (like GA) and those scripts are constantly updating so you cannot calculate and store the hash.

Well, seems like a good reason to block scripts that are constantly updating, no?

I suspect this is going to become a turbo AdBlock. If the original page doesn't sign the content, block it.

I guess so. GitHub hit by DDoS attack https://news.ycombinator.com/item?id=9275041

Only if this feature had widespread adoption. And if the GFW doesn't replace the hashes in the original content.

Today, whenever any resource (script/stylesheet) on an HTTPS website loads from HTTP, the browser rightly warns the user about insecure content.

But with SRI, it should be possible to send scripts, css, etc. over plain HTTP right? As long as the landing page is HTTPS, and the hash checks out, is there any reason for browsers to show a warning to users then?

You would lose privacy: a snooper could see exactly what you are downloading. It may not matter in most cases, but it's still not something browsers want to compromise.

This exposes a general issue that sometimes you want data integrity but not privacy. With https it's all or nothing.

Yes, that's the only thing I could think of -- loss of privacy. (On the other hand, thanks to SNI, it's easier than ever for a snooper to know which website you are visiting even if it's completely on HTTPS; so the loss of privacy by switching to plain HTTP here is only incremental.)

Subresource Integrity works on both HTTP and HTTPS.

This seems like exactly the thing they were talking about when they started deprecating HTTP [0]. Does this mean they've changed their mind?

[0] https://blog.mozilla.org/security/2015/04/30/deprecating-non...

No, it just means that this method works for both HTTPS and plain HTTP - they are not stating an opinion of either protocol here. If you are using HTTP then this would protect you from one class of problem, but it leaves all the other potential problems open if they are relevant to your content.

I've been plugging subresource integrity for months on HN, and have been modded down for it. Now Mozilla says "Don't let your CDN betray you". I've called some CDNs "MITM-as-a-service".

Pages which use this should detect subresource integrity fails and report them to both the browser user and a non-CDN logging machine. Subresource integrity should put a stop to CDNs and ISPs inserting ads and spyware, because if even a few major sites use subresource integrity, they'll get caught quickly and will suffer bad publicity.

This encourages using a CDN for only the bulky parts of a site. Put the important pages (entry pages, login pages, credit card acceptance) on a server you control, with your own SSL cert. Put the resources loaded with subresource integrity on a CDN. Now you're not trusting the CDN at all.

Tools for website maintenance will need some improvement. Files need version info in their names; if the content changes, the URL should change, too. Maybe use the hash as part of the URL. Such files can have indefinite cache expiration times; they're immutable.

Any idea why SHA384 is used over SHA256?

Presumably it's not for collision avoidance, and it's not like anyone's going to be hitting the maximum message size of SHA256 with anything stored in CDN..

Edit: So it seems that all of the main variants of the SHA-2 family must be supported[1], and the spec supports multiple hashes being presented at once. It's just that SHA-384 seems to be used in all of the examples I've seen so far.

> Conformant user agents MUST support the SHA-256, SHA-384 and SHA-512 cryptographic hash functions for use as part of a request’s integrity metadata, and MAY support additional hash functions.

> When a hash function is determined to be insecure, user agents SHOULD deprecate and eventually remove support for integrity validation using that hash function. User agents MAY check the validity of responses using a digest based on a deprecated function.

1: http://www.w3.org/TR/SRI/#cryptographic-hash-functions
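For reference, the spec also allows listing several digests in a single attribute; the user agent uses the strongest algorithm it supports among those listed. A sketch (the URL and hash values are placeholders):

```html
<!-- Multiple digests may be listed, space-separated; a UA that supports
     SHA-384 will check against that one. Hash values are placeholders. -->
<script src="https://cdn.example.com/lib.js"
        integrity="sha256-AAAA... sha384-BBBB..."
        crossorigin="anonymous"></script>
```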

Reading the second example, it doesn't, it supports at least both of those.

It could be because SHA-384 is immune to length-extension attacks, while SHA-256 is not.

It doesn't look like anything new AFAICT. I don't think they ever recommended sha-256 for TOP SECRET documents and with a quick scan I didn't see any new discussion in fips-180-4 itself.

About a year or two ago I saw someone here on HN suggest doing this very thing. If my memory is correct, I think it was after jQuery's CDN was compromised.

[Edit] I think this was the discussion, just over a year ago! - https://news.ycombinator.com/item?id=8359223

http://jimkeener.com/posts/http is my concept from 2013 (hash attribute) and I'm sure I'm not the first, either. It's an old idea and I'm _very_ glad to see someone doing it!

EDIT: I just reread my post and there are some wonky ideas mixed in with (what I think are) decent ones. Sorry!

Hopefully it will soon be possible to use this for software downloads even if the files themselves are served over http or ftp (including, sadly, many open source projects; the thing I like best about the popularity of github is that you know you can clone over https). It is depressing how much software can be downloaded from any of dozens of mirrors without a hash anywhere (unless your antivirus software checks hashes of such downloads, which it sounds like many on Windows do). Of course, these sites could already have an https page with hashes for manual checking but almost never do. Hopefully automated checking would convince a few to do it.

P.S. Blake2s would be a great additional hash to support.

Or, here is another idea: host the Javascript you need in your own damn domain!

On its face this looks like a good idea, but given the hassle of injecting a new checksum into the base file for every change, I doubt it will be done by anyone but the most security-conscious companies.

1. Not many people hand-write HTML tags these days, they tend to get generated at the level of apps like WordPress (which already has an SRI plugin) or frameworks like Rails (which can already do things like javascript_include_tag :application, integrity: true).

2. The people who do hand write HTML tags tend to be precisely the type of people who would go out of their way to generate a checksum on the commandline, or write a script to post process their HTML files.

Seems trivial. You go to the CDN page to get the script URL and right next to it is the hash. Maybe they have the whole tag for you to copy and paste.

> An important side note is that for Subresource Integrity to work, the CDN must support Cross-Origin Resource Sharing (CORS).

So, an extra request per resource, in other words?

CORS doesn't require preflight requests for simple GETs and POSTs, since you can always trigger those anyway. The browser can just do the request and check the headers on the response.

Wouldn't doing this on POST be dangerous? POSTs are not expected to be idempotent, so you're trusting the server to understand and check the Origin header etc.

You can send POST to any server using a JS-submitted form so it doesn't introduce new attack vector.

You get slightly more power in that you can post malformed multipart stuff.

No, it will just compare the given hash to the file.

An extra header.

Why require CORS? What good is that, here? If you're a malicious CDN, you'll just not support CORS and so no one can use this to protect themselves. If the CDN is benign but gets hacked, this just leaves the (unlikely but possible) option of the hackers breaking CORS, waiting for the inevitable "I guess I'll just disable SRI", and then altering the subresource.

Requiring CORS simply makes security more difficult to achieve.

The browser must not know the content (or the hash of the content) of files on your intranet (or any other domain that is not the one you are visiting right now.)

See https://annevankesteren.nl/2015/02/same-origin-policy and http://w3c.github.io/webappsec/specs/subresourceintegrity/#c...

If the hash functions are secure, the only way for this to leak information is if the attacker can make good guesses as to what the resource is a priori, and then use this to verify it.

Fair enough, that's some information leakage, but it's certainly not easy to exploit. Normal cross-origin limitations still apply, so you'd need to get creative to even get the information in the first place, and if you can, it's not clear what this adds over a timing attack.

I'm still a little skeptical such a heavy handed restriction is necessary to maintain the current level of security; but then again - why take the risk?

I think it's just better to download the files and host them yourself.

The only reason you would want to include third-party scripts is that you want automatic bug fixes, etc.?

I have to disagree. Typically, having a 3rd party CDN-hosted library like jQuery change on you is most definitely unexpected behaviour, and not desired.

The main purpose of CDN-hosted scripts is to allow them to be cached in your browser, reducing latency via nearby edge servers and improving load times across websites via shared caching.

This proposal prevents malicious changes, as well as cache poisoning, which was a very scary threat up until this announcement due to attack vectors like the one described in this defcon talk: https://www.youtube.com/watch?v=kLt_uqSCEUA

(PDF slides here: https://defcon.org/images/defcon-20/dc-20-presentations/Alon...)

> The main purpose for CDN hosted scripts is allowing them to be cached in your browser...

Hopefully, subresource integrity schemes will eventually allow browsers to fetch from local cache based on the checksum of the file contents, rather than the URI from which the resource is served. :)

Fair enough. Add HTTPS to that and it will be a good security layer.

Next can we apply it to whole HTML documents (including iFrames)? Then malicious ISPs/malware/etc would be unable to inject their ad spam code.

This works because you trust the HTML provider (as a source of a valid hash) but not the CDN. If you don't trust your ISP then getting trusted hashes becomes more interesting. You'd need an encrypted/unmodifiable connection to a 3rd party repository of hashes for HTTP content but that'd only work for shared identical content (like the stuff in a CDN). For personalized content (any website with a login) you need a personalized hash, an HTTPS response could have a header with a hash (RFC 3230: Instance Digests in HTTP) but you can only trust that if you trust HTTPS and if you trust HTTPS then what does the header really add? You probably need two ISPs or at least VPNs if you want to start detecting tampering by an ISP.

The idea is to provide protection even in HTTP-only environments (as HTTP is enforced in some places).

It would add overhead to verify hashes in the manner that you have mentioned but I think it's worth it.

For the main HTML document, what would prevent the ISP from simply changing the hash to include/account for the injected ad spam code?[1]

(for iframes, I think this would work fine)

[1]: I'm presuming you're not using TLS, because if you were, then TLS would do this and more.

Not a whole lot, TBH.

It raises the bar though from blind injection, which I think is quite good.

You can just use HTTPS for that.

Makes sense. Subresource integrity looks very easy to implement for developers, but what about CDNs? Is it difficult for them to implement CORS?

For static resources, it's nothing more than sending this HTTP header: "Access-Control-Allow-Origin: *".

We reached out to jQuery, and code.jquery.com has been doing this for a few months now.

As long as jQuery is compliant, I feel like you've probably covered 80% of all use cases on the web.

I still don't really understand why we can't just run any files in cache (from other sites) with the same hash.

If I have the sha-256 of an exe file, I'm perfectly happy to run any exe file with the same sha256 simply because collisions don't happen. Why is this different for JavaScript?

If an attacker can inject HTML script tags into your website haven't you already lost?

Content security policy is a defensive technology which makes the answer to your last question "no." Attackers still need to have their script appear to execute from a whitelisted domain, which is only possible if the system you propose is enacted. IE - have your own random webpage which loads a script with hash X, then redirect to an XSS hole which appears to load that script on a client's site that also comes from a whitelisted domain. Since it is cached from the first site, it will be loaded as if it was hosted on the second site and thus bypass CSP.

Perhaps I'm missing something here, so the browser can check the integrity of the included scripts as such:

    <script ... integrity="sha384-...">
How will the browser confirm that the source HTML requesting the script isn't modified by the CDN?

Well known CDN services, such as CloudFlare, are known for modifying the HTML -- This is not uncommon.

This is probably meant for pages where you serve the HTML, but JavaScript libraries and styles may come from a CDN.

Why is this better than the Content Security Policy header where you can specify a hash or the hash attribute on a script tag?

I'm a big fan of CSP, but even I have to admit that the CSP is quite large/expensive for an HTTP header field.

It might be different for other sites/stacks but on ours we deliver CSP at the whole site level, meaning it is delivered with every response we send.

Script integrity is only sent when that specific script is used, and it means our workflow doesn't have to change to rewrite a HTTP header dynamically with each page (based on which scripts are or aren't on that specific page).

I legitimately have no idea how I would implement CSP with hashes for the scripts on that specific page. It would require me to actually patch the software stack upstream. I do however know exactly how I'd use the integrity field on a script block and could implement it with just raw HTML.

PS - Not to mention that few browsers support level 2: http://caniuse.com/#feat=contentsecuritypolicy2

CSP header doesn't need to be added for every response, e.g. for images or JS files.

You can enable it via editing HTML rather than altering the HTTP headers.

that only works on inlined scripts, not on external references.

Wouldn't it be faster (browser being able to use resource) to just skip the CDN and host the resources yourself?

Not necessarily. Take jQuery: many sites use it. If both sites A and B load it from a CDN, and I've already visited site A, then when I visit site B, jQuery is already in my cache: we might not even need to request it, despite never having been to site B before.

However, traditionally, the CDN now controls the content of your JS, and could inject whatever they want into it. That's where this proposal comes in…

(Of course, if it isn't in the cache, then you might need as much as a DNS lookup+TCP connect+a TLS handshake to another host… tradeoffs. HTTP/1.x is also limited to n connections to a DNS name at time, so you can parallelize requests by hosting across multiple domains, such as a CDN, but I find this argument less compelling.)

Different sites might use different versions of a library. Also the cache gets polluted very quickly because many sites add caching headers for every request and cache size is limited.

If the average page size is 500 KB and jQuery is 30 KB gzipped, you do not save much by hosting it on a CDN. What you get is more DNS requests, more downtime when that server fails or stalls, and giving out data about your users.

I think it is easier just to host everything on your own server.

Exactly. People are giving up fundamental security for very few free resources and a tiny one-off performance improvement. But what people also don't seem to care about is that these CDNs are actually built to track both users and site traffic as well. No bank uses (or should be using) third-party CDNs, but other sites seem not to care.

This is absolutely excellent! I always hold off on using 3rd party hosted scripts when feasible, but this'll change things for sure!

Congratulations Mozilla!!! This is one of the few recent changes I've seen to browsers that fundamentally changes web-page security in a simple and novel way.

Chrome already has this, for your information: https://www.chromestatus.com/feature/6183089948590080

Here is a nice node.js package to generate a SRI from a file. https://www.npmjs.com/package/node-sri

Great! Now if only we can get asymmetric document signing going, including a "must be signed" header to complement HSTS...

SRI does little for trust when your CDN is also proxy-caching your HTML (e.g. Cloudflare).

Simple, yet seemingly effective. As long as the hash the browser is checking matches the hash the server produced, the only options to beat this that I see are hash collision and bypassing the mechanism.

I like the concept, but there will probably be more failed integrity checks because of procedural mess-ups than actual attacks.

I wonder about situations where this is used with third parties (such as Facebook), and how they could distribute changes.

I doubt Facebook or any third parties will find much benefit to this. They will just say that they will protect their own CDN. Otherwise they effectively kill their ability to update their scripts. In fact, I'd expect them to deliberately disable this by updating the script with a random byte every so often.

If a site wants to pin a third party to a specific version, they'll need to copy the file themselves. Though I'm not sure if this can be detected and "fixed" by the script author. I've noticed that Stripe's js file logs a warning if it thinks it's being loaded from another domain.

I mean, if Subresource Integrity becomes commonplace, someone might do this with a third party. The spec does not seem to address this.

or just don't use CDNs... they are just another privacy issue on top of all other problems...

Though, I don't see why CDNs wouldn't replace the hash with the hash of the injected file.

The CDN doesn't control the hash; your website does.

Some sites host their static HTML on a CDN too. Example: anyone who uses Cloudflare.

If you use proxy server that terminates HTTPS connections you have to trust it. There is nothing you can do.

I was thinking the same thing. For this to work you'd have to not use a CDN to host the static HTML (or maybe a different CDN?) Otherwise, it would be trivial for someone already sophisticated enough to inject a malicious script to also change the hash.

Incidentally, I always wonder about this for non-HTTPS sites that offer binary downloads and crypto hashes to verify the files. How can you be sure someone isn't MitM'ing you?

That is absurd. To prevent malicious modifications and data leaks you can just host the code on your own server. Using a free CDN (without an SLA) you just increase your site's downtime and get nothing in return (well, except that the company and the NSA can now collect the IPs, UAs, and referrers of your users).

CDNs are used to optimize traffic costs, but hosting just a single JS library there won't save you much.

Calculating hashes is bothersome and requires modifying the app, so probably nobody is going to use it.

I never use libraries hosted at free CDNs.

EDIT: I cannot think of a scenario where this feature can be useful.

EDIT 2: And you cannot use this feature for scripts like Google Analytics because they can be modified anytime.

It's not just about CDN-hosted files, it's also a way to not execute untrusted JS. Let's say you have a high-traffic WordPress site with a dev environment. You might have a clean theme and know all your JS, but be using plugins other people have written.

Using this it looks like you could put the hash in your markup, and if their JavaScript code changed it wouldn't execute. For some people, having that dead man's switch might be better than an always-execute policy on JS that you aren't writing yourself.

There is CSP ( https://en.wikipedia.org/wiki/Content_Security_Policy ) to protect against inclusion of unapproved third party scripts. A hash check won't help against malicious WP themes, because they would not output the integrity attribute into the generated HTML.

You are right in everything you say here. I don't know why you are downvoted. Cross-domain JavaScript is cancer and most if not all websites that follow this practice could do without it.
