
I've coded up something similar in WebKit and considered landing it in Chrome, although I was more concerned with being able to use HTTP resources securely in HTTPS pages.

Firstly you need a way to incrementally verify a file: you don't want to have to download and buffer the whole file before figuring out whether it's good. Thankfully you can do this with Merkle trees: just make them `degenerate' (every left child is a leaf node).
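
To make that concrete, here's a minimal sketch (not the actual WebKit code) of how verification with such a degenerate tree could work, assuming the sender frames the stream so that each chunk arrives alongside the digest of everything after it:

    // Minimal sketch, assuming sha256 and a framing where each chunk is
    // delivered with the digest of the remainder of the file. Names and
    // framing are illustrative only.
    import { createHash } from "crypto";

    const sha256 = (data: Buffer): Buffer =>
      createHash("sha256").update(data).digest();

    function verifyStream(
      rootDigest: Buffer,
      chunks: { data: Buffer; restDigest: Buffer | null }[],
    ): boolean {
      let expected = rootDigest;
      for (const { data, restDigest } of chunks) {
        if (restDigest === null) {
          // Final chunk: it is the right-most leaf, so its hash must
          // equal whatever digest we were still expecting.
          return sha256(data).equals(expected);
        }
        // Interior node: expected == H(H(chunk) || digest-of-rest).
        const node = sha256(Buffer.concat([sha256(data), restDigest]));
        if (!node.equals(expected)) return false;
        // Descend into the right child and keep streaming.
        expected = restDigest;
      }
      return false;
    }

The nice property is that a bad chunk is rejected as soon as it arrives, rather than after buffering the whole file.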

If you make the browser cache content addressable (as the post suggests) then you need to do more than look at the caching headers. Consider a site with Content Security Policy enabled: if an attacker found an XSS, they could inject <script src="http://correct.origin.com/foo.js" digest="degen-hash:sha256:123abc...">. That would match the CSP origin restrictions without the server ever having to serve foo.js. (Thanks to abarth for pointing that out.)
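
Roughly, the problem is that a purely hash-keyed cache never consults the named origin at all; something like this sketch (all names invented for illustration):

    // Sketch of the hole, assuming a shared cache keyed only by digest.
    const contentCache = new Map<string, string>(); // digest -> body

    // Step 1: the attacker seeds the shared cache from a page they
    // control, so their payload is stored under a digest they know.
    contentCache.set("degen-hash:sha256:123abc...", "doEvil();");

    // Step 2: via the XSS, they inject a tag naming the *allowed* origin
    // (as in the markup above) plus that digest.
    function loadScript(url: string, digest: string,
                        allowedOrigins: string[]): string | null {
      // The CSP origin check passes: the attacker chose a URL on
      // correct.origin.com.
      if (!allowedOrigins.includes(new URL(url).origin)) return null;
      // The cache lookup ignores the URL entirely, so correct.origin.com
      // never has to serve foo.js for the payload to run.
      return contentCache.get(digest) ?? null;
    }

    loadScript("http://correct.origin.com/foo.js",
               "degen-hash:sha256:123abc...",
               ["http://correct.origin.com"]); // => "doEvil();"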

So I believe content addressable caches would need a separate hash:// URL scheme for this. But I didn't attempt to make the cache content addressable in my code.
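
To illustrate what I mean (the syntax here is invented; nothing like it shipped in my code), the point of a separate scheme is that the URL names the content directly, so a policy has to opt into hash-addressed resources explicitly rather than being satisfied by an origin that never served the bytes:

    // Hypothetical hash:// URLs. A CSP of "script-src 'self'" would
    // simply refuse such a resource instead of being fooled.
    function parseHashUrl(url: string): { alg: string; digest: string } | null {
      const m = /^hash:\/\/([a-z0-9-]+)\/([0-9a-f]+)/.exec(url);
      return m ? { alg: m[1], digest: m[2] } : null;
    }

    // e.g. <script src="hash://sha256/123abc..."> -> look the digest up
    // in the content addressable cache.
    parseHashUrl("hash://sha256/123abc"); // => { alg: "sha256", digest: "123abc" }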

I don't know what to do about img srcsets[1]. There's no obviously good place to put the hashes in the HTML.

And lastly, and possibly fatally for the "secure HTTP resources in HTTPS pages" use case, many networks now transcode images on the fly (and possibly other files too), so matching against a known hash will result in broken pages for users on those networks.

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/...




> Firstly you need a way to incrementally verify a file

Following the proposal in the article, you don't really need that. If you don't have the hash in your local cache, you should assume that the link provided by the site is correct and proceed as you normally would without the hash present. Once the file is downloaded, you verify its contents against the hash and add it to the cache if the verification succeeds.

If you already had the hash in the local cache, then you already have the entire file and there's no need for incremental verification.
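
In other words, something like this sketch (names are invented for illustration; the page still gets the bytes whether or not they match, they just aren't cached under that hash):

    // Rough sketch of the flow described above.
    import { createHash } from "crypto";

    const hashCache = new Map<string, Buffer>(); // page-supplied hash -> body

    async function fetchWithHash(url: string, hash: string): Promise<Buffer> {
      // Cache hit: the content was verified when it was first stored, so
      // there's nothing incremental to check and no network fetch at all.
      const cached = hashCache.get(hash);
      if (cached) return cached;

      // Cache miss: download as usual, trusting the link the site gave us.
      const res = await fetch(url);
      const body = Buffer.from(await res.arrayBuffer());

      // Verify after the fact; only matching content enters the shared
      // cache.
      const digest = createHash("sha256").update(body).digest("hex");
      if (digest === hash) hashCache.set(hash, body);
      return body;
    }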

As I understand it, the point of the hash is purely for caching purposes, not strictly for validating that the downloaded file matches it.


> As I understand it, the point of the hash is purely for caching purposes, not strictly for validating that the downloaded file matches it.

That's a very useful consequence, though: it prevents malicious content from being substituted for the expected file.


I deliberately avoided images in my proposal, because they are rarely shared across domains and are frequently targeted by dodgy proxies.


I'm not certain that JavaScript isn't similarly `optimised' by such networks, but images are significant for the mixed content case.

I believe that we collected some data in Chrome about the efficacy of a content addressable cache in improving cache density, and the results were underwhelming despite expectations. I don't have the numbers to hand, but I suspect they'll appear in a paper at some point.

I've not seen data about how well it reduces network loads yet, although I believe the same colleagues were intending to collect that too.


The fact that JavaScript would break on networks that transparently rewrite JavaScript is a feature, not a bug. If hash codes were implemented by all major browsers, those networks which tamper with JavaScript would have to stop doing so pretty damn quickly, or face a lot of angry customers.

As it is, I think customers would already be angry if they knew that their network provider was rewriting code that gets executed on their own computers; anything which makes this more obvious can only be good.


I'm afraid users would be angry, but they would be angry at me. Blame is assigned to the last thing to move.

I'm not claiming that this is good, but it is the reality of writing browsers.



