Authenticating Shared Web Caches (thesharps.us)
52 points by luu 28 days ago | 14 comments

When thinking about the same problem I discovered TLS Notary, which was interesting.



At the time I was interested in using website data on the blockchain. The limitation was that proofs couldn’t be processed in a contract, they had to be passed off to a trusted 3rd party. Unclear on the state of play today.

This can't get better than trusting a third party, because being able to prove these things goes against one of the common security guarantees of the cryptographic protocols used by TLS: deniability. You need the server to actually play along and give up its right to deniability, such as by supporting the new Signed Exchanges protocol being worked on for AMP.

You might be able to use a blockchain here.

Each block in the chain stores "transactions" that are cryptographically secure hashes of the content of a URL. To archive an entire page, which may link to various resource URLs, you would need multiple transactions; one for each resource. Effectively, a transaction acts like a claim that a page looked a certain way on a certain date.

Mining operates like a normal blockchain where blocks are validated by other parties based on downloading the content of the URL, comparing it to the hash, and discarding the block if the content doesn't hash to the same value. Miners are compensated either with a type of in-chain currency or a smart contract granting them another cryptocurrency based on how many blocks they mine.
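A minimal sketch of the claim-and-validate step described above, assuming SHA-256 as the hash. The names `Claim`, `make_claim`, `validate_claim`, and `validate_block` are hypothetical, and `fetch` stands in for whatever each node would use to download a URL:

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Claim:
    """Hypothetical 'transaction': a URL plus the hash of what it served."""
    url: str
    sha256_hex: str


def make_claim(url: str, content: bytes) -> Claim:
    # The claim records only the digest, not the content itself.
    return Claim(url, hashlib.sha256(content).hexdigest())


def validate_claim(claim: Claim, fetch: Callable[[str], bytes]) -> bool:
    # Each validator downloads the URL independently and compares digests.
    return hashlib.sha256(fetch(claim.url)).hexdigest() == claim.sha256_hex


def validate_block(claims: Iterable[Claim], fetch: Callable[[str], bytes]) -> bool:
    # A block is discarded if any one of its claims fails to re-validate.
    return all(validate_claim(c, fetch) for c in claims)
```

Passing `fetch` in as a parameter keeps the sketch independent of any HTTP library. It also makes the weakness obvious: validation only succeeds if every validator's `fetch` returns byte-identical content.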

This system still has a few flaws:

1) Websites could still prevent their archival by changing the website on each access. Since each node in the network downloads the page independently, the downloads must all serve the same content for the transaction to validate.

2) Malicious miners could request archival of a website, then change the content of the website as soon as a new block is mined. With enough luck, they can prevent the verification of a new block, stopping the original miner from claiming their reward.

1 is a pretty big flaw. As a trivial example, the "Date" header changes for every request on most web servers. News websites have minute to minute updates.
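To illustrate the "Date" point, here is a rough sketch that hashes a response's headers together with its body. The function `response_digest` and the sample header values are made up for illustration; skipping a header also means nobody is attesting to it, so this is a demonstration of the problem rather than a fix:

```python
import hashlib
from typing import Dict, FrozenSet


def response_digest(
    headers: Dict[str, str],
    body: bytes,
    ignore: FrozenSet[str] = frozenset(),
) -> str:
    """Hash headers (lowercased, sorted) and body, skipping ignored header names."""
    h = hashlib.sha256()
    for name, value in sorted((k.lower(), v) for k, v in headers.items()):
        if name in ignore:
            continue
        h.update(f"{name}: {value}\r\n".encode("utf-8"))
    h.update(b"\r\n")
    h.update(body)
    return h.hexdigest()


# Two fetches of the same page, one second apart (sample values).
first = {"Content-Type": "text/html", "Date": "Mon, 01 Jun 2020 12:00:00 GMT"}
second = {"Content-Type": "text/html", "Date": "Mon, 01 Jun 2020 12:00:01 GMT"}
body = b"<html>same page</html>"

naive_match = response_digest(first, body) == response_digest(second, body)
filtered_match = (
    response_digest(first, body, frozenset({"date"}))
    == response_digest(second, body, frozenset({"date"}))
)
```

Here `naive_match` is false even though the page did not change, while `filtered_match` is true only because the volatile header was excluded from the hash.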

The problem statement in the blog post covers these issues and sticks to validation of an already-downloaded cache.

That seems like a massively overengineered solution which does not solve the actual question:

> If I want to contribute my own crawl results, how can the Archive verify that I didn’t forge the pages I’m submitting?

Using a blockchain for this solves nothing and only introduces new problems.

As I've discovered in the past few weeks of archiving a soon-to-be offline service: archiving is tricky and often requires tampering with traffic.

The IA uses a MITM proxy [0] to produce WARC files from intercepted traffic. Another software suite for producing high-fidelity archives injects JS [1] into pages served by its MITM proxy.

[0] https://github.com/internetarchive/warcprox [1] https://github.com/webrecorder/wombat

Have you tried grab-site [1]?

[1] https://github.com/ArchiveTeam/grab-site

This idea is similar to DECO [1]:

"DECO is a privacy-preserving oracle protocol. Using cryptographic techniques, it lets users prove facts about their web (TLS) sessions to oracles while hiding privacy-sensitive data."

[1] https://www.deco.works/

A similar idea, but using email DKIM keys instead of web TLS keys, is WebFist:


It makes me wonder if some day we could break our data out of their silos by storing responses from the sites where we currently have our accounts. In theory, my account on this site should be able to express a "like" on a Tweet, or an up-vote on a Reddit post.

I'm not sure how to do that in a way that doesn't break privacy expectations, and that doesn't involve using a blockchain somewhere, unfortunately.

Are you familiar with the IndieWeb stuff? This sounds kind of like WebMention.

Thank you. I seem to keep "independently" reinventing WebMention:


There are probably a few pieces missing beyond WebMention to make a decentralised pseudonymous reputation/voting system, but you're right that I should follow the IndieWeb stuff more closely.

From article, Jamey Sharp:

"...not terribly useful but still interesting enough to share.

Say you want to give somebody a copy of a web page that you’ve previously retrieved, and you want to convince them that what you’re giving them really did come from the original web server at some point."

Au contraire.

Solving this fixes journalism and "fake news". Mostly.

Imagine coupling this with QuoteBack, recently featured on the HackerNews front page.

Journalism: cite your sources, share your data, sign your work. Anything less is gossip.

why not replace TLS with GPG/PGP? completely?

> for the purposes of this exercise I want to be able to do it for any web server as it is deployed today, rather than requiring support for a new protocol.

We want the server to attest that the content I'm signing is valid, and you can't introduce PGP into a normal HTTPS browsing session without a new protocol.
