At the time I was interested in using website data on the blockchain. The limitation was that proofs couldn't be processed in a contract; they had to be passed off to a trusted third party. I'm unclear on the state of play today.
Each block in the chain stores "transactions" that are cryptographically secure hashes of the content of a URL. To archive an entire page, which may link to various resource URLs, you would need multiple transactions, one for each resource. Effectively, a transaction acts as a claim that a page looked a certain way on a certain date.
Mining operates like a normal blockchain: blocks are validated by other parties, who download the content of the URL, compare it to the hash, and discard the block if the content doesn't hash to the same value. Miners are compensated either with a type of in-chain currency or with a smart contract granting them another cryptocurrency based on how many blocks they mine.
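The validation step above is just re-download, re-hash, compare. A minimal sketch in Python (the transaction layout, URL, and `validate_transaction` helper are hypothetical, invented for illustration):

```python
import hashlib

def content_hash(body: bytes) -> str:
    # SHA-256 digest of the raw response body.
    return hashlib.sha256(body).hexdigest()

def validate_transaction(tx: dict, fetched_body: bytes) -> bool:
    # A validator re-downloads tx["url"] and accepts the block only
    # if what it fetched hashes to the claimed value.
    return content_hash(fetched_body) == tx["sha256"]

# Hypothetical transaction: a claim that a page looked a certain
# way on a certain date.
page = b"<html>hello</html>"
tx = {"url": "https://example.com/", "date": "2020-06-01",
     "sha256": content_hash(page)}

print(validate_transaction(tx, page))                      # True
print(validate_transaction(tx, b"<html>changed</html>"))   # False
```

Note this is exactly where both flaws below bite: the validator's fetch happens at a different time than the original one, so the server is free to answer differently.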
This system still has a few flaws:
1) Websites could still prevent their archival by changing the page on each access. Since each node in the network downloads the page independently, the server must serve the same content to all of them for the transaction to validate.
2) Malicious miners could request archival of a website, then change the content of the website as soon as a new block is mined. With enough luck, they can prevent the verification of a new block, stopping the original miner from claiming their reward.
The problem statement in the blog post covers these issues and sticks to validation of an already-downloaded cache.
> If I want to contribute my own crawl results, how can the Archive verify that I didn’t forge the pages I’m submitting?
Using a blockchain for this solves nothing and only introduces new problems.
The Internet Archive uses a MITM proxy to produce WARC files from intercepted traffic. Another software suite for producing high-fidelity archives injects JS into pages served by its MITM proxy.
"DECO is a privacy-preserving oracle protocol. Using cryptographic techniques, it lets users prove facts about their web (TLS) sessions to oracles while hiding privacy-sensitive data."
It makes me wonder if some day we could break our data out of their silos by storing responses from the sites where we currently have our accounts. In theory, my account on this site should be able to express a "like" on a Tweet, or an up-vote on a Reddit post.
I'm not sure how to do that in a way that doesn't break privacy expectations, and that doesn't involve using a blockchain somewhere, unfortunately.
There are probably a few pieces missing beyond WebMention to make a decentralised pseudonymous reputation/voting system, but you're right that I should follow the IndieWeb stuff more closely.
"...not terribly useful but still interesting enough to share.
Say you want to give somebody a copy of a web page that you’ve previously retrieved, and you want to convince them that what you’re giving them really did come from the original web server at some point."
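To see why the quoted problem is hard: a plain hash proves integrity, not origin. Anyone can hash arbitrary content, so a claim that is internally consistent can still be forged. A sketch of that point (the claim layout and URL are invented for illustration):

```python
import hashlib

def digest(body: bytes) -> str:
    return hashlib.sha256(body).hexdigest()

# A genuine snapshot and a fabricated one are indistinguishable
# by hash alone:
real = b"<html>what the server actually sent</html>"
fake = b"<html>a fabricated quote</html>"

claim_real = {"url": "https://example.com/", "sha256": digest(real)}
claim_fake = {"url": "https://example.com/", "sha256": digest(fake)}

# Both claims check out; nothing ties either digest back to the
# origin server. That link needs something extra, e.g. a server
# signature or a TLS-oracle protocol like DECO.
print(digest(real) == claim_real["sha256"])  # True
print(digest(fake) == claim_fake["sha256"])  # True, yet forged
```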
Solving this fixes journalism and "fake news". Mostly.
Imagine coupling this with QuoteBack, recently featured on the Hacker News front page.
Journalism: cite your sources, share your data, sign your work. Anything less is gossip.
We want the server to attest that the content I'm signing is valid, and you can't introduce PGP into a normal HTTPS browsing session without a new protocol.