
Authenticating Shared Web Caches - luu
https://jamey.thesharps.us/2020/06/13/authenticating-shared-web-caches/
======
RileyJames
When thinking about the same problem, I discovered TLS Notary, which was
interesting.

[https://tlsnotary.org/](https://tlsnotary.org/)

[https://github.com/tlsnotary/tlsnotary](https://github.com/tlsnotary/tlsnotary)

At the time I was interested in using website data on the blockchain. The
limitation was that proofs couldn't be processed in a contract; they had to be
passed off to a trusted third party. Unclear on the state of play today.

~~~
saurik
This can't get better (than having to trust a third party), as being able to
prove these things goes against one of the common security guarantees of the
cryptographic protocols used by TLS. You need the server to play along and
give up its right to deniability, for example by supporting the new Signed
Exchanges protocol being worked on for AMP.

------
tylerhou
You might be able to use a blockchain here.

Each block in the chain stores "transactions" that are cryptographically
secure hashes of the content of a URL. To archive an entire page, which may
link to various resource URLs, you would need multiple transactions; one for
each resource. Effectively, a transaction acts like a claim that a page looked
a certain way on a certain date.

Mining operates like a normal blockchain: blocks are validated by other
parties, which download the content of the URL, compare it to the hash, and
discard the block if the content doesn't hash to the same value. Miners are
compensated either with a type of in-chain currency or with a smart contract
granting them another cryptocurrency based on how many blocks they mine.

This system still has a few flaws:

1) Websites could still prevent their archival by changing the page on each
access. Since each node in the network downloads the page independently, the
server must serve the same content to every node for the transaction to
validate.

2) Malicious miners could request archival of a website, then change the
content of the website as soon as a new block is mined. With enough luck, they
can prevent the verification of a new block, stopping the original miner from
claiming their reward.
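The core validation step could be sketched roughly like this (hypothetical
names, with fetching, consensus, and rewards omitted; each validating node
would independently download the URL and check the claimed hash):

```python
import hashlib

def validate_transaction(fetched_content: bytes, claimed_hash: str) -> bool:
    """A validating node re-downloads the URL's content (passed in here as
    fetched_content) and accepts the transaction only if it hashes to the
    value claimed by the miner."""
    return hashlib.sha256(fetched_content).hexdigest() == claimed_hash

# A miner archives a page: the "transaction" records the URL and content hash.
page = b"<html><body>Hello, archive</body></html>"
tx = {"url": "https://example.com/", "sha256": hashlib.sha256(page).hexdigest()}

# Same content on re-download: transaction validates.
assert validate_transaction(page, tx["sha256"])

# Content changed between downloads (flaws 1 and 2): validation fails.
assert not validate_transaction(b"<html>changed</html>", tx["sha256"])
```

This also makes the flaws concrete: any byte-level difference between what the
miner saw and what a validator later downloads makes the hash comparison fail.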

~~~
captn3m0
1 is a pretty big flaw. As a trivial example, the "Date" header changes on
every request on most web servers, and news websites have minute-to-minute
updates.

The problem statement in the blog post covers these issues and sticks to
validation of an already-downloaded cache.

------
jswrenn
As I've discovered in the past few weeks of archiving a soon-to-be offline
service: archiving is tricky and often requires tampering with traffic.

The IA uses a MITM proxy [0] to produce WARC files from intercepted traffic.
Another software suite for producing high-fidelity archives injects JS [1]
into pages served by its MITM proxy.

[0]
[https://github.com/internetarchive/warcprox](https://github.com/internetarchive/warcprox)
[1]
[https://github.com/webrecorder/wombat](https://github.com/webrecorder/wombat)

~~~
toomuchtodo
Have you tried grab-site [1]?

[1] [https://github.com/ArchiveTeam/grab-site](https://github.com/ArchiveTeam/grab-site)

------
Confiks
This idea is similar to DECO [1]:

"DECO is a privacy-preserving oracle protocol. Using cryptographic techniques,
it lets users prove facts about their web (TLS) sessions to oracles while
hiding privacy-sensitive data."

[1] [https://www.deco.works/](https://www.deco.works/)

------
dane-pgp
A similar idea, but using email DKIM keys instead of web TLS keys, is WebFist:

[https://github.com/bradfitz/webfist](https://github.com/bradfitz/webfist)

It makes me wonder if some day we could break our data out of their silos by
storing responses from the sites where we currently have our accounts. In
theory, my account on this site should be able to express a "like" on a Tweet,
or an up-vote on a Reddit post.

I'm not sure how to do that in a way that doesn't break privacy expectations,
and that doesn't involve using a blockchain somewhere, unfortunately.

~~~
anderspitman
Are you familiar with the IndieWeb stuff? This sounds kind of like WebMention.

~~~
dane-pgp
Thank you. I seem to keep "independently" reinventing WebMention:

[https://news.ycombinator.com/item?id=23097466](https://news.ycombinator.com/item?id=23097466)

There are probably a few pieces missing beyond WebMention to make a
decentralised pseudonymous reputation/voting system, but you're right that I
should follow the IndieWeb stuff more closely.

------
specialist
From article, Jamey Sharp:

 _"...not terribly useful but still interesting enough to share.

Say you want to give somebody a copy of a web page that you’ve previously
retrieved, and you want to convince them that what you’re giving them really
did come from the original web server at some point."_

Au contraire.

Solving this fixes journalism and "fake news". Mostly.

Imagine coupling this with QuoteBack, recently featured on the Hacker News
front page.

Journalism: cite your sources, share your data, sign your work. Anything less
is gossip.

------
bandie91
Why not replace TLS with GPG/PGP completely?

~~~
captn3m0
> for the purposes of this exercise I want to be able to do it for any web
> server as it is deployed today, rather than requiring support for a new
> protocol.

We want the server to attest that the content being signed is valid, and you
can't introduce PGP into a normal HTTPS browsing session without a new
protocol.

