
Store the proof of a webpage saved with SingleFile in Bitcoin - gildas
https://blog.woleet.io/woleet-singlefile-extension-bitcoin-proof/
======
kwantam
Since there's little technical detail it's hard to be certain, but I doubt
this is useful as a proof in the way one might wish (that is, proving that a
web server delivered content X at date Y). The reason is, it does not appear
that anything prevents the user from modifying the web page and then
generating a "proof" about the modified version.

TLSNotary (tlsnotary.org) is an example of a project that attempts to use a
(modified) TLS connection for non-repudiation (which is roughly the property
that we would want here), but it requires a trusted third party to act as the
notary.

It's possible this project is taking a similar approach (which would be fine,
for those who trust the trusted third party). But given the lack of technical
detail, and reading between the lines, I don't see a reason to believe this is
the case.

(Happy to be wrong, though! Maybe a more detailed description would help us
understand what's going on.)

~~~
gill3s
Hi, I'm Woleet's CEO. I can give you all the details you want. To explain it
simply, for each SingleFile export we "anchor" the hash in Bitcoin. This means
each hash is linked to one particular Bitcoin transaction. Feel free to ask any
questions; I'll be happy to answer.

~~~
hanniabu
This sounds extremely expensive due to the cost of bitcoin transactions. What
ZeroNet is doing seems much more feasible.
[https://zeronet.io/](https://zeronet.io/)

~~~
gill3s
We use layer 2 technology. The main idea of Woleet is to stamp many hashes
(possibly millions) in one Bitcoin transaction. Our service has been
running for years and we produce thousands of proofs daily.

~~~
this_was_posted
What do you mean by layer 2 technology? If you're talking about the data
link layer of the OSI model, I am not sure how that applies here.

~~~
Ohn0
"Layer 2 refers to a secondary framework or protocol that is built on top of
an existing blockchain system."

from
[https://www.binance.vision/glossary/layer-2](https://www.binance.vision/glossary/layer-2)

------
aazaa
The article is light on technical detail, but I presume this works by
generating a hash value from the page of interest. Then a service such as
opentimestamps ([https://opentimestamps.org](https://opentimestamps.org))
gathers a bunch of these hashes together, publishing the Merkle root as data
in an OP_RETURN transaction output.
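
A minimal sketch of that aggregation step, assuming SHA-256 throughout and a Bitcoin-style rule of repeating the last node on odd levels. The page contents and the `merkle_root` helper are hypothetical; this is not how OpenTimestamps or Woleet necessarily build their trees.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Fold a list of digests into a single root digest."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: pad odd levels by repeating the last node.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical saved pages; only their digests enter the tree.
pages = [b"<html>page one</html>", b"<html>page two</html>", b"<html>page three</html>"]
leaves = [sha256(p) for p in pages]
root = merkle_root(leaves)
print(root.hex())  # a single 32-byte value stands in for all three pages
```

However many hashes are gathered, only the 32-byte root would need to go on chain.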

If so, one should be aware of a kinda sneaky attack that might be feasible.

Let's say I want to prove my clairvoyance by predicting the winner of the 2020
presidential election. I generate a text file containing the name of my pick.
Then I hash the result. Next, I publish the hash value to the block chain
using an OP_RETURN output.

On November 4, 2020, I publish the transaction ID containing the hash of my
predicted winner. I also publish the text file used to generate the hash
value. Clearly, I must have known that hash value when the transaction was
confirmed, implying that I knew the winner of the election at that time as
well.

Except I cheated. Instead of making just one transaction, I made two. The
second one contained an OP_RETURN output with the hash value of a document
containing the name of the opponent.

On the day after Election Day, I simply publish the transaction ID I know to
contain the winner, and never mention the existence of the other transaction.

Depending on how SingleFile works, it may be possible to do a similar attack.

Also, you don't exactly get a proof that pinpoints the date. Rather, you get
proof that the hash value existed as of the date the Bitcoin transaction gets
its first confirmation.

~~~
petertodd
As the founder of OpenTimestamps I wanted to say your analysis is absolutely
correct. People often don't realize how weak timestamp proofs are - you really
need to think carefully about what exactly is being proved by one and take
into account the "timestamp all the things" attack. This is particularly true
in efficient, scalable, timestamping solutions like OTS where timestamps are
essentially free to create: an attacker could write a script to create via
brute force literally _trillions_ of alternate variations of a prediction.
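
To make the attack concrete, here is a rough sketch under stated assumptions: the candidate names are placeholders, and the "timestamping" is simulated by keeping digests in a dictionary rather than calling any real service. It shows that a verifier who only sees the revealed document cannot tell that other commitments existed.

```python
import hashlib
from typing import Dict, Tuple


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical outcomes; the attacker commits to every one of them in advance.
outcomes = ["Candidate A", "Candidate B"]
documents: Dict[str, str] = {o: f"I predict the winner will be {o}." for o in outcomes}
# Stand-in for timestamping each digest separately before the event.
timestamped: Dict[str, str] = {o: digest(doc) for o, doc in documents.items()}


def reveal(actual_winner: str) -> Tuple[str, str]:
    """After the event, publish only the matching document and its digest."""
    return documents[actual_winner], timestamped[actual_winner]


def verifier_accepts(document: str, published_digest: str) -> bool:
    """All a verifier can check is that the document matches the digest."""
    return digest(document) == published_digest


for winner in outcomes:
    doc, published = reveal(winner)
    print(winner, verifier_accepts(doc, published))
```

Whichever outcome occurs, the check passes, so the timestamp alone says nothing about foresight.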

------
onyb
At my previous job, in a legal-tech company, we used Woleet to build a
copyright protection product for intellectual property. However, I believe
IPFS [1] is a superior solution for proof-of-existence, compared to
timestamping on Bitcoin.

With Woleet, you must keep the original payload (file + personal
identification) that was timestamped, for eternity. In the event of a
copyright violation, you must be able to prove in front of a judge that the
hash of the file in your possession is indeed what exists on the Bitcoin
blockchain.

With IPFS, you only need to save the hash of the payload (or a human-readable
name, with IPNS [2]), to convince the judge that you authored the original
file at a certain point in time. Additionally, IPFS has version control. This
means that if you want to prove to a court that some revision to the T&Cs of
your product was made before a certain date, it makes more sense to use IPFS.

[1] [https://ipfs.io](https://ipfs.io) [2]
[https://docs.ipfs.io/guides/concepts/ipns](https://docs.ipfs.io/guides/concepts/ipns)

~~~
jmeyer2k
You can't prove a file existed before a certain date with IPFS like you can
with Bitcoin.

~~~
capableweb
Yes, if I understand IPFS correctly, you can. Since IPFS works as a
content-addressed system, if you embed the date, send the judge the document's
hash (which is based on the content), and don't show the document itself until
a later point, you can prove the document is the same one you committed to,
even without revealing the content until later.

IPFS doesn't seem to have anything about "version control" as onyb mentioned.

~~~
bluesign
How will you embed the date?

~~~
jstanley
The IPFS hash is a hash of the content. Simply including the date as text
would suffice.

~~~
lmm
What stops you from backdating a document when you write it?

~~~
jstanley
Nothing. Including the date doesn't do anything other than commit yourself to
stating that date. The important part is the date at which you commit to the
hash.

~~~
capableweb
I'm assuming you add the document (with the date) to IPFS without being
connected to the network (no actual data gets shared, only the hash), get the
hash and send it to the person you need to prove the document's date to.
They won't be able to fetch it (since the content itself is offline). Once you
want to prove it, you add it again or otherwise make it available online. Then
they can fetch the content by the same hash and confirm they received that hash
on the earlier date.
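
The commit-then-reveal flow discussed in this subthread can be sketched as follows. This is only an illustration: plain SHA-256 is used as a stand-in for an IPFS content identifier (real identifiers are encoded differently), and the `Recipient` class, the document text, and the dates are hypothetical.

```python
import hashlib
from datetime import date
from typing import Dict, Optional


def content_hash(document: bytes) -> str:
    # Stand-in for a content address; not an actual IPFS identifier.
    return hashlib.sha256(document).hexdigest()


class Recipient:
    """The party who needs convincing keeps their own log of received hashes."""

    def __init__(self) -> None:
        self.received: Dict[str, date] = {}

    def record_commitment(self, digest: str, received_on: date) -> None:
        # Keep the earliest date on which this digest was seen.
        self.received.setdefault(digest, received_on)

    def check_reveal(self, document: bytes) -> Optional[date]:
        """Return the date the matching hash was received, or None."""
        return self.received.get(content_hash(document))


document = b"Dated 2020-01-15: terms and conditions, revision 3"
recipient = Recipient()
recipient.record_commitment(content_hash(document), date(2020, 1, 20))

print(recipient.check_reveal(document))
print(recipient.check_reveal(b"some other document"))
```

The date that carries weight is the one in the recipient's log, not the one written inside the document, which matches the point that embedding a date only commits the author to stating it.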

------
marcinjachymiak
There's already a better service that timestamps files in Bitcoin. It also
uses blockchain space efficiently using servers that aggregate data that must
be timestamped into a single Bitcoin transaction. You just need to publish a
Merkle root and hold onto your Merkle proof.
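
As a rough illustration of "hold onto your Merkle proof", the sketch below builds a proof for one leaf and checks it against the root. It assumes SHA-256 and repeats the last node on odd levels; `build_proof` and `verify_proof` are hypothetical helpers, not the OpenTimestamps proof format.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[str, bytes]]  # (side of the sibling, sibling digest)


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_proof(leaves: List[bytes], index: int) -> Tuple[bytes, Proof]:
    """Return the root and the sibling path for leaves[index]."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    proof: Proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: pad odd levels
        sibling = index ^ 1
        side = "left" if sibling < index else "right"
        proof.append((side, level[sibling]))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_proof(leaf: bytes, proof: Proof, root: bytes) -> bool:
    """Recompute the path from the leaf up and compare with the root."""
    node = leaf
    for side, sibling in proof:
        node = h(sibling + node) if side == "left" else h(node + sibling)
    return node == root


leaves = [h(name.encode()) for name in ("a.html", "b.html", "c.html")]
root, proof = build_proof(leaves, 2)
print(verify_proof(leaves[2], proof, root))
```

The proof grows with the logarithm of the number of leaves, which is why one published root can cover a very large batch.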

[https://opentimestamps.org/](https://opentimestamps.org/)
[https://petertodd.org/2016/opentimestamps-announcement](https://petertodd.org/2016/opentimestamps-announcement)

------
dqv
Nice. I'll give it a try when I get to my primary PC.

Journalists have a bad habit of linking to tweets which are often ephemeral
because accounts are deleted, tweets are deleted, or accounts go private.

Another problem is where publishers themselves change the open graph meta (or
whatever it's called) after a tweet has been published. One memorable example
(for me) is where Washington Post changed the image on an article about
Alexandria Ocasio Cortez's Jewish heritage depicting her with her hands
clasped similar to the Happy Merchant meme[0]. Obviously they realized the
resemblance enough to change the image, but didn't comment on it. If you look
at the original tweet[1] now, you can see the replies look completely out of
context because they changed it.

[0]:[https://knowyourmeme.com/memes/happy-merchant](https://knowyourmeme.com/memes/happy-merchant)
[1]:[https://twitter.com/washingtonpost/status/107212454556018278...](https://twitter.com/washingtonpost/status/1072124545560182784?lang=en)

------
RileyJames
Great solution. The previous blockchain enabled solution for this problem that
I’d found was [https://tlsnotary.org/](https://tlsnotary.org/).

A few questions: how does notarising the TLS handshake differ from notarising
the entire document in terms of “proof”?

Is one form of proof better than the other? Or do they prove something
different?

------
joshspankit
I believe this technique of storing hashes on a blockchain is how public
figures should be inoculating against the upcoming risk posed by deepfakes.

If a video surfaces that’s faked from an existing hashed one, that’s a _very_
easy proof.

~~~
maxfan8
How would this work? We’d still need something that can detect
similar/probably deepfaked content (a good cryptographic hash has random
distribution).
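
A small sketch of why a cryptographic hash cannot flag "similar" content: flipping a single bit of the input gives an unrelated-looking digest. The input bytes are a made-up placeholder for video data.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"frame data of the authentic video" * 100
altered = bytearray(original)
altered[0] ^= 0x01  # flip one bit of the first byte

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(bytes(altered)).digest()
print(differing_bits(d1, d2), "of 256 digest bits differ")
```

Roughly half the bits are expected to change, so the digests reveal nothing about how close the two inputs are. A hash match can confirm an exact original but cannot detect a derived fake.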

~~~
joshspankit
At its most basic it _only_ focuses on the original source with the original
encoding. For consistency, metadata would be stripped before hashing, but it’s
understood that even “save as” can produce a file with a totally different
hash.

It would be up to the creator (or some third party) to keep the original
unedited video so that if there was ever a dispute about a fake surfacing that
original could be undeniably verified to be the authentic one.

------
verdverm
Suspicious new accounts commenting

~~~
kick
Interestingly, at least one deleted their comment after it was flagged, just
in case any reader sees this and thinks "Wait, there's only two suspicious new
accounts commenting!"

~~~
verdverm
HN will delete them too, I think. Both are now gone

~~~
kick
No, only one is. Turn showdead on.

------
mathiasrw
A more complete way is to put all of the website data on the blockchain
including metadata like [https://etched.page](https://etched.page) does. Other
files are added in a way that you can see how the page looked at the time. The
user selects which files from the document are kept (.css is normally a good
idea).

The website and the metadata about when it was stored, including a signature
from etched.page, can be unpacked directly from the chain.

~~~
maxfan8
What’s wrong with just a hash of all the relevant files? You can then store
the actual files conventionally and provide them upon request. The security
guarantees are still there (you implicitly trust SHA256’s resistance against
collision attacks if you use Bitcoin).
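
One way to get a single hash covering all the relevant files is to hash a manifest of per-file digests. The sketch below is an assumption about how that could be done (JSON manifest, sorted keys, SHA-256); the file names and contents are placeholders.

```python
import hashlib
import json
from typing import Dict


def manifest_digest(files: Dict[str, bytes]) -> str:
    """Hash each file, then hash a canonical manifest of those digests."""
    per_file = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    # Sorted keys and fixed separators keep the manifest bytes deterministic.
    manifest = json.dumps(per_file, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(manifest).hexdigest()


site = {
    "index.html": b"<html><link rel='stylesheet' href='style.css'></html>",
    "style.css": b"body { margin: 0; }",
}
print(manifest_digest(site))
```

Only this one digest would need to be anchored. The files themselves can be stored anywhere and checked against the manifest on request.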

------
fiatjaf
Why not use opentimestamps.org for this extension instead?

~~~
maxfan8
Yeah, having an open, easy to verify standard is part of what makes a “proof”
a good proof. No need to reinvent the wheel here.

~~~
gill3s
We are not reinventing the wheel here: the generated proof is downloadable in
the Chainpoint format
([https://w3id.org/chainpoint/v2](https://w3id.org/chainpoint/v2)) and we also
support the OpenTimestamps format. The verification tool only lets you download
the Chainpoint v2 format, though.

------
drdeca
If webpages supported TLS-N, then this seems like it could be cool, but as is,
I don’t see what this does beyond what originstamp (and similar services)
provide. The TLSNotary thing others have mentioned here sounds cool, and I
hadn’t heard of it. The interactivity allowing for a proof even without the
server supporting something like TLS-N is impressive, if I am understanding
correctly!

------
jonnydubowsky
The Rebooting The Web of Trust workshops and conference represent an
interesting intersection between self-sovereign identity authentication &
verification, certificate validation, and reputation assessment.

The website proof on btc blockchain seems to touch on these concepts.

The W3C Digital Verification Community Group is working on a number of
interesting solutions for digital verification as well.
[https://www.w3.org/community/digital-verification/](https://www.w3.org/community/digital-verification/)
[https://w3c-dvcg.github.io/http-signatures/](https://w3c-dvcg.github.io/http-signatures/)

Does anyone have any experience working with the w3 digital verification stuff
to help inform us on how this is progressing in the wild?

------
kichik
Why not just timestamp it with an RFC 3161 timestamp server? Bitcoin sounds
like a bit of overkill here.

~~~
eli
Or, for most people, just tweet a hash

------
realty_geek
Is there a simple solution to allow people to timestamp a static webpage like
Patrick McKenzie does here:

[https://twitter.com/patio11/status/958494488061595649](https://twitter.com/patio11/status/958494488061595649)

------
zemnmez
this is a proof that a user viewed a webpage rather than a proof that a
webpage existed with given contents, which would need many users to find
consensus on the content of a webpage.

as a result of this model, there's no way to verify the content is 'correct',
since the user can arbitrarily modify it before submitting

so my understanding of this is that cryptographically it can be used to say
that a user submitted some content at a given time.

what's the use case for this?

~~~
maxfan8
Prior art in patent, trademark, or copyright cases.

------
stretchwithme
This sort of thing is going to become very important for photos and videos.
Being able to prove that an original earlier unaltered version of a video
exists will allow us to reject AI-altered fake videos.

Indeed, once it becomes commonplace, videos that do not have a blockchain
record recorded close to creation time will be suspect. The less time an AI
has to alter a video, the less likely alteration is.

------
WhiteOwlLion
You could use Bitcoin SV to store the actual file on the blockchain, but
SingleFile creates an enormous HTML file. So, you're better off printing the
page as a PDF, encoding that in the blockchain, and spending only 10% of what
you would have spent (250kb SingleFile vs 25kb PDF).

~~~
gildas
I can understand that you prefer the PDF format for saving HTML webpages, even
though the PDF format is not suited to this use case. But do all the pages
on the Internet saved with SingleFile weigh 250KB and all the pages saved as
PDF weigh 25KB? I did a quick test by saving
[https://github.com/gildas-lormeau/SingleFile/issues?q=is%3Ai...](https://github.com/gildas-lormeau/SingleFile/issues?q=is%3Aissue+is%3Aclosed)
and the HTML file generated by SingleFile is almost 3 times smaller than the
PDF file.

FYI, the SHA256 of the page is stored in Bitcoin, not the full page.
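
To illustrate why the size of the saved page does not affect what goes on chain, here is a minimal sketch using Python's standard `hashlib`. The page contents are placeholders and `page_digest` is a hypothetical helper, not SingleFile's or Woleet's code.

```python
import hashlib


def page_digest(html_bytes: bytes) -> str:
    """Return the hex SHA-256 digest of a saved page."""
    return hashlib.sha256(html_bytes).hexdigest()


small_page = b"<html><body>hi</body></html>"
large_page = b"<html><body>" + b"x" * 250_000 + b"</body></html>"

for page in (small_page, large_page):
    digest = page_digest(page)
    # The digest is always 32 bytes, whatever the page size.
    print(len(page), "byte page ->", len(bytes.fromhex(digest)), "byte digest")
```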

------
Kinnard
B:// Bitcoin Simple Storage Protocol
[https://b.bitdb.network](https://b.bitdb.network)

C:// Content Addressable Files over Bitcoin
[https://c.bitdb.network](https://c.bitdb.network)

------
crazypython
I use DEVONThink, and it saves stuff in Web Archive (WARC) format when I want
it to. Like Evernote clipper, but better.

I also use httrack to download files offline, which I can then have DEVONThink
index. Bam, offline search engine!

~~~
solarkraft
While I'm intrigued and the personal version at 100$ looks okay, the server
version (which is the only one I could use, as I have no device running an
Apple OS) costs 500 fucking $!

Does it even run on a Linux server? The website is super low on information.

------
m-p-3
Can we alternatively get the ability to upload that to IPFS?

