Store the proof of a webpage saved with SingleFile in Bitcoin (woleet.io)
107 points by gildas 27 days ago | 77 comments



Since there's little technical detail it's hard to be certain, but I doubt this is useful as a proof in the way one might wish (that is, proving that a web server delivered content X at date Y). The reason is, it does not appear that anything prevents the user from modifying the web page and then generating a "proof" about the modified version.

TLSNotary (tlsnotary.org) is an example of a project that attempts to use a (modified) TLS connection for non-repudiation (which is roughly the property that we would want here), but it requires a trusted third party to act as the notary.

It's possible this project is taking a similar approach (which would be fine, for those who trust the trusted third party). But given the lack of technical detail, and reading between the lines, I don't see a reason to believe this is the case.

(Happy to be wrong, though! Maybe a more detailed description would help us understand what's going on.)


> I doubt this is useful as a proof in the way one might wish (that is, proving that a web server delivered content X at date Y). The reason is, it does not appear that anything prevents the user from modifying the web page and then generating a "proof" about the modified version.

The point of this service is to prove data existed at a certain time. It can't prove the authenticity of the data.


The linked post says something else:

> It is therefore quite natural that this collaboration was born, allowing all users of the extension to retrieve a neutral, irrefutable and usable evidence worldwide, including in court.


This is potentially very useful for IP. It's a great way to provide some proof that you had an idea before anyone else.


Basically, you're right. I found the idea interesting despite the fact that the solution isn't "perfect". For your information, the code used to upload the proof is here [1]. A SHA256 hash of the content of the saved page is generated (i.e. the "hash" parameter of the "anchor" function) and is sent to the Woleet API.

[1] https://github.com/gildas-lormeau/SingleFile/blob/master/ext...
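For readers curious what that flow looks like in practice, here's a minimal Python sketch. The endpoint URL and JSON field names below are placeholders for illustration, not Woleet's actual API.

    # Sketch: hash a SingleFile export and submit the digest to an anchoring API.
    # The URL and payload fields are placeholders, not Woleet's real API.
    import hashlib
    import json
    import urllib.request

    def sha256_of_file(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    digest = sha256_of_file("saved_page.html")
    payload = json.dumps({"name": "saved_page.html", "hash": digest}).encode()
    req = urllib.request.Request(
        "https://anchoring.example.com/anchor",   # hypothetical endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())

Note that only the 32-byte digest leaves the machine; the page itself stays local, which is also why nothing stops the user from editing it first.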


Another possible solution to what you want to solve is to use a signed exchange signature: https://wicg.github.io/webpackage/draft-yasskin-http-origin-...

The publisher server must support it, but this results in the document being signed with the publisher's certificate. This won't "date" the signature, but combined with a blockchain solution like this one, it could prove that a web server delivered content X at date Y.


> I doubt this is useful as a proof in the way one might wish (that is, proving that a web server delivered content X at date Y). The reason is, it does not appear that anything prevents the user from modifying the web page and then generating a "proof" about the modified version.

Thanks for saying it so clearly, this is exactly the first thing I thought (and it reduces the value of the functionality by heaps, since everything can be forged).


Hi, I'm Woleet's CEO. I can give you all the details you want. To explain it simply, for each SingleFile export we "anchor" the hash in Bitcoin. It means each hash is linked to one particular Bitcoin transaction. Feel free to ask any question, I'll be happy to answer.


Actually, I also cannot find any information on how this is achieved by your service. As OP mentioned, if "just" a browser extension is used to create a hash of the HTML page, one could use dev tools to modify the DOM inside the browser and then create the hash...


I think the OP is thinking of a scenario where, if the proof is generated locally, it just proves the existence of that file, not that the file was public on the internet; you could use a proxied network (or just the hosts file?) to fool the browser extension. I'm not implying this is the case, but it would be great if you could explain how it is implemented.


I think kwantam's point is that you are merely storing a hash of the resulting file: it proves it existed at a certain datetime, but it doesn't guarantee it wasn't modified (which TLSNotary does, albeit with a trusted third party required).

This wasn't clear from the link, as there was very little technical information provided.


I get it, and no, nothing proves you didn't modify it. Maybe a solution is to create some kind of "witness community" stamping the same page at the same time. It would have different hashes each time and the evidence could be stronger in the end


If the website uses SSL, would it be possible to prove that the server signed the particular sequence of bytes you received? That doesn't prove that nobody modified the data but it does prove that anyone who did was able to sign things with a key that nobody else should have access to.


My understanding is that with ordinary SSL you cannot produce a proof, based on your interaction with the server, that would convince a skeptic at some undetermined later time. However, others here have mentioned the TLSNotary idea: if you are interacting with the person you want to prove it to live while you get the info from the server, then you can. And if the server supports the TLS-N extension, then you can instead make a proof that should convince arbitrary people later.


That is actually a problem. It should be usable inside the deep web, e.g. to prove that I made a purchase at a website and, as a second step, that my review of the purchase is real.


This sounds extremely expensive due to the cost of bitcoin transactions. What ZeroNet is doing seems much more feasible. https://zeronet.io/


If they are using Merkle trees to aggregate pages, they can have many page proofs in one transaction.
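To make that concrete, here is a rough Python sketch of Merkle aggregation (simplified; real schemes such as Chainpoint or OpenTimestamps differ in leaf hashing, odd-leaf handling and proof encoding):

    # Sketch: aggregate many page hashes into one Merkle root, so a single
    # Bitcoin transaction (e.g. an OP_RETURN of the root) timestamps them all.
    import hashlib

    def h(data):
        return hashlib.sha256(data).digest()

    def merkle_root_and_proof(leaves, index):
        proof, level = [], list(leaves)
        while len(level) > 1:
            if len(level) % 2:                       # duplicate last node on odd levels
                level.append(level[-1])
            sibling = index ^ 1
            proof.append((index % 2, level[sibling]))
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            index //= 2
        return level[0], proof

    def verify(leaf, proof, root):
        cur = leaf
        for is_right, sibling in proof:
            cur = h(sibling + cur) if is_right else h(cur + sibling)
        return cur == root

    pages = [h(f"page {i}".encode()) for i in range(1000)]   # 1000 page hashes
    root, proof = merkle_root_and_proof(pages, 42)
    assert verify(pages[42], proof, root)   # each proof is only ~10 sibling hashes

Each user keeps their own short inclusion proof; only the root needs to go on-chain.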


We use layer 2 technology; the main idea of Woleet is to stamp many hashes (possibly millions of hashes) in one Bitcoin transaction. Our service has been running for years and we produce thousands of proofs daily.


What do you mean by layer 2 technology? If you're talking about the data link layer of the OSI model, I am not sure how that applies here..


"Layer 2 refers to a secondary framework or protocol that is built on top of an existing blockchain system."

from https://www.binance.vision/glossary/layer-2


Layer 2 is essentially an application layer abstracted on top of a blockchain. Among other things it is a way to allow for better scalability.


It basically means off-chain, so I'm not quite sure where the accomplishment is. Sorry for the bluntness, not trying to be an asshole. Open to further explanation.


How do you prove that the user did not modify the page before computing the hash?

Do you have documentation of how this is done?


There are some, many even, instances where the kind of fraudulent modifications you might want to do could not realistically be done at the time the page is saved, but could be done at a later date.

So for that, it's useful.


Could you describe such a scenario? I fail to find one where this would be useful.


This tool does not guarantee that you haven't modified the export before you stamp it. The main protection is the timestamp and the fact that the hash is calculated by the extension itself. This proof just guarantees that this particular file existed at this date. I personally believe that even if it's not a silver bullet, the certain date is the main protection it provides. If you want to commit fraud with a Bitcoin timestamp, you need proper timing and preparation. In conclusion, it just makes things harder


Being able to prove that a certain webpage existed locally at a given date is rather useless. The only utility there is if the page contains confidential information and you want to prove that you had that information at that time, but you could do that just as easily without saving a whole web page in the process.


Yeah, ironically the whole point of SingleFile for me is that I can locally edit webpages to strip out ads and other crap so I can send them to friends or family who might not have ad blockers.


The article is light on technical detail, but I presume this works by generating a hash value from the page of interest. Then a service such as opentimestamps (https://opentimestamps.org) gathers a bunch of these hashes together, publishing the Merkle root as data in an OP_RETURN transaction output.

If so, one should be aware of a kinda sneaky attack that might be feasible.

Let's say I want to prove my clairvoyance by predicting the winner of the 2020 presidential election. I generate a text file containing the name of my pick. Then I hash the result. Next, I publish the hash value to the block chain using an OP_RETURN output.

On November 4, 2020, I publish the transaction ID containing the hash of my predicted winner. I also publish the text file used to generate the hash value. Clearly, I must have known that hash value when the transaction was confirmed, implying that I knew the winner of the election at that time as well.

Except I cheated. Instead of making just one transaction, I made two. The second one contained an OP_RETURN output with the hash value of a document containing the name of the opponent.

On the day after Election Day, I simply publish the transaction ID I know to contain the winner, and never mention the existence of the other transaction.
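A minimal sketch of that cheat (the outcome strings are of course made up):

    # Sketch of the "timestamp all the things" cheat: commit to every possible
    # outcome in advance, then only ever reveal the commitment that came true.
    import hashlib

    outcomes = ["Candidate A wins in 2020", "Candidate B wins in 2020"]
    commitments = {o: hashlib.sha256(o.encode()).hexdigest() for o in outcomes}

    for text, digest in commitments.items():
        # each digest gets its own anchoring transaction; the preimage stays secret
        print(digest, "<- anchored now; preimage kept private:", text)

    # After the election: publish only the preimage (and transaction) of the winner.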

Depending on how SingleFile works, it may be possible to do a similar attack.

Also, you don't exactly get a proof that pinpoints the date. Rather, you get proof that the hash value existed as of the date the Bitcoin transaction gets its first confirmation.


As the founder of OpenTimestamps I wanted to say your analysis is absolutely correct. People often don't realize how weak timestamp proofs are - you really need to think carefully about what exactly is being proved by one and take into account the "timestamp all the things" attack. This is particularly true in efficient, scalable timestamping solutions like OTS where timestamps are essentially free to create: an attacker could write a script to create, via brute force, literally trillions of alternate variations of a prediction.


I don't see this as a real issue because of course this protocol can only prove that a certain website existed as represented at that time. It's impossible to prove something didn't exist anyway. Can you prove I'm not wearing 8 different hats stacked on top of each other right now? No, it's impossible without a time-stamped video recording and a method of biometric verification showing I wasn't wearing the hats. This only works because there is arguably only one me at a time.


You can also have more precise time stamps with the Roughtime protocol [1] (Peter Todd has talked about this [2]), although it is easier for the servers in the Roughtime protocol to cooperate and forge time stamps than in Bitcoin.

Another way to prevent "double spending"/attacks like these is to burn Bitcoin with each publishing of the hash, ensuring that it is at least costly for an attacker to preemptively timestamp all of the probability space.

Your security analysis looks correct to me.

[1]: https://roughtime.googlesource.com/roughtime

[2]: https://groups.google.com/a/chromium.org/d/msg/proto-roughti...


This analysis is correct and it looks like SingleFile doesn't provide protection from this class of attack.

However, there are solutions that can protect against this type of attack. The attack you describe above is similar to what is known as a "double-spend" attack. Using a digital signature you can prove that a particular key (and therefore user) signed a particular transaction, but cryptography alone cannot prove that there was not another competing transaction that spends the same funds. Prior to Bitcoin, digital currencies required a central database to prevent these double-spends. With Bitcoin and subsequent cryptocurrencies this attack is prevented via a distributed database with a consensus algorithm.

Peter Todd (and others) have generalized prevention of double-spends to the idea of a "single-use seal", which can be implemented using the Bitcoin blockchain as the implementing decentralized service. [1] The idea is that there is some kind of formal protocol used (a higher layer in the protocol stack) and, in the example above, the political prognosticator ties the hash of their single prediction of the winner of the presidential election to a particular Bitcoin UTXO (unspent transaction output), which serves as a "single-use seal". The combination of the higher-layer messaging protocol and the "lock" to a single commitment tied to the UTXO limits the prognosticator to a single prediction. They don't have to reveal their prediction at the time it is made, but they do have to publicly (or privately) commit via a higher-level protocol to the single prediction.

[1] https://petertodd.org/2016/commitments-and-single-use-seals

Update: For the content hash of the (SingleFile) web page, you could have a sequence of these single-use seals, each creating a new, linked transaction (UTXO) every time the website content is changed. This would allow users to verify the current content or what the content was at a particular time in history, as long as the root of this chain was somehow published (uniquely) at some start time in the past (in OpenSeals terminology [2] this is called a "Root Proof").

[2] https://github.com/rgb-org/spec/blob/develop/01-OpenSeals.md...
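To make the chaining idea concrete, a toy Python illustration (this shows only the shape of the idea; it is not the OpenSeals wire format or an actual seal implementation):

    # Toy illustration: each new commitment includes the previous one, so the
    # revision history forms a tamper-evident linked list rooted in one
    # "root proof" published at the start.
    import hashlib

    def commit(content, prev_commitment):
        return hashlib.sha256(prev_commitment + hashlib.sha256(content).digest()).digest()

    root = hashlib.sha256(b"root proof, published once at the start").digest()
    revisions = [b"<html>v1</html>", b"<html>v2</html>", b"<html>v3</html>"]

    chain, prev = [], root
    for content in revisions:
        prev = commit(content, prev)
        chain.append(prev)   # in the real scheme, each step would close one seal/UTXO

    # A verifier who knows the root and every revision can recompute the chain
    # and check it against the anchored commitments.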


At my previous job, in a legal-tech company, we used Woleet to build a copyright protection product for intellectual property. However, I believe IPFS [1] is a superior solution for proof-of-existence, compared to timestamping on Bitcoin.

With Woleet, you must keep the original payload (file + personal identification) that was timestamped, for eternity. In the event of a copyright violation, you must be able to prove in front of a judge that the hash of the file in your possession is indeed what exists on the Bitcoin blockchain.

With IPFS, you only need to save the hash of the payload (or a human-readable name, with IPNS [2]) to convince the judge that you authored the original file at a certain point in time. Additionally, IPFS has version control. This means that if you want to prove to a court that some revisions to the T&Cs of your product were made before a certain date, it makes more sense to use IPFS.

[1] https://ipfs.io [2] https://docs.ipfs.io/guides/concepts/ipns


You can't prove a file existed before a certain date with IPFS like you can with Bitcoin.


Yes, if I understand IPFS correctly, you can. Since IPFS works as a content-addressed system, if you embed the date, send the hash (which is based on the content) to the judge, and don't show the document until a later point, you can prove the document is the same as the one you committed to, even without revealing the content until later.

IPFS doesn't seem to have anything about "version control" as onyb mentioned.


what prevents me from pre-dating a document on IPFS?


How will you embed the date?


The IPFS hash is a hash of the content. Simply including the date as text would suffice.


What stops you from backdating a document when you write it?


Nothing. Including the date doesn't do anything other than commit yourself to stating that date, the important part is the date at which you commit to the hash.
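A trivial illustration of that point (the strings are made up):

    # The embedded date only changes the hash; you can write any date you like
    # before hashing. Only an external timestamp on the hash itself (a tweet,
    # a blockchain anchor, a notary...) pins down when the commitment was made.
    import hashlib

    claimed_2010 = b"Written on 2010-01-01: my secret invention"
    claimed_2019 = b"Written on 2019-01-01: my secret invention"

    print(hashlib.sha256(claimed_2010).hexdigest())
    print(hashlib.sha256(claimed_2019).hexdigest())
    # Both are equally easy to produce today.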


I'm assuming you add the document (with date) to IPFS without being connected to the network (no actual data gets shared, only the hash), get the hash, and send it over to the person you need to prove the document's date to. They won't be able to get it (since the content itself is offline), and once you want to prove it, you add it again or otherwise make it available online. Then they can fetch the same hash and confirm they received that hash on the earlier date.


I am confused: how can you prove the date and ownership? Does IPNS have some kind of timestamp?


There's already a better service that timestamps files in Bitcoin. It also uses blockchain space efficiently using servers that aggregate data that must be timestamped into a single Bitcoin transaction. You just need to publish a Merkle root and hold onto your Merkle proof.

https://opentimestamps.org/ https://petertodd.org/2016/opentimestamps-announcement


Nice. I'll give it a try when I get to my primary PC.

Journalists have a bad habit of linking to tweets which are often ephemeral because accounts are deleted, tweets are deleted, or accounts go private.

Another problem is where publishers themselves change the open graph meta (or whatever it's called) after a tweet has been published. One memorable example (for me) is where Washington Post changed the image on an article about Alexandria Ocasio Cortez's Jewish heritage depicting her with her hands clasped similar to the Happy Merchant meme[0]. Obviously they realized the resemblance enough to change the image, but didn't comment on it. If you look at the original tweet[1] now, you can see the replies look completely out of context because they changed it.

[0]:https://knowyourmeme.com/memes/happy-merchant [1]:https://twitter.com/washingtonpost/status/107212454556018278...


Great solution. The previous blockchain enabled solution for this problem that I’d found was https://tlsnotary.org/.

A few questions: how does notarising the TLS handshake vs. the entire document differ in terms of “proof”?

Is one form of proof better than the other? Or do they prove something different?


I believe this technique of storing hashes on a blockchain is how public figures should be inoculating against the upcoming risk posed by deepfakes.

If a video surfaces that’s faked from an existing hashed one, that’s a very easy proof.


Solid idea actually! Can you think of a way to wrap this as a product?


Fun challenge.

As a product it could be an independent video verification service.

On-demand: A client wants to release a video and make sure that it’s trusted. They contact the company first, some process is done to make sure that they represent who they say they represent, and the content is hashed and added to the blockchain prior to release.

Ongoing: when a client uploaded a video to the web, the system would automatically grab it, generate the hash, and add it to the blockchain along with the metadata of where it was found and who the client was.

For large clients, it could even include video cold storage.

For marketing/promotion, the company could automatically process public media from well-established social media feeds.


How would this work? We’d still need something that can detect similar/probably deepfaked content (a good cryptographic hash has random distribution).


At its most basic, it only focuses on the original source with the original encoding. For consistency, metadata would be stripped before hashing, but it’s understood that even “save as” can produce a file with a totally different hash.

It would be up to the creator (or some third party) to keep the original unedited video so that if there was ever a dispute about a fake surfacing that original could be undeniably verified to be the authentic one.


Furthermore, common actions such as re-encoding a video for streaming will generate a copy of the content that looks identical but has a completely different hash, which will produce an ecosystem of real and deepfaked content with exactly the kind of noise that deepfaked content thrives in.


Suspicious new accounts commenting


Interestingly, at least one deleted their comment after it was flagged, just in case any reader sees this and thinks "Wait, there's only two suspicious new accounts commenting!"


HN will delete them too, I think. Both are now gone


No, only one is. Turn showdead on.


A more complete way is to put all of the website data on the blockchain, including metadata, like https://etched.page does. Other files are added in a way that lets you see how the page looked at the time. The user selects which files from the document are kept (.css is normally a good idea).

The website, and the metadata about when it was stored including a signature from etched.page can be unpacked directly from chain.


What’s wrong with just a hash of all the relevant files? You can then store the actual files conventionally and provide them upon request. The security guarantees are still there (you implicitly trust SHA256’s resistance against collision attacks if you use Bitcoin).
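For example, a single deterministic digest over a set of files is easy to compute; a quick sketch (the manifest convention here is just one reasonable choice, not a standard):

    # Sketch: one deterministic digest over several files (HTML, CSS, images...).
    # Sorting by name makes the same set of files always yield the same digest.
    import hashlib

    def file_sha256(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def manifest_hash(paths):
        lines = sorted(f"{file_sha256(p)}  {p}" for p in paths)
        return hashlib.sha256("\n".join(lines).encode()).hexdigest()

    # Anchor only manifest_hash([...]) on-chain; keep the files themselves
    # anywhere, and reveal them (plus the manifest) when challenged.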


Why not use opentimestamps.org for this extension instead?


Yeah, having an open, easy to verify standard is part of what makes a “proof” a good proof. No need to reinvent the wheel here.


We do not reinvent the wheel here: the generated proof is downloadable in the Chainpoint format (https://w3id.org/chainpoint/v2), and we also support the OpenTimestamps format. The verification tool only allows you to download the Chainpoint v2 format, though


If webpages supported TLS-N, then this seems like it could be cool, but as is, I don’t see what this does beyond what originstamp (and similar services) provide. The TLSNotary thing others have mentioned here sounds cool, and I hadn’t heard of it. The interactivity allowing for a proof even without the server supporting something like TLS-N is impressive, if I am understanding correctly!


The Rebooting the Web of Trust workshops and conferences represent an interesting intersection between self-sovereign identity authentication & verification, certificate validation, and reputation assessment.

The website proof on btc blockchain seems to touch on these concepts.

The W3C Digital Verification Community Group is working on a number of interesting solutions for digital verification as well. https://www.w3.org/community/digital-verification/ https://w3c-dvcg.github.io/http-signatures/

Does anyone have any experience working with the w3 digital verification stuff to help inform us on how this is progressing in the wild?


Why not just timestamp it with an RFC 3161 timestamp server? Bitcoin sounds like a bit of an overkill here.


Or, for most people, just tweet a hash


Or a combined hash with the NIST randomness beacon. https://beacon.nist.gov/home
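A sketch of one way to combine them (the pulse value is a placeholder you would copy from the beacon yourself, and the concatenation format is an arbitrary choice):

    # Sketch: bind a document hash to a public randomness value. The beacon
    # pulse below is a placeholder; in practice you would record which pulse
    # (timestamp/index) you used so others can look it up.
    import hashlib

    doc_hash = hashlib.sha256(b"<html>...saved page...</html>").hexdigest()
    beacon_pulse = "PLACEHOLDER_PULSE_OUTPUT_VALUE"   # copied from beacon.nist.gov

    combined = hashlib.sha256(f"{doc_hash}:{beacon_pulse}".encode()).hexdigest()
    print(combined)   # this value cannot have been computed before the pulse existed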


Actually, uploading to Bitcoin is tantamount to using the NIST randomness beacon since the NIST randomness beacon is injected into Bitcoin: https://github.com/opentimestamps/nist-inject


Is there a simple solution to allow people to timestamp a static webpage like Patrick McKenzie does here:

https://twitter.com/patio11/status/958494488061595649


this is a proof that a user viewed a webpage rather than a proof that a webpage existed with given contents, which would need many users to find consensus on the content of a webpage.

as a result of this model, there's no way to verify the content is 'correct', since the user can arbitrarily modify it before submitting

so my understanding of this is that cryptographically it can be used to say that a user submitted some content at a given time.

what's the use case for this?


Prior art in patent, trademark, or copyright cases.


This sort of thing is going to become very important for photos and videos. Being able to prove that an original earlier unaltered version of a video exists will allow us to reject AI-altered fake videos.

Indeed, once it becomes commonplace, videos that do not have a blockchain record recorded close to creation time will be suspect. The less time an AI has to alter a video, the less likely alteration is.


You could use Bitcoin SV to store the actual file on the blockchain, but SingleFile creates an enormous HTML file. So you're better off printing the page as a PDF, encoding that in the blockchain, and spending only 10% of what you would have spent (250 KB SingleFile vs 25 KB PDF).


I can understand that you prefer the PDF format for saving HTML webpages, even though the PDF format is not adapted to this use case. But do all the pages on the Internet saved with SingleFile weigh 250 KB, and all the pages saved as PDF weigh 25 KB? I did a quick test by saving https://github.com/gildas-lormeau/SingleFile/issues?q=is%3Ai... and the HTML file generated by SingleFile is almost 3 times smaller than the PDF file.

FYI, the SHA256 of the page is stored in Bitcoin, not the full page.


B:// Bitcoin Simple Storage Protocol https://b.bitdb.network

C:// Content Addressable Files over Bitcoin https://c.bitdb.network


I use DEVONThink, and it saves stuff in Web Archive (WARC) format when I want it to. Like Evernote clipper, but better.

I also use httrack to download files offline, which I can then have DEVONThink index. Bam, offline search engine!


While I'm intrigued and the personal version at 100$ looks okay, the server version (which is the only one I could use, as I have no device running an Apple OS) costs 500 fucking $!

Does it even run on a Linux server? The website is super low on information.


Can we alternatively get the ability to upload that to IPFS?



