
How Merkle Trees Enable the Decentralized Web - EGreg
https://taravancil.com/blog/how-merkle-trees-enable-decentralized-web/
======
basemi
A similar concept (Merkle-DAG) is implemented by IPFS (ipfs.io)

"A Merkle-DAG is similar to a Merkle tree in that they both are essentially a
tree of hashes. A Merkle tree connects transactions by sequence, but a Merkle-
DAG connects transactions by hashes. In a Merkle-DAG, addresses are
represented by a Merkle hash. This spider web of Merkle hashes links data
addresses together by a Merkle graph. The directed acyclic graph (DAG) is used
to model information. In our case, modeling what address has stored specific
data."

([https://www.cio.com/article/3174193/healthcare/from-medical-records-to-merkle-trees-with-ipfs.html](https://www.cio.com/article/3174193/healthcare/from-medical-records-to-merkle-trees-with-ipfs.html))

~~~
EGreg
Why can't there be cycles in the Merkle graph?

That would be cool. You just have a partial order between children and
parents, like in git.

~~~
glic3rinu
You would need to know the hash of a yet-to-exist node to create cycles on a
Merkle graph :)
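glic3rinu's point can be made concrete with a small sketch (a hypothetical helper, not from any real DAG library): a node's address is derived from its children's addresses, so every child must already be hashed before its parent can exist, which forces the graph to be acyclic.

```python
import hashlib

def node_address(payload: bytes, child_addresses: list[str]) -> str:
    """A node's address is the hash of its payload plus its children's
    addresses, so each child must exist (and be hashed) before the
    parent can be created -- which is what rules out cycles."""
    h = hashlib.sha256()
    h.update(payload)
    for addr in child_addresses:
        h.update(addr.encode())
    return h.hexdigest()

# Construction order is forced by the hashing: leaves first, then parents.
leaf_a = node_address(b"file A", [])
leaf_b = node_address(b"file B", [])
root = node_address(b"directory", [leaf_a, leaf_b])

# Making `root` a child of `leaf_a` would require knowing `root`'s
# address before computing `leaf_a` -- but `root` depends on `leaf_a`.
```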

~~~
eternalban
It is really not that different from the 'Grandfather Paradox': Merkle graphs
are inherently 'causal sequences', and when combined with nodes as 'spaces of
things' they define the evolving manifold of a 'world of things'. This seems
to inform a sort of deep structure in our reality, manifesting in phenomena
such as the Golden Section, etc. [Speculating at the end there, of course.]

------
dracodoc
To rephrase my comment:

Some disadvantages of OP's method:

- Hash codes are difficult for humans to read. URLs under a hierarchy share
common patterns and carry some meaning; hash codes all look the same and give
no hint about the content.

- You have to copy a hash code; it is almost impossible to type one, or to
compare two visually similar strings by eye.

- You may end up using a link-shortening service for hash codes, but you could
already use link shortening to solve the portable file-host problem.

Merkle trees can solve some problems, but I don't think portable URLs are the
right one.

~~~
subless
I agree 100%. Also, Merkle trees would only benefit static content uploaded to
the Internet. Updating any dynamic content would constantly generate a new
root hash, meaning a new URL with each update to that specific content. It's
just not a good option for anything other than static content.

One place I could see it being valuable is with an online archiving service
like Archive.org, where the content doesn't change except when a new snapshot
is recorded to capture any changes made to that content.

~~~
um304
Exactly! Editing content is the elephant in the room, which the author didn't
even address.

~~~
tbv
It was only a 10 minute talk, so I had to cut _something_ out ;)

Blahah’s comment above may interest you:

[https://news.ycombinator.com/item?id=15541392](https://news.ycombinator.com/item?id=15541392)

------
indigochill
From a dumb layman's perspective (mine), it seems as though the addressing
issue has a very simple solution: to refer to some arbitrary file hosted by a
known third party (say, a video on YouTube), provide a link to it from your
server and distribute that indirect link to others. When switching from
YouTube to Vimeo, simply change the one link on your server and the link you
distributed remains valid.

This also avoids the danger of hash collisions, although it is vulnerable to
third-party hosts taking down the content (although again, there's not much
cost to that. Just move the content elsewhere and change your redirect link).

With this method you do still have some server responsible for hosting the
content rather than distributing that load, but that's actually a saving when
looking at the storage load across the entire ecosystem.
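The indirection described here could be sketched as a tiny mutable pointer table (the paths and URLs below are made-up placeholders): the link you distribute names an entry on your server, not the third-party host.

```python
# A mutable pointer table: the distributed link names an entry here, not
# the third-party host. (Paths and URLs are hypothetical placeholders.)
redirects = {
    "/videos/intro": "https://youtube.example/watch?v=abc123",
}

def resolve(path: str) -> str:
    # What a tiny redirect server would send back as an HTTP 302 Location.
    return redirects[path]

# The content moves from YouTube to Vimeo: one edit on the server,
# and every link that was ever distributed keeps working.
redirects["/videos/intro"] = "https://vimeo.example/abc123"
```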

It's a cool article and I really like the idea of content addressing instead
of host addressing. I just feel like it's too divergent from the web as most
people use it today (and I'm mildly concerned about malicious hash
shenanigans), whereas the above method can be used and understood right now by
many web users.

~~~
ENGNR
CDNs have made hosting pretty easy to scale from a low-powered controller;
the hard problem is working out when to invalidate the cache. Which is easily
solved by.... another layer of Merkle trees, actually! It's Angela Merkels all
the way down.

------
abrax3141
It's not so much the Merkle trees that make this work as the idea of
distributing the leaves around so that you can't efficiently corrupt the tree
by replacing it. If it were all on one machine, you could just fake the tree.
According to the original Bitcoin paper, this idea came from Haber and
Stornetta. (I have a personal theory that Stornetta is Satoshi ... who doesn't
reference their own papers multiple times?!)

------
yosito
Can someone ELI5 what happens when I want to change a piece of data in a tree?
What if I have a document, and I notice a typo, and I want to fix it?

~~~
nattmat
By using an append-only Merkle tree, you make 'edits' by adding a new piece of
data with information on what was changed in the older data. You get the
benefit of also keeping a version history. I think Git works like this.
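A minimal sketch of that append-only idea (hypothetical names; real systems such as Git add object headers and types): each 'edit' is a new content-addressed object pointing at the previous version, so old addresses stay valid and the version history is preserved.

```python
import hashlib
import json

def put(store: dict, obj: dict) -> str:
    # Content-address an object: its key is the hash of its serialized bytes.
    data = json.dumps(obj, sort_keys=True).encode()
    key = hashlib.sha256(data).hexdigest()
    store[key] = obj
    return key

store = {}

# Original document, containing a typo.
v1 = put(store, {"content": "Merkle trees are awsome", "parent": None})

# 'Fixing' the typo appends a new version that points at the old one;
# the old object and its address are never mutated.
v2 = put(store, {"content": "Merkle trees are awesome", "parent": v1})

# Walking the parent links from v2 recovers the full version history.
```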

~~~
palunon
Note that the Git commit graph isn't really a tree but a directed acyclic
graph, in which nodes (commits) point to actual Merkle trees (trees/blobs).

(Also, Git keeps the full data, not only diffs.)
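That distinction can be sketched roughly like this (a simplification; real Git also hashes a type/size header and historically uses SHA-1 object IDs): blobs hold file data, trees reference blobs by hash, and commits reference a tree plus parent commits, forming a DAG.

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Simplified git-style objects (not real Git's exact serialization).
blob = h(b"hello world\n")                       # file contents
tree = h(f"100644 hello.txt {blob}".encode())    # directory entry -> blob
commit1 = h(f"tree {tree}\nparent none".encode())
# A later commit points at its parent commit: the commits form a DAG,
# while each commit points into a Merkle tree of trees and blobs.
commit2 = h(f"tree {tree}\nparent {commit1}".encode())
```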

------
skate22
Good article, but there are a lot of claims of '100% certainty' that aren't
necessarily true. The author even states that hash functions only guarantee no
collisions to a high probability.

~~~
DennisP
It looks like there are about as many Earth-like planets in the universe as
grains of sand on the Earth. Write your name on a grain of sand on one of
those planets. Now have someone else randomly pick a single grain of sand from
some planet in the universe. How certain are you that they won't pick yours?

The chance of that happening is roughly equal to the chance of a collision
randomly occurring somewhere in a few quadrillion SHA256 hashes.

[https://crypto.stackexchange.com/questions/52261/birthday-attack-against-sha256](https://crypto.stackexchange.com/questions/52261/birthday-attack-against-sha256)

[http://www.npr.org/sections/krulwich/2012/09/17/161096233/which-is-greater-the-number-of-sand-grains-on-earth-or-stars-in-the-sky](http://www.npr.org/sections/krulwich/2012/09/17/161096233/which-is-greater-the-number-of-sand-grains-on-earth-or-stars-in-the-sky)

[https://www.cnet.com/news/the-milky-way-is-flush-with-habitable-planets-study-says/](https://www.cnet.com/news/the-milky-way-is-flush-with-habitable-planets-study-says/)
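The back-of-the-envelope number can be checked with the standard birthday bound (an approximation, valid only while the probability is tiny): among n random 256-bit hashes, the chance of any collision is roughly n(n-1)/2^257.

```python
# Birthday-bound estimate of a random SHA-256 collision.
# Approximation: p ~= n * (n - 1) / 2**257, valid while p is tiny.
n = 10**16                      # "a few quadrillion" hashes
p = n * (n - 1) / 2**257
print(f"collision probability ~ {p:.2e}")  # on the order of 1e-46
```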

~~~
dracodoc
It's entirely possible that reality differs from the theoretical limit. See
the collision attacks against MD5 and SHA-0 by Wang Xiaoyun.

[https://en.wikipedia.org/wiki/Wang_Xiaoyun](https://en.wikipedia.org/wiki/Wang_Xiaoyun)

~~~
api
Those are directed attacks, not random collisions.

~~~
skate22
The article uses the context of content validation from untrusted sources lol

~~~
DennisP
As long as the hash function remains unbroken, untrusted sources can't screw
with you.

Hash functions tend to be broken gradually and publicly, and we migrate to new
ones as they start to look shaky. It's theoretically possible for someone to
privately break a function that everyone else thinks is secure, but it would
be an extremely impressive achievement since lots of full-time cryptographers
work on breaking these things and publish every little bit of progress they
make.

------
asadlionpk
Video link of the talk if anyone wants:
[https://youtu.be/wGB5AYvFjxE?t=4h26m30s](https://youtu.be/wGB5AYvFjxE?t=4h26m30s)

------
peterwwillis
> The Web is centralized,

It isn't.

> but why?

"The web" is a collection of independent networks providing a means to
traverse hyperlinked text documents across any network that is addressable.

Not only are servers not a problem, host-based addressing is what makes the
web work at all, as numerical addressing by itself would have killed the web's
growth long ago, and users would have no reasonable way to address content.

~~~
pfraze
You're ignoring the data silo problem. Of course the Web is decentralized in
terms of accessing web pages, but-- the majority of data is not composed of
pages, and the majority of actions on the Web are not composed of hyperlink
traversals. For the applications on the Web to be decentralized, the data and
behaviors have to be decentralized as well, and they are not.

~~~
peterwwillis
The majority of data _is_ composed of pages, but the biggest uses of the web
today are not. The web isn't the web anymore. It's a shitty application
platform.

In this sense, the web works exactly how any application platform does. You
can't get content out of old apps, or apps that no longer run, or on systems
that no longer run. This isn't data siloing, this is just legacy applications.

With web apps the data is siloed, but it doesn't have to be, and it didn't
use to be. The Internet Archive is proof. Getting the data out isn't that
difficult, IF it's not hiding behind a web app. The difficult part is
convincing people to stop writing applications which _are_ siloed and _do_
prevent easy access to content that the web _used_ to provide.

Peer-to-peer networks are not a solution to incompatible legacy applications.
It's like a vehicle that becomes immobile when it runs out of gas: instead of
building the vehicle so it can still be used when it lacks power, they're
changing the way the roads work. It's ridiculous.

~~~
pfraze
You can't just wish that people would build apps a different way though,
right? You have to change the system somehow. So, this is how we're doing it.
No legacy support, but it does have interesting new use-cases that can disrupt
the legacy.

