
BEP 30: Merkle hash torrent extension - mariorz
http://www.bittorrent.org/beps/bep_0030.html
======
amluto
I think it's sad that they're using SHA-1 for this. SHA-1 is a bit weak, and
the hashes are too short. There's a reason that SHA-1 is deprecated for X.509
certificates.

At the very least, this should use SHA-256.

If they really did it right, though, the protocol would use a secure tree
hash. The construction they're using has trivial collisions, which are only
avoided because the size of the file comes from a trusted source. A good hash
(e.g. the Sakura construction) doesn't have this problem. Fixing that would
make the resulting torrent files or URLs a bit shorter, as the size could
potentially be omitted.

~~~
nhaehnle
_The construction they're using has trivial collisions, which are only
avoided because the size of the file comes from a trusted source._

Could somebody elaborate on this? I assume that you're referring to the fact
that (without the file size information) somebody could pretend that the
concatenation of the child hashes at an inner node is actually the file
content in this position. Is there anything else?

It seems that this could be trivially fixed by adding a single bit to the data
hashed in each node to indicate whether the node is a leaf or an inner node,
or by just adding the size information to the hash data in the root node.

Actually, you want to know the file size very early _anyway_, since this
simplifies the data structures required to keep track of chunks you already
have, allows you to already reserve hard disk space, and so on.
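
A minimal sketch of the ambiguity described above, and of the one-byte leaf/inner tag as a fix. This is not the exact BEP 30 construction (which, as I understand it, also pads the leaf list); the helper names, the toy pieces, and the `0x00`/`0x01` tag values are assumptions made for illustration.

```python
import hashlib

def sha1(data):
    return hashlib.sha1(data).digest()

def naive_root(leaves):
    """Binary hash tree with no leaf/inner distinction.
    Assumes len(leaves) is a power of two."""
    level = [sha1(leaf) for leaf in leaves]
    while len(level) > 1:
        level = [sha1(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def tagged_root(leaves):
    """Same tree, but leaves and inner nodes are hashed with different
    one-byte prefixes (hypothetical tag values 0x00 and 0x01)."""
    level = [sha1(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        level = [sha1(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

pieces = [b"piece-0", b"piece-1", b"piece-2", b"piece-3"]

# A forged two-piece "file" whose pieces are the concatenated child hashes
# of the real tree's first inner level.
forged = [
    sha1(pieces[0]) + sha1(pieces[1]),
    sha1(pieces[2]) + sha1(pieces[3]),
]

print(naive_root(pieces) == naive_root(forged))    # same root without tags
print(tagged_root(pieces) == tagged_root(forged))  # roots differ with tags
```

Note that the forged file has a different length and piece count than the real one, which is why a trusted file size blocks this particular trick.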

~~~
AlyssaRowan
You've pretty much nailed it, yes. That, and because the _level_ of the child
hashes isn't hashed internally, you can construct a file which pretends to be
upper hashes. That is potentially not just collidable but actually
second-preimageable, given what we saw with the much older MD4-based ones, and
they used SHA-1, which wasn't a great idea either! (Although, it should be
noted, this is from _(2009)_; could a mod mark the headline as such?)

The file size being there does complicate an attack - but with the weaknesses
in SHA-1, I certainly wouldn't feel comfortable with it.

This is a _disaster_ of a spec. We already had TTH at this point, and that at
least did it better. This spec needed revising and should not be implemented
by anyone.

_Today_, you should consider using BLAKE2b's tree hash for this purpose. It
walks all over this construct from every direction.

~~~
jfindley
I do really like the BLAKE2b hash, but I've been concerned about actually
using it in practice (although recently I had an application which it would
have suited very well).

I'm worried that, having failed to win the SHA-3 contest it will end up
relegated into obscurity, and using obscure hashing functions isn't usually a
great idea.

Is this a valid concern, or am I placing too much weight in the NIST process?

~~~
sp332
I think BLAKE2 is too fast to be ignored. That's just a guess though.

------
qqueue
Hash trees are pretty cool. For some fancier uses, see the Peer to Peer
Streaming Protocol (PPSP), which is authored by the libswift/tribler guys.

[https://datatracker.ietf.org/wg/ppsp/documents/](https://datatracker.ietf.org/wg/ppsp/documents/)

Basically, instead of the stream source having to sign every single new chunk
(so peers can verify that they're getting the right data), the source signs
subtree hashes of the new data and slowly builds up a larger hash tree. Once
the stream is over, the complete hash tree is instantly seedable by anybody in
the original stream.

~~~
amirouche
The (other?) big win of PPSP is the ability to stream live feeds.

------
zmanian
Very similar to IPFS's Merkle DAG, and it seems like a critical element of
getting content-centric networking projects like BitTorrent, Inc.'s Project
Maelstrom up and running.

------
andrewchambers
Isn't this how git works? I would have thought bit torrent did this all along.

~~~
JeremyBanks
Not exactly. Git uses per-file, per-commit, and per-directory hashes. It does
make up a sort of hash tree, but the tree does not descend _within_ a given
file. You need to hash the _entire_ file and determine its hash to know if
_any_ of the pieces you have are valid. This would be a problem for large
files if many of those pieces come from untrusted sources -- you'd have to
spend a lot of time/bandwidth downloading invalid data before you realized it.
This isn't generally the case for Git, but it is for BitTorrent.

BitTorrent traditionally solves this using a hash list. All of the data in a
torrent is broken up into pieces of a chosen size, and hashes are
calculated for each of those pieces individually. (These are generally not
aligned with file boundaries, which is why you may have noticed that even if
you tell your torrent to only download a certain set of files, you may still
end up with some data from adjacent files.)

This entire hash list is included in the torrent file. For torrents with a
lot of data, this could be 10MB or more.
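
To put a rough number on that, here is a small sketch of the arithmetic. SHA-1 digests are 20 bytes; the 128 GiB payload and 256 KiB piece size are hypothetical values picked for illustration, not figures from any particular torrent.

```python
SHA1_DIGEST_BYTES = 20

def hash_list_bytes(total_size, piece_size):
    """Size of a flat SHA-1 hash list: one digest per piece."""
    num_pieces = -(-total_size // piece_size)  # ceiling division
    return num_pieces * SHA1_DIGEST_BYTES

# Hypothetical torrent: 128 GiB of data split into 256 KiB pieces.
size = hash_list_bytes(128 * 2**30, 256 * 2**10)
print(size, size / 2**20)  # 10485760 bytes, i.e. 10.0 MiB
```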

If you're using a magnet link, that torrent file needs to be downloaded from
peers. This brings back the original problem: you need to download this entire
large file before you know that the peer isn't just sending you random data.

BEP-30 proposes a solution: generate a binary hash tree whose leaves are the
torrent pieces, and include only the single root hash in the torrent file to
keep it small. When you're getting pieces from a peer, they send you the
missing inner hashes of the tree that you need to verify the piece.
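
A rough sketch of that verification step, assuming a power-of-two number of pieces and plain SHA-1 over concatenated children. The function names and the list-of-siblings proof format are assumptions for illustration; the actual BEP 30 wire format differs.

```python
import hashlib

def sha1(data):
    return hashlib.sha1(data).digest()

def build_levels(leaf_hashes):
    """levels[0] is the leaf hashes, levels[-1] is [root].
    Assumes len(leaf_hashes) is a power of two."""
    levels = [list(leaf_hashes)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sha1(prev[i] + prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return levels

def proof_for(levels, index):
    """Sibling hashes on the path from leaf `index` up to the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # the neighbour sharing our parent
        index //= 2
    return proof

def verify_piece(piece, index, proof, root):
    """Recompute the root from one piece plus its sibling hashes."""
    node = sha1(piece)
    for sibling in proof:
        if index % 2 == 0:
            node = sha1(node + sibling)   # we are the left child
        else:
            node = sha1(sibling + node)   # we are the right child
        index //= 2
    return node == root

pieces = [bytes([i]) * 16 for i in range(8)]   # eight toy pieces
levels = build_levels([sha1(p) for p in pieces])
root = levels[-1][0]                           # the only hash in the torrent file

proof = proof_for(levels, 5)                   # what a peer would send along
print(len(proof), verify_piece(pieces[5], 5, proof, root))
```

The downloader only needs the root in advance; a bad piece fails the check as soon as it arrives, rather than after the whole hash list has been fetched.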

The minimum data transfer in ideal circumstances is increased a bit, but the
peer-to-peer system is made more robust, able to identify invalid data much
faster.

I think it's a great modification to the protocol. Unfortunately it isn't
widely-enough supported to be practical for general use.

~~~
im3w1l
>The minimum data transfer in ideal circumstances is increased by log(N)

There should be roughly as many internal nodes as there are leaves, so there
is a linear space increase. As the leaves are much bigger than the internal
nodes, the linear factor is small.
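
Both views can be put side by side with a quick count, assuming a full binary tree over a power-of-two number of pieces. The piece count and piece size are hypothetical: the whole tree has about as many internal nodes as leaves, while any single piece needs only log2(N) sibling hashes.

```python
SHA1_DIGEST_BYTES = 20

def tree_overhead(num_pieces):
    """Return (internal node count, sibling hashes needed per piece).
    Assumes num_pieces is a power of two."""
    internal_nodes = num_pieces - 1
    proof_len = num_pieces.bit_length() - 1  # exact log2 for powers of two
    return internal_nodes, proof_len

# Hypothetical torrent: 2**19 pieces of 256 KiB each.
internal, proof_len = tree_overhead(2**19)
print(internal * SHA1_DIGEST_BYTES)   # bytes for all internal hashes
print(proof_len * SHA1_DIGEST_BYTES)  # bytes of hashes to verify one piece
```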

------
whyrusleeping
ipfs uses exactly the same strategy to distribute content via the hash of
their root merkleDAG node. [http://ipfs.io](http://ipfs.io)

------
edonkey
"Large torrent files put a strain on the Web servers distributing them".

Finally! BitTorrent designers are acknowledging the centralized deficiency in
the torrent protocol and implementing a Merkle / root hash distribution model,
as eDonkey / eMule have used since the 2000s:

[https://en.wikipedia.org/wiki/Ed2k_URI_scheme#eD2k_hash_algo...](https://en.wikipedia.org/wiki/Ed2k_URI_scheme#eD2k_hash_algorithm)

15 years later, everything old is new again.

~~~
gojomo
ED2K was kind of a degenerate Merkle tree: large chunks (9.5 MiB) and only one
level of leaves, all under the root.

Thus it didn't have some of the benefits of a full tree that this 2009
BitTorrent spec was hoping to achieve, such as verifying smaller-sized chunks
without a metadata-cost that grows linearly with the size of the full
resource.

(AFAIK, the 1st application of multi-level Merkle trees to P2P filesharing was
the TigerTree hash I wrote up with Justin Chapweske in 2002. At first glance,
it looks like this proposal makes the same mistake we did in our first draft,
not distinguishing between leaf and node hashes, corrected in the final
TigerTree spec version of March 2003.)

~~~
adamzochowski
edonkey has two methods of file verification. The ED2K hash as you have
mentioned. However, around 2004, edonkey world also received nicer AICH
hashes. People that generate ed2k style links are encouraged to make include
the

To recap:

ICH : intelligent corruption hash : also known as the old original ed2k method.
The root hash is MD4 and is generated from a series of MD4 hashes computed over
9.5 MB chunks. If a file is below 9.5 MB, then the ed2k root hash is just the
real MD4 hash.

AICH : Advanced intelligent corruption hash : a full Merkle tree using SHA1,
where chunks are 180 KB (with the exception of a chunk on the 9.5 MiB
boundary). This odd chunking allows 53 of these AICH chunks to map exactly
onto each ed2k chunk.
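
The 53 figure can be checked with a little arithmetic. The exact byte sizes below (9,728,000 bytes per ed2k chunk, 184,320 bytes per AICH chunk) are my understanding of eMule's constants and are not stated in this thread, so treat them as assumptions.

```python
# Assumed constants (not stated in the thread):
ED2K_PART_SIZE = 9_728_000    # bytes, the "9.5 MB" ed2k chunk (9500 KiB)
AICH_BLOCK_SIZE = 180 * 1024  # bytes, the 180 KB AICH chunk

def aich_blocks_per_part(part_size=ED2K_PART_SIZE, block_size=AICH_BLOCK_SIZE):
    """Return (number of AICH blocks in one ed2k part, size of the last block)."""
    full, remainder = divmod(part_size, block_size)
    count = full + (1 if remainder else 0)
    last = remainder if remainder else block_size
    return count, last

count, last = aich_blocks_per_part()
print(count, last)  # 53 blocks; the last one is 143360 bytes (140 KiB)
```

Under these assumptions, that is 52 full 180 KB blocks plus one shorter block that ends on the ed2k chunk boundary.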

More on this available at :

[http://www.emule-project.net/home/perl/help.cgi?l=1&rm=show_...](http://www.emule-project.net/home/perl/help.cgi?l=1&rm=show_topic&topic_id=589)

[http://en.wikipedia.org/wiki/EMule#Basic_concepts](http://en.wikipedia.org/wiki/EMule#Basic_concepts)

[http://wiki.amule.org/t/index.php?title=AICH_Hashset](http://wiki.amule.org/t/index.php?title=AICH_Hashset)

