
The BitTorrent Protocol Specification v2 - Nekit1234007
http://bittorrent.org/beps/bep_0052.html
======
jzelinskie
This update to the spec is a modest change that's largely a preemptive
reaction to SHA1 being broken; large portions of BitTorrent are designed
around the 20-byte length of a SHA1 checksum. They've decided to move forward
with SHA256 truncated to 20 bytes to avoid incompatibilities with existing
infrastructure such as the Mainline DHT.
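A minimal sketch of the hashing change described above, as we read BEP 52: the v2 infohash is SHA-256 over the bencoded info dictionary, and the first 20 bytes are used wherever a v1-sized identifier is expected (DHT, trackers). The tiny `bencode` helper and the example info dict (including its all-zero placeholder `pieces root`) are illustrative assumptions, not taken from a real client.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder for int, str/bytes, list and dict values."""
    if isinstance(value, bool):
        raise TypeError("booleans are not bencodable")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings emitted in sorted raw-byte order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def infohashes(info: dict) -> tuple[bytes, bytes]:
    """Return (full 32-byte v2 infohash, 20-byte truncated form)."""
    full = hashlib.sha256(bencode(info)).digest()
    return full, full[:20]


# Hypothetical single-file info dict; the pieces root is a placeholder.
info = {
    "name": "example.bin",
    "meta version": 2,
    "piece length": 16384,
    "file tree": {"example.bin": {"": {"length": 5, "pieces root": bytes(32)}}},
}
full, short = infohashes(info)
print(full.hex())
print(short.hex())
```

Truncation keeps every 20-byte slot in the existing DHT and tracker protocols unchanged, at the cost of using only 160 of the 256 bits there.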

Beyond the hashing algorithm, some important additions that were previously
proposals without widespread use (e.g. merkle tree for hashing pieces) are
becoming required. The focus has mostly been on optimizing latency for the P2P
protocol and making sane improvements to the file spec. I feel like trackers
were largely overlooked in this update, but I'm biased because I work on a
popular tracker.

Ideally, BitTorrent would be broken down into separate specifications that
could be used together or in separate systems: one for the file format and
piece representation for sharing files, one for the P2P protocol, and one for
discovery (trackers, DHTs). I want to believe that there would be far more
interesting P2P projects if you could just lift robust primitives from
BitTorrent.

~~~
the8472
The DHT BEPs specify a network that is only barely related to the bittorrent
core protocol; they can already be used independently, and some people do.

> I feel like trackers were largely overlooked in this update, but I'm biased
> because I work on a popular tracker.

Yes, we did not pay much attention to trackers, but BEP52 basically seized the
opportunity to do some incompatible changes we always wanted to do anyway
(quite a few accumulated over the years), and there were no such open issues
with the http tracker protocol.

~~~
jzelinskie
>...and there were no such open issues with the http tracker protocol

This is because the HTTP protocol has so much overhead that most trackers don't
even really run it anymore. I think UDP being promoted to the spec would've
been a step in the right direction. Modern trackers have a bunch of tricks,
like BEP34[0], to avoid getting pounded, and it would be great if every client
conformed to them.
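To make the HTTP-vs-UDP overhead point concrete: the UDP tracker protocol (BEP 15, not linked in this thread) starts with a single 16-byte connect datagram instead of an HTTP request with headers. The sketch below only builds and parses that exchange as we understand the BEP 15 layout; sockets, retries and timeouts are deliberately left out.

```python
import random
import struct

PROTOCOL_ID = 0x41727101980  # magic constant defined by BEP 15
ACTION_CONNECT = 0


def build_connect_request(transaction_id=None):
    """Build the 16-byte UDP tracker connect request.

    Layout (big-endian): 64-bit protocol id, 32-bit action, 32-bit transaction id.
    """
    if transaction_id is None:
        transaction_id = random.getrandbits(32)
    packet = struct.pack(">QII", PROTOCOL_ID, ACTION_CONNECT, transaction_id)
    return packet, transaction_id


def parse_connect_response(packet: bytes, expected_tid: int) -> int:
    """Return the connection id from a connect response, or raise ValueError."""
    if len(packet) < 16:
        raise ValueError("connect response must be at least 16 bytes")
    action, tid, connection_id = struct.unpack(">IIQ", packet[:16])
    if action != ACTION_CONNECT or tid != expected_tid:
        raise ValueError("unexpected action or transaction id")
    return connection_id


request, tid = build_connect_request()
print(len(request), request.hex())
```

The returned connection id is then echoed in the follow-up announce datagram, so a tracker can reject spoofed sources without keeping any per-connection TCP state.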

I hope I'm not coming off as aggressive. I really appreciate this work and I'm
really glad to see a spec revision. It's just as you said: there have been many
years and many good improvements that I'd like to see made while there's still
a chance to break things.

[0]:
[http://www.bittorrent.org/beps/bep_0034.html](http://www.bittorrent.org/beps/bep_0034.html)

~~~
the8472
I still don't see why this would need to piggyback on a breaking change of the
metadata file format. "Please use UDP by default" is fairly orthogonal to the
metadata format and could be added to the spec at any point if we want to.

HTTP trackers are considered fine for medium-scale torrent deployments. UDP
trackers were originally introduced to cope with the traffic caused by running
an open tracker that manages 100k+ infohashes for the whole world.

Also, both BEP3 and 52 already forward-reference the tracker extensions
(compact and UDP), so someone who writes a new bittorrent implementation
should already be aware of them.

Maybe we could make it more clear that some BEPs are almost-mandatory.

------
Scaevolus
1) _Chunks don't span files_. Each file is validated by the hash of its
merkle tree. This is the biggest user-visible change, since it means you can
download _one_ file without downloading others.

2) SHA1 is replaced with SHA2-256 (32-byte hashes instead of 20, and not broken).

3) Files are represented by a tree structure instead of a list of dictionaries
with paths-- this reduces duplication in deeply-nested hierarchies.

4) Backwards compatible-- you can make a .torrent file with both old and new
pieces, and a swarm can speak either. This requires padding files from BEP47,
which most clients probably don't support.
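Point 3 can be illustrated with a small conversion from a v1-style list of paths into a nested tree. The shape used here (each file maps to `{"": {"length": ..., "pieces root": ...}}`, with no pieces root for empty files) is our reading of the BEP 52 file tree; the function name and the placeholder root bytes are hypothetical.

```python
def build_file_tree(files):
    """Nest (path_components, length, pieces_root) tuples into a v2-style file tree.

    Each file becomes {"": {"length": ..., "pieces root": ...}} under its path;
    empty files carry no pieces root.
    """
    tree = {}
    for path, length, pieces_root in files:
        node = tree
        for directory in path[:-1]:
            node = node.setdefault(directory, {})
        entry = {"length": length}
        if length > 0:
            entry["pieces root"] = pieces_root
        node[path[-1]] = {"": entry}
    return tree


# v1 repeats the full path for every file; the tree stores "album" and "disc1" once.
v1_files = [
    (["album", "disc1", "01.flac"], 10, b"\x11" * 32),
    (["album", "disc1", "02.flac"], 20, b"\x22" * 32),
    (["album", "cover.jpg"], 0, None),
]
tree = build_file_tree(v1_files)
print(tree)
```

Shared directory names appear once as dictionary keys, which is where the savings for deeply nested hierarchies come from.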

Per-file metadata increases pretty significantly, from ~19B (just length) to
~68B (length + hash).

~~~
phire
Per-file metadata increases significantly, but it gets rid of the per-piece
data (which in bittorrent v1 is 20 bytes of sha1 hash per piece and made up
the bulk of the .torrent file).

The .torrent file only stores the merkle tree's root hash for each file, and
the torrent client will query its peers to get the rest of the merkle tree
(verifiable against the root hash). The leaves of the merkle tree are the hashes
of each 16 KiB block.
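A minimal sketch of the per-file root described above, following our reading of BEP 52: SHA-256 leaves over 16 KiB blocks (the last block hashed as-is, even if shorter), the leaf layer padded with all-zero hashes up to a power of two, then pairwise hashing up to the root. It assumes the whole file is in memory; a real client would stream blocks.

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # leaf size: 16 KiB
ZERO_HASH = bytes(32)   # padding leaf beyond the end of the file


def merkle_root(data: bytes) -> bytes:
    """Root of a binary SHA-256 tree over 16 KiB blocks of one file."""
    if not data:
        raise ValueError("empty files have no pieces root")
    # Leaves: SHA-256 of each 16 KiB block (the last block may be shorter).
    layer = [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
             for i in range(0, len(data), BLOCK_SIZE)]
    # Pad the leaf layer with zero hashes up to the next power of two.
    size = 1
    while size < len(layer):
        size *= 2
    layer += [ZERO_HASH] * (size - len(layer))
    # Hash pairs of siblings until a single root remains.
    while len(layer) > 1:
        layer = [hashlib.sha256(layer[i] + layer[i + 1]).digest()
                 for i in range(0, len(layer), 2)]
    return layer[0]


print(merkle_root(b"x" * (2 * BLOCK_SIZE + 5)).hex())
```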

Interesting consequences of this:

Piece size isn't baked into the file anymore (and I've seen torrents with 16 MiB
pieces); the client can dynamically choose its verification piece size by
requesting only so many layers of the merkle tree. Or it could skip requesting
the tree and verify the whole file at once.
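The "pick a layer" idea can be sketched as follows, under the same assumptions as a BEP 52-style tree (16 KiB leaves, zero-hash padding): a piece size of 16 KiB × 2^k corresponds to the layer k levels above the leaves, and the file root can be rebuilt from any such layer. Because padding nodes at that height are roots of all-zero subtrees, the root comes out the same for every piece size, which is also why roots are usable for cross-torrent dedup. The helper names are hypothetical, and the sketch only handles files larger than one piece.

```python
import hashlib

BLOCK_SIZE = 16 * 1024
ZERO_HASH = bytes(32)


def _parents(layer):
    """Hash adjacent sibling pairs into the next layer up."""
    return [hashlib.sha256(layer[i] + layer[i + 1]).digest()
            for i in range(0, len(layer), 2)]


def _pad(hashes, filler):
    """Pad a hash list up to the next power of two with `filler`."""
    size = 1
    while size < len(hashes):
        size *= 2
    return hashes + [filler] * (size - len(hashes))


def piece_layer(data: bytes, piece_length: int) -> list:
    """Hashes of the tree layer whose nodes each cover `piece_length` bytes."""
    blocks_per_piece = piece_length // BLOCK_SIZE
    if (blocks_per_piece == 0 or piece_length % BLOCK_SIZE
            or blocks_per_piece & (blocks_per_piece - 1)):
        raise ValueError("piece length must be 16 KiB times a power of two")
    if len(data) <= piece_length:
        raise ValueError("file fits in one piece; its pieces root covers it")
    layer = _pad([hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
                  for i in range(0, len(data), BLOCK_SIZE)], ZERO_HASH)
    while blocks_per_piece > 1:  # climb one level per doubling of piece size
        layer = _parents(layer)
        blocks_per_piece //= 2
    num_pieces = -(-len(data) // piece_length)  # ceiling division
    return layer[:num_pieces]


def root_from_pieces(pieces: list, piece_length: int) -> bytes:
    """Rebuild the file root from a piece layer alone."""
    # Padding nodes at this height are roots of all-zero subtrees.
    filler = ZERO_HASH
    for _ in range((piece_length // BLOCK_SIZE).bit_length() - 1):
        filler = hashlib.sha256(filler + filler).digest()
    layer = _pad(list(pieces), filler)
    while len(layer) > 1:
        layer = _parents(layer)
    return layer[0]


data = b"y" * (5 * BLOCK_SIZE + 123)
for piece_length in (16 * 1024, 32 * 1024, 64 * 1024):
    pieces = piece_layer(data, piece_length)
    print(piece_length, len(pieces), root_from_pieces(pieces, piece_length).hex())
```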

Merkle tree roots will be globally unique. You can scan torrent files for
duplicated files and download common files from multiple swarms.

~~~
Scaevolus
Right, in BitTorrent v1 the size of the .torrent file is O(number of files) +
O(number of bytes), but with this it's just O(number of files) with a higher
constant factor.

Piece size is _still_ baked into the file (as piece length), and is used for
presence bitsets, which are a crucial part of the swarm algorithm. Clients
download the rarest pieces first to boost efficiency, and this information is
handled as bitsets shared between clients indicating "I have chunk {1, 2, 3,
... 50, 52, ... }".
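A simplified sketch of rarest-first selection over peer bitfields. The bit order (high bit of the first byte is piece 0) follows the BEP 3 `bitfield` message; the selection policy here is a hypothetical simplification, since real clients also randomize ties, track `have` messages incrementally and weigh other factors.

```python
from collections import Counter


def decode_bitfield(bitfield: bytes, num_pieces: int) -> set:
    """Piece indices set in a wire-format bitfield (high bit of byte 0 = piece 0)."""
    return {i for i in range(num_pieces)
            if bitfield[i // 8] & (0x80 >> (i % 8))}


def rarest_missing_piece(have: set, peer_bitfields: list, num_pieces: int):
    """Index of the least-replicated piece we lack, or None if nothing is available."""
    availability = Counter()
    for bitfield in peer_bitfields:
        availability.update(decode_bitfield(bitfield, num_pieces))
    candidates = [(count, index) for index, count in availability.items()
                  if index not in have]
    # Lowest count wins; ties go to the lowest index here (real clients randomize).
    return min(candidates)[1] if candidates else None


peers = [bytes([0b11100000]), bytes([0b10100000]), bytes([0b10000000])]
print(rarest_missing_piece(set(), peers, 3))
```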

Merkle tree roots will only be unique for each piece length. Piece length
should still correlate with total size, to prevent huge bitsets-- a 16KB piece
length on a 64GB torrent would have a 4 million item / 500KB bitset (!), so it
could take 500KB of RAM per connected peer to maintain state-- or maybe
compressed bitsets make this problem irrelevant in practice?
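The bitset arithmetic above can be checked directly, assuming "64GB" means 64 GiB and one bit per piece as in an uncompressed bitfield:

```python
def bitfield_size(total_size: int, piece_length: int) -> tuple[int, int]:
    """Return (number of pieces, bytes needed for a one-bit-per-piece bitfield)."""
    num_pieces = -(-total_size // piece_length)  # ceiling division
    return num_pieces, -(-num_pieces // 8)


GiB, MiB, KiB = 2**30, 2**20, 2**10
print(bitfield_size(64 * GiB, 16 * KiB))  # (4194304, 524288): about 512 KiB
print(bitfield_size(64 * GiB, 4 * MiB))   # (16384, 2048): 2 KiB
```

With a more typical 4 MiB piece length the same torrent needs only a 2 KiB bitfield per peer.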

~~~
the8472
v1: O(path-depth * number of files + number of bytes)

v2: O(log(path-depth) * number of files)

that is assuming some constantish branching factor in your directory structure

> Merkle tree roots will only be unique for each piece length.

Merkle trees are independent of piece size, which means you can use them to
dedup across torrents.

~~~
Scaevolus
Oh, neat! I missed the part where larger piece sizes correspond to higher
layers of the tree.

Presumably clients still reconstruct (and store, somewhere) the full Merkle
tree to do incremental validation and support queries.

------
Klathmon
I have to admit, BitTorrent is one of the things I took for granted.

I never really thought about the details of how it works, or the really really
impressive feats that were accomplished to get it to work. I knew it was a
really good technology, but reading this and the comments here puts it on a
whole other level.

Why isn't this technology talked about more? Why are blockchains the big
"thing" right now with people trying to use them everywhere to see where they
fit best, but torrent networks are kind of just... ignored?

The decentralized nature of it seems to open so many possibilities at first
glance, is there a reason they aren't being taken advantage of? Is there some
kind of "great filter" kind of thing that is preventing widespread usage of
something like a torrent network?

~~~
ue_
I wanted to implement a distributed imageboard over bittorrent but I quickly
realised it's hard to add data after the initial publication, and further to
verify it, and the nature of trackers may make it prone to censorship. So I
gave up.

~~~
wongarsu
Distributing the images/posts via bittorrent and the relations between them in
the DHT might be the way to go with such a project.

On the other hand, an uncensorable imageboard would profit from the
verifiable timestamping of a blockchain, with just the images distributed via
a bittorrent-like mechanism. That also gives you a decent anti-spam mechanism
(you can post in exchange for mining blocks, similar to the original idea of
hash-cash).
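A minimal hashcash-style stamp, to show the anti-spam mechanism in isolation (this is plain proof of work, not a blockchain): the poster searches for a nonce whose SHA-256 has a given number of leading zero bits, and anyone can verify it with a single hash. The 8-byte nonce encoding and the difficulty of 12 bits are arbitrary assumptions. Each extra bit doubles the expected minting work, which is the knob behind the "time to mine" concern raised in the reply.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of leading zero bits in a digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def _digest(message: bytes, nonce: int) -> bytes:
    return hashlib.sha256(message + nonce.to_bytes(8, "big")).digest()


def mint(message: bytes, difficulty: int) -> int:
    """Search for a nonce whose SHA-256 has `difficulty` leading zero bits."""
    for nonce in count():
        if leading_zero_bits(_digest(message, nonce)) >= difficulty:
            return nonce


def verify(message: bytes, nonce: int, difficulty: int) -> bool:
    """Checking a stamp costs one hash, however long minting took."""
    return leading_zero_bits(_digest(message, nonce)) >= difficulty


post = b"example post body"
nonce = mint(post, 12)  # about 2**12 hashes expected on average
print(nonce, verify(post, nonce, 12))
```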

~~~
ue_
I thought about this too, but one of the features of imageboards is that it
doesn't splinter into subthreads, there's a big list of posts, unlike Reddit.
And because a post can reply to multiple posts at once, you can't separate
them in blockchain forks. If the blockchain forks, it becomes hard to
reference posts in the other forks from any particular fork.

On the other hand, there has to be a way to avoid downloading (and sharing)
certain parts of the chain, for example if someone uploads illegal content,
they should have the option to _never download_ that data, so I like the idea
of keeping images separate.

For posting to be feasible, the time to mine has to be low, though of course
it'll increase over time. That means shorter blockchains are favoured for ease
of use (nobody wants to wait 5 minutes and waste a lot of power just to make a
post), but they still have to be long enough to be hard to forge.

There's also the issue of segmentation: some users may not want to share
certain posts; for example, people against political issue X may not want to
share posts about X. With a small number of peers, this could mean that only
one or two peers keep track of the posts talking about issue X. And then you'd
have to trust that you're not downloading illegal content from those people:
if you are committed to anti-censorship but also don't want to download
illegal content, you have to trust those peers to only remove illegal content.

In the end, I'm not sure if it comes out better than NNTP, or even centralised
discussion boards with multiple independent archive sites available (which can
archive posts before they are deleted by moderators).

------
richdougherty
Rationale for hash function change:
[https://github.com/bittorrent/bittorrent.org/issues/58](https://github.com/bittorrent/bittorrent.org/issues/58)

Discussion of other changes:
[https://github.com/bittorrent/bittorrent.org/pull/59](https://github.com/bittorrent/bittorrent.org/pull/59)

------
lowglow
Can someone diff the spec from the previous version? What's the changelog? :)

~~~
mouldysammich
The main differences I can see are a change from SHA1 -> SHA2, and it also
seems to have added an official spec for webtorrent.

~~~
Luminarys
It also appears to be using a merkle hash tree for piece hashing now along
with a few new peer wire messages to support that.

~~~
phire
It also switches to a merkle hash per-file and specifies that each file is
aligned to the start of a piece, and the size of the last piece of each file
matches the amount of remaining data.

This means in large multi-file torrents you don't have to download (and store)
the two extra 1-4 MB pieces at the start and end of each file anymore.
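A small sketch of the alignment point, using a hypothetical layout of three files (5 MiB, 10 MiB, 3 MiB) with 4 MiB pieces. In v1 the files are concatenated, so the middle file shares its first and last piece with its neighbours; in v2 each file starts on its own piece boundary. Hybrid torrents get the same alignment in the v1 view by inserting BEP 47 pad files, whose size is the gap to the next boundary. All function names are illustrative.

```python
MiB = 2**20


def v1_piece_span(offset: int, length: int, piece_length: int) -> tuple[int, int]:
    """First and last v1 piece index touched by a file at `offset` in the payload."""
    return offset // piece_length, (offset + length - 1) // piece_length


def v1_extra_bytes(offset, length, piece_length, total_size):
    """Bytes of other files that must be fetched to verify this one in v1."""
    first, last = v1_piece_span(offset, length, piece_length)
    start = first * piece_length
    end = min((last + 1) * piece_length, total_size)
    return (end - start) - length


def v2_piece_count(length: int, piece_length: int) -> int:
    """In v2 each file starts on its own piece boundary."""
    return -(-length // piece_length)


def pad_to_boundary(offset: int, piece_length: int) -> int:
    """Padding needed so a file at `offset` starts on a piece boundary."""
    return -offset % piece_length


# Hypothetical layout: A (5 MiB), B (10 MiB), C (3 MiB), 4 MiB pieces.
sizes = [5 * MiB, 10 * MiB, 3 * MiB]
piece = 4 * MiB
b_offset = sizes[0]
print(v1_piece_span(b_offset, sizes[1], piece))                       # (1, 3)
print(v1_extra_bytes(b_offset, sizes[1], piece, sum(sizes)) // MiB)  # 2
print(v2_piece_count(sizes[1], piece))                                # 3
print(pad_to_boundary(b_offset, piece) // MiB)                        # 3
```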

------
redm
I just don't see this technology ever going mainstream. I first deployed this
type of application in 2003. It was named Redswoosh and did effectively the
same thing as BitTorrent, just in a closed client. I was also a very early
adopter of BitTorrent using it personally.

Users hated it for general use, even when downloading big files. 1) They
didn't like having to install/run some special software to download a file. 2)
They didn't like the effects of uploading to others and it slowing down the
connections.

Consumer networks are asymmetric, having far more download capacity than upload
capacity. This makes sense since 1) most users download and want to use the
available bandwidth for faster downloads, and 2) it prevents commercial
applications on consumer circuits. This is far from ideal for applications
like BitTorrent.

I'm not saying there isn't an application for this technology, I'm saying all
the good applications don't want to ask the users to pay for distribution to
other users. Thus it's relegated to mostly piracy, open source, etc.

Bittorrent Inc. has been trying to commercialize this for a decade now, I just
don't see it happening. If there was anyone who could commercialize it, it was
Travis Kalanick, and while he exited for 20m, he was very lucky (and happy) to
get out of that market.

~~~
snakeanus
> I just don't see this technology ever going mainstream

It already is though.

------
0x0
What's the stuff about "proof layers", is that new in this v2? The paper
briefly talks about proof layer requests. Is this something merkle-tree
related? What is the purpose? Is it to prevent clients from lying about having
pieces they do not have by requesting a verifiable random hash chunk?

~~~
the8472
It's part of switching to merkle trees instead of flat piece lists. A merkle
tree can only be verified if you either have a whole layer or send ancestor-
siblings (uncle, great-uncle, etc.) along with a partial layer.

Merkle trees allow torrents to start faster from magnet links since only the
tree roots need to be front-loaded while the tree can be fetched
incrementally.
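The sibling/uncle idea can be shown with a generic Merkle proof: to check one leaf against a known root, you need only one hash per level (sibling, uncle, great-uncle, ...), not the whole layer. This sketch assumes a power-of-two leaf count and does not model BEP 52's actual hash request messages or their proof-layer fields; the helper names are hypothetical.

```python
import hashlib


def _h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def build_layers(leaves: list) -> list:
    """All layers of a binary tree, leaves first; len(leaves) must be a power of two."""
    layers = [list(leaves)]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([_h(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return layers


def make_proof(layers: list, index: int) -> list:
    """Sibling, uncle, great-uncle, ... hashes needed to verify one leaf."""
    proof = []
    for layer in layers[:-1]:
        proof.append(layer[index ^ 1])  # the sibling at this level
        index //= 2
    return proof


def verify_proof(leaf: bytes, index: int, proof: list, root: bytes) -> bool:
    """Recompute the path from one leaf to the root using only the proof hashes."""
    node = leaf
    for sibling in proof:
        node = _h(node, sibling) if index % 2 == 0 else _h(sibling, node)
        index //= 2
    return node == root


leaves = [hashlib.sha256(bytes([i])).digest() for i in range(8)]
layers = build_layers(leaves)
root = layers[-1][0]
proof = make_proof(layers, 5)
print(len(proof), verify_proof(leaves[5], 5, proof, root))
```

For 8 leaves the proof is 3 hashes; in general it grows with log2 of the leaf count, which is what lets a client verify pieces before it has fetched the full tree.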

------
shmerl
Do all Bittorrent clients support it already?

~~~
silotis
Currently no bittorrent clients support it. This is still just a draft. I've
only just started working on an implementation for libtorrent, it will be
quite a while before it is production ready.

------
smegel
Pity we will never see a genuine version of uTorrent that will support it.
That was a real loss.

~~~
vanderZwan
We have plenty of good open source alternatives now. qBittorrent works fine.

~~~
smegel
Thanks, I hadn't heard of that. And it is written in C++, which is nice.

Is it considered the spiritual successor to the original uTorrent?

~~~
nyolfen
it is:

>The qBittorrent project aims to provide an open-source software alternative
to µTorrent.

In my experience, though, it is more of a memory hog and buggier than uTorrent,
but that doesn't stop me from using it.

~~~
lucasverra
I switched to the Transmission client 4 years ago and never looked back.

