
Building a BitTorrent client from the ground up in Go - eat_veggies
https://blog.jse.li/posts/torrent/
======
eat_veggies
Over the holidays, I challenged myself to learn Go by torrenting the Debian
ISO -- from scratch. This post is a bit of a brain dump about everything I've
learned over the past week.

~~~
mixmastamyk
Wow, I've been looking for a BT tutorial at about this level of abstraction
for two years or so. And would have liked to have read it a decade or two back
as well.

Wondering if there are any similar docs or a book for implementing a server in
Python, and a client in Python or Dart/Flutter? I know libtorrent bindings are
a thing but the docs seemed quite dense and I didn't even know where to
start—until now that is.

~~~
peterkelly
Here's a pretty good one in JavaScript:
[https://allenkim67.github.io/programming/2016/05/04/how-to-make-your-own-bittorrent-client.html](https://allenkim67.github.io/programming/2016/05/04/how-to-make-your-own-bittorrent-client.html)

------
devadvance
Awesome write-up and a great way to dig into the inner-workings of BitTorrent.
Nice work!

Glancing at the wiki page for BitTorrent[0], I'm a bit surprised that there
isn't more of an effort to create cross-platform libraries and clients using
Go or Rust for this. Seems like a perfect use case? For example, synapse [1].

[0]
[https://en.wikipedia.org/wiki/Comparison_of_BitTorrent_clien...](https://en.wikipedia.org/wiki/Comparison_of_BitTorrent_clients#Libraries)

[1]
[https://github.com/Luminarys/synapse](https://github.com/Luminarys/synapse)

~~~
eat_veggies
I think BitTorrent is a very 2000's/early 2010's technology. That's when it
feels like it peaked, at least. The future is probably more toward WebTorrents
[0] (with WebRTC) and IPFS [1]. And those have great cross platform libraries.

[0] [https://webtorrent.io/](https://webtorrent.io/)

[1] [https://ipfs.io/](https://ipfs.io/)

~~~
totalperspectiv
As a torrent newb, what's the tldr on why webtorrent over BitTorrent?

~~~
gpm
Why would I download a sketchy ass torrent client when I can go to a website,
download the torrent in my browser, and be able to rely on my browser's
sandboxing against the torrent client? Even better, if it's video I'm
torrenting, I can rely on my browser's sandboxing against the torrent's
contents too.

It's more convenient, it's safer, it sucks for other torrenters because I
don't seed as long, but let's be honest, most people downloading torrents
don't care.

~~~
G4E
Why would a torrent client be sketchier than any other software? There are
plenty of good open source clients like qbittorrent, transmission ... even
aria2! At that rate, are you suspicious of wget too? Even more of Chrome?
There are a lot of sketchy things going on in that one! It is the user's
responsibility to choose which software they trust, and a BitTorrent client is
no worse than anything else.

~~~
xvector
Browsers are far more battle-tested than just about any other web-facing
application on your computer.

Of course, you could make the personal decision to trust a client, and that is
fine. But if you aren’t willing to blindly trust a client, the other guy’s
point still stands - browsers are probably just the better choice here from a
security POV.

~~~
pdkl95
> Browsers are far more battle-tested than just about any other web-facing
> application on your computer.

They also have a monstrous attack surface _because_ they are "web-facing". A
specialized client that only implements one protocol without any connection to
the "web" is far easier to reason about and debug.

If you only consider the number of man-years an application has been
battle-tested, you imply that design complexity and attack surface don't
matter. If we account for complexity by using a metric like "(man-years of
battle-testing)/(magnitude of attack surface)", a well-tested specialized
client that hasn't had many recent bug reports is a much safer choice than
anything running in a browser.

> blindly trust a client

That's even _worse_ for the browser: you have to trust several orders of
magnitude more code implementing a massive set of interdependent features.
Yes, there are probably a lot more people working on fixing bugs in the
browser, but there are also a lot of people adding/modifying features and thus
creating new bugs.

------
war1025
A decade ago now, I wrote a bit torrent client [1] that started as a project
for a software development class.

I think the focus of the class was roadmaps and other parts of the development
process for a larger project. Most of the groups picked games or things like
that. We chose to build a BitTorrent client and I ended up continuing to work
on it for a year or so.

I was pretty proud of it at the time. I'm sure there are a bunch of
implementation details that seemed like a great idea as a Sophomore in college
that no longer sound so clever, but the implementation is pretty sound. I used
it to download a lot of stuff over the years.

The part I think is most interesting is the code for managing the torrent
pieces and writing them into the proper places for the various files within
the torrent. (Not necessarily my implementation of it, just the math behind
the process) [2]

I remember feeling really clever the first time everything came together and
actually downloaded a full torrent. My first big software development victory.

[1] [https://github.com/war1025/Torrent](https://github.com/war1025/Torrent)

[2]
[https://github.com/war1025/Torrent/blob/piecereg/tcl/tm/torr...](https://github.com/war1025/Torrent/blob/piecereg/tcl/tm/torrent/file/impl/FileAccessManagerImpl.java)

~~~
kccqzy
> The part I think is most interesting is the code for managing the torrent
> pieces and writing them into the proper places for the various files within
> the torrent. (Not necessarily my implementation of it, just the math behind
> the process)

Can you explain a little bit how this is interesting mathematically? On first
thought, it seems like you would probably use a lot of pwrite, or if you want
convenience over safety, mmap all the files and write directly to memory. Not
to diminish your achievement, but I didn't immediately see anything
interesting here. Can you explain a bit more?

~~~
war1025
What I meant was more the math involved: a "piece" of a torrent is a fixed
number of bytes.

The way the files are laid out for a torrent, they are listed in some order in
the initial torrent file, along with their individual lengths.

The data for the torrent is then treated as the contiguous sequence of bytes
for the files in the listed order.

That contiguous sequence is then broken into "pieces" of whatever fixed size.
So finding which file(s), and what position within those files, piece 148
(for example) goes to involves some math that is pretty straightforward all in
all, but is still rewarding the first time you've done such a thing.
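
The arithmetic described above can be sketched in Go roughly as follows. This is a minimal illustration, not the code from the linked repository: `FileEntry`, `Span`, and `locatePiece` are hypothetical names, and the sketch assumes the torrent's files are concatenated in their listed order and that only the final piece may be shorter than the fixed piece length.

```go
package main

import "fmt"

// FileEntry is a hypothetical stand-in for one file listed in the torrent.
type FileEntry struct {
	Path   string
	Length int64
}

// Span means: Length bytes of the piece belong at Offset inside files[FileIndex].
type Span struct {
	FileIndex int
	Offset    int64
	Length    int64
}

// locatePiece maps a piece index onto every file the piece overlaps.
func locatePiece(files []FileEntry, pieceLen int64, index int) []Span {
	var total int64
	for _, f := range files {
		total += f.Length
	}
	// Byte range [begin, end) of the piece within the concatenated data.
	begin := int64(index) * pieceLen
	end := begin + pieceLen
	if end > total {
		end = total // the last piece may be short
	}

	var spans []Span
	var fileStart int64
	for i, f := range files {
		fileEnd := fileStart + f.Length
		// Intersect the piece range with this file's range.
		lo, hi := begin, end
		if fileStart > lo {
			lo = fileStart
		}
		if fileEnd < hi {
			hi = fileEnd
		}
		if lo < hi {
			spans = append(spans, Span{FileIndex: i, Offset: lo - fileStart, Length: hi - lo})
		}
		fileStart = fileEnd
	}
	return spans
}

func main() {
	files := []FileEntry{{"a.txt", 10}, {"b.txt", 5}, {"c.txt", 20}}
	for _, s := range locatePiece(files, 8, 1) {
		fmt.Printf("%s: offset %d, %d bytes\n", files[s.FileIndex].Path, s.Offset, s.Length)
	}
}
```

With these made-up file sizes and a piece length of 8, piece 1 covers the last 2 bytes of the first file, all 5 bytes of the second, and the first byte of the third. Each span could then be written with a positional write such as `(*os.File).WriteAt`.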

------
userbinator
_Strictly leeches (does not support uploading pieces)_

Be careful with using this client --- the swarm for a Linux ISO is probably a
bit more forgiving, but you may get banned very quickly by the tracker or
other clients, because they will definitely notice that.

------
throwaway42534
In the same vein, I started to implement a BitTorrent client in OCaml.
[https://github.com/phlalx/sawadee](https://github.com/phlalx/sawadee)

Initially it was a project to learn about the Core/Async libraries. I will put
it back to life at some point, but it grew quite big and became very time
consuming for a single developer.

------
diehunde
Nice! Looks like a good idea to implement something from scratch to improve
the understanding of a language and also a tool. Any other ideas about stuff
to implement? Thanks.

~~~
eat_veggies
I built an http server and a markov chain library to learn python. They're
both pretty worthwhile weekend-size projects.

I heard cryptopals [0] is really good for learning both crypto and a new
programming language. It might be cool to build things like a DNS resolver, an
X window manager, Game Boy game, or something from coreutils like grep or tar.

[0] [https://cryptopals.com/](https://cryptopals.com/)

~~~
jsjohnst
> It might be cool to build things like a DNS resolver

I wouldn’t recommend writing a recursive resolver as a weekend project: there
are too many edge cases that you’ll get frustrated by. That said, I do use
writing an authoritative DNS server as a great way to fully learn a new
language. You get file handling, grammar parsing, serialization/deserialization,
bitpacking, network I/O, data structures, both TCP and UDP socket handling,
daemonization,
concurrency, etc etc. Covers most of the more difficult parts of many
languages and once you’ve done it once or twice, doing it again for new
languages is less about figuring out how to do it, but rather how to do it in
X language. At last count, I think I’ve done upwards of twenty languages now.

------
3fe9a03ccd14ca5
Excellent write up and very readable code.

------
jamra
I noticed a few things with the Go code that confused me. I haven't coded in
Go in quite some time, so I may be way off.

`copy(buf[1:], h.Pstr)`

In this line, are you copying the entire buffer to a string? Doesn't it
overflow into other data elements?

Also, and I may be wrong, in the following line, it appears that you are
casting to a []byte when it's already a slice of bytes, which should still be
fine.

`peers[i].Port = binary.BigEndian.Uint16([]byte(peersBin[offset+4 : offset+6]))`

I really enjoyed the tone and the code. I'm not done with the article, but I
love it so far.

~~~
eat_veggies
thank you for the feedback! It's really helpful.

copy's arguments are like copy(destination, source) so we're copying the
string into the buffer. Also, copy will never overflow because it will only
copy up to the length of the shortest buffer.
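
A small sketch of that behavior, for anyone following along. The `serializeHandshake` function below is an illustrative reconstruction and not necessarily the code from the post; it assumes the standard handshake layout of a length byte, the protocol string, 8 reserved bytes, a 20-byte info hash, and a 20-byte peer ID.

```go
package main

import "fmt"

// serializeHandshake lays out a handshake as
// <pstrlen><pstr><8 reserved bytes><20-byte info hash><20-byte peer ID>.
// The name and signature are assumptions made for this example.
func serializeHandshake(pstr string, infoHash, peerID [20]byte) []byte {
	buf := make([]byte, len(pstr)+49)
	buf[0] = byte(len(pstr))
	curr := 1
	curr += copy(buf[curr:], pstr)            // copy(dst, src): string into buffer
	curr += copy(buf[curr:], make([]byte, 8)) // reserved bytes, all zero
	curr += copy(buf[curr:], infoHash[:])
	copy(buf[curr:], peerID[:])
	return buf
}

func main() {
	// copy stops at the shorter of the two lengths, so it cannot overflow dst.
	small := make([]byte, 4)
	n := copy(small, "BitTorrent protocol")
	fmt.Println(n, string(small)) // 4 BitT

	var infoHash, peerID [20]byte
	msg := serializeHandshake("BitTorrent protocol", infoHash, peerID)
	fmt.Println(len(msg)) // 68
}
```

Because `copy` returns the number of bytes it actually wrote, adding that count to a running offset keeps each field directly after the previous one.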

you're right about the unnecessary cast to []byte -- that function used to
take a string as arguments, and when I changed it, I didn't change the rest of
it. I've removed it.
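
For context, here is a rough sketch of the kind of function that line could live in, without the redundant cast. It assumes the tracker returns peers in the compact format (6 bytes per peer: a 4-byte IPv4 address followed by a 2-byte big-endian port); the `Peer` type and `unmarshalPeers` name are chosen for illustration and may differ from the post.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// Peer holds one peer's address. The type is an assumption for this sketch.
type Peer struct {
	IP   net.IP
	Port uint16
}

// unmarshalPeers parses a compact peer list: 6 bytes per peer,
// 4 bytes of IPv4 address followed by a big-endian uint16 port.
func unmarshalPeers(peersBin []byte) ([]Peer, error) {
	const peerSize = 6
	if len(peersBin)%peerSize != 0 {
		return nil, fmt.Errorf("malformed peer list: %d bytes", len(peersBin))
	}
	peers := make([]Peer, len(peersBin)/peerSize)
	for i := range peers {
		offset := i * peerSize
		peers[i].IP = net.IP(peersBin[offset : offset+4])
		// peersBin is already a []byte, so no conversion is needed here.
		peers[i].Port = binary.BigEndian.Uint16(peersBin[offset+4 : offset+6])
	}
	return peers, nil
}

func main() {
	raw := []byte{192, 168, 1, 1, 0x1A, 0xE1}
	peers, err := unmarshalPeers(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(peers[0].IP, peers[0].Port) // 192.168.1.1 6881
}
```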

------
jijji
I played with this. It requires Go 1.13 and seems to time out and close if one
of the trackers is unresponsive for more than 10 seconds. It has no command
line options and crashes if no options are present. It also doesn't support
magnet links, only torrent files.

------
tgrzinic
@eat_veggies The writeup is very useful. Can you describe the process of
learning two concepts simultaneously? Or did you already know the parts of
BitTorrent and how to implement it in some other language?

------
jbverschoor
Great write up, very clear explanation about the torrent protocol, and nice to
see the rationale behind certain code decisions.

------
ovebepari
Any resources for doing the same thing in python?

------
ngcc_hk
Can it be done in node.js under a VM, so as to isolate the traffic and force it
to use a VPN, etc.?

~~~
ngcc_hk
Found it based on comments here:

[https://github.com/webtorrent/webtorrent](https://github.com/webtorrent/webtorrent)

I guess the VM part is just to use a unikernel then try to find a way to auto
use vpn.

------
_tkzm
I wanted to do the same not that long ago, but I abandoned that idea when I
could not find a UI library I would like to work with, and I was not looking to
make a CLI application.

------
ignoramous
The blog doesn't discuss it, but the choice of DHT implementation in
BitTorrent is Kademlia [0] whilst Chord is more popular, used by Amazon Dynamo
and Facebook Cassandra [1]. One of the authors of Kademlia, David Mazières,
later co-authored the Stellar crypto-currency protocol [2] with Jed McCaleb of
Ripple fame.

That said, quite famously, Bram Cohen, the co-inventor of BitTorrent, failed
the Google interviews.

[0]
[https://news.ycombinator.com/item?id=18711980](https://news.ycombinator.com/item?id=18711980)

[1]
[https://news.ycombinator.com/item?id=3480480](https://news.ycombinator.com/item?id=3480480)

[2]
[https://news.ycombinator.com/item?id=16125920](https://news.ycombinator.com/item?id=16125920)

~~~
cristaloleg
Any sources regarding the failed interview? Googling (ha-ha) says nothing.

~~~
ignoramous
Maybe the down-voters think I'm a troll? I thought it was an interesting note
about the creator of a protocol responsible for a third of all Internet
traffic. Or, does it highlight the alleged ineffectiveness of FAANG-style tech
interviews?

Btw, here's a ref:
[https://web.archive.org/web/20200105225938/https://www.there...](https://web.archive.org/web/20200105225938/https://www.theregister.co.uk/2007/01/05/google_interview_tales/)

