
Building a BitTorrent client from scratch in C# - nxzero
https://cheatdeath.github.io/research-bittorrent-doc/
======
jabstack
Former admin of DC# here (forgive the sourceforge hosting-- it was a long time
ago! [https://sourceforge.net/projects/dc-
sharp/](https://sourceforge.net/projects/dc-sharp/)). Great write-up! This was
a fascinating read- thank you for putting it together.

One issue to be mindful of: the HttpWebRequest.BeginGetResponse method does
not honor timeouts, so you have to time out the attempt yourself. Consider
using HttpClient, if available in Mono / .NET Core. Otherwise, see MSDN for
how to do this:

"In the case of asynchronous requests, it is the responsibility of the client
application to implement its own time-out mechanism. The following code
example shows how to do it." See: [https://msdn.microsoft.com/en-
us/library/system.net.httpwebr...](https://msdn.microsoft.com/en-
us/library/system.net.httpwebrequest.begingetresponse\(v=vs.110\).aspx)

I'm not sure if you have access to the ThreadPool class. When fixing a bug in
one of Microsoft's libraries, I used the TPL Task construct to resolve this. See
the pull request here: [https://github.com/Microsoft/ProjectOxford-
ClientSDK/pull/83...](https://github.com/Microsoft/ProjectOxford-
ClientSDK/pull/83/commits/f19b86b74a4841618d7644bcdc249f4c5406d632)
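
A minimal sketch of the Task-based timeout idea, not the code from the pull request. It assumes a runtime with async/await and Task.Delay; TimeoutHelper and WithTimeout are hypothetical names chosen for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutHelper
{
    // Completes with the result of `work`, or throws TimeoutException if
    // `work` has not finished within `timeout`. The underlying operation is
    // NOT cancelled here; the caller still has to abort it (for a web
    // request, for example, by calling HttpWebRequest.Abort()).
    public static async Task<T> WithTimeout<T>(Task<T> work, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task delay = Task.Delay(timeout, cts.Token);
            Task first = await Task.WhenAny(work, delay).ConfigureAwait(false);
            if (first != work)
                throw new TimeoutException("Operation timed out after " + timeout);

            cts.Cancel(); // the timer is no longer needed
            return await work.ConfigureAwait(false);
        }
    }
}
```

The Begin/End pair could be turned into a task with Task.Factory.FromAsync&lt;WebResponse&gt;(request.BeginGetResponse, request.EndGetResponse, null) and then passed to WithTimeout, with request.Abort() called in the catch block for TimeoutException.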

~~~
cheatdeath
Thanks for the pointer!

------
mattcopp
Great work! Quite annoying actually. I finished my own implementation in
Python at about 10pm last night; this would have been most useful. I'm no C#
coder, but it's nicely readable, and this is a much better write-up than I
could have done, I'm sure.

For anyone who hasn't tried doing this before, the "official" BitTorrent spec
docs, namely BEP-3
([http://bittorrent.org/beps/bep_0003.html](http://bittorrent.org/beps/bep_0003.html)),
seem little more than a vague blog post turned into a "spec". However,
somewhat conversely, this has led to a wealth of articles describing how
to do it.

The three guides I used were:

\- A 2 part blog post which has a bit of a Python bent
[http://www.kristenwidman.com/blog/33/how-to-write-a-
bittorre...](http://www.kristenwidman.com/blog/33/how-to-write-a-bittorrent-
client-part-1/)

\- The unofficial specs
[https://wiki.theory.org/BitTorrentSpecification](https://wiki.theory.org/BitTorrentSpecification),
and

\- An incomplete Python client [https://github.com/JosephSalisbury/python-
bittorrent](https://github.com/JosephSalisbury/python-bittorrent)

I didn't know of the RFC mentioned in the post, that would have also been
really useful.

BitTorrent code for Python is remarkably hard to find amid all the noise of
Deluge, the original client, and libtorrent wrappers. None of what I found
was sophisticated (or at least well documented) enough for my
experiments; those projects have different focuses.

I never went as far as implementing my own BEncoder library. A billion seem to
exist in multiple languages, and any BitTorrent Python library you install
seems to come with its own copy. (I suspect this is due to the way BEncoder was
bundled in the original client, see:
[https://pypi.python.org/pypi/bencode](https://pypi.python.org/pypi/bencode))

I also found a Rust implementation which seems not to compile, but is useful
as I'm trying to teach myself Rust
[https://github.com/kenpratt/rusty_torrent](https://github.com/kenpratt/rusty_torrent)
I think the work to get it to compile might be minimal.

~~~
jkeler
> I also found a Rust implementation which seems not to compile, but is useful
> as I'm trying to teach myself Rust
> [https://github.com/kenpratt/rusty_torrent](https://github.com/kenpratt/rusty_torrent)
> I think the work to get it to compile might be minimal.

There is also another project in Rust, it looks more active:
[https://github.com/GGist/bip-rs](https://github.com/GGist/bip-rs) It is a
collection of libraries.

> For anyone who hasn't tried doing this before, the "official" BitTorrent spec
> docs, namely BEP-3
> ([http://bittorrent.org/beps/bep_0003.html](http://bittorrent.org/beps/bep_0003.html)),
> seem little more than a vague blog post turned into a "spec".

Doesn't look vague at all. What do you think is missing from it?

~~~
mattcopp
> There is also another project in Rust, it looks more active:
> [https://github.com/GGist/bip-rs](https://github.com/GGist/bip-rs)

Thanks, I had seen that one, but forgot about it. I think it's a great
project, but it's really just a collection of libraries that don't tell
you how it all fits together, which wasn't very helpful when I was picking
stuff up. Hopefully, now that I have a better understanding of the client
design, I can make something from that.

> Doesn't look vague at all. What do you think is missing from it?

For a comparison I would recommend reading a few (what I would consider) good
protocol docs. Docs that you could read and implement, and probably get
working very quickly, for example:

\- XMPP's XEPs (one picked for similarity in usage to BitTorrent)
[https://xmpp.org/extensions/xep-0020.html](https://xmpp.org/extensions/xep-0020.html)
\- Lots of examples in there for what messages should look like, which is
always helpful.

\- The BitTorrent RFC doc (linked in the original post)
[http://jonas.nitro.dk/bittorrent/bittorrent-
rfc.html](http://jonas.nitro.dk/bittorrent/bittorrent-rfc.html) \- Sums the
situation up nicely with the layout of messages and value lengths.

I think the thing that makes the biggest difference is adhering to a
requirement-language spec such as RFC 2119, which defines the keywords "MUST",
"SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", etc. These make it
really clear what you're meant to do and what you're not.

As for the vagueness of BEP-3 specifically, how about this example, which made
me rage on IRC? It is from the description of the info_hash field in the
Tracker section:

    
    
        This value will almost certainly have to be escaped.
    

ALMOST CERTAINLY?? Will it, or won't it? Then, escaped? Escaped how?

What this turned out to mean was that the 20-byte binary SHA-1 hash MUST be URL
encoded (percent-encoded), and not hex encoded.
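
To make that concrete, here is a rough sketch of percent-encoding a raw hash for the tracker query string. InfoHashEncoder and UrlEncode are hypothetical names, and the choice of which characters to leave unescaped follows the unreserved set of RFC 3986 rather than anything BEP-3 states.

```csharp
using System;
using System.Text;

public static class InfoHashEncoder
{
    // Percent-encodes raw bytes for use in a tracker query string.
    // Unreserved characters (letters, digits, '-', '.', '_', '~') are
    // emitted as-is; every other byte becomes %XX with uppercase hex digits.
    public static string UrlEncode(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));

        var sb = new StringBuilder(bytes.Length * 3);
        foreach (byte b in bytes)
        {
            char c = (char)b;
            bool unreserved =
                (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                (c >= '0' && c <= '9') ||
                c == '-' || c == '.' || c == '_' || c == '~';

            if (unreserved) sb.Append(c);
            else sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}
```

For example, the bytes 0x12 0x34 0x56 0x78 0x9A encode to "%124Vx%9A", whereas hex encoding would give "123456789a".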

I would love to see someone try to build a BitTorrent client for the first
time based solely on this doc.

\---

BEP-3 also seems more interested in implementation detail than in describing
the protocol. Take the last paragraph (before Copyright) as an example.

Something else which occurred to me today is that BitTorrent is not a spec;
it hasn't been developed, it has evolved. It has also been built in a very
modular way, i.e. DHTs can replace trackers and simply be dropped in, and
magnet URIs can replace torrent files. This probably contributes to its success
and longevity, but it also means that there is a lot of stuff, like
metainfo, trackers, and bencoding, that SHOULD belong in their own spec docs,
which form a collective whole.

------
Const-me
Very good! A few minor performance comments.

1\. In your EncodeDictionary, you sort byte arrays by converting them to
strings. Correct but inefficient. See e.g. this:
[http://stackoverflow.com/q/19695629/126995](http://stackoverflow.com/q/19695629/126995)
but add null checks; the authors of that code forgot about them.

2\. You don’t need a dedicated thread to wake up every 1-10 seconds and do
something small. Threads are expensive system resources: they own a stack,
cache misses are guaranteed when they wake up, etc. If your compiler supports
async/await, use that instead: an endless loop with Task.Delay inside it. If
not, the System.Timers.Timer class will do.
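
The second point could look roughly like this. It is only a sketch: PeriodicWork and RunAsync are hypothetical names, and the cancellation token is an addition so the loop can be stopped cleanly.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PeriodicWork
{
    // Runs `action` repeatedly, waiting `interval` between runs, until the
    // token is cancelled. No thread is blocked while the loop is waiting.
    public static async Task RunAsync(Action action, TimeSpan interval, CancellationToken token)
    {
        try
        {
            while (!token.IsCancellationRequested)
            {
                action();
                await Task.Delay(interval, token).ConfigureAwait(false);
            }
        }
        catch (OperationCanceledException)
        {
            // Expected when the token is cancelled during the delay.
        }
    }
}
```

The caller keeps the returned Task, cancels the token on shutdown, and awaits the task to be sure the loop has finished.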

~~~
cheatdeath
Thanks for the tips! I will look into making these modifications when I get a
chance.

~~~
Const-me
You're welcome. Edited for clarity.

------
voltagex_
BEncoding and variants like REncoding are possibly one of my least favourite
things ever. If you deal with the Deluge torrent client API you'll see it
everywhere.

That aside, fantastic work on this. I think previously the only BitTorrent
library for C# was an abandoned Mono project.

~~~
masklinn
> BEncoding and variants like REncoding are possibly one of my least favourite
> things ever.

Can't be worse than ASN.1 can it?

~~~
orbitur
Yes, it can be worse, since ASN.1 is relatively "popular" and there's a decent
pool of knowledge.

------
jdudek
Here’s a similar write-up on building a BitTorrent client in Haskell:
[https://blog.chaps.io/2015/10/05/torrent-client-in-
haskell-1...](https://blog.chaps.io/2015/10/05/torrent-client-in-
haskell-1.html)

------
allenkim6
If anyone is interested in a similar writeup for node.js check out my tutorial
here: [http://allenkim67.github.io/bittorrent/2016/05/04/how-to-
mak...](http://allenkim67.github.io/bittorrent/2016/05/04/how-to-make-your-
own-bittorrent-client.html)

~~~
finnn
HTTPS Everywhere redirects me to the HTTPS version of the page, but you've
hard-coded [http://](http://) links for some/all of the resources, which the
browser refuses to load, so it just looks like some big gray boxes.

~~~
allenkim6
Hey thanks, it looks like the link for the css was the problem so I've just
fixed that for now.

------
nbarbettini
Awesome write-up! I love C# and this was really well-written. Great work.

Pro tip: Use DateTimeOffset instead of DateTime. It's less frustratingly
ambiguous than DateTime, and already has a Unix timestamp helper function if
you're on the latest framework: [https://msdn.microsoft.com/en-
us/library/system.datetimeoffs...](https://msdn.microsoft.com/en-
us/library/system.datetimeoffset.tounixtimeseconds\(v=vs.110\).aspx)
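
A small sketch of those helpers, assuming .NET Framework 4.6 or later (where ToUnixTimeSeconds and FromUnixTimeSeconds exist). TorrentDates is a hypothetical class name; "creation date" is the metainfo key that commonly carries a Unix timestamp.

```csharp
using System;

public static class TorrentDates
{
    // Converts a "creation date" integer (seconds since the Unix epoch)
    // into a DateTimeOffset in UTC.
    public static DateTimeOffset FromCreationDate(long unixSeconds)
    {
        return DateTimeOffset.FromUnixTimeSeconds(unixSeconds);
    }

    // Converts back. The offset is taken into account, so a local time and
    // a UTC time that denote the same instant give the same number.
    public static long ToCreationDate(DateTimeOffset when)
    {
        return when.ToUnixTimeSeconds();
    }
}
```

For instance, 01:00 on 1 January 1970 at offset +01:00 and midnight UTC on the same date both map to 0, which is the ambiguity DateTime leaves to the caller.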

~~~
cheatdeath
Oh cool, thanks! I wasn't aware of DateTimeOffset.

------
whoisthemachine
Not sure if it's a part of the standard library, but the .Net variant of C#
contains a sorted dictionary: [https://msdn.microsoft.com/en-
us/library/f7fta44c(v=vs.110)....](https://msdn.microsoft.com/en-
us/library/f7fta44c\(v=vs.110\).aspx)

~~~
whoisthemachine
Also, _fantastic_ work. The readability of C# really shines through here - I
could grok all of this code instantly :)

~~~
cheatdeath
Thanks.

Actually organising the write up forced me to tidy up the code much more than
I otherwise would have. I definitely find C# to be one of the more readable
languages although I have had to debug and untangle some C# messes before.

Yeah I remember changing it to a SortedDictionary but I changed it back. I
can't remember exactly why, possibly because it's supposed to be sorted by raw
UTF8 bytes rather than a nice neat C# string and I didn't want to start using
byte arrays for dictionary keys. I guess it only needs to be sorted when in
the BEncoding format and it felt better to keep the internal structure as
simple as possible. The tradeoff is it doesn't support incorrectly encoded
torrent files – I'm really not sure how much of an issue that is.
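
For what it's worth, sorting only at encode time might be sketched like this, keeping plain string keys internally. ByteArrayComparer and BencodeKeyOrder are hypothetical names, not types from the article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public sealed class ByteArrayComparer : IComparer<byte[]>
{
    public static readonly ByteArrayComparer Instance = new ByteArrayComparer();

    // Lexicographic comparison of unsigned bytes. An array that is a prefix
    // of a longer one sorts first; null sorts before any array.
    public int Compare(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        int n = Math.Min(x.Length, y.Length);
        for (int i = 0; i < n; i++)
        {
            if (x[i] != y[i]) return x[i] < y[i] ? -1 : 1;
        }
        return x.Length.CompareTo(y.Length);
    }
}

public static class BencodeKeyOrder
{
    // Returns the keys ordered by their raw UTF-8 bytes, which is the order
    // a bencoded dictionary needs. Each key is converted to bytes only once.
    public static List<string> SortKeys(IEnumerable<string> keys)
    {
        return keys
            .Select(k => new { Key = k, Bytes = Encoding.UTF8.GetBytes(k) })
            .OrderBy(p => p.Bytes, ByteArrayComparer.Instance)
            .Select(p => p.Key)
            .ToList();
    }
}
```

With this, the keys "pieces", "piece length", "info" and "announce" come out as "announce", "info", "piece length", "pieces", because the space (0x20) sorts before 's' (0x73).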

------
blt
It's really nice to see a walkthrough of a non-trivial program all on one page
like this. The clarity of the code and writing makes me want to port it to a
different language because it seems like it would be easy with all the needed
info in one place.

------
th0ma5
Heh, more than a decade ago I created a torrent file format library in .NET
... actually with VB.Net ... anyway, this is GPLv2 licensed.
[http://writtorrent.cvs.sourceforge.net/viewvc/writtorrent/wr...](http://writtorrent.cvs.sourceforge.net/viewvc/writtorrent/writtorrent/torrentclass/torrent.vb?revision=1.1&view=markup)

------
Uptrenda
It's really great work OP. I know this would have taken you a long time to do
but part of me can't help but wonder if programming is becoming even more like
paint by numbers than it already is.

------
vishnuks
For interested people there is a great write up on Tox protocol here
[https://toktok.github.io/spec](https://toktok.github.io/spec)

------
aashu_dwivedi
I wish there were a similar article in Python.

~~~
emodendroket
Isn't the original client a python program?

~~~
spaam
yeah, you can still download it from
[http://web.archive.org/web/20110712221649/http://download.bi...](http://web.archive.org/web/20110712221649/http://download.bittorrent.com/dl/archive/)

------
ambicapter
I don't understand this encoding method. If say, a dictionary starts with d
and ends with e, how do you know with "d3:key5:valuee" if the value is "value"
or "valu"?

~~~
sigcode
[http://cr.yp.to/proto/netstrings.txt](http://cr.yp.to/proto/netstrings.txt)
(1997)
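
In other words, every string carries its own length prefix, so the decoder never has to guess where it ends. A minimal sketch of reading one bencoded string (BencodeStrings and ReadString are hypothetical names, and error handling is kept to the bare minimum):

```csharp
using System;
using System.Globalization;
using System.Text;

public static class BencodeStrings
{
    // Reads one byte string of the form "<length>:<bytes>" starting at
    // `pos` and advances `pos` to the first byte after it.
    public static string ReadString(byte[] data, ref int pos)
    {
        int colon = Array.IndexOf(data, (byte)':', pos);
        if (colon < 0) throw new FormatException("Missing ':' after length");

        string digits = Encoding.ASCII.GetString(data, pos, colon - pos);
        int length = int.Parse(digits, CultureInfo.InvariantCulture);
        int start = colon + 1;
        if (length < 0 || start + length > data.Length)
            throw new FormatException("String runs past end of input");

        pos = start + length;
        return Encoding.UTF8.GetString(data, start, length);
    }
}
```

For "d3:key5:valuee", starting just after the leading 'd', the first call reads exactly 3 bytes ("key") and the second exactly 5 bytes ("value"); the byte left over is the 'e' that closes the dictionary.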

------
hackeradam17
I've been considering trying my hand at creating a bittorrent client. This
should prove to be most helpful!

------
ZanyProgrammer
Somewhat disappointing that it's just a console app. I'd love to be able to do
cross-platform C# desktop development. There should be something equivalent to
WinForms/WPF on OSs other than Windows.

~~~
ohitsdom
This is now possible with the new .NET Core.

[https://docs.microsoft.com/en-
us/dotnet/articles/core/index](https://docs.microsoft.com/en-
us/dotnet/articles/core/index)

~~~
revelation
The one thing they explicitly do not have on the roadmap for .NET Core is any
sort of desktop environment / UI stuff.

There won't be a Forms or WPF port.

~~~
egeozcan
I got the impression that Xamarin was going in that direction?

~~~
nbarbettini
Xamarin is specifically for app and UI on Android/iOS/Windows Phone.

~~~
michaeldwan
They released Mac bindings for Xamarin Forms recently. Not a stretch to
imagine unified UI for desktops.

~~~
egeozcan
Actually that's exactly what I imagined would happen. What would be so
fundamentally different in desktop development compared to mobile UI
development that would exclude it from Xamarin? I don't see it.

------
NKCSS
Fun read, but using automatic properties might lead you down a path that isn't
optimal;

Take this for example:

    
    
        public byte[] Infohash { get; private set; } = new byte[20];
        public string HexStringInfohash { get { return String.Join("", this.Infohash.Select(x => x.ToString("x2"))); } }
        public string UrlSafeStringInfohash { get { return Encoding.UTF8.GetString(WebUtility.UrlEncodeToBytes(this.Infohash, 0, 20)); } }
    

You have an automatic property and two 'properties' that actually perform work
every time you call the getter (might be smarter to make functions of those,
so you know it's not just retrieval of data, but work is done).

If you were to rewrite this a bit, you could make sure the 'work' is done only
when needed, and the properties become actual simple data retrieval properties
like:

    
    
        public class Hashes
        {
            byte[] _infohash;
            string _hexStringInfohash, _urlSafeStringInfohash;
            public byte[] Infohash
            {
                get { return _infohash; }
                private set
                {
                    _infohash = value;
                    _hexStringInfohash = String.Join("", this.Infohash.Select(x => x.ToString("x2")));
                    _urlSafeStringInfohash = Encoding.UTF8.GetString(WebUtility.UrlEncodeToBytes(this.Infohash, 0, 20));
                }
            }
            public string HexStringInfohash { get { return _hexStringInfohash; } }
            public string UrlSafeStringInfohash { get { return _urlSafeStringInfohash; } }
    
            public Hashes()
            {
                Infohash = new byte[20];
            }
        }
    

Going further through the article, I spot many more items to improve, but
let's not forget you did great work and the code is quite readable.

One thing that might help is building an index of how files are split across
pieces; you have the following code multiple times:

    
    
            if ((start < Files[i].Offset && end < Files[i].Offset) ||
                (start > Files[i].Offset + Files[i].Size && end > Files[i].Offset + Files[i].Size))
                continue;
    

If you built an index of which piece hits which files, you wouldn't have
to enumerate this every time.
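
A rough sketch of such an index. FileItem here is a hypothetical stand-in for the article's file entry (only Offset and Size are taken from the snippet above), and the files are assumed to lie within totalSize.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the article's file entry type.
public class FileItem
{
    public long Offset; // position of the file's first byte within the torrent
    public long Size;   // length of the file in bytes
}

public static class PieceIndex
{
    // For each piece, lists the indexes of the files that piece overlaps.
    // Built once, so later lookups do not have to scan every file.
    public static List<int>[] Build(IList<FileItem> files, long pieceSize, long totalSize)
    {
        if (pieceSize <= 0) throw new ArgumentOutOfRangeException(nameof(pieceSize));

        int pieceCount = (int)((totalSize + pieceSize - 1) / pieceSize);
        var index = new List<int>[pieceCount];
        for (int p = 0; p < pieceCount; p++) index[p] = new List<int>();

        for (int f = 0; f < files.Count; f++)
        {
            FileItem file = files[f];
            if (file.Size == 0) continue; // empty files hold no piece data

            int first = (int)(file.Offset / pieceSize);
            int last = (int)((file.Offset + file.Size - 1) / pieceSize);
            for (int p = first; p <= last; p++) index[p].Add(f);
        }
        return index;
    }
}
```

Reading or writing piece p then only needs to loop over index[p] instead of testing every file for overlap.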

Another general remark is to always retrieve an indexed item from the array
once and use that, instead of indexing into the array again and again.

So, do:

    
    
            var file = Files[i];
            if ((start < file.Offset && end < file.Offset) ||
                (start > file.Offset + file.Size && end > file.Offset + file.Size))
                continue;
    

The code becomes more readable and allows you to change the structure later on
more easily, since you no longer have 100 references to the same array and only
use an intermediate variable.

~~~
cheatdeath
Thanks for the comments! I'm all for improved readability. I definitely wasn't
aiming for much performance wise, at least initially. These were two areas (of
many) that I felt could do with some improvement (especially the file IO). I
will look into making some modifications like those suggested when I get a
chance.

------
mafuy
Is the site down?

~~~
dethi
Here is the text-only version from Google Cache
[http://webcache.googleusercontent.com/search?q=cache:http://...](http://webcache.googleusercontent.com/search?q=cache:http://seanjoflynn.com/research/bittorrent.html&num=1&strip=1&vwsrc=0)

------
cheatdeath
If anyone's having issues, I've mirrored it at
[https://cheatdeath.github.io/research-bittorrent-
doc/](https://cheatdeath.github.io/research-bittorrent-doc/)

edit: I'm the author, let me know if you have any questions.

~~~
lossolo
Great work. Does it work under .NET Core?

~~~
jsingleton
From a quick look at the source [0], it looks like it supports Mono. I can't
see anything stopping it being ported to Core, except that JonSkeet.MiscUtil
may not support Core.

If you package this up as a NuGet package and support Core then can you please
ping me and I'll add it to [https://anclafs.com](https://anclafs.com).

[0] [https://github.com/cheatdeath/research-
bittorrent](https://github.com/cheatdeath/research-bittorrent)

~~~
nick_
I've been prototyping a .NET Core port. The JonSkeet.MiscUtil source code is
fine for Core, but for the BitTorrent code we need to change the HttpWebRequest
objects to HttpClient and rewrite the Begin/End/IAsyncResult/callback-style
code to async/await.
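
As an illustration of that kind of rewrite (not code from the port), a Begin/End pair can be wrapped with TaskFactory.FromAsync and then awaited. The sketch uses Stream.BeginRead/EndRead and assumes a runtime that exposes them; ApmToTask and its method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ApmToTask
{
    // Wraps the Begin/End pair of a Stream in a Task so it can be awaited.
    public static Task<int> ReadViaApmAsync(Stream stream, byte[] buffer, int offset, int count)
    {
        return Task<int>.Factory.FromAsync(
            stream.BeginRead, stream.EndRead, buffer, offset, count, null);
    }

    // Callback-free version of "keep reading until `count` bytes arrived".
    public static async Task<byte[]> ReadExactlyAsync(Stream stream, int count)
    {
        var buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            int n = await ReadViaApmAsync(stream, buffer, total, count - total)
                .ConfigureAwait(false);
            if (n == 0) throw new EndOfStreamException();
            total += n;
        }
        return buffer;
    }
}
```

Where the stream type already offers ReadAsync, calling that directly is simpler; the wrapper is mainly useful for APIs that only expose Begin/End methods.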

~~~
jsingleton
FYI Jon has said to avoid using MiscUtil.

[https://github.com/jpsingleton/ANCLAFS/issues/6#issuecomment...](https://github.com/jpsingleton/ANCLAFS/issues/6#issuecomment-231041947)

