Building a BitTorrent client from scratch in C# (cheatdeath.github.io)
509 points by nxzero on July 5, 2016 | 86 comments



Former admin of DC# here (forgive the sourceforge hosting-- it was a long time ago! https://sourceforge.net/projects/dc-sharp/). Great write-up! This was a fascinating read- thank you for putting it together.

One issue to be mindful of- the HttpWebRequest.BeginGetResponse method does not honor timeouts, and you are on your own to timeout the attempt. Consider using HttpClient, if available in Mono / .NET Core. Otherwise, see MSDN for how to do this:

"In the case of asynchronous requests, it is the responsibility of the client application to implement its own time-out mechanism. The following code example shows how to do it." See: https://msdn.microsoft.com/en-us/library/system.net.httpwebr...

I'm not sure if you have access to the ThreadPool class. In a bug that Microsoft's library had, I used the TPL Task construct to resolve this. See the pull request here: https://github.com/Microsoft/ProjectOxford-ClientSDK/pull/83...
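A minimal sketch of such a hand-rolled timeout using the TPL, assuming .NET 4.5 or later. The helper name `WithTimeout` is hypothetical; it is not taken from the MSDN sample or the linked pull request:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Hypothetical helper: waits for 'work', but gives up after 'timeout'.
    public static async Task<T> WithTimeout<T>(Task<T> work, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            // Race the real work against a timer.
            Task delay = Task.Delay(timeout, cts.Token);
            Task finished = await Task.WhenAny(work, delay).ConfigureAwait(false);

            if (finished != work)
                throw new TimeoutException("The operation did not complete in time.");

            cts.Cancel();                                // stop the timer early
            return await work.ConfigureAwait(false);     // propagate result or exception
        }
    }
}
```

For a web request, the task could come from `Task<WebResponse>.Factory.FromAsync(request.BeginGetResponse, request.EndGetResponse, null)`. Note that the helper only stops waiting; after a timeout the caller would probably still want to call `request.Abort()`.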


Thanks for the pointer!


Great work! Quite annoying actually. I finished my own implementation in Python at about 10pm last night, this would have been most useful. I'm no C# coder, but it's nicely readable, and this is a much better write up than I'm sure I could do.

For anyone who hasn't tried doing this before: the "official" BitTorrent spec docs, namely BEP-3 (http://bittorrent.org/beps/bep_0003.html), seem little more than a vague blog post turned into a "spec". Conversely, though, this has led to a wealth of articles describing how to do it.

The three guides I used were:

- A 2 part blog post which has a bit of a Python bent http://www.kristenwidman.com/blog/33/how-to-write-a-bittorre...

- The unofficial specs https://wiki.theory.org/BitTorrentSpecification, and

- An incomplete Python client https://github.com/JosephSalisbury/python-bittorrent

I didn't know of the RFC mentioned in the post, that would have also been really useful.

A lot of BitTorrent stuff for Python is remarkably hard to find in all the noise of Deluge, the original client, and libtorrent wrappers. None of the ones I found were sophisticated (or at least well documented) enough for my experiments; they have different focuses.

I never went as far as implementing my own BEncoder library. A billion seem to exist in multiple languages, and any BitTorrent Python library you install seems to come with its own copy. (I suspect this is due to the way BEncoder was bundled in the original client, see: https://pypi.python.org/pypi/bencode)

I also found a Rust implementation which seems not to compile, but is useful as I'm trying to teach myself Rust https://github.com/kenpratt/rusty_torrent I think the work to get it to compile might be minimal.


" this would have been most useful. I'm no C# coder, but it's nicely readable, and I'm sure this is a lot better written up than I could do."

I agree. I don't do C# but can mostly follow it. It is also a well-organized presentation of much of a protocol that all kinds of people keep re-implementing. They need the help more often than not. A great write-up.


> I also found a Rust implementation which seems not to compile, but is useful as I'm trying to teach myself Rust https://github.com/kenpratt/rusty_torrent I think the work to get it to compile might be minimal.

There is also another project in Rust, it looks more active: https://github.com/GGist/bip-rs It is a collection of libraries.

> For anyone who hasn't tried doing this before: the "official" BitTorrent spec docs, namely BEP-3 (http://bittorrent.org/beps/bep_0003.html), seem little more than a vague blog post turned into a "spec".

Doesn't look vague at all. What do you think is missing from it?


> There is also another project in Rust, it looks more active: https://github.com/GGist/bip-rs

Thanks, I had seen that one, but forgot about it. I think it's a great project, but it's really just a collection of libraries that don't really tell you how it all fits together, which when I was picking stuff up wasn't very helpful. Hopefully now I have a better understanding of the client design I can make something from that.

> Doesn't look vague at all. What do you think is missing from it?

For a comparison I would recommend reading a few (what I would consider) good protocol docs. Docs that you could read and implement, and probably get working very quickly, for example:

- XMPP's XEPs (one picked for similarity in usage to BitTorrent) https://xmpp.org/extensions/xep-0020.html - Lots of examples in there for what messages should look like, which is always helpful.

- The BitTorrent RFC doc (linked in the original post) http://jonas.nitro.dk/bittorrent/bittorrent-rfc.html - Sums the situation up nicely with the layout of messages and value lengths.

I think the main thing that makes the biggest difference is adhering to a language spec such as RFC 2119 which recommends using "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", etc. which makes it really clear what you're meant to do or not to.

Specifically for the vagueness of BEP-3, how about this example that made me rage on IRC. In the description for the info_hash field in the Tracker section.

    This value will almost certainly have to be escaped.

ALMOST CERTAINLY?? Will it, or won't it? Then, escaped? Escaped how?

What this turned out to mean was that the 20-byte binary SHA-1 hash MUST be URL encoded (percent-encoded), and not hex encoded.
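To make the distinction concrete, here is a minimal sketch of percent-encoding raw bytes for the tracker's info_hash parameter. `TrackerEncoding.PercentEncode` is a hypothetical helper, not something from the article; it passes through only the RFC 3986 unreserved characters and escapes every other byte:

```csharp
using System;
using System.Text;

public static class TrackerEncoding
{
    // Hypothetical helper: percent-encodes arbitrary bytes (e.g. a SHA-1 digest).
    public static string PercentEncode(byte[] data)
    {
        var sb = new StringBuilder(data.Length * 3);
        foreach (byte b in data)
        {
            char c = (char)b;
            bool unreserved =
                (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                (c >= '0' && c <= '9') ||
                c == '-' || c == '.' || c == '_' || c == '~';

            if (unreserved)
                sb.Append(c);                              // safe to send as-is
            else
                sb.Append('%').Append(b.ToString("X2"));   // escape as %HH
        }
        return sb.ToString();
    }
}
```

For example, the bytes 0x12 0x34 0xAB become `%124%AB` (0x34 is the ASCII digit 4), whereas hex encoding would give `1234ab`.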

I would love to see someone try to build a BitTorrent client for the first time based solely on this doc.

---

BEP-3 also seems more interested in implementation detail, than describing the protocol. Take the last paragraph (before Copyright) as an example.

Something else which occurred to me today is that BitTorrent is not a spec; it hasn't been developed, it has evolved. It was also built in a very modular way, i.e. DHTs can replace trackers and simply be dropped in, and magnet URIs can replace torrent files. This probably contributes to its success and longevity, but it also means that there is a lot of stuff, like metainfo, trackers, and bencoding, that SHOULD belong in their own spec docs, which form a collective whole.


Very good! A few minor performance comments.

1. In your EncodeDictionary, you sort byte arrays by converting them to strings. Correct but inefficient. See e.g. this: http://stackoverflow.com/q/19695629/126995 but add checks for nulls; the authors of that code forgot about that.

2. You don’t need a dedicated thread to wake up every 1-10 seconds and do something small. Threads are expensive system resources: each one owns a stack, cache misses are guaranteed when it wakes up, etc. If your compiler supports async-await, use that instead, with an endless loop and Task.Delay inside the loop. If not, the System.Timers.Timer class will do.
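A minimal sketch of the second suggestion, assuming async-await is available (.NET 4.6 or later for `Task.CompletedTask`). The names `PeriodicSketch` and `RunPeriodically` are hypothetical and not from the article:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PeriodicSketch
{
    // Runs 'action' repeatedly, pausing 'interval' between runs, until cancelled.
    // No thread is blocked while waiting.
    public static async Task RunPeriodically(Func<Task> action, TimeSpan interval, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            await action().ConfigureAwait(false);
            try
            {
                await Task.Delay(interval, token).ConfigureAwait(false);
            }
            catch (OperationCanceledException)
            {
                break;   // cancelled during the pause: leave the loop quietly
            }
        }
    }
}
```

A caller would keep the returned task and cancel the token source on shutdown, for example `var loop = PeriodicSketch.RunPeriodically(UpdateTrackersAsync, TimeSpan.FromSeconds(10), cts.Token);` where `UpdateTrackersAsync` stands in for whatever small job the thread used to do.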


Thanks for the tips! I will look into making these modifications when I get a chance.


You're welcome. Edited for clarity.


BEncoding and variants like REncoding are possibly one of my least favourite things ever. If you deal with the Deluge torrent client API you'll see it everywhere.

That aside, fantastic work on this, I think previously the only Bittorrent library for C# was an abandoned Mono project.


Yeah, monotorrent doesn't really work very effectively. It tends to get blocked by a lot of peers, and throughput is really mediocre when it isn't blocked.

The last time I was trying to build something that used bittorrent, I fell back on launching aria2 and redirecting and parsing its stdout, which as crazy as it was, worked much better.


Aria2 actually has a RPC interface which is pretty easy to work with (json or xml over http or web sockets).

https://aria2.github.io/manual/en/html/aria2c.html#rpc-inter...


> BEncoding and variants like REncoding are possibly one of my least favourite things ever.

Can't be worse than ASN.1 can it?


Yes, it can be worse, since ASN.1 is relatively "popular" and there's a decent pool of knowledge.


Depends on if ignoring a high-assurance implementation is bad:

http://cps-vo.org/node/1577

Already bad if you wanted ASN.1 but can't get Galois's version. If you can get it, then choosing an ASN.1 alternative can be bad since it probably won't have a formal spec, verified parser, Haskell implementation, and so on. Probably a drop in correctness in some corner case vs whatever they made.

Note: I'd like to see them do a high-assurance JSON and/or XDR parser instead of just ASN.1. I know they did a Haskell-to-JSON library already. A strong one that extracted parsers or generators from a user-supplied specification with plugins for various programming languages would be nice.


Here’s a similar write-up on building a BitTorrent client in Haskell: https://blog.chaps.io/2015/10/05/torrent-client-in-haskell-1...


If anyone is interested in a similar writeup for node.js check out my tutorial here: http://allenkim67.github.io/bittorrent/2016/05/04/how-to-mak...


HTTPS Everywhere redirects me to the HTTPS version of the page, but you've hard-coded http:// links for some/all of the resources, which the browser refuses to load, so it just looks like some big gray boxes.


Hey thanks, it looks like the link for the css was the problem so I've just fixed that for now.


Awesome write-up! I love C# and this was really well-written. Great work.

Pro tip: Use DateTimeOffset instead of DateTime. It's less frustratingly ambiguous than DateTime, and already has a Unix timestamp helper function if you're on the latest framework: https://msdn.microsoft.com/en-us/library/system.datetimeoffs...
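A small sketch of those helpers, assuming .NET Framework 4.6 or later (where `ToUnixTimeSeconds` and `FromUnixTimeSeconds` were added). The wrapper class `UnixTimeSketch` is hypothetical:

```csharp
using System;

public static class UnixTimeSketch
{
    // Seconds since 1970-01-01T00:00:00Z for the given instant.
    public static long ToUnixSeconds(DateTimeOffset moment)
    {
        return moment.ToUnixTimeSeconds();
    }

    // The instant (in UTC) that a Unix timestamp refers to.
    public static DateTimeOffset FromUnixSeconds(long seconds)
    {
        return DateTimeOffset.FromUnixTimeSeconds(seconds);
    }
}
```

Because a DateTimeOffset carries its UTC offset, midnight UTC and 01:00 at +01:00 on the same day convert to the same timestamp, which is the ambiguity DateTime makes easy to get wrong.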


Oh cool, thanks! I wasn't aware of DateTimeOffset.


Not sure if it's a part of the standard library, but the .Net variant of C# contains a sorted dictionary: https://msdn.microsoft.com/en-us/library/f7fta44c(v=vs.110)....


Also, fantastic work. The readability of C# really shines through here - I could grok all of this code instantly :)


Thanks.

Actually organising the write up forced me to tidy up the code much more than I otherwise would have. I definitely find C# to be one of the more readable languages although I have had to debug and untangle some C# messes before.

Yeah I remember changing it to a SortedDictionary but I changed it back. I can't remember exactly why, possibly because it's supposed to be sorted by raw UTF8 bytes rather than a nice neat C# string and I didn't want to start using byte arrays for dictionary keys. I guess it only needs to be sorted when in the BEncoding format and it felt better to keep the internal structure as simple as possible. The tradeoff is it doesn't support incorrectly encoded torrent files – I'm really not sure how much of an issue that is.
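For reference, a minimal sketch of what raw-byte ordering would take if byte arrays were used as keys. `ByteArrayComparer` is a hypothetical class, not part of the article's code; it compares byte by byte, includes null checks, and treats a shorter array as smaller when it is a prefix of the longer one:

```csharp
using System;
using System.Collections.Generic;

public sealed class ByteArrayComparer : IComparer<byte[]>
{
    public int Compare(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;    // nulls sort first
        if (y == null) return 1;

        int common = Math.Min(x.Length, y.Length);
        for (int i = 0; i < common; i++)
        {
            int diff = x[i].CompareTo(y[i]);   // unsigned byte comparison
            if (diff != 0) return diff;
        }
        // All shared bytes are equal: the shorter array sorts first.
        return x.Length.CompareTo(y.Length);
    }
}
```

It can be passed straight to a sorted collection, e.g. `new SortedDictionary<byte[], object>(new ByteArrayComparer())`, or to `List<byte[]>.Sort` just before encoding.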


Agreed. Properly written C# code is very readable. I write most things in it these days and couldn't be happier about what this language has become.

I have a few minor gripes with it, but they mainly relate to missing syntactic sugar rather than actual shortcomings.

Kudos to OP, great job.

EDIT: fix up grammar mistake.


It's really nice to see a walkthrough of a non-trivial program all on one page like this. The clarity of the code and writing makes me want to port it to a different language because it seems like it would be easy with all the needed info in one place.


Heh, more than a decade ago I created a torrent file format library in .NET ... actually with VB.Net ... anyway, this is GPLv2 licensed. http://writtorrent.cvs.sourceforge.net/viewvc/writtorrent/wr...


It's really great work OP. I know this would have taken you a long time to do but part of me can't help but wonder if programming is becoming even more like paint by numbers than it already is.


For interested people there is a great write up on Tox protocol here https://toktok.github.io/spec


I wish there were a similar article in Python.


It seems to me that the C# code listings here look basically just like the equivalent Python. I wouldn't let the name of the language stop you.


Skimming it, the article seems pretty detailed, it should be possible to follow it in other languages, or even to convert it to a polyglot guide.


Isn't the original client a python program?



I would love one, and other similarly meaty projects, in Swift 3.


I am actually working on a GUI BitTorrent client in Swift, just converted my code to Swift 3 this morning, (I'm in the EU), quite far to go still, but I hope to have an alpha release out in Q4.


I guess a native Python implementation would be too slow. However, there is a fantastic libtorrent library that has Python bindings and makes it relatively easy to implement a torrent client in Python.

BTW, regarding the original article, there is also a MonoTorrent library for .NET. Despite the name it can be compiled by Visual Studio. The original library was abandoned a while ago and seems to be buggy, but I was able to make a very simple .NET client with WinForms UI using this fork: https://github.com/ErtyHackward/monotorrent


The bottleneck is usually network.

The original bittorrent client (before µTorrent) was actually written in Python BTW.


It's not accurate. Widely adopted client before uTorrent was Azureus and it was written in Java.


The very first torrent client written by Bram Cohen (the person who invented bittorrent) was written in Python[1].

I remember it, because 15 years ago that was the only client available. Later people started creating other clients by forking his python code, and eventually rewriting it in different languages.

[1] https://en.wikipedia.org/wiki/BitTorrent_(software)


And no one used it. You know what came before C? You probably don't, because no one used that either. Azureus was developed 13 years ago and that was the client that was used... I know because I remember it too. And then they (BitTorrent Inc.) changed their Python version to C++ and called it uTorrent, because Python was too slow and no one wanted to use it...


Lots of people used the Python client because not everyone wanted to run Java (memory hog) or were on Windows (uTorrent). Azureus had one advantage: first to support the DHT and trackerless operation.


I don't understand this encoding method. If say, a dictionary starts with d and ends with e, how do you know with "d3:key5:valuee" if the value is "value" or "valu"?



Because the value is actually "5:value", which means it's 5 bytes long. The final 'e' indicates the end of the dictionary, not the value.
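A minimal decoder sketch makes the length-prefix rule concrete. This is a hypothetical `BencodeSketch` class, not the article's implementation; it assumes well-formed input and does no error handling:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class BencodeSketch
{
    // Returns long (integer), byte[] (string), List<object> or Dictionary<string, object>.
    public static object Decode(byte[] data)
    {
        int pos = 0;
        return DecodeNext(data, ref pos);
    }

    private static object DecodeNext(byte[] data, ref int pos)
    {
        char c = (char)data[pos];
        if (c == 'i')                                  // integer: i<digits>e
        {
            int end = Array.IndexOf(data, (byte)'e', pos);
            string digits = Encoding.ASCII.GetString(data, pos + 1, end - pos - 1);
            pos = end + 1;
            return long.Parse(digits, CultureInfo.InvariantCulture);
        }
        if (c == 'l')                                  // list: l<items>e
        {
            pos++;
            var list = new List<object>();
            while (data[pos] != (byte)'e') list.Add(DecodeNext(data, ref pos));
            pos++;                                     // skip the list's closing 'e'
            return list;
        }
        if (c == 'd')                                  // dictionary: d<key><value>...e
        {
            pos++;
            var dict = new Dictionary<string, object>();
            while (data[pos] != (byte)'e')
            {
                string key = Encoding.UTF8.GetString((byte[])DecodeNext(data, ref pos));
                dict[key] = DecodeNext(data, ref pos);
            }
            pos++;                                     // skip the dictionary's closing 'e'
            return dict;
        }
        // Byte string: <length>:<bytes>. The length says exactly how many bytes
        // to take, so an 'e' inside the string is never mistaken for a terminator.
        int colon = Array.IndexOf(data, (byte)':', pos);
        int length = int.Parse(Encoding.ASCII.GetString(data, pos, colon - pos), CultureInfo.InvariantCulture);
        var bytes = new byte[length];
        Array.Copy(data, colon + 1, bytes, 0, length);
        pos = colon + 1 + length;
        return bytes;
    }
}
```

Decoding "d3:key5:valuee" with this sketch reads 3 bytes for the key, 5 bytes for the value, and only then sees the 'e' that closes the dictionary.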


The length of the string is included, so you know where it ends.


I've been considering trying my hand at creating a bittorrent client. This should prove to be most helpful!


Somewhat disappointing that it's just a console app. I'd love to be able to do cross platform C# desktop development. There should be something equivalent to WinForms/WPF on OSs other than Windows.


This is now possible with the new .NET Core.

https://docs.microsoft.com/en-us/dotnet/articles/core/index


The one thing they explicitly do not have on the roadmap for .NET Core is any sort of desktop environment / UI stuff.

There won't be a Forms or WPF port.


Personal plug: we're working on it! https://github.com/AvaloniaUI/Avalonia


I've been looking at EdgeJs, specifically one version which was compiled to run in Electron. The idea being to make a bridge between electron and C# view-models (or whatever) so people can build cross-plat C# and HTML5 desktop apps.


Sure there won't be shared UI development support, but you can easily encapsulate app logic from UI. If you're brave enough you can try using Gtk# from http://www.mono-project.com/docs/gui/


I got the impression that Xamarin was going in that direction?


Xamarin is specifically for app and UI on Android/iOS/Windows Phone.


And Mac. You could write shared business logic in C# for your Windows, Mac, and mobile apps, with a combination of .NET and Xamarin, and still use C# for each native UI.

But it sounds like you are actually thinking of Xamarin.Forms, which allows sharing UI code on mobile platforms as well as UWP. You are correct that there is no Mac support for that.


Ah, makes sense. Thanks for clarifying.


They released Mac bindings for Xamarin Forms recently. Not a stretch to imagine unified UI for desktops.


Actually that's exactly what I imagined to happen. What would be so fundamentally different in desktop development compared to mobile UI development that would exclude it from Xamarin? I don't see it.


Are you quite sure? I don't think this is correct.

Xamarin.Mac provides C# bindings to Mac desktop APIs, and has nothing to do with Xamarin.Forms.


Mono provides an implementation of Windows.Forms that can be used cross platform.


I've also heard good things about Eto.Forms.


Fun read, but using automatic properties might lead you down a path that isn't optimal;

Take this for example:

    public byte[] Infohash { get; private set; } = new byte[20];
    public string HexStringInfohash { get { return String.Join("", this.Infohash.Select(x => x.ToString("x2"))); } }
    public string UrlSafeStringInfohash { get { return Encoding.UTF8.GetString(WebUtility.UrlEncodeToBytes(this.Infohash, 0, 20)); } }

You have an automatic property and two 'properties' that actually perform work every time you call the getter (it might be smarter to make those functions, so it's clear that work is being done rather than just retrieval of data).

If you were to rewrite this a bit, you could make sure the 'work' is done only when needed, and the properties become actual simple data retrieval properties like:

    using System;
    using System.Linq;
    using System.Net;
    using System.Text;

    public class Hashes
    {
        byte[] _infohash;
        string _hexStringInfohash, _urlSafeStringInfohash;

        // The derived strings are computed once, when the hash is assigned.
        // Note: mutating the returned array in place would leave them stale.
        public byte[] Infohash
        {
            get { return _infohash; }
            private set
            {
                _infohash = value;
                _hexStringInfohash = String.Join("", value.Select(x => x.ToString("x2")));
                _urlSafeStringInfohash = Encoding.UTF8.GetString(WebUtility.UrlEncodeToBytes(value, 0, value.Length));
            }
        }

        // Plain data retrieval: no work is done in these getters.
        public string HexStringInfohash { get { return _hexStringInfohash; } }
        public string UrlSafeStringInfohash { get { return _urlSafeStringInfohash; } }

        public Hashes()
        {
            Infohash = new byte[20];
        }
    }

Going further through the article, I spot many more items to improve, but let's not forget you did great work and the code is quite readable.

One thing that might help is building some indexes to know how files are fragmented. You have the following code multiple times:

        if ((start < Files[i].Offset && end < Files[i].Offset) ||
            (start > Files[i].Offset + Files[i].Size && end > Files[i].Offset + Files[i].Size))
            continue;

If you'd build an index to know which piece hits which files, you wouldn't have to enumerate this every time.

Another general remark is to always 'retrieve' an indexed item from the array once and use that, instead of repeatedly indexing into the array.

So, do:

        var file = Files[i];
        if ((start < file.Offset && end < file.Offset) ||
            (start > file.Offset + file.Size && end > file.Offset + file.Size))
            continue;

The code becomes more readable and lets you change the structure later on more easily, since you no longer have 100 references to the same array and only use an intermediate.
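A rough sketch of the piece-to-file index idea. Everything here is hypothetical (`FileItem`, `PieceIndexSketch`, `BuildPieceIndex`); it only assumes that each file has an `Offset` and `Size` in bytes within the torrent's concatenated data, as in the snippets above:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the article's file entries.
public sealed class FileItem
{
    public long Offset;   // start of the file within the concatenated torrent data
    public long Size;     // length of the file in bytes
}

public static class PieceIndexSketch
{
    // For each piece, lists the indexes of the files that overlap it.
    public static List<int>[] BuildPieceIndex(IList<FileItem> files, long pieceSize, long totalSize)
    {
        int pieceCount = (int)((totalSize + pieceSize - 1) / pieceSize);   // round up
        var index = new List<int>[pieceCount];
        for (int p = 0; p < pieceCount; p++) index[p] = new List<int>();

        for (int f = 0; f < files.Count; f++)
        {
            var file = files[f];
            if (file.Size == 0) continue;              // empty files touch no piece

            int first = (int)(file.Offset / pieceSize);
            int last = (int)((file.Offset + file.Size - 1) / pieceSize);
            for (int p = first; p <= last; p++) index[p].Add(f);
        }
        return index;
    }
}
```

The index is built once after the torrent is loaded; reading or writing piece `p` then only needs to visit `index[p]` instead of scanning every file.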


Thanks for the comments! I'm all for improved readability. I definitely wasn't aiming for much performance wise, at least initially. These were two areas (of many) that I felt could do with some improvement (especially the file IO). I will look into making some modifications like those suggested when I get a chance.


Is the site down?


Here is the text-only version from Google Cache http://webcache.googleusercontent.com/search?q=cache:http://...


It's not completely down; the server does seem to be struggling though.


Not really, it just takes minutes to load.


If anyone's having issues, I've mirrored it at https://cheatdeath.github.io/research-bittorrent-doc/

edit: I'm the author, let me know if you have any questions.



Sorry, I didn't update some of the links in the contents correctly when creating the mirror. They should be fixed now – everything is on one page.


I don't know if you've seen it already but some people on reddit talked about some improvements/tips here: https://www.reddit.com/r/programming/comments/4rcss2/buildin...

I agree with most of their sentiments, especially the parsing part could be made cleaner/more C#'ish :)

Otherwise nice work and very interesting writeup!


Thanks for letting me know, looks like I'll have to find some time to implement some of these recommendations.


Great work. Does it work under .NET Core ?


From a quick look at the source [0], it looks like it supports Mono. I can't see anything stopping it being ported to Core, apart from JonSkeet.MiscUtil may not support Core.

If you package this up as a NuGet package and support Core then can you please ping me and I'll add it to https://anclafs.com.

[0] https://github.com/cheatdeath/research-bittorrent


I've been prototyping a .NET Core port. JonSkeet.MiscUtil source code is fine for Core, but for the BitTorrent code we need to change the HttpWebRequest objects to HttpClient and rewrite the Begin/End/IAsyncResult/callback-style to async/await.
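As a rough illustration of that kind of rewrite, a Begin/End pair can be wrapped in a Task with `Task<TResult>.Factory.FromAsync` and then awaited. The sketch below uses `Stream.BeginRead`/`EndRead` so it stays self-contained; `ApmBridgeSketch` and `ReadChunkAsync` are hypothetical names and this is not code from the port:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ApmBridgeSketch
{
    // Wraps the Begin/End (IAsyncResult) pattern in a Task so callers can await it.
    public static async Task<int> ReadChunkAsync(Stream stream, byte[] buffer)
    {
        int read = await Task<int>.Factory
            .FromAsync(stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null)
            .ConfigureAwait(false);
        return read;   // number of bytes actually read; 0 means end of stream
    }
}
```

Where a type already offers a Task-returning method (`Stream.ReadAsync`, or `HttpClient` in place of `HttpWebRequest`), calling that directly is simpler; the wrapper is mostly useful as an intermediate step while converting callback-style code.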


FYI Jon has said to avoid using MiscUtil.

https://github.com/jpsingleton/ANCLAFS/issues/6#issuecomment...


I'm not that familiar with .NET Core. It doesn't look like MiscUtil is compatible with it.

The executable project has some minor references to Mono.Posix just so it can catch kill signals while running in the Terminal and die gracefully.

Both dependencies can be removed quite easily from the project.


Definitely having timeout issues on the OP link. Your €3/month hosting from blacknight.com isn't holding up to the HN load, apparently :)

Thank you for the github link. Mods can we get OP link adjusted to the github link instead of what was submitted?


Great to see an alternative to old MonoTorrent! I'm curious why you didn't use async/await instead of the Begin/End/IAsyncResult pattern?


Honestly I'm just not that familiar with async/await. I'll look into modifying it when I get around to making some other changes.


Awesome post/code; might be worth noting that you're the author of the code/post and your willingness to address questions.


Will do, thanks for posting.


Thanks, we updated the link from http://seanjoflynn.com/research/bittorrent.html.


Your name from WoW?


World of Warcraft? Haha no, I've been using that handle probably since that game came out but I've never played it.



