Hacker News
Building a BitTorrent client from the ground up in Go (2020) (jse.li)
580 points by stargrave on Nov 6, 2022 | 47 comments



Thanks, this is really cool. I don't have anything to add; I just want to let the author know this is appreciated. I recently decided to learn Go, and articles like this are a great way to get a feel for how Go programmers think about solving problems with the tools available to them. The kind of knowledge you don't get from just skimming The Go Programming Language and such.


I've recently started learning Go and chose a project-based approach. I've been reading "Writing an Interpreter in Go" and "Powerful Command-Line Applications in Go" and it's been loads of fun.


Thanks for the recommendations!


Check out "Distributed Services with Go" as well. Best technical book I’ve ever read.


This didn’t dive into it, but I appreciated the suggested search terms for information on decentralized peer discovery. I always wondered how that would work, especially the bootstrapping process. Ultimately I found this StackOverflow answer to be really helpful: https://stackoverflow.com/a/22240583


One of the things that made the bootstrapping click for me was realizing there are in fact servers that DHT clients talk to once to find an entrance to the network. They all have a list (such as https://github.com/qbittorrent/qBittorrent/blob/c80238d66ff3... )


Why is it so hard to block torrents if all they have to do is block these DHT servers?


Blocking DHT bootstrap nodes only takes care of DHT. This is only one way of obtaining peers, so even if it's not available, you can still fetch peer information from torrent trackers and receive potential peer addresses from other nodes through Peer Exchange.

DHT is required when your torrent file does not contain any tracker URLs (and you didn't add any yourself). In this case if it's blocked, then you have a problem.

Private trackers typically disable DHT for all torrents (so their torrents don't leak to outside users), along with PEX. It's just a flag in the torrent file, so you can force-enable it if you want (but it won't help much since the vast majority of other peers will probably abide by the rules).
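
To make the rules above concrete, here is a minimal Go sketch of how a rule-abiding client might decide which peer sources it can use. The `torrentMeta` struct, its field names, and `peerSources` are hypothetical names made up for this illustration, not any client's real API, and the sketch assumes PEX only helps once some other source has produced a first peer.

```go
package main

import "fmt"

// torrentMeta holds only the fields this sketch needs. The struct and its
// field names are hypothetical, not taken from any library.
type torrentMeta struct {
	Trackers []string // tracker URLs listed in the torrent file
	Private  bool     // the "private" flag in the torrent file
}

// peerSources lists the discovery mechanisms a rule-abiding client could use.
// dhtReachable is false when the DHT bootstrap nodes are blocked.
func peerSources(m torrentMeta, dhtReachable bool) []string {
	var sources []string
	if len(m.Trackers) > 0 {
		sources = append(sources, "tracker")
	}
	if !m.Private && dhtReachable {
		sources = append(sources, "dht")
	}
	// Assumption: PEX needs at least one peer found some other way,
	// because peer addresses are exchanged over existing connections.
	if !m.Private && len(sources) > 0 {
		sources = append(sources, "pex")
	}
	return sources
}

func main() {
	public := torrentMeta{Trackers: []string{"http://tracker.example/announce"}}
	trackerless := torrentMeta{}
	private := torrentMeta{Trackers: []string{"http://tracker.example/announce"}, Private: true}

	fmt.Println("public, DHT blocked:     ", peerSources(public, false))
	fmt.Println("trackerless, DHT blocked:", peerSources(trackerless, false))
	fmt.Println("private, DHT reachable:  ", peerSources(private, true))
}
```

The trackerless case with DHT blocked comes back empty, which is the "then you have a problem" situation described above.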


Those are used only by new users the first time they use the app. They are regularly blocked or retired (because the maintainer decided so), but that does not affect the network because there is nothing special about them. New IPs are added to the list. You could become one of those simply by leaving your computer online 24/7.


As someone who writes Go code daily, nice Go code!


One of my favorite things about Go's language design is that the difference between really good code and average-level code is much smaller.


The project author's previous submission: https://news.ycombinator.com/item?id=21958359


They're big fans of bittorrent, it appears.


BitTorrent is a cornerstone of the modern internet.


All the web3 talk is scoffed at by many, but I believe in decentralizing the internet as much as possible and am a big fan of Bittorrent.

To quote Satoshi Nakamoto, "For transferable proof of work tokens to have value, they must have monetary value. To have monetary value, they must be transferred within a very large network – for example a file trading network akin to bittorrent."


Web3 is mostly about NFT and making money.

Decentralization is just a buzz-word added to keep the privacy fundamentalists interested.


I don't believe BT will truly take off until web seeding is functional in all clients and I can use (a site | series of sites) to make a torrent of an existing thing. There was such a site (whose cute URL escapes me) where one could plug in a URL, it would download and cook a .torrent file but with the web seeds pointed at the source, and from that point forward others could get the cached version and the process was bootstrapped

I was also saddened by AWS dropping torrent support from their S3 API because that's one more easy way that folks could provide torrent bootstrapping "for free"


Take off? It's unfortunately waning. Used to be 35% of all internet traffic https://torrentfreak.com/bittorrent-is-still-the-king-of-ups...


Mostly because of TV streaming services though. I wonder how much BT is of consumer backbone usage.


Great post

One nit, with the font:

    l4:spami7ee
the first two characters look an awful lot like the number "14", when they're actually "L4". I'm not sure if bencoding is case sensitive or if another font may better differentiate the two, but that may be something the author may want to poke at if they happen to read this comment :)


I had the same confusion! In my browser, with the default monospace font, the two characters are so similar as to be indistinguishable.

Reading the next example, showing an encoded dictionary, was what made the penny drop. Bencode doesn't go out of its way to be visually accessible, so different font choices would help.
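
For anyone else tripped up by the font, decoding the example mechanically removes the ambiguity. Below is a minimal Go sketch of a bencode decoder, not the article's code: the function name `decode` is made up here, and it is deliberately lenient (it does not reject things like leading zeros or unsorted dictionary keys).

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// decode parses one bencoded value at the start of s and returns it together
// with the unparsed remainder of the input.
func decode(s string) (interface{}, string, error) {
	if s == "" {
		return nil, "", errors.New("unexpected end of input")
	}
	switch c := s[0]; {
	case c == 'i': // integer: i<digits>e
		end := strings.IndexByte(s, 'e')
		if end < 0 {
			return nil, "", errors.New("unterminated integer")
		}
		n, err := strconv.ParseInt(s[1:end], 10, 64)
		if err != nil {
			return nil, "", err
		}
		return n, s[end+1:], nil
	case c == 'l': // list: l<values>e
		list := []interface{}{}
		rest := s[1:]
		for {
			if rest == "" {
				return nil, "", errors.New("unterminated list")
			}
			if rest[0] == 'e' {
				return list, rest[1:], nil
			}
			v, r, err := decode(rest)
			if err != nil {
				return nil, "", err
			}
			list = append(list, v)
			rest = r
		}
	case c == 'd': // dictionary: d<key><value>...e, keys are strings
		dict := map[string]interface{}{}
		rest := s[1:]
		for {
			if rest == "" {
				return nil, "", errors.New("unterminated dictionary")
			}
			if rest[0] == 'e' {
				return dict, rest[1:], nil
			}
			k, r, err := decode(rest)
			if err != nil {
				return nil, "", err
			}
			key, ok := k.(string)
			if !ok {
				return nil, "", errors.New("dictionary key must be a string")
			}
			v, r2, err := decode(r)
			if err != nil {
				return nil, "", err
			}
			dict[key] = v
			rest = r2
		}
	case c >= '0' && c <= '9': // string: <length>:<bytes>
		colon := strings.IndexByte(s, ':')
		if colon < 0 {
			return nil, "", errors.New("string is missing ':'")
		}
		n, err := strconv.Atoi(s[:colon])
		if err != nil {
			return nil, "", err
		}
		if n > len(s)-colon-1 {
			return nil, "", errors.New("string is shorter than its declared length")
		}
		start := colon + 1
		return s[start : start+n], s[start+n:], nil
	}
	return nil, "", fmt.Errorf("unexpected byte %q", s[0])
}

func main() {
	v, _, err := decode("l4:spami7ee")
	fmt.Println(v, err) // prints: [spam 7] <nil>
}
```

The leading `l` opens a list, `4:spam` is a four-byte string, `i7e` is the integer 7, and the final `e` closes the list.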


The l looks taller than the 4 to me, which makes it clear it's not another number


Looks like a one to my bad eyes.


This looks to have been written on Jan 4, 2020.



> "They’ll send us an unchoke message to let us know that we can begin asking them for data."

What is the purpose of that choke part when connecting to remote peers? Is that some kind of flood protection?


You can normally control the number of peers you share with, so I guess if 50 peers are connected but you only want to share with 10, then when one completes you'd send an unchoke message to another peer to tell it you'll share with them. Meanwhile the other 39 will stay choked. I assume that's what's going on here.
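
A rough Go sketch of that slot bookkeeping, under the simplest possible assumption: a fixed number of upload slots handed out first come, first served. The `uploader` type and its methods are hypothetical names for this illustration; real clients usually choose whom to unchoke based on who uploads to them, plus a periodic "optimistic unchoke", rather than a plain queue.

```go
package main

import "fmt"

// uploader tracks which connected peers are currently unchoked.
// The type and its methods are hypothetical names for this sketch.
type uploader struct {
	maxUnchoked int
	unchoked    map[string]bool
	waiting     []string // choked peers, in arrival order
}

func newUploader(maxUnchoked int) *uploader {
	return &uploader{maxUnchoked: maxUnchoked, unchoked: map[string]bool{}}
}

// connect registers a peer and reports whether it was unchoked right away.
func (u *uploader) connect(peer string) bool {
	if len(u.unchoked) < u.maxUnchoked {
		u.unchoked[peer] = true
		return true
	}
	u.waiting = append(u.waiting, peer)
	return false
}

// finish frees the slot of a peer that has completed and returns the next
// peer to send an unchoke message to, or "" if nobody is waiting.
func (u *uploader) finish(peer string) string {
	if !u.unchoked[peer] {
		return ""
	}
	delete(u.unchoked, peer)
	if len(u.waiting) == 0 {
		return ""
	}
	next := u.waiting[0]
	u.waiting = u.waiting[1:]
	u.unchoked[next] = true
	return next
}

func main() {
	u := newUploader(10)
	for i := 0; i < 50; i++ {
		u.connect(fmt.Sprintf("peer%02d", i))
	}
	fmt.Println("choked after connecting:", len(u.waiting)) // 40
	fmt.Println("unchoke next:", u.finish("peer00"))        // peer10
	fmt.Println("still choked:", len(u.waiting))            // 39
}
```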


That makes sense, thanks. It just occurred to me how interesting a challenge it is to write a BitTorrent client. There are so many issues to work on, like how to retrieve chunks of files efficiently while, in the meantime, balancing upload speed fairly across the remote clients requesting the pieces you already have.


Very helpful. Can someone recommend more of these learn by doing programming blogs?



Hey Jesse, we went to school together but I'm sure that's a ton of people you know by now ;). If you can figure it out, hit me up and I'll show you some Zebra industry stuff I've reversed.


Really nice share! I think I'll have a follow-along with this next weekend to improve my Go skills and try to learn Rust by attempting to port this to it.


It is only downloading, right? Is that tolerated?

Anyway it's a cool post! Would be nice to see a follow-up for the upload/sharing part!


Hi I'm the original author. Sorry to see that you got downvoted because it's a good question and a real limitation of my project. I might get around to building a follow-up some day with upload/sharing since that's where the actual interesting stuff happens (and not just shuffling bits around, with a shitty download algorithm tacked on like in my post). No real plans though. If I do that then I might have to rewrite some of the code and edit parts of the blog post, since I've grown as an engineer and writer since I made this two years ago.

As for your question of "Is that tolerated?" the question is a social one. As the sibling commenter pointed out, if you use a private tracker, you're part of a community that often has a social contract (implicit or explicit) that participation means you need to contribute back and have a good seeding ratio. In that case, only downloading -- like my client does -- is not tolerated and might get you banned. That's why I chose to demo with torrenting a Linux ISO, since those tend to be public, have a lot of seeders anyway, and don't really care how much you upload or download.

There's nothing in the protocol that says you have to do it, and indeed, there are even "selfish" algorithms that try to maximize download speed while uploading nothing in return: https://www.wired.com/2007/01/bittorrent-bullies-bittyrant-a...


Thanks a lot for the insights! It's already really instructive, for the download part!

And it changes from all those Twitter-related posts recently, to be honest :-).


You may be thinking of private trackers, where "ratio" is the currency of participation in that community. Debian, Libre Office, and even Humble Bundle care more about the community of seeders sharing the bandwidth load than whether anyone seeds their ratio


I believe they are trying to clarify if this code is only for the 'downloading' portion of a BitTorrent client. It does not seem to have any code for handling requests for pieces coming from other peers.


No. I'm thinking that a BitTorrent client, in my naive understanding, should share the file while it downloads it. Otherwise it's just a client/server system, isn't it?

That code is only downloading, as far as I can tell. Never sharing the file. I was just asking if that's tolerated in BitTorrent. My guess was that it is not.


It is a client/server system, but pulling from $n hosts instead of just 1, spreading out the load and also downtime risk across all $n hosts. And the reason I mentioned the "ratio currency" is your use of the word "tolerated" here: BitTorrent isn't the mafia, it's just a protocol that offers the ability to share pieces back to the Internet. As another concrete example, aria2c (https://github.com/aria2/aria2#readme) behaves similarly: using BT to download, but then exiting without attempting to seed anything
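
To illustrate the "pulling from $n hosts" idea, here is a small download-only sketch in Go with simulated peers instead of real network connections. All names (`pieceWork`, `fetchFunc`, `download`, `hashPieces`) are made up for this example. It assumes at least one peer serves correct data; if every peer failed, `download` would block forever.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

type pieceWork struct {
	index int
	hash  [20]byte // expected SHA-1 of the piece
}

type pieceResult struct {
	index int
	data  []byte
}

// fetchFunc stands in for a network round trip to one peer.
type fetchFunc func(index int) ([]byte, error)

// worker serves one peer: it takes pieces off the shared queue, verifies
// them, and puts a piece back if this peer fails to deliver it correctly.
func worker(fetch fetchFunc, work chan pieceWork, results chan<- pieceResult) {
	for pw := range work {
		data, err := fetch(pw.index)
		if err != nil || sha1.Sum(data) != pw.hash {
			work <- pw // requeue for another peer
			return     // and stop using this peer
		}
		results <- pieceResult{pw.index, data}
	}
}

// download spreads the pieces across all peers and collects the results.
func download(hashes [][20]byte, peers []fetchFunc) [][]byte {
	// The buffer holds every piece, so requeueing never blocks.
	work := make(chan pieceWork, len(hashes))
	results := make(chan pieceResult)
	for i, h := range hashes {
		work <- pieceWork{i, h}
	}
	for _, p := range peers {
		go worker(p, work, results)
	}
	pieces := make([][]byte, len(hashes))
	for range hashes {
		r := <-results
		pieces[r.index] = r.data
	}
	close(work) // lets idle workers exit
	return pieces
}

// hashPieces computes the expected hashes, as a torrent file would list them.
func hashPieces(pieces [][]byte) [][20]byte {
	hashes := make([][20]byte, len(pieces))
	for i, p := range pieces {
		hashes[i] = sha1.Sum(p)
	}
	return hashes
}

func main() {
	src := [][]byte{[]byte("pie"), []byte("ces"), []byte(" of"), []byte(" a "), []byte("file")}
	good := func(i int) ([]byte, error) { return src[i], nil }
	bad := func(i int) ([]byte, error) { return []byte("junk"), nil }

	pieces := download(hashPieces(src), []fetchFunc{good, bad, good})
	for _, p := range pieces {
		fmt.Print(string(p))
	}
	fmt.Println()
}
```

Nothing here ever answers a request from another peer, which is the download-only behavior being discussed in this thread.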


> It is a client/server system, but pulling from $n hosts instead of just 1, spreading out the load and also downtime risk across all $n hosts.

Well the point of peer to peer is that peers share while they download. Otherwise, if everyone only downloads, then it doesn't really get distributed. Also I meant client/server in opposition to p2p. Many non-p2p systems "on the cloud" spread the load and downtime risk across machines, that is independent from the p2p part.

> using BT to download, but then exiting without attempting to seed anything

I am not saying that you should keep sharing after it is downloaded. But typically BT clients "force" you to share while downloading.

> And the reason I mentioned the "ratio currency" is your use of the word "tolerated" here: BitTorrent isn't the mafia, it's just a protocol that offers the ability to share pieces back to the Internet.

Sure, sorry for my English, not my mother tongue :-). I know it is not the mafia, and I know it is a protocol. I used the word "tolerated" to ask whether or not, in practice, that gets you banned.


Was that worth a downvote? At least, if my question was that stupid, could you point me to the part of the blog post where it shows how pieces are sent to other peers?


One of the best posts lately. Thanks.


this is good info. i was thinking about making my own, before i discovered picotorrent. but the lack of ui libraries for go prevented me from moving forward. nowadays, with nuts and others, it might be worth a shot.


what happens when the peer is behind NAT? you need to be able to use TCP Hole Punching


Even if the peer isn't behind a NAT (or is on IPv6), you still need to get past the firewall, which won't let incoming packets through. This can be solved by simply connecting from both ends simultaneously (the first attempt fails but opens the firewall; the second attempt, in the other direction, succeeds).

You only need hole punching if both peers are behind a NAT (TCP hole punching is substantially more complex and fragile than UDP). You can reduce the need for hole punching if you complicate the signaling protocol (eg "please dial me at <addr>").

That said, bittorrent is quite resilient to NATs in practice, because since it's many-to-many, chances are even NATed clients will find many non-NATed clients.


Personally I would've preferred a Rust implementation, but nice article nonetheless.


nice cartoons! :D



