
Show HN: Magnetico – self-hosted BitTorrent DHT search engine suite - boramalper
https://github.com/boramalper/magnetico
======
the8472
Skimming over the code it seems like it's doing fairly egregious bittorrent
spec violations. In other words, it's very selfish, and other clients may
blacklist it if they detect such behavior.

~~~
boramalper
> Skimming over the code it seems like it's doing fairly egregious bittorrent
> spec violations.

It's true and even understated: _magneticod_ is not only selfish but also
incredibly aggressive. It will literally contact a thousand (and if your
network allows, even more) nodes every single second and deceive them by
forging thousands of IDs to create the impression that it is "very close" (in
terms of the XOR distance used by the Kademlia DHT) to every single one of
them.
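
To make "very close" concrete, here is a minimal sketch of the Kademlia XOR metric. It is not taken from magneticod; the helper names are made up for illustration. An ID that copies the leading bytes of another ID ends up at a tiny distance from it, because all the high-order bits cancel out.

```python
import os

ID_LEN = 20  # Mainline DHT node IDs and info hashes are 160 bits long


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: XOR the two IDs and read the result as an integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def id_sharing_prefix(target: bytes, shared_bytes: int) -> bytes:
    """Hypothetical helper: copy `shared_bytes` leading bytes, randomise the rest."""
    return target[:shared_bytes] + os.urandom(ID_LEN - shared_bytes)


victim = os.urandom(ID_LEN)
forged = id_sharing_prefix(victim, shared_bytes=15)

# The top 120 bits cancel out, so the distance fits in the remaining 40 bits.
print(xor_distance(victim, forged) < 2 ** 40)  # True
```

An honest node with a uniformly random ID would only land that close with probability around 2^-120, which is why a forged ID looks like an excellent neighbour.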

Please have a look at this website (although it is a bit old):
[https://dsn.tm.kit.edu/english/2936.php](https://dsn.tm.kit.edu/english/2936.php)

There are on average 7,000,000 DHT peers in the network, and less than a
quarter of the nodes stay longer than 12 hours in the network (i.e. the node
you contacted this morning might be long gone by the evening). There are quite
a few public papers[0] describing various Sybil _attacks_ on the BitTorrent
DHT, so it's not something unknown. BEP 42[1] tries to mitigate the issue by
making node IDs dependent on the IP address of the node, but as far as I know
not many clients (if any) implement it.
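
For readers who have not seen BEP 42, a rough sketch of the idea for IPv4 follows. It is based on my reading of the BEP and should be checked against the test vectors in the spec before being relied on. The function names are hypothetical, and the CRC-32C is implemented by hand because the Python standard library does not ship one.

```python
import os
import socket


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


IPV4_MASK = bytes([0x03, 0x0F, 0x3F, 0xFF])  # mask as I understand BEP 42


def bep42_prefix(ipv4: str, r: int) -> bytes:
    """The 21 bits of the node ID that are tied to the IP address."""
    ip = bytearray(b & m for b, m in zip(socket.inet_aton(ipv4), IPV4_MASK))
    ip[0] |= (r & 0x7) << 5
    crc = crc32c(bytes(ip))
    return bytes([(crc >> 24) & 0xFF, (crc >> 16) & 0xFF, (crc >> 8) & 0xF8])


def make_node_id(ipv4: str) -> bytes:
    """Generate a 20-byte node ID whose prefix is derived from `ipv4`."""
    rand = os.urandom(20)
    seed = rand[19]  # the last byte carries the random seed r
    prefix = bep42_prefix(ipv4, seed & 0x7)
    third = prefix[2] | (rand[2] & 0x07)  # low 3 bits stay random
    return bytes([prefix[0], prefix[1], third]) + rand[3:19] + bytes([seed])


def is_bep42_compliant(node_id: bytes, ipv4: str) -> bool:
    """Check that the first 21 bits match what the IP address dictates."""
    prefix = bep42_prefix(ipv4, node_id[19] & 0x7)
    return node_id[:2] == prefix[:2] and (node_id[2] & 0xF8) == prefix[2]
```

With such a check in place, a node on a single IP address can only choose among a handful of ID prefixes, so it cannot claim to be close to arbitrary targets.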

I have said too much already, but one last point: given the scale of the
BitTorrent DHT, I doubt these _attacks_ are disruptive; they are more
dangerous for the privacy of the peers. One can gather thousands of IP
addresses each day just as easily using the same technique. (See
[http://opentracker.blog.h3q.com/2007/02/12/perfect-deniabili...](http://opentracker.blog.h3q.com/2007/02/12/perfect-deniability/))

[0]:
[https://www.usenix.org/legacy/event/woot10/tech/full_papers/...](https://www.usenix.org/legacy/event/woot10/tech/full_papers/Wolchok.pdf)

[1]:
[http://www.bittorrent.org/beps/bep_0042.html](http://www.bittorrent.org/beps/bep_0042.html)

~~~
the8472
I'm not sure what you're trying to argue here. That it's ok to misbehave
because there are papers describing how to do it? That it's ok to be selfish
because there are enough compliant nodes to compensate for it?

~~~
boramalper
> That it's ok to be selfish because there are enough compliant nodes to
> compensate for it?

Yes. Of course, I do not have data nor the foresight to claim that its short
or long term effects are negligible, but I really doubt if it's not the case.

> That it's ok to misbehave because there are papers describing how to do it?

Not because of that, but I was pointing out that the technique is nothing new,
and probably numerous real attacks of much greater scale (with the intention
of actually disrupting the network) using the same techniques must have been
launched many times before magnetico. If BitTorrent DHT survived those, surely
it is more resilient than you tend to think.

\----

I would love to hear the opinion of a BitTorrent developer though. I'll try to
e-mail them tonight and ask for their opinion. =)

~~~
the8472
> Yes. Of course, I do not have data nor the foresight to claim that its short
> or long term effects are negligible, but I really doubt if it's not the
> case.

And what if people see your solution and try to build an end-user thing out of
it so that millions of users could run it at home?

When building something in a P2P network you always should consider whether it
would still work if the majority of the network behaved like that.

> If BitTorrent DHT survived those, surely it is more resilient than you tend
> to think.

Yes, the rain forest is also quite resilient. Nothing wrong with logging a few
more trees, right? I'm saying your kind of reasoning is responsible for
tragedy of the commons outcomes. You're not even trying to make your node a
good citizen in the network because "others do it too".

\--

Anyway, such behavior can be detected and if it becomes more common other
clients will probably implement blacklisting for it.

~~~
boramalper
> DHT indexing already is possible and done in practice by passively observing
> get_peers queries. But that approach is inefficient, favoring indexers with
> lots of unique IP addresses at their disposal. It also incentivizes bad
> behavior such as spoofing node IDs and attempting to pollute other nodes'
> routing tables.

Source: DHT Infohash Indexing <
[http://www.bittorrent.org/beps/bep_0051.html](http://www.bittorrent.org/beps/bep_0051.html)
>

Figurative speech aside, bittorrent.org acknowledges the need & the issue, and
calls it _bad_, not "_egregious_". The paper[0] mentioned in the BEP 51
proposal describes more or less how magnetico works ("horizontal attack") and
states in its conclusion that

> Through an extensive measurement study since December 2010, we have
> identified that both of these attacks are happening in the real network. We
> have analyzed their exact behavior through honeypots and have shown the
> scale of the on-going activities. We must stress that we have no concrete
> proof of actual malicious activities; our work only shows that the scale of
> attacks is large enough for this to be a concern.

Repeating for the last time (as I feel this is slowly turning into a flame war):

1\. As I have said, this technique is already in use. This means that the
network is more resilient than you claim and the chances that a few hundred
(or thousand, in future) magnetico instances will harm the network is near
zero. I could answer in a more direct way if you dropped the rain forest
metaphor and addressed the issues you see more directly.

2\. magnetico uses a _similar_ but also _quite different_ technique. The
_horizontal (Sybil) attack_ requires the offending node to maintain its
identity: "First, the attacker only has to answer two types of messages,
`PING` and `FIND_NODE`; he can ignore every other message."[0] magneticod, in
contrast, answers only `GET_PEERS` and `ANNOUNCE_PEER` queries, which are
_essential_ for the remote node to register itself as a peer downloading the
torrent. magneticod will send hundreds of `FIND_NODE` queries per second[1]
using forged node IDs, _and that's all_! According to the BitTorrent DHT
protocol[2], "after 15 minutes of inactivity, a node becomes questionable."
Hence, presumably, magneticod should be forgotten and removed from the routing
table once it fails to answer several `PING` queries. Also consider that the
`FIND_NODE` queries magnetico makes target totally random nodes every single
time, so (statistically speaking) the same set of nodes will practically never
be targeted twice.
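
For reference, this is roughly what such a `FIND_NODE` query looks like on the wire. It is a minimal sketch with a hand-rolled bencoder, not magneticod's actual code; `find_node_query` is a made-up name, and the message layout follows the KRPC format in BEP 5.

```python
import os


def bencode(value) -> bytes:
    """Minimal bencoder for ints, byte strings, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = sorted(value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode " + type(value).__name__)


def find_node_query(sender_id: bytes, target: bytes, tid: bytes = b"aa") -> bytes:
    """Build a KRPC find_node query (hypothetical helper)."""
    return bencode({
        b"t": tid,            # transaction ID, echoed back in the reply
        b"y": b"q",           # message type: query
        b"q": b"find_node",
        b"a": {b"id": sender_id, b"target": target},
    })


# A crawler in the style described above would pick a fresh random sender ID
# and a random target for every packet it sends.
packet = find_node_query(os.urandom(20), os.urandom(20))
```

Since the sender ID changes with every packet, the receiving node has no stable identity to ping later, which is the point made about being dropped from routing tables.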

The problem is, BEP 51 is still a _draft_ and AFAIK no client in the wild
implements it, so it's simply impractical to write a DHT search engine
depending on this particular behaviour. Once it becomes more widespread
(hopefully together with BEP 42), magnetico will (and has to) drop its current
technique and support the new and proper one.
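
For comparison, the BEP 51 approach would look roughly like the sketch below: the indexer asks a node for a sample of the info hashes it stores instead of tricking peers into announcing to it. The field names (`sample_infohashes`, `samples`, `interval`, `num`) come from the BEP; the function names are hypothetical, and the messages are shown as already-decoded dictionaries rather than bencoded bytes.

```python
from typing import List

HASH_LEN = 20  # info hashes are 20-byte SHA-1 digests


def sample_infohashes_query(own_id: bytes, target: bytes, tid: bytes = b"aa") -> dict:
    """Build the (not yet bencoded) KRPC query described in BEP 51."""
    return {
        b"t": tid,
        b"y": b"q",
        b"q": b"sample_infohashes",
        b"a": {b"id": own_id, b"target": target},
    }


def split_samples(samples: bytes) -> List[bytes]:
    """Split the concatenated `samples` string into individual info hashes."""
    if len(samples) % HASH_LEN:
        raise ValueError("samples length is not a multiple of 20")
    return [samples[i:i + HASH_LEN] for i in range(0, len(samples), HASH_LEN)]


def handle_response(response: dict) -> List[bytes]:
    """Extract info hashes from a decoded sample_infohashes response."""
    body = response[b"r"]
    # `interval` tells the indexer how long to wait before asking this node
    # again; a polite crawler would honour it. It is ignored in this sketch.
    return split_samples(body.get(b"samples", b""))
```

Here the indexer keeps a single honest node ID and the remote node decides how much to reveal and how often.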

[0]:
[https://www.cl.cam.ac.uk/~lw525/publications/security.pdf](https://www.cl.cam.ac.uk/~lw525/publications/security.pdf)

[1]:
[https://github.com/boramalper/magnetico/blob/257912a12f862d9...](https://github.com/boramalper/magnetico/blob/257912a12f862d9f4ca3c5970df57f3a62d29c66/magneticod/magneticod/dht.py#L225)

[2]:
[http://www.bittorrent.org/beps/bep_0005.html](http://www.bittorrent.org/beps/bep_0005.html)

~~~
the8472
> bittorrent.org acknowledges the need & the issue

It does not acknowledge or imply that what you are doing (forged replies) is
necessary. It only acknowledges that indexing happens in the wild and that,
when it happens, nodes are currently _incentivized_ to misbehave. That does
not mean they _need_ to misbehave to achieve their goals.

> and calls it bad, not "egregious".

That's nitpicking over words; a spec is obviously written in a more
conservative tone. Your implementation is far outside the spec, intentionally
so, in order to manipulate the behavior of other clients.

> This means that the network is more resilient than you claim and the chances
> that a few hundred (or thousand, in future) magnetico instances will harm
> the network is near zero.

I have not denied that the network has some resilience or claimed that a
single instance of this code running will destroy the network. I am saying
that your code is misbehaving and greedy, which makes it a bad citizen in a
p2p environment.

> BEP 51 is still a draft

Hardly relevant; many widely deployed BEPs are in draft status.

> AFAIK no client in the wild implements it

That's not entirely true, but even assuming that it is, it is not relevant
either, because BEP 51 is not the only way to be a good citizen while
indexing the DHT.

------
wut42
Nice project! Last year I made a similar PoC in Elixir based on the paper
"Crawling DHTs for fun and profit"[1].

In that paper the researcher uses a pool of IPs to introduce multiple nodes
(to cover the hash space). In my PoC I used fewer IPs but multiple ports per
IP. How does this project manage to cover that?

[1]:
[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.172....](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.172.8202)

~~~
boramalper
I too read that paper, but magnetico crawls Mainline DHT (which is the _de
facto_ standard) whilst the paper is on Vuze DHT, though the idea is more or
less the same.

This project runs only _one_ crawler instance at a time, so it uses one IP
and one port for crawling the DHT. Whenever it receives a `GET_PEERS` query
from another peer in the network, it forges itself a new ID that has the
first 15 bytes in common with the info hash in the query, so that the querying
node thinks it has found the closest (or a close enough) node to announce to.
After it responds, the same querying node will most of the time send an
`ANNOUNCE_PEER` query to register itself as a peer downloading that particular
torrent. Upon receiving the `ANNOUNCE_PEER` query, magneticod connects to the
peer using the BitTorrent (TCP) protocol, receives the complete metadata, and
closes the connection.

Keep repeating these and you'll end up with thousands of torrents at the end
of each day.
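
The loop described above can be sketched as follows. This is an illustration only, not magneticod's code: the handler names are invented, messages are shown as decoded KRPC dictionaries, and the token is a random placeholder rather than anything a compliant node would issue.

```python
import os
from typing import Tuple


def forge_id_for(info_hash: bytes) -> bytes:
    """Share the first 15 bytes with the info hash, randomise the last 5."""
    return info_hash[:15] + os.urandom(5)


def on_get_peers(query: dict) -> dict:
    """Reply as if we were one of the closest nodes to the requested info hash."""
    info_hash = query[b"a"][b"info_hash"]
    return {
        b"t": query[b"t"],  # echo the transaction ID
        b"y": b"r",
        b"r": {
            b"id": forge_id_for(info_hash),
            b"token": os.urandom(4),  # placeholder token (assumption)
            b"nodes": b"",            # no closer nodes offered
        },
    }


def on_announce_peer(query: dict, sender: Tuple[str, int]) -> Tuple[bytes, Tuple[str, int]]:
    """Return the info hash and the peer address to fetch the metadata from."""
    args = query[b"a"]
    # BEP 5: with a non-zero implied_port, use the UDP source port instead.
    port = sender[1] if args.get(b"implied_port") else args[b"port"]
    return args[b"info_hash"], (sender[0], port)
```

Fetching the metadata from the returned address is a separate step over TCP and is left out here.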

Possibly, utilizing multiple IPs might increase the throughput and prevent IP
based bans, but I didn't bother to be honest, as the current solution seems to
work well enough.

------
NoGravitas
A couple of thoughts on this.

1\. I assume it is "dangerous" to run this on an IP address that is easily
traceable back to you, since it claims, to various peers, to be downloading
all the things, even though it doesn't. Is this correct?

2\. A site, skytorrents.in, was posted to Show HN a while ago, that appears to
be similar to a centrally hosted version of this, in that it builds its index
of magnet links by trawling the DHT rather than relying on user submissions.

~~~
c3833174
1\. I'd expect any decent logging system to actually attempt transferring a
file before marking an IP.

2\. BTDigg is/was the best known one; dozens of others followed.

~~~
number6
I expect the content mafia just to sue whoever they can

------
barrystaes
It's not really decentralized. Readme says:

"_magnetico liberates BitTorrent from the yoke of centralised trackers &
web-sites and makes it truly decentralised. Finally!_"

Source says: (
[https://github.com/boramalper/magnetico/blob/42b2d64d200e2af...](https://github.com/boramalper/magnetico/blob/42b2d64d200e2aff106eb0ce4241d7bcc411ceb5/magneticod/magneticod/dht.py#L31-L34)
)

    
    
        BOOTSTRAPPING_NODES = [
            ("router.bittorrent.com", 6881),
            ("dht.transmissionbt.com", 6881)
        ]
    

Also, I think it's going to be very difficult to find anything in the DHT
(spam) chaos without some of the pre-filtering that exists by using torrent
sites. Fun project though.

~~~
setra
> It's not really decentralized

Every DHT has to be bootstrapped from something. If that makes it not
distributed, then I'm not sure anything is. You can use whatever bootstrap
nodes you want by changing the values.

~~~
barrystaes
Readme says "truly decentralised", which is the point, not "not distributed".
It certainly is distributed, just not truly decentralised. It requires
bootstrapping with hardcoded domain names. Bootstrapping is not bad (when done
securely), but the false claim is.

~~~
the8472
DHT bootstrapping is an implementation detail, not part of the protocol. You
don't need DNS or any dedicated bootstrapping mechanism since _any_ node in
the network can be used to join it. You could call your friend and ask for the
IP and port of his bittorrent client.

So the protocol is fully decentralized. Implementations tend to use
bootstrapping mechanisms that rely on an expandable, customizable list of
known bootstrap nodes, but this is not essential.
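
As an illustration of such an expandable list, here is a small sketch. The two default entries are the ones quoted from the source above; the function names and the `host:port` string format are assumptions, not magnetico's actual configuration interface.

```python
from typing import List, Tuple

Node = Tuple[str, int]

DEFAULT_BOOTSTRAP: List[Node] = [
    ("router.bittorrent.com", 6881),
    ("dht.transmissionbt.com", 6881),
]


def parse_node(text: str) -> Node:
    """Parse 'host:port' into a (host, port) pair (hypothetical format)."""
    host, _, port = text.rpartition(":")
    if not host or not port.isdigit() or not 0 < int(port) < 65536:
        raise ValueError("expected host:port, got " + repr(text))
    return host, int(port)


def bootstrap_nodes(extra: List[str], use_defaults: bool = True) -> List[Node]:
    """User-supplied nodes first, then (optionally) the well-known routers."""
    nodes = [parse_node(t) for t in extra]
    if use_defaults:
        nodes += DEFAULT_BOOTSTRAP
    return nodes


# The "call your friend" case: one known peer, no DNS-based routers at all.
print(bootstrap_nodes(["192.0.2.10:51413"], use_defaults=False))
# [('192.0.2.10', 51413)]
```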

------
tscs37
This looks rather nice, though two things seem to be missing:

* A docker-compose file for easy testing
* Screenshots of the WebUI

~~~
boramalper
I have no experience with Docker, so I couldn't (and still can't) create a
docker-compose file, but the installation is pretty straightforward:

    
    
        pip3 install magneticod magneticow --user
    

Also, I added some screenshots, thanks for the suggestion! =)

~~~
helb
"Dockerizing" Python apps is pretty straightforward, too. This might help if
you ever decide to try it:
[https://runnable.com/docker/python/dockerize-your-python-app...](https://runnable.com/docker/python/dockerize-your-python-application)

