
Kademlia: A Peer-To-peer Information System Based on the XOR Metric (2002) [pdf] - liamzebedee
https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf
======
bcaa7f3a8bbc
I'm familiar to this paper because I was using eMule. eMule has a direct
citation to this paper in its "help" window, otherwise, the decentralization
of eD2k would not be possible. The protocol is old, but still impressive.

The year 2000 is special, within the same year, Gnutella, eDonkey and Freenet
were all released its initial version, as if the P2P network "emerged" from
the Internet. By 2004, eMule, I2P and BitTorrent (and BTW, Tor) were all here.
At the very beginning, the first generation of P2P, Gnutella was suffering
from its scalability problem - its simple unstructured, flooding protocol
needs a number of O(n) queries, and becomes unstable once the users exceeded
millions.

Then, Kademlia almost saved P2P. BitTorrent, I2P, eD2k and many other P2P
protocols are all Kad-based. For example, in eD2k, Kad allowed it to have P2P
resource discovery, and also an efficient decentralized keyboard search engine
(although there is spam, but mostly works, and in BitTorrent we still don't
have it), all with only a O(log(n)) query, The P2P mechanism of I2P is also
exclusively based on Kad.

The authors of this paper are the few people who have made great contributions
to the Internet.

------
nostrademons
I love how people are rediscovering the previous generation of P2P software
from 15-20 years ago. Reading through these papers was a huge part of my
college procrastination; maybe now that cryptocurrencies give a way to provide
economic incentives to those who contribute computing power, we might see a
renaissance of some of these older ideas.

~~~
gst
Kademlia is still in widespread use today. For example Bittorrent uses it to
resolve magnet links.

~~~
sliken
Quite handy for any program to find a copy of itself running anywhere on the
internet. It's the best solution I've found for peer discovery that doesn't
depend on a central host.

There's a few implementations for most platforms. At least Java, Javascript,
C++, C, and Go anyways.

~~~
voltagex_
Doesn't Kad require an initial bootstrap node? How do you find it?

~~~
adrusi
Yes but that doesn't require a central host. If your friend is connected to
the network you can join it via them.

~~~
voltagex_
If I've just downloaded SomeCoolNewTool that's using the network, are there
bootstrap hosts pre-loaded? Who's my "friend" in that case? Are there trust
issues here?

~~~
cf
In practice, SomeCoolNewTool will be bundled with some seed hosts.

------
jMyles
Kademlia is awesome and is, of course, the basis of libp2p-kad-dht and the
DevP2P project that many decentralized projects are using at the moment.

At NuCypher, we found that, all the way to the core of Kademlia, there were
some "inter-planetary" assumptions that, while both mindblowing and fit for
many use-cases, didn't fit our "intra-planetary" distributed notion of network
topology.

If Kademlia interests you, you might also be interested in this overview of
what we did and why:

[https://blog.nucypher.com/why-were-rolling-our-own-intra-
pla...](https://blog.nucypher.com/why-were-rolling-our-own-intra-planetary-
node-discovery-at-nucypher-beeb53018b0)

~~~
mohitmun
Huge fan of Nucypher. Recently At one hackathon, I used your proxy re-
encryption to build decentralised KYC system. Amazing work you guys are doing!

------
jondubois
A lot of cryptocurrency projects are using Kademlia but it's prone to
vulnerabilities such as eclipse attacks. Because peer connections are made
according to deterministic rules (e.g. often based on unique Node IDs), the
topology of the network can be analyzed and then the information can be
exploited to disrupt the network.

There are solutions which involve making it hard to generate IDs (I've seen it
done with node IDs generated using hash-based proof of work) but nowadays with
all the cheap ASIC miners, you could compute these hashes much faster than any
regular computer could; this means that node IDs need to be generated using
top-of-the-range ASICs in order to make the network truly secure; this adds a
lot of complexity when it comes to deploying a node. I don't believe that any
of the existing Kademlia-based cryptocurrencies currently require sufficiently
complex node IDs.

------
tombert
I wrote a video-sharing thing during a hackathon based on Kademlia. It's one
of my favorite algorithms and I found the paper to be easier to read than most
papers based on algorithms. Here's a video of a talk I gave on it:

[https://www.youtube.com/watch?v=byIFT8OzfX0](https://www.youtube.com/watch?v=byIFT8OzfX0)

------
mohitmun
Recently I figured Kademlia is how bittorrent achieves trackerless torrents. I
used bittorrent-dht[0][1] to build simple decentralised ride hailing app(was
able implement POC with end to end booking). I wonder what other modern
problems can be solved by this technology

[0]: [https://github.com/webtorrent/bittorrent-
dht](https://github.com/webtorrent/bittorrent-dht)

[1]:
[http://www.bittorrent.org/beps/bep_0005.html](http://www.bittorrent.org/beps/bep_0005.html)

------
woadwarrior01
Over a decade ago, I was a part of a small team at a startup that built a p2p
messenger called cSpace[1], I fondly remember reading the paper and
implementing a Kademlia DHT for it. IIRC, the bittorrent DHT was the only open
source Python implementation of it that we could find, it was using Twisted
for networking and quite complicated, we decided to keep things simple and
rewrite it ourselves. The resulting implementation[2] is quite simple and
elegant, IMO.

[1]: [https://github.com/jmcvetta/cspace](https://github.com/jmcvetta/cspace)
(The only copy of it, I could find online) [2]:
[https://github.com/jmcvetta/cspace/tree/master/cspace/dht](https://github.com/jmcvetta/cspace/tree/master/cspace/dht)

------
loeg
(2007), if not earlier, FWIW. I used this protocol in a college final project
("make a botnet") years ago :-).

~~~
quickthrower2
2002 according to
[https://en.wikipedia.org/wiki/Kademlia](https://en.wikipedia.org/wiki/Kademlia)
\- and this is a nice article for the layman (who still understands basic CS)
to understand it.

I'm a bit dubious about where the name "Kademlia" came from :-)

~~~
loeg
> Where does the word “kademlia” come from? What I know is—it is a Turkish
> word for a “lucky man” and, more importantly, is the name of a mountain peak
> in Bulgaria.

[http://www.maymounkov.org/kademlia](http://www.maymounkov.org/kademlia)

------
cf
Interestingly, Kademlia ended up being a major component in lots of botnets
for a time like
[https://en.m.wikipedia.org/wiki/Storm_botnet](https://en.m.wikipedia.org/wiki/Storm_botnet)

------
miguelrochefort
Has any progress in the P2P a been made since Kademlia?

I've been searching for a while, for a DHT (actually a distributed graph
database) that self optimizes based on usage. Basically, I want data that's
frequently accessed together to get close to eachother and to nodes that
frequently access it.

~~~
xj9
in addition to a DHT, you need another layer that can map key names to their
hashes. ideally this is also something you can store in DHT (using ipns /
mutable keys). once you have a (distributed) key-value store, you can build
pretty much any kind of database[1]. this is known to work well for hadoop-
style workloads[2] and i'd expect you could get similar performance for a p2p
graph db.

[1]: [https://apple.github.io/foundationdb/layer-
concept.html](https://apple.github.io/foundationdb/layer-concept.html)

[2]:
[https://news.ycombinator.com/item?id=15543596](https://news.ycombinator.com/item?id=15543596)

------
new12345
I don't understand inspite of having working p2p protocol why we don't yet
have an good p2p alternative to dropbox that just works without any setup
hassles. Closest working thing I know of is BitSync but that too requires an
mediator server.

~~~
yholio
Because Dropbox is a file storage service, not a file distribution system. The
typical use case is that an uploaded large file will be downloaded a single
time by someone else, at different times when computers are online.

This requires an intermediary system with very high availability and
performance, which in P2P can only be implemented by duplicating data on a
dozen or so systems.

Since files must be private and there is no commonality like in BitTorrent, it
follows that running a P2P Dropbox would consume 5-10 times the storage and
bandwidth of it's centralized counterpart for the same useful service.

------
dmourati
Attended a "Papers we Love" presentation on this paper last year.

