Hacker News new | past | comments | ask | show | jobs | submit login
Kademlia: A Peer-To-peer Information System Based on the XOR Metric (2002) [pdf] (mit.edu)
191 points by liamzebedee 3 months ago | hide | past | web | favorite | 40 comments

I'm familiar to this paper because I was using eMule. eMule has a direct citation to this paper in its "help" window, otherwise, the decentralization of eD2k would not be possible. The protocol is old, but still impressive.

The year 2000 is special, within the same year, Gnutella, eDonkey and Freenet were all released its initial version, as if the P2P network "emerged" from the Internet. By 2004, eMule, I2P and BitTorrent (and BTW, Tor) were all here. At the very beginning, the first generation of P2P, Gnutella was suffering from its scalability problem - its simple unstructured, flooding protocol needs a number of O(n) queries, and becomes unstable once the users exceeded millions.

Then, Kademlia almost saved P2P. BitTorrent, I2P, eD2k and many other P2P protocols are all Kad-based. For example, in eD2k, Kad allowed it to have P2P resource discovery, and also an efficient decentralized keyboard search engine (although there is spam, but mostly works, and in BitTorrent we still don't have it), all with only a O(log(n)) query, The P2P mechanism of I2P is also exclusively based on Kad.

The authors of this paper are the few people who have made great contributions to the Internet.

I love how people are rediscovering the previous generation of P2P software from 15-20 years ago. Reading through these papers was a huge part of my college procrastination; maybe now that cryptocurrencies give a way to provide economic incentives to those who contribute computing power, we might see a renaissance of some of these older ideas.

Interestingly, the original Kademlia paper did not make it into any of the big p2p conferences at the time. It was published at some 2nd tier p2p workshop. It was only when the system gained actual adoption, that people referenced the paper.

In contrast, Chord, which was published 1 year earlier in 2001 is one of the most cited papers in computer science. However, Chord was not practical. It assumed the availability of bi-directional connections between any two hosts. However, NATs had just started to appear, which broke Chord. Kademlia, which does not require peers to maintain a well-defined routing structure (the ring in Chord) can muddle through with NATs - because the routing table is a huge messy tree and if any branches lead you to the destination, it will kind of work.

Being the engine behind trackerless BitTorrent gave it quite some exposure too. I do not know if it was before or after it became known in academia.

My point was that it became the engine behind trackerless BT because it worked in the presence of NATs - even if other architectures, like Chord, had superior properties (if you assume an unrealistic model of the open internet).

The other point of note is that Kademlia was not designed to work in the presence of NATs. That was a feature whose value was only appreciated when ISPs ran out of open IP addresses and NAT'd up whole networks. It shows the value of exploratory, basic research.

It is a key part of the implementation of Holochain (https://holo.host).

It turns out that global consensus is not required to maintain accounting balance in a global-scale ledger. DHTs and their validation of the data they host is a big part of maintaining the invariant of the system.

I have been excited about this project for quite some time. They recently released their Rust client, although without networking. I cannot wait to play with it further as it holds the most promise amongst all "Web3" decentalized platforms.

Kademlia is still in widespread use today. For example Bittorrent uses it to resolve magnet links.

IPFS is using kademlia, and iirc zeronet are still working on it. I don't think there currently are many alternatives actually being used. Haven't seen many projects using chord or pastry these days.

Ethereum also uses Kademlia for its DevP2P mechanism. We recently implemented this in Exthereum/Mana.

Quite handy for any program to find a copy of itself running anywhere on the internet. It's the best solution I've found for peer discovery that doesn't depend on a central host.

There's a few implementations for most platforms. At least Java, Javascript, C++, C, and Go anyways.

Doesn't Kad require an initial bootstrap node? How do you find it?

Generally if you downloaded it from somewhere/someone you can get a working node from them. Often binaries have a few boostrap hosts hardwired, but then save state once launched. Typically a host tracks several hosts per bucket, and a bucket for each log(n) of the population. So with a million hosts your client will track log(million)*4 or similar. Those are usually written to disk when you close, so you can try those when you launch next.

Clients often broadcast on the local lan to find peers that way.

Also other methods are common, for instance bittorrent clients often use the PEX standard to exchange peers, those peers are often on the same DHT, although I think there might be 2 distinct DHT networks among the popular bittorrent clients.

Worst case you could start randomly walking the IPv4 space, given that there's millions of active clients it wouldn't be long before you found one. Likely only take you a few 1000 random IPs before you found someone on the DHT.

Yes but that doesn't require a central host. If your friend is connected to the network you can join it via them.

If I've just downloaded SomeCoolNewTool that's using the network, are there bootstrap hosts pre-loaded? Who's my "friend" in that case? Are there trust issues here?

In practice, SomeCoolNewTool will be bundled with some seed hosts.

I still miss the first Napster and later the OpenNap network with its OSS clients such as Lopster. Way flawed and not scalable at all compared to BitTorrent, but it allowed 1 to 1 connections between people, so that one day I could browse a random guy shared hard drive because he had some interesting music, see a band named "Spock's Beard" and being a trekkie myself, albeit a moderate one, I simply had to hear what they sounded like, and after an eternity due to then slow connections, thanks to other bands discovered the same way, I realized that progressive rock was (and still is) even more alive today than in the 70s, although obviously not mainstream. So a huge thankyou to the old flawed and slow Napster as well.

Kademlia is also an interesting option for a small and resilient key-value store between a handful of hosts.

DHT: Distributed Hash Table


well, In theory you can store (and fetch) the data directly, rather than just using it as a routing table.

Of course, that's why I wrote "small key-value store"...

Kademlia is awesome and is, of course, the basis of libp2p-kad-dht and the DevP2P project that many decentralized projects are using at the moment.

At NuCypher, we found that, all the way to the core of Kademlia, there were some "inter-planetary" assumptions that, while both mindblowing and fit for many use-cases, didn't fit our "intra-planetary" distributed notion of network topology.

If Kademlia interests you, you might also be interested in this overview of what we did and why:


Huge fan of Nucypher. Recently At one hackathon, I used your proxy re-encryption to build decentralised KYC system. Amazing work you guys are doing!

A lot of cryptocurrency projects are using Kademlia but it's prone to vulnerabilities such as eclipse attacks. Because peer connections are made according to deterministic rules (e.g. often based on unique Node IDs), the topology of the network can be analyzed and then the information can be exploited to disrupt the network.

There are solutions which involve making it hard to generate IDs (I've seen it done with node IDs generated using hash-based proof of work) but nowadays with all the cheap ASIC miners, you could compute these hashes much faster than any regular computer could; this means that node IDs need to be generated using top-of-the-range ASICs in order to make the network truly secure; this adds a lot of complexity when it comes to deploying a node. I don't believe that any of the existing Kademlia-based cryptocurrencies currently require sufficiently complex node IDs.

I wrote a video-sharing thing during a hackathon based on Kademlia. It's one of my favorite algorithms and I found the paper to be easier to read than most papers based on algorithms. Here's a video of a talk I gave on it:


Recently I figured Kademlia is how bittorrent achieves trackerless torrents. I used bittorrent-dht[0][1] to build simple decentralised ride hailing app(was able implement POC with end to end booking). I wonder what other modern problems can be solved by this technology

[0]: https://github.com/webtorrent/bittorrent-dht

[1]: http://www.bittorrent.org/beps/bep_0005.html

Over a decade ago, I was a part of a small team at a startup that built a p2p messenger called cSpace[1], I fondly remember reading the paper and implementing a Kademlia DHT for it. IIRC, the bittorrent DHT was the only open source Python implementation of it that we could find, it was using Twisted for networking and quite complicated, we decided to keep things simple and rewrite it ourselves. The resulting implementation[2] is quite simple and elegant, IMO.

[1]: https://github.com/jmcvetta/cspace (The only copy of it, I could find online) [2]: https://github.com/jmcvetta/cspace/tree/master/cspace/dht

(2007), if not earlier, FWIW. I used this protocol in a college final project ("make a botnet") years ago :-).

2002 according to https://en.wikipedia.org/wiki/Kademlia - and this is a nice article for the layman (who still understands basic CS) to understand it.

I'm a bit dubious about where the name "Kademlia" came from :-)

> Where does the word “kademlia” come from? What I know is—it is a Turkish word for a “lucky man” and, more importantly, is the name of a mountain peak in Bulgaria.


Interestingly, Kademlia ended up being a major component in lots of botnets for a time like https://en.m.wikipedia.org/wiki/Storm_botnet

Has any progress in the P2P a been made since Kademlia?

I've been searching for a while, for a DHT (actually a distributed graph database) that self optimizes based on usage. Basically, I want data that's frequently accessed together to get close to eachother and to nodes that frequently access it.

in addition to a DHT, you need another layer that can map key names to their hashes. ideally this is also something you can store in DHT (using ipns / mutable keys). once you have a (distributed) key-value store, you can build pretty much any kind of database[1]. this is known to work well for hadoop-style workloads[2] and i'd expect you could get similar performance for a p2p graph db.

[1]: https://apple.github.io/foundationdb/layer-concept.html

[2]: https://news.ycombinator.com/item?id=15543596

I don't understand inspite of having working p2p protocol why we don't yet have an good p2p alternative to dropbox that just works without any setup hassles. Closest working thing I know of is BitSync but that too requires an mediator server.

Because Dropbox is a file storage service, not a file distribution system. The typical use case is that an uploaded large file will be downloaded a single time by someone else, at different times when computers are online.

This requires an intermediary system with very high availability and performance, which in P2P can only be implemented by duplicating data on a dozen or so systems.

Since files must be private and there is no commonality like in BitTorrent, it follows that running a P2P Dropbox would consume 5-10 times the storage and bandwidth of it's centralized counterpart for the same useful service.

IMO. The p2p business model is problematic in this case.

1 People need to incentives to join a p2p network and certainly to contribute storage+bandwidth

1* Reliably and privacy issues makes this a bigger problem (though p2p may also be the solution, it's not easily achievable)

2 This can't easily happen without investment. Marketing + setup without "hassles" is not cheap

2* Given investment, what are the assets of the investors?

I'm out of the loop for a few years now, but isn't NAT hole punching still one of the biggest problems? AFAIK, there is still no reliable solution that works without an accessible 3rd party server.

Ipfs doesn't take much setup, but it's global sharing - so you'll have to bring your own access control (eg: encrypt the files, and use à key distribution/sharing scheme of some sort).

Attended a "Papers we Love" presentation on this paper last year.

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact