The year 2000 is special, within the same year, Gnutella, eDonkey and Freenet were all released its initial version, as if the P2P network "emerged" from the Internet. By 2004, eMule, I2P and BitTorrent (and BTW, Tor) were all here. At the very beginning, the first generation of P2P, Gnutella was suffering from its scalability problem - its simple unstructured, flooding protocol needs a number of O(n) queries, and becomes unstable once the users exceeded millions.
Then, Kademlia almost saved P2P. BitTorrent, I2P, eD2k and many other P2P protocols are all Kad-based. For example, in eD2k, Kad allowed it to have P2P resource discovery, and also an efficient decentralized keyboard search engine (although there is spam, but mostly works, and in BitTorrent we still don't have it), all with only a O(log(n)) query, The P2P mechanism of I2P is also exclusively based on Kad.
The authors of this paper are the few people who have made great contributions to the Internet.
In contrast, Chord, which was published 1 year earlier in 2001 is one of the most cited papers in computer science. However, Chord was not practical. It assumed the availability of bi-directional connections between any two hosts. However, NATs had just started to appear, which broke Chord. Kademlia, which does not require peers to maintain a well-defined routing structure (the ring in Chord) can muddle through with NATs - because the routing table is a huge messy tree and if any branches lead you to the destination, it will kind of work.
The other point of note is that Kademlia was not designed to work in the presence of NATs. That was a feature whose value was only appreciated when ISPs ran out of open IP addresses and NAT'd up whole networks. It shows the value of exploratory, basic research.
It turns out that global consensus is not required to maintain accounting balance in a global-scale ledger. DHTs and their validation of the data they host is a big part of maintaining the invariant of the system.
Clients often broadcast on the local lan to find peers that way.
Also other methods are common, for instance bittorrent clients often use the PEX standard to exchange peers, those peers are often on the same DHT, although I think there might be 2 distinct DHT networks among the popular bittorrent clients.
Worst case you could start randomly walking the IPv4 space, given that there's millions of active clients it wouldn't be long before you found one. Likely only take you a few 1000 random IPs before you found someone on the DHT.
At NuCypher, we found that, all the way to the core of Kademlia, there were some "inter-planetary" assumptions that, while both mindblowing and fit for many use-cases, didn't fit our "intra-planetary" distributed notion of network topology.
If Kademlia interests you, you might also be interested in this overview of what we did and why:
There are solutions which involve making it hard to generate IDs (I've seen it done with node IDs generated using hash-based proof of work) but nowadays with all the cheap ASIC miners, you could compute these hashes much faster than any regular computer could; this means that node IDs need to be generated using top-of-the-range ASICs in order to make the network truly secure; this adds a lot of complexity when it comes to deploying a node. I don't believe that any of the existing Kademlia-based cryptocurrencies currently require sufficiently complex node IDs.
: https://github.com/jmcvetta/cspace (The only copy of it, I could find online)
I'm a bit dubious about where the name "Kademlia" came from :-)
I've been searching for a while, for a DHT (actually a distributed graph database) that self optimizes based on usage. Basically, I want data that's frequently accessed together to get close to eachother and to nodes that frequently access it.
This requires an intermediary system with very high availability and performance, which in P2P can only be implemented by duplicating data on a dozen or so systems.
Since files must be private and there is no commonality like in BitTorrent, it follows that running a P2P Dropbox would consume 5-10 times the storage and bandwidth of it's centralized counterpart for the same useful service.
1 People need to incentives to join a p2p network and certainly to contribute storage+bandwidth
1* Reliably and privacy issues makes this a bigger problem (though p2p may also be the solution, it's not easily achievable)
2 This can't easily happen without investment. Marketing + setup without "hassles" is not cheap
2* Given investment, what are the assets of the investors?