Show HN: Nebula – A network agnostic DHT crawler (github.com/dennis-tra)
68 points by dennis-tra 8 months ago | 22 comments



I'm sure this is just because I'm not the target audience, so I intend only the very gentlest criticism. But I literally LOLed at how completely incomprehensible this README was for me. It has really been a while since I've read a paragraph and had literally no idea what it was talking about. But here's the winner:

> A network agnostic DHT crawler and monitor. The crawler connects to DHT bootstrappers and then recursively follows all entries in their k-buckets until all peers have been visited.

Following the Wikipedia link for "DHT" yielded some clues. (Ah. Distributed hash table.) But I've still been looking at this for several minutes now and am basically just puzzled. But the graphs are pretty! Reading the word "amino" a little further down threw me off the scent for a bit. But I gather that is actually a proper noun, and we aren't really talking about proteins here.

Maybe an initial sentence that makes fewer assumptions about the reader's familiarity with the jargon would be helpful.


A DHT is a decentralized key-value database, its most famous use being in the BitTorrent protocols. It uses a routing algorithm to guarantee that you can find the peers that can retrieve the value of a known key, provided you know at least one peer in the network (even if that peer doesn't know the value). Essentially, the network is split into buckets, and the algorithm guarantees that you'll either already be connected to a peer that knows the value for the key, or that peer will know a peer whose bucket is closer to the key. You can then recursively ask for peers that are closer and closer until you find one that knows the key, and as you do this search you keep track of the peers you meet, so the next time you look up another key you're more likely to already know a peer close to it. A typical DHT implementation has you keep track of hundreds of peers to guarantee the robustness of the network.
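
To make the "closer to the key" idea concrete, here's a toy Go sketch of Kademlia's XOR distance metric and the "sort the peers you know by distance to the target" step. The ID type and the tiny 4-bit values are invented purely for illustration and have nothing to do with this project's code:

    package main

    import (
        "fmt"
        "math/big"
        "sort"
    )

    // ID is a node or key identifier; real DHTs use 160- or 256-bit hashes.
    type ID = *big.Int

    // xorDistance is the Kademlia distance metric: d(a, b) = a XOR b.
    func xorDistance(a, b ID) ID {
        return new(big.Int).Xor(a, b)
    }

    // closest returns the known peers sorted by XOR distance to the target key,
    // which is the peer-selection step of the recursive lookup described above.
    func closest(peers []ID, target ID) []ID {
        sorted := append([]ID(nil), peers...)
        sort.Slice(sorted, func(i, j int) bool {
            return xorDistance(sorted[i], target).Cmp(xorDistance(sorted[j], target)) < 0
        })
        return sorted
    }

    func main() {
        peers := []ID{big.NewInt(0b0110), big.NewInt(0b1100), big.NewInt(0b0011)}
        target := big.NewInt(0b0111)
        for _, p := range closest(peers, target) {
            fmt.Printf("peer %04b distance %04b\n", p, xorDistance(p, target))
        }
    }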

One issue is that peers go offline and online all the time, so the network is ever-changing. If you turn off your client for a week and then come back, your only hope is that at least one of the peers you know is still online. If that's the case, fine; if not, or if you're starting the client for the first time, there's no way for you to connect to the network and query for keys. In BitTorrent this is not an issue, as most torrents include trackers, the original centralized way of finding peers on the network. But it seems that each project listed on this page has its own separate DHT network that doesn't connect to the main network (the one used by BitTorrent), so to connect to these networks for the first time you need a bootstrap peer. That is just a normal peer on the network that is known to be always online, usually hosted by the owner of the project, and it gives you a starting point for finding other peers in the network.

What this project does, in essence, is connect to a bootstrap peer and then use the properties of the routing algorithm to efficiently find all the peers that are currently online.
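
So the crawl itself is essentially a breadth-first traversal of routing tables. Here's a self-contained Go sketch of that idea; the Node interface and the in-memory fake network are made up for the example, and Nebula's actual implementation talks to peers over libp2p and is far more elaborate:

    package main

    import "fmt"

    // Node abstracts "a peer we can ask for the contents of its k-buckets".
    // A real crawler would issue network requests here instead.
    type Node interface {
        ID() string
        RoutingTable() []Node // peers found in this node's k-buckets
    }

    // crawl starts at the bootstrap peers and keeps asking every newly
    // discovered peer for its routing table until no unvisited peers remain.
    func crawl(bootstrap []Node) map[string]bool {
        visited := make(map[string]bool)
        queue := append([]Node(nil), bootstrap...)
        for len(queue) > 0 {
            n := queue[0]
            queue = queue[1:]
            if visited[n.ID()] {
                continue
            }
            visited[n.ID()] = true
            for _, neighbor := range n.RoutingTable() {
                if !visited[neighbor.ID()] {
                    queue = append(queue, neighbor)
                }
            }
        }
        return visited
    }

    // memNode is a trivial in-memory stand-in so the example runs on its own.
    type memNode struct {
        id        string
        neighbors []*memNode
    }

    func (m *memNode) ID() string { return m.id }

    func (m *memNode) RoutingTable() []Node {
        out := make([]Node, len(m.neighbors))
        for i, n := range m.neighbors {
            out[i] = n
        }
        return out
    }

    func main() {
        a, b, c := &memNode{id: "a"}, &memNode{id: "b"}, &memNode{id: "c"}
        a.neighbors = []*memNode{b}    // the bootstrap peer only knows b...
        b.neighbors = []*memNode{a, c} // ...but b's buckets lead us to c
        fmt.Println(len(crawl([]Node{a})), "peers visited") // prints: 3 peers visited
    }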


While I agree, there is a whole generation of people who know what a DHT is; it's not really that obscure.

I'm talking, of course, about very late-90s/00s P2P file sharing, Kademlia, and torrents, but also the later "eventually consistent" databases (remember those?).

The crypto twenty-something hype beasts came way later.


When the choice is either quickly explaining why a potential user should use your project or teaching someone without domain knowledge what the project is, I would choose the former every time.

The part you quote even provides links to DHT and k-buckets if you want more information. That first paragraph is important real estate and shouldn't be wasted on someone who most likely has no use for the project.

Every domain has these words that sound alien to newcomers and you can't explain them to everyone all the time.


The Kademlia DHT paper is a pretty straightforward description of a DHT - https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia...


BitTorrent's BEP regarding DHT also explains how it works quite clearly. The implementation is based on Kademlia and can be found here: https://www.bittorrent.org/beps/bep_0005.html


This is not a particularly egregious example, but it's kind of spectacular how everything crypto-adjacent revels in technobabble.

The detractors of the ecosystem (myself included, to be honest) will be quick to point out that obfuscating the tech as magic as much as possible, as well as creating an inside group lingo, is key to onboarding and retaining people into it. But it's fascinating how that percolates throughout the dev community behind it as well.


I feel the same way about AI/LLM lingo


Unlucky naming collision with Slack’s networking tool Nebula: https://github.com/slackhq/nebula


And the Nebula streaming platform. Which is unfortunate because I'm using both.

I get it, nebulae are cool.


Not forgetting the awesome Nebula game engine


Oh no, what an unfortunate event. The Slack tool uses the name already used by OpenNebula. /s


Not to mention they jinxed Slack(ware) for many.


It isn't really network-agnostic... in fact, it doesn't support the (by far) largest DHT out there, the Mainline DHT that BitTorrent uses.

This is just a crawler for DHTs that use IPFS's implementation, or at least smell very similar to it.


Why is BitTorrent not supported? Perhaps I'm misunderstanding this thing but it seems like application #1.


My guesses:

a) Many other tools exist for that.

b) BitTorrent DHT nodes are simple and interchangeable. They can give you a list of peer addresses associated with a certain (torrent) hash, and only if you know the exact hash. Even client versions can't generally be collected (apart from some protocol extensions). The only thing you learn about a DHT member is that it exists (see the sketch after this list). This project, on the contrary, is for heterogeneous networks in which peers announce various services.

c) The number of BitTorrent DHT nodes is… bigger.

d) To collect interesting data from the BitTorrent DHT, one needs to observe as many third-party torrent hash requests as possible. To do that, multiple nodes are needed. Moreover, they need to run for a long time, not just because it takes time to make a lot of requests to a lot of nodes, but also because of the external preference for long-running nodes. Not sure how important it is, but, anecdotally, a fresh DHT node sees twice as many requests after a week as after a day.
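
To illustrate (b), here's a rough Go sketch of a raw BEP 5 get_peers query. The node ID and info-hash are the placeholder strings from the spec's own example, and router.bittorrent.com is one commonly used bootstrap node, so treat this as a sketch rather than a robust client. All you get back is a bencoded blob of node/peer addresses; you learn nothing else about the responder:

    package main

    import (
        "fmt"
        "net"
        "time"
    )

    func main() {
        // 20-byte node ID and info-hash; real clients use random/SHA-1 values.
        nodeID := "abcdefghij0123456789"
        infoHash := "mnopqrstuvwxyz123456"

        // Hand-rolled bencoded get_peers query as laid out in BEP 5.
        query := fmt.Sprintf(
            "d1:ad2:id20:%s9:info_hash20:%se1:q9:get_peers1:t2:aa1:y1:qe",
            nodeID, infoHash)

        conn, err := net.Dial("udp", "router.bittorrent.com:6881")
        if err != nil {
            panic(err)
        }
        defer conn.Close()

        if _, err := conn.Write([]byte(query)); err != nil {
            panic(err)
        }

        conn.SetReadDeadline(time.Now().Add(5 * time.Second))
        buf := make([]byte, 1500)
        n, err := conn.Read(buf)
        if err != nil {
            panic(err)
        }
        // The bencoded response only contains node/peer addresses.
        fmt.Printf("got %d bytes: %q\n", n, buf[:n])
    }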


Looks like Nebula uses go-libp2p, and all of the supported networks listed in the README use libp2p for their p2p networking. The Mainline DHT doesn't support the same transport protocols that libp2p supports (such as TCP+Yamux+Noise), which is probably why Nebula doesn't support BitTorrent.
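
For what it's worth, the libp2p side is fairly small to get started with. A minimal go-libp2p sketch of dialing one of the public IPFS ("Amino") DHT bootstrap peers might look like this; the multiaddr is the well-known public one at the time of writing and could change, and this is not Nebula's code, which goes on to walk the k-buckets from there:

    package main

    import (
        "context"
        "fmt"
        "time"

        "github.com/libp2p/go-libp2p"
        "github.com/libp2p/go-libp2p/core/peer"
    )

    func main() {
        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()

        // A default libp2p host negotiates the transports mentioned above
        // (e.g. TCP with a security handshake and a stream muxer).
        h, err := libp2p.New()
        if err != nil {
            panic(err)
        }
        defer h.Close()

        // One of the public IPFS (Amino) DHT bootstrap peers.
        ai, err := peer.AddrInfoFromString(
            "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN")
        if err != nil {
            panic(err)
        }

        if err := h.Connect(ctx, *ai); err != nil {
            panic(err)
        }
        fmt.Println("connected to bootstrap peer", ai.ID)
    }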


Because this seems to cater to the cryptocurrency/blockchain culture.


Because it isn't really network-agnostic.

It only supports IPFS and derivatives thereof.


/me remembers various DHT views, traffic flows, client stats, graphs and other data decorations in Azureus. Now that's what I call a dashboard.


Can someone explain why we want to crawl and/or monitor? What is this used for?

When I think of a crawler, I think of a non-homogeneous network (if that is the right term).

But with a blockchain, isn't it the case that each node has an entire copy of the chain, so you don't need to "crawl" it? It works more like a database.

What am I not understanding about this?


Instead of everyone crawling on their own, isn't it more efficient if everyone shared the same index somehow?



