
Can you comment on how many additional domains you mined compared with, for example, the 1M domains in the Alexa top 1M?

$ cat alexa myset myset | sort | uniq -u | wc -l


0.77M of the Alexa top 1M were not in my list.

$ cat alexa alexa myset | sort | uniq -u | wc -l


I mined 25,842,205 additional domain names.
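
For anyone who wants to try the two commands above, here is a minimal sketch on made-up data. The sample domains, the temp directory, and the helper names only_in_first and only_in_first_comm are all hypothetical. It assumes neither list contains duplicate lines: the uniq -u trick would silently drop a domain that appears twice in the first file. The comm(1) variant sorts with -u first, so it does not have that limitation.

```shell
#!/bin/sh
# Use one collation order for both sort and comm.
LC_ALL=C
export LC_ALL

# Hypothetical sample lists standing in for the thread's `alexa` and `myset`.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' example.com example.org example.net > "$tmp/alexa"
printf '%s\n' example.org example.net example.io example.dev > "$tmp/myset"

# Lines that occur only in the first file. Listing the second file twice
# makes every one of its lines appear at least twice, so `uniq -u`
# (print lines that occur exactly once) discards them.
only_in_first() {
    cat "$1" "$2" "$2" | sort | uniq -u
}

# Same set difference with comm(1): -2 hides lines unique to the second
# file, -3 hides lines common to both. Both inputs must be sorted.
only_in_first_comm() {
    sort -u "$1" > "$tmp/a.sorted"
    sort -u "$2" > "$tmp/b.sorted"
    comm -23 "$tmp/a.sorted" "$tmp/b.sorted"
}

missing=$(only_in_first "$tmp/alexa" "$tmp/myset" | wc -l | tr -d ' ')
extra=$(only_in_first "$tmp/myset" "$tmp/alexa" | wc -l | tr -d ' ')
echo "in alexa but not in myset: $missing"
echo "in myset but not in alexa: $extra"
```

Swapping the argument order gives the two directions of the comparison, which is exactly the difference between the two one-liners quoted above.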

Did you consider using the gTLD zone files (from the respective registries) and the ccTLD zone files available at http://viewdns.info/data/? That would be a much bigger initial dataset than 25M domains.

No, getting access would probably take a couple of days (or, in the case of viewdns, more than $100) and thereby take all the fun out of the project. If you know of any other way to get the list, I'd be happy to hear it though!


Feel free to grab a copy of our domain list. The "All domains with NS records" is the one you want. Has 191 million in it.

Wow! That's awesome!

Amazing, thanks!

A shortcut to getting com, net, info, org, us, sk, and biz is to give premiumdrops.com $24.95/mo. You can get these for free from the TLD operators, but it takes a few weeks of snail mail (last I checked). The gTLD access via CZDAP is free, but takes a few days for approvals to process.

CZDS (https://czds.icann.org/en) has been largely automated now, so you can request access to the files with one click instead of having to sign and email hundreds of forms. Approval seems to be automatic for most of them.

man.. the internet really is full of crappy domains...

(and yes.. now i see that you mentioned it in the article.. took me time to get there)

The Sonar FDNS set contains about 1.4 billion host names (50m+ domains). The FDNS set is seeded from TLD zones, CZDAP, PTR lookups (RDNS), SSL/TLS scans, and HTTP link extraction. It updates every two weeks: https://github.com/rapid7/sonar/wiki/Forward-DNS
