Hacker News

$ cat alexa myset myset | sort | uniq -u | wc -l

773733

0.77M of the Alexa top 1M were not in my list.

$ cat alexa alexa myset | sort | uniq -u | wc -l

25842205

I mined 25,842,205 additional domain names.
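
For anyone puzzled by the two pipelines: `uniq -u` prints only lines that occur exactly once in sorted input, so listing one file twice guarantees that all of its lines are dropped and only the lines unique to the other file survive. Here is a rough sketch of the same trick as a reusable function. The name `only_in` is made up for illustration, and the extra `sort -u` is an addition that assumes the first file might contain duplicate lines, which the original pipelines would silently drop.

```shell
# only_in A B: print the lines that appear in file A but not in file B.
# (Hypothetical helper name; not part of the original commands.)
only_in() {
    # Deduplicate A first so each of its lines occurs exactly once,
    # then append B twice so every line of B occurs at least twice.
    # uniq -u keeps only lines that occur exactly once: those in A but not in B.
    sort -u "$1" | cat - "$2" "$2" | sort | uniq -u
}

# Usage, mirroring the two commands above:
#   only_in alexa myset | wc -l   # Alexa domains missing from my list
#   only_in myset alexa | wc -l   # domains in my list that Alexa lacks
```

If both files are already sorted and deduplicated, `comm -23 A B` should give the same result without the double read.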




Did you consider using the gTLD zone files (from the respective registries) and the ccTLD zone files available at http://viewdns.info/data/? That alone would be a much bigger initial dataset than 25M domains.


No, getting access will probably take a couple of days (or, in the case of viewdns, more than $100) and thereby take all the fun out of the project. If you know of any other way to get the list, I'd be happy to hear it though!


http://meanpath.com/freedirectory.html

Feel free to grab a copy of our domain list. The "All domains with NS records" is the one you want. Has 191 million in it.


Wow! That's awesome!


Amazing, thanks!


A shortcut to getting com, net, info, org, us, sk, and biz is to give premiumdrops.com $24.95/mo. You can get these for free from the TLD operators, but it takes a few weeks of snail mail (last I checked). The gTLD access via CZDAP is free, but takes a few days for approvals to process.


https://czds.icann.org/en

It has been largely automated now, so you can request access to the files with one click instead of having to sign and email hundreds of forms. Approval seems to be automatic for most of them.


man.. the internet really is full of crappy domains...

(and yes.. now I see that you mentioned it in the article.. it took me a while to get there)



