Thesis: Sites that do well on Hacker News will tend to be sites with high-quality content.
Tools: Hacker News BigQuery dataset, Python, Google CSE (Custom Search Engine)
Steps:
1. Using the HN BigQuery dataset, get all unique domains with at least 3 stories that scored more than 50 points (query link [1]). Sort them by the percentage of such stories out of all stories submitted from that domain.
At the top you will find sites like blog.geoffralston.com, where 3 out of 3 submitted stories got more than 50 points (100%!). Or lucumr.pocoo.org, where 46 out of 124 stories (37%) reached 50+ points! Talk about good writing.
We cut the list at 2,500 sites, where the ratio of popular to submitted stories is still an enviable 12%.
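Step 1 was done as a BigQuery query [1], which isn't reproduced here. As a rough local equivalent, here is a minimal Python sketch of the same counting, assuming the stories are already exported as (url, score) pairs. The function names, the defaults, and stripping a leading "www." from the host are assumptions for illustration, not what the original query does.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_of(url):
    """Lower-cased host of a story URL; dropping a leading 'www.' is an assumption."""
    host = (urlsplit(url or "").hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def rank_domains(stories, min_popular=3, point_threshold=50, top_n=2500):
    """Rank domains by the share of their stories scoring above point_threshold.

    stories: iterable of (url, score) pairs, one per HN story (hypothetical format).
    Keeps domains with at least min_popular such stories, sorts by that share
    (highest first, ties broken by popular count) and cuts the list at top_n.
    Returns (domain, popular, total, share) tuples.
    """
    total, popular = Counter(), Counter()
    for url, score in stories:
        d = domain_of(url)
        if not d:
            continue  # text posts (Ask HN etc.) have no URL
        total[d] += 1
        if score > point_threshold:
            popular[d] += 1
    rows = [(d, popular[d], total[d], popular[d] / total[d])
            for d in popular if popular[d] >= min_popular]
    rows.sort(key=lambda r: (-r[3], -r[1]))
    return rows[:top_n]
```

min_popular=3 is read as "3 or more", which is what lets a 3-out-of-3 site like blog.geoffralston.com onto the list.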
2. Add to this list all sites that had exactly one submission, where that only submission ever from the domain got 300+ points on HN [2]. I call them unexplored one-hit wonders; the thesis is that there are probably other gems on these domains that just haven't been submitted yet.
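The one-hit-wonder filter from step 2 can be sketched the same way. Again this assumes (url, score) pairs, and the helper names are hypothetical; domain_of() is repeated so the snippet runs on its own.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def domain_of(url):
    """Lower-cased host of a story URL; dropping a leading 'www.' is an assumption."""
    host = (urlsplit(url or "").hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def one_hit_wonders(stories, min_points=300):
    """Domains with exactly one submission ever, where that one scored min_points+."""
    scores = defaultdict(list)
    for url, score in stories:
        d = domain_of(url)
        if d:
            scores[d].append(score)
    return sorted(d for d, s in scores.items() if len(s) == 1 and s[0] >= min_points)
```

A domain with a 400-point hit plus any second submission is excluded, since it is no longer "unexplored".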
3. Now we have about 3,000 sites in total. We will use Google CSE, which allows up to 2,000 sites through annotations [3], so we have to clean the data first.
- Check if the domain still resolves. Sadly, about 400 of these high-quality sites no longer do.
- Check for redirects
For example, this site is no longer at its old address:
https://david.weebly.com ... 302 Found (0.153)
http://www.david.blog/ ... 200 OK (0.0676)
- Check for all sorts of other weird errors.
This took most of the day :) I used a modified version of [4].
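The checks in step 3 were done with a modified webchk [4]. The snippet below is not webchk's code; it is a minimal standard-library sketch of the same idea, assuming one plain GET per URL is enough to classify a site. check_url() reports a redirect instead of following it, and trace_redirects() follows up to a few hops, much like the chain shown above. All function names are hypothetical.

```python
import http.client
import socket
from urllib.parse import urljoin, urlsplit


def resolves(domain):
    """True if the domain name still resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(domain, None)) > 0
    except socket.gaierror:
        return False


def check_url(url, timeout=5.0):
    """Send one GET to url WITHOUT following redirects.

    Returns (status, location): the HTTP status code, plus the Location header
    for 3xx responses (None otherwise). Connection, TLS and protocol failures
    come back as ("error: ...", None) so one broken site doesn't stop the run.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        location = resp.getheader("Location") if 300 <= resp.status < 400 else None
        return resp.status, location
    except (OSError, http.client.HTTPException) as exc:
        return f"error: {exc}", None
    finally:
        conn.close()


def trace_redirects(url, max_hops=5):
    """Follow up to max_hops redirects, returning [(url, status), ...]."""
    hops = []
    for _ in range(max_hops):
        status, location = check_url(url)
        hops.append((url, status))
        if location is None:
            break
        url = urljoin(url, location)  # Location may be relative
    return hops
```

Anything whose status is an "error: ..." string falls into the weird-errors bucket and still needs a manual look.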
4. Manually remove the news sites that made it onto the list (nytimes, usatoday, ...).
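Deciding which news sites to drop in step 4 is manual, but applying the resulting blocklist can be scripted. A minimal sketch, assuming the blocklist holds bare domains and that any subdomain of a blocked domain should go too (both assumptions; drop_blocklisted is a hypothetical name):

```python
def drop_blocklisted(domains, blocklist):
    """Remove domains that are on the blocklist or are subdomains of an entry."""
    blocked = {b.lower() for b in blocklist}

    def is_blocked(domain):
        parts = domain.lower().split(".")
        # Check the domain itself and every parent: a.b.c -> a.b.c, b.c, c
        return any(".".join(parts[i:]) in blocked for i in range(len(parts)))

    return [d for d in domains if not is_blocked(d)]
```

Matching on whole labels means a hypothetical notnytimes.com would survive a nytimes.com entry, which a plain substring check would get wrong.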
5. If you want to check whether your site is on the list, see [5]. If it is, congrats!
6. Finally, here is the end result:
https://cse.google.com/cse?cx=014479775183020491825:c2lrlzrogb5
Search the cream of the crop of HN-submitted sites!
Let me know if you find this useful!
[0] https://cse.google.com/cse?cx=014479775183020491825:c2lrlzro...
[1] https://console.cloud.google.com/bigquery?sq=217608811855:37...
[2] https://console.cloud.google.com/bigquery?sq=217608811855:67...
[3] https://developers.google.com/custom-search/docs/annotations
[4] https://github.com/amgedr/webchk
[5] https://docs.google.com/spreadsheets/d/1ON26TVBUHH4FZuvH8YFQ...