Show HN: Concept search engine based on most popular HN submissions
7 points by freediver 12 days ago | 1 comment
Thesis: Sites that do well on Hacker News tend to be sites with high-quality content.

Tools: Hacker News BigQuery dataset, Python, Google CSE

Steps:

1. Using the HN BigQuery dataset, get all unique domains that have at least 3 stories with more than 50 points each (query link [1]). Sort them by the percentage of such stories out of the total number of stories submitted from that domain.

By doing that, at the top you will get sites like blog.geoffralston.com, which had 3 out of 3 submitted stories get more than 50 points (100%!), or lucumr.pocoo.org, which had 46 out of 124 total stories reach 50+ points. Talk about good writing!

We cut the list at 2,500 sites, where the popular-to-submitted ratio is still an enviable 12%.
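
The ranking in step 1 can be sketched in Python over rows exported from the dataset. This is a minimal sketch, not the query behind [1]: the `(url, score)` row shape, the name `rank_domains`, and the default thresholds are assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def rank_domains(stories, min_points=50, min_popular=3):
    """Rank domains by the share of their stories that scored above min_points.

    stories: iterable of (url, score) pairs (assumed row shape).
    Returns a list of (domain, popular, total, ratio), best ratio first.
    """
    total = defaultdict(int)
    popular = defaultdict(int)
    for url, score in stories:
        domain = urlparse(url).netloc.lower()
        if not domain:
            continue  # skip text posts and malformed URLs
        total[domain] += 1
        if score > min_points:
            popular[domain] += 1

    ranked = [
        (d, popular[d], total[d], popular[d] / total[d])
        for d in total
        if popular[d] >= min_popular  # assumed cutoff, adjust as needed
    ]
    # Highest ratio first; break ties by the number of popular stories.
    ranked.sort(key=lambda row: (-row[3], -row[1], row[0]))
    return ranked
```

Cutting the list at 2,500 sites is then just `rank_domains(rows)[:2500]`.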

2. Add to this list all sites that had exactly one submission, where that single submission got 300+ points on HN. I call them unexplored one-hit wonders, and the thesis is that there are probably other gems on the domain that just have not been submitted yet. [2]
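
A rough Python sketch of the one-hit-wonder filter, under the same assumption that stories are available as `(url, score)` pairs. The function name `one_hit_wonders` is hypothetical and this is not the query behind [2].

```python
from collections import Counter
from urllib.parse import urlparse


def one_hit_wonders(stories, min_points=300):
    """Return domains with exactly one submission that scored min_points or more."""
    rows = [(urlparse(url).netloc.lower(), score) for url, score in stories]
    # Count submissions per domain, ignoring rows without a usable URL.
    counts = Counter(domain for domain, _ in rows if domain)
    return sorted(
        domain
        for domain, score in rows
        if domain and counts[domain] == 1 and score >= min_points
    )
```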

3. Now we have about 3,000 sites total. We will use Google CSE, which allows up to 2,000 sites through annotations [3]. We have to clean the data now.

- Check if the domain still resolves. Sadly, about 400 of these high-quality sites no longer do.

- Check for redirects

For example, this site is no longer at its old address:

https://david.weebly.com ... 302 Found (0.153)
http://www.david.blog/ ... 200 OK (0.0676)

- Check for all other sorts of weird errors

This took most of the day :) I used a modified version of [4].
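
The resolve and redirect checks above could look roughly like this using only the Python standard library. This is a sketch and not the script from [4]: the names `resolves`, `head`, and `trace`, the 10-second timeout, and the 5-hop limit are all assumptions.

```python
import http.client
import socket
from urllib.parse import urljoin, urlparse


def resolves(host):
    """Return True if DNS can resolve the host name."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def head(url, timeout=10):
    """Send one HEAD request without following redirects; return (status, Location)."""
    parts = urlparse(url)
    conn_cls = (
        http.client.HTTPSConnection if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace(url, fetch=head, max_hops=5):
    """Follow redirects by hand; return a list of (url, status) hops."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status in (301, 302, 303, 307, 308) and location:
            url = urljoin(url, location)  # Location may be relative
        else:
            break
    return hops
```

The last hop in `trace(...)` gives the address to keep in the list. Connection failures, TLS errors, and timeouts surface as `OSError` from `head`, so a real run would wrap each call in `try`/`except OSError` and log the site as one of the "weird errors".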

4. Manually remove the news sites that made it onto the list (nytimes, usatoday...).

5. If you want to check whether your site is on the list, see [5]. If you are on the list, congrats!

6. Finally, here is the end result:

https://cse.google.com/cse?cx=014479775183020491825:c2lrlzrogb5

Search the cream of the crop of HN-submitted sites!

Let me know if you find this useful!








