
I did the math: here are the 50 best HackerNews posts of all time - fluxic
https://medium.com/swlh/best-of-2015-pfffffffft-79d9b014f4de
======
fluxic
Hey all. My partner and I scraped the entire database of HackerNews (~9m
submissions and comments) and found the links that reappeared multiple times.
We discounted any submission with fewer than 10 points, and filtered out
product launches and GitHub repos. The end result: a list of the 50 best blog
posts/articles that have appeared on HackerNews, in descending order of
submissions (the story with the most submissions had 6).
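
A minimal sketch of the counting step described above, not the actual script used. It assumes each submission is available as a hypothetical (url, points) pair, and it assumes the 10-point cutoff applies to each individual submission rather than to a story's total:

```python
from collections import defaultdict

def rank_resubmitted(submissions, min_points=10, min_count=2):
    """Rank URLs by how many qualifying submissions each one received.

    `submissions` is an iterable of (url, points) pairs. That layout is an
    assumption for illustration, not the real scraped schema.
    """
    counts = defaultdict(int)
    for url, points in submissions:
        if points >= min_points:  # ignore submissions below the point cutoff
            counts[url] += 1
    # Keep only URLs that were submitted at least `min_count` times.
    repeated = [(url, n) for url, n in counts.items() if n >= min_count]
    # Descending by submission count; ties broken by URL for stable output.
    return sorted(repeated, key=lambda item: (-item[1], item[0]))

sample = [("a", 50), ("a", 12), ("a", 3), ("b", 100),
          ("c", 10), ("c", 11), ("c", 40)]
ranked_sample = rank_resubmitted(sample)
```

Taking the first 50 entries of the result would give a list in the same order as the one in the post, before any manual filtering of product launches and GitHub repos.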

I know that HackerNews is heavily weighted towards news (and not old, timeless
content), but I wanted to see if certain stories have nevertheless been
"canonized" on the site. As it turns out, there are a select few (~3500) that
have been resubmitted twice or more to HN. Those numbers are obviously biased
against recent stories, which have not had enough time to reappear on the
site.

I also compared submission domains to see who was posting the most
"resubmitted" content. As it turns out, independent bloggers like PG and Matt
Might outperform mass media outlets like TechCrunch and Bloomberg in the
number of stories that have been submitted 2+ times to HN.

The data analysis isn't perfect: we didn't account for subdomain usage
[bits.blogs.nytimes.com vs nytimes.com], and if URLs had weird parameters
appended to the end, we didn't count them as duplicates. Nevertheless, we got
some really cool insights from a week's worth of work.
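
For anyone who wants to close those two gaps, here is a rough sketch of one possible normalization. It was not part of the analysis. The function name is hypothetical, the subdomain handling is deliberately naive (it keeps only the last two host labels, which is wrong for domains like example.co.uk), and dropping the whole query string will wrongly merge pages that are distinguished only by a parameter:

```python
from urllib.parse import urlsplit

def normalize_url(url):
    """Reduce a URL to a rough duplicate-detection key (hypothetical helper)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    labels = host.split(".")
    # Naive subdomain collapse: bits.blogs.nytimes.com -> nytimes.com
    domain = ".".join(labels[-2:]) if len(labels) > 2 else host
    # Drop scheme, query string, fragment, and any trailing slash.
    path = parts.path.rstrip("/")
    return domain + path

key_a = normalize_url("http://bits.blogs.nytimes.com/2015/01/story/?utm_source=hn")
key_b = normalize_url("https://www.nytimes.com/2015/01/story")
```

With this key, the two example URLs compare equal, so they would be counted as the same story. Note that collapsing subdomains this way only makes sense when the paths match across hosts, which is an assumption.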

As a bonus, the stories that made our list are absolutely incredible.
Definitely check them out.

~~~
sdiq
Very informative pieces for those like me who missed most of the submissions
in the first place.

Great work!

------
touristtam
Is there no correlation between points and number of comments? It would be
interesting to have some analysis on this, drilling further down and ranking
the quality of the comments. Or maybe this is too much? ;)

~~~
ColinWright
The question I would like to see investigated is the hypothesis that the best
posts are those with a large value of (#points)/(1+#comments). I've used that
for years as my measure of when an article that would otherwise have been
missed should be read.
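
As an illustration of the ratio above, a small sketch with made-up numbers (the post titles and counts are invented for the example, not taken from HN data):

```python
def read_score(points, comments):
    """Points per comment, with +1 so posts with zero comments don't divide by zero."""
    return points / (1 + comments)

# (title, points, comments); all values are hypothetical.
posts = [
    ("quiet gem", 120, 5),
    ("flamewar", 300, 450),
    ("no comments", 40, 0),
]

# Highest ratio first.
ranked = sorted(posts, key=lambda p: read_score(p[1], p[2]), reverse=True)
```

Here the heavily discussed post lands last despite having the most points, which is the behavior the heuristic is after.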

~~~
fluxic
Yeah, that's another heuristic that works (and one I use myself when
browsing HackerNews in real time). The only reason I required multiple
submissions was to filter out posts that were of fleeting interest (product
launches, or news articles that didn't manage to sustain HN's interest over a
long period of time).

