

What Makes Hacker News Fame? - jonathonmorgan
http://goodattheinternet.com/2015/02/13/getting-to-the-hacker-news-front-page/

======
fa
This seems like a dramatic illustration of cumulative advantage, as studied in
the famous MusicLab experiment [1, 2]. The argument is that in cultural and
social markets, random effects govern which products or artifacts get the
initial few "upvotes" (or their analogs), at which point the rich-get-richer
dynamic takes over. Very, very awesome.

[1] Salganik, Dodds, Watts, "Experimental Study of Inequality and
Unpredictability in an Artificial Cultural Market", 2006:
[https://www.princeton.edu/~mjs3/salganik_dodds_watts06_full....](https://www.princeton.edu/~mjs3/salganik_dodds_watts06_full.pdf)

[2] A popular article by one of the authors, the inestimable Duncan Watts:
[http://www.nytimes.com/2007/04/15/magazine/15wwlnidealab.t.h...](http://www.nytimes.com/2007/04/15/magazine/15wwlnidealab.t.html?_r=0&pagewanted=print)
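
To make the rich-get-richer dynamic concrete, here is a minimal toy sketch. It is an urn-style model of my own choosing, not the MusicLab design: every item has identical quality, and each new vote lands on an item with probability proportional to its current votes plus one. The function names and parameters (`simulate_world`, `top_share`, the item and vote counts) are hypothetical.

```python
import random


def simulate_world(n_items, n_votes, seed):
    """Rich-get-richer voting: all items have identical quality, and each
    new vote picks an item with probability proportional to votes + 1."""
    rng = random.Random(seed)
    votes = [0] * n_items
    for _ in range(n_votes):
        weights = [v + 1 for v in votes]  # the +1 lets zero-vote items be picked
        winner = rng.choices(range(n_items), weights=weights, k=1)[0]
        votes[winner] += 1
    return votes


def top_share(votes, k):
    """Fraction of all votes captured by the k most popular items."""
    return sum(sorted(votes, reverse=True)[:k]) / sum(votes)


if __name__ == "__main__":
    # Same rules, same (equal) quality, different random seeds.
    for seed in (1, 2, 3):
        votes = simulate_world(n_items=20, n_votes=2000, seed=seed)
        leader = max(range(len(votes)), key=votes.__getitem__)
        share = top_share(votes, 3)
        print(f"world {seed}: leader=item {leader}, top-3 share={share:.2f}")
```

Running several seeds side by side is the toy analog of MusicLab's parallel "worlds": since quality is equal by construction, any difference in which item leads comes from the early random votes alone.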

~~~
marketforlemmas
At the risk of plugging my own work, I did a follow-up to the (really awesome)
Watts experiment using data from reddit and Hacker News:

[http://arxiv.org/abs/1501.07860](http://arxiv.org/abs/1501.07860)

The work isn't complete yet (always more to do) but the TL;DR is:

1\. Yes, randomness governs a lot of article outcomes. Whether something hits
the front page or not is pretty arbitrary.

2\. However, conditioned on making the front page, popularity is actually a
good reflection of "intrinsic quality". I think the ultimate relationship
between popularity and quality is stronger than the MusicLab experiment
suggests.

~~~
fa
I like this a lot! If I allowed myself an explanation, I'd say your findings
make a lot of sense, but I will instead heed the title of Duncan Watts' latest,
"Everything Is Obvious*: Once You Know the Answer" :)

I wonder how much engagement you'd get if you made a browser plugin or even an
alternative website that showed users a random selection from the top three
pages of reddit/HN, then intercepted and logged their upvotes, to get a direct
measure of intrinsic quality, rather than estimating this statistically. I for
one would use such an interface.

On a side note, have you seen this work on predicting the growth of ongoing
cascades from Facebook [1]? I'm fixing to see if their findings apply to the
MusicLab data.

[1]
[https://research.facebook.com/publications/680551081983090/c...](https://research.facebook.com/publications/680551081983090/can-cascades-be-predicted-/)

~~~
marketforlemmas
Thanks for the feedback (and the Watts reference). Building a plugin is
definitely an interesting idea; I hadn't thought of that before. I guess the
problems would be three-fold. First, I don't know how to do that :-). Second,
there are probably a bunch of ethical concerns/IRB issues that would stand in
the way of academic publishing (but that's not huge). Third, and the only
fundamental issue, is that the self-selection into using that plug-in would
bias the estimates of intrinsic quality. Still, it's a pretty good idea, but
I'm currently trying to get access to more fine-grained data in other ways, so
we'll see.

In terms of creating your own site, a few researchers [1,2] have already done
this and have some interesting work. But even with that, you still have the
problem of accounting for position bias within the site (like HN doesn't
really know if you skimmed the title of an article and decided to ignore it,
or never read the title at all). But the experimental power you get with that
is pretty cool.

And I have totally read that Facebook cascades paper and have more than a few
thoughts about it. In fact, I have adapted their prediction-style results to
the MusicLab data and you get really strong predictive accuracy (like 90-95%
in terms of predicting whether a song will eventually be above the median
popularity). However the accuracy you would achieve on Reddit or Hacker News
data is considerably lower. I didn't really include those results in my paper
because I'm not sure how they fit yet.

If I had one critique (which is not really a critique but a comment), it is
that the Facebook study doesn't really contradict Watts' point that popularity
is hard to predict. The Facebook study shows that if you can observe the
"initial conditions", then you can predict eventual outcomes pretty well, but
that's directly in line with the rich-get-richer effect that Watts et al.
demonstrate. To put it pithily, it's easy to predict who gets richer if you
observe who is rich.
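
A rough sketch of that last point, under loud assumptions: this is a toy urn model, not the Facebook paper's method and not the adaptation to MusicLab data mentioned above. Items have equal quality, votes follow a rich-get-richer rule, and the "prediction" is simply whether an item is above the median after an early snapshot. All names and parameters here are hypothetical, and whatever accuracy it prints says nothing about the 90-95% figure.

```python
import random
from statistics import median


def run_world(n_items, early_votes, total_votes, rng):
    """Simulate rich-get-richer voting and return (early snapshot, final counts).
    Requires 1 <= early_votes <= total_votes."""
    votes = [0] * n_items
    early = None
    for t in range(1, total_votes + 1):
        weights = [v + 1 for v in votes]
        winner = rng.choices(range(n_items), weights=weights, k=1)[0]
        votes[winner] += 1
        if t == early_votes:
            early = list(votes)  # copy: "observing who is rich" early on
    return early, votes


def above_median(counts):
    """Label each item by whether it is strictly above the median count."""
    m = median(counts)
    return [c > m for c in counts]


def early_prediction_accuracy(n_worlds, n_items, early_votes, total_votes, seed):
    """Share of items whose early above-median label matches the final one."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_worlds):
        early, final = run_world(n_items, early_votes, total_votes, rng)
        hits += sum(p == a for p, a in zip(above_median(early), above_median(final)))
    return hits / (n_worlds * n_items)


if __name__ == "__main__":
    for early_votes in (20, 200, 1000):
        acc = early_prediction_accuracy(50, 20, early_votes, 2000, seed=0)
        print(f"snapshot after {early_votes} votes: accuracy={acc:.2f}")
```

The predictor never looks at quality (there is none to look at), only at who is already ahead, which is the sense in which good prediction from initial conditions is compatible with outcomes being arbitrary.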

Anyway, I could geek out about this for a long time but feel free to drop me
an email at stoddardg [at] gmail.com if you're interested in chatting some
more.

[1]: [http://arxiv.org/pdf/1410.6744.pdf](http://arxiv.org/pdf/1410.6744.pdf)
[2]:
[http://journals.plos.org/plosone/article?id=10.1371/journal....](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0098914)

------
DanBC
(This post sounds a bit too negative but it's not meant to be. Sorry for my
poor communication style. Thanks for the informative post.)

> People can vote stories up or down (posts get +1 for an upvote, -1 for a
> downvote)

People can't downvote submissions. They can flag submissions and they can flag
as well as downvote comments.

> It’s not entirely clear how karma is assigned but it’s safe to assume this
> is a measure of status.

Each point of karma is a single upvote on a submission or a comment. My high
karma is purely a result of very frequent commenting, and has nothing to do
with status. (Other people's high karma combined with high average karma is
probably an indicator that people respect them. Some people have a weirdly low
average - I don't understand how ColinWright only has an average of 1.7 or
rayiner only has an average of 2.7, for example).

A downvoted comment will reduce your karma by one point for each downvote.
Flagged comments don't reduce your karma but may have other effects. A while
ago I had a comment that got 50 downvotes - all of those were taken off my
karma total. There was a problem where a controversial post might be
flagkilled - limiting the total loss to whatever downvotes that post gets
before it's flagkilled; or a controversial post just gets heavily downvoted
but not flagkilled which means unlimited downvotes for the time that downvotes
are available on that post. (Not sure if this has been changed yet, or if it's
by design.)

> Karma isn’t meaningless. While plenty of users with low karma submitted
> posts that ascended to the front page, it looks like posts submitted by
> influential users get to the front page more quickly (though the correlation
> is pretty weak).

Very few of my submitted articles make the front page. I'm tempted to say
something about correlation and causation here - people who submit articles
that make the front page get lots of karma, rather than users with lots of
karma getting to the front page more easily.

About points per minute: There are some anti-gaming algorithms that will
penalise a submission if it gets too many votes too quickly. I have no idea
what the correct rate is.

There is also, I think, some algorithm that will detect many comments by new
accounts. Announcing a post on social media is a mistake that some people
make. New accounts then visit and upvote the submission, which demotes rather
than promotes it.

------
sbashyal
The effect "High karma can give you a little boost" is not due to HN karma
itself. Many contributors to HN already have a reputation for good content and
thus people are more likely to read/vote on new submissions from them. So the
boost is earned by the contributors by having impressed the HN audience, not
granted by HN solely based on past karma.

~~~
captn3m0
Related:
[https://en.wikipedia.org/wiki/Matthew_effect](https://en.wikipedia.org/wiki/Matthew_effect)

We've been studying this in a Sociology class, and this seems like a good
example.

------
pdh
[http://www.righto.com/2013/11/how-hacker-news-ranking-really...](http://www.righto.com/2013/11/how-hacker-news-ranking-really-works.html)

~~~
jonathonmorgan
Oh right, so it really is a calculation of points vs age, according to the
formula, plus penalties -- which my analysis didn't catch. Thanks for the
background.
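
For anyone following along, a minimal sketch of that points-vs-age calculation, assuming the formula as I understand it from the linked write-up: `(points - 1)^0.8 / (age_hours + 2)^1.8`, multiplied by a penalty factor. The function name, the default constants, and the handling of non-positive values are my assumptions, not confirmed HN internals, and the real penalties are not public in full.

```python
def rank_score(points, age_hours, penalty=1.0, gravity=1.8):
    """Approximate front-page score: votes discounted by age, times a penalty.

    Assumed constants: exponent 0.8 on (points - 1) and gravity 1.8,
    as reported in the linked write-up. penalty=1.0 means no penalty.
    """
    base = points - 1  # the submitter's own vote does not count
    # Guard (my simplification): a negative base raised to 0.8 would be
    # a complex number in Python, so pass non-positive values through.
    numerator = base ** 0.8 if base > 0 else base
    return numerator / ((age_hours + 2) ** gravity) * penalty


if __name__ == "__main__":
    # A post with two points after 20 minutes, as in the comment below.
    print(rank_score(points=2, age_hours=20 / 60))
    # The same vote count scores lower as the post ages.
    print(rank_score(points=50, age_hours=1), rank_score(points=50, age_hours=6))
    # A penalty factor below 1 pushes an otherwise identical post down.
    print(rank_score(points=50, age_hours=1, penalty=0.4))
```

Because age enters with a larger exponent than points, a steady trickle of votes cannot keep a post up forever, which is roughly why points per minute looked like the key variable in my analysis.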

------
Jonathan5
This is a nice analysis, although the causes of a high points per minute score
could presumably be related to whether it gained an early vote / comment from
a user with high karma (who is effectively giving it a stamp of approval).

This only had two points after 20 minutes when I voted for it though, so by
their own logic it is unlikely to make it to the FP!

~~~
jonathonmorgan
Hah, good point! Much to my chagrin, the theory I proposed is challenged by my
own post. It looks like someone with a lot of karma can boost a post to the
front page (there are some outliers in the analysis). Maybe we can get
individual votes exposed via the API to know for sure.

------
benologist
This post is interesting but it falls short: ultimately, if you can get a few
different people to upvote your submissions, you can exploit HN. You can see
we're being marketed at, often, by startups that are routinely on the front
page with content that is irrelevant to their own customers.

------
j2kun
Interesting post. I would like to see those plots on a logarithmic scale.

