Popularity Dynamics and Intrinsic Quality in Reddit and Hacker News (2015) [pdf] (semanticscholar.org)
62 points by usgroup 4 months ago | 12 comments

Reddit received approximately 450 million page views in December 2014, while Hacker News received approximately 3.25 million.

I just looked, and HN had well over 60M page views that month. The Reddit number is likely way too small as well.

Yeah. Both numbers seem very low. Are they confusing monthly visitors with page views?

Pretty sure that in December 2014 Reddit's monthly page views would have been near a billion, and HN's would have been in the tens of millions.

I'd be really interested in more HN stats. Could you guys maybe publish an article about them?

I'm curious to know where they got those numbers; I couldn't find a citation for this in the paper.

This was an extremely interesting paper to me, about a topic that I see as economically and sociologically fundamental.

I was actually impressed by the methods they used. I found myself thinking "this is what I'd really like to see," and then they'd report it. Validating their method on the MusicLab data seemed critical to me, as did examining reddit resubmissions versus YouTube views.

Although I thought that methodologically it was almost as well done as it could have been outside of an experiment, I disagreed with the authors' conclusions. They acknowledge some of the problems, such as the huge number of forgotten posts they didn't model at all, but not others.

For example, it seems the question of most interest is: given an observed post score, what's the actual "quality"? If you look at, say, Figure 3, it's apparent that there's huge variability in quality conditional on score, and that it grows as observed score increases.

I think the correlational-style relationship they focus on obscures things like this that are critical to interpreting the findings. Yes, there's a strong estimated relationship between quality and score, if you ignore all the missing data that constitutes the bulk of submissions, and the fact that the relationship is being driven very strongly by a large quantity of very low-"quality" posts versus everything else, and the variability everywhere else. It's an odd, heteroscedastic, nonlinear relationship that isn't well-captured by a correlation, even a nonparametric one.
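To make the point about a correlation being driven by the mass of low-"quality" posts concrete, here is a minimal sketch on made-up synthetic data (not the paper's data): 900 hypothetical forgettable posts with low quality and low score, plus 100 hypothetical better posts whose score is drawn independently of their quality. The sizes and distributions are assumptions chosen only to show the shape of the argument.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def corr_of(posts):
    """Correlation between quality and score for a list of (quality, score) pairs."""
    qualities, scores = zip(*posts)
    return pearson(qualities, scores)


rng = random.Random(0)  # fixed seed so the toy data is reproducible

# Hypothetical bulk of submissions: low quality and low score.
low = [(rng.uniform(0, 1), rng.uniform(0, 10)) for _ in range(900)]
# Hypothetical remainder: higher quality, but score drawn independently of quality.
high = [(rng.uniform(5, 10), rng.uniform(50, 1000)) for _ in range(100)]

overall = corr_of(low + high)
within_high = corr_of(high)
print(f"overall r = {overall:.2f}; among the higher-quality posts r = {within_high:.2f}")
```

With these settings the overall correlation comes out strongly positive, while among the higher-quality posts it stays near zero, so a single summary correlation says little about how quality varies once you're past the junk.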

I also would have liked to see examination of variability in links across sites. How much variability is there in rank of an initial link, to the same material, across reddit, HN, Twitter, etc.? Maybe tellingly, the authors report the relationship between YouTube views and number of reddit submissions, but not the relationship (if I'm reading correctly) between YouTube views and rank of initial reddit submissions, which is kind of the key relationship.
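The relationship asked about here, YouTube views versus the rank of the initial reddit submission, is the kind of thing a rank correlation would summarize. Below is a minimal sketch of Spearman's rho with average ranks for ties; the variable names `views` and `initial_rank` and their four values are made-up toy inputs, not data from the paper. Since rank 1 is the best position, a strong relationship would show up as a negative rho.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Toy, hypothetical inputs: more views go with a better (smaller) initial rank.
views = [120, 5000, 800, 30000]
initial_rank = [40, 3, 12, 1]
print(spearman(views, initial_rank))  # → -1.0
```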

So, I liked the paper, but if anything it just reconfirms for me the conclusion of earlier studies: that social network dynamics have a big influence on apparent popularity.

From the abstract:

"We define quality as the number of votes an article would have received if each article was shown, in a bias-free way, to an equal number of users."

I haven't read the whole paper yet, but isn't that ignoring other major factors, like how "newsworthy" a particular link is? A low-quality link might get a lot of upvotes simply because it was the first link submitted that describes an inherently interesting event.

“In a bias-free way” needs careful definition. That ought to include the order in which stories are shown (so it would eliminate any advantage of being the first link posted relating to a specific event).
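One way to read the abstract's definition, including the point about presentation order, is as a simulation: show every article to the same number of users and count the votes. The sketch below is a toy model, not the paper's estimation procedure; `appeal` (a per-article upvote probability) and `attention` (a per-position chance of being noticed) are hypothetical parameters. With a fixed order the top slot collects far more votes even though all three articles are equally appealing; reshuffling the order for each user spreads the position effect evenly, so the vote counts track appeal.

```python
import random


def simulate_votes(appeal, n_users, attention, shuffle, seed=0):
    """Count votes when every article is shown to the same n_users.

    appeal[i]    : hypothetical probability a user upvotes article i once noticed
    attention[k] : hypothetical probability a user notices the item at position k
    shuffle      : True gives each user a fresh random order (the 'bias-free' case);
                   False shows everyone the articles in the same fixed order.
    """
    rng = random.Random(seed)
    votes = [0] * len(appeal)
    order = list(range(len(appeal)))
    for _ in range(n_users):
        if shuffle:
            rng.shuffle(order)
        for position, article in enumerate(order):
            # A vote needs the user to notice the slot and then like the article.
            if rng.random() < attention[position] and rng.random() < appeal[article]:
                votes[article] += 1
    return votes


appeal = [0.30, 0.30, 0.30]  # three equally appealing articles (toy values)
attention = [0.9, 0.5, 0.2]  # users mostly look at the top slot (toy values)

fixed = simulate_votes(appeal, 10_000, attention, shuffle=False)
shuffled = simulate_votes(appeal, 10_000, attention, shuffle=True)
print("fixed order:   ", fixed)
print("shuffled order:", shuffled)
```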

"Intrinsic quality" is a terrible name - it should be something like "decontextualised quality" or "neutrally presented quality", because it's still an aggregate subjective view on the quality of the article.

Why use a word like "quality" at all here?

It seems unnecessary. They should've just used "estimated votes", since that is what they are, or some other term derived from votes.

"Quality" is almost content-free here, and in the worst case it was chosen in bad faith, or out of hubris, to make the result seem more important.

aka tragedy of the commons

I don't think this is similar to tragedy of the commons at all.


I would call this more of a stag hunt: https://en.wikipedia.org/wiki/Stag_hunt There's a tension between spending your time voting on /newest to help get stories onto the main page, where they are then ranked accurately, and just slightly tweaking the ranking on the main page while enjoying the overall fruits.
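The stag-hunt framing can be checked with a two-player toy game in which "stag" means spending time voting on /newest and "hare" means only tweaking the front-page ranking. The payoffs are hypothetical numbers picked to satisfy the usual stag-hunt ordering (mutual stag is best, hare pays the same whatever the other player does, a lone stag is worst); the sketch simply enumerates the pure-strategy Nash equilibria.

```python
from itertools import product

STRATEGIES = ("stag", "hare")

# Hypothetical payoffs obeying the stag-hunt ordering a > b >= d > c.
PAYOFF = {
    ("stag", "stag"): 4,  # a: both curate /newest, good stories surface
    ("stag", "hare"): 0,  # c: I curate alone, my effort is mostly wasted
    ("hare", "stag"): 2,  # b: I only tweak the front page
    ("hare", "hare"): 2,  # d: nobody curates, a safe but mediocre front page
}


def payoff(mine, theirs):
    """Payoff to a player choosing `mine` against an opponent choosing `theirs`."""
    return PAYOFF[(mine, theirs)]


def is_nash(s1, s2):
    """True if neither player gains by deviating unilaterally."""
    best1 = all(payoff(s1, s2) >= payoff(alt, s2) for alt in STRATEGIES)
    best2 = all(payoff(s2, s1) >= payoff(alt, s1) for alt in STRATEGIES)
    return best1 and best2


equilibria = [pair for pair in product(STRATEGIES, repeat=2) if is_nash(*pair)]
print(equilibria)  # → [('stag', 'stag'), ('hare', 'hare')]
```

Both "everyone curates" and "nobody curates" come out as equilibria, which is the tension in question: voting on /newest pays off only if enough other people do it too.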

