
The Reddit Front Page Is Not a Meritocracy - lil_tee
http://toddwschneider.com/posts/the-reddit-front-page-is-not-a-meritocracy/
======
jedberg
This post is interesting, but he should have just read the code:

[https://github.com/reddit/reddit/blob/master/r2/r2/lib/norma...](https://github.com/reddit/reddit/blob/master/r2/r2/lib/normalized_hot.py)

This is an effect of the normalization algorithm, which is biased toward
having a post from as many different subreddits as possible on the front page.

Edit: At least we're consistent! Ketralnis said the same thing about an hour
ago. :)
[https://news.ycombinator.com/item?id=8568021](https://news.ycombinator.com/item?id=8568021)

~~~
nichochar
I don't think reading the code replaces the post. A good example is that even
when you implement an algorithm you monitor it, measure the results and tweak
it.

Code is the theory (bottom to top) and his research (which is great) is a top
to bottom analysis

~~~
jedberg
I wasn't discounting his research (apologies if it sounded that way). I was
simply pointing out that at the end he made it sound like this behavior was
unexplained, when in fact that code explains it quite well.

------
phillmv
This is a gorgeous post and I shiver at the thought of how much work went into
this! Thoughtful, detailed and chock full of great visualizations of the data.
I'd be interested in a similar analysis of HN, as I'm interested in the
editorial intent thus revealed.

On a tangential note, though, I was a bit surprised by the hypothesis.
'Meritocracy' to begin with is a dubious fiction, but especially so when
mapped on to the vote distribution of a given post in a semi-decentralized
human-moderated environment that is constantly being gamed. "Merit" just seems
like a big value judgement over such a noisy channel.

~~~
minimaxir
A note about the graphics: all the charts were made with R/ggplot2. Not only
ggplot2, but _basic_ ggplot2: the charts in the blog post can be made with
about 5 lines of code _maximum_ each.

You can customize ggplot2 for even more beautiful charts, such as the ones
used in my analysis of HN comments:
[http://minimaxir.com/2014/10/hn-comments-about-comments/](http://minimaxir.com/2014/10/hn-comments-about-comments/)

Note that reproducing this analysis for the HN front page is harder since
there is a lot that happens behind the scenes that cannot be generalized.
Here's an attempt at performing such an analysis (note dang's comments):
[https://news.ycombinator.com/item?id=8533757](https://news.ycombinator.com/item?id=8533757)

~~~
12423gsd
I really enjoy using ggplot2, but the basic "theme" is terrible.

\- They tried too hard to do a Tufte* - but it's actually way too flashy.

\- Basically the issue is that it's not generic enough. Whenever you see a
ggplot2 plot it's always screaming at you "I WAS MADE IN ggplot2!!" Just
KISS...

\- The grey background is completely unusable if you want to print out your
charts!

\- The default pastel colors (ie. PowerPoint 2012) are always a disaster for
readability. In meetings people constantly can't tell them apart and they are
definitely not colorblind friendly.

It's a shame b/c each script I write ends up having to have an obnoxiously long
theme declaration to make it look "normal".

* I think most people miss Tufte's point. He presents a way of thinking about data presentation. Instead people just look at it and think "oooo! That looks pretty. Let me copy what he did". The guy has his own personal style - it's definitely nice, but the point isn't to copy it.

EDIT: For those thinking about learning ggplot2.. maybe hold off for a bit. It
seems like it'll be deprecated soon and replaced with ggvis.

~~~
minimaxir
NB: Here is a paper on the derivation of the ggplot styles:
[http://vita.had.co.nz/papers/layered-grammar.pdf](http://vita.had.co.nz/papers/layered-grammar.pdf)

I would recommend using theme_bw(), which helps solve some of the problem with
the gray background.

EDIT: This is ggvis:
[http://blog.rstudio.org/2014/06/23/introducing-ggvis/](http://blog.rstudio.org/2014/06/23/introducing-ggvis/)

This is news to me, so I'll look into it to see how it differs from ggplot2.

EDIT 2: It seems the main difference is that ggvis is aimed at interactive
charts, but it then requires a dependency on Shiny, which is not optimal for
blog posts.

~~~
12423gsd
Thanks for linking the paper! I'll definitely read over that. I've heard it
mentioned before ...

I'm definitely not an R guru or "in" on the latest news, but it seems that
Hadley Wickham (who probably single-handedly is the reason R is still
relevant) now works for the RStudio guys and he's reworking his tools. plyr is
now dplyr and ggplot2 is now ggvis. And there's also another tool called tidyr.
My understanding is that they're still in development, but they'll ultimately
provide an "integrated" ecosystem for processing data.frames (with hooks into
the RStudio IDE)

He talks about it at the beginning of this video
[https://www.youtube.com/watch?v=wki0BqlztCo](https://www.youtube.com/watch?v=wki0BqlztCo)

------
selmnoo
One thing I'm interested in knowing is whether Reddit manipulates the votes on
its own submissions. See, for example, r/blog:
[http://www.reddit.com/r/blog/new/](http://www.reddit.com/r/blog/new/)

 _Every_ submission made there went to the frontpage. What are the chances? I
mean, it's hard to believe that mainstream Redditors at this point are so
interested in Reddit news that they keep sending every Reddit announcement to
the top. What's more likely, to me at least, is that Reddit is either
leveraging its knowledge of how things work to get to the top in a surefire
way, or it's plain messing with the upvote numbers. I would love to see data
on these submissions, if possible. Of course, there's good reason for them to
actually do this -- they need to advertise the Reddit marketplace stuff and so
on, it's how they make money.

~~~
ketralnis
I've been out of the loop for a few years, but no the votes aren't
manipulated.

But the normalisation algorithm
[https://github.com/reddit/reddit/blob/master/r2/r2/lib/norma...](https://github.com/reddit/reddit/blob/master/r2/r2/lib/normalized_hot.py)
does prefer to get at least one from each subreddit. This means that if /r/blog
has a post it will probably be placed on the front page immediately. That
action gets it a lot of votes just because it's very visible.

~~~
raldi
_> I've been out of the loop for a few years, but no the votes aren't
manipulated._

AHA! Let's just cancel out the double negative ("no" and "n't") and look what
we uncover, in this former reddit admin's very own words: "the votes are
manipulated"

~~~
uberdog
That's not how english works. That's not a double negative.

~~~
raldi
Now let's cancel out the double negative ("not", and "not") in what you said:

 _> That's how english works. That's a double negative._

Exactly!

~~~
thezilch
Or...

 _> That's a double positive._

Indeed!

------
mason240
>Much to my surprise, I found out that reddit's front pages are not a pure
"meritocracy" based on votes, but that rankings depend heavily on subreddits.

Is this really a surprise? If it just went by upvotes alone, a sub with 1M
subscribers would always dominate a sub with 500k. You need to factor in
that context.

~~~
dazmax
That's what I thought at first too, but the data makes it clear that it isn't
just based on votes weighted by subscriber count either.

~~~
ketralnis
It's not votes weighted by subscriber count at all. Each link's "hotness" is
weighted by the hotness of the highest-hotness link in its subreddit.

Hotness:
[https://github.com/reddit/reddit/blob/master/r2/r2/lib/db/_s...](https://github.com/reddit/reddit/blob/master/r2/r2/lib/db/_sorts.pyx#L45-L56)

Front-page weighting:
[https://github.com/reddit/reddit/blob/master/r2/r2/lib/norma...](https://github.com/reddit/reddit/blob/master/r2/r2/lib/normalized_hot.py)
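
A minimal Python sketch of the weighting described above, assuming each post's hotness is divided by the hotness of the hottest post in its subreddit before sorting. The constants in `hot` are the ones commonly quoted from reddit's open-source sort and should be checked against the linked file. `normalized_front_page`, the tuple shape, and the sample posts are hypothetical names and data for illustration, not reddit's actual API.

```python
from math import log10

# Assumed constants, as commonly quoted from reddit's open-source hot sort.
EPOCH_OFFSET = 1134028003
TIME_DIVISOR = 45000


def hot(ups, downs, created_utc):
    """Log-scaled net score plus a bonus that grows with submission time."""
    score = ups - downs
    order = log10(max(abs(score), 1))
    sign = (score > 0) - (score < 0)
    return round(sign * order + (created_utc - EPOCH_OFFSET) / TIME_DIVISOR, 7)


def normalized_front_page(posts):
    """Rank (subreddit, title, hotness) tuples by hotness relative to the
    hottest post in the same subreddit. Assumes hotness values are positive."""
    top = {}
    for subreddit, _title, hotness in posts:
        top[subreddit] = max(top.get(subreddit, hotness), hotness)
    ranked = sorted(
        posts,
        # Primary key: share of the subreddit's best hotness.
        # Tie-breaker: raw hotness.
        key=lambda p: (p[2] / top[p[0]], p[2]),
        reverse=True,
    )
    return [(subreddit, title) for subreddit, title, _ in ranked]


now = 1415000000  # hypothetical submission time shared by all sample posts
posts = [
    ("funny", "cat", hot(5000, 200, now)),
    ("funny", "dog", hot(3000, 100, now)),
    ("blog", "announcement", hot(40, 2, now)),
]
print(normalized_front_page(posts))
```

In this toy data the /r/blog post lands above the second /r/funny post despite having far fewer votes, because it is the top post in its own subreddit and so gets a normalized weight of 1.0.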

------
fenomas
IANAQ* , but could not the effects shown in this article occur from content-
neutral rules, combined with some clustering in the popularity of various
subreddits?

For example, assume there is a rule that a given subreddit can have no more
than N posts in the top 50 at a given time. It seems like this alone would
explain the clustering shown in the article. Super-popular subreddits like
/r/funny would rarely have posts on page 2, simply because they usually
already have N posts on page 1. Thus they drop off sharply in likelihood to
appear in the 40s, then shoot back up after #50 when the limiting stops.

Meanwhile clusters 2 and 3 appear to be the subreddits which rarely and often
(respectively) reach the top 50, but only due to the limiting rule. Cluster 2
is the least popular in the unlimited spots past #50, so it makes sense that
it usually reaches the lowest of the limited spots, while cluster 3
(apparently medium in overall popularity) takes the middle region.

Naturally I'm just squinting at it, but it looks like the article's findings
could easily occur without Reddit treating some subreddits differently from
others (as I take the author to imply it might, given the title). Am I missing
something?

* I am not a quant :P
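
The hypothetical rule in this comment can be sketched in Python as follows. This illustrates only the commenter's thought experiment (a cap of N posts per subreddit inside a fixed window), not reddit's actual algorithm; `cap_per_subreddit` and its parameters are made-up names.

```python
def cap_per_subreddit(ranked_posts, max_per_sub, window):
    """Apply a content-neutral cap to an already score-sorted list.

    The first `window` slots take at most `max_per_sub` posts from any one
    subreddit. Everything pushed out keeps its original relative order and
    follows immediately after the window.
    """
    counts = {}
    head, tail = [], []
    for post in ranked_posts:
        subreddit = post[0]
        if len(head) < window and counts.get(subreddit, 0) < max_per_sub:
            head.append(post)
            counts[subreddit] = counts.get(subreddit, 0) + 1
        else:
            tail.append(post)
    return head + tail


# Posts as (subreddit, rank_by_raw_score), already sorted by raw score.
ranked = [
    ("funny", 1), ("funny", 2), ("funny", 3),
    ("pics", 4), ("science", 5), ("funny", 6),
]
print(cap_per_subreddit(ranked, max_per_sub=2, window=4))
```

With a cap of 2 and a window of 4, the third /r/funny post is pushed to the first slot after the window, which is the "shoot back up after #50" shape described above.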

~~~
jagger27
For those curious,

quant |kwänt| noun informal _a quantitative analyst_.

ORIGIN 1970s: abbreviation.

------
icesoldier
I'd like to see this kind of analysis done on /r/all, since it seems to more
closely operate like the author anticipated. The default front page is meant
to weight the subreddit like they discovered, but IIRC /r/all is strictly
based on score and time, as if everything was submitted under the same
subreddit.

------
mrfusion
Dumb question but what is the reddit front page? Isn't it customized for
everyone depending on what you're subscribed to? Or are there a ton of users
that never log in?

It seems like you wouldn't get much value out of reddit if you just view the
front page without logging in?

~~~
lbotos
Reddit has a default frontpage for people who don't have accounts. I lurked
for years and was just fine without actually creating an account. :)

~~~
mrfusion
Isn't the default page mostly cat pictures and jokes? I tried viewing it
logged out one time and I couldn't figure out who would like that?

~~~
oftenwrong
For many people, reddit is merely a source of internet junk food. They go
there for cheap, quick hits of mindless amusement.

See also: buzzfeed, 9gag, imgur, funnyjunk, etc.

~~~
visarga
I wish so hard I could filter out the junk posts. Just give me something to
read, a good article, a debate.

~~~
egypturnash
Log in, unsubscribe from all the default front-page subreddits, and put
together your own mix based on stuff you're interested in. The Reddit I read
by doing this is a pretty genteel and sensible place.

------
vuldin
This is an interesting study into how Reddit works, but I have to say that I'm
fine with the fact that Reddit is neither a meritocracy nor a democracy when it
comes to how posts make it to the front page. I'd rather not see any more
funny or aww posts on the front page than are already there (I can go to their
subreddits directly when I want to).

In fact, I'd rather see a variety of posts from subreddits I don't usually go
to or usually follow. I want to see things I don't already follow. Most posts I
end up liking are ones I find in the subreddits I visit. I'd like my front
page to give me posts from other unsubscribed subreddits so that I may end up
expanding which subreddits I'm subscribed to.

~~~
tomphoolery
> In fact, I'd rather see a variety of posts from subreddits I don't usually
> go to or usually follow.

There is a little line of text on the top of the reddit front page called
"trending subreddits"...I've found a bunch of cool stuff that way
[http://imgur.com/okDu5AN](http://imgur.com/okDu5AN)

But of course, the best way to find new subreddits is to read the first 5 or
so comments on a weird gif or picture. Someone will link to either the subreddit
that it came from, or a subreddit where it could have been posted.

~~~
Kalium
I'm pretty sure that those are human-selected rather than algorithmically
selected.

------
minimaxir
Relatedly, the Hacker News front page is not a meritocracy either. That's why
counterbalance tools like flagging exist. Additionally, dang is implementing
algorithms to identify articles that slip through:
[https://news.ycombinator.com/item?id=8157698](https://news.ycombinator.com/item?id=8157698)

EDIT: I misread the article's argument; there's a lot of luck on HN, but I
cannot confirm there's a _systemic bias_ toward people/topic.

------
devindotcom
One thing that immediately jumped to my head when I saw the falloff towards 50
and then the jump right after 50 was simply visual prominence. I would guess
that people naturally pay more attention to posts that appear at the top of
the page, whether it's page 1, 2, or 3. There's also a blip of attention at
the very bottom, since that post is also visually and conceptually distinct -
everyone looks directly at it because it's the last one, so it gets more
eyeballs and potentially quite a few people giving it a charity vote to "save"
it from dropping to the next page. Meanwhile people scan more quickly over
posts in the middle, perhaps (as I do) merely skimming over a couple words and
the score, to see whether it merits moving my eyes over the whole line
(because my time is _that_ valuable).

------
josefresco
Many years ago I had an article on my personal blog about web design climb
its way to the front page of Digg (I'm dating myself here) and Reddit.
Despite my server buckling rather quickly, I still saw at least ~250K visits
over a couple days (Digg then was slightly more popular) and to this day (5+
years later) that one first page post continues to deliver tens of thousands
of visits a year thanks to a rather healthy Google rank and a huge network of
incoming links from related tech blogs.

Reddit exposes you to a huge audience, who then in turn comment, link to, and
debate your post. Reddit's memory is short however, and within hours your post
will be gone. However the reverberating effects benefit you in many ways and
almost guarantee traffic for years if the topic is "evergreen".

------
eplanit
For both Reddit and HN, articles/posts live or die based on popularity, not
meritocracy nor karma.

~~~
minimaxir
High karma is itself a cause of popularity, however. (On HN, an article at
#1 will receive _much_ more activity than an article at #30.)

------
ejz
Very cool. Although, maybe another possible explanation is that people get
"tired" of reading a whole page and skip to the next page, resulting in a
top-of-the-page bias.

Also, love the use of R. R is beautiful.

~~~
xiongchiamiov
R is awful. It's full of decisions made by people who seem to not have much
programming experience, in that they seem good at the time but cause major
issues later on. See, for instance,
[http://www.talyarkoni.org/blog/2012/06/08/r-the-master-troll...](http://www.talyarkoni.org/blog/2012/06/08/r-the-master-troll-of-statistical-languages/),
[http://blog.revolutionanalytics.com/2008/12/use-equals-or-ar...](http://blog.revolutionanalytics.com/2008/12/use-equals-or-arrow-for-assignment.html),
[http://shape-of-code.coding-guidelines.com/2012/02/29/parsin...](http://shape-of-code.coding-guidelines.com/2012/02/29/parsing-r-code-freedom-of-expression-is-not-always-a-good-idea/),
and the necessity of [http://tim-smith.us/arrgh/](http://tim-smith.us/arrgh/)
(I wrote up some stuff about the *apply functions, but not yet in a form
suitable for the guide:
[https://github.com/tdsmith/aRrgh/issues/18](https://github.com/tdsmith/aRrgh/issues/18)).

------
Thoguth
Popularity isn't exactly a meritocracy either.

That is, while popularity is correlated with quality, the two are rarely
considered identical, and one is often a poor heuristic for the other.

------
Subi
Beautiful post, very interesting data.

The balance trying to be achieved can most simply be described as known good
content vs. discovery. I wouldn't call it uneven; it's more like "this is
interesting" vs. "we think you might find this interesting, but we're taking a
gamble because it has low visibility". I'm betting subreddits can move from
cluster to cluster over time as well, fairly frequently. Maybe an interesting
thing to try to track over the next month or two?

------
jelloPuddin
Just a thought: It looks like things toward the bottom of a page drop off in
popularity. Perhaps users are more engaged when looking at the top of the
page, clicking all the links, and perhaps less engaged towards the bottom,
skipping remaining links and just going to the next page. I imagine this would
only be seen by users who use the actual page rather than RES.

------
johnasb
Reddit is internet power - I can't recall a day in the last 5 years when I
haven't checked reddit. When people ask me what websites I read, I can't
recall any other than reddit. So funny. And I grew up with slashdot. People
don't even know what slashdot is anymore. Reddit is the internet explorer
button for me.

~~~
mtbcoder
A while ago I would have agreed with you (also having spent a lot of time
on Slashdot), but the quality of posts these days on the front pages keeps me
from returning on a regular basis. Yes, I know that with Reddit you need to
unsubscribe from the defaults and things like that, and there are quality
subreddits out there. But from my perspective, the front page used to contain
enough good content that one could casually scan through and find something
interesting without the hassle of signing up, maintaining subscriptions,
searching for relevant content and things like that.

At the end of the day, I'm just there to browse and not jump through a bunch
of hoops just to filter out junk that only appeals to the under 18 crowd.

------
JabavuAdams
There's no such thing as a meritocracy, unless you include ability to game the
system in your measure of merit. Either that, or every system is a
meritocracy, and you're just not trying hard enough to win.

------
redthrowaway
They talked about this eons ago. Maybe 3, 4 years? It's an attempt to
highlight smaller subreddits and hopefully boost subreddit discovery.

------
AngrySkillzz
Nice job. I guess the practical takeaway is, if you don't want to look at
funny pictures and animal GIFs, start on page 2.

~~~
ketralnis
Or just unsubscribe from /r/funny and /r/gifs. The default front page is total
nonsense, you have to find reddit for yourself.

------
kzrdude
Reddit doesn't have just one frontpage anymore, the defaults vary depending on
country.

------
gfunk911
Sounds like there are 50 subs that get spots 1-50, sorted mainly by vote
count.

~~~
kevincox
Not quite, as you can see multiple posts from the same subs in the top 50 and
it changes. They aren't fixed "slots" but just appear to be weighted like
that.

------
nevergetenglish
"I found out that reddit's front pages are not a pure meritocracy based on
votes, but that rankings depend heavily on subreddits"

("meritocracy" was in quotes). Clearly votes are not what people think of when
you speak about merit, but let it be.

------
johan_larson
Is _anything_ actually a meritocracy?

~~~
krapp
On the internet? No.

------
vinhboy
Good time for me to do my annual bitching about how much I hate HN comments.
So who made the most agreeable comment on this thread? I don't know...

------
ionwake
And again, this is why you should use
[http://www.sagebump.com/?info&view=technocrat](http://www.sagebump.com/?info&view=technocrat)
to manage your social aggregator news

------
_deh
Lovely piece of work.

------
Dewie
"meritocracy" is a pompous and politically charged word to use in this
context, but ok.

------
Siecje
On HN, how is it that some posts with 7 points make it to the front page but
others do not?

~~~
minimaxir
Posts with 7 points that get those points quickly (within a half hour of being
posted) will hit the front page due to the time portion of the algorithm.

Whether it stays there is another story.
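
As a rough illustration of the "time portion", here is a Python sketch of the commonly cited approximation of HN's ranking, `(points - 1) / (age_hours + 2) ** 1.8`. The formula and the `gravity` value are assumptions taken from that widely repeated approximation. The live ranking also applies penalties and moderation that are not modeled here, and `hn_rank_score` is a hypothetical name.

```python
def hn_rank_score(points, age_hours, gravity=1.8):
    """Approximate rank score: votes decayed by a power of the post's age."""
    return (points - 1) / (age_hours + 2) ** gravity


# A 7-point post that is half an hour old vs. a 100-point post from 12 hours ago.
fresh = hn_rank_score(7, 0.5)
older = hn_rank_score(100, 12)
print(fresh > older)
```

Under these assumptions the young 7-point post scores higher than the older 100-point one, but its score decays quickly unless votes keep arriving.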

------
adrianlmm
I feel the same about reddit. I only go to r/linux and r/android; I don't go
to r/news or r/technology anymore, let alone the front page. It's all liberal
bias, and I like impartial news.

~~~
Guvante
When you login you get to pick the subreddits that show up on the front page.
Getting rid of /r/politics was why I created a login and now I have a front
page with few "easy laughs" due to self selection (no /r/funny etc.)

------
freshflowers
No shit Sherlock. It's no secret that the front page is heavily weighted. It's
also subject to personalization, so basically no two redditors have the same
front page.

The default front page is just the landing page for newcomers to get a first
impression and a starting point for personalization.

An analysis of and comparison with /r/all would have been way more
interesting.

------
ThomPete
I left reddit years ago (and found HN) because, although I agree with mostly
liberal views, it became too much of an echo chamber for them; the
frontpage is the extreme example of that.

There are many great sub-reddits of course but it's not a place for political
discussions (in fact I am still looking for a good place to have political
discussions)

Edit: Why does my personal experience get downvoted?

~~~
mrfusion
It's odd that Ron Paul was popular on Reddit for a while though. How do you
explain that?

~~~
skwirl
Ron Paul was "popular" on almost every website on the internet because his
fanatical supporters flooded the web with an endless barrage of Ron Paul
support. Reddit, newspaper comments, other discussion boards, polls, you name
it, they were there to tell the world about Ron Paul. They were a very loud
minority.

~~~
eli
That sounds like a borderline conspiracy theory. If Ron Paul's popularity on
Reddit around 2008 was a facade, it was very convincing.

