

How Reddit ranking algorithms work - kopsai
http://amix.dk/blog/post/19588

======
kprobst
One of the problems with Reddit in my experience is the spam filter. What
happens is in the bigger subreddits (say, /r/pics or /r/funny) is that there
seem to be people around who downvote stuff regardless of content, presumably
to give their own posts a better chance. When your submissions consistently
end up with -1 or -2 scores, the spam filter then decides that you are a
spammer and you should be punished. Except in this case it's not for
submitting spam, but because other people want their own submissions to have
more visibility.

(As an aside, I obviously wouldn't be complaining about this if I had actually
tried to spam reddit, my few submissions to date match the typical fare there)

Once the spam filter 'hates' you (as they put it) then subsequent submissions
are 'ghosted' in the sense that you see them, but they don't appear in the
/new section of the subreddit. Assuming you get wind of this (no downvotes, no
comments, nothing) then you get to message the moderators, who may or may not
respond in good time and restore your submission. By that time your obvious
reaction is to delete and re-submit (because by now the post has drifted down
the submission sort order), which apparently makes the spam filter hate you
even more.

The problem I have with this is that by no fault of my own I ended up
submitting things that were flagged as spam and hidden from view without any
indication to me. You can see your posts, but no one else can. This might be a
clever solution to deal with systemic spammers, but it's really annoying to
normal users like me.

Karma games are fun and internet drama is lame of course; I don't lose sleep
over this. But it's unfair to people who want to share something with the
community. Of course you see reposts of lame memes that hit the front page and
wonder why the picture of your cute hedgehog submitted in good faith ends up
in the spam void.

There's supposed to be a user flag that tells the filter you're an approved
submitter for a given subreddit; good luck getting that, I guess. So
ultimately this makes me not want to submit anything. Right now reddit might
not care about that because it has such a large user base, but I would remind
them that Slashdot and Digg also thought they were invincible once.

~~~
ericd
Yeah, I think users should have a limited amount of downvote power to spread
around, and that each successive downvote diminishes the power of all their
previous downvotes in determining spamminess.

Ghosting really, really sucks for legitimate users. Craigslist does the same
thing, and normal people are devastated when they realize they've been
judged/victimized by a computer somewhere with no appeal.

~~~
landr0id
I like the Stack Overflow system of downvoting

    
    
      Downvotes remove 2 reputation from the post owner.
      Downvotes on answers remove 1 reputation from you, the voter.
      Downvotes on questions are free.
      You can vote 30 times per UTC day. You get an additional 10 votes on questions only.
    

On Reddit, people just tend to downvote when they don't agree with something.
That's really it. The only times that I've downvoted posts have been when it's
a repost from recent, and on comments I only downvote if it's something that's
hateful or doesn't contribute to discussion.

------
anthuswilliams
One thing I wish was different about the way Reddit's algo worked is the way
it treats downvotes. Basically, if I am average Joe redditor, there are two
potential reasons for a downvote:

1) The post is something I personally disagree with.

2) The post was written by a moron (for various definitions of the word
moron).

What I wish is that someone (smarter than myself) could come up with a way to
reliably determine the motive behind a downvote. I know that's no simple task.
But I actually want posts that are slightly more controversial to rank higher
than posts which don't generate a lot of downvotes, simply because I'm more
interested in reading something thought-provoking than checking out the latest
batch of meme-gen.

Basically I want comment rankings to look like this:

a) Lots of downvotes, few upvotes == total asshole. Maybe racist, maybe a
general troll, but I want this ranked at the rock bottom.

b) An equal number of upvotes vs. downvotes == flamebait. I'm not interested
in the latest holy wars, so we call this the next lowest in rankings.

c) Lots of upvotes, few downvotes == uninteresting. Rank it lower.

d) The ratio upvotes/downvotes is greater than 1, but less than some finite
number N. These are the stories I want to read.

I want a ranking system that works something like the above, albeit with more
finesse (e.g. perhaps we swap the ranking position of b and c). I'm just not
interested in the latest round of adorable kitten pictures, and on HN, too
much emphasis on the sheer number of upvotes manifests as groupthink. Also, I
want the downvotes that are received by slightly controversial
stories/comments (again, slightly controversial as opposed to PC-vs-Mac style
flamebait) to have a smaller effect on the poster's karma, so that people will
feel free to express disagreement on sites like reddit and HN.

~~~
dennisgorelik
Seeking controversial postings makes sense.

But why do you downvote controversial stuff then? Why not give upvote for
interesting postings and downvote for boring postings?

~~~
anthuswilliams
I, personally, don't downvote as a way to express disagreement. The average
redditor does. I wish it was reasonable to expect that the entire reddit
community uses downvotes only in a way I consider appropriate, but I don't
think it is.

~~~
dennisgorelik
I think every voter should get their own list of sorted stories and comments.
That list should form based on voter's upvotes and downvotes. Then readers
would quickly stop seeing postings they are likely to downvote and see more
postings that they are likely to upvote. Pretty soon people who express their
interest in controversy - would get controversial stories on top of their list
and people who express their interest in groupthink - would get lots of
stories re-confirming their groupthink.

------
amix
Did not expect this post from 2010 to hit HN... Happy to see that my content
is still useful :)

To add something new to the discussion I think HN should use Reddit's comment
sorting algorithm (the confidence sort). It produces way better results than
the current algorithm since it will rank the best comments highest regardless
of their submission time. Would probably produces an even better comment
section here on HN.

~~~
apu
Can't find a reference for this at the moment, but I think PG has since
modified the HN ranking code a bit to put interesting stuff at the top. (This
is since his last arc dump, so I don't know exactly what it is.)

Also, I think the situation on HN is quite different from reddit because vote
totals aren't shown here anymore. It's not clear that you necessarily want to
use the same ranking system as on reddit, where current vote counts affect how
people vote in the future.

------
sniW
Note that this article is from 2010. The most recent code for the Reddit
ranking algorithms is here:

[https://github.com/reddit/reddit/blob/master/r2/r2/lib/db/_s...](https://github.com/reddit/reddit/blob/master/r2/r2/lib/db/_sorts.pyx)

It looks like they've tweaked the confidence sort algorithm a bit.

------
buddydvd
I wrote a simple jsfiddle page that show you the current stories in /r/all and
their associated ratings as calculated by the hot algorithm:

<http://jsfiddle.net/6FVd3/embedded/result/> Source:
<http://jsfiddle.net/6FVd3/>

One thing I noticed is that the stories aren't sorted by the algorithmic score
in strict descending order. I recall some mentioned that reddit mangles the
ups/downs score exposed via their API. Is this the reason?

------
sirclueless

        This has a big impact for stories that get a lot of upvotes and
        downvotes (e.g. controversial stories) as they will get a lower
        ranking than stories that just get upvotes. This could explain why
        kittens (and other non-controversial stories) rank so high :)
    

That actually seems backwards to me. A controversial story (that is, one that
triggers a lot of votes both negative and positive) would do just fine so long
as it continues to receive more upvotes than downvotes.

That's not to say that downvotes don't put a damper on things -- if a story
gets 80% upvotes and 20% downvotes it will get 75% the score it would without
downvotes, which on a log scale is 87.5% as high a ranking.

But the point is that things are still roughly linear -- you only need to stay
positive (> 50% upvotes) to accrue score, and you will do so linearly. Maybe I
am underestimating how competitive scores are, and how important upvote rate
is. Maybe a small linear damper on upvote rate has a large effect relative to
other content? But it definitely seems like this system encourages
interesting, controversial content over bland, well-received content.

------
taylorbuley
Here's a SQL implementation:
[https://github.com/reddit/reddit/blob/master/sql/functions.s...](https://github.com/reddit/reddit/blob/master/sql/functions.sql)

Interesting here to also see their "controversy" algo

------
Geee
I've implemented this in one project with the addition that allows entries to
'keep afloat', or resurface if they are suddenly voted a lot. So, the behavior
is more like what you would see in 'Popular now' list on Youtube, without
having to compute running averages.

The way I did it was to update the post time with fractional of the difference
of the old timestamp and current time, i.e:

    
    
        post_time += (current_time - post_time)/10
    

That way, if the post is really old, voting gives a decent boost to it, but if
it's more recent, it has negligible effect.

------
untog
Fascinating to see how HN and Reddit differ. I wonder if anyone can provide an
analysis on which one is more appropriate for different scenarios, and why. I
certainly have no clue.

~~~
redthrowaway
Randall does a pretty good rundown of why the old comment ranking algorithm
(which is similar to HN's) was bad here:

[http://blog.reddit.com/2009/10/reddits-new-comment-
sorting-s...](http://blog.reddit.com/2009/10/reddits-new-comment-sorting-
system.html)

I wonder if pg would consider implementing the confidence sort on HN. It's
certainly been effective on reddit.

------
creamyhorror
I always like applying statistical theory to things like estimating true
average ratings (Bayesian weighted), and logarithms to scores that increase
exponentially. Applying suitable functions to transform values is just a great
way to fine-tune the measures we use in our sites and give exactly the sort of
behavior we want.

Thanks for a great link.

------
bryogenic
One thing that wasn't mentioned was that high scoring submissions are
artificially deflated in their viewed score so that the highest scores can get
are ~3000 votes. This keeps the massive site seeming less gargantuan and makes
your vote feel like it counts.

------
aprescott
I think I might be missing something about the effect of submission time as
described. t_s is a fixed value, and the post says, "The score won't decrease
as time goes by, but newer stories will get a higher score than older." So why
does the graph[1] show a decreasing scoring as "submission time ago" increases
with up- and down-votes fixed? t_s is fixed!

Maybe it's been too long a day. Can anyone explain?

[1]: <http://amix.dk/uploads/reddit_score_time.png>

~~~
joshuahedlund
The only thing I can think is that it means newer stories continuously start
out with higher scores. So a top story last week might have scored 3000, an
equivalently top story this week scores 3100, next week's will score 3200, and
on to infinity.

Not sure if that's what it actually means or not, though.

~~~
aprescott
I'm an idiot, I guess. I actually understood all along. I recognised that the
initial score for new submissions would be monotonically increasing, but the
graph's x axis was throwing me, by making it seem as though a single story
over time had its score dropping, but that's not it. If it were flipped
horizontally, I think that would be more intuitive, maybe, since it might
capture "these later (further-to-the-right) bars are newer stories" instead of
stepping back. Or maybe not, and I'm just an idiot. :)

~~~
0x09
It's right here in the code, with this odd fixed timestamp: seconds =
epoch_seconds(date) - 1134028003 Which happens to be December 8 2005 -- a few
months after reddit began and probably the date this scoring system went into
effect.

------
clux
I wrote a small npm module containing this algorithm as well as a few other
famous ones a while ago for anyone interested in a quick JS implementation.
<https://github.com/clux/decay>

------
AceJohnny2
Ranking in sites like Slashdot, Reddit, and HN is a difficult and fascinating
problem. I wasn't aware of the difference between Reddit's ranking methods,
nor of Munroe's involvement. Thanks for the input!

~~~
NameNickHN
If you mean difficult in the sense that it probably took some time to find the
right algorithm, than you're probably right. But the algorithm itself is
rather simple. I have looked at the reddit ranking algorithm and duplicated it
for a project of mine. It took some iterations but the result is rather
simple.

------
twfarland
To understand this, I translated it into Racket:
<https://gist.github.com/2214978>

------
SpiderX
It is interesting, but I don't think as implemented, the Wilson system is the
best. If an article has over 200 comments, you'll (by default) only see the
"best" 200 (max 500), so those will be more likely to receive the most votes.
It's their default paging that is the problem.

