
Deriving the Reddit Formula - jmduke
http://www.evanmiller.org/deriving-the-reddit-formula.html
======
raldi
Note: The author of this piece wrote the original analysis
([http://www.evanmiller.org/how-not-to-sort-by-average-
rating....](http://www.evanmiller.org/how-not-to-sort-by-average-rating.html))
that directly led to my advocating for reddit's "best" comment sort, including
to xkcd's Randall Munroe, who managed to finally convince the rest of Team
Reddit to make the change ([http://www.redditblog.com/2009/10/reddits-new-
comment-sortin...](http://www.redditblog.com/2009/10/reddits-new-comment-
sorting-system.html)).

So when the author says, "I realize that proposing any change to how Reddit
works is one of the Internet's most dangerous games," he should be informed
that it's actually a game he's won before.

~~~
Fando
Wow, I didn't know that. I was going to say that a long time ago I randomly
read Evan's article about 'how not to sort by average rating'. Good work.

------
jsnell
Do the conclusions change if we assume that the goal is not to optimize for
value to the user, but for value to the company? Specifically, it seems that
using the vote difference rather than the vote ratio would help controversial
stories rank high. Controversial stories in turn are great at driving lots of
discussion, which means lots of visits and revisits to the comment section.

(Compare to the apparent HN policy of discouraging controversy, both by the
large effect of flagging and through the flamewar detector).
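
The difference-versus-ratio point above can be illustrated with a small sketch. The vote counts and the names `vote_difference` and `vote_ratio` are made up for illustration; they are not taken from Reddit's code or from the article.

```python
# Hypothetical vote counts, chosen only to illustrate the argument.
stories = {
    "controversial": (1000, 900),  # (upvotes, downvotes)
    "well_liked": (60, 5),
}


def vote_difference(ups, downs):
    """Net votes: rewards sheer volume, even when opinion is split."""
    return ups - downs


def vote_ratio(ups, downs):
    """Fraction of votes that are upvotes: rewards agreement."""
    total = ups + downs
    return ups / total if total else 0.0


by_difference = sorted(
    stories, key=lambda name: vote_difference(*stories[name]), reverse=True
)
by_ratio = sorted(
    stories, key=lambda name: vote_ratio(*stories[name]), reverse=True
)

print(by_difference)  # the heavily contested story comes first
print(by_ratio)       # the broadly liked story comes first
```

With these numbers the contested story wins on difference (100 versus 55) and loses on ratio (about 0.53 versus about 0.92), which is the effect the comment describes.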

~~~
tedunangst
Possibly short term, but if you build a site around the kinds of people who
like flame wars, your advertisers may notice they're getting very little
traction from your site. Page views go up, but uniques go down.

~~~
gohrt
Have you met Reddit? They love flamewars.

~~~
nostrademons
Yishan posted a couple hours ago that Reddit's business model is built upon
flamewars. Basically get people heated up and then they gild the comments they
agree with.

[https://np.reddit.com/r/announcements/comments/3dautm/conten...](https://np.reddit.com/r/announcements/comments/3dautm/content_policy_update_ama_thursday_july_16th_1pm/ct3ryvn)

...come to think of it, this explains a lot of the Reddit drama. The Reddit
Gold was flying left and right during the black-out (despite pleas not to give
any money to the site), and then every time an admin, CEO, former CEO, or
board member pours gas on the fire, their revenue goes up. It's brilliant!
They've figured out how to make money off of hurt feelings and angry people,
and now have an incentive to cause as much drama as possible.

~~~
RickHull
Ha, this is great, in a not-so-great but kinda-great way!

------
tim333
>This resolves one of the original mysteries — why the current time doesn't
appear in Reddit's formula.

The current time did appear in the formula of very early Reddit, but they
figured it was computationally much faster to inflate the scores for new
stories than to go through the database reducing the scores for loads
of old stories.
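
A minimal sketch of why the two approaches give the same ordering, assuming a log-of-votes term plus a linear time term. The constant `DECAY_SECONDS`, the function names, and the sample stories are assumptions for illustration, not Reddit's actual code.

```python
import math

DECAY_SECONDS = 45000.0  # assumed time scale, for illustration only


def decayed_score(net_votes, submitted, now):
    """Penalize age: depends on `now`, so every stored score goes stale."""
    return math.log10(max(net_votes, 1)) - (now - submitted) / DECAY_SECONDS


def inflated_score(net_votes, submitted):
    """Reward recency: no `now`, so a score changes only when votes do."""
    return math.log10(max(net_votes, 1)) + submitted / DECAY_SECONDS


# (name, net votes, submission time in seconds), all hypothetical.
stories = [
    ("old_popular", 500, 0.0),
    ("mid", 40, 50000.0),
    ("new", 3, 120000.0),
]


def ranking_decayed(now):
    ordered = sorted(
        stories, key=lambda s: decayed_score(s[1], s[2], now), reverse=True
    )
    return [name for name, _, _ in ordered]


def ranking_inflated():
    ordered = sorted(
        stories, key=lambda s: inflated_score(s[1], s[2]), reverse=True
    )
    return [name for name, _, _ in ordered]


print(ranking_decayed(130000.0))
print(ranking_inflated())
```

The two scores differ only by `now / DECAY_SECONDS`, which is the same for every story at a given moment, so the sort order is identical while the inflated form never needs a database-wide update.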

~~~
keysersosa
Confirmed, but the time-dependent model didn't last for long. I seem to recall
switching to the current model (with slightly different constants) during the
switch to Python in late '05.

That said, one of the early "mistakes" we made was not noticing the obvious
step of taking the log of the exponential form of the formula, which meant we
had only a couple hundred days before we started to run afoul of
the max float size in PostgreSQL. I also seem to recall coming up with that
obvious trick with only a few days to spare.
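
A rough sketch of the overflow problem and the log trick, using Python floats (IEEE doubles). The exponential form below, the constant `TIME_SCALE`, and the function names are assumptions for illustration; the original formula and its constants are not given in this thread.

```python
import math

TIME_SCALE = 45000.0  # assumed seconds per unit of score, illustrative only
DAY = 86400.0


def exponential_score(net_votes, submitted):
    """Assumed exponential form: votes multiplied by exp(time)."""
    return max(net_votes, 1) * math.exp(submitted / TIME_SCALE)


def log_score(net_votes, submitted):
    """Log of the form above: same ordering, but grows only linearly."""
    return math.log(max(net_votes, 1)) + submitted / TIME_SCALE


def overflows(submitted):
    """True if the exponential form cannot be represented as a float."""
    try:
        exponential_score(1, submitted)
    except OverflowError:
        return True
    return False


print(overflows(100 * DAY))      # still representable
print(overflows(400 * DAY))      # math.exp raises OverflowError
print(log_score(10, 400 * DAY))  # the log form is an ordinary number
```

Because the logarithm is monotonic, sorting by `log_score` gives the same order as sorting by `exponential_score` wherever the latter is still representable.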

~~~
jedberg
I'm really glad you're here, because I was about to comment that you're the
only person I know who understands this stuff.

(For those who don't know, keysersosa was the guy who wrote the current hot
algorithm.)

------
studentrob
Cool. But why is an outside researcher writing about this? Where is the
research from within Reddit? They have as unique a dataset as Facebook, Google
or Yahoo. They should do something cool with it and share it, like this guy.

------
nicboobees
Don't forget the group of users who can move stories around by adding or
removing votes from them at will. I'm sure there's been a _lot_ of "Put this
on the front page please" favors done for friends.

------
Jack000
I think this type of formula assumes a uniform list of links? Whereas I'd
expect a discontinuity at the end of the front page. I wonder what it would
look like if it took that into account, maybe the q term could incorporate the
page number somehow.

------
helmett
The biggest problem is that reddit's voting system is still pretty easy to
game.

------
brajesh
What algo does HN use?

~~~
tim333
2013 version:

[http://www.righto.com/2013/11/how-hacker-news-ranking-
really...](http://www.righto.com/2013/11/how-hacker-news-ranking-really-
works.html)

