

Reddit implements "best" comment sorting based on score confidence intervals. - zck
http://blog.reddit.com/2009/10/reddits-new-comment-sorting-system.html

======
zck
For more technical information, look at the page Randall linked to:
[http://www.evanmiller.org/how-not-to-sort-by-average-rating....](http://www.evanmiller.org/how-not-to-sort-by-average-rating.html)
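
For anyone who wants to play with it: the linked article ranks items by the lower bound of the Wilson score confidence interval on the fraction of positive votes. Here is a minimal Python sketch of that bound. The function name, the argument names, and the default `z=1.96` (roughly 95% confidence) are my own assumptions, not taken from reddit's code.

```python
import math

def wilson_lower_bound(upvotes, downvotes, z=1.96):
    """Lower bound of the Wilson score interval for the upvote fraction.

    z is the normal quantile for the chosen confidence level;
    1.96 (about 95%) is an assumed default.
    """
    n = upvotes + downvotes
    if n == 0:
        return 0.0  # assumption: an unvoted comment gets the lowest bound
    p = upvotes / n          # observed fraction of upvotes
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)
```

With few votes the bound stays low even if every vote is positive, so a comment with 1 upvote and no downvotes sorts below one with 100 upvotes and no downvotes.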

------
hxa7241
This depends rather on what the purpose of scoring is. If it is to filter
one's reading, there is an inherent contradiction:

* the reader wants some filtering, so they can just read the good stuff

* the filtering is done by the readers, which requires they read more than just the good stuff

How effective can this ever be? It seems a weakness in all
public-contribution-based systems (including Google search to some extent,
because of PageRank...).

 _(I posted this thought somewhere before, but I don't know whether it wasn't
thought any good or whether hardly anyone read it.)_

~~~
diN0bot
furthermore, it means instead of a personal filter you get a mob filter. kind
of sucks.

at least reddit has categories.

------
antirez
I like this trivial way to compute scores for similar applications:

    
    
        score = upvotes*(upvotes/(upvotes+downvotes))
    

Edit: some example output follows.

    
    
        10 up, 10 down = 5.0
        50 up, 50 down = 25.0
        100 up, 0 down = 100.0
        70 up, 30 down = 49.0
        2 up, 1 down = 1.3333
        10 up, 20 down = 3.3333
    

This removes some of the bias due to time, but keeps in the game the idea that
a lot of votes is still a hint of interest. Ah, and the math is trivial ;)

Note that this is just the vanilla formula; you can easily alter the weight of
the different parameters by changing the math a bit.
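
A minimal Python sketch of the vanilla formula, in case someone wants to try it. The function name and the handling of the zero-vote case are assumptions; the formula above does not define that case.

```python
def score(upvotes, downvotes):
    """Upvotes scaled by the fraction of votes that are upvotes."""
    total = upvotes + downvotes
    if total == 0:
        return 0.0  # assumption: no votes at all means a score of zero
    return upvotes * (upvotes / total)

# Reprint the example rows listed above.
for up, down in [(10, 10), (50, 50), (100, 0), (70, 30), (2, 1), (10, 20)]:
    print(f"{up} up, {down} down = {score(up, down):.4f}")
```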

Needless to say, this is only the first step toward the actual sorting. You
may not want to order by score but by rank, where rank is something like this
(if the items you are sorting must be fresh):

    
    
        rank = score / time^alpha
    

alpha is the "obsolescence factor".
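
As a rough Python sketch of the decay step: the time unit (hours), the default `alpha=1.5`, and the names are all assumptions chosen for illustration. The age has to be positive, otherwise the division is undefined.

```python
def rank(score_value, age_hours, alpha=1.5):
    """Divide a score by age raised to alpha, so older items sink.

    age_hours and the default alpha are illustrative assumptions.
    """
    if age_hours <= 0:
        raise ValueError("age_hours must be positive")
    return score_value / (age_hours ** alpha)
```

A larger alpha makes old items drop faster; alpha = 0 switches the time decay off and leaves the plain score.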

~~~
zck
I'm not sure if this is actually better.

    
    
        up   down  score
        100   100   50
         20     0   20
        100   200   33
        100   300   25
        100   400   20
    

Wouldn't you want a comment with 20 upvotes and no downvotes to rank above a
comment with 100 upvotes and 300 downvotes? It gets even worse for higher
numbers:

    
    
        up   down  score
        200  1800   20
        300  4200   20
        400  7600   20

~~~
antirez
You can change this factor by modifying the equation a bit. In the vanilla
version the number of votes does play a very important role indeed, but it's
simple to hack it to avoid this problem, for instance by using the logarithm
instead of making everything linear like I did.
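
One possible reading of the logarithm idea, as a Python sketch: damp the raw upvote count with a log and keep the upvote fraction linear. This exact form is an assumption made for illustration, not a formula anyone settled on here.

```python
import math

def log_score(upvotes, downvotes):
    """Upvote fraction weighted by log(1 + upvotes) instead of raw upvotes."""
    total = upvotes + downvotes
    if total == 0:
        return 0.0  # assumption: no votes at all means a score of zero
    return math.log1p(upvotes) * (upvotes / total)
```

With this variant, 20 up and 0 down scores about 3.04, while 100 up and 300 down scores about 1.15, so the unanimous comment ranks first.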

------
ryandvm
You know what would be another good sorting comparator? Conversationality.

A comment that sparked a lot of conversation (has a lot of children) should be
a good candidate for floating towards the top as well.

~~~
csbrooks
Not sure about that. This would tend to promote flame wars, and (on reddit)
pun threads.

------
johnwatson11218
I hope this can lead to more long lasting discussions. Sometimes I find a
thread late and want to add something but it seems pointless when the last
comments were added more than a day ago. I think this site could be better if
there was a way to keep having the threads active and maybe summarized and
condensed every so often.

------
fleaflicker
was this implemented on hn as well? i noticed a change a few weeks ago.

~~~
antirez
I'm not sure, but from the pattern I see with the naked eye I guess it's
something like this:

rank = (upvote-downvote)/some_time_dependent_stuff

