
What Percent of the Top-Voted Comments in Reddit Threads Were Also First Comment? - minimaxir
http://minimaxir.com/2016/11/first-comment/
======
Houshalter
There is a thing in decision theory called the exploration/exploitation
tradeoff. Exploitation is always choosing the option that is estimated to be
best. Exploration is choosing other options to see if they might be even
better. In this case showing new comments is exploration.

Hacker News has a nice solution to this problem. New comments appear at the
top of the thread, and then slowly fall unless they get votes. I think this
works very well, although I think it's still a bit too far toward the
exploitation side of the tradeoff.

I've thought about how to solve this problem, and I think the best solution is
votes/time since the comment was posted. New comments will immediately appear
at the top, since they have infinite vote/time ratio. But they will quickly
fall to their correct ranking.
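A minimal sketch of the votes/time idea, for illustration only. The names `Comment`, `rate_score`, and `rank_by_rate` are hypothetical and not taken from any Reddit or HN code. It also assumes every comment starts at 1 point (the author's implicit vote), so that a brand-new comment really does have an unbounded ratio rather than 0/0.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Comment:
    text: str
    points: int       # assumption: starts at 1 (the author's implicit vote)
    posted_at: float  # seconds since an arbitrary epoch


def rate_score(comment: Comment, now: float) -> float:
    """Points per second since posting; infinite for a zero-age comment."""
    age = now - comment.posted_at
    if age <= 0:
        return math.inf
    return comment.points / age


def rank_by_rate(comments: List[Comment], now: float) -> List[Comment]:
    """Sort comments so the highest points-per-second comes first."""
    return sorted(comments, key=lambda c: rate_score(c, now), reverse=True)


if __name__ == "__main__":
    now = 10_000.0
    thread = [
        Comment("old and popular", points=120, posted_at=now - 7_200),
        Comment("old and ignored", points=2, posted_at=now - 7_200),
        Comment("just posted", points=1, posted_at=now),
    ]
    for c in rank_by_rate(thread, now):
        print(c.text, rate_score(c, now))
```

The just-posted comment sorts first here, and its score drops toward its long-run rate as its age grows.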

There are other solutions, like treating it like a multi-armed bandit problem.
And you can use something like
[https://en.wikipedia.org/wiki/Thompson_sampling](https://en.wikipedia.org/wiki/Thompson_sampling)
. Create a probability distribution for the upvote/downvote ratio for every
comment. Sample from that distribution, and rank the comments accordingly.
This should be close to optimal, but it requires people to use the downvote
button lots (because it measures comment quality by upvote/downvote ratio,
which is what reddit does currently.) Without downvotes, you could use time
again, and create a probability distribution for votes/time.
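A rough sketch of the Thompson sampling variant with upvotes and downvotes. It assumes a uniform Beta(1, 1) prior on each comment's upvote ratio, which is one common choice and not something Reddit is known to use. The function name `thompson_rank` and the vote counts are made up for illustration.

```python
import random
from typing import Dict, List, Tuple


def thompson_rank(votes: Dict[str, Tuple[int, int]], rng: random.Random) -> List[str]:
    """Rank comment ids by one random draw each from Beta(1 + ups, 1 + downs)."""
    draws = {
        cid: rng.betavariate(1 + ups, 1 + downs)
        for cid, (ups, downs) in votes.items()
    }
    # Highest sampled upvote ratio first.
    return sorted(draws, key=draws.get, reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical (upvotes, downvotes) per comment.
    votes = {"established": (90, 10), "weak": (5, 20), "new": (0, 0)}
    trials = 2000
    new_on_top = sum(thompson_rank(votes, rng)[0] == "new" for _ in range(trials))
    print(f"'new' ranked first in {new_on_top} of {trials} page loads")
```

Because the ranking is re-sampled on each page load, a comment with no votes still gets the top slot some of the time (on the order of one load in ten with these numbers), and that share shrinks or grows as votes narrow its distribution.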

~~~
cjlars
I would add that while it's plausible, it's by no means obvious that Reddit is
leaving business value on the table. Imagine that they seek to minimize the
bounce-rate on any thread such that they maximize overall time on site for
each user. In this case, the important part could be that the top comment is
'not wrong' rather than 'right'. If the first thing users see is too factually
incorrect, controversial, or emotionally upsetting, people could be driven
away from the site. Essentially this is the same mechanism that helps
cause the Facebook news bubble effect -- people leave if you tell them what
they don't want to hear, so you write an algorithm that underweights anything
that might do so.

To put it into economic terms, if users have high loss aversion (from being
upset, etc.) relative to their thirst for knowledge gained from great
comments, you would expect exploitation to be overweighted vs exploration. The
business value maximizing solution would then be that once an appropriate top
comment is found, the thread should tend to stay in that equilibrium.

~~~
Houshalter
I really doubt there is any conscious decision by reddit to do this. For years
there was a major bug in their ranking algorithm that hid posts with negative
scores. People invented reasons why this was intentional and why reddit might
want to do that. But eventually they fixed the bug and admitted it wasn't
intentional.

The current comment sorting algorithm was just copied off the internet from
some popular HN post. I don't think there was that much thought put into it.

Showing a few new comments closer to the top is not going to hurt anyone. I
think if anything the massive improvement in comment quality would make users
more loyal to the site. At the moment you can only see the good comments
posted by the first 20 people or so, imagine if it was the best comments out
of 1,000?

There are vastly more voters than commenters. If only one out of 5 people sees
a new comment and votes on it, that would be more than enough to improve the
comment quality 3 fold.

------
cousin_it
Back when I was a regular on LessWrong and it had an active community, it
didn't have that problem at all, despite using a variant of the Reddit
codebase. Their solution was pretty simple, just show the five most recent
comments in the right hand column. People would interact with them, check out
their parent comments, and threads would grow organically regardless of age.
In fact people would often come back to very old threads and revive them. It
might be tricky to scale that to larger communities though.

~~~
nagvx
Yes, this non-linear idea seems very interesting. I have read several
discussions on various boards with much hand-wringing about the best way of
sorting comments - by time, by votes, by user rep, by some sort of weighted
average of the above? Suddenly things get rather complicated as people try to
decide how the weights should be balanced. But this idea of showing multiple
different feeds side-by-side is very refreshing. Two (or even three?) feeds
with different weightings (and relative sizes!) could provide a better
balance than what we are used to.

(As an aside, it would be entertaining to take it to an extreme - seeing the
top voted comments also shows you the bottom voted, most controversial also
shows least, etc. A sort of anti-echo-chamber measure.)

~~~
Houshalter
I think LessWrong's solution only works because they have a small community,
and they have a lot of comment junkies that want to read every new comment.
Reddit and HN also have new comment feeds
([https://www.reddit.com/r/all/comments/](https://www.reddit.com/r/all/comments/)
[https://news.ycombinator.com/newcomments](https://news.ycombinator.com/newcomments)),
but I don't think anyone but bots read them. Granted they could be made more
prominent like on LessWrong.

~~~
cousin_it
Many good subreddits are similar to LW in size and might benefit from a
similar UI change.

------
cs702
Makes sense, due to a self-reinforcing dynamic: early comments that are good
enough to gather upvotes will stay near the top of the page, gathering more
upvotes.

I suspect HN is subject to the same "first comment effect," often to the
detriment of later comments that might be more deserving of the top spot.

Does anyone here have ideas for fixing this?

~~~
joezydeco
I've always wondered why the karma of the upvoter isn't weighed into the
equation.

If someone like tptacek (with 259K karma) votes my comment up, I think that's
a lot more relevant than someone with 300 karma.

~~~
JoshTriplett
I think that would have a stronger tendency to create an echo chamber.

If anything, I'd suggest the reverse: it should count for more to have more
votes from a less tightly coupled portion of the graph. As in, cluster the set
of users based on various types of interactions, and include a multiplicative
factor in scoring based on the distinctness of the users upvoting something.
That way, you can get a modest score by appealing strongly to a small subset
of users, but a massive score only by appealing to many distinct sets of
users.
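One very rough way to sketch that scoring, assuming the clustering has already been done somewhere else and is handed in as a user-to-cluster mapping. The function `diversity_score`, the multiplier (number of distinct clusters), and the sample users are all hypothetical; a real system would need a less gameable factor.

```python
from typing import Dict, Iterable


def diversity_score(upvoters: Iterable[str], cluster_of: Dict[str, str]) -> int:
    """Upvote count multiplied by the number of distinct clusters that upvoted.

    Every upvoter must appear in `cluster_of`; the clustering step itself is
    out of scope for this sketch.
    """
    voters = set(upvoters)  # count each user once
    clusters = {cluster_of[user] for user in voters}
    return len(voters) * len(clusters)


if __name__ == "__main__":
    # Hypothetical cluster assignments.
    cluster_of = {
        "a1": "rust", "a2": "rust", "a3": "rust",
        "b1": "econ", "b2": "econ",
        "c1": "bio",
    }
    niche = ["a1", "a2", "a3"]   # three votes, one cluster
    broad = ["a1", "b1", "c1"]   # three votes, three clusters
    print(diversity_score(niche, cluster_of))
    print(diversity_score(broad, cluster_of))
```

With the same number of upvotes, the comment that draws votes from three clusters scores three times as high as the one upvoted by a single cluster.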

~~~
ThrustVectoring
It'd also automatically damp the effects of voting rings.

------
gnicholas
This is surprising, and it could mean that early commenters have their voice
heard more than later commenters. But it could also just mean that topics
raised by early commenters end up being discussed in those threads, and
commenters on both sides have their voices heard.

For example, if the first comment is roughly "I read the linked article and
disagree because of A, B, and C" then someone with an opposing viewpoint would
probably reply to the existing first comment instead of creating a separate
comment.

This doesn't mean that the second commenter's voice isn't heard—it just means
it all happens under the thread created by the first commenter.

~~~
asddddd
What if you consider it as an optimization problem for the late commenter? You
have to choose where to insert your comment, most likely there are multiple
options.

A root-level comment when there are already dozens is unlikely to be noticed;
there is significant momentum behind the top comments, usually for good reason
- whether jokes, clever insight, or a popular viewpoint.

Instead, if you reply near the top in a context where it makes _some_ sense,
you skip the root-level graveyard and are almost guaranteed good placement
with the downside of "overhead" due to the upwards chain always appearing
first.

Consider also that the penalty for being the 2nd, 3rd, or worse comment at
root-level gradually increases due to the increasingly nested replies to each
highly rated comment. If you assume root-level comments express opinions,
minority opinions are also more likely to maintain good position in a reply
comment such as the one you posited.

Lastly, "trickle down karma" could be a factor. A quality reply to a root-
level comment could enhance the parent comment's value and result in
additional karma for them. Such situations could include biases like assuming
comments which merited replies are _interesting_ due to peer validation and
thus reading them, and more fundamental added value such as uncited sources
being given.

~~~
minimaxir
There is a "hijacking the top post" meme that pops up occasionally where an
important reply is made to the top-ranking comment, since it would get more
exposure. (Rare, since in most cases it would be seen as spam if unrelated to
the parent)

------
tgb
I do disagree with one point: that there would be no correlation between time
posted and upvotes if the reddit algorithm were completely fair. If I post a
link to r/tipofmytongue asking "What was that low-budget movie that had
engineers inventing a time machine in it?" and the first comment correctly
tells me I was thinking of Primer, then it'll be highly upvoted. But later
commenters won't get to grab that easy karma since it's already answered.
There are only so many "great" things to say in response to most posts. Also,
if someone posts in a thread after it has dropped off the front page, they'll
get fewer upvotes regardless of what the algorithm does.

~~~
minimaxir
Not all subreddits behave that way, which is one of the reasons why I looked
at 100 different subreddits.

The subreddits not posted as exceptions in the article typically followed the
global trend.

~~~
tgb
I forgot to say that it was a great post overall, and I enjoyed reading it.
Very interesting!

------
bagrow
What's the weird bump around rank = 30?

Edit: It's probably related to the cutoff of 30 comments, but it's still not
obvious to me...

~~~
KayEss
The reason is most likely that the top and bottom of the page are easiest to
find so the comments at the bottom have a little more visibility than those in
the middle.

------
gggggggg
I think HN somewhat suffers from this also: not always first, but early.

Some of the best HN comments can come late and never make it to the top.

~~~
minimaxir
The HN algorithm changed very recently, which was one of the reasons I had the
idea to take a closer look at the Reddit data. HN, within the past few months,
will highlight new comments within the first 2 slots even on busy threads.

I would perform the same analysis on HN data to confirm if I could. (HN does
not expose comment scores)

------
baccheion
Well, this is one reason to consider normalizing up/down votes by views. Is
there any data on the number of views each comment received (number of times
each was scrolled into view, for example)?

You'd also have to factor in the likelihood of each person up/down voting (if
it's viewed by many people who weren't going to vote anyway, then no votes are
nothing out of the ordinary).
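A small sketch of both normalizations, assuming per-comment view counts and per-viewer voting rates were available (neither is exposed by Reddit or HN as far as this thread establishes). The function names and the numbers are hypothetical.

```python
from typing import Sequence


def votes_per_view(ups: int, downs: int, views: int) -> float:
    """Net votes divided by the number of times the comment was viewed."""
    if views <= 0:
        return 0.0
    return (ups - downs) / views


def votes_per_expected_voter(
    ups: int, downs: int, viewer_vote_rates: Sequence[float]
) -> float:
    """Net votes divided by how many viewers were expected to vote at all.

    `viewer_vote_rates` holds, for each viewer, an assumed historical
    probability (0..1) that this person votes on a comment they see.
    """
    expected_voters = sum(viewer_vote_rates)
    if expected_voters <= 0:
        return 0.0
    return (ups - downs) / expected_voters


if __name__ == "__main__":
    # A top comment seen by many people vs. a buried one seen by few.
    print(votes_per_view(ups=30, downs=10, views=1000))
    print(votes_per_view(ups=3, downs=1, views=50))
    # Ten viewers who each vote about half the time.
    print(votes_per_expected_voter(5, 0, [0.5] * 10))
```

In this example the buried comment has far fewer votes but a higher votes-per-view figure, which is the effect the normalization is meant to surface.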

------
gregw134
The same thing happens with search engines as well, where items at the top of
the search results tend to stay at the top because they're purchased or
clicked on at a higher rate. Can anyone chime in with what they've done to fix
this in a search context? Has anyone had success with a multi-armed bandit or
randomization approach?

------
dd367
Great read. I think the title "On Reddit, the earlier you comment, the louder
your voice." or something to that tune would've made for a more impactful
headline!

------
zokier
I think this analysis misses some subtlety of smaller subreddits by focusing
on the huge mass subreddits. As such I don't know if the results are at all
applicable for my experience when for example >80% of the threads currently on
my frontpage have <30 comments total (and as such would be cut off from the
analysis). And I don't think I'm such a total outlier here, the long tail of
reddit is really long.

------
dmfdmf
I wish HN had a "sort by new" option because on popular threads many of the
best comments come later and get buried.

~~~
et-al
That's the problem with threaded views, as opposed to the flat-view style of
other forums. Granted with the latter system, the first few comments on a new
page will always get more exposure, too.

------
lifeisstillgood
This is not a surprise - it is I guess inevitable and intuitively understood -
I mean who bothers commenting on a thread here that has 500 earlier comments?
There is zero chance of anyone finding, reading, or replying to it.

I like LessWrong's idea of using the right-hand column.

------
nix0n
tl;dr: 17.24%

~~~
minimaxir
For posterity, the tl;dr concern is one of the reasons I include the relevant
chart at the beginning of the post.

~~~
dredmorbius
Putting it in the title would've been less a cheat.

The analysis is interesting. "What" and "17.2" both occupy four characters.
The latter conveys far more meaning.

Clickbait, even for an informative article, is insulting and shows contempt
for the reader. Don't do that.

------
miguelrochefort
It's almost as if voting was a poor quality selection strategy. /s

------
andrewmcwatters
Vote inflation might serve as an interesting dynamic for allowing others'
voices to be heard. While a new vote will always carry the same value,
acquired votes may suffer from inflationary decay, providing a rotation on
viewed comments. You could probably modify this inflation based on thread
velocity.
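A sketch of one way this could work, assuming exponential decay with a half-life: a fresh vote is worth 1, and each older vote is worth half as much for every half-life that has passed. The half-life formula tied to thread velocity is an arbitrary assumption, and all names here are hypothetical.

```python
from typing import Iterable


def decayed_score(vote_times: Iterable[float], now: float, half_life: float) -> float:
    """Sum of votes, each weighted by 0.5 ** (vote age / half_life)."""
    return sum(0.5 ** (max(0.0, now - t) / half_life) for t in vote_times)


def half_life_for_velocity(base_half_life: float, comments_per_hour: float) -> float:
    """Shorter half-life for faster threads (assumed, arbitrary scaling)."""
    return base_half_life / (1.0 + comments_per_hour / 10.0)


if __name__ == "__main__":
    now = 7_200.0
    early_votes = [0.0] * 20       # 20 votes cast two hours ago
    late_votes = [7_000.0] * 5     # 5 votes cast in the last few minutes
    for velocity in (0.0, 50.0):
        hl = half_life_for_velocity(3_600.0, velocity)
        print(velocity, decayed_score(early_votes, now, hl),
              decayed_score(late_votes, now, hl))
```

On a slow thread the early comment keeps most of its lead; on a fast thread its old votes fade quickly enough that the recently upvoted comment can overtake it.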

