How to solve the problem that the topmost comments get all upvotes (debiki.com)
231 points by KajMagnus on March 23, 2013 | 125 comments



I have actually done quite a bit of research into HN's and Reddit's algorithms (i.e. http://amix.dk/blog/post/19574 and http://amix.dk/blog/post/19588 ) and I would say that using the Wilson score confidence interval would improve HN's comment sections a lot.

The real meat is here: http://www.evanmiller.org/how-not-to-sort-by-average-rating....

Even though it looks hairy, it's fairly simple to implement in e.g. Python (from Reddit's code, rewritten from Pyrex): https://gist.github.com/amix/5230165
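For reference, here's a minimal Python sketch of that lower bound (a plain reimplementation of the standard formula, not Reddit's exact code):

```python
from math import sqrt

def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote fraction.

    z = 1.96 corresponds to a 95% confidence level.
    """
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    phat = upvotes / n
    return (phat + z * z / (2 * n)
            - z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)
```

Note that with this score, a comment at 50 up / 0 down (lower bound ~0.93) ranks above one at 1000 up / 500 down (lower bound ~0.64).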

Maybe pg could try this out for a few days and we could see what the results are!


While this is more sophisticated, I doubt it's the "correct solution". The underlying assumption of independent draws from a Bernoulli distribution is vitiated, as the order in which the siblings are read usually affects their proportion of upvotes to downvotes. It still significantly penalizes late comments, because the 95% confidence interval is much wider for them (and its lower bound consequently lower). Last but not least, it brings down comments that are at all controversial; whereas an opinion with 1000 upvotes and 500 downvotes is probably more interesting to read than one that only gathered 50 upvotes within the same interval. In short, this sorting algorithm seems ideal for quickly placing on top bland comments that nobody really disagrees with, but will upvote anyway because they haven't yet seen the much more incisive discussion below.


> the 95% confidence interval is much wider for them

If you change this to an 80% confidence interval, it'd become narrower and might actually somewhat favor new comments with only a few (up)votes? So this might be configurable?

> whereas an opinion with 1000 upvotes and 500 downvotes is probably more interesting to read than one that only gathered 50 upvotes within the same interval

Isn't it more likely that the 1000 upvotes and 500 downvotes is a cute kitten photo? Or something similar (a very short but strong and popular comment)? :-) And that the one that gathered 50 upvotes and no downvotes is the truly interesting read?

However! If you're able to estimate how many people actually read the comment that got 50 upvotes — then you'll know if it's truly interesting, but too long to read — only 50 people have read it. Or if it's boring — 1500 people have read it.

Perhaps a good sorting algorithm would be: Lower bound of confidence interval for:

(upvotes - downvotes) / num-people-who-read-the-comment
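A rough sketch of how that score could look, assuming `readers` is an estimate we're somehow able to obtain (this uses a made-up normal-approximation bound over votes modeled as +1/0/-1 per reader, not an established formula):

```python
from math import sqrt

def read_aware_score(upvotes: int, downvotes: int, readers: int,
                     z: float = 1.96) -> float:
    """Confidence lower bound on (upvotes - downvotes) / readers, where
    `readers` estimates how many people actually read the comment
    (readers >= upvotes + downvotes)."""
    if readers == 0:
        return 0.0
    p = (upvotes - downvotes) / readers        # net approval per reader
    mean_sq = (upvotes + downvotes) / readers  # E[vote^2] for a {+1,0,-1} vote
    var = max(mean_sq - p * p, 0.0)            # sample variance of the votes
    return p - z * sqrt(var / readers)         # lower confidence bound
```

By this measure, 50 upvotes out of 50 readers outranks 1000 up / 500 down out of 1500 readers, which in turn outranks 50 upvotes out of 1500 readers (the "boring but long" case).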


The problem is that kittens are quite popular. They are more popular than a well formed discussion. What algorithm separates kitten posts from quality content? How could you tell the difference between a great photo and a lolcat? I think the only way to do this is to empower moderators. Any other scheme is just a race to the middle for bland, unoffensive and easily digestible content to maximize points.


Letting different comments reach the top wouldn't be such a bad thing most of the time, as long as there is more than 1 solid comment.

Only time this would be bad is if there is something important, such as a correction or a key reply by the author or the person they were calling out etc where it makes sense that everyone should read first.


Knowing how many people read a comment would help with comment assessment, but how do you get that information? I suppose we could track scroll positions, though I'm not sure how reliable that would be.

It'd be cool if websites could use eye tracking, then we could easily tell what got read. Maybe in the future.


The Wilson score confidence interval is easy to implement and has already proven to work well on Reddit, where it is the default sorting algorithm. Do note that Reddit also has downvotes and threaded comments. I agree it's not the "perfect" solution, but for a perfect solution HN admins would need to invest a lot of time (and it would probably be a lot more buggy, and in the end they probably wouldn't come up with something better). Going with Wilson, they can do a quick fix that would improve the sorting a lot.


I can't speak for anyone else, but I've been consistently happier with the comment rankings on HN as opposed to Reddit. This could have something to do with voting behavior, but I think it's at least partly thanks to HN's ranking algorithm as opposed to Reddit's.


I'm actually using the algorithm that Evan Miller writes about (i.e. the lower bound of a binomial proportion confidence interval), in the discussion system that powers the demo in the linked article. I'm using the Agresti-Coull interval, http://en.wikipedia.org/wiki/Binomial_proportion_confidence_..., however, because it seemed easier to implement. (I didn't know about Evan Miller's article (or any other).)


[deleted]


For instance?


So, there are some that require offline calculations (I can't really get into more than this for various reasons).


Perhaps I'm a bit dense today, but why are you unable to link to descriptions of one or more of these better approaches? Many people would find the information useful and interesting.


Not even a Wikipedia link? This was so anti-climactic.


Wikipedia does not always have the answers :)

Sorry, I fully intended to publish a description, but it turns out at least two of them are the subject of pending patent apps that haven't been published yet, so I can't.

In any case, do note there are a wide variety of Wilson variants and alternatives that may be applicable to various situations (Agresti-Coull, etc.), as well as Jeffreys-prior Bayesian intervals.

http://stat.wharton.upenn.edu/~tcai/paper/Binomial-Annals.pd... is a reasonable comparison of the current common methods.

Anyway, I deleted the original comment since it doesn't add anything now.


Search engines have to solve this problem: the results listed higher garner more clicks, and thus using clicks to rank is biased. They can separate the position of a result from its relevance through something called a click model. Click models make some of the assumptions you list in your article, which you call "examples". They are mathematically rigorous and can be quite optimal for solving the upvote bias; obviously you would replace "clicks" with "upvotes" in your implementation. Here are some well-known click models:

Cascade Model (one of the original click models): http://www.wsdm2009.org/wsdm2008.org/WSDM2008-papers/p87.pdf

Dynamic bayesian network model (a more generalizable Cascade Model): http://www2009.eprints.org/1/1/p1.pdf

DBN model with scroll and hover interactions (kind of like your example 2): http://jeffhuang.com/Final_CursorModel_SIGIR12.pdf
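To get a feel for what these models do, here's a toy estimator in the spirit of the cascade model (simplified: it assumes each session scans top-down and casts exactly one vote, which the real models relax):

```python
def cascade_relevance(click_positions, num_items):
    """Toy cascade-model relevance estimate.

    Assumes each session scans from the top and clicks (upvotes) exactly one
    item; an item at position i was examined by every session whose click
    landed at position >= i. Relevance = clicks / examinations, which corrects
    for items lower down being seen less often.
    `click_positions` is a list of 0-based positions, one per session."""
    clicks = [0] * num_items
    examined = [0] * num_items
    for pos in click_positions:
        clicks[pos] += 1
        for i in range(pos + 1):   # everything above the click was examined
            examined[i] += 1
    return [c / e if e else 0.0 for c, e in zip(clicks, examined)]
```

For example, with sessions clicking positions `[0, 0, 1, 2]`, the item in last place gets the highest relevance estimate despite the fewest clicks, because everyone who was estimated to have examined it clicked it.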


I've only skimmed through the articles so far and they seem really interesting. And your comment ought to be the topmost comment :-) but people won't have time to read the three articles and upvote :-(

Anyway I updated the article I wrote with a section "This problem elsewhere, and solved?" that lists your links (and said thank you to you).

May I ask, how come you knew about how search engines function? (For example, do you work with developing search engines or have you studied them at University?)


> your comment ought to be the topmost comment :-)

Indeed!

tags.push("resources")

In general, I find that comments which link to PDFs tend to be good. Especially when the linked-to PDF is formatted in two columns ;)


I think this is a bad idea. Specifically:

> In the two examples above: When you upvote a comment, the computer thinks that the other comments you have read (the blue ones) but did not upvote, are not terribly interesting.

If you upvote a comment as interesting and not the ones leading up to it, then you are punishing that comment: you're punishing its predecessors in the threaded chain, so it will be less seen because you upvoted it.

Any other assumptions of which comments are read will probably not end well. Some people may skip to comments to get someone's personal opinion, other people may only read shorter comments.

~~~

It's worth noting that HN has an interesting algorithm that does not exactly sort by comment score. I have a very high average karma, and when I post on a topic, even if the topic already has 60 comments, my comment begins at the top and stays there even if I'm not upvoted. I suspect that the sorting algorithm is doing something like implicitly adding average karma to the karma of each post when determining order. Therefore it doesn't sort by "best", but by "well regarded".

This is good and bad. It's good because if a person who usually makes high quality comments comes along late to the game, their voice will be heard. It's bad because it creates its own rich-get-richer problem.

I think a better solution to the problem is to simply:

1) Hide upvote numbers like HN is already doing[1]

2) Dither the comments between "new" and "well regarded". First comment is the newest, second is the most well regarded, third comment is the next newest, etc. Perhaps randomize which category comes first.
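A sketch of what that dithering could look like (hypothetical helper, alternating between recency order and score order):

```python
def dither(comments):
    """Alternate between the newest and the highest-scored remaining comments.

    `comments` is a list of (id, score, timestamp) tuples with unique ids.
    Slot 1 goes to the newest comment, slot 2 to the best, slot 3 to the
    next newest, and so on."""
    newest = iter(sorted(comments, key=lambda c: c[2], reverse=True))
    best = iter(sorted(comments, key=lambda c: c[1], reverse=True))
    ordered, seen = [], set()
    take_new = True
    while len(ordered) < len(comments):
        it = newest if take_new else best
        for c in it:                    # resume where this iterator left off
            if c[0] not in seen:
                seen.add(c[0])
                ordered.append(c)
                break
        take_new = not take_new
    return ordered
```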

~~~

[1] Of course hiding upvote numbers has its own major problem because sometimes the hard number of "I agree" upvotes is important. When you want to "Ask HN", say, what particular frameworks people can vouch for, you have no real way of evaluating the comment responses in terms of people that agree. Since they aren't even sorted from best to worst, the information you have is somewhat dim. This could be remedied by giving submitters the ability to turn on comment scores for only their thread. (Or the ability for mods to do it)


> """I think this is a bad idea."""

I think the issues you mention can be addressed. I've outlined how, below:

1: > """If you upvote a comment as interesting and the not the ones leading up to it, then you are punishing that comment because you're punishing its predecessors in the threaded chain, so it will be less-seen because you upvoted it."""

That's a good point. I'm not sure if it is an issue, however. Not upvoting a comment would have very little impact — it wouldn't weigh as much as a downvote, for example. So I think the effects wouldn't be that bad. (A comment's score could be: `(upvotes - downvotes) / number_of_people_who_read_but_didn't_upvote`, and adding +1 to `number_of_people_who_read_but_didn't_upvote` will have only a tiny, tiny effect.)

Anyway I think the issue can be addressed like so:

If there is a thread with very interesting comments somewhere in it, prioritize that thread a little bit — so it'll get a score that is a little bit better than the very first comment in it (the comment that starts the thread).

I think Slashdot does something reminiscent of this? — Sometimes, [a mediocre comment that starts a new thread] is collapsed, but [comments deeper inside the thread but with many upvotes] are shown in full.

Alternatively: Modify the algorithm: Don't punish ancestors, only siblings.

2: > """Any other assumptions of which comments are read will probably not end well. Some people may skip to comments to get someone's personal opinion, other people may only read shorter comments."""

I think it's okay that the algorithm makes mistakes sometimes. It only needs to work well on the whole, given data about many visitors. — On mobile phones, however, where only 1 comment is shown at a time, it'd be easier to know what the reader is reading.

Also keep in mind that the examples in the article were intended as examples — if there are issues, the algorithms can perhaps be modified to take them into account. For example, your example visitor that reads only short comments, would be more likely to upvote only short comments. So the algorithm could then "realize" that "oh, this visitor only reads short comments. I'll take that into account" :-)


> This is good and bad. It's good because if a person who usually makes high quality comments comes along late to the game, their voice will be heard. It's bad because it creates its own rich-get-richer problem.

This is also bad because, if true, it creates a strong disincentive to comment on less popular stories because, regardless of the quality of your comments, doing so would lower your average karma.


Perhaps normalize by dividing each of your comment scores by the median score of other replies to the same parent? Then average these normalized scores.
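A sketch of that normalization (hypothetical, with a floor on the baseline to avoid dividing by tiny or zero medians):

```python
from statistics import mean, median

def normalized_average_karma(comments):
    """`comments` is a list of (own_score, sibling_scores) pairs: each of a
    user's comment scores divided by the median score of the other replies
    to the same parent (treated as 1 when there are no siblings), then
    averaged. Commenting on a quiet story no longer drags the average down."""
    normalized = []
    for own, siblings in comments:
        baseline = median(siblings) if siblings else 1.0
        normalized.append(own / max(baseline, 1.0))  # avoid division by <= 0
    return mean(normalized) if normalized else 0.0
```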


But you have nothing against the second example, presumably? I still see that as a valid option.


The idea is good, but the realization is far from perfect.


FYI, I think all comments begin at the top. My comments also start at the top and then fairly quickly move down into upvote order, and my karma is well below the top of the leaderboard.


Yes, I believe HN comment sorting uses the same algorithm as news items, where the score of an item is based on both votes and time, decaying over time.

Description: http://amix.dk/blog/post/19574

Comment sorting: https://github.com/nex3/arc/blob/b78c7f/lib/news.arc#L2028-L...


I've ruled this out as a possibility. For instance here I was the 4th comment (there are now 16), I posted 28 minutes ago and I'm still at the top comment slot at only 1 point.

It must be more than just a function of time and score, otherwise any of the new 1-point comments would have superseded mine, but they didn't.


Your average karma is 24, so your comments are weighted much, much more than others' (it's one of the metrics HN uses to rank comments). Also, your comment is long, which is also a factor (though not a big one).

Source: experience :)


Wow, I never thought about weighting longer comments more heavily. But that makes a lot of sense!


If you take comment length + karma per post / comments on post you should end up with reasonably decent comments making it to the top. Ish. It's still never going to be able to fully deal with all the corner cases on a site as large as HN but it's often Good Enough.


It would make sense for your comments to be weighted higher given your higher karma, but I think all of our comments start at the top to make sure that at least someone sees them, and that they can get upvotes if they are insightful/interesting.


I think HN changed this in the past few days. It's not the case anymore.


Do people really care about where their comments are? This sounds so ridiculous.


You wouldn't create a top level comment if you weren't hoping to contribute and for it to be read. For it to be more likely to be read, it's better off at the top. So yes, I'd say that people care.


It's rather pathetic. Considering all the things that life has to offer, caring about making pole position in comments is really sad.


Why comment at the top level in this sort of discussion then?

The topic isn't so much pole position but giving every comment a chance at being noticed, so that the better comments rise naturally to the top and create a better conversation.

I don't think this is the "I don't care about silly internet points" discussion you think it is.


Well, I personally like to see the most interesting comments at the top. But that creates the inherent problem: the first interesting comment gets stuck at the top and prevents all the newer ones from being seen.


For HN specifically, collapsible comments would make a huge difference. A top-level comment in position one can have dozens of comments under it, making the next comment (which is perhaps only 1 upvote lower) appear far, far down the page.

If you could collapse the top level comments, this problem would evaporate.


This is exactly why HN needs collapsible comments. Not so I can collapse them for my convenience, but so that stuff doesn't get buried so easily. Sometimes the top thread is all there is on the first page ... for example, there was this great blog post by raganwald a while back. Here's the new version [1] since I assume the posterous one won't be around much longer. And here's the HN discussion of the original [2] in which, as he noted, a comment about IQ testing (completely tangential to the post) took up the whole first page.

1 http://braythwayt.com/2012/03/29/a-womans-story.html 2 https://news.ycombinator.com/item?id=3772292


In that case, you would collapse the top comment chain and be shown nothing underneath, since HN only shows X amount of total comments before requiring you to click More.

I think HN would have to increase the total number of comments per page before allowing collapsible chains.


That's true, once it got that big, but I think the hope is that it wouldn't necessarily remain the top comment under those circumstances. Allowing people to collapse the thread, before it took a whole page, would allow them to see and upvote the other comments if they wished.



HN and Reddit should randomize comment placement based on the poster's karma and the ranking of the comment, with some notion of the number of readers and/or time to normalize the expected value. Ideally, the algorithm would also keep track of how well comments do in their temporarily elevated positions.


I'd solve this completely differently: implement a stochastic sorter. Every user potentially sees a different sort order.

However, the order is a probability distribution derived from the votes on all comments.

When there are few votes on the comments, the distribution is chaotic, because the algorithm doesn't really know anything about the comparative quality. After enough votes, the sort order gets more and more deterministic...
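One way to sketch such a stochastic sorter (a Plackett-Luce-style sampler with a smoothing prior; the prior constant is a made-up knob, and scores are assumed non-negative here):

```python
import random

def stochastic_order(scores, prior=1.0, rng=None):
    """Sample a comment order: each slot is filled by drawing a remaining
    comment with probability proportional to its votes plus a smoothing
    prior. With few votes the prior dominates and the order is near-random;
    as votes accumulate the order converges toward score order.
    `scores` maps comment id -> net votes (non-negative)."""
    rng = rng or random.Random()
    remaining = dict(scores)
    order = []
    while remaining:
        ids = list(remaining)
        weights = [remaining[i] + prior for i in ids]
        pick = rng.choices(ids, weights=weights, k=1)[0]
        order.append(pick)
        del remaining[pick]
    return order
```

Passing a seeded `random.Random` makes the order reproducible per user, e.g. seeding from a user ID so the page doesn't reshuffle on reload.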


Agree more or less, except for the last part. I don't see how convergence is necessary or even desirable.


When you hit the comments, it's nice to see the better comments towards the top.


Yeah, but specifically I don't think that it's necessary to have a near deterministic sort order once you get to (say) 500 comments. I think that a probabilistic but ordered sort might be best even as the number of comments approaches infinity. Obviously you'd want the better comments towards the top, but the argument is for randomization to bump the ordering away from local optima.


I don't think this would solve the problem as stated, as the first comments will still see more votes, having been there longer.


You could easily fix this by weighting the value of each new vote based on the position the comment was in when it got the vote. These weightings would need to be calibrated but would produce a better outcome.


Precisely what I was thinking. Rather than assuming what people have read, simply give less importance to upvotes made to a top comment than to those made to comments below.

I think it would also have to be logarithmic, because being the top comment is MUCH better than being the third comment, while being the 6th comment is only marginally better than being the 8th comment.

Coupled with the stochastic model, it could make it a far more level game.
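A sketch of such a logarithmic position weight (the exact discount function would need calibrating, as noted above):

```python
from math import log2

def position_weighted_vote(position: int) -> float:
    """Weight of an upvote cast on a comment at 1-based `position`.

    An upvote on the top comment counts least; votes further down count
    more, growing logarithmically, since position 1 vs. 3 matters far more
    than position 6 vs. 8."""
    return log2(position + 1)   # 1.0 at the top, ~3.46 at position 10
```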


I think that's a good idea that could be implemented quickly, and perhaps it's good enough :-)

I also think it's easier to calibrate [an algorithm that estimates which comments people have read], than to calibrate [how much more importance to give to a vote that happens far away from the original post].

In fact, the latter is impossible? Because you have nothing to calibrate against? You don't know what the correct result is.

But here's a reasonable (?) calibration for an algorithm that estimates which comments people have read: If you upvote comment X, then you have probably read its 3 earlier siblings and its parent and grandparent. This doesn't need much calibration, and would probably (I think) work fairly well on the whole? — It's somewhat possible to validate and tweak this calibration, by observing how people actually do, when they read forum & blog comments.
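A sketch of that heuristic (the helper names, lookup interface, and the "3 earlier siblings" cutoff are just the assumptions from this comment):

```python
def mark_probably_read(upvoted_id, parent_of, earlier_siblings_of, read_count):
    """If someone upvotes comment X, assume they probably also read X's 3
    closest earlier siblings, its parent, and its grandparent.

    `parent_of` and `earlier_siblings_of` are lookup functions into the
    comment tree (hypothetical interface); `read_count` maps comment id ->
    estimated number of readers, and is incremented for each comment the
    voter probably read."""
    probably_read = {upvoted_id}
    probably_read.update(earlier_siblings_of(upvoted_id)[-3:])
    parent = parent_of(upvoted_id)
    if parent is not None:
        probably_read.add(parent)
        grandparent = parent_of(parent)
        if grandparent is not None:
            probably_read.add(grandparent)
    for cid in probably_read:
        read_count[cid] = read_count.get(cid, 0) + 1
    return read_count
```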


You can sort by upvotes : views ratio, combining both approaches.


I'm not sure if I'm disappointed or impressed that the 1st comment on this post wasn't "first!".


Have you ever seen someone post "first!" on HN?


Hey, there's a First! time for everything.


Reddit is way ahead of the curve on this one. Here's a post by Randall Munroe on the problem and solution, from 2009:

http://blog.reddit.com/2009/10/reddits-new-comment-sorting-s...

I seem to recall they also randomly permute comment order a bit as well now.


Yes. The math is much more subtle than people realize. Bayesian (meaning conditional) probability is required for any solution to be passable. The problem that needs to be solved is: given the number of upvotes, downvotes, and views, what is the probability that the next random reader would want to upvote the comment? Then sort on that probability. There are well-established formulas for this kind of thing.

The one you link considers the upvotes and downvotes as the sample population, then constructs the probability that you will upvote it, and ranks on that. This allows late posts to take over early posts if they have a better ratio, even if they have significantly fewer votes, but only if the ratio is "enough" better to make up for its lack of confidence.

This fails to address a few things here:

* It completely ignores people that read the post but didn't vote. There really is no way to get a perfect count of this no matter how much scroll logging, but you could approximate it and then include it in the calculation.

* On HN many people can't downvote.

* As others have pointed out, HN doesn't necessarily start with a base assumption of all commenters being equal, nor should it.


Yes, but you are almost guaranteed lower votes and lack of confidence if you tie voting and comment viewing into the same function, such that you have to scroll past comments to get to low confidence ones that need more voting in order to establish reasonable intervals.

For example: You would be better off also randomly displaying a comment that requires more voting to establish good confidence bounds on the first page to each user somewhere.

This enables you to get better results quicker over the entire group, and gives you better results than taking into account how many people read but didn't vote.


I have always disliked that assumption, it assumes too close of a correlation between "content someone wants to read" and "content someone will upvote". I would add to your list that it biases content towards posts that have popular opinions that the majority thinks are minority opinions.

It fits well with how people act and it fits well with what people complain about in the comments. Hard to measure though. Although if something as rough as scroll logging can still make an improvement I'm sure you could find something rough along these lines that would.

(It's so fun figuring out how to make our filter bubbles more harmful isn't it?)


"* It completely ignores people that read the post but didn't vote. There really is no way to get a perfect count of this no matter how much scroll logging, but you could approximate it and then include it in the calculation."

Would it help if there was an incentive to vote? You could, for example, increase a user's score by a point for each vote given, while weighting received votes with more points. This way, users who write good comments and active readers could both benefit.

But then again, this could tempt some users to abuse the voting system. I can't predict which of these two reactions would prevail.


Problem: The top comments get the most votes.

Solution: Disable voting on the top comment(s). This would allow the trailing comments to catch up until they are the top comments and their voting is disabled. The top comment can't race far ahead of the others.


But this would still result in the topmost comments getting most upvotes — all topmost comments except for number 1?

Now I'm oversimplifying, but the effect would be that comment 1 and 2 swapped place over and over again. Or the first N + 1 comments, if you restrict voting on the first N comments. (And people might be confused, perhaps annoyed, when they cannot vote on the first N comments?)

This would not help promoting [a really good but forgotten comment] that is located far away at position 10 or 20.


Perhaps a slight modification: the farther down the list, the more an upvote should count (because of the top-comment bias), with it being unnecessary altogether for the current winner... If someone is upvoting a comment farther down, they not only went to the effort of scrolling (and should be rewarded for actively curating), but they also had to endure other comments higher up that were not as worthy, which is an increasing burden the further down the comment was found (suggesting, for example, a geometric weighting of upvotes). It might mean more flapping of the position, but it also ensures equal (fair) exposure of equally good comments.

The submission ranking, I assume, takes more into account than just upvotes (age, number and quality of sub-comments, etc). The top-comment problem feels like the same problem, so might be an opportunity for code consolidation.


There's a fundamental problem with your approach.

To reliably determine if someone has actually read a comment, this means the comment has to be hidden. You cannot assume that simply because something is viewable that the user is viewing it. Since you have to hide a comment, this requires a user interaction event to change its state. This is a broken model since comments are primarily a passive, not interactive, experience.


Actually, no, there is no fundamental problem with the approach.

Search engines have a similar problem: They estimate how useful a search result link is, by counting the number of people that click that link. To do this, they need to take into account that people tend to click the topmost search results only.

But they are handling it just fine, without hiding any search results or something like that. Instead they simply count clicks on the search result link, and apply some mathematics related to the probability that you click that link, when it's so and so far away from the top. And this is similar to the approach I suggested in example 1, which relies on clicks on the vote up/down button (instead of search result link).

(If people don't interact with the page (don't upvote anything at all), I think an algorithm could simply disregard those people.)
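A sketch of that kind of correction (an inverse-propensity estimate; the per-position examination probabilities would themselves have to be estimated, which is what click models are for):

```python
def debiased_upvote_rate(upvotes_by_pos, views_by_pos, examine_prob):
    """Position-bias-corrected upvote rate for one comment.

    Assumes P(upvote | shown at pos) = P(examined at pos) * P(reader likes it),
    so each observed upvote is divided by the examination probability of the
    position it was cast from (inverse-propensity weighting).
    All three arguments are indexed by 0-based position."""
    weighted_upvotes = sum(u / examine_prob[p]
                           for p, u in enumerate(upvotes_by_pos))
    total_views = sum(views_by_pos)
    return weighted_upvotes / total_views if total_views else 0.0
```

For example, if readers examine position 2 only half as often as position 1, then 5 upvotes from 100 showings at position 2 estimate the same underlying appeal as 10 upvotes from 100 showings at position 1.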


But just because you, in one particular instance didn't see it, doesn't mean anything in the long term. Obviously with any statistical estimate, individual data points can fall far from the curve. But the idea is that, after enough time, enough views, and enough upvotes, the system slides into place.


I agree with this. The algorithm doesn't need to be exactly correct, always. It only needs to work well on the whole, given data on many visitors.


This is what I thought as well. Unless you cripple the UI, there's no way you're going to be able to reliably determine programmatically whether a comment has been read. You could try to guess through a number of things, but it'll never be accurate enough not to break the order with false positives.


This is a problem well-known in Web search as 'position bias': top results get more clicks and therefore get 'upvoted' by machine learning algorithms.

There are theoretically sound solutions. Typically, I recommend [1]. Very briefly, the solution is to yield an unbiased click estimate (or, for what matters here, an unbiased number of upvotes). Look at the position and cascade models as well as the solution in [1]. The approximation of this solution at the end of the article is very easy to implement while remaining theoretically sound.

Side note: another 'solution' would be to randomly generate a slightly different order for every reader (e.g. sampling based on the current number of upvotes for every comment). The readers would then spread their upvotes across more comments overall.

[1] O. Chapelle and Y. Zhang. A dynamic bayesian network click model for web search ranking. In Proceedings of the 18th International World Wide Web Conference (WWW), 2009. http://olivier.chapelle.cc/pub/DBN_www2009.pdf


whathappenedto also suggested the article you linked, in this reply: https://news.ycombinator.com/item?id=5430495 (the "Dynamic bayesian network model" link).

The approximative solution at the end of the article seems really interesting :-) It'd feel better to implement something that is theoretically sound. I suppose the algorithm could need some tweaking, since a discussion is a tree or a graph, but a search result listing is... a list.

Generating a slightly different order for every reader also seems like a good idea, and fairly easy to implement. — So as not to make people confused when they reload the page (by shuffling comments around), perhaps one could use the user ID or IP number as random seed.

(May I ask, how come you know how search engines solve this problem?)


Might be relevant: How Not To Sort By Average Rating

http://www.evanmiller.org/how-not-to-sort-by-average-rating....


It assumes average posts would get the same number of upvotes and downvotes. This is rarely true: people don't care to downvote boring comments; they more often downvote ones they don't agree with. This is why considering views might be a better idea.


Interesting :-) I didn't know about his article, but nevertheless that's roughly how I've currently implemented the sorting algorithm in the discussion system that powers http://www.debiki.com/demo/-71cs1-demo-page-1 — I'm using the simpler Agresti-Coull interval though (http://en.wikipedia.org/wiki/Binomial_proportion_confidence_... ), not the Wilson interval. (And I've largely forgotten all about mathematics since University, 10 years ago.)


I'm working on this same problem for my project https://www.newschallenge.org/open/open-government/submissio...

My solution is to just throw the idea of "order" out the window and instead use a "sorting hat" process. Sure, by default it may appear in chronological or "most votes" order, but based on why you - the reader - came upon that content in the first place, you'll be able to more easily sift for what you're looking for.

Whether you're in the mood for a pun thread, some criticism, or some "deep thoughts", you'll be able to pluck those out from the greater total conversation and then work forward or backwards in the context from those points. Even if a pun thread was the first 100 upvoted comments, you'd be able to discard those with one click and get to the first "serious" response.

Basically, it's because I know what it's like to browse r/science.


(((There's a hash fragment appended to your link, which makes it point to somewhere in the middle of the linked page, and it took fairly long before I realized I should start reading from the top. The hash fragment: In `www.newschallenge.org/...-that-just-makes-sense/#c-aaf90...f7977e`, the #.... part should probably have been removed? )))

I like this initiative! In the video, I like the idea that people be able to discuss only a part of a legislation proposal. Actually I've been experimenting with something related, namely inline comments, http://www.debiki.com/-81101-future-features#inline-comments.

Re: "based on why you - the reader - came upon that content in the first place, you'll be able to more easily sift for what you're looking for" — do you sort comments based on some information you have on the visitor?

Re: "and then work forward or backwards in the context from those points" This somewhat reminds me of considering the discussion being a graph of comments, in which you can navigate freely back and forth? And you can quickly bypass subthreads (e.g. replies to the pun thread?)?


Thanks for the comment about the link!

Also, I like what you've done with your inline comments. For my project we don't just use it for "improvements" but for suggestions, general comments, questions, etc...a whole range of taggable purposes for which you'd be highlighting that section in the legislation. So you'd highlight first, name a purpose second, then type your content.

We wouldn't collect any info on the visitor other than what they provide ("I don't want to see _____" would remove posts marked as such).


I read some of the blog post comments, and got the impression that there's not currently any live demo. — If in the future you publish something online, then, if you want to, please feel free to send me an email. I live far away in Sweden though, so I'd be mostly interested in the tech parts (rather than contributing). (Perhaps I could help you with usability testing though, or give you related feedback, if that'd be useful.)


i'm pretty happy with http://www.slashdot.org/

the number of upvotes is limited.

the karma is actually useful.

the level you browse at is configurable.

the algorithm is simple

thus, top comments don't get all upvotes, AND you actually get a lot more useful comments. I like it. (the article submission and approval process however is far too slow for today's fast paced news systems, this is where HN excels)


This is very subjective, of course, but on slashdot, I think that the highest rated comments on controversial threads (e.g., recent PyCon uproar) were not at all "insightful" or "informative", though labelled as such. They often simply added heat to the discussion, usually with language almost guaranteed to inflame. On HN, a lot of those comments would have been flagged. That doesn't mean they would have been deleted, of course, but they most certainly would have been flagged. I also wonder how the friends/foes system affects moderation. I know that you don't necessarily know if/when you will be given moderation points, but I think it has to explain the scoring of some the comments. How else can you explain how a string of 4-letter words is given a score of 5- insightful/informative/funny?


all comment systems will eventually score the insightfulness as a whole, based on what the majority think.

I do find some high ranked posts to be "not so wise" on any forum, but I don't find "obvious trolls" to be highly rated (be it reddit, hn or /.)

the difference that i see with /. however is that I don't get all the high score posts "lost" somewhere because 3 posts have 10292938 points and the others "didn't get voted on recently" (where recently can be the past 5 minutes really)


I also wonder how the friends/foes system affects moderation.

That's just something you can assign your own score to; it doesn't affect anything else.


Did anyone else notice that you can slide on the page, like on touchscreen devices, with a mouse?


Yeah, it's actually pretty well implemented because you can select text as well; it doesn't get in the way. I didn't think something like this could work, but it's in fact not that bad.

Edit: https://github.com/debiki/utterscroll


Isn't it just a bandit problem? Couldn't you use something like Thompson sampling to sort the comments? It'd have the nice benefits of adding random noise like reddit tries to do, while still having finite-time optimality guarantees.
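
For what it's worth, here's a minimal sketch of what Thompson sampling over comments could look like, assuming each comment's votes are modeled as independent Bernoulli draws with a Beta(1, 1) prior. The function name and data shape are made up for illustration, not taken from any real implementation:

```python
import random

def thompson_order(comments):
    """Sort comments by a random draw from each comment's Beta posterior.

    `comments` is a list of (comment_id, upvotes, downvotes) tuples.
    Each draw from Beta(upvotes + 1, downvotes + 1) is a plausible
    "true quality" given the votes so far; sorting by the draws mixes
    exploration (uncertain new comments occasionally rank high) with
    exploitation (well-proven comments usually stay on top).
    """
    draws = {
        cid: random.betavariate(up + 1, down + 1)
        for cid, up, down in comments
    }
    return sorted(draws, key=draws.get, reverse=True)
```

Re-sampling on every page load also gives each visitor a slightly different ordering for free, which is the "random noise" effect mentioned above.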


I wonder if people treat comments a bit like search results? They will read the first one or two pages and stop after that (assuming the comments are paginated). If comments are posted in chronological order (oldest first), then you get the "click bias" that other people mention. Interesting comments that fall further down the discussion thread get buried because few people browse that far.

Also, if you have threaded discussions, a good post in response to another comment may need the other post(s) to provide context, so will simply highlighting that individual post make sense without the others?


How come we can upvote top comments, anyways?


Just weight the vote in some proportional manner based on its location on the page.

For example: voting for the top comment only adds 0.1 to the comment_score but voting for the bottom comment adds 2.0 to its comment_score, etc.
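
As a toy sketch of this idea, linearly interpolating between the 0.1 and 2.0 endpoints suggested above (the function and the interpolation scheme are just one possible choice, not a vetted design):

```python
def weighted_vote_value(position, total_comments):
    """Value of an upvote on the comment at `position` (0 = top).

    A vote on the top comment counts ~0.1; a vote on the bottom
    comment counts ~2.0; positions in between interpolate linearly.
    The 0.1 and 2.0 endpoints are the arbitrary weights from the
    comment above -- tuning them is the hard part.
    """
    if total_comments < 2:
        return 1.0
    frac = position / (total_comments - 1)  # 0.0 at top, 1.0 at bottom
    return 0.1 + frac * (2.0 - 0.1)
```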


Someone else also suggested this, here: https://news.ycombinator.com/item?id=5430541

I think it'd be hard to assign the 0.1, ..., 2.0 weights "correctly"? How would one know if the current choice of weights makes things better, or perhaps even worse (if you overdo it)?


This problem has always seemed pretty simple to me. That usually means I'm missing something... but IMHO, from the point of view of any given user, the "top" of the page should correspond to a random offset from the chronologically-first comment. To keep from being too disorienting, the offset might be tied to a hash of the user's IP address or something similarly repeatable.

This would also fix the browsing experience on app stores, where apps near the top of an alphabetical or chronological list of search results are seen by a disproportionate number of users.
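
A rough sketch of the per-user rotation, assuming some stable key (IP address, user id) is hashed to pick the offset so it stays repeatable across page loads:

```python
import hashlib

def rotated_view(comment_ids, user_key):
    """Rotate the chronological comment list by a per-user offset.

    `user_key` could be an IP address or user id; hashing it keeps
    the offset stable across page loads for the same visitor (so the
    view isn't disorienting), while different visitors start reading
    at different points in the thread.
    """
    if not comment_ids:
        return []
    digest = hashlib.sha256(user_key.encode()).digest()
    offset = int.from_bytes(digest[:4], "big") % len(comment_ids)
    return comment_ids[offset:] + comment_ids[:offset]
```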


Amazon's product reviews solve this problem. There's the regular feed of recent reviews in one column, and right next to it is a list of the top reviews and featured reviews.


I'm having a hard time coming up with a beautiful intuitive layout with 2 columns — I think most blogs would look somewhat odd if the comment field was split in 2 columns?

And if you don't use columns, but show the most recent comment in a box above other comments? Then a variant of the original problem appears? — The most recent comment is visible at the top of the comment section (in the most-recent-comments box) and gets most upvotes / attention. (If you cannot upvote it, people will feel annoyed?)

Also, showing the most recent comments first, rather than genuinely-interesting-comments, somewhat wastes people's time?

Anyway I think it's a good idea (although hard to implement?) and I've thought about something similar a bit too.


Upvotes/Downvotes are the bane of all discussion threads. What is this, Junior High? I can read and decide for myself if the comment is brilliant, valid, or just another jackass on the web. Just post the comments in the order they were made, and let the reader decide what to do with them. There's not a single website that uses upvotes/downvotes that hasn't turned into a huge circle jerk session of mutual back patting and popularity contests.


This is why I prefer forums most of the time, but even on forums it's a huge nuisance to have to comb through a thread with thousands of posts to get to some real information.


The boxes that were supposed to track my reading were turning blue much faster than I actually got to them.

It didn't really account for me pausing to think about a topic; it just estimated what I might have reached if I were reading everything quickly.

One thing that might be an interesting addition is tracking where I click my mouse. Especially on text-heavy websites, I click where I am reading to help guide my eyes. Do others do that? That could help the when-read algorithm.


Other people also mention that the-boxes-that-turn-blue don't track what they're actually reading very well. And the reading speed doesn't take into account that people pause and think. Or that they might pause and write a reply.

I suppose the-blue-boxes-approach would work only with mobile phones, where only one comment is shown at a time.

(I haven't thought much about it, but I think I tend to keep the mouse pointer just anywhere.)


Maybe I'm just stupid, but I feel like scoring by "attention" would subject comments to the same top-sticking, accelerating feedback loop. No?


You would divide by attention, not multiply :-) It could perhaps be done like so: `(upvotes - downvotes) / attention` (where attention = an estimate of the number of people who read the comment and voted on something)
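
As a sketch of that formula (the +1 in the denominator is an assumed guard against division by zero for brand-new comments, not part of the proposal above):

```python
def attention_score(upvotes, downvotes, attention):
    """Net votes divided by estimated attention.

    `attention` approximates how many voters actually read the
    comment (e.g. voted on it or on something nearby). Dividing by
    it means a comment seen by 50 people with 40 net upvotes beats
    a top comment seen by 1000 people with 500 net upvotes.
    """
    return (upvotes - downvotes) / (attention + 1)
```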


Gotcha. I thought he proposed simply substituting hits/attention for upvotes


An easier solution would be to limit voting on a comment to, say, 2 hours after it was posted. However, is this really that big of a problem?

Often I find myself reading from the bottom up, so I'm more likely to promote overlooked comments. The comments on top don't need my vote, so I seldom give it to them. As long as I'm not the only person doing this, everything should be fine.


I don't think a time limit would mitigate the problem? — People won't return to vote, once the time limit has expired. They'll just forget about it. And new visitors (that arrives after the time limit has expired) will do as usual: read the topmost comments and upvote them.

As for whether this is a big problem: I've heard people mention it, and I think I've encountered it sometimes. But I guess it is a rather serious problem, because people are so terribly lazy. I mean, they are short of time and have to prioritize.


It would mitigate the problem because then you couldn't upvote top comments period. If the assumption is true that all top comments are old, then this simple trick would definitely help.


Oh I thought you meant: Allow voting after two hours have passed. But you meant: Allow voting until two hours have passed (?).

If people cannot upvote top comments, then the problems mentioned in my reply to this comment should apply: https://news.ycombinator.com/item?id=5430699

Also, forbidding votes on top comments doesn't help a comment at position 10 or 20 surface to the top. (So really useful comments posted rather late would still be forgotten forever.)


>is this really that big of a problem?

(My 2nd reply) Yes apparently it is, I feel fairly sure now.

Search this page for "Search engines have to solve this problem". — They have the same problem, and "have" to solve it. So I think it does matter fairly much. (Much enough to be worth solving :-))

And read this article from Reddit: http://blog.reddit.com/2009/10/reddits-new-comment-sorting-s... It's about an even worse version of the problem, which appears when one uses a naive (but prevalent!) approach to comment sorting. Anyway, it exemplifies how comments posted later on have no chance to reach the top of the page, no matter how useful/interesting they are.


One tweak: I would only count views by people who upvote at least one comment in the thread. The reason being that people who read all the way down to the bottom are more invested in the topic and more likely to upvote in general, while lurkers may all read the top 10 comments and leave without voting.


I wrote an article about this subject, using Reddit comments as an example, that uses a Bayesian framework.

http://camdp.com/blogs/how-sort-comments-intelligently-reddi...


Why not simply give the reader control?

Sort by up-votes, down-votes, newest, oldest, author, poster karma, various auto-sort algos and, of course, random. Pick a few options and give the reader control over the post discovery process.

I think that could be very interesting.



The real problem with this approach is that when you reload the page all comments will slide around the page and you can't find the ones you've read already, unless you account for that by hiding read comments or something like that.


Am I the only one around here who doesn't consider this a problem at all?


It's a problem because good comments posted later are much less likely to end up near the top where they will be read.


That means the real problem is people only read comments at the top. The solution is simple: put every new comment at the top. If someone makes a good comment, it's more likely someone else will read it.


The simplest, most effective change would be to simply not allow upvotes on the #1 comment.

Alternatively, gray out the button more based on how a comment ranks (higher = harder to see)


"Alternatively, gray out the button more based on how a comment ranks (higher = harder to see)"

I suspect you were joking, but I don't think this will work because everyone knows where the upvote button is.

This did however lead me to the idea of hiding the upvote button in more difficult-to-find places the more upvoted a comment is. And now I'm laughing about this when taken to a surreal conclusion, so thanks! :)

(e.g. "This top comment is brilliant, where's the upvote button? How did HN get it inside my fishtank?!" etc.)


Someone else also suggested disabling upvotes for comment no. 1. I replied here: https://news.ycombinator.com/item?id=5430753

I'd guess people would eventually find the button anyway, even if it's grayed out? I mean, they know where it's located, and notice that they can click on various shades of gray... I'd guess they'd feel somewhat upset about the odd UI, with clickable "disabled" buttons :-)


graying would serve to make it slightly more annoying to see/click -- to dissuade casual upvotes.

You could also make the hitbox of the button increasingly smaller - to where the no.2 comment has a 1x1 px box.

tl;dr: make it frustrating to upvote top comments -- if they really 'deserve it' people will go out of their way to do it


It would bring interesting comments to the top; then they'd get a lot of views and sink back down, and it would keep doing this until everything was effectively sorted by time.


How strong that effect is, depends on how you implement the algorithm.

You don't have to assume a person has read all ancestors, just because he votes on a comment. — If you assume s/he has read only the closest 3 ancestors (which I think is more reasonable), the problem you describe is largely gone.

If you do, however, assume all ancestors have been read, I think there would be a tendency that the most popular comments cycled through the topmost positions (up, down, up, down, between position 1, 2, 3 perhaps). But things would not be sorted by time.
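
A sketch of the closest-3-ancestors bookkeeping, assuming a simple parent map (all names here are hypothetical):

```python
from collections import defaultdict

# When someone votes on a comment, count a "read" for that comment and
# its 3 closest ancestors only -- not the whole ancestor chain -- per
# the assumption discussed above.
attention = defaultdict(int)

def record_vote(comment_id, parent_of):
    """Credit attention to `comment_id` and up to 3 of its ancestors.

    `parent_of` maps a comment id to its parent id (None at the root).
    """
    current = comment_id
    for _ in range(4):  # the comment itself + 3 ancestors
        if current is None:
            break
        attention[current] += 1
        current = parent_of.get(current)
```

With this scheme a vote deep in a subthread no longer counts as a "read" for every comment above it, so distant ancestors aren't dragged down.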


I always assumed that the best solution was the votes / age one, at least as a starting point.

Adding average karma is certainly an interesting system as somebody else mentioned.


An important variable not considered by this solution is who the author of a comment is.


Yes, I'm sure this has been thought about before, but what would happen if we threw the idea of everybody seeing the same order out the window? The comments upvoted by people whose comments you have upvoted would have added weight for example, even to the 2nd, 3rd, nth degree. There's a whole bunch of questions that spring from this, I think, and you wouldn't have to go all-in on it, but I think it could be an interesting exercise.


Progressively decrease what each vote is worth.


The problem statement in the submitted public discussion post:

"The really interesting comments, however, remain forgotten somewhere below, because too few people take the time to scroll down, find them and read them."

The solution proposed:

"This should solve the above-mentioned problem:

"The computer counts how many people have read each comment, and takes this into account, when it sorts all comments."

Ladies and gentlemen, please check my reading comprehension. Do you see what I see here? The person posting says "too few people take the time to scroll down, find them and read them" and then says "The computer counts how many people have read each comment, and takes this into account, when it sorts all comments." How does this give any more prominence to comments that few people are reading (as compared to comments that more people are reading) than any other way of sorting comments? If the problem is that some people aren't reading certain comments, how can how often those comments are read be used to draw more attention to those comments?

Perhaps I am too tired after a weekend day of teaching followed by research to understand what is being proposed here, but I don't think this makes sense.

Anyway, in threads here on Hacker News, there are other ways to find good comments. First of all, there is the bestcomments view of the community,

https://news.ycombinator.com/bestcomments

which, while not a perfect technical solution either, sometimes does promote sub-sub-subcomments to visibility far greater than the visibility of the original great-grandparent comment in the same thread. Some readers of Hacker News also follow people who post good comments by looking up the links to their comments from their user profiles, for example:

https://news.ycombinator.com/threads?id=patio11

https://news.ycombinator.com/threads?id=raganwald

https://news.ycombinator.com/threads?id=jgrahamc

We can also use HN search to search up comment threads by keyword, and evaluate them for ourselves rather than by how they are placed in a thread. Anyhow, I don't worry about this. Sometimes I think the top comment in a thread is the most interesting and informative, by far, and other times I read far down into a thread to find the comments I like best and need most. Either way, there is plenty of good stuff here. The best way to bring about more good stuff here is to read a lot of the comments thoughtfully, and to upvote all the good stuff you find. Emphasize the positive, and upvote early and often.


Instead of ranking by net points, or points over time, the comments could be ranked by points per view. This essentially gives a close approximation to "What percentage of people who viewed this comment found it worthwhile?"


So the problem is that sorting by upvotes alone creates a list of comments where the earliest comments get the most reads, and therefore the most upvotes, such that a comment can rack up 200+ upvotes if it is posted first, but may never even see the light of day if it's added later, once a majority of the votes have already been cast.

The solution is to sort instead by (upvotes/total views). That way, the comment at the top that 1000 people have viewed and 500 people have upvoted falls back behind one that 50 have viewed and that 40 have upvoted, a comment that typically would go unseen by the masses, but which is significantly more likely to be upvoted (or enjoyed) by any given person.
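
A sketch of the upvotes-per-view sort key. The minimum-views floor is my own addition, not part of the proposal, to keep a comment with 1 view and 1 upvote from instantly outranking everything:

```python
def upvote_ratio(upvotes, views, min_views=10):
    """Fraction of viewers who upvoted, usable as a sort key.

    Comments with fewer than `min_views` views are scored as if they
    had exactly `min_views` views, which damps the noise from tiny
    samples. The threshold of 10 is an arbitrary assumption.
    """
    if views < min_views:
        return upvotes / min_views
    return upvotes / views
```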


Quora did a good job of fixing this. You frequently see highly upvoted answers that aren't among the top answers and similarly answers with relatively few upvotes at or near the top.

How quickly late answers gather upvotes seems to be a big factor. I'm also fairly certain the relationship between the upvoter and upvotee plays a big role. For example, if you frequently upvote a person's answers, that upvote carries much less weight than if you upvote some random noob. And if you downvote somebody whom you frequently upvote, that seems to count as a super downvote.

Also, not all answerers are treated equally. Answers from a person with an algorithmically good (or bad) track record start out higher (or lower) than for others. This happens to me on a couple of topics where I've had popular answers. I can make a stupid one line answer and it'll instantly leapfrog answers from randoms with up to ~10 upvotes each, sometimes more.


This seems quite advanced and well thought through. And Quora works really well as far as I've seen. — I've also been thinking about letting the initial position of an answer depend on who wrote it.



