How Karma Should Be Measured (nashcoding.com)
211 points by tansey 2158 days ago | 67 comments

We are effectively saying that each user should generate an average comment score of 1 point per day to break even. Anything you make beyond 1 point is considered an excess return for the day. We then simply take the Sharpe ratio of the average daily excess returns. The resulting metric ensures that users are incentivized to make consistent, high-quality submissions and punishes one hit wonders and those who take a spray-and-pray approach.
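
A minimal sketch of the metric as the quote above describes it, not the article's exact formula. The names `sharpe_karma`, `break_even`, and `min_std` are illustrative, the input is assumed to be one karma total per day on which the user commented, and the floor of 1 on the standard deviation follows the author's remark later in this thread about capping the minimum standard deviation at 1.

```python
from statistics import mean, pstdev


def sharpe_karma(daily_scores, break_even=1.0, min_std=1.0):
    """Sharpe-style ratio: mean daily excess karma over its spread.

    daily_scores: karma earned on each day (assumed input shape).
    break_even:   the 1 point per day treated as the break-even level.
    min_std:      floor on the standard deviation, so a perfectly
                  consistent user does not get an infinite score.
    """
    if not daily_scores:
        return 0.0
    excess = [score - break_even for score in daily_scores]
    spread = max(pstdev(excess), min_std)
    return mean(excess) / spread


steady = sharpe_karma([4, 4, 4, 4])   # same mean, no variation
spiky = sharpe_karma([1, 1, 1, 13])   # same mean, one big hit
```

Under these assumptions the steady commenter scores higher than the one-hit commenter even though both average 4 points per day.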

In my opinion, we want people to keep quiet until they have something meaningful to contribute to the conversation. The idea that making any post is a "return on investment" is nonsense.

Agreed. I lurk...a lot.

I rarely open my mouth. There was a short period where I looked to snipe with some witty comment, but I quickly outgrew that.

I've made one post and I am rarely finding things pertinent to the tribe that haven't already been posted, so instead I look for new interesting content here and only open my mouth when I think I have something valuable to add to an existing discussion.

I think something like the proposed solution would leave a user like me out in the cold. I am by no means saying this objection should kill the idea, you gotta break a few eggs, and if I happen to be one of them, so be it. It should be given some thought though.

I'm in the same boat. I've noticed something interesting though. Back when I was commenting a lot more, I cared a lot more about my karma points. Now that I'm more of a lurker, I realize I don't keep track of my karma at all, nor do I really care about it anymore. So maybe actually, I don't care either way... but people like you and me do exist in terms of signal to comment volume ratio (at least, I hope my ratio's high :D).

Likewise with lurking-until-valuable. If karma were converted to an incentive for daily engagement, that would be a strong disincentive for me to even log in. It changes the relationship from a friendly "it's okay if you contribute, however often you feel like" to "be here and be loud, or be screwed".

(If I were changing HN, I'd love an option to replace the karma score with a reply inbox, so I don't have to poll my threads page. I'm trying to break myself of my post-and-run habits, of my general fear of defending my ideas, and of my overvaluation of karma.)

I completely agree.

I think the submission makes an incorrect assumption that valuable users contribute regularly. Certainly regularity isn't baked into any karma measures on HN right now, and I'd expect that's by design rather than a flaw to be corrected.

I second this and I should add that I often do a search on HN and upvote comments in quite old discussions. Just as a gesture of gratitude.

Those OP measures don't account for that, I think.

I have to disagree...somewhat.

Since Reddit is open source, I could create a copy right now that would have no content on it. Let's say I populated it with all of the old, highly rated posts. This would be valuable as a source of high quality content, but has less value to me because it lacks current events and interaction with other users.

I said 'somewhat' because there clearly is a value in having high quality content, so I see it as a trade off. I think there's a market for both high-quality/low-volume sites as well as lower-quality/higher-volume sites.

Couldn't you solve that by increasing the "break even" score to whatever we think an "average" post should get? If you spray a bunch of unvoted responses (technically, those are worth zero points of karma, I believe) while the average post is worth two, you will kill your ratio pretty quickly.

As far as I can see, the formula gives no penalty for commenting infrequently. There is a penalty for commenting highly once and then poorly many times as compared to what you would get with an average.

I believe there is under "A Better Metric"

dx is the number of days the user is registered. If the user made one post and then stopped, his karma score would shrink over time as dx approaches infinity while the numerator remains constant.

The numerator technically has an extra term for each successive day, but it looks like that additional term is 0 if no comments were made on that day.

It's worse. The additional term is undefined (it's 0/0-1) for days with no comments.

I figured that was just a special case they didn't explain directly. The equation follows their description if you treat the 0-comment days as a 0 for the term in the parenthesis: people who post infrequently will see a shrinking karma.
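
A small sketch of the reading discussed above, under two labeled assumptions: days without comments contribute a term of 0 (the formula itself leaves them undefined), and the final score is the sum of per-day terms divided by `dx`, the account age in days. The function and parameter names are hypothetical.

```python
def age_normalized_score(day_terms, days_registered):
    """Sum of per-day terms divided by the account age in days.

    day_terms: dict mapping a day index to that day's term; any day
               missing from the dict is treated as contributing 0.
    days_registered: account age in days (dx in the discussion above).
    """
    if days_registered <= 0:
        raise ValueError("days_registered must be positive")
    return sum(day_terms.values()) / days_registered


# One good day, then silence: the numerator stays fixed while dx grows.
one_hit = {0: 10.0}
decay = [age_normalized_score(one_hit, days) for days in (10, 100, 1000)]
```

With a fixed numerator the score only shrinks as the account ages, which is the treadmill effect raised in this sub-thread.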

Yes, it does. At the end, the score is divided by the age of the account in days. Essentially, this formula is a treadmill forcing users to post as much as they can just to keep their karma. It's a very bad idea that has no upside.

Given the propensity for users to downvote comments they disagree with and upvote populist but empty rhetoric like anything that bashes Gruber, I'm not sure that the karma formula should be HN's first concern.

Before we optimize how we weight karma, we should first ensure that points are awarded for valuable behavior. Right now, I think it's measuring conformity. Is that true? And if true, is that desirable?

Aside: if you want automatic downvotes, suggest that Groupon might be viable.

Since downvoting erases, it should definitely not be used for simple disagreement. It should be used on poor arguments, bad faith, pointless posts, etc.

It's harder to suggest that upvoting should not be used for agreement.

Regardless, in both cases, I prefer that voting be used to evaluate quality.

You're absolutely right. Upvote and downvote should be separate dimensions not two extremes of one dimension. The opposite of up/down vote is apathy.

By summing up and down votes, you are destroying data. There is no difference between a controversial point and an idea nobody cares about.

Excellent point.

I use a browser extension to show up and down votes on Reddit and I really miss that info here on HN. Not that we get to see comment scores anymore but I still miss it even on my own comments.

It would be interesting to have no downvote option at all and instead rely on flagging the comments that are effectively spam. The problem then becomes relying on the moderators to remove or degrade such posts. But clearly the upvote/downvote mechanism is broken as you've described...

If you remove downvote, people will start to use 'flag' to censor people and you won't be able to trust it anymore. I think leaving the buttons but just separating the calculations is enough:

mostly upvotes: keep it

mostly downvotes: drop it, it's junk; the community has spoken

many mixed up/down votes: controversial topic, maybe delay the appearance of reply links?

few votes: boring, maybe sort it lower
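
The four cases above could be sketched as a classifier that keeps up and down counts separate instead of summing them. The thresholds `min_votes` and `lopsided` are made-up values for illustration, not anything HN uses.

```python
def classify_votes(up, down, min_votes=5, lopsided=0.75):
    """Label a comment from separate up and down counts.

    min_votes: below this many total votes the comment is "few votes".
    lopsided:  share of votes in one direction needed to call it
               "mostly upvotes" or "mostly downvotes".
    Both thresholds are hypothetical.
    """
    total = up + down
    if total < min_votes:
        return "few votes"
    up_share = up / total
    if up_share >= lopsided:
        return "mostly upvotes"
    if up_share <= 1 - lopsided:
        return "mostly downvotes"
    return "mixed"


# Both comments have a net score of 0, but they are not the same case.
controversial = classify_votes(5, 5)
ignored = classify_votes(0, 0)
```

A net score cannot tell these two apart; the separate counts can.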

The problem is that downvotes are used as a way of disagreeing with a comment. It would be more productive and worthwhile if differing perspectives led to a dialogue. Downvotes should be used to filter spam and other cruft but not differences of opinion.

I still stand by my point, that symmetrical actions should have symmetrical meaning. If upvote is used for agreement, downvote should be used for disagreement. Would you enjoy it if your right arrow moved the cursor to the right and the left arrow moved it up? If you don't want downvotes as the sign of disagreement then remove that arrow altogether and leave just "flag". On the other hand, I don't really get how a system where only showing agreement is allowed is any better.

The current problem is that while both arrows have a similar effect on positioning, down voting also causes the post to sort of disappear, which is ok for lousy posts but tragic for good, but unpopular posts.

Maybe the right karma metric is whether the poster has added anything new to the conversation. Downvote for repetition and zero new information.

High quality, contentious comments should be front and center in the discussion. People also seem to like expressing their opinion by clicking, which acts counter to this goal.

I find myself wishing there were two values: agree/disagree and valuable/useless. Not because I think we particularly care about the agree/disagree ratio (we can form our own opinions, and comments are always more interesting and valuable in this regard anyways), but because then it would absorb opinion votes, removing them from the valuable/useless score that helps determine where a comment is located and how it is colored.

> we should first ensure that points are awarded for valuable behavior

Do url submissions deserve points: http://news.ycombinator.com/submitlink?u=http%3A%2F%2Fnews.y...?

I think that this may drive certain undesirable behavior (e.g. duplicate story submissions, down-vote capability for new-ish accounts). On the other hand, points derived from the comments you make are probably a better indicator of the quality of your contribution to HN.

I don't know how you could possibly tell. Upvote/downvote isn't a very fine-grained tool to express one's opinion with.

I find this kind of thing funny, but misguided.

Karma in websites is not necessarily about accurately reflecting some ground truth upvote probability, mean chance of liking a comment, expected future vote ratio of the commenter, etc. It's an incentive-design mechanism, in that a karma system is good if and only if it leads to the desired behavior when people use the website. When you ask yourself "how should I compare 5 upvotes and 5 downvotes versus 1 upvote versus 1 downvote versus no action at all", the answer is not weighting one of these situations higher/lower because it will better approximate one of the criteria above in expectation or something like this, but instead weighting these high/low depending on, for example, if you want to encourage activity, agreement, controversy, etc.

Ideally, you should have some other behavioral metric in mind (say, mean comment quality, top comment quality, bottom comment quality, engagement, etc) and try to tune the voting system to maximize this quality over time. (this tuning can either be done intuitively, as pg tries to do, or algorithmically, using something like the technique behind Gmail's priority inbox or bandit algorithms)

Do not "define away" a social problem with mathematics, use the mathematics to help you solve the actual social problem.

>Ideally, you should have some other behavioral metric in mind (say, mean comment quality, top comment quality, bottom comment quality, engagement, etc)

That's precisely what my metric is intended to measure. The community derives quality of individual comments. The article explains why measuring mean, top, or bottom comment karma is a flawed approach. By measuring what we might call an "enhanced Sharpe" [1], we encourage consistent, high-quality engagement.

I suppose I'm not really understanding what your issue is with the formula. It's certainly not trying to "define away" some social problem-- how users vote, which articles make it to the front page, etc., is an exercise left to the reader. The only thing this metric is intended to do is replace the total score shown in the top right with one that better reflects your contribution to the community.

[1] Or just the Tansey Ratio, if it's not too presumptuous.

There are, however, potential problems with this formula, which I think is what the thread parent was getting at.

For example, it is discouraging to log in to HN and see that your karma has fallen since your last visit. Using your formula, this would be a very regular occurrence for all but the most active users. Discouraged users are less likely to continue attempting to engage and many would eventually give up their attempts to maintain a decent karma. Then, while your measure may be "more accurate" in some sense, it would easily be less effective for goals that have more to do with engagement & participation than with notional accuracy.

Not trying to poo-poo the spirit of your post though, because as someone without much of a math background, this type of discussion is very enlightening. Just trying to clarify that inaccuracy may very well be a feature, not a bug.

Alternatively (and equally bad), upon seeing the karma has fallen, they might strive to post anything that won't be down-voted, but doesn't really add anything either, in the hopes of at least getting 1 vote to stem the tide.

I can see this punishing infrequent, but quality posters and decreasing the signal-to-noise ratio.

Sadly, some things that work out just fine in math don't mesh that neatly with human behavior.

That's an interesting insight.

I'm not sure that's true. Lots of people play games where their rankings shift down if they don't constantly play. In a lot of cases, this actually increases engagement. I think we would need to see some evidence here, but the null hypothesis should be that user engagement does not change.

If one wanted to assume that it does negatively affect engagement, however, then maybe an extended approach is to show the ranking of the user's ratio score? This is less of a judgement of their score and more of a pleasant reminder that they are not contributing as much as others. Alternatively, you could also set the risk-free-rate to 0 for both the comment and day, then only update the scores periodically.

I suppose one could argue that it's not a competition and you shouldn't be vying for a higher score, but then why show us our karma at all? Similarly, if one believes that consistent contribution is not important to the community, then it's a philosophical difference and we'll have to agree to disagree there.

Part of my point is that even engagement might not be the best metric. For example, here in hn, I'm far happier if people don't post than if they post something trivial or uninteresting (your post and comments are interesting as I think this is a discussion worth having).

> For example, it is discouraging to log in to HN and see that your karma has fallen since your last visit.

To counter this, all we would need to do is define what rate of karma inflation is acceptable, then adjust all displayed karma ratings to compensate.

I will freely admit: I would not have paid attention to an article defining the Sharpe Ratio without it having been framed as an HN karma problem. Nice trick.

Thanks! I tried to make it as hacker-friendly as possible, hence the matching pseudo-code for both formulas.

The article is more or less missing the forest for the trees. What matters is the quality of the comments, not the quality of the karma scoring algorithm. Changing the algorithm might provide someone with more information about my posting habits, but it doesn't provide me with any better editorial feedback - that comes directly from upvotes and downvotes and the threads I choose to comment on (a sincere reply to a personal dilemma in a low upvote AskHN thread might get an upvote and might not - while well timed snark might get twenty-five points...and the first link to a Macbook Pro refresh might hit 200).

Cumulative karma scores probably correlate to long term contribution, but they don't meaningfully reflect daily contribution because some days the best contribution I can make is to shut up and listen.

This metric has an attribute which is shared with average karma that I think is extremely bad, which is that it punishes users for making comments that get no upvotes. Or to put it another way, it punishes users for commenting on older articles (as those comments are significantly less likely to garner upvotes).

I agree that average karma is a very poor metric, and I found that my usage of HN changed drastically when pg started showing it more prominently. One optimizes for average karma by not commenting on stories that are older or less popular, as those are likely to produce a 1 or 2 rating due to lack of eyeballs. However such comments are often interesting and useful contributions, and it hurts HN to discourage them.

A possible solution is to scale by the number of people who look at the comment, although this might be difficult to do well. You could probably get better results by estimating a regression containing the following variables: age of post, score of post, number of comments, average score of comments in the thread, and the depth of the tree in which the comment was posted to get a good determination of how many points an average comment in that situation would get.

I think this brings up something that has bothered me a little bit about the comment system here. I would really like some kind of notification system when someone has replied to a post of mine because once a link falls off the front page, the conversation pretty much dies unless I randomly check my comments page.

That might be what the "notifo" field on your profile is for. They're some kind of notification service (YC2010) but I don't see any docs on how to use them on HN. (Not in the faq anyway and hnsearch just returns a bunch of articles about them making plugins and getting investors)

Wow, nice find, never even thought to dig in to that. If you check out the notifo services, they have the information on setting everything up.

The process is actually relatively painless and still seems to work.

Proposal neglects the value of simplicity. If humans can't perceive the link between cause and effect, they invent one, and you end up with cargo cults and other irrational behavior.

"Total" and "Average" are -really- easy to explain to someone, and encourage them to make good quality posts. Volatility adjusted Sharpe ratio doesn't readily explain anything.

I've argued before that h-index is a superior form of karma to what currently exists: http://www.quora.com/Should-Quora-ever-consider-using-H-inde...
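
For readers unfamiliar with it, the h-index applied to comments would be the largest h such that the user has at least h comments with at least h points each. A short sketch, with `h_index` and its input list of per-comment scores as illustrative names:

```python
def h_index(comment_scores):
    """Largest h such that at least h comments have a score of h or more."""
    ranked = sorted(comment_scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


example = h_index([10, 8, 5, 4, 3])  # four comments with at least 4 points
```

A single huge comment moves this number very little, and a pile of 1-point comments cannot push it past 1, so it resists both one-hit wonders and spray-and-pray.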

I always thought the number of times a comment gets read should be factored in to the karma score for the comment.

For instance, I'm posting this late in this thread's life. If 10 people read it, and 7 upvote it, that's a very high percentage of upvotes. If I had posted this 10 hours ago when the thread was created, I would have received a lot more upvotes even though the comment is the same. Sure, maybe the comment isn't as valuable now because fewer people will read it. But I think the goal should be to judge a comment's value regardless of whether the user was lucky enough to find the thread when it was first created.

The simplest way to calculate this is to use page views of the thread after the comment was posted, and maybe factor in how high on the page the comment is to estimate how many people have read it.
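
A rough sketch of that idea, assuming the site can count thread page views after a comment was posted. The `visibility` factor standing in for "how high on the page the comment is" is a hypothetical knob, as are the function and parameter names.

```python
def upvote_rate(upvotes, views_after_posting, visibility=1.0):
    """Upvotes per estimated reader.

    views_after_posting: thread page views since the comment appeared.
    visibility: assumed fraction of those viewers who actually read the
                comment (lower for comments far down the page).
    """
    estimated_readers = views_after_posting * visibility
    if estimated_readers <= 0:
        return 0.0
    return upvotes / estimated_readers


late_comment = upvote_rate(7, 10)        # 7 upvotes from 10 readers
early_comment = upvote_rate(70, 1000)    # more upvotes, far more readers
```

Here the late comment rates higher per reader even though the early one collected ten times as many upvotes.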

HN should get rid of the downvote button, that would do infinitely more for karma quality than a million stupid formula ideas. Since we can't see the actual score of individual posts anymore there is no way to determine if they are overrated anyway.

On top of that, it's semantically difficult for me to grok what the downvote button is supposed to be used for. Should I downvote posts simply because I don't agree with them? Should I downvote garbage instead of flagging it?

If pg got rid of that button, the meaning of karma would be cleared up, both for posts as well as for users. Upvote comments you think are high quality (or because you agree with them), flag things that are not supposed to be here, and simply ignore all the rest.

It already works like that for stories, let's just go one step further and treat posts the same way.

Wow, seeing the pseudocode next to the mathematical representation is really interesting to me. I don't consider myself very good at math but I'm decent at programming. This helped me see the closer relationship. I've heard of math described as true programming, this drives it home.

I find the notation in the formulae extremely misleading. All variables are user-dependent, yet for some reason the superscript x appears in every variable in the formula instead of being factored out.

In math, superscripts usually mean exponentiation. The formulae are really simple. Just take out the x completely and consider everything to be in the context of a single user.

And that's not even considering the many drawbacks of these calculations. The main one being that it encourages a lot of commenting when we don't necessarily need everybody commenting all the time, but rather when they have something useful to add to the discussion.

Yeah, it's essentially the mean karma (subtracting 1 from each karma) divided by the standard deviation. Looks like you've re-invented the signal to noise ratio formula, or the reciprocal of the coefficient of variation formula (standard deviation divided by the mean), although they recommend not making the mean be zero.

It's probably going to have problems. Wouldn't it punish someone who had mostly good comments and posts, and occasionally gets one with a huge amount of karma, versus someone who never got one with big values of karma.

>Looks like you've re-invented the signal to noise ratio formula

Well, I didn't, Sharpe did. :)

However, the important distinction is the notion of a risk-free rate of return. In this case, it's (loosely) the 1 upvote you automatically get for every comment; in finance, it's usually the return you get on US Treasuries (around 1%, though right now it's effectively 0%).

>Wouldn't it punish someone who had mostly good comments and posts, and occasionally gets one with a huge amount of karma, versus someone who never got one with big values of karma.

Assuming all else is equal (meaning it's N comments of karma K vs. N+1 comments where the first N are of karma K and the N+1th comment is something huge relative to K)? No. This makes sense if you think about how the standard deviation is derived, also note that I am capping the minimum standard deviation at 1 so consistently hitting the same karma does not give you an infinite score.

However, if both individuals have the same mean karma but one got it from consistently scoring around that mean, and another got it from having one huge upvoted comment and several smaller ones, then yes. But isn't that what we want?

Can you address that by just discarding outliers?

This doesn't address the flaws of the karma system.

Karma goes astray as both an incentive and a measurement primarily because karma inflation distorts incentives; more users and more voting dilutes the impact of down voting, leading people to judge comment scores relative to other comments which makes genuine trouble makers harder to spot in the trends.

Hacker News appears to be the same as Reddit with regard to how people treat karma, which seems to have been encouraged by the adoption of private karma: thermostat voting has gone down (but I remember offhand a comment by pg that this has not influenced scores, if this is correct the only significance is that down votes are now a much better data point should pg ever want a more sophisticated ranking system), but it has exacerbated comment relativity, doubly so because rather than using comment scores to only order comments, HN also obscures posts with a score lower than one.

Disclosure: Community moderation is something I like to tinker with, and my own experiments have increasingly led me away from karma. But there is no doubt that it is one of the most effective forms of soft moderation we have today, though I see little reason to believe that it can effectively scale past tens of thousands of users.

"Hacker News, since about two years or so when there was a large influx from Reddit and Digg people, appears to be the same as Reddit with regard to how people treat karma"

Karma is there to let you know if others find value in your comments or if you're being an asshole, it's not really meant as a longterm way of ranking users.

I often hold back when I don't feel like my commentary adds value to the conversation, regardless of how strongly I feel about a subject. I fear such a system may encourage commenting for its own sake - as long as I keep it palatable, I shouldn't get any downvotes and I'd keep my default "1" (low volatility), unless I say something downright stupid. Another issue - more with the system itself, independent of any formula - lies in the very subjective nature of voting. I personally would not likely upvote a comment I disagree with, no matter how carefully thought out and presented. But I probably wouldn't downvote it either, unless I felt it was made in bad taste. Perhaps a more effective system would only take into account downvotes cast, since standards of what constitute poor form seem, IMO, to be fairly consistent around here. Just a thought you may want to look into.

Also, could be just me, but I found your notation a little confusing, where you've used x as a superscript.

I think it's more fruitful to consider where you can get karma and on what basis than how the actual karma sources are computed into some final number.

The most important aspect of HN is the culture which we want to preserve. So users should get new points/votes/karma when they reinforce the culture and lose points/votes/karma when they break the culture apart.

Voting is the only means to change the karmic dynamic of users. So voting should be reserved to the active old timers in greater proportion and to the newcomers in smaller proportion.

For example, newbies shouldn't be able to vote at all below a karma threshold and when they are, their votes should have a fractional effect compared to the vote of an old timer. Maybe something like if one 4000-karma guy downvotes a comment it would need to take 100 of 40-karma guys to upvote it back to zero but only 10 of 400-karma guys. If all HN users voted on an item, the weight of each vote would be a single user's karma / total sum of karma of all HN users. Of course, only a subset of users ever vote on a single item so the total sum should be limited to a subset of users, such as those expressing interest in voting for that item or those having voted for that submission or any of its other comments. As long as newcomers can't come and upvote each other's comments to gain karma without the approval of the high-ranking old timers. Therefore, those newcomers who can already vote would contribute fractional karma points with their votes.
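
The weighting in the paragraph above can be sketched as follows, taking "the subset of users" to mean simply those who voted on the item. The function name and the `min_karma` voting threshold are hypothetical.

```python
def weighted_vote_score(votes, min_karma=0):
    """Karma-weighted score of one comment.

    votes: list of (voter_karma, direction) pairs, direction +1 or -1.
    min_karma: voters below this karma are ignored entirely.
    Each vote weighs voter_karma / total karma of all eligible voters.
    """
    eligible = [(karma, d) for karma, d in votes if karma >= min_karma]
    total_karma = sum(karma for karma, _ in eligible)
    if total_karma == 0:
        return 0.0
    return sum(karma * d for karma, d in eligible) / total_karma


# One 4000-karma downvote against a hundred 40-karma upvotes nets zero.
balanced = weighted_vote_score([(4000, -1)] + [(40, +1)] * 100)
```

With proportional weights, ten 400-karma upvotes cancel the same downvote, matching the numbers given above.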

Those old timers who define the culture can perhaps be identified by their karma, as recursive as it sounds. At some point the karma reaches some natural limitation as it's much more difficult to obtain tens of thousands of karma points than thousands. So eventually the most persistent ones would gradually join the higher ranks because it would be really hard to escape.

But the computation of karma probably doesn't matter much. Just adding up votes will sufficiently track the relative ranking of each user.

What you describe sounds a lot like PageRank (PeerRank)?

While we're talking about karma, can we please have points on comments back?

It might help to have a multi-dimensional system in which karma is measured by (k1, k2, k3, ..., kn). Let the reader specify thresholds for each, as they see fit. For example, k1 could relate to humor, k2 to historical insight, etc. If you're in the mood for a laugh, you'd increase your k1 weighting as you view the system. As for measurement, this could be partly by the votes of others who have high karma in the category.
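
One way to picture the reader-side weighting, assuming each comment carries a score per dimension and each reader supplies their own weights. The dimension names and the function are invented for illustration.

```python
def personalized_score(comment_karma, reader_weights):
    """Weighted sum of a comment's per-dimension karma.

    comment_karma:  dict such as {"humor": 3, "insight": 5}.
    reader_weights: dict of the reader's weights; dimensions the
                    reader did not mention count for nothing.
    """
    return sum(
        reader_weights.get(dimension, 0.0) * value
        for dimension, value in comment_karma.items()
    )


comment = {"humor": 3, "insight": 5}
in_the_mood_for_a_laugh = personalized_score(comment, {"humor": 2.0})
serious_reader = personalized_score(comment, {"insight": 1.0})
```

The same comment then ranks differently for different readers without anyone's votes being discarded.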

Not that this matters much. I think people just stop reading various websites when they judge the site, overall, to be boring or useless. For example, if people find stackoverflow to be more informative than (insert name here) then stackoverflow "wins". Whatever winning means.

From a cursory glance at the formula, it looks like it can't be computed in a rolling fashion (as new comments come in), so you have to look at the entire history of the user to calculate the karma. Is that correct? Looks like a showstopper to me...
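
Whether the article's exact formula can be updated incrementally is not settled here, but if the metric only needs the count, mean, and standard deviation of daily excess scores (an assumption), those can be maintained one value at a time with Welford's online algorithm. The class name is illustrative.

```python
import math


class RunningStats:
    """Incrementally maintained count, mean, and population std dev."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        """Fold one new value into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def pstdev(self):
        """Population standard deviation of the values seen so far."""
        return math.sqrt(self.m2 / self.n) if self.n else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.add(value)
```

Only three numbers per user need to be stored, so no pass over the full comment history is required under this assumption.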

I'm getting lots of "latex path not specified" errors.

I think perhaps comment points should count for more than the ones you get for story submissions. Has that been tried before?

Is there a good way to deal with the case of sigma = 0 (i.e. all posts have the same number of votes)?

The submitted article is interesting, and it is especially reader-friendly that the author first shows a simple karma model in both mathematical notation and pseudocode, and then shows a more refined karma model each way. I think I agree with buff-a's comment from 12 hours ago and the several participants replying to buff-a (the top comment in this thread as I write this comment) that the best behavior to reward is a commenter waiting until the commenter has something thoughtful to say.

The other issue that has come up in multiple comments already posted is the role of downvoting. Downvoting is a pet issue on HN--I have seen more than a dozen full threads about downvoting and scores of comments about downvoting in other threads in my 1010 days of registered participation on HN. Before I came on board, 1284 days ago, pg (the site founder) wrote, "I think it's ok to use the up and down arrows to express agreement. Obviously the uparrows aren't only for applauding politeness, so it seems reasonable that the downarrows aren't only for booing rudeness."

Although I would agree with putting in a two-dimensional voting/flagging system (with one dimension being agreement with the statement(s) in the post, and the other dimension being a judgment of how much the post contributes to the community), while such a bivariate system is not yet implemented, it makes sense to downvote comments without further follow-up comment if they add nothing to the posted discussion as it is already posted, in light of the submitted article or question opening the thread. No one should be obligated to comment on a useless post before downvoting it. It is the responsibility of each commenter (as several commenters here implicitly agree) to make the case for his or her own comment being visible by what is said that is new and helpful in the comment.

When pg opened a thread 142 days ago with the question "Ask HN: How to stave off decline of HN?" he wrote, "The problem has several components: comments that are (a) mean and/or (b) dumb that (c) get massively upvoted."

That's still the key issue. It doesn't do any reader of HN any good if a comment that is dumb gets net upvotes. Nor does it do any good if a mean comment is upvoted--that causes active harm to the community. If participant behavior brings about higher scores for good comments, and lower scores for mean, dumb, or other bad comments, that is helpful to all readers of HN.

Some users who are worried about downvotes are worried also about HN hivemind or groupthink. It may be that there are unexamined opinions without factual warrant that are held by the majority of HN participants--that is to be expected on the basis of psychological research.

The thing to do about groupthink is to dare to comment, karma be damned, and to respond with thoughtful, informative comments that challenge majority opinions. I have also thought that it might be useful for veteran participants here on HN who have a Web presence to post a Web page or blog post discussing what they see as the main hivemind or groupthink issues on HN, with citations to good sources of information on those issues, and then to put links to such online discussions in their user profiles. That way, if a user is a contrarian on an issue that a lot of HN participants care about, the user can invite all other HN participants to look up facts on the issue. That might help raise the level of discourse here.

After being here 1010 days and seeing a few rule changes and MANY discussions of upvoting, downvoting, and karma rules, I think the main thing to do here to improve the quality of discussion is to UPVOTE more. Upvote a person who asks a follow-up question like, "Do you have any sources to back up that statement?" (I often see such comments grayed out, indicating that they have been downvoted, but comments that ask for more verifiable information are nearly always helpful.) Upvote a person who says "Thank you" out loud, and silently upvote a comment that you think deserves thanks for politeness or thoughtfulness. Upvote a comment that provides a link to an online resource you didn't know about before. Upvote a comment that apologizes for a gaffe or that admits a factual mistake. Upvote that which is good, and there will be fewer problems with inaccurate signaling here.

Feel free to review the site guidelines and the site welcome message for guidance on what is desired here and thus guidance on how to vote.

The key point: everyone needs to up vote good comments more.

Garbage in, garbage out. If you don't upvote good stuff, other people will upvote bad stuff.

P.S. Sorry, I never tl;dr but it's important to highlight this key point.

Seriously, humor should count for something; it is actually an important method of commentary.

What about the spray and pray humor niche?

