
How Karma Should Be Measured - tansey
http://www.nashcoding.com/2011/08/23/how-karma-should-be-measured/
======
buff-a
_We are effectively saying that each user should generate an average comment
score of 1 point per day to break even. Anything you make beyond 1 point is
considered an excess return for the day. We then simply take the Sharpe ratio
of the average daily excess returns. The resulting metric ensures that users
are incentivized to make consistent, high-quality submissions and punishes one
hit wonders and those who take a spray-and-pray approach._

In my opinion, we want people to keep quiet until they have something
meaningful to contribute to the conversation. The idea that making _any_ post
is a "return on investment" is nonsense.

~~~
alanfalcon
I completely agree.

I think the submission makes an incorrect assumption that valuable users
contribute regularly. Certainly regularity isn't baked in to any karma
measures on HN right now, and I'd expect that's by design rather than a flaw
to be corrected.

~~~
thesz
I second this and I should add that I often do a search on HN and upvote
comments in quite old discussions. Just as a gesture of gratitude.

Those OP measures don't account for that, I think.

------
raganwald
Given the propensity for users to downvote comments they disagree with and
upvote populist but empty rhetoric like anything that bashes Gruber, I'm not
sure that the karma formula should be HN's first concern.

Before we optimize how we weight karma, we should first ensure that points are
awarded for valuable behavior. Right now, I think it's measuring conformity.
Is that true? And if true, is that desirable?

~~~
pbreit
Aside: if you want automatic downvotes, suggest that Groupon might be viable.

Since downvoting erases, it should definitely not be used for simple
disagreement. It should be used on poor arguments, bad faith, pointless posts,
etc.

It's harder to suggest that upvoting should not be used for agreement.

Regardless, in both cases, I prefer that voting be used to evaluate quality.

~~~
po
You're absolutely right. Upvote and downvote should be separate dimensions not
two extremes of one dimension. The opposite of up/down vote is apathy.

By summing up and down votes, you are destroying data. There is no difference
between a controversial point and an idea nobody cares about.

~~~
llambda
It would be interesting to have no downvote option at all and instead rely on
flagging the comments that are effectively spam. The problem then becomes
relying on the moderators to remove or degrade such posts. But clearly the
upvote/downvote mechanism is broken as you've described...

~~~
po
If you remove downvote, people will start to use 'flag' to censor people and
you won't be able to trust it anymore. I think leaving the buttons but just
separating the calculations is enough:

mostly upvotes: keep it

mostly downvotes: drop it, it's junk; the community has spoken

many mixed up/down votes: controversial topic, maybe delay the appearance of
reply links?

few votes: boring, maybe sort it lower
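
The four buckets above can be sketched in a few lines of Python. This is only an illustration of keeping up and down counts separate; the function name and the `min_votes` and `majority` thresholds are assumptions, not values anyone in the thread proposed.

```python
def classify_comment(upvotes, downvotes, min_votes=5, majority=0.7):
    """Bucket a comment using separate up and down counts.

    min_votes and majority are hypothetical thresholds chosen for illustration.
    """
    total = upvotes + downvotes
    if total < min_votes:
        return "boring"          # few votes: maybe sort it lower
    up_share = upvotes / total
    if up_share >= majority:
        return "keep"            # mostly upvotes
    if up_share <= 1 - majority:
        return "drop"            # mostly downvotes
    return "controversial"       # many mixed votes


# Both comments have a net score of 0, yet they land in different buckets.
print(classify_comment(20, 20))
print(classify_comment(0, 0))
```

A summed score cannot tell these two cases apart, which is the data loss described earlier in this subthread.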

~~~
llambda
The problem is that downvotes are used as a way of disagreeing with a comment.
But it would be more productive and worthwhile if differing
perspectives led to a dialogue. Downvotes should be used to filter spam and
other cruft but not differences of opinion.

------
alextp
I find this kind of thing funny, but misguided.

Karma in websites is not necessarily about accurately reflecting some ground
truth upvote probability, mean chance of liking a comment, expected future
vote ratio of the commenter, etc. It's an incentive-design mechanism, in that
a karma system is good if and only if it leads to the desired behavior when
people use the website. When you ask yourself "how should I compare 5 upvotes
and 5 downvotes versus 1 upvote and 1 downvote versus no action at all",
the answer is not weighting one of these situations higher/lower because it
will better approximate one of the criteria above in expectation or something
like this, but instead weighting these high/low depending on, for example, if
you want to encourage activity, agreement, controversy, etc.

Ideally, you should have some other behavioral metric in mind (say, mean
comment quality, top comment quality, bottom comment quality, engagement, etc)
and try to tune the voting system to maximize this quality over time. (this
tuning can either be done intuitively, as pg tries to do, or algorithmically,
using something like the technique behind Gmail's priority inbox or bandit
algorithms)

Do not "define away" a social problem with mathematics, use the mathematics to
help you solve the actual social problem.

~~~
tansey
_> Ideally, you should have some other behavioral metric in mind (say, mean
comment quality, top comment quality, bottom comment quality, engagement, etc)
_

That's precisely what my metric is intended to measure. The community derives
quality of individual comments. The article explains why measuring mean, top,
or bottom comment karma is a flawed approach. By measuring what we might call
an "enhanced Sharpe" [1], we encourage consistent, high-quality engagement.

I suppose I'm not really understanding what your issue is with the formula.
It's certainly not trying to "define away" some social problem-- how users
vote, which articles make it to the front page, etc., is an exercise left to
the reader. The only thing this metric is intended to do is replace the total
score shown in the top right with one that better reflects your contribution
to the community.

[1] Or just the Tansey Ratio, if it's not too presumptuous.

~~~
noahth
There are, however, potential problems with this formula, which I think is
what the thread parent was getting at.

For example, it is discouraging to log in to HN and see that your karma has
fallen since your last visit. Using your formula, this would be a very regular
occurrence for all but the most active users. Discouraged users are less
likely to continue attempting to engage and many would eventually give up
their attempts to maintain a decent karma. Then, while your measure may be
"more accurate" in some sense, it would easily be less effective for goals
that have more to do with engagement & participation than with notional
accuracy.

Not trying to poo-poo the spirit of your post though, because as someone
without much of a math background, this type of discussion is very
enlightening. Just trying to clarify that inaccuracy may very well be a
feature, not a bug.

~~~
tansey
That's an interesting insight.

I'm not sure that's true. Lots of people play games where their rankings shift
down if they don't constantly play. In a lot of cases, this actually increases
engagement. I think we would need to see some evidence here, but the null
hypothesis should be that user engagement does not change.

If one wanted to assume that it does negatively affect engagement, however,
then maybe an extended approach is to show the ranking of the user's
ratio score? This is less of a judgement of their score and more of a pleasant
reminder that they are not contributing as much as others. Alternatively, you
could also set the risk-free-rate to 0 for both the comment and day, then only
update the scores periodically.

I suppose one could argue that it's not a competition and you shouldn't be
vying for a higher score, but then why show us our karma at all? Similarly, if
one believes that consistent contribution is not important to the community,
then it's a philosophical difference and we'll have to agree to disagree
there.

~~~
alextp
Part of my point is that even engagement might not be the best metric. For
example, here in hn, I'm far happier if people don't post than if they post
something trivial or uninteresting (your post and comments are interesting as
I think this is a discussion worth having).

------
tptacek
I will freely admit: I would not have paid attention to an article defining
the Sharpe Ratio without it having been framed as an HN karma problem. Nice
trick.

~~~
tansey
Thanks! I tried to make it as hacker-friendly as possible, hence the matching
pseudo-code for both formulas.

------
brudgers
The article is more or less missing the forest for the trees. What matters is
the quality of the comments, not the quality of the karma scoring algorithm.
Changing the algorithm might provide someone with more information about my
posting habits, but it doesn't provide me with any better editorial feedback -
that comes directly from upvotes and downvotes and the threads I choose to
comment on (a sincere reply to a personal dilemma in a low upvote AskHN thread
might get an upvote and might not - while well timed snark might get twenty-
five points...and the first link to a Macbook Pro refresh might hit 200).

Cumulative karma scores probably correlate to long term contribution, but they
don't meaningfully reflect daily contribution because some days the best
contribution I can make is to shut up and listen.

------
eridius
This metric has an attribute which is shared with average karma that I think
is extremely bad, which is that it punishes users for making comments that get
no upvotes. Or to put it another way, it punishes users for commenting on
older articles (as those comments are significantly less likely to garner
upvotes).

~~~
Stwerner
I think this brings up something that has bothered me a little bit about the
comment system here. I would really like some kind of notification system when
someone has replied to a post of mine because once a link falls off the front
page, the conversation pretty much dies unless I randomly check my comments
page.

~~~
Sukotto
That might be what the "notifo" field on your profile is for. They're some
kind of notification service (YC2010) but I don't see any docs on how to use
them on HN. (Not in the faq anyway and hnsearch just returns a bunch of
articles about them making plugins and getting investors)

~~~
sgentle
<http://notifo.com/hackernews>

The process is actually relatively painless and still seems to work.

------
pkteison
Proposal neglects the value of simplicity. If humans can't perceive the link
between cause and effect, they invent one, and you end up with cargo cults and
other irrational behavior.

"Total" and "Average" are -really- easy to explain to someone, and encourage
them to make good quality posts. Volatility adjusted Sharpe ratio doesn't
readily explain anything.

------
shalmanese
I've argued before that h-index is a superior form of karma to what currently
exists: [http://www.quora.com/Should-Quora-ever-consider-using-H-
inde...](http://www.quora.com/Should-Quora-ever-consider-using-H-index-for-
reputation/followers)
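
For readers unfamiliar with it, the h-index is the largest h such that at least h items each have a score of at least h. A minimal sketch, assuming it is applied to a user's per-comment scores (the function name and sample data are illustrative, not taken from the linked answer):

```python
def h_index(scores):
    """Largest h such that at least h comments each scored at least h."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


# One runaway hit barely moves the index; a run of solid comments does.
print(h_index([200, 1, 1, 1]))
print(h_index([10, 8, 5, 4, 3]))
```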

------
nck4222
I always thought the amount of times a comment gets read should be factored in
to the karma score for the comment.

For instance, I'm posting this late in this thread's life. If 10 people read it,
and 7 up vote it, that's a very high percentage of up votes. If I had posted
this 10 hours ago when the thread was created, I would have received a lot
more up votes even though the comment is the same. Sure, maybe the comment
isn't as valuable now because fewer people will read it. But I think the goal
should be to judge a comment's value regardless of if the user was lucky enough
to find the thread when it was first created.

The simplest way to calculate this is to use page views of the thread after
the comment was posted, and maybe factor in how high on the page the comment
is to estimate how many people have read it.
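
A rough sketch of that normalization, assuming the site can count thread page views after the comment was posted. The `visibility` factor standing in for "how high on the page the comment is" is a hypothetical parameter, as is the function name.

```python
def upvote_rate(upvotes, views_since_posted, visibility=1.0):
    """Upvotes per estimated reader.

    visibility is a hypothetical 0..1 factor for how likely a page view is
    to actually reach the comment (lower for comments far down the page).
    """
    readers = views_since_posted * visibility
    if readers <= 0:
        return 0.0
    return upvotes / readers


# A late comment read by 10 people versus an early one seen by 1000.
late = upvote_rate(7, 10)
early = upvote_rate(70, 1000)
print(late, early)
```

By raw count the early comment looks ten times better; per reader, the late one does.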

------
Udo
HN should get rid of the downvote button, that would do infinitely more for
karma quality than a million stupid formula ideas. Since we can't see the
actual score of individual posts anymore there is no way to determine if they
are overrated anyway.

On top of that, it's semantically difficult for me to grok what the downvote
button is supposed to be used for. Should I downvote posts simply because I
don't agree with them? Should I downvote garbage instead of flagging it?

If pg got rid of that button, the meaning of karma would be cleared up, both
for posts as well as for users. Upvote comments you think are high quality (or
because you agree with them), flag things that are not supposed to be here,
and simply ignore all the rest.

It already works like that for stories, let's just go one step further and
treat posts the same way.

------
joshontheweb
Wow, seeing the pseudocode next to the mathematical representation is really
interesting to me. I don't consider myself very good at math but I'm decent at
programming. This helped me see the closer relationship. I've heard of math
described as true programming, this drives it home.

~~~
muyuu
I find the notation in the formulae extremely misleading. All variables are
user-dependent, yet for some reason the superscript x appears in every
variable in the formula instead of being factored out.

In math, superscripts usually mean exponentiation. The formulae are really
simple. Just take out the x completely and consider everything to be in the
context of a single user.

And that's not even considering the many drawbacks of these calculations. The
main one being that it encourages a lot of commenting when we don't
necessarily need everybody commenting all the time, but rather when they have
something useful to add to the discussion.

------
edtechdev
Yeah, it's essentially the mean karma (subtracting 1 from each karma) divided
by the standard deviation. Looks like you've re-invented the signal to noise
ratio formula, or the reciprocal of the coefficient of variation formula
(standard deviation divided by the mean), although they recommend not making
the mean be zero.

[http://en.wikipedia.org/wiki/Signal-to-
noise_ratio#Alternati...](http://en.wikipedia.org/wiki/Signal-to-
noise_ratio#Alternative_definition)

It's probably going to have problems. Wouldn't it punish someone who had
mostly good comments and posts, and occasionally gets one with a huge amount
of karma, versus someone who never got one with big values of karma.

~~~
tansey
_> Looks like you've re-invented the signal to noise ratio formula_

Well, I didn't, Sharpe did. :)

However, the important distinction is the notion of a risk-free rate of
return. In this case, it's (loosely) the 1 upvote you automatically get for
every comment; in finance, it's usually the return you get on US Treasuries
(around 1%, though right now it's effectively 0%).

 _> Wouldn't it punish someone who had mostly good comments and posts, and
occasionally gets one with a huge amount of karma, versus someone who never
got one with big values of karma._

Assuming all else is equal (meaning it's N comments of karma K vs. N+1
comments where the first N are of karma K and the N+1th comment is something
huge relative to K)? No. This makes sense if you think about how the standard
deviation is derived. Also note that I am capping the minimum standard
deviation at 1 so consistently hitting the same karma does not give you an
infinite score.

However, if both individuals have the same mean karma but one got it from
consistently scoring around that mean, and another got it from having one huge
upvoted comment and several smaller ones, then yes. But isn't that what we
want?
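
The same-mean comparison in the last paragraph can be checked with a simplified sketch. This is not the article's exact formula, which is not reproduced in this thread; it assumes one score per day, a break-even of 1 point per day, and the floor of 1 on the standard deviation mentioned above. The function and parameter names are illustrative.

```python
from statistics import mean, pstdev


def sharpe_karma(daily_scores, break_even=1.0, min_sigma=1.0):
    """Mean daily excess score divided by its (floored) standard deviation."""
    if not daily_scores:
        return 0.0
    excess = [score - break_even for score in daily_scores]
    # Flooring sigma keeps a perfectly consistent user from scoring infinity.
    sigma = max(pstdev(excess), min_sigma)
    return mean(excess) / sigma


consistent = [5, 5, 5, 5]   # mean 5, no variation
spiky = [17, 1, 1, 1]       # mean 5, one big hit
print(sharpe_karma(consistent))
print(sharpe_karma(spiky))
```

Under these assumptions the consistent user scores higher than the spiky one even though their mean karma is identical, and the sigma floor also covers the case where every score is the same.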

------
JamesBlair
This doesn't address the flaws of the karma system.

Karma goes astray as both an incentive and a measurement primarily because
karma inflation distorts incentives; more users and more voting dilutes the
impact of down voting, leading people to judge comment scores relative to
other comments which makes genuine trouble makers harder to spot in the
trends.

Hacker News appears to be the same as Reddit with regard to how people treat
karma, which seems to have been encouraged by the adoption of private karma:
thermostat voting has gone down (but I remember offhand a comment by pg that
this has not influenced scores, if this is correct the only significance is
that down votes are now a much better data point should pg ever want a more
sophisticated ranking system), but it has exacerbated comment relativity,
doubly so because rather than using comment scores to _only_ order comments,
HN also obscures posts with a score lower than one.

Disclosure: Community moderation is something I like to tinker with, and my
own experiments have increasingly led me away from karma. But there is no
doubt that it is one of the most effective forms of soft moderation we have
today, though I see little reason to believe that it can effectively scale
past tens of thousands of users.

~~~
roel_v
Correction:

"Hacker News, _since about two years or so when there was a large influx from
Reddit and Digg people_ , appears to be the same as Reddit with regard to how
people treat karma"

------
Alex3917
Karma is there to let you know if others find value in your comments or if
you're being an asshole, it's not really meant as a longterm way of ranking
users.

------
okal
I often hold back when I don't feel like my commentary adds value to the
conversation, regardless of how strongly I feel about a subject. I fear such a
system may encourage commenting for its own sake - as long as I keep it
palatable, I shouldn't get any downvotes and I'd keep my default "1" (low
volatility), unless I say something downright stupid. Another issue - more
with the system itself, independent of any formula - lies in the very
subjective nature of voting. I personally would not likely upvote a comment I
disagree with, no matter how carefully thought out and presented. But I
probably wouldn't downvote it either, unless I felt it was made in bad taste.
Perhaps a more effective system would only take into account downvotes cast,
since standards of what constitute poor form seem, IMO, to be fairly
consistent around here. Just a thought you may want to look into.

Also, could be just me, but I found your notation a little confusing, where
you've used x as a superscript.

------
yason
I think it's more fruitful to consider where you can get karma and on what
basis than how the actual karma sources are computed into some final number.

The most important aspect of HN is the culture which we want to preserve. So
users should get new points/votes/karma when they reinforce the culture and
lose points/votes/karma when they break the culture apart.

Voting is the only means to change the karmic dynamic of users. So voting
should be reserved to the active old timers in greater proportion and to the
newcomers in smaller proportion.

For example, newbies shouldn't be able to vote at all below a karma threshold
and when they are, their votes should have a fractional effect compared to the
vote of an old timer. Maybe something like if one 4000-karma guy downvotes a
comment it would need to take 100 of 40-karma guys to upvote it back to zero
but only 10 of 400-karma guys. If _all_ HN users voted on an item, the weight
of each vote would be a single user's karma / total sum of karma of all HN
users. Of course, only a subset of users ever vote on a single item so the
total sum should be limited to a subset of users, such as those expressing
interest in voting for that item or those having voted for that submission or
any of its other comments. As long as newcomers can't come and upvote each
other's comments to gain karma without the approval of the high-ranking old
timers. Therefore, those newcomers who can already vote would contribute
fractional karma points with their votes.
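
The weighting described above can be sketched as follows, assuming each vote counts in proportion to the voter's karma and the total is normalized over the users who actually voted on the item. The function name, the `min_karma` threshold and the sample data are hypothetical.

```python
def weighted_score(votes, min_karma=0):
    """Karma-weighted comment score in the range -1..1.

    votes is a list of (voter_karma, direction) pairs, direction being +1 or -1.
    Voters below min_karma (a hypothetical threshold) are ignored entirely.
    """
    eligible = [(karma, direction) for karma, direction in votes
                if karma >= min_karma]
    total = sum(karma for karma, _ in eligible)
    if total == 0:
        return 0.0
    return sum(karma * direction for karma, direction in eligible) / total


# One 4000-karma downvote is cancelled by 100 upvotes from 40-karma users...
print(weighted_score([(4000, -1)] + [(40, +1)] * 100))
# ...or by only 10 upvotes from 400-karma users.
print(weighted_score([(4000, -1)] + [(400, +1)] * 10))
```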

Those old timers who define the culture can perhaps be identified by their
karma, as recursive as it sounds. At some point the karma reaches some natural
limitation as it's much more difficult to obtain tens of thousands of karma
points than thousands. So eventually the most persistent ones would gradually
join the higher ranks because it would be really hard to escape.

But the computation of karma probably doesn't matter much. Just adding up
votes will sufficiently track the relative ranking of each user.

~~~
stofu
What you describe sounds a lot like PageRank (PeerRank)?

------
mixmax
while we're talking about karma can we please have points on comments back?

------
rnadna
It might help to have a multi-dimensional system in which karma is measured by
(k1, k2, k3, ..., kn). Let the reader specify thresholds for each, as they see
fit. For example, k1 could relate to humor, k2 to historical insight, etc. If
you're in the mood for a laugh, you'd increase your k1 weighting as you view
the system. As for measurement, this could be partly by the votes of others
who have high karma in the category.

Not that this matters much. I think people just stop reading various websites
when they judge the site, overall, to be boring or useless. For example, if
people find stackoverflow to be more informative than (insert name here) then
stackoverflow "wins". Whatever winning means.

------
StavrosK
From a cursory glance at the formula, it looks like it can't be computed in a
rolling fashion (as new comments come in), so you have to look at the entire
history of the user to calculate the karma. Is that correct? Looks like a
showstopper to me...
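
If the metric boils down to a mean and a standard deviation of daily excess scores (an assumption; the formula itself is not reproduced in this thread), both can be maintained incrementally with Welford's online algorithm, so the full history would not need to be rescanned. A sketch, with a hypothetical class name and the same break-even and sigma floor of 1 discussed elsewhere in the thread:

```python
import math


class RunningSharpe:
    """Rolling mean / standard deviation ratio via Welford's algorithm."""

    def __init__(self, break_even=1.0, min_sigma=1.0):
        self.break_even = break_even
        self.min_sigma = min_sigma
        self.n = 0        # number of days seen
        self.mean = 0.0   # running mean of excess scores
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def add(self, daily_score):
        """Fold one more day's score into the running statistics."""
        x = daily_score - self.break_even
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def value(self):
        """Current ratio, using the population standard deviation."""
        if self.n == 0:
            return 0.0
        sigma = max(math.sqrt(self.m2 / self.n), self.min_sigma)
        return self.mean / sigma


tracker = RunningSharpe()
for score in [17, 1, 1, 1]:
    tracker.add(score)
print(tracker.value())
```

Only three numbers per user are stored, and each new day is a constant-time update.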

------
Sniffnoy
I'm getting lots of "latex path not specified" errors.

------
ulisesroche
I think perhaps comment points should count for more than the ones you get for
story submissions. Has that been tried before?

------
indec
Is there a good way to deal with the case of sigma = 0 (i.e. all posts have
the same number of votes)?

------
tokenadult
The submitted article is interesting, and it is especially reader-friendly
that the author first shows a simple karma model in both mathematical notation
and pseudocode, and then shows a more refined karma model each way. I think I
agree with buff-a's comment from 12 hours ago and the several participants
replying to buff-a (the top comment in this thread as I write this comment)
that the best behavior to reward is a commenter waiting until the commenter
has something thoughtful to say.

The other issue that has come up in multiple comments already posted is the
role of downvoting. Downvoting is a pet issue on HN--I have seen more than a
dozen full threads about downvoting and scores of comments about downvoting in
other threads in my 1010 days of registered participation on HN. Before I came
on board, 1284 days ago, pg (the site founder) wrote, "I think it's ok to use
the up and down arrows to express agreement. Obviously the uparrows aren't
only for applauding politeness, so it seems reasonable that the downarrows
aren't only for booing rudeness."

<http://news.ycombinator.com/item?id=117171>

Although I would agree with putting in a two-dimensional voting/flagging
system (with one dimension being agreement with the statement(s) in the post,
and the other dimension being a judgment of how much the post contributes to
the community), while such a bivariate system is not yet implemented, it makes
sense to downvote comments without further follow-up comment if they add
nothing to the posted discussion as it is already posted, in light of the
submitted article or question opening the thread. No one should be obligated
to comment on a useless post before downvoting it. It is the responsibility of
each commenter (as several commenters here implicitly agree) to make the case
for his or her own comment being visible by what is said that is new and
helpful in the comment.

When pg opened a thread 142 days ago with the question "Ask HN: How to stave
off decline of HN?"

<http://news.ycombinator.com/item?id=2403696>

he wrote, "The problem has several components: comments that are (a) mean
and/or (b) dumb that (c) get massively upvoted."

That's still the key issue. It doesn't do any reader of HN any good if a
comment that is dumb gets net upvotes. Nor does it do any good if a mean
comment is upvoted--that causes active harm to the community. If participant
behavior brings about higher scores for good comments, and lower scores for
mean, dumb, or other bad comments, that is helpful to all readers of HN.

Some users who are worried about downvotes are worried also about HN hivemind
or groupthink. It may be that there are unexamined opinions without factual
warrant that are held by the majority of HN participants--that is to be
expected on the basis of psychological research.

[http://www.project-
syndicate.org/commentary/stanovich1/Engli...](http://www.project-
syndicate.org/commentary/stanovich1/English)

The thing to do about groupthink is to dare to comment, karma be damned, and
to respond with thoughtful, informative comments that challenge majority
opinions. I have also thought that it might be useful for veteran participants
here on HN who have a Web presence to post a Web page or blog post discussing
what they see as the main hivemind or groupthink issues on HN, with citations
to good sources of information on those issues, and then to put links to such
online discussions in their user profiles. That way, if a user is a contrarian
on an issue that a lot of HN participants care about, the user can invite all
other HN participants to look up facts on the issue. That might help raise the
level of discourse here.

After being here 1010 days and seeing a few rule changes and MANY discussions
of upvoting, downvoting, and karma rules, I think the main thing to do here to
improve the quality of discussion is to UPVOTE more. Upvote a person who asks
a follow-up question like, "Do you have any sources to back up that
statement?" (I often see such comments grayed out, indicating that they have
been downvoted, but comments that ask for more verifiable information are
nearly always helpful.) Upvote a person who says "Thank you" out loud, and
silently upvote a comment that you think deserves thanks for politeness or
thoughtfulness. Upvote a comment that provides a link to an online resource
you didn't know about before. Upvote a comment that apologizes for a gaffe or
that admits a factual mistake. Upvote that which is good, and there will be
fewer problems with inaccurate signaling here.

Feel free to review the site guidelines

<http://ycombinator.com/newsguidelines.html>

and the site welcome message

<http://ycombinator.com/newswelcome.html>

for guidance on what is desired here and thus guidance on how to vote.

~~~
sunir
The key point: everyone needs to up vote good comments more.

Garbage in, garbage out. If you don't upvote good stuff, other people will
upvote bad stuff.

P.S. Sorry, I never tl;dr but it's important to highlight this key point.

------
donnaware
Seriously, humor should count for something, it is actually an important
method of commentary.

------
grandalf
What about the spray and pray humor niche?

