

Distributed karma: an idea for fixing recommendation systems - lkozma
http://lkozma.net/idea.php?dea=16

======
antirez
My cofounder and I used the same idea of users as nodes of a graph for
<http://oknotizie.alice.it> (a system similar to reddit for Italian-speaking
users) in order to identify groups of spammers and users with very strange
behaviour. This information is used to decrease the weight of votes in the
system for these _bad users_.
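A minimal sketch of this kind of vote down-weighting. The signals, names, and thresholds below are hypothetical illustrations of the general technique, not oknotizie's actual code:

```python
# Hypothetical sketch: down-weight votes from users who appear to belong to
# a tightly-knit mutual-voting cluster. All thresholds are illustrative.

def vote_weight(user, mutual_vote_ratio, cluster_size):
    """Return the weight of a user's vote.

    mutual_vote_ratio: fraction of this user's votes going to accounts that
    also vote for them (a high value suggests a voting ring).
    cluster_size: size of the mutually-voting group the user belongs to.
    """
    weight = 1.0
    if mutual_vote_ratio > 0.5 and cluster_size >= 3:
        weight *= 0.1  # suspected ring: the vote counts for very little
    return weight

def score(votes):
    """Total score of a story: sum of weighted votes.

    votes: iterable of (user, mutual_vote_ratio, cluster_size) tuples.
    """
    return sum(vote_weight(u, r, c) for (u, r, c) in votes)
```

A normal vote contributes 1.0, a suspected-ring vote only 0.1, so a clique of sockpuppets has to be ten times larger to have the same effect.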

Our experience is that while this works very well against spam, it does not
stop the quality degradation that happens every time the community gets
larger, because the most active users tend to become friends and stop voting
on stories purely for their quality.

------
portLAN
The problem with giving someone a default high score based on what they've
done in the past is that it devolves into a type of "appeal to authority"
fallacy, where deference is given because of who someone _is_, as opposed to
what someone is _now saying_. Trivial, offhand remarks by an authority figure
are given greater weight than insightful, useful posts by an unknown. Even
worse, out-and-out mistakes by the highly karmic come with an official stamp
of karmic approval -- the whole system is prejudicial by design.

If your goal is to create a system that _reflects_ people's typical judgement,
then this works, because people make all sorts of logical errors. If, however,
you are aiming for a meritocracy, judging each post on its own worth without
regard to _who_ said it (except when identity is actually applicable), the
correct approach is to have a swarm of AIs reading everything and assigning
points based on content. [1]

[1] Implementing the correct approach is left as an exercise for the reader.

~~~
lkozma
What you say is true for comments; I agree that every contribution should be
judged on its own merit. For filtering/ordering submissions though, such as on
the reco-page on reddit, a web of trust is quite natural. Since we have
limited time, we can't possibly read everything, so we might as well prefer
content from people we trust. It's the same in real life: you read the next
book by your favorite author instead of one by a complete stranger. If the
scores are distributed, the authority figures aren't necessarily the same for
everyone.
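As a toy illustration of that last point, distributed trust means each reader keeps their own trust scores, so the "authority figures" differ per reader. All names and numbers below are invented:

```python
# Toy sketch of distributed (per-reader) trust: each reader has their own
# trust scores for submitters, so every reader sees a different ranking.

trust = {
    "alice": {"bob": 0.9, "carol": 0.2},  # alice's personal trust scores
    "dave":  {"bob": 0.1, "carol": 0.8},  # dave's differ completely
}

submissions = [
    ("story-1", "bob"),    # (story id, submitter)
    ("story-2", "carol"),
]

def ranked_for(reader):
    """Order submissions by this reader's trust in each submitter."""
    t = trust.get(reader, {})
    return sorted(submissions, key=lambda s: t.get(s[1], 0.0), reverse=True)
```

Here alice sees bob's story first while dave sees carol's first; no global authority exists.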

~~~
portLAN
Articles should also be read and ranked based on content instead of by
submitter. With books, you miss out on a lot by sticking with the same author,
as there are so many other worthy books you never become aware of; people's
usual habits can be improved upon.

------
amichail
Just out of curiosity, how much of a literature search are YC startups
expected to do? In particular, how much of a literature search did reddit do?

~~~
pg
There's no rule about it. It's good to know what you're talking about.

------
run4yourlives
Two points:

It seems a question of scope is in order here; what exactly is the purpose of
a recommendation system?

Is it a system to forward to users that which they want to see, or is it a
system that suggests various opinions of high quality to users? Often, I think
we're trying to construct the latter by designing the former.

Secondly, Reddit's system works perfectly to forward to the user what they
would like based on what's been submitted - the issue being that the average
quality of submissions has lowered over time. Even if the system gives you the
best POS, you're still stuck with a POS.

The solution: Scaling is the problem, so stop/limit scaling.

We're not seeing a degradation of quality, we're seeing a better reflection of
the average opinion - the larger the crowd, the lower the average. We're
trying to enforce an expectation of quality that is held by a few on the many;
this is impossible! The many don't hold the same regard or opinions as the
few.

You can tweak things a little, perhaps come up with systems that use more CPU
power than space navigation does, but the end result will be the same:
average opinion wins - exactly what you should expect.

Average opinion isn't what we want though, is it?

------
ashu
This is of relevance:

Haifeng Yu, Michael Kaminsky, Phillip B. Gibbons, and Abraham Flaxman,
"SybilGuard: Defending Against Sybil Attacks via Social Networks." Proceedings
of the ACM SIGCOMM Conference, September 2006.

<http://www.cs.cmu.edu/~yhf/sybilguard-sigcomm06.pdf>

In effect, this paper provides a way of "scaling" a trusted social network and
minimizing the influence of sock-puppet accounts.

------
dood
There is plenty of research into this; try a search on the ACM portal
[<http://portal.acm.org/>] or Google Scholar for 'trust reputation
recommendation network'.

Reddit could/should have been using this kind of approach for ages (I don't
know if they have or not).

~~~
palish
Academic work is hard to translate into a good implementation though. For
example, if you trust one user more than another user, that means that you
would see totally different karma point values than any other user sees. That
sounds really expensive to compute.

~~~
dood
I imagine there are loads of variations on this theme (i.e. trust networks)
that could produce useful recommendations without the need for unwieldy
computation. Sure, there'll be a lot of number crunching; it's just a matter
of choosing the right numbers to crunch, and when and how to crunch them.

Also, I'm just talking about delivering a good set of recommendations,
anything else is gravy.

~~~
palish
Right, but this sounds like a "beat your head through a concrete wall" sort of
problem. Sure, you might be able to do it... but why? Is the value add really
that big?

~~~
dood
It depends on the context you are using it in! For social news (e.g. reddit et
al.), this sort of thing might be a great help in providing personal
recommendations.

Here is a fun recent paper from Google about 'Scalable Online Collaborative
Filtering' for their personalised news service
[<http://www2007.org/paper570.php>], using MapReduce to run Expectation
Maximization. Via <http://www.datawrangling.com/google-paper-on-parallel-em-algorithm-using-mapreduce.html>
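For readers unfamiliar with the basic idea, here is a toy user-based collaborative-filtering sketch over made-up ratings. This is just the general technique, not the paper's MapReduce/EM algorithm:

```python
# Toy user-based collaborative filtering: recommend items liked by users
# whose past ratings resemble yours. All ratings below are invented.
from math import sqrt

ratings = {
    "u1": {"a": 1, "b": 1},
    "u2": {"a": 1, "b": 1, "d": 1},  # similar to u1, also liked "d"
    "u3": {"c": 1},                  # nothing in common with u1
}

def cosine(x, y):
    """Cosine similarity between two sparse rating dicts."""
    common = set(x) & set(y)
    dot = sum(x[k] * y[k] for k in common)
    nx = sqrt(sum(v * v for v in x.values()))
    ny = sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0

def recommend(user):
    """Rank unseen items by similarity-weighted ratings of other users."""
    mine = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        for item, r in theirs.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)
```

Since u2's tastes overlap u1's, item "d" comes out on top for u1. The paper's contribution is doing this kind of computation at scale, not the similarity arithmetic itself.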

------
nickb
This has been done many times before... one of the simplest examples is the
Advogato trust metric.

<http://www.advogato.org/trust-metric.html>
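For flavor, here is a drastically simplified sketch of seed-based trust propagation with shrinking per-level capacities. The real Advogato metric computes a maximum flow over the certification graph, so treat this only as an illustration of the shape of the idea:

```python
# Simplified flavor of Advogato-style trust: start from a trusted seed and
# accept certified accounts outward, with fewer acceptances allowed at each
# hop. (The actual metric uses network flow; capacities here are made up.)

def trusted(certs, seed, capacities=(4, 2, 1)):
    """Return accounts reachable from the seed, where at most capacities[i]
    new accounts are accepted at distance i+1 from the seed.

    certs: dict mapping each account to the accounts it certifies.
    """
    accepted = {seed}
    frontier = [seed]
    for cap in capacities:
        nxt = []
        for user in frontier:
            for peer in certs.get(user, []):
                if peer not in accepted and len(nxt) < cap:
                    accepted.add(peer)
                    nxt.append(peer)
        frontier = nxt
    return accepted
```

The shrinking capacities are what bound the damage: a distant attacker who collects many certifications still cannot push more accounts into the trusted set than the capacity at their distance allows.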

------
palish
Excellent. It's unfortunate that it's computationally impractical.

