

Ask HN math nerds: How to rollup lots of data points into a 0-100 score? - ryanwaggoner

So we're building this new social media dashboard called MightyReach.com that tracks stats from Twitter, Facebook, Feedburner, Digg, Youtube, Google Analytics, etc. We thought an interesting feature would be to do something akin to PageRank, but for your social media presence, rolling all these data points up into a score from 0 - 100. But how best to accomplish this? I don't think we necessarily want to penalize someone for not using a particular service and therefore not having added it to their MightyReach account. On the other hand, if they've added it and their stats are really low, then their score should be lower.

Other than manually coming up with some kind of curve equation for each service, which seems dreadful, the only thing I've come up with so far is maybe a percentile score?

Let me know if this makes any sense and if I'm missing something obvious.
======
aneesh
I say work backwards. Instead of thinking how you want to compute the score,
think about how you want the distribution of scores to be. Anything you come
up with will be fairly arbitrary, so you might as well "game the system" by
creating incentives that will cause people to get addicted to increasing their
score on your service.

For example, you may want very few people in the 90-100 range (reward the few
who reach this high level), very few people in the 0-10 range (don't make
people feel bad), and most people below 50. Then come up with a scoring system
that fits this distribution. If you find a metric that doesn't quite fit the
distribution, tweak it by using a log transform, or by adding to/multiplying
everyone's score.

Also, make it _simple_ to understand. If it's some magic formula that people
can't understand, they may not engage as much. Make it clear what they need to
do to increase their score.

------
grandalf
figure out a metric for each service:

Twitter:

      a: tweets per day
      b: # of followers

Facebook:

      c: # of friends
      d: # of tagged photos

Then figure out a quantity that you would consider the 100 score for each
attribute.

and just create a weighted average.

Aa + Bb + Cc + Dd, where the capital letters are the coefficients that weight
the average and the lowercase letters are the 0 to 100 scores for each attribute.

Then divide by the sum of the coefficients.

Later, you can tweak the coefficients (or the 100th percentiles) to make the
score seem sensible based on whatever business criteria you have.
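
A minimal sketch of that weighted average in Python. The function names, the
"top" quantities, and the weights are assumptions made up for illustration.
Attributes the user has not connected are passed as None and left out of both
the sum and the divisor, so a missing service does not drag the score down.

```python
from typing import Dict, Optional


def attribute_score(value: float, top: float) -> float:
    """Linearly map a raw value to 0-100, capping at the chosen 'top' quantity."""
    if top <= 0:
        raise ValueError("top must be positive")
    return max(0.0, min(100.0, 100.0 * value / top))


def combined_score(
    scores: Dict[str, Optional[float]], weights: Dict[str, float]
) -> Optional[float]:
    """Weighted average of the per-attribute scores the user actually has."""
    total = 0.0
    weight_sum = 0.0
    for name, score in scores.items():
        if score is None:
            continue  # service not connected: skip it entirely
        total += weights[name] * score
        weight_sum += weights[name]
    if weight_sum == 0:
        return None  # nothing connected, so no score to report
    return total / weight_sum
```

Tweaking a coefficient or a "top" quantity later only changes the inputs to
these two functions, not the formula itself.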

~~~
ryanwaggoner
Yeah, it's the step of picking the quantity that corresponds to a 100 score
that concerns me. How many RSS subscribers is considered the top? 50,000? 1
million?

Also, I wonder if some kind of curve makes more sense so that it's easier to
move your score up in the lower levels. Let's say you pick 50,000 Twitter
followers as being the 100 score for that service. You have to get 500
followers to go from score 0 to score 1.

~~~
joeyo
Scale it by your user base-- that is, if your top user has 50k RSS
subscribers, then make that the top of the scale. This makes it basically a
percentile system.

However, I suggest not updating the coefficients in real-time. Otherwise you
could have a new user join that instantly changes the scale. So calculate
which user has the most Foo once a week or once a month to keep things a
little more static.

With respect to giving newbies an easier time, you could try using a log
transform. This will give a quick rise in the beginning but saturate in the
high levels. Try something like:

    
    
      log(X)/log(N)
    

where X is the number of units the user has accumulated on a service (e.g.
number of Twitter followers) and N is the maximum achieved by any user on
that service. Then you could combine the services as grandalf describes
above.
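
A rough Python sketch of the log scaling, stretched to 0-100. One deliberate
change from the formula above, which is my assumption and not part of the
original suggestion: it uses log(1 + X) / log(1 + N), so that X = 0 is defined
and scores 0. The function name is hypothetical.

```python
import math


def log_score(x: float, n: float) -> float:
    """Scale x to 0-100 as 100 * log(1 + x) / log(1 + n), capped at 100.

    x is the user's raw count; n is the maximum seen across all users.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if x <= 0:
        return 0.0
    return min(100.0, 100.0 * (math.log1p(x) / math.log1p(n)))
```

With N = 50,000, a user with 500 followers lands at about 57 on this curve,
compared with 1 on the linear scale, which is the quick early rise described
above.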

~~~
grandalf
indeed. adding a log would probably make the resulting score a lot more
realistic and accurate.

Any additional statistical sophistication would help -- maybe make the 50th
percentile the average and assume a normal distribution? Maybe different
services/attributes have different distributions, etc.

* fine tuning this might be a good use for wolfram alpha :)

------
pz
do you have access to the social graph underlying these services? with a graph
structure you could let "social authority" get passed around between users and
essentially just copy the math behind page rank.
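
For reference, a small power-iteration sketch of PageRank over a follower
graph, in Python. The graph layout (a dict mapping each user to the users they
follow), the damping factor of 0.85, and the fixed iteration count are
assumptions for illustration, not anything MightyReach is known to have.

```python
from typing import Dict, List


def pagerank(
    follows: Dict[str, List[str]], damping: float = 0.85, iterations: int = 50
) -> Dict[str, float]:
    """Power-iteration PageRank; follows[u] lists the users that u follows."""
    users = set(follows)
    for targets in follows.values():
        users.update(targets)
    if not users:
        return {}
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Every user starts each round with the "random jump" share.
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            targets = follows.get(u, [])
            if targets:
                # Authority flows from a follower to the people they follow.
                share = damping * rank[u] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling user (follows nobody): spread their rank evenly.
                share = damping * rank[u] / n
                for t in users:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

The ranks sum to 1, so they would still need to be rescaled (for example by
percentile) to land on a 0-100 scale.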

assuming, though, that you don't have this graph, you are going to be
estimating the score based on various features (# tweets, # followers,
# friends, # comments on posts, etc)

you could probably roll these up into an ad hoc formula, but i think the better
thing to do is label some data and learn the model. something as simple as
linear/logistic regression might be all you need.
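
A sketch of the "label some data and learn the model" idea, using a tiny
logistic regression trained by plain gradient descent in Python. Everything
here is an assumption for illustration: the features are taken to be already
scaled to roughly 0-1, the labels (1 = "has real reach", 0 = "does not") would
have to be assigned by hand, and the learning rate and epoch count are
arbitrary. In practice a library implementation would be the more sensible
choice.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(z) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def learned_score(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Predicted probability of the positive label, expressed as 0-100."""
    return 100.0 * sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
```

The learned weights play the same role as hand-picked coefficients in a
weighted average, except that they come from the labeled examples.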

------
brendano
do you already have a single score for each one?

you could just take scores and average them. use a weighted average if you
think certain services' scores are more important than others.

is the problem that they're on different scales? you already said you wanted 0-100,
so just turn all the data points per service into percentiles, then average
said percentiles to get a per-person score.
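
A short Python sketch of the percentile approach. The service names and the
tie-handling rule (ties count as half) are assumptions; services a person has
not connected are passed as None and skipped, so they neither help nor hurt.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Optional


def percentile_rank(value: float, population: List[float]) -> float:
    """Percent of the population strictly below value, plus half of ties."""
    ordered = sorted(population)
    below = bisect_left(ordered, value)
    ties = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def person_score(
    stats: Dict[str, Optional[float]], populations: Dict[str, List[float]]
) -> Optional[float]:
    """Plain average of a person's per-service percentile ranks."""
    ranks = [
        percentile_rank(value, populations[service])
        for service, value in stats.items()
        if value is not None
    ]
    return sum(ranks) / len(ranks) if ranks else None
```

Because each percentile is computed against all users of that service, no
"top" quantity has to be chosen by hand.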

there are of course a zillion other ways to do it depending on what exactly
you have and what you want...

