

Computing Your Skill - Nic0
http://www.moserware.com/2010/03/computing-your-skill.html

======
nevinera
Unfortunately, all such efforts are limited by their central assumption - that
skill is orderable.

This is demonstrably untrue - skill at chess (to use their example) occurs on
several axes, the most obvious of which are opening theory, tactics, and
positional play. Players may excel only in certain regions of that space, and
it's quite easy to set up a player cycle A->B->C->A, in which each player is
more likely to win against one player and lose against another. A player's
observed 'skill' will therefore depend rather heavily on any biases in the
population at large (at the lower levels of chess, the population has
overwhelmingly studied opening theory and some tactics).
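
A toy sketch of such a cycle, for illustration only. Everything here is an assumption rather than a real model of chess: the three skill axes are given made-up numbers, and the rule that strength on one axis "exploits" the next axis is hypothetical, chosen only because it is the simplest interaction that produces A->B->C->A.

```python
import math

AXES = ("opening", "tactics", "positional")

def advantage(a, b):
    """Hypothetical matchup score: strength on axis i exploits the
    opponent's reliance on axis i+1 (cyclically). Antisymmetric, so
    advantage(a, b) == -advantage(b, a)."""
    n = len(AXES)
    return sum(a[i] * b[(i + 1) % n] - b[i] * a[(i + 1) % n] for i in range(n))

def win_prob(a, b):
    """Logistic link from matchup score to P(a beats b)."""
    return 1.0 / (1.0 + math.exp(-advantage(a, b)))

# Made-up skill vectors; each player has the same total "skill" (3),
# so no single number can rank them.
A = (2, 1, 0)
B = (0, 2, 1)
C = (1, 0, 2)

for name, (x, y) in {"A vs B": (A, B), "B vs C": (B, C), "C vs A": (C, A)}.items():
    print(name, round(win_prob(x, y), 3))
```

Each pairing favours the first player, so any one-number rating fitted to these three has to get at least one matchup wrong.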

Because they collapse skill onto a single dimension, skill ranking algorithms are universally
limited to expressing how likely one is to win against an _average_ person of
a given skill ranking, rather than the likely outcome of the match about to be
played. Sports match prediction techniques are all domain-specific, precisely
for this reason (and because substantial sums of money are riding on their
predictive effectiveness).

~~~
imurray
There are models that do learn several latent features for each player [e.g.
1]. And I recently supervised a Master's project motivated by exactly your
observation[2]. Models with multiple features per player do make better
predictions, but by less than I'd hoped. A result from the Master's project,
which surprised me, was just how much the details of the approximate inference
mattered, much more so than the choice of model.

[1] <http://www.cs.toronto.edu/~rpa/adams-dahl-murray-2010a.shtml>

[2] [http://homepages.inf.ed.ac.uk/imurray2/projects/2011_marius_...](http://homepages.inf.ed.ac.uk/imurray2/projects/2011_marius_stanescu_msc.pdf)

------
_delirium
This is pretty interesting, though there's a _lot_ of introductory exposition
for people already familiar with statistics and rating systems (no doubt a
good thing for people who aren't, and it's well-written). The takeaway for
"how is this different from Elo?" appears to be this part:

 _The TrueSkill algorithm generalizes Elo by keeping track of two variables:
your average (mean) skill and the system’s uncertainty about that estimate
(your standard deviation)._

which in Elo terms translates to basically having a non-fixed K. Since that's
also the goal of the Glicko rating system (an already-used extension to Elo),
I was curious if this article would compare them. It doesn't, but their FAQ
does (result: there are minor technical differences, but the big difference is
that TrueSkill handles games other than 2-player games):
[http://research.microsoft.com/en-us/projects/trueskill/faq.a...](http://research.microsoft.com/en-us/projects/trueskill/faq.aspx)
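
To make the "non-fixed K" remark concrete, here is a minimal sketch of the standard Elo update, where K scales how far one result moves a rating. The `shrinking_k` schedule and its parameters are hypothetical, invented here for illustration; neither Glicko nor TrueSkill computes its step size this way (both track an explicit uncertainty instead).

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k):
    """New rating for A; score_a is 1 for a win, 0.5 draw, 0 loss."""
    return r_a + k * (score_a - expected_score(r_a, r_b))

def shrinking_k(games_played, k_start=40.0, k_floor=10.0, half_life=30):
    """Hypothetical schedule: K halves every `half_life` games, down to a
    floor. A crude stand-in for 'we are more certain about veterans'."""
    return max(k_floor, k_start * 0.5 ** (games_played / half_life))

# Same win over an equal opponent, very different movement:
newcomer = elo_update(1500, 1500, 1, shrinking_k(0))
veteran = elo_update(1500, 1500, 1, shrinking_k(300))
print(newcomer - 1500, veteran - 1500)
```

With a fixed K of 32, a win between two 1500-rated players moves the winner by 16 points no matter how many games they have played.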

Some additional Google-Scholaring turns up that there are some extensions to
that as well, notably one that computes the Bayesian estimate using the whole
history, instead of incremental updates: <http://halofit.org/papers/WHR.pdf>

------
HeyImAlex
Only ever implement TrueSkill as a hidden metric. After a sufficiently high
number of games, the level of uncertainty drops and your players get 'locked
in' to a certain rank. While this is great for actually tracking skill, it
gives the end user a feeling of helplessness. People prefer seeing a volatile
and perhaps inaccurate skill that they can feasibly change to an extremely
accurate one which is frustrating to move. Taken from experience as a Halo
player (H2 used a points system kind of like regular Elo, H3 used Microsoft's
TrueSkill).
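
The 'locked in' effect can be seen in a simplified sketch of the two-player, no-draw TrueSkill update. This is an assumption-laden illustration, not Microsoft's implementation: it leaves out the draw margin, teams, and the dynamics term that real implementations add to sigma before each game to slow the lock-in. `BETA` uses the commonly quoted default of 25/6.

```python
import math

BETA = 25.0 / 6.0  # performance noise; commonly quoted default

def _pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def _cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def update(winner, loser, beta=BETA):
    """winner/loser are (mu, sigma) tuples; returns the updated pair.
    Simplified: no draws, no dynamics term."""
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # how surprising the result was
    w = v * (v + t)         # how much uncertainty to remove
    new_w = (mu_w + sigma_w ** 2 / c * v,
             sigma_w * math.sqrt(1.0 - sigma_w ** 2 / c ** 2 * w))
    new_l = (mu_l - sigma_l ** 2 / c * v,
             sigma_l * math.sqrt(1.0 - sigma_l ** 2 / c ** 2 * w))
    return new_w, new_l

opponent = (25.0, 1.0)
fresh, _ = update((25.0, 25.0 / 3.0), opponent)   # high uncertainty
settled, _ = update((25.0, 1.0), opponent)        # low uncertainty
print(fresh[0] - 25.0, settled[0] - 25.0)
```

The mean moves in proportion to sigma squared, so once sigma is small the same win barely shifts the rating, which is the helplessness described above.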

------
ludwigvan
Here is his next article where he discusses porting his C# code for this to
PHP, and his views on PHP in general, from the view of a C# coder:
[http://www.moserware.com/2010/10/notes-from-porting-c-code-t...](http://www.moserware.com/2010/10/notes-from-porting-c-code-to-php.html)

------
beaumartinez
Jeff Moser's blog is full of good articles. I revisit it from time to time
hoping he's posted something new, but I think he's been too busy these past
few years.

One of the most accessible and interesting is his HTTPS breakdown[1]—highly
recommended, and I'm sure it's been HN'd more than once.

[1]: [http://www.moserware.com/2009/06/first-few-milliseconds-of-h...](http://www.moserware.com/2009/06/first-few-milliseconds-of-https.html)

~~~
moserware
Thanks for the kind words! I joined a startup (kaggle.com) at the start of the
year and that has used up most of the time I had for blogging.

I'm hoping to get back to writing again sometime before the end of the year.
Unfortunately, it takes me a long time to write so I'm hoping to get at least
one post in this year :)

------
vinilios
I didn't have time to read through the whole article, but beyond the analysis
of the subject itself, I really enjoyed the way the article was written. It
lays out the theory and mathematics behind the problem in a way that anyone
could understand. Thumbs up for the author's writing/educational skills.

