Ranking and selection systems -- fitness functions -- are very, very hard to get right.
I did a lot of work with genetic algorithms and evolutionary computation / alife systems a while back, which meant writing a lot of fitness and objective functions. It turns out that it's extremely difficult to write a fitness function that the evolving system will not "game."
To give a specific example: I once wrote an objective function to train an evolving system to classify images, a simple machine learning test. After running it for only an hour or so, the system's performance looked spectacular -- way up in the 90th percentile. This made me suspicious. The programs that had evolved did not seem complex enough, and past experiments had shown that it should take much longer to get anything with reasonable performance.
After a lot of analysis I figured out what it was.
I was pulling test images from two different databases. One database had higher latency than the other. The bugs had evolved a timing loop to measure how long it took them to get their data (they were multi-threaded) and were basically executing a side-channel attack against the training supervisor.
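In rough form, the exploit amounted to something like the sketch below. This is a reconstruction from memory, not the original code; the endpoint and cutoff are made up. Note that the "classifier" never inspects a single pixel:

```python
import time
import urllib.request

# Hypothetical setup: all class-A images happen to live on a slow
# server and all class-B images on a fast one.
LATENCY_CUTOFF = 0.05  # seconds; evolution tuned this cutoff, not a human

def classify(image_url):
    start = time.perf_counter()
    urllib.request.urlopen(image_url).read()  # fetch, then ignore the content
    elapsed = time.perf_counter() - start
    # High latency => image came from the slow database => class A.
    return "class_a" if elapsed > LATENCY_CUTOFF else "class_b"
```

Against the fitness function this scored brilliantly; as an image classifier it was worthless.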
In another, very similar case, I found that the bugs were cooperating by communicating through the operating system's thread/task scheduler. They were using timing loops to kibitz.
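For the curious, here is a toy sketch of how that kind of scheduler channel can work, assuming CPython-style thread contention (this is my illustration, not the evolved code): one thread signals by either burning CPU or idling, and the other decodes the signal purely by timing its own fixed workload.

```python
import threading
import time

def probe():
    """Time a fixed chunk of work; contention from other threads inflates it."""
    start = time.perf_counter()
    x = 0
    for i in range(200_000):
        x += i
    return time.perf_counter() - start

def sender(bit, duration=0.5):
    """Signal a 1 by hogging the CPU for `duration` seconds, a 0 by sleeping."""
    end = time.perf_counter() + duration
    if bit:
        while time.perf_counter() < end:  # busy-wait: contend for the scheduler
            pass
    else:
        time.sleep(duration)              # idle: leave the scheduler alone

baseline = min(probe() for _ in range(20))  # calibrate with no contention

for sent in (1, 0, 1):
    t = threading.Thread(target=sender, args=(sent,))
    t.start()
    received = 1 if probe() > 1.5 * baseline else 0  # crude threshold
    t.join()
    print(f"sent={sent} received={received}")
```

No shared memory, no pipes -- the only "wire" between the two threads is scheduler contention.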
Humans are smarter than little evolving computer programs. Subject them to any fixed, straightforward fitness function and they are going to game it, plain and simple.
It turns out that in writing machine learning objective functions, one must think very carefully about what the objective function is actually rewarding. If the objective function rewards more than one thing, the ML/EC/whatever system will find whichever of those things takes the least effort and complexity and converge there.
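A toy example of the failure mode (my illustration, with a made-up 90/10 class split): a plain accuracy objective on imbalanced data rewards both genuine discrimination and mere base-rate matching, and the degenerate solution is far cheaper to find.

```python
# Made-up dataset: 90 cats, 10 dogs.
labels = ["cat"] * 90 + ["dog"] * 10
images = [f"img_{i}.png" for i in range(100)]  # stand-ins for real inputs

def fitness(classifier):
    """Fraction of images labeled correctly -- the 'objective function.'"""
    correct = sum(classifier(img) == lab for img, lab in zip(images, labels))
    return correct / len(labels)

def constant_cat(image):
    """The minimum-complexity solution: ignore the input entirely."""
    return "cat"

print(fitness(constant_cat))  # 0.9 -- looks spectacular, learned nothing
```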
In the human case under discussion here, apply this kind of reasoning and it becomes apparent that stack ranking as implemented at MS rewards high performance relative to your peers in a group -- not actual performance, and not performance tied in any way to the company's performance.
There are all kinds of ways to game that: keep inferior people around on purpose to make yourself look good, sabotage your peers, avoid working with good people, intentionally produce inferior work up front to skew the curve in later iterations, etc. All of those are much easier (less effort, less complexity) than actual performance. A lot of them are also rather sociopathic in nature; it seems like most ranking systems in the real world end up selecting for sociopathy.
This is the central problem with the whole concept of meritocracy, and also with related ideas like eugenics. It turns out that defining merit and achieving it are of roughly equivalent difficulty. They might actually be the same problem.