One thing I don't understand is how quantities with different units can be compared. For instance, you give the example of 1,000,000 people living in Ulan Bator and 200 kelvins being the temperature on Mars. If you follow this procedure and nudge them toward one another (bringing 1,000,000 down and 200 up), then you're supposed to end up with a better predictor. But what if our units had been millions of people and millikelvins? Then our quantities would have been 1 and 200,000, respectively, and the procedure would have us nudge our estimates in the opposite directions. Both nudges surely can't improve our estimates, right?
Clearly I'm misunderstanding something, so I'm going to read some of these papers.
Edit: It seems that in the Galtonian perspective paper, all the distributions are assumed to have the same standard deviation. So perhaps we shouldn't be measuring these quantities in people or millikelvins, but rather in standard deviations? E.g., the mean is +4 standard deviations above 0?
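To make the worry concrete, here's a minimal numpy sketch (my own toy, not from the post), using the variant of the James-Stein estimator that shrinks toward the grand mean with a known common standard deviation; the two filler quantities and sigma = 1 are assumptions just to make the formula run:

```python
import numpy as np

def js_toward_mean(x, sigma=1.0):
    """Shrink every coordinate of x toward the grand mean.
    Needs len(x) >= 4 so the (d - 3) factor is positive."""
    d = len(x)
    xbar = x.mean()
    s = np.sum((x - xbar) ** 2)
    shrink = 1.0 - (d - 3) * sigma**2 / s
    return xbar + shrink * (x - xbar)

# People in Ulan Bator, Mars temperature (K), plus two filler quantities.
x = np.array([1_000_000.0, 200.0, 8_848.0, 42.0])
print(np.sign(js_toward_mean(x) - x))   # [-1.  1.  1.  1.]: 1,000,000 down, 200 up

# The same facts in millions of people and millikelvins:
y = np.array([1.0, 200_000.0, 8_848.0, 42.0])
print(np.sign(js_toward_mean(y) - y))   # [ 1. -1.  1.  1.]: now 1 up, 200,000 down
```

And measuring in standard deviations does seem to dissolve the flip: if each quantity is expressed in units of its own standard deviation, then sigma = 1 for every coordinate by construction, and there's no freedom left to rescale one coordinate independently of the others.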
This doesn't seem like a paradox to me. Rather, it seems kind of obvious. If the statistics behave anything like a random walk, then random-walk theory (where the walk keeps revisiting its starting point) would predict this.
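To make the "kind of obvious" part concrete, here's a quick toy simulation; note it uses a skill-plus-noise model rather than a literal random walk, which is my framing, not the OP's:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: each observed score = stable ability + independent noise.
n = 100_000
ability = rng.normal(size=n)
game1 = ability + rng.normal(size=n)
game2 = ability + rng.normal(size=n)

# Condition on an extreme first measurement (top decile of game1):
top = game1 > np.quantile(game1, 0.9)
print(game1[top].mean())  # ~2.5: extreme by construction
print(game2[top].mean())  # ~1.2: reverts halfway toward the grand mean of 0
```

The halving is exactly what regression to the mean predicts in this toy model: ability explains half the variance of each game, so the expected second score is half the first.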
The OP's post is an outstanding exposition of James-Stein estimators, though, so thanks for the post. There seems to be a close connection between these and regularised linear regression in machine learning.
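For anyone curious, the connection is easy to see in code. Here's a minimal numpy sketch (the toy data and lambda value are my own choices, not anything from the post) showing ridge regression's closed form shrinking the least-squares coefficients toward zero, much as James-Stein shrinks raw estimates toward a common point:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem with many noisy coefficients.
n, p = 50, 10
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(scale=2.0, size=n)

def ridge(X, y, lam):
    """Closed-form ridge regression: (X'X + lam*I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

ols = ridge(X, y, lam=0.0)    # ordinary least squares
reg = ridge(X, y, lam=10.0)   # regularised: coefficients shrunk toward 0
print(np.linalg.norm(ols), np.linalg.norm(reg))  # ridge norm is smaller
```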
But maybe it's more than that. The willpower to overcome our intrinsic limits eventually runs out: we can only focus on the ball for so long before our thoughts wander. And at some point, performing below the average of our capabilities, we get tired of taking a break and start to perform again as well as we know we can.