
Stein's paradox in Statistics (1977) [pdf] - xtacy
http://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf
======
Hermel
I looked into Stein's paradox more deeply last year because I could not wrap
my head around it. The baseball example is easy: it is quite intuitive that
some of those who scored well probably just did so by accident and are
actually somewhat less skilled (and vice versa for the lowest scorers).
However, Stein's paradox is deeper than that. According to Wikipedia, it
apparently also applies to unrelated variables, for example the population of
Ulan Bator, the temperature on Mars, and the yearly chocolate consumption of
Switzerland. This runs counter to all my intuition, and I found the following
weakness in the theory behind Stein's paradox: the improved estimators all
seem to depend on things like the mean or the variance across the included
variables. For example, if you estimate Ulan Bator to have 1 million
inhabitants and the temperature on Mars to be 200 Kelvin, you would adjust the
former estimate a little downwards and the latter a little upwards (towards
the common mean). However, this implicitly assumes that the population and the
temperature have been drawn from a distribution whose mean exists. My guess is
that this is not the case. Obviously, you can always calculate a sample mean
and a sample variance, but they might be meaningless if the sample stems from
a distribution such as the Cauchy.
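For concreteness, here is a minimal simulation sketch of the phenomenon being discussed (the true values, dimension, and noise scale below are made up for illustration): the positive-part James-Stein estimator shrinks independent normal observations toward zero by a common data-driven factor, and its total squared error beats the raw observations even though the coordinates are unrelated.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10                               # number of unrelated quantities; JS needs d >= 3
sigma = 1.0                          # known noise standard deviation (assumed)
theta = rng.uniform(-5, 5, size=d)   # hypothetical true values

n_trials = 20000
mse_mle = 0.0  # total squared error of the raw observations
mse_js = 0.0   # total squared error of the (positive-part) James-Stein estimator
for _ in range(n_trials):
    x = theta + sigma * rng.standard_normal(d)
    # Shrink every coordinate toward zero by the same data-driven factor.
    shrink = max(0.0, 1.0 - (d - 2) * sigma**2 / np.sum(x**2))
    js = shrink * x
    mse_mle += np.sum((x - theta) ** 2)
    mse_js += np.sum((js - theta) ** 2)

print("raw MSE:", mse_mle / n_trials)
print("JS  MSE:", mse_js / n_trials)
```

Note that nothing in the shrinkage factor knows what the coordinates mean; it only uses the overall magnitude of the observation vector, which is exactly the unintuitive part of the paradox.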

~~~
hyperbovine
The "correct" way to understand this phenomenon is via regression, in which
case it boils down to the simple fact that the regression lines E(θ|X) and
E(X|θ) are different. (This "Galtonian perspective" is the subject of one of
my favorite papers of all time,
[http://projecteuclid.org/euclid.ss/1177012274](http://projecteuclid.org/euclid.ss/1177012274)).
This is also the only intuitive explanation I am aware of which explains why
Stein's phenomenon only occurs in dimensions three and higher.

------
mchahn
The "paradox" is that if players' high batting averages are predicted to get
worse and low averages are predicted to get higher, then these predictions
work better than just guessing they will stay the same.

This doesn't seem like a paradox to me. Rather, it seems kind of obvious. If
the statistics are anything like a random walk, then random walk theory
(which usually has the walk revisiting its starting point) would predict this.

~~~
tadkar
I think the article by Richard Samworth lays out the paradox better. The whole
article is worth a read, but here's the paradox part: "To give an unusual
example to emphasise the point, suppose that we were interested in estimating
the proportion of the US electorate who will vote for Barack Obama, the
proportion of babies born in China that are girls and the proportion of
Britons with light-coloured eyes. Then our James–Stein estimate of the
proportion of democratic voters depends on our hospital and eye colour data!"
[http://www.statslab.cam.ac.uk/~rjs57/SteinParadox.pdf](http://www.statslab.cam.ac.uk/~rjs57/SteinParadox.pdf)
Surely that's paradoxical!

The OP's post is an outstanding exposition of James-Stein estimators though,
so thanks for the post. There seem to be many connections between these and
linear regression with regularisation in machine learning.

~~~
stdbrouw
Yep, there's a link with regularization and also with informative priors –
James-Stein works so well because, across an incredibly wide range of
scenarios, a parameter estimate of infinity is not nearly as likely as a
parameter estimate of 0, yet ordinary least squares linear regression
implicitly assumes they are equally likely.
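As a rough illustration of that empirical-Bayes reading (a sketch with made-up prior and noise scales, not taken from the comment): if the true parameters really were drawn from a zero-mean normal prior, the oracle Bayes posterior mean would shrink each observation by the factor tau^2 / (tau^2 + sigma^2), and the James-Stein factor estimates roughly that same quantity from the data alone, without knowing tau.

```python
import numpy as np

rng = np.random.default_rng(1)
d, sigma, tau = 200, 1.0, 2.0             # dimension, noise sd, prior sd (assumed)
theta = tau * rng.standard_normal(d)      # prior: theta_i ~ N(0, tau^2)
x = theta + sigma * rng.standard_normal(d)

# Oracle Bayes posterior mean multiplies each x_i by tau^2 / (tau^2 + sigma^2).
bayes_factor = tau**2 / (tau**2 + sigma**2)

# James-Stein plugs in ||x||^2 for its expectation d * (tau^2 + sigma^2),
# recovering approximately the same shrinkage factor from the data alone.
js_factor = 1.0 - (d - 2) * sigma**2 / np.sum(x**2)

print("Bayes shrinkage factor:", bayes_factor)
print("James-Stein shrinkage factor:", js_factor)
```

In this sense James-Stein behaves like a regularized estimator whose regularization strength is learned from the data, which is why the connection to ridge-style shrinkage keeps coming up.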

------
jonahx
In the case of the baseball example, at least, isn't the increased accuracy of
the Stein estimator a result of incorporating a good Bayesian prior into the
"observed average" result of the individual players -- that prior being the
batting average of a "typical" player (i.e., the average of the averages)?

~~~
dnautics
That doesn't explain adding in a spurious statistic (in the case of the OP,
the percentage of imported cars in Chicago).

~~~
jonahx
It does if that statistic offers a better prior as well. stdbrouw's answer
explains why that will often be the case.

------
memming
I have seen several mentions of the James-Stein estimator being almost an
empirical Bayesian estimator, but this article made it much clearer. Thanks
for sharing.

------
myle
Regression to the mean

