(I cannot speak for the original article; I haven't put in the effort to fully understand it, so I won't categorically say it's wrong, but it didn't seem right to me.)
Let's say you have a three-dimensional normal distribution with identity covariance and unknown mean mu = [a, b, c].
The usual maximum-likelihood estimator of the unknown mean, when you get an observation, is to take the observed value as the estimate. If you observe [x, y, z], the "naive" estimator gives you the estimate mû = [x, y, z].
For any arbitrary point [p, q, r] you can define another estimator. If you observe [x, y, z], this "shrinkage" estimator gives you an estimate which is no longer precisely at [x, y, z] but is displaced toward [p, q, r] by a data-dependent amount. For simplicity let's say the resulting estimate is mû' = [x', y', z'].
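The comment doesn't spell out the shrinkage rule; the natural candidate (my assumption, based on the paper it cites) is the James–Stein estimator shrinking toward P = [p, q, r]:

```latex
\hat{\mu}' = P + \left(1 - \frac{d-2}{\lVert X - P \rVert^{2}}\right)(X - P),
\qquad X = [x, y, z], \quad d = 3
```

Note that the shrinkage factor depends on the data through the distance ||X - P||²: observations far from P are barely moved, observations near P are pulled in strongly. A fixed displacement toward [p, q, r] would not dominate the naive estimator; the data-dependence is essential.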
Whatever choice you make for [p, q, r], the "shrinkage" estimator has lower mean squared error than the "naive" estimator, for every possible true mean [a, b, c]: the expected value of (x'-a)²+(y'-b)²+(z'-c)² is lower than the expected value of (x-a)²+(y-b)²+(z-c)².
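This is easy to check numerically. A minimal sketch: the true mean mu, the shrinkage target p, and the sample size are arbitrary choices of mine, and the shrinkage rule assumed is the James–Stein form with the factor (1 - (d-2)/||X-P||²).

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0, 0.5])  # true (unknown) mean, picked for illustration
p = np.array([3.0, 0.0, -1.0])   # arbitrary shrinkage target [p, q, r]
d = 3
n = 200_000                      # number of simulated observations

# Each row is one observation [x, y, z] ~ N(mu, I_3).
x = rng.standard_normal((n, d)) + mu

# Naive estimator: the observation itself. Its expected squared error is d = 3.
# Shrinkage estimator: pull each observation toward p by a data-dependent factor.
diff = x - p
norm2 = np.sum(diff**2, axis=1, keepdims=True)
shrunk = p + (1 - (d - 2) / norm2) * diff

mse_naive = np.mean(np.sum((x - mu) ** 2, axis=1))
mse_shrunk = np.mean(np.sum((shrunk - mu) ** 2, axis=1))

print(f"naive MSE:     {mse_naive:.4f}")   # close to 3.0
print(f"shrinkage MSE: {mse_shrunk:.4f}")  # strictly smaller
```

Re-running with any other mu and p shows the same ordering, which is exactly the "paradox": the target can be arbitrary.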
The “paradox” is that it can truly be arbitrary! Pick a random point. Shrink your least-squares estimator. You got yourself a “better” estimator - without having any additional information.
That’s why the “Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution” paper had the impact that it had.