
There is something magical about the origin when the result does not respect translational symmetry.

In fact, in a real-world setting I would probably use my first measurement to define the origin, having no other reference to reach for.




What does not respect translational symmetry?

You have an estimator. If you apply shrinkage towards the origin you have another estimator. If you apply shrinkage towards [42, 42, ..., 42] you have yet another estimator. Etc. Is it a problem that different estimators produce different results?


That's my understanding as well, FWIW. This is how I would phrase it:

Shrinking helps. In R^d there's no such thing as shrinking in general, only shrinking toward some point (the point that's the fixed point of the shrinkage). Regardless of what that point is, it's a good idea to shrink.


The James-Stein estimator does not respect translational symmetry. If I do a change of variables x2 = (x - offset), for an arbitrary offset, it gives me a different result! Whereas an estimator that just says I should guess that the mean is x is unaffected by a change of coordinate system.

This is a big problem if the coordinate system itself is not intended to contain information about the location of the mean.

This makes sense if "zero" is physically meaningful, for example if negative values are not allowed in the problem domain (number of spectators at Wimbledon stadium, etc). Although in that case, my distribution probably shouldn't be Gaussian!
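
A minimal sketch of the symmetry-breaking complaint, assuming d=3, known unit variance, and the positive-part form of the estimator (the observation and offset below are made up):

    import numpy as np

    def js_toward_origin(x, sigma2=1.0):
        # Positive-part James-Stein estimate, shrinking x toward the origin.
        d = x.shape[0]
        factor = 1.0 - (d - 2) * sigma2 / np.dot(x, x)
        return max(factor, 0.0) * x

    x = np.array([1.0, 2.0, 3.0])            # hypothetical observation
    offset = np.array([10.0, 10.0, 10.0])    # arbitrary shift of the coordinate system

    est_original = js_toward_origin(x)
    est_shifted = js_toward_origin(x + offset) - offset  # estimate in shifted coordinates, mapped back

    print(est_original)  # not equal to est_shifted: the answer depends on where the origin is
    print(est_shifted)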


This is what the original paper from Stein says:

"We choose an arbitrary point in the sample space independent of the outcome of the experiment and call it the origin. Of course, in the way we have expressed the problem this choice has already been made, but in a correct coordinate-free presentation, it would appear as an arbitrary choice of one point in an affine space."

The James-Stein estimator in its general form is about shrinking towards an arbitrary point (which usually is not the origin). It respects translational symmetry if you transform that arbitrary point like everything else.
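
For example (a sketch, assuming d >= 3, known unit variance, and the positive-part form; the numbers are arbitrary): shrink toward a point p instead of the origin, and translate p along with everything else.

    import numpy as np

    def js_toward(x, p, sigma2=1.0):
        # Positive-part James-Stein estimate of the mean, shrinking x toward p.
        d = x.shape[0]
        diff = x - p
        factor = 1.0 - (d - 2) * sigma2 / np.dot(diff, diff)
        return p + max(factor, 0.0) * diff

    x = np.array([1.0, 2.0, 3.0])          # hypothetical observation
    p = np.array([42.0, 42.0, 42.0])       # arbitrary shrinkage target
    offset = np.array([-5.0, 7.0, 0.5])    # arbitrary change of coordinates

    a = js_toward(x, p)
    b = js_toward(x + offset, p + offset) - offset

    print(np.allclose(a, b))  # True: equivariant once p is transformed along with x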


That just means it's assuming arbitrary additional prior information about the problem, which is different from zero information.


I don't understand what you mean. Who assumes what?

Take any point and shrink your least-squares estimator in that direction. You get an estimator that is strictly better - in some technical sense - which renders the original estimator inadmissible - in some technical sense.

That's a mathematical fact, it has nothing to do with prior information about the problem.


The article's presentation of the James-Stein estimator sets the arbitrary point at the origin. (My previous comments should be read in this context.) Of course, we could set it anywhere, including [42,...]. Let's call it p. Regardless of where you set it, the estimator suggests that your best estimate û of the mean μ should be nudged a little away from x and towards p.

My point is that the choice of 'p' (or, in the article's presentation, the choice of origin) cannot truly be arbitrary because if it reduces the expected squared difference between μ and û, then it necessarily contains information about μ. If all you truly know about μ is x and σ, then you will have no way to guess in which direction you should even shift your estimate û to reduce that error.

If you do have some additional information about μ, beyond just x alone, then sure, take advantage of it! But then don't call it a paradox.


(I cannot speak for the original article; I haven't put in the effort to fully understand it, so I won't categorically say it's wrong, but it didn't seem right to me.)

The “paradox” is that it can truly be arbitrary! Pick a random point. Shrink your least-squares estimator. You got yourself a “better” estimator - without having any additional information.

That’s why the “Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution” paper had the impact that it had.


Then you'll have to clarify what you mean by "random" when you say "pick a random point".

Unless you mean that every point on a spherical surface centered on x would have a lower expected squared error than x itself?


We may be talking about different things.

Let's say that you have a standard multivariate normal with unknown mean mu = [a, b, c].

The usual maximum-likelihood estimator of the unknown mean, when you get an observation, is to take the observed value as the estimate. If you observe [x, y, z] the "naive" estimator gives you the estimate mû = [x, y, z].

For any arbitrary point [p, q, r] you can define another estimator. If you observe [x, y, z] this "shrinkage" estimator gives you an estimate which is no longer precisely at [x, y, z] but is displaced in the direction of [p, q, r]. For simplicity let's say the resulting estimate is mû' = [x', y', z'].

Whatever choice you make for [p, q, r], the "shrinkage" estimator has lower mean squared error than the "naive" estimator. The expected value of (x'-a)²+(y'-b)²+(z'-c)² is lower than the expected value of (x-a)²+(y-b)²+(z-c)².
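
A rough Monte Carlo sketch of that claim (the positive-part form and unit variance are assumed; the values of mu and [p, q, r] below are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.array([1.0, -2.0, 0.5])   # the unknown mean [a, b, c] (known only to the simulation)
    p  = np.array([3.0, 3.0, 3.0])    # arbitrary point [p, q, r]

    def js_toward(x, p):
        # Positive-part James-Stein estimate, shrinking x toward p (unit variance assumed).
        d = x.shape[0]
        diff = x - p
        factor = 1.0 - (d - 2) / np.dot(diff, diff)
        return p + max(factor, 0.0) * diff

    n_trials = 200_000
    se_naive = 0.0
    se_shrunk = 0.0
    for _ in range(n_trials):
        x = rng.normal(mu, 1.0)                              # one draw from N(mu, I)
        se_naive  += np.sum((x - mu) ** 2)
        se_shrunk += np.sum((js_toward(x, p) - mu) ** 2)

    print("naive MSE:    ", se_naive / n_trials)    # about 3, the dimension
    print("shrinkage MSE:", se_shrunk / n_trials)   # a bit lower; the gap shrinks as p moves away from mu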



