
Right. The question is when (if ever) you would actually want to minimize the RMS of the vector error. For most of us, the answer is "never".

I remember back in 7th or 8th grade I asked my math teacher why we want to minimize the RMS error rather than the sum of the absolute values of the errors. She couldn't give me a good answer, but the book All of Statistics does explain why (and under what circumstances) that is the right thing to do.
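(For the curious, the usual short answer: squared error is minimized by the mean, absolute error by the median, and least squares is what maximum likelihood gives you under Gaussian noise. A quick numerical illustration in Python, all constants arbitrary:)

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.exponential(size=101)  # skewed, so the mean and median differ

    grid = np.linspace(0, data.max(), 20001)
    sse = ((data[None, :] - grid[:, None]) ** 2).sum(axis=1)  # squared errors
    sae = np.abs(data[None, :] - grid[:, None]).sum(axis=1)   # absolute errors

    print(grid[sse.argmin()], data.mean())      # squared error -> the mean
    print(grid[sae.argmin()], np.median(data))  # absolute error -> the median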




So this is just showing a bit of your ignorance of stats.

The general notion of compound risk is not specific to MSE loss. You can formulate it for any loss function, including the L1 loss, which you seem to prefer.

Stein's paradox and the James-Stein estimator are just a special case, for normal random variables and MSE loss, of the more general theory of compound estimation, which seeks an estimator that can leverage all the data to reduce overall error.
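(To make that setup concrete, here is a minimal simulation of the classic case: p independent unit-variance normal means, with total squared error as the loss. Everything here is illustrative:)

    import numpy as np

    rng = np.random.default_rng(0)
    p, trials = 10, 20000
    theta = rng.normal(size=p)                 # fixed true means
    x = theta + rng.normal(size=(trials, p))   # one N(theta_i, 1) draw per mean

    # James-Stein: shrink the raw observations toward 0 (requires p >= 3)
    shrink = 1 - (p - 2) / (x ** 2).sum(axis=1, keepdims=True)
    js = shrink * x

    print(((x - theta) ** 2).sum(axis=1).mean())   # MLE risk, roughly p
    print(((js - theta) ** 2).sum(axis=1).mean())  # JS risk, noticeably smaller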

This idea, compound estimation and James-Stein, is by now somewhat dated. Later came the invention of empirical Bayes estimation, and eventually, once we had the compute for it, modern Bayesian hierarchical modelling.

One thing you can recover from EB is the James-Stein estimator, as a special case; in fact, you can design much better families of estimators that are optimal with respect to Bayes risk in compound estimation settings.
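(Sketch of that recovery, under the toy model x_i ~ N(theta_i, 1) with a N(0, tau^2) prior on the theta_i: the posterior mean shrinks each x_i by the factor 1 - 1/(1 + tau^2), and plugging in the unbiased estimate (p - 2)/||x||^2 for 1/(1 + tau^2) gives back James-Stein exactly. In code, same assumptions:)

    import numpy as np

    def eb_posterior_means(x):
        # Empirical Bayes under x_i ~ N(theta_i, 1), theta_i ~ N(0, tau^2).
        # Marginally x_i ~ N(0, 1 + tau^2), and E[(p - 2) / ||x||^2] = 1 / (1 + tau^2),
        # so the plug-in shrinkage factor reproduces the James-Stein estimator.
        p = x.size
        shrink = 1 - (p - 2) / (x ** 2).sum()
        return shrink * x

(Clipping the factor at zero gives the positive-part variant, the textbook first example of a strictly better estimator in this family.)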

This is broadly useful in pretty much any situation where you have a large-scale experiment in which many small samples are drawn and similar statistics are computed in parallel, or where the data has a natural hierarchical structure. For example: biostats, but also various internet-data applications.

So yeah, I'd suggest being a bit more open to ideas you don't know anything about. @zeroonetwothree is not agreeing with you here; they're pointing out that you cooked up an irrelevant "example" and then claimed the technique doesn't make sense there. Of course it doesn't, but that's not because the idea of JS isn't broadly useful.

----

Another thing: the JS estimator can be viewed as an example of improving the overall bias-variance trade-off by regularization, although the connection to regularization as most people in ML use it is maybe less obvious. If you think regularization isn't broadly applicable and very important... I've got some news for you.
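(For that connection in one picture: ridge regression makes the same bias-for-variance trade by shrinking coefficients toward zero. A toy sketch, all names and constants arbitrary:)

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 30, 20
    beta = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ beta + rng.normal(scale=3.0, size=n)

    def ridge(lam):
        # closed-form ridge solution: (X'X + lam*I)^{-1} X'y; lam = 0 is OLS
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    for lam in [0.0, 1.0, 10.0]:
        print(lam, ((ridge(lam) - beta) ** 2).sum())
    # typically some lam > 0 beats OLS on estimation error for beta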



