I am not sure I would even say "believe", I would think of it more as short-circuiting our critical thinking. I think it taps into something at the core of our tribal instincts. It was famously present in even basic systems like Eliza. And it's not just machines... The same tricks are used by conmen, politicians, and psychopaths, which is more negative than I intend. Even with good intentions and positive outcomes, I feel we need to remember that we drive it, not the other way around.
People just don't like to be played for fools. Perhaps us giving in to this is progress? I'd give a big ol' "fuck you" to anyone who claims it is, but I'm also pretty old.
It also took me a little while to realize “least squares” and MMSE approaches were not necessarily the “correct” way to do things but just “one thing we actually know how to do” because everything else is much harder.
We can use Calculus to do so much but also so little…
That isn't the case; mathematicians will do pages of calculations (particularly and especially the statisticians) if they can prove one approach is technically superior to another. These people, as a class, are the crazies who invented matrix multiplication. Something like MMSE is used because it has provably optimal properties for estimating a posterior distribution.
It is certainly possible that there are complex approaches that the statisticians have not discovered or don't teach because they are too complicated, but they had a big fight about which techniques were provably superior early in the discipline's history, and the choice of what got standardised on wasn't made because of ease of calculation. It has actually been quite interesting how little interest the statisticians seem to be taking in things like the machine learning revolution, since the mathematics all seems pretty amenable to last century's techniques despite orders of magnitude differences in the data being handled.
> optimum properties for estimating a posterior distribution
Circular reasoning: that's true only if the posterior is normal, or if your "optimal" is defined by second moments. In infinite variance cases, the best estimator can be the median or an alpha-th moment for alpha < 2, but yikes the math is much more difficult.
-- A mathematician who has indeed fallen into the beauty trap
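To make the infinite variance point concrete, here is a minimal sketch, assuming a standard Cauchy distribution as the heavy-tailed example (that choice, and the names `cauchy_sample` and `compare`, are illustrative assumptions rather than anything from the thread). It estimates a known location with the sample mean and with the sample median, and reports the typical absolute error of each.

```python
import math
import random
import statistics


def cauchy_sample(rng, n, loc=0.0):
    # Inverse-CDF sampling: tan(pi * (u - 0.5)) is standard Cauchy for u ~ U(0, 1).
    return [loc + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def compare(trials=200, n=101, loc=0.0, seed=0):
    # Hypothetical helper: repeat the experiment and collect absolute errors.
    rng = random.Random(seed)
    mean_errs, median_errs = [], []
    for _ in range(trials):
        xs = cauchy_sample(rng, n, loc)
        mean_errs.append(abs(statistics.fmean(xs) - loc))
        median_errs.append(abs(statistics.median(xs) - loc))
    # Summarise with the median of the errors, because the error of the
    # sample mean has no finite expectation under a Cauchy distribution.
    return statistics.median(mean_errs), statistics.median(median_errs)


if __name__ == "__main__":
    mean_err, median_err = compare()
    print(f"typical |error| of sample mean:   {mean_err:.3f}")
    print(f"typical |error| of sample median: {median_err:.3f}")
```

The sample mean of Cauchy draws is itself Cauchy distributed, so averaging more data does not tighten it, while the sample median does concentrate around the true location. That is one case where the least squares answer is not the best estimator.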
> Circular reasoning: that's true only if the posterior is normal, or if your "optimal" is defined by second moments.
That doesn't sound right, it is an error minimising technique. Are we not talking about minimising mean square errors? Why would the posterior need to be normal? And why would optimal need to be defined by 2nd moments?
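For what it's worth, a short derivation may help reconcile the two positions. This is a sketch assuming a scalar parameter and a posterior with finite variance; the symbols $\theta$ (parameter), $x$ (data), $a$ (candidate estimate) and $\mu$ (posterior mean) are notation introduced here, not taken from the thread.

```latex
\begin{aligned}
\mathbb{E}\big[(\theta - a)^2 \mid x\big]
  &= \mathbb{E}\big[(\theta - \mu)^2 \mid x\big] + (\mu - a)^2,
  \qquad \mu = \mathbb{E}[\theta \mid x], \\
\arg\min_a \, \mathbb{E}\big[(\theta - a)^2 \mid x\big] &= \mu, \\
\arg\min_a \, \mathbb{E}\big[\,|\theta - a| \mid x\big] &= \operatorname{median}(\theta \mid x).
\end{aligned}
```

The first line holds because the cross term $2(\mu - a)\,\mathbb{E}[\theta - \mu \mid x]$ is zero. So the posterior mean minimises mean squared error without any normality assumption, but only when the variance is finite and only because "optimal" was defined as squared error in the first place. Swap in absolute error and the optimal estimate becomes the posterior median instead.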