

Can we do better than R-squared? - mathattack
http://tomhopper.me/2014/05/16/can-we-do-better-than-r-squared/

======
wirrbel
The biggest problem: people hide behind quantities like R-squared and the
p-value. While these are just indicators of how well the model fits the data,
people turn them into a monolithic, unquestionable truth. That is why studies
with 7 participants get published.

~~~
tst
I would even go further. The biggest problem isn't just the misuse of p-values
and R^2. I think the biggest problem is that a lot of people never learned
statistics properly.

Properly is a vague term, so what do I mean? Instead of obsessing over tons of
techniques, go back to the basics and actually learn how to design studies,
work with data, and apply statistical reasoning and critical thinking.

I took quite a few courses in statistics because I liked it. But a lot of
other people – especially those who apply statistics – take maybe one or two
courses in stats and then go do research/studies. The results can be pretty
terrible. In conclusion: more fundamentals and less icing on the cake.

~~~
wirrbel
While I agree with the premise that most people do not know enough statistics,
there are several underlying problems behind that.

When I first encountered statistical topics as an undergrad, the lecturer
himself was not confident with the material and taught a very bare subset
consisting of linear fitting, error propagation, means and standard
deviations. Even back then I had the feeling that the equations provided were
insufficient and contained a lot of things that were neither motivated nor
explained.

Nowadays I can see why I was not introduced to statistics and probability back
then in the way I was introduced to algebra, analysis and quantum mechanics.
The field of statistics is complex, full of contradictory best practices, and
analytically challenging, to say nothing of the Bayesian-vs.-frequentist
debate. I doubt that every researcher who works with Poisson-distributed data
at some point in their scientific career could work through the details of
Poissonian statistics analytically.

Maybe an illustrative introduction would be more beneficial. I imagine people
could perform simulated statistical experiments before working with real data
and experience first-hand how misleading a small sample can be, for example,
or how fundamentally a data plot can change its appearance. Maybe then people
would stop being overenthusiastic about their N=20 experiments.

------
peatmoss
AIC? BIC? The article mentions adjusted R^2, which would still be an
improvement over plain R^2 if you have a number of predictors in your model.
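For anyone who wants to see these side by side, a minimal sketch using
statsmodels (the toy data with one real predictor plus noise predictors is my
own assumption):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Toy data: one real predictor plus a few pure-noise predictors.
n = 100
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] + rng.normal(size=n)

model = sm.OLS(y, sm.add_constant(X)).fit()

# Plain R^2 only goes up as you add predictors; adjusted R^2, AIC and BIC
# all penalise the extra parameters.
print("R^2:          ", model.rsquared)
print("adjusted R^2: ", model.rsquared_adj)
print("AIC:          ", model.aic)
print("BIC:          ", model.bic)
```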

------
waps
And the mistake is here:

> Minitab calculates predicted R-squared by systematically removing each
> observation from the data set, estimating the regression equation, and
> determining how well the model predicts the removed observation. Like
> adjusted R-squared, predicted R-squared can be negative and it is always
> lower than R-squared.

(linked from the article, and better explained than anywhere in the article).

The mistake is in what is compared against what. Ideally a model would be
entirely independent of the data. This is what you get if you correctly derive
a model from first principles and then verify that the observed data points
agree. The independence is perfect and it is valid to compare the two data
sets. Any agreement builds confidence that the model is correct.

However, that's not what's being done here. The "model" is a linear
combination of the data set. So you're comparing the model generated from n
data points with the model generated from n-1 data points, for each point.
You're effectively comparing the data against itself (which is also what
R-squared does, of course).
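For concreteness, the procedure described in the quote is leave-one-out
cross-validation built on the PRESS statistic. A minimal sketch in
Python/numpy (this is the textbook PRESS-based definition; I'm assuming, not
verifying, that it matches what Minitab computes internally):

```python
import numpy as np

def predicted_r_squared(X, y):
    """Predicted R^2 via leave-one-out: drop each observation, refit,
    predict the dropped point, and accumulate the PRESS statistic."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])   # design matrix with intercept
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X1[mask], y[mask], rcond=None)
        press += (y[i] - X1[i] @ beta) ** 2
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot

# Pure-noise example: plain R^2 on a small sample still looks non-zero,
# while predicted R^2 often goes negative, flagging the overfit.
rng = np.random.default_rng(2)
X = rng.normal(size=(15, 4))
y = rng.normal(size=15)
print(predicted_r_squared(X, y))
```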

The correct way to go about this is to randomly divide your dataset into 2
pieces; ideally both should have a decent size. The second part you give to
your colleague, with strict instructions to shoot you if you even ask to see
that data. You build your model with your data points, then ask your colleague
to compare it with his data and report only whether they're close or not, NOT
the values. If your fit is not close to his fit ("too much" - some fuzziness
here), you're overfitting.
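A minimal sketch of that holdout protocol in Python/numpy (the synthetic data,
the 50/50 split and the RMSE-as-closeness criterion are all my own
assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Pretend these are your observations.
X = rng.normal(size=(200, 3))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=200)

# Random split: build the model on one half, hold back the other half.
idx = rng.permutation(len(y))
train, test = idx[:100], idx[100:]

X1 = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

# The colleague only reports whether the held-out error is "close" to the
# training error, not the held-out values themselves.
print("train RMSE:", rmse(y[train], X1[train] @ beta))
print("test  RMSE:", rmse(y[test],  X1[test]  @ beta))
```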

The key difference here is the random part. The "predicted R-squared" does not
create independence between the compared data sets. This means it can't work
correctly.

I realize this is the typical mathematician's argument: I'm claiming that
there exists a way to make both R-squared and predicted R-squared come out way
too high, without giving any method to do so. But the procedure is obviously
wrong nonetheless.

~~~
ajtulloch
> I realize this is the typical mathematician argument.

Interestingly, a mathematician's argument on this exact method was actually
given in 1977, in "An Asymptotic Equivalence of Choice of Model by Cross-
Validation and Akaike's Criterion". [1]

What Stone showed was that (modulo technicalities) performing this procedure -
leave-one-out cross-validation, the procedure you claim "cannot work
correctly" - was asymptotically equivalent to minimising the AIC [2] of the
model. So we have an asymptotic guarantee (for whatever that's worth) on the
behaviour of this procedure; a rough numerical illustration is sketched below
the references.

[1]:
[http://www.jstor.org/stable/2984877](http://www.jstor.org/stable/2984877)

[2]:
[http://en.wikipedia.org/wiki/Akaike_information_criterion](http://en.wikipedia.org/wiki/Akaike_information_criterion)
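A rough numerical illustration of that equivalence (Python/numpy, polynomial
model selection on synthetic data; the cubic ground truth and the
Gaussian-likelihood form of AIC are my assumptions, and a single finite sample
can only hint at an asymptotic result):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data: the true model is a cubic.
n = 200
x = rng.uniform(-2, 2, n)
y = x**3 - x + rng.normal(0, 1, n)

def design(x, degree):
    """Polynomial design matrix with columns x^0 .. x^degree."""
    return np.column_stack([x**d for d in range(degree + 1)])

def loocv_mse(x, y, degree):
    """Leave-one-out cross-validated mean squared prediction error."""
    n = len(y)
    errs = []
    for i in range(n):
        mask = np.arange(n) != i
        X = design(x[mask], degree)
        beta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        errs.append((y[i] - design(x[i:i+1], degree) @ beta)[0] ** 2)
    return np.mean(errs)

def aic(x, y, degree):
    """AIC for Gaussian errors, up to an additive constant:
    2k + n * log(RSS / n)."""
    X = design(x, degree)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = degree + 1
    return 2 * k + len(y) * np.log(rss / len(y))

# Both criteria tend to select the same polynomial degree.
degrees = range(1, 7)
cv = {d: loocv_mse(x, y, d) for d in degrees}
ic = {d: aic(x, y, d) for d in degrees}
print("degree picked by LOOCV:", min(cv, key=cv.get))
print("degree picked by AIC:  ", min(ic, key=ic.get))
```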

