
Science fails to face the shortcomings of statistics - CWuestefeld
http://www.sciencenews.org/view/feature/id/57091/title/Odds_Are%2C_Its_Wrong
======
jerf
I've been grousing about the state of science lately in some other comments;
this is basically what I was talking about. It's still the best method for
getting at the truth we've got, but it needs some upgrades.

One other big error I'm reasonably confident about (though I'd welcome
corrections from those who know more stats) is that the p-value, in addition
to its other faults and misinterpretations, is usually used in a manner that
assumes a Gaussian distribution. While the Central Limit Theorem does mean we
tend to see that more often than some other distributions, it is not true that
it is safe to simply assume your data is Gaussian. You really need to
demonstrate that it is, first, then you can start using Gaussian-based tools
on the data.
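
A minimal sketch of what that check might look like in Python (the data and
the 0.05 cutoff here are purely illustrative):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    data = rng.lognormal(sigma=1.0, size=200)  # skewed, non-Gaussian "measurements"

    # Shapiro-Wilk tests the null hypothesis that the data are Gaussian.
    w, p = stats.shapiro(data)
    if p < 0.05:
        print(f"normality rejected (p={p:.2g}); prefer a rank-based test")
        print("such as stats.mannwhitneyu over a plain t-test")
    else:
        print("no evidence against normality; Gaussian-based tools are defensible")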

~~~
alex_stoddard
I am wary of trying to contribute to this (a little knowledge is a dangerous
thing), but hopefully someone can correct my mistakes. (And what better way
for me to learn...)

A typical controlled science experiment is designed to take measurements of
multiple groups where one variable is different between the groups and others
are controlled. We wish to see if the variable of interest has an effect.

Therefore the commonest statistical test is to determine whether the mean
values of the groups are different (t-test for two groups, ANOVA for multiple
groups), e.g. does mean blood pressure increase when on a high-salt diet?

If I understand the CLT (big if) then the distribution of the _mean_ of a
sample is, by the CLT, going to be Gaussian, regardless of the distribution
from which the actual measurements are drawn, i.e. for comparing group means
it doesn't matter if my data is sampled from Gaussians or not.
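
A quick simulation of that (a sketch; the exponential is just a stand-in for
a skewed measurement distribution):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    # 10,000 simulated experiments, each taking the mean of 50 exponential draws.
    means = rng.exponential(size=(10_000, 50)).mean(axis=1)

    # The raw draws are heavily skewed; the means are already close to Gaussian.
    print("skewness of raw draws:", stats.skew(rng.exponential(size=10_000)))
    print("skewness of the means:", stats.skew(means))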

Of course that leads to the question of whether a significant difference in
group means is really relevant in a given context.

~~~
lutorm
"If I understand the CLT (big if) then the distribution of the _mean_ of a
sample is, by the CLT, going to be Gaussian"

Yes, for _an infinite number of samples_. The rate at which it converges to a
Gaussian, though, is strongly dependent on the distribution from which the
measurements are drawn.
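
A sketch of how distribution-dependent that convergence is, comparing a
uniform with a heavily skewed lognormal:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    for n in (5, 50, 500):
        unif = rng.uniform(size=(5_000, n)).mean(axis=1)
        logn = rng.lognormal(sigma=2.0, size=(5_000, n)).mean(axis=1)
        # Uniform means look Gaussian almost immediately; the lognormal's
        # residual skewness dies off far more slowly.
        print(f"n={n}: skew(uniform means)={stats.skew(unif):+.2f}, "
              f"skew(lognormal means)={stats.skew(logn):+.2f}")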

~~~
joe_the_user
Actually, not even this is true!

The Central Limit Theorem applies _only_ to samples which have _finite mean
and variance_ (see: <http://en.wikipedia.org/wiki/Central_limit_theorem>).

Take a distribution which has infinite variance or mean and you can wind up
instead with one of the fractal distributions which Mandelbrot studied.

see <http://en.wikipedia.org/wiki/Stable_distribution>

Distributions with infinite variance or mean are more common than one might
imagine. Some might argue the stock market would qualify.
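
The standard Cauchy is the textbook case: it has no finite mean, and a toy
simulation shows its sample means never settle down:

    import numpy as np

    rng = np.random.default_rng(3)
    # Standard Cauchy: no finite mean or variance, so the CLT does not apply.
    for n in (100, 10_000, 1_000_000):
        print(f"n={n:>9,}: sample mean = {rng.standard_cauchy(n).mean():+.2f}")
    # The sample mean of n Cauchy draws is itself standard Cauchy for every n,
    # so these numbers jump around no matter how large n gets.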

~~~
lutorm
Good point. Distributions with infinite variance are less common in physics,
so that's normally not something we have to worry about.

------
CWuestefeld
_Over the years, hundreds of published papers have warned that science’s love
affair with statistics has spawned countless illegitimate findings. In fact,
if you believe what you read in the scientific literature, you shouldn’t
believe what you read in the scientific literature._

 _"There is increasing concern," declared epidemiologist John Ioannidis in a
highly cited 2005 paper in PLoS Medicine, "that in modern research, false
findings may be the majority or even the vast majority of published research
claims."_

~~~
lutorm
I'm not sure that it's fair to blame this fact on statistics. Incorrect
statistics can give rise to illegitimate findings, just like improper
measurement technique can. That's not the fault of statistics, it's the fault
of people incorrectly applying tools and techniques.

That said, I agree with the conclusion that published research is full of
incorrect results. I basically don't believe anything that comes out of a
single paper unless it's been confirmed by unrelated work. That's the fault
not just of incorrect statistics but of many other things, not the least of
which is that doing research is hard and there is no way of knowing your end
result is right.

~~~
khafra
Here are a few suggestions about better alternatives to the way science uses
statistics:

<http://lesswrong.com/lw/1gc/frequentist_statistics_are_frequently_subjective/>

~~~
scott_s
_The obvious fix is to (a) require raw data to be published; (b) require
journals to accept papers before the experiment is performed, with the advance
paper including a specification of what statistics were selected in advance to
be run on the results; (c) raising the standard "significance" level to p
<0.0001; and (d) junking all the damned overcomplicated status-seeking
impressive nonsense of classical statistics and going to simple understandable
Bayesian likelihoods._

Suggestion b is both radical and very thought-provoking. Which, at this
point, I expect from Yudkowsky.
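
For what it's worth, suggestion d can be made concrete with something as
simple as a likelihood ratio between two point hypotheses (a toy sketch; the
numbers and hypotheses are made up, not from the linked post):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    data = rng.normal(loc=0.3, scale=1.0, size=50)  # hypothetical measurements

    # How strongly do the data favor "effect of 0.5" over "no effect"?
    # (Two point hypotheses with known unit variance, for simplicity.)
    ll0 = stats.norm.logpdf(data, loc=0.0, scale=1.0).sum()
    ll1 = stats.norm.logpdf(data, loc=0.5, scale=1.0).sum()
    print("likelihood ratio H1/H0:", np.exp(ll1 - ll0))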

~~~
lutorm
Interesting. The idea that null results are not worth publishing has struck
me from the very beginning as one of the most fundamentally flawed ideas in
science. Interestingly, it seems to be very domain-dependent. In my field
(astrophysics), null results are fairly frequently published, but I've heard
that in other fields it's totally impossible to do.

~~~
argv_empty
I've run across one or two compiler optimization papers where the conclusion
given was that the proposed technique didn't work out so well, but on the
whole, the bias against null results seems to apply in that field as well. I
agree that it's a problem -- if a
null result is not published, other people will probably waste years making
the same mistakes.

~~~
lutorm
Yeah, the wasted repeated effort was what I initially thought about, too. The
argument about how compilations of results will be systematically skewed by
certain results not being published is perhaps even more persuasive, because
it doesn't just lead to wasted effort but to incorrect results.

------
TrevorBurnham
_Correctly phrased, experimental data yielding a P value of .05 means that
there is only a 5 percent chance of obtaining the observed (or more extreme)
result if no real effect exists (that is, if the no-difference hypothesis is
correct). But many explanations mangle the subtleties in that definition._

This is important, but not quite accurate. It would be more correct to say
that this is what the P value means _if the model is correctly specified._ And
in the social sciences (or, for that matter, biology), this is almost never
the case.
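
One way to see the damage (a sketch; the setup is mine, not the article's):
simulate a true null where the pooled t-test's equal-variance assumption is
violated, and the false-positive rate drifts far from the nominal 5%.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    trials, false_pos = 5_000, 0
    for _ in range(trials):
        a = rng.normal(0.0, 10.0, size=10)   # small group, large variance
        b = rng.normal(0.0, 1.0, size=100)   # large group, small variance
        # Both means are truly equal, but the pooled test's equal-variance
        # assumption is violated.
        _, p = stats.ttest_ind(a, b, equal_var=True)
        false_pos += p < 0.05
    print(f"false-positive rate: {false_pos / trials:.3f} (nominal: 0.050)")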

When I was an undergrad taking econometrics, this was incredibly frustrating.
I swore there had to be something I just wasn't getting; why did scientists
put so much credence in numbers that rely on assumptions that they know to be
false? Of course, at the same time, I love microeconomic theory, which lies on
a similarly fictitious basis.

Over time, I relaxed a bit in my attitude toward statistics. While I don't
mean to diminish the importance of proper, rigorous methodology, the fact is
that statistical methods are just a narrative device. They give us a way of
telling plausible stories and discarding implausible ones. We'd be foolish to
believe that we can always tell correlation, causality and coincidence apart,
but we do a better job by using statistics than we would without.

~~~
lutorm
" _If_ the model is correctly specified"

Indeed. I had a similar realization when I observed that the estimated
parameter error on a chi-square fit does not depend on the actual chi-square
value itself. This seemed preposterous to me: shouldn't the parameters be more
uncertain if the fit is bad? Then I came across this passage in Numerical
Recipes that said something like "remember that all of this is under the
assumption that the model being fit to is actually the one from which the data
points are drawn. If the reduced chi-square value is >>1, then that indicates
that this is not the case and then _the entire procedure_ is suspect."
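
A sketch of both halves of that in Python (scipy's curve_fit stands in for a
generic chi-square fitter; the deliberately wrong model is my own toy setup):

    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(6)
    x = np.linspace(0, 10, 50)
    sigma = np.full_like(x, 0.5)
    y = 0.2 * x**2 + rng.normal(0.0, 0.5, size=x.size)  # secretly quadratic

    # Fit a straight line anyway: the model is misspecified.
    popt, pcov = curve_fit(lambda x, a, b: a * x + b, x, y,
                           sigma=sigma, absolute_sigma=True)
    resid = y - (popt[0] * x + popt[1])
    chi2_red = np.sum((resid / sigma) ** 2) / (x.size - 2)

    # With absolute_sigma=True the formal errors ignore fit quality entirely;
    # a reduced chi-square far above 1 is the warning that they are suspect.
    print("formal parameter errors:", np.sqrt(np.diag(pcov)))
    print("reduced chi-square:", chi2_red)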

------
lotharbot
I remember recently reading a study that looked at possible correlations
between political beliefs and squeamishness. They asked about 20 questions in
each area, and found that 3 of the political questions correlated to 8 of the
squeamishness questions at the 95% significance level.

Recall that 95% significance means "a 5% chance of seeing a correlation this
strong if there is no real effect". Correlating 20 questions with 20 others
gives you 400 possibilities; they found 24 correlations. Given 400 trials, if
there were no real correlations, they should've found about 20 flukes.

Needless to say, I was overall unimpressed by their results.
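
That back-of-the-envelope estimate is easy to check by simulation (a sketch
with pure-noise answers standing in for the survey data):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    political = rng.normal(size=(200, 20))  # 20 pure-noise "questions"
    squeamish = rng.normal(size=(200, 20))  # 20 more, also pure noise

    hits = sum(stats.pearsonr(political[:, i], squeamish[:, j])[1] < 0.05
               for i in range(20) for j in range(20))
    print(f"'significant' correlations out of 400 null tests: {hits}")
    # Expect roughly 20. A Bonferroni cutoff of 0.05/400 would remove nearly
    # all of them.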

------
sketerpot
The usual rule of thumb that I hear is this: _don't trust any single medical
study._ Look at the broader literature in the area and see if there's a
general consensus. Until there's some consensus, be wary.

------
corruption
"The “scientific method” of testing hypotheses by statistical analysis stands
on a flimsy foundation"

If you have only studied first-year statistics, this is what you will learn
as "science". As soon as you get to subsequent study, you realise why
hypothesis testing is almost always the wrong approach.

I'm not sure who this guy has been talking to, but it sure hasn't been
statisticians!

------
claymmm
The diet pill example ("Oomph" and "Precision") from the Ziliak and
McCloskey's book really made it clear for me:

<http://books.google.com/books?id=JWLIRr_ROgAC&lpg=PA25&#...>
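
Roughly (with hypothetical numbers, not the book's): a pill with a big but
noisy effect can look worse under significance testing than a pill with a
tiny but precisely measured effect.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    # "Oomph": loses a lot of weight on average, but responses vary wildly.
    oomph = rng.normal(loc=10.0, scale=30.0, size=30)
    # "Precision": a tiny average loss, measured very consistently.
    precision = rng.normal(loc=0.5, scale=0.5, size=30)

    for name, pill in (("Oomph", oomph), ("Precision", precision)):
        t, p = stats.ttest_1samp(pill, 0.0)
        print(f"{name}: mean loss = {pill.mean():.1f}, p = {p:.2g}")
    # Precision gets the far smaller p-value even though Oomph's average
    # effect is twenty times larger -- significance is not importance.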

~~~
claymmm
BTW, I actually read the book. It's alright but really repetitive after the
first couple of chapters. McCloskey overwrites as usual.

------
brc
I must have entered a time portal - that article isn't published until the
27th of March!

I totally agree that the translation of results usually makes the jump from
'statistically significant at p < .05' to 'fact' - and that's where the
problem is.

------
marshallp
The real problem is "scientists". What is a scientist, and how is he
selected/trained?

Scientists are hands-on technicians, theoreticians/statisticians, fundraisers,
and team leaders/project managers. Four distinct skillsets.

They are usually selected based on their ability to do bookwormish things
such as acing tests in their teens and early twenties.

The profession goes against the basic principle of capitalism: division of
labour.

I'm kinda amazed anything gets done by scientists, and this article is
validating my intuition.

~~~
cdr-n-car
Yes, statistics, like logic itself, can be misused and can fool the
untrained/unthinking.

~~~
joe_the_user
Training isn't useful unless the institutions doing the training make
awareness of the limits of statistics part of their curriculum.

~~~
cdr-n-car
The limits of statistics are built right into statistics... "all models are
wrong, but some are useful."

