

The Statistical Crisis in Science - jonathansizz
http://www.americanscientist.org/issues/feature/2014/6/the-statistical-crisis-in-science/99999

======
yummyfajitas
So at least two people reading this seem to think it's about using science in
the context of their pet peeves. It's not.

It's about using a statistical test for a data-dependent hypothesis and
interpreting the test as if it were used for a data-independent hypothesis.
That's all.

It's not about using statistics in politics or finance. It's about first
looking at the data, then formulating a hypothesis, then running a standard
test which is based on the idea that you chose the hypothesis independently of
the data. This is a problem in any field.
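
A minimal sketch of the mechanism (hypothetical numbers, pure noise throughout): look at the data first, pick the most extreme-looking subgroup contrast, then run a standard t-test as if that contrast had been chosen in advance. The nominal 5% false-positive rate gets badly inflated:

    # Sketch: choosing the hypothesis after seeing the data inflates the
    # false-positive rate. Everything here is noise, so every
    # "significant" result is a false positive.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_sims, n, n_subgroups = 2000, 50, 10
    false_positives = 0

    for _ in range(n_sims):
        # Ten candidate subgroup comparisons, all null (no real effects).
        a = rng.normal(size=(n_subgroups, n))
        b = rng.normal(size=(n_subgroups, n))
        # Data-dependent step: pick the most extreme-looking contrast...
        k = np.argmax(np.abs(a.mean(axis=1) - b.mean(axis=1)))
        # ...then test it as if it were chosen before seeing the data.
        _, p = stats.ttest_ind(a[k], b[k])
        false_positives += p < 0.05

    print(false_positives / n_sims)  # roughly 0.3-0.4, not the nominal 0.05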

~~~
noahl
So actually, I think the idea that data-dependent hypotheses are bad is
fundamentally wrong, and is based on a misunderstanding of probability.

The reason you'd avoid data-dependent hypotheses is simple: if the data comes
from a process with some sort of randomness in it, then there will usually be
things that appear to be interesting patterns but are in fact random
artifacts. If you look at your data, you may be tempted to formulate a
hypothesis based on these random artifacts. It may pass statistical tests
(because the data you have does contain the pattern), but it is not, in fact,
causal. To avoid this, you maintain the discipline of only making hypotheses
_before_ you look at your data, which means that you can't see a random effect
and then guess that it's real.

The problem is, this doesn't mean that if your hypothesis passes a statistical
test, the result must be causal. It only lowers the probability - there is
still a chance that your hypothesis was wrong, but your data happens to
contain a random fluctuation that makes it look right. The only way to protect
against _this_ danger is to continuously gather data and re-evaluate your
hypotheses, while understanding that there is always some probability that the
effect you think you see is really random noise.

And once you're doing this continuous monitoring anyway, then there's no
reason to reject data-dependent hypotheses. By definition, if the effect
you're seeing is a random occurrence, then it should go away with more data.
If it doesn't go away, then maybe you've found something that you wouldn't
have been able to guess in advance, which is good! And if you see a random
effect, form a hypothesis that passes some test, and then assume that your
hypothesis must be true, then the problem is not your data, but rather that
you misunderstand how probability works.

In short, avoiding data-dependent hypotheses is a hack that only reduces the
probability of an error that you should be avoiding entirely anyway. Once you
accept this and start avoiding the error, there's no reason to avoid data-
dependent hypotheses, and they can be quite useful.
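
For what it's worth, the "it should go away with more data" point is easy to demonstrate by simulation. A minimal sketch (hypothetical numbers, again pure noise): of the apparent effects "discovered" in small noisy samples, only about 5% survive a larger replication, which is exactly the continuous re-evaluation described above:

    # Sketch: spurious "discoveries" made from noise mostly fail to
    # replicate when fresh data is gathered; a real effect would persist.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    discoveries = replications = 0

    for _ in range(2000):
        a, b = rng.normal(size=30), rng.normal(size=30)
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:                 # an apparent effect, found in noise
            discoveries += 1
            a2, b2 = rng.normal(size=300), rng.normal(size=300)
            _, p2 = stats.ttest_ind(a2, b2)  # fresh, larger sample
            replications += p2 < 0.05

    print(discoveries, replications)  # the ratio stays near 5%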

~~~
cja23
Yes! I think any good statistician would agree with all of that and emphasize
the importance of how you present your conclusions. Forming "data-dependent
hypotheses" is part of what most statistician's call "exploratory data
analysis" (EDA). When presenting EDA findings, we should use terms like
"association", "relationship", "correlation", "possible", and not words like
"cause", "effect", "p-value", "test", etc.

------
dschiptsov
It's not only errors in the use of statistics and misapplication of
probability theory, but also abstract modeling in general.

The very idea of modeling dynamic abstract processes such as financial
markets, which are themselves mere abstractions, is non-science; it is a
misuse of pseudo-scientific methods and mathematics, and what we have seen so
far is nothing but failures.

Overly abstract or flawed abstractions and wrong premises cannot be fixed by
any amount of math or modeling. They can only be discarded.

The famous "subject/object" false dichotomy in philosophy is a good example
too. People can spend ages modeling reality using non-existent abstractions.

Today all these multiverse "theories" are mere speculations about whether
Shiva, Brahma, or Vishnu is the most powerful, forgetting that all of these
were nothing but anthropomorphic abstractions of different aspects of one
reality.

The notion that so-called "modern science" is a new religion (a contest of
unproven speculations) is already quite old.

Btw, a good example of the reductionist mindset (instead of piling up
abstractions) could be the Upanishadic reduction of all the Gods to one
Brahman, for which Einstein accidentally discovered a formula, E = mc^2, where
_c_ is a constant (implying that there is no time in the Universe).

~~~
hessenwolf
You are throwing the baby out with the bath water, with respect to financial
modelling. Yes, there are failures, and, yes, the models are severely
imperfect.

Using hedging, we reduced the risk on a portfolio of 2 billion Euro from
about a billion Euro to about 50 million Euro. The remaining 50 million was
mostly basis risk, i.e., the mismatch between the underlying instruments in
the liabilities and the hedge assets.

Using a similar logic to yours, senior management argued that we introduced a
new risk called basis risk by trading derivatives.
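
For the curious, here is a minimal sketch of what basis risk looks like (made-up numbers, nothing like the actual portfolio): the hedge instrument is highly but imperfectly correlated with the liabilities, so hedging removes most of the variance, and the residual is the basis risk:

    # Sketch: an imperfectly correlated hedge removes most of the risk;
    # the residual from the correlation mismatch is the basis risk.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 100_000
    corr = 0.999  # hypothetical liability/hedge return correlation

    liability = rng.normal(size=n)
    hedge = corr * liability + np.sqrt(1 - corr**2) * rng.normal(size=n)

    print(liability.std())            # unhedged risk, ~1.0
    print((liability - hedge).std())  # hedged: ~0.045, i.e. ~95% removed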

~~~
dschiptsov
What is wrong with financial modeling, in my opinion, is not only that models
cannot grasp an overly complex "reality", but that this reality keeps
changing while you are finishing your model, so no statistical "snapshot" or
data set is even close to correct.

Also, so-called Black Swans can occur only within such models. There is no
chance that one day c or even g could change (no matter what "scientists" say
in journals).

Btw, finance is a business, not a science.

~~~
hessenwolf
Yeah, so, I'm guessing you don't do a lot of financial modelling.

~~~
dschiptsov
Not everyone is so lucky.

------
chriswarbo
Scientists have tried over at least the past few hundred years (depending on
your definitions) to build, from scratch, a perspective on the world which is
as free from human bias as possible. At the moment, the jewel in the crown is
quantum physics: an inherently statistical theory, so detached from human
biases and assumptions that many smart people have struggled to understand or
accept it, despite its incredible predictive power.

At the heart of the whole process is statistical inference: generalising the
results of experiments or observations to the Universe as a whole. A
"statistical crisis in science" would be terrible news. We may have been
standing on the shoulders of the misinformed, rather than giants. Our
"achievements", from particle accelerators to nukes and moon rockets, could
have been flukes; if the underlying statistical approach of science was
flawed, the predicted behaviour and safety margins of these devices could have
been way off. We may be routinely bringing the world to the edge of
catastrophe, if we don't understand the consequences of our actions.

Oh wait, it seems like some "political scientists" have noticed that their
results tend to be influenced by external factors. I hope they realise the
irony in their choice of examples:

> As a hypothetical example, suppose a researcher is interested in how
> Democrats and Republicans perform differently in a short mathematics test
> when it is expressed in two different contexts, involving either healthcare
> or the military.

The article criticises scientists' ability to navigate the statistical
minefield of biases, probability estimates, modelling assumptions, etc. in a
world of external, political factors like competitive funding, positive
publication bias, etc. and they choose an example of _measuring how political
factors affect people's math skills_!

To me, that seems the sociological equivalent of trying to measure the thermal
expansion of a ruler by reading its markings. What do you know, it's still
30cm!

~~~
semi-extrinsic
Saying that quantum mechanics is an inherently statistical theory is a blatant
misrepresentation. The very point that makes QM so weird is that it is _not_
caused by statistics. In a (properly set up) double-slit experiment, a
single electron is simultaneously travelling through both slits and causing an
interference pattern.

~~~
ChrisLomont
> Saying that quantum mechanics is an inherently statistical theory is a
> blatant misrepresentation.

Every axiomatic QM formulation (for example, [0]) I have ever seen includes
as an axiom a statistical statement, usually about observables.

Can you provide a set of axioms reproducing QM without such an axiom?

If not, then it's inherently a statistical theory.

In your example, where a single electron lands is completely governed by a
probability distribution that is determined by the setup of the test.
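
To make that concrete, here is a minimal numerical sketch (idealized far-field amplitudes, made-up parameters): the landing positions are drawn from the density |psi1 + psi2|^2, which is precisely the statistical statement, interference terms included:

    # Sketch: the double-slit landing distribution is |psi1 + psi2|^2,
    # a probability density fixed entirely by the experimental setup.
    import numpy as np

    x = np.linspace(-10, 10, 2001)  # screen position, arbitrary units
    k, d = 2.0, 1.0                 # hypothetical wavenumber, slit offset

    # Idealized far-field amplitudes from the two slits.
    psi1 = np.exp(1j * k * d * x / 2)
    psi2 = np.exp(-1j * k * d * x / 2)

    p_quantum = np.abs(psi1 + psi2) ** 2                # cos^2 fringes
    p_no_interference = np.abs(psi1)**2 + np.abs(psi2)**2  # flat, no fringes

    # Individual electrons land at random positions sampled from p_quantum.
    rng = np.random.default_rng(3)
    hits = rng.choice(x, size=5, p=p_quantum / p_quantum.sum())
    print(np.round(hits, 2))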

[0]
[http://en.wikipedia.org/wiki/Dirac%E2%80%93von_Neumann_axiom...](http://en.wikipedia.org/wiki/Dirac%E2%80%93von_Neumann_axioms)

------
chuckcode
"all models are wrong, but some are useful." \- George Box [1]

George Box expressed early my general feeling about statistics, it is a very
useful tool but remember the limitations of the methods, the data, and the
people applying them. I would like to seen an emphasis on openness and
transparency with data so others can replicate the analysis and the community
can come up with ways to make best practices accessible to anyone.

[1]
[http://en.wikiquote.org/wiki/George_E._P._Box](http://en.wikiquote.org/wiki/George_E._P._Box)

------
SaberTail
A good (in my opinion) trend in physics in the past decade or two has been the
rise of "blind" analyses[1]. Basically, the entire analysis is predetermined,
before looking at the data. Once all the details are nailed down and everyone
agrees with the approach, the blinds are taken off. There's no room for
"p-hacking".

This has some disadvantages, though. It requires a good understanding of the
experiment so that you can figure out what an analysis will actually tell you.
It's difficult to do a blind analysis on a brand new apparatus, since there
can always be unanticipated problems with the data. As an example, one dark
matter experiment invited a reporter to their unblinding. At first, it looked
like they'd detected dark matter, but then they had to throw out most of the
events because they were due to unanticipated noise in one of the
photomultiplier tubes[2].
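
One common variant is hidden-offset blinding: someone shifts the data by a secret random amount before anyone analyzes it, the full procedure is frozen against the shifted data, and the offset is only subtracted at the end. A minimal sketch (hypothetical setup, not any particular experiment's protocol):

    # Sketch: hidden-offset blinding. Cuts and fits are frozen on blinded
    # data; the secret offset is removed only once nothing can change.
    import numpy as np

    rng = np.random.default_rng(4)
    true_signal = 0.7                                # unknown to the analyst
    data = true_signal + rng.normal(0.0, 1.0, 500)

    # Blinding: a colleague adds a secret offset before analysis starts.
    secret_offset = rng.uniform(-5.0, 5.0)
    blinded = data + secret_offset

    # The entire analysis is fixed while looking only at blinded data.
    estimate_blinded = blinded.mean()
    uncertainty = blinded.std(ddof=1) / np.sqrt(len(blinded))

    # Unblinding: subtract the offset after everyone signs off.
    print(estimate_blinded - secret_offset, "+/-", uncertainty)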

[1]
[http://www.slac.stanford.edu/econf/C030908/papers/TUIT001.pd...](http://www.slac.stanford.edu/econf/C030908/papers/TUIT001.pdf)
is a quick review.

[2]
[http://www.nytimes.com/2011/04/14/science/space/14dark.html](http://www.nytimes.com/2011/04/14/science/space/14dark.html)

------
amelius
I really don't understand the meaning of this sentence (below). Perhaps
somebody could explain?

> As a hypothetical example, suppose a researcher is interested in how
> Democrats and Republicans perform differently in a short mathematics test
> when it is expressed in two different contexts, involving either healthcare
> or the military.

~~~
lkbm
There was a paper recently that found people did poorly on math problems when
the naive, wrong solution confirmed their political views:
[http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2319992](http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2319992)

It's been making the rounds in the popular media with headlines like
"Politics makes you dumb".

------
jmmcd
> In general, p-values are based on what would have happened under other
> possible data sets. As a hypothetical example, suppose a researcher is
> interested in how Democrats and Republicans perform differently in a short
> mathematics test when it is expressed in two different contexts, involving
> either healthcare or the military. [...] At this point a huge number of
> possible comparisons could be performed, all consistent with the
> researcher’s theory. For example, the null hypothesis could be rejected
> (with statistical significance) among men and not among women—explicable
> under the theory that men are more ideological than women.

The meaning of a p-value is expressed in terms of what would have happened
with a different data set, yes, but that different data set would have arisen
through a different random sampling from the population. The explanation above
seems to completely misunderstand the issue.
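
The distinction is easy to see in simulation. A minimal sketch (hypothetical normal data): the "other possible data sets" a p-value refers to are fresh random samples from the null population, which you can draw directly, as opposed to alternative analysis choices applied to the one sample you have:

    # Sketch: a p-value's "other data sets" are re-samples from the
    # population under the null, not re-analyses of the same sample.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    observed = rng.normal(0.3, 1.0, 40)       # the one data set we have
    t_obs, p_analytic = stats.ttest_1samp(observed, 0.0)

    # Approximate the p-value by actually drawing those other data sets:
    # fresh samples of the same size from a null (mean-zero) population.
    t_null = np.array([
        stats.ttest_1samp(rng.normal(0.0, 1.0, 40), 0.0).statistic
        for _ in range(20_000)
    ])
    print(p_analytic, np.mean(np.abs(t_null) >= abs(t_obs)))  # they agree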

------
CurtMonash
Between the failings in statistics and those in modeling, there's a whole lot
of science that's on shaky ground.

