
Why most published scientific research is probably false [video] - xijuan
http://www.economist.com/blogs/graphicdetail/2013/10/daily-chart-2
======
chasing
Look. Clearly science makes progress that's "true" in the sense that it
becomes useful and can be used as functional models of the way things work.

This video's using a very reductionist kind of statistics to point out that,
yes, an individual piece of research making a claim might have a good chance
of being wrong. Which is why science doesn't say, "Oh, well Larry just proved
that Saturn orbits Uranus so let's just never think about that again and
instead move on to proving that the Sun is fully powered by excess heat
radiating off of Elon Musk's brain." Science is a process that works in
aggregate, using a large volume of research and scientists checking one
another as a way to smooth over this very imperfect process. Over time.
Science checks itself. That's the whole point. That's why it reaches some
pretty damned good conclusions about the way things work.

So.

I don't know what the point of this video is. Science is wrong? Scientists are
stupid? The Economist is smart? I should believe the Republicans when they say
the Earth couldn't possibly be warming because that one time it snowed in a
part of Texas where it never really snows all that often?

~~~
pygy_
_> Science checks itself._

Not that much, actually. There's no point in reproducing the work of others to
advance your own scientific career. Unless your research directly depends on
those results, it is counterproductive.

~~~
epistasis
In the important areas, it does check itself. If a scientist has been doing
some interesting work, she will soon find herself with colleagues working in
the same area. If she has not been doing something interesting, that line of
research does not get cited, dies out, and does not get funding.

In active learning, there's a tradeoff between exploration and exploitation.
Science does this too, but heuristically.
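
To give a rough sense of what that tradeoff looks like, here is a toy
epsilon-greedy sketch in Python (the "research directions", payoffs, and
epsilon value are all made up for illustration):

    import random

    # Hypothetical research directions with unknown average payoffs.
    true_payoffs = {"A": 0.1, "B": 0.5, "C": 0.3}
    estimates = {k: 0.0 for k in true_payoffs}
    counts = {k: 0 for k in true_payoffs}
    epsilon = 0.1  # fraction of effort spent exploring

    for step in range(10000):
        if random.random() < epsilon:
            choice = random.choice(list(true_payoffs))   # explore
        else:
            choice = max(estimates, key=estimates.get)   # exploit
        reward = random.random() < true_payoffs[choice]  # noisy payoff
        counts[choice] += 1
        # running-mean update of the estimated payoff
        estimates[choice] += (reward - estimates[choice]) / counts[choice]

    print(counts)     # most effort ends up on "B", the best direction
    print(estimates)  # estimates converge near the true payoffs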

~~~
001sky
_If she has not been doing something interesting, that line of research does
not get cited, dies out, and does not get funding._

Could you explain how this verifies results? Citation (alone) isn't
verification. And it's safe to say followers and funding correlate with one
another, at least statistically.

~~~
epistasis
You're reversing what I'm trying to say: I'm saying that verification or
invalidation causes citations to occur, not that citation is verification, and
that work that doesn't get validated dies out. Even work that gets
"invalidated" in some sense can reveal other truths, or necessary changes in
paradigms.

This is why scientists check the citations of a paper when considering its
contents. It's important to ask whether other people followed up on this, and
what they found if they did. New papers, without any citations, must be held
in a state of meta-information until there are follow-up papers. Old papers
with few citations, and no validating citations, must also be considered to be
in a state of meta-information. Sometimes really important things, like
Mendel's genetics, get lost for decades in this state until they are
rediscovered, but it's fairly rare.

~~~
pygy_
Work often gets cited without being either replicated or invalidated.

Do you do any kind of research? You seem to have an unrealistically rosy
perception of the scientific process as it happens concretely.

~~~
epistasis
>Work often gets cited without being either replicated or invalidated.

I'm not sure why you're bringing up this fact, which is not at all
inconsistent with my comment. What's your implication?

>Do you do any kind of research? You seem to have an unrealistically rosy
perception of the scientific process as it happens concretely.

In the past decade, I've spent probably 70% of my time on scientific research.
At this moment I'm procrastinating from writing a letter of support on a
grant. What, in particular, do you think I'm unrealistically rosy about?

~~~
pygy_
_> I'm not sure why you're bringing up this fact, which is not at all
inconsistent with my comment. What's your implication?_

In that case I may have misunderstood your point. What I mean is that, for a
paper with 100+ citations (which, in some fields, is not rare), most of the
citing authors have not verified the result themselves.

 _> In the past decade, I've spent probably 70% of my time on scientific
research. At this moment I'm procrastinating from writing a letter of support
on a grant. What, in particular, do you think I'm unrealistically rosy about?_

The self-correcting nature of the process is very slow in most cases. Bad
results do end up being forgotten when the findings are minor, but for things
of mild interest, they can linger far longer.

------
lvs
This is extremely misleading and feeds an anti-intellectual notion that
scientists are just lying to everybody.

First, it perpetuates a common claim of those who don't practice any sort of
science: that the output of scientific studies is an enumeration of true/false
claims determined with statistical inference logic. (Science media/blogs
really aren't helping on this one.)

Second, the math is just wrong: the space of hypotheses is infinite, so it's
impossible to say what fraction of these are true.

~~~
capnrefsmmat
I don't think the technicalities of the size of the hypothesis space or its
uncountability matter. We do studies over large numbers of hypotheses, of
which we know _a priori_ only a few are likely to be true. Genome-wide
association studies, for example, will study thousands of genes knowing that
only a few will have true correlations.

In any case it is an entirely reasonable assumption that of all tested
scientific hypotheses, a small fraction are true. We often do not _expect_
them to be true. (A clinical trial may test thirty or more hypotheses, such as
"this medication causes diarrhea," and may expect most of them to be false.
But they have to check just in case.)
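
To make that concrete, here is a back-of-the-envelope sketch (the number of
tests, alpha, and the count of genuinely true hypotheses are illustrative, not
taken from any particular trial):

    # A trial checks 30 side-effect hypotheses but expects only a couple
    # of them to be genuinely true (illustrative numbers only).
    n_tests = 30
    n_true = 2
    n_null = n_tests - n_true
    alpha = 0.05

    expected_false_positives = alpha * n_null
    chance_of_at_least_one = 1 - (1 - alpha) ** n_null

    print(expected_false_positives)   # ~1.4 spurious "findings" expected
    print(chance_of_at_least_one)     # ~0.76 chance of at least one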

------
001sky
Two game-theoretic strategies need to be mitigated or bred out of academia:

(1) The 'security through obscurity' problem, where nobody can be bothered to
verify your results because they are likely meaningless, lack broad
applicability, or are not intellectually cost-effective for anyone to bother
understanding (etc.).

(2) The "lick the cookie" problem, where nobody will verify your results
because there its considered degrading (professionally) to 'not be first' at
the table, as the author of origin. [a]

These two in combination lead to something of a "tragedy of the commons",
where the basic core of the discipline erodes in prestige/utility, as
individual contributors seek to maximize their personal productivity from the
public good (the reputation of groundbreaking science).

[a] This is the childhood strategy of making anything you touch first
unattractive to all those who follow.

edits: for clarity.

~~~
eli_gottlieb
>(1) The 'security through obscurity' problem, where nobody can be bothered to
verify your results because they are likely meaningless, lack broad
applicability, or are not intellectually cost-effective for anyone to bother
understanding (etc.).

This is simply a problem of explaining findings to other researchers. _Every
single researcher_ with a genuine finding on his hands will, at some point,
find that his work seems obscure and pointless compared to "all those other
guys" who are building on established research for immediate application.

And then there's the issue that your failure to understand someone's research
doesn't mean it isn't research. Sometimes what a field needs is just a few
more smart people willing to work at understanding what the hell's going on.

------
leot
The conclusions of this video depend on an idealized view (and thus a poor
model) of research and science. In fact, there are many different kinds of
results (associated with different levels of confidence and which almost all
require a nuanced interpretation in order to be properly understood) and many
different kinds of researchers. The best results, across many fields, are
rarely if ever single papers with a single experiment with p<0.05. The good
ones have multiple mutually-confirming experiments with _much_ smaller
p-values. And often for the very best results, p-value-style analyses are
redundant: what would be the p-value associated with the line that Hubel and
Wiesel claim was triggering the firing of their cat's retinal ganglion cell
[[https://www.youtube.com/watch?v=IOHayh06LJ4](https://www.youtube.com/watch?v=IOHayh06LJ4)]?
Does it even matter?

[Edit: Parenthesis in first sentence, for clarity]

------
jamesaguilar
Probably false? As in you would have a better chance of being right by
claiming the negation of a scientific paper's conclusion than the conclusion
itself? I doubt it.

~~~
mjn
In certain fields, that's not implausible. Many results are along the lines of
"compound X is effective against disease Y", where the negation, "compound X
is not effective against disease Y", is a reasonable baseline assumption,
because most compounds are not effective against most diseases.

Results where the prior odds could plausibly be considered 50/50 are another
story. Scientific research that doesn't take the form "rule out the null
hypothesis with p values" is also a different matter (lots of CS and physics,
among other fields, have a more complex mingling of theory and epistemology
than experiment/nullhypothesis/pvalue/repeat).

~~~
jamesaguilar
That's a good point. I didn't consider the fact that many discoveries may have
a very low prior probability of being true.

------
snowwrestler
It is a mistake of reasoning to take a meta-analysis of medical research and
expand its conclusions to the rest of science.

Medical research has a number of peculiarities among the sciences, including
the complexity of its subject (perhaps the highest of any discipline), the
emotional reaction to the subject, the speed at which people try to turn
scientific findings into products or advice, and the concomitant eagerness to
trust epidemiological results without a known physical mechanism.

It's also a huge mistake to get your science news and opinion from The
Economist--a magazine with a great reputation that has nothing to do with its
coverage of science.

------
gabriel34
Here is the link to the source, much friendlier to the HN folk who would
rather read than watch:
[http://www.economist.com/news/briefing/21588057-scientists-t...](http://www.economist.com/news/briefing/21588057-scientists-
think-science-self-correcting-alarming-degree-it-not-trouble)

EDIT: To add to this, the source of the video does make an interesting
observation about statistical methods being employed by scientists who don't
know their pitfalls.

Another thing I take from a more careful reading (as opposed to viewing a
two-minute, highly superficial video) is that the article assumes every
hypothesis will be subject to only one study. If we have three studies denying
a certain hypothesis and one confirming it, it's pretty easy to catch the
false positive in a literature review article (routinely done by people
entering academia).

------
anigbrowl
This is just an adjunct to this:
[https://news.ycombinator.com/item?id=6566915](https://news.ycombinator.com/item?id=6566915)
(article and HN discussion).

------
bnegreve
The claim is:

    "most published scientific research is probably false"

and the evidence for that claim is:

    The number of false negatives "might easily be 4 in 10, or in some fields 8 in 10"

(quoted from the video)

This is rather weak.

And maybe more importantly: this assumes that researchers test random
hypotheses drawn uniformly from the space of all possible hypotheses, which
clearly isn't the case.

Anyway, this can't really be serious.

------
epistasis
The vast majority of scientific papers are not single experiments with one
p-value, but rather a handful to a dozen or more experiments, only some of
which may be reduced to a p-value. And in most biological research, at least
two lines of evidence are required before a reviewer will accept a claim
(e.g. "OK, you may have found something, now verify it with a PCR.").

So this entire setup is just kind of crap, and not representative of
scientific research.

In addition, this simple point, which is quite interesting and necessary to
keep in mind when interpreting multiple p-values, is widely acknowledged in
the field, which is why False Discovery Rate methods started to be used as far
back as the 90s (a minimal sketch of one such procedure follows the quoted
passages below). This initial point was first published as a "The sky is
falling, what are all you idiot medical researchers doing?!" type of paper by
Ioannidis, which is a great way to make a name for oneself. However, even his
own interpretation did not hold up well, and he has stopped pushing the point.
Summarizing an extensive comment on Metafilter [1]:

>Why Most Published Research Findings Are False: Problems in the Analysis

>The article published in PLoS Medicine by Ioannidis makes the dramatic claim
in the title that “most published research claims are false,” and has received
extensive attention as a result. The article does provide a useful reminder
that the probability of hypotheses depends on much more than just the p-value,
a point that has been made in the medical literature for at least four
decades, and in the statistical literature for decades previous. This topic
has renewed importance with the advent of the massive multiple testing often
seen in genomics studies. Unfortunately, while we agree that there are more
false claims than many would suspect—based both on poor study design,
misinterpretation of p-values, and perhaps analytic manipulation—the
mathematical argument in the PLoS Medicine paper underlying the “proof” of the
title's claim has a degree of circularity. As we show in detail in a
separately published paper, Dr. Ioannidis utilizes a mathematical model that
severely diminishes the evidential value of studies—even meta-analyses—such
that none can produce more than modest evidence against the null hypothesis,
and most are far weaker. This is why, in the offered “proof,” the only study
types that achieve a posterior probability of 50% or more (large RCTs
[randomized controlled trials] and meta-analysis of RCTs) are those to which a
prior probability of 50% or more are assigned. So the model employed cannot be
considered a proof that most published claims are untrue, but is rather a
claim that no study or combination of studies can ever provide convincing
evidence.

>ASSESSING THE UNRELIABILITY OF THE MEDICAL LITERATURE: A RESPONSE TO "WHY
MOST PUBLISHED RESEARCH FINDINGS ARE FALSE"

>A recent article in this journal (Ioannidis JP (2005) Why most published
research findings are false. PLoS Med 2: e124) argued that more than half of
published research findings in the medical literature are false. In this
commentary, we examine the structure of that argument, and show that it has
three basic components:

>1) An assumption that the prior probability of most hypotheses explored in
medical research is below 50%.

>2) Dichotomization of P-values at the 0.05 level and introduction of a “bias”
factor (produced by significance-seeking), the combination of which severely
weakens the evidence provided by every design.

>3) Use of Bayes theorem to show that, in the face of weak evidence,
hypotheses with low prior probabilities cannot have posterior probabilities
over 50%.

>Thus, the claim is based on a priori assumptions that most tested hypotheses
are likely to be false, and then the inferential model used makes it
impossible for evidence from any study to overcome this handicap. We focus
largely on step (2), explaining how the combination of dichotomization and
“bias” dilutes experimental evidence, and showing how this dilution leads
inevitably to the stated conclusion. We also demonstrate a fallacy in another
important component of the argument – that papers in “hot” fields are more
likely to produce false findings. We agree with the paper’s conclusions and
recommendations that many medical research findings are less definitive than
readers suspect, that P-values are widely misinterpreted, that bias of various
forms is widespread, that multiple approaches are needed to prevent the
literature from being systematically biased and the need for more data on the
prevalence of false claims. But calculating the unreliability of the medical
research literature, in whole or in part, requires more empirical evidence and
different inferential models than were used. The claim that “most research
findings are false for most research designs and for most fields” must be
considered as yet unproven.

[1] [http://www.metafilter.com/133102/There-is-no-cost-to-
getting...](http://www.metafilter.com/133102/There-is-no-cost-to-getting-
things-wrong#5256675)
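
As an aside on the False Discovery Rate methods mentioned above, here is a
minimal sketch of the Benjamini-Hochberg procedure run on made-up p-values
(just to illustrate the idea, not anyone's actual data):

    def benjamini_hochberg(p_values, q=0.05):
        """Return the p-values declared significant at false discovery rate q."""
        m = len(p_values)
        cutoff = 0.0
        # find the largest k such that the k-th smallest p-value <= (k/m)*q
        for k, p in enumerate(sorted(p_values), start=1):
            if p <= k / m * q:
                cutoff = p
        return [p for p in p_values if p <= cutoff]

    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.35, 0.9]))
    # -> [0.001, 0.008]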

~~~
Fomite
What's infuriatingly ignored is that in that very same PLoS Medicine issue
there is a response to Ioannidis' work by Greenland, IIRC, noting that by
"False" he means the significance is wrong, whereas what's really of interest
is the effect measure.

On a meta level, I've always wondered why we take a paper about most findings
being false as clearly correct.

~~~
capnrefsmmat
It's true that effect sizes are often more important, but it's also true that
they're often incorrect. See e.g.

Ioannidis, J. P. A. (2008). Why Most Discovered True Associations Are
Inflated. Epidemiology, 19(5), 640–648. doi:10.1097/EDE.0b013e31818131e7

Most studies are underpowered and are incapable of detecting the true effect.
Only if they get lucky and observe an abnormally large effect will they obtain
a statistically significant result, so the published results tend to be
significant overestimates.
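
A quick simulation makes the inflation concrete (all numbers here are
illustrative, not drawn from any real study):

    import random, statistics

    # Many small studies estimate a true effect of 0.2, but only
    # "significant" estimates get published.
    true_effect = 0.2
    n_per_study = 25          # small samples -> low power
    n_studies = 20000
    critical = 1.96           # two-sided z threshold, roughly p < 0.05

    published = []
    for _ in range(n_studies):
        sample = [random.gauss(true_effect, 1.0) for _ in range(n_per_study)]
        estimate = statistics.mean(sample)
        se = 1.0 / n_per_study ** 0.5
        if abs(estimate / se) > critical:      # significance filter
            published.append(estimate)

    print(len(published) / n_studies)   # power is only ~0.17
    print(statistics.mean(published))   # published mean is ~0.5, not 0.2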

For another good example, see

Gelman, A., & Weakliem, D. (2009). Of beauty, sex, and power: statistical
challenges in estimating small effects. American Scientist, 97, 310–316.

[http://www.stat.columbia.edu/~gelman/research/unpublished/po...](http://www.stat.columbia.edu/~gelman/research/unpublished/power4r.pdf)

~~~
Fomite
I think part of the point there is not to pass effect estimates through a
significance test filter first. Most studies are underpowered to _detect a
true effect at alpha = 0.05_. That doesn't actually suggest that most studies
are wrong so much as that, if a study is underpowered _and_ doesn't find a
significant result, we assert it's dull and uninteresting.

Ironically, the Ioannidis paper is in Epidemiology, which is a journal that is
fairly anti-significance testing, but where I still get reviewers suggesting
that an effect measure with a confidence interval that brushes against the
null must mean nothing at all.

------
pallandt
A good opportunity for mentioning Benford's Law:
[http://en.wikipedia.org/wiki/Benford%27s_law#Scientific_frau...](http://en.wikipedia.org/wiki/Benford%27s_law#Scientific_fraud_detection)

~~~
kineticfocus
Maybe an opportunity for mentioning Simpson's Paradox:
[http://en.wikipedia.org/wiki/Simpson%27s_paradox](http://en.wikipedia.org/wiki/Simpson%27s_paradox)

~~~
pallandt
Good point!

------
yetanotherphd
The problem with their reasoning is that it relies on a very high prior that
the hypothesis is false.

In fact, an explicit analysis of the prior over the hypothesis and the power
of the test would be roughly equivalent to the informal discussion that goes
along with the statistical results.

The main issues in my opinion are that the number and nature of studies that
produce null results is unknown, and that there is a bias in the literature
towards positive results. While this bias incentivizes researchers to use
powerful tests, it comes at a big cost.

------
gabriel34
IMO what is damaging to society is that there is no PR for science, so the
press takes things that are not yet fully understood or verified by the
academic community and publishes them as confirmed ("studies say", "studies
confirm", etc.).

Groundbreaking results get much more press than their rebuttals. For example,
see the faster-than-light neutrinos or the arsenic-consuming bacteria, both of
which were later dismissed in academic circles but whose rebuttals did not
enjoy the same treatment from the media.

~~~
leoc
That sounds very much like the UK's Science Media Centre:
[http://www.sciencemediacentre.org/about-
us/](http://www.sciencemediacentre.org/about-us/) . How good a thing the SMC
has been is another question.

------
officialjunk
so... non-scientific articles about scientific research are true? probably
not.

~~~
chattoraj
See the original paper:
[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/)

~~~
officialjunk
i see. but then again, wouldn't their findings imply their own published
results are likely false?

------
timr
For me, the really remarkable thing about this graphic is that it doesn't even
support the headline: the number of false positives is a minority of total
positives in the given example (45 / 125 = 36%).
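
If the graphic's inputs are something like 1,000 hypotheses, 10% of them true,
power 0.8, and alpha 0.05 (an assumption, but consistent with the 45 and 125
above), the arithmetic is easy to check:

    n, prior, power, alpha = 1000, 0.10, 0.8, 0.05

    true_pos = power * prior * n           # 80 true positives
    false_pos = alpha * (1 - prior) * n    # 45 false positives

    print(false_pos / (true_pos + false_pos))   # 45 / 125 = 0.36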

------
Gravityloss
It's just basic Bayesian reasoning. Most of the positive HIV test results are
false positives. Even when the P value is less than 0.01 or so.
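
A worked example with made-up screening numbers (treating the "less than 0.01"
figure as the test's false-positive rate) shows why:

    prevalence = 0.001       # 1 in 1,000 people tested actually infected
    sensitivity = 0.99       # P(positive test | infected)
    false_pos_rate = 0.01    # P(positive test | not infected)

    p_positive = sensitivity * prevalence + false_pos_rate * (1 - prevalence)
    p_infected_given_positive = sensitivity * prevalence / p_positive

    print(p_infected_given_positive)   # ~0.09: most positives are false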

