Why most published scientific research is probably false [video] (economist.com)
62 points by xijuan on Nov 3, 2013 | 53 comments



Two game-theoretic strategies need to be mitigated or bred out of academia:

(1) The 'security through obscurity' problem, where nobody bothers to verify your results because they are likely meaningless, lack broad applicability, or are not intellectually cost-effective for anyone else to understand.

(2) The "lick the cookie" problem, where nobody will verify your results because there its considered degrading (professionally) to 'not be first' at the table, as the author of origin. [a]

In combination, these lead to something of a "tragedy of the commons", where the basic core of the discipline erodes in prestige and utility as individual contributors seek to extract maximum personal productivity from the public good (the reputation of groundbreaking science).

[a] This is the childhood strategy of making anything you touch first unattractive to all those who follow.

edits: for clarity.


>(1) The 'security through obscurity' problem, where nobody bothers to verify your results because they are likely meaningless, lack broad applicability, or are not intellectually cost-effective for anyone else to understand.

This is simply a problem of explaining findings to other researchers. Every researcher with a genuine finding on his hands will, at some point, appear to be doing work that is obscure and pointless compared to "all those other guys" who are building on established research for immediate application.

And then there's the issue that your failure to understand someone's research doesn't mean it isn't research. Sometimes what a field needs is just a few more smart people willing to work at understanding what the hell's going on.


Neither of these are actually problems.

If (1), your results are obscure or lack applicability, then they are not useful anyway and so it doesn't matter if they are ever confirmed. A surprisingly large amount of scientific research fits into this category.

As for (2), few researchers set out solely to verify another's result, but if the result is noteworthy it will be incorporated as a component of future research and therefore tested indirectly. This other comment (not mine) explains how:

https://news.ycombinator.com/item?id=6662124


Interesting, and nicely stated! I wonder what it would look like if funding and publishing could be made to incentivize really valuable pursuits, rather than rewarding trendiness, novelty, and shock value.


The conclusions of this video depend on an idealized view (and thus a poor model) of research and science. In fact, there are many different kinds of results (associated with different levels of confidence and which almost all require a nuanced interpretation in order to be properly understood) and many different kinds of researchers. The best results, across many fields, are rarely if ever single papers with a single experiment with p<0.05. The good ones have multiple mutually-confirming experiments with much smaller p-values. And often for the very best results, p-value-style analyses are redundant: what would be the p-value associated with the line that Hubel and Wiesel claim was triggering the firing of their cat's retinal ganglion cell [https://www.youtube.com/watch?v=IOHayh06LJ4]? Does it even matter?

[Edit: Parenthesis in first sentence, for clarity]


Probably false? As in, you would have a better chance of being right by claiming the negation of a scientific paper's conclusion than the conclusion itself? I doubt it.


In certain fields, that's not implausible. Many results are along the lines of "compound X is effective against disease Y", where the negation, "compound X is not effective against disease Y", is a reasonable baseline assumption, because most compounds are not effective against most diseases.

Results where the prior odds could plausibly be considered 50/50 are another story. Scientific research that doesn't take the form "rule out the null hypothesis with p-values" is also a different matter (lots of CS and physics, among other fields, have a more complex epistemology, mingling theory and experiment, than the experiment/null-hypothesis/p-value/repeat loop).


That's a good point. I didn't consider the fact that many purported discoveries may have a very low prior probability of being true.


I wouldn't be so sure. I'm not talking about mathematics or CS, but in many social sciences (and also medicine and so on) the paper is just stating that A->B, but (i) there can be many other things going on that explain whatever correlation you are finding (from reverse causality to a bad experimental setup), and, more importantly, (ii) in order to get published you need to present a somewhat interesting or controversial statement.

If you state something obvious and most likely true, good luck getting into Nature or Science. If you state something unusual, it will sell (where selling means getting citations), so you will see it published.


Asimov wrote an excellent piece on this - The Relativity of Wrong: http://chem.tufts.edu/answersinscience/relativityofwrong.htm

The Earth isn't flat, but it certainly approximates being flat for small measurements of its surface (such as those early civilizations would've been able to make).


Yes, probably false. This is not news; e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/ came out in 2005.

Of course, research "proving" that research tends to be false can get us into a little circular game...


There's a follow-up paper to the one you linked that claims far fewer published research findings are false:

>We estimate that the overall rate of false discoveries among reported results is 14% (s.d. 1%), contrary to previous claims. We also found that there is not a significant increase in the estimated rate of reported false discovery results over time (0.5% more false positives (FP) per year, P=0.18) or with respect to journal submissions (0.5% more FP per 100 submissions, P=0.12).

[1] http://biostatistics.oxfordjournals.org/content/early/2013/0...

The circle has begun :)


You should also read the responses to that article, such as Ioannidis's, which is scathing. They're all open-access, thankfully, and you can find them here:

http://simplystatistics.org/2013/09/25/is-most-science-false...

Jager and Leek put their code on GitHub, so the commenters were able to review their code and tinker with it to see what would happen.


Thanks for all these links! I skimmed them, and it's fun to see that two of them rip into Ioannidis' work, while Ioannidis rips into the Jager/Leek work... It's also a cool exercise in reproducible research and open peer review, both of which are far from common.

In my opinion, the entire exercise of data-mining the published literature is pretty much futile. We already know there are problems in the published literature and that scientists are pretty mediocre at statistics. Pinpointing exactly how mediocre only leads, as your link shows, to a mountain of published works and hurt feelings (on Ioannidis' side, I guess), and doesn't solve anything.


It is a mistake of reasoning to take a meta-analysis of medical research and expand its conclusions to the rest of science.

Medical research has a number of peculiarities among the sciences, including the complexity of its subject (perhaps the highest of any discipline), the emotional reaction to the subject, the speed at which people try to turn scientific findings into products or advice, and the concomitant eagerness to trust epidemiological results without a known physical mechanism.

It's also a huge mistake to get your science news and opinion from The Economist--a magazine whose great reputation has nothing to do with its coverage of science.


Here is the link to the source, much friendlier to those HN folk who would rather read than watch: http://www.economist.com/news/briefing/21588057-scientists-t...

EDIT: To add to this, the source article for the video does make an interesting observation about statistical methods being employed by scientists who don't know their pitfalls.

Another thing I take from a more careful reading (as opposed to viewing a two-minute, highly superficial video) is that the article assumes every hypothesis will be subject to only one study. If we have three studies rejecting a certain hypothesis and one confirming it, it's pretty easy to catch the false positive in a literature review article (routinely done by people entering academia).


This is just an adjunct to this: https://news.ycombinator.com/item?id=6566915 (article and HN discussion).


The claim is:

    "most published scientific research is probably false"
and the evidence for that claim is:

    The number of false negatives "might easily be 4 in 10, or in some fields 8 in 10"
(quoted from the video)

This is rather weak.

And maybe more importantly: this assumes that researchers test random hypotheses uniformly drawn from the space of all possible hypotheses, which clearly isn't the case.

Anyway, this can't really be serious.


The vast majority of scientific papers are not single experiments with one p-value, but rather anywhere from a handful to a dozen or more experiments, only some of which may be reduced to a p-value. And in most biological research, at least two lines of evidence are required before a reviewer will accept a claim (e.g. "OK, you may have found something, now verify it with a PCR.").

So this entire setup is just kind of crap, and not representative of scientific research.

In addition, this simple point, which is quite interesting and necessary to keep in mind when interpreting multiple p-values, is widely acknowledged in the field, which is why False Discovery Rate methods started to be used as far back as the 90s (a minimal sketch of one such procedure follows the quoted excerpts below). This initial point was first published as a "The sky is falling, what are all you idiot medical researchers doing?!" type of paper by Ioannidis, which is a great way to make a name for oneself. However, even his own interpretation did not hold up well, and he has stopped pushing the point. Summarizing an extensive comment on Metafilter [1]:

>Why Most Published Research Findings Are False: Problems in the Analysis

>The article published in PLoS Medicine by Ioannidis makes the dramatic claim in the title that “most published research claims are false,” and has received extensive attention as a result. The article does provide a useful reminder that the probability of hypotheses depends on much more than just the p-value, a point that has been made in the medical literature for at least four decades, and in the statistical literature for decades previous. This topic has renewed importance with the advent of the massive multiple testing often seen in genomics studies. Unfortunately, while we agree that there are more false claims than many would suspect—based both on poor study design, misinterpretation of p-values, and perhaps analytic manipulation—the mathematical argument in the PLoS Medicine paper underlying the “proof” of the title's claim has a degree of circularity. As we show in detail in a separately published paper, Dr. Ioannidis utilizes a mathematical model that severely diminishes the evidential value of studies—even meta-analyses—such that none can produce more than modest evidence against the null hypothesis, and most are far weaker. This is why, in the offered “proof,” the only study types that achieve a posterior probability of 50% or more (large RCTs [randomized controlled trials] and meta-analysis of RCTs) are those to which a prior probability of 50% or more are assigned. So the model employed cannot be considered a proof that most published claims are untrue, but is rather a claim that no study or combination of studies can ever provide convincing evidence.

>ASSESSING THE UNRELIABILITY OF THE MEDICAL LITERATURE: A RESPONSE TO "WHY MOST PUBLISHED RESEARCH FINDINGS ARE FALSE"

>A recent article in this journal (Ioannidis JP (2005) Why most published research findings are false. PLoS Med 2: e124) argued that more than half of published research findings in the medical literature are false. In this commentary, we examine the structure of that argument, and show that it has three basic components:

>1) An assumption that the prior probability of most hypotheses explored in medical research is below 50%.

>2) Dichotomization of P-values at the 0.05 level and introduction of a “bias” factor (produced by significance-seeking), the combination of which severely weakens the evidence provided by every design.

>3) Use of Bayes theorem to show that, in the face of weak evidence, hypotheses with low prior probabilities cannot have posterior probabilities over 50%.

>Thus, the claim is based on a priori assumptions that most tested hypotheses are likely to be false, and then the inferential model used makes it impossible for evidence from any study to overcome this handicap. We focus largely on step (2), explaining how the combination of dichotomization and “bias” dilutes experimental evidence, and showing how this dilution leads inevitably to the stated conclusion. We also demonstrate a fallacy in another important component of the argument – that papers in “hot” fields are more likely to produce false findings. We agree with the paper’s conclusions and recommendations that many medical research findings are less definitive than readers suspect, that P-values are widely misinterpreted, that bias of various forms is widespread, that multiple approaches are needed to prevent the literature from being systematically biased and the need for more data on the prevalence of false claims. But calculating the unreliability of the medical research literature, in whole or in part, requires more empirical evidence and different inferential models than were used. The claim that “most research findings are false for most research designs and for most fields” must be considered as yet unproven.

[1] http://www.metafilter.com/133102/There-is-no-cost-to-getting...
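
For readers unfamiliar with False Discovery Rate methods, here is a minimal sketch of the Benjamini-Hochberg procedure, presumably the kind of 1990s-era method alluded to above; the p-values are invented for illustration.

    # Minimal sketch of the Benjamini-Hochberg procedure for controlling the
    # false discovery rate across many p-values. The p-values are invented.
    def benjamini_hochberg(p_values, q=0.05):
        """Return indices of hypotheses rejected at FDR level q."""
        m = len(p_values)
        order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
        # The largest rank k with p_(k) <= (k / m) * q determines the cutoff.
        cutoff_rank = 0
        for rank, idx in enumerate(order, start=1):
            if p_values[idx] <= (rank / m) * q:
                cutoff_rank = rank
        return sorted(order[:cutoff_rank])

    p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.36]
    print(benjamini_hochberg(p_values))  # -> [0, 1]: only two survive at q = 0.05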


What's infuriatingly ignored is that in that very same PLoS Medicine issue there is a response to Ioannidis' work (by Greenland, IIRC) noting that by "false" he means the significance is wrong, whereas what's really of interest is the effect measure.

On a meta level, I've always wondered why we take a paper about most findings being false as clearly correct.


It's true that effect sizes are often more important, but it's also true that they're often incorrect. See, e.g.,

Ioannidis, J. P. A. (2008). Why Most Discovered True Associations Are Inflated. Epidemiology, 19(5), 640–648. doi:10.1097/EDE.0b013e31818131e7

Most studies are underpowered and incapable of reliably detecting the true effect. Only if they get lucky and observe an abnormally large effect will they obtain a statistically significant result, so the published results tend to be significant overestimates.

For another good example, see

Gelman, A., & Weakliem, D. (2009). Of beauty, sex, and power: statistical challenges in estimating small effects. American Scientist, 97, 310–316.

http://www.stat.columbia.edu/~gelman/research/unpublished/po...
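
A tiny simulation (numbers invented, not taken from either paper) illustrates the mechanism: with an underpowered design, only the studies that happen to draw unusually large estimates cross p < 0.05, so the significant estimates overstate the true effect.

    # Illustration of effect-size inflation under low power ("winner's curse").
    # All numbers below are invented for the sketch.
    import numpy as np

    rng = np.random.default_rng(0)
    true_effect = 0.2                     # true mean difference, in SD units
    n = 25                                # small sample per study -> low power
    se = 1 / np.sqrt(n)                   # standard error of each study's estimate
    estimates = rng.normal(true_effect, se, size=20000)   # one estimate per study
    significant = np.abs(estimates / se) > 1.96           # two-sided p < 0.05

    print("share significant:", significant.mean())          # ~0.17 (the power)
    print("mean estimate, all studies:", estimates.mean())   # ~0.20 (unbiased)
    print("mean estimate, significant only:", estimates[significant].mean())  # ~0.5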


I think part of the point there is not to pass effect estimates through a significance test filter first. Most studies are underpowered to detect a true effect at alpha = 0.05. That doesn't so much suggest that most studies are wrong as that, when a study is underpowered and doesn't find a significant result, we assert it's dull and uninteresting.

Ironically, the Ioannidis paper is in Epidemiology, which is a journal that is fairly anti-significance testing, but where I still get reviewers suggesting that an effect measure with a confidence interval that brushes against the null must mean nothing at all.


>On a meta level, I've always wondered why we take a paper about most findings being false as clearly correct.

Because in science, not believing things is the default state. If you say, "most published findings are false", you're really saying, "most of the time we have to accept the null hypothesis", which is what we all not-so-secretly believe regarding everything, all the time, in any case.


On a meta level, I've always wondered why we take a paper about most findings being false as clearly correct.

This is a fair question. I think the reasons the Ioannidis paper was persuasive are that

1) Ioannidis replicated earlier results about the lack of replication of most research reports,

and

2) Ioannidis "showed the work" for how possible, and indeed likely, it is for an effect size permitting a false-positive finding to be published, under reasonable assumptions about the prevalence of false-positive findings and about publishing practices. Most scientists were vaguely aware of the lack of replication years before anyone had heard of Ioannidis, but not many were fully aware of how readily a false-positive finding can be published.
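
For readers who want the arithmetic behind that "showing of the work": the core relation in the 2005 paper is Bayesian. With pre-study odds R that a tested relationship is real, power 1 - beta, and significance level alpha, the post-study probability that a claimed finding is true is (1 - beta)R / ((1 - beta)R + alpha), ignoring the paper's additional bias term. A minimal sketch with illustrative inputs:

    # Post-study probability that a claimed finding is true (Ioannidis 2005),
    # omitting the paper's bias term. R is the pre-study odds that the tested
    # relationship is real. Inputs below are illustrative only.
    def ppv(R, power, alpha=0.05):
        return power * R / (power * R + alpha)

    print(ppv(R=1.0, power=0.8))   # even prior odds, well powered      -> ~0.94
    print(ppv(R=0.1, power=0.8))   # long-shot hypothesis, well powered -> ~0.62
    print(ppv(R=0.1, power=0.2))   # long-shot hypothesis, underpowered -> ~0.29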


A good opportunity for mentioning Benford's Law: http://en.wikipedia.org/wiki/Benford%27s_law#Scientific_frau...


Maybe an opportunity for mentioning Simpson's Paradox: http://en.wikipedia.org/wiki/Simpson%27s_paradox


Good point!


The problem with their reasoning is that it relies on a very high prior that the hypothesis is false.

In fact, an explicit analysis of the prior over the hypothesis and the power of the test would be roughly equivalent to the informal discussion that goes along with the statistical results.

The main issues, in my opinion, are that the number and nature of studies that produce null results is unknown, and that there is a bias in the literature towards positive results. While this bias incentivizes researchers to use powerful tests, it comes at a big cost.
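
A short sketch of that dependence on the prior (illustrative numbers only): holding power at 0.8 and alpha at 0.05, the posterior probability that a "significant" finding is real swings from low to high purely as a function of the prior.

    # Posterior probability of a significant finding being real, as a function
    # of the prior probability that the hypothesis is true. Power and alpha
    # are fixed at illustrative values.
    def posterior(prior, power=0.8, alpha=0.05):
        return power * prior / (power * prior + alpha * (1 - prior))

    for prior in (0.01, 0.1, 0.25, 0.5):
        print(f"prior {prior:.2f} -> posterior {posterior(prior):.2f}")
    # prior 0.01 -> 0.14, prior 0.10 -> 0.64, prior 0.25 -> 0.84, prior 0.50 -> 0.94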


IMO what is damaging to society is that there is no PR arm for science, so the press takes things that are not yet fully understood or verified by the academic community and publishes them as confirmed ("studies say", "studies confirm", etc.).

Groundbreaking results get much more press than their rebuttals; for example, see the faster-than-light neutrinos or the arsenic-consuming bacteria, both of which were later dismissed in academic circles but did not enjoy the same treatment from the media.


That sounds very much like the UK's Science Media Centre: http://www.sciencemediacentre.org/about-us/ . How good a thing the SMC has been is another question.


I can't imagine what process you have in mind to put right what you see as a problem. Who is the arbiter of press coverage? You are assuming unrealistic habits on the part of the passive observer of research results. Life isn't spent checking media coverage or providing it with checks and balances unless a major issue is involved that impinges on people's lives. The solution to your perceived difficulty is to check the facts for yourself from original sources, and not to expect some arbitrary secondary source to keep you up to date, because that is something they are never going to do.


So... non-scientific articles about scientific research are true? Probably not.



I see. But then again, wouldn't their findings imply that their own published results are likely false?


For me, the really remarkable thing about this graphic is that it doesn't even support the headline: the number of false positives is a minority of total positives in the given example: 45 / 125 = 36%
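
For reference, the counts apparently behind that figure can be reconstructed assuming the video's illustrative setup (1,000 tested hypotheses, 10% of them true, power 0.8, a 5% false positive rate); the setup itself is inferred from the numbers quoted above.

    # Reconstruction of the 45 / 125 figure under an assumed setup:
    # 1,000 hypotheses, 100 true, power 0.8, alpha 0.05.
    total, n_true = 1000, 100
    power, alpha = 0.8, 0.05

    true_positives = n_true * power              # 80
    false_positives = (total - n_true) * alpha   # 45
    print(false_positives / (true_positives + false_positives))   # 0.36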


It's just basic Bayesian reasoning. Most of the positive HIV test results are false positives. Even when the P value is less than 0.01 or so.
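
A concrete version of that base-rate point, with hypothetical test characteristics (the prevalence, sensitivity, and specificity below are assumptions for illustration, not actual HIV test figures):

    # Base-rate illustration: a test with a very low false positive rate still
    # yields mostly false positives when the condition is rare. All three
    # parameters are assumed for illustration.
    prevalence = 0.001       # 1 in 1,000 people tested has the condition
    sensitivity = 0.999      # P(positive | condition)
    specificity = 0.995      # P(negative | no condition)

    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    print(sensitivity * prevalence / p_positive)   # ~0.17: most positives are false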


Look. Clearly science makes progress that's "true" in the sense that it becomes useful and can be used as functional models of the way things work.

This video's using a very reductionist kind of statistics to point out that, yes, an individual piece of research making a claim might have a good chance of being wrong. Which is why science doesn't say, "Oh, well Larry just proved that Saturn orbits Uranus so let's just never think about that again and instead move on to proving that the Sun is fully powered by excess heat radiating off of Elon Musk's brain." Science is a process that works in aggregate, using a large volume of research and scientists checking one another as a way to smooth over this very imperfect process. Over time. Science checks itself. That's the whole point. That's why it reaches some pretty damned good conclusions about the way things work.

So.

I don't know what the point of this video is. Science is wrong? Scientists are stupid? The Economist is smart? I should believe the Republicans when they say the Earth couldn't possibly be warming because that one time it snowed in a part of Texas where it never really snows all that often?


> Science checks itself.

Not that much, actually. There's no point in reproducing the work of others to advance your own scientific career. Unless your research directly depends on it, doing so is counterproductive.


In the important areas, it does check itself. If a scientist has been doing some interesting work, she will soon find herself with colleagues working in the same area. If she has not been doing something interesting, that line of research does not get cited, dies out, and does not get funding.

In active learning, there's a tradeoff between exploration and exploitation. Science does this too, but heuristically.


Not necessarily true. Ioannidis did a review of 49 studies cited more than 1,000 times in the medical literature -- prime candidates for being replicated or tested by future results. Of those,

- 16% were found by later studies to be wrong

- 16% were found to have exaggerated the size of effects they claimed to detect

- 44% were replicated, and

- 24% were unchallenged and unreplicated by later literature.

So a full quarter of incredibly prominent research was never tested.

Ioannidis, J. P. A. (2005). Contradicted and initially stronger effects in highly cited clinical research. JAMA, 294(2), 218–228. doi:10.1001/jama.294.2.218


Thank you for a very informative and helpful reply! However, I would like to see this work validated before I fully accept his conclusions.

Regarding the 24%: if you look at Table 2 of that study, of the 12 papers that were "unchallenged and unreplicated", the notes for 10 list follow-on studies that would have revealed problems with the initial study, had the initial study been "wrong."

That leaves about 4% (2 papers) that are "unchallenged and unreplicated" yet have 1,000+ citations. So I would be quite interested to know the take of researchers in the field on those papers, and whether they would agree with Ioannidis' assessment of them.


If she has not been doing something interesting, that line of research does not get cited, dies out, and does not get funding.

Could you explain how this verifies results? Citation (alone) isn't verification. And it's safe to say followers and funding correlate with one another, at least statistically.


Say someone publishes a paper that has the conclusion: if you perform procedure X, you get carbon nanotubes with property Y.

Let's then go on to say that someone publishes a paper which states: carbon nanotubes with property Y will theoretically allow us to do Z.

If Z is something interesting and worthwhile, it won't be long before people read both papers and decide to try using procedure X to make Y nanotubes and then measure their ability to do Z. If no one can actually get to Z, then one or both of those papers was wrong, and the field will recognize that. The papers may not be formally retracted, but everyone will move on and stop referencing the incorrect papers.


You're reversing what I'm trying to say. I'm saying that verification or invalidation causes citations to occur, not that citation is verification, and that work that doesn't get validated dies out. Even work that gets "invalidated" can, in some sense, reveal other truths, or necessary changes in paradigms.

This is why scientists check the citations of a paper when considering its contents. It's important to ask whether other people followed up on the work, and what they found if they did. New papers, without any citations, must be held in a provisional state until there are follow-up papers. Old papers with few citations, and no validating citations, must also be considered provisional. Sometimes really important things, like Mendel's genetics, get lost for decades in this state until they are rediscovered, but that's fairly rare.


Work often gets cited without being either replicated or invalidated.

Do you do any kind of research? You seem to have an unrealistically rosy perception of the scientific process as it happens concretely.


>Work often gets cited without being either replicated or invalidated.

I'm not sure why you're bringing up this fact, which is not at all inconsistent with my comment. What's your implication?

>Do you do any kind of research? You seem to have an unrealistically rosy perception of the scientific process as it happens concretely.

In the past decade, I've spent probably 70% of my time on scientific research. At this moment I'm procrastinating from writing a letter of support on a grant. What, in particular, do you think I'm unrealistically rosy about?


> I'm not sure why you're bringing up this fact, which is not at all inconsistent with my comment. What's your implication?

In that case I may have misunderstood your point. What I mean is that, for a paper with 100+ citations (which, in some fields, is not rare), most of those citations are not verifications by the citing authors.

> In the past decade, I've spent probably 70% of my time on scientific research. At this moment I'm procrastinating from writing a letter of support on a grant. What, in particular, do you think I'm unrealistically rosy about?

The self-correcting nature of the process is very slow in most cases. Bad results concerning minor findings end up being forgotten, but bad results of mild interest may linger far longer.


Actually, there are loads of scientific papers that only review other papers (usually a set of them), trying to infer conclusions and criticizing them on methodology and rigor. Scientific careers advance both on the quality and on the quantity of papers published. Not everyone gets featured in Science or Nature.


This is actually what I'm doing for my undergrad senior thesis right now. It's in theory-heavy CS, but there are some results that are based on some moderately difficult math, and experimental evidence of those results, which I'm independently verifying. So, it seems to apply. It's not the most exciting thing to be doing in the field, but it's more interesting than most of my other classes, and can be used for later work.


Agreed. Try getting funding to verify a paper's results. If you can't publish the results, it'll be hard to convince anyone that your work is worthwhile.


This is extremely misleading and feeds an anti-intellectual notion that scientists are just lying to everybody.

First, it perpetuates a common claim of those who don't practice any sort of science: that the output of scientific studies is an enumeration of true/false claims determined with statistical inference logic. (Science media/blogs really aren't helping on this one.)

Second, the math is just wrong: the space of hypotheses is infinite, so it's impossible to say what fraction of these are true.


I don't think the technicalities of the size of the hypothesis space or its uncountability matter. We do studies over large numbers of hypotheses, of which we know a priori only a few are likely to be true. Genome-wide association studies, for example, will study thousands of genes knowing that only a few will have true correlations.

In any case it is an entirely reasonable assumption that of all tested scientific hypotheses, a small fraction are true. We often do not expect them to be true. (A clinical trial may test thirty or more hypotheses, such as "this medication causes diarrhea," and may expect most of them to be false. But they have to check just in case.)
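
To make the arithmetic of that parenthetical concrete (numbers invented for illustration): if a trial tests thirty side-effect hypotheses of which only a couple are real, the expected number of false positives at alpha = 0.05 is about the same as the expected number of true positives.

    # Expected positives when a trial tests many hypotheses, most of them false.
    # Counts and power below are invented for illustration.
    hypotheses, truly_real = 30, 2
    power, alpha = 0.8, 0.05

    print("expected true positives: ", truly_real * power)                 # 1.6
    print("expected false positives:", (hypotheses - truly_real) * alpha)  # 1.4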


Isn't the space of already enumerated and studied hypotheses finite?



