Hacker News
Most scientists 'can't replicate studies by their peers' (bbc.co.uk)
53 points by DanBC 47 minutes ago | 38 comments





"all you have to do is read the methods section in the paper and follow the instructions."

I wish science were that simple. The methods section only contains the variables the authors thought worth controlling; in reality you never know which ones matter, and neither do the authors.

Secondly, I wish people would say "I replicated the methods and got a solid negative result" instead of "I can't replicate this experiment". Because most of the time, when you are doing an experiment you have never done before, you just fuck it up.

Here is an example: we are studying memory using mice. Mice don't remember that well if they are anxious. Here are the variables we have to take care of, but which are never going to make it into the methods section: Make sure the animal facility hasn't just cleaned their cages. But make sure the cages are otherwise relatively clean. Make sure they don't fight each other. Make sure the (usually false) fire alarm hasn't sounded in the last 24 hours. Make sure the guy who was installing the microscope upstairs has finished making noise. Make sure there are no irrelevant people talking or laughing loudly outside the behaviour space. Make sure the finicky equipment works. Make sure the computer doesn't start fucking updating Windows halfway through the experiment. Make sure the animals love you. The list goes on. If any one of these happens, the animals are anxious and they don't remember. That's why, if your lab starts doing something it hasn't done for years, you fail. And replicating other people's experiments is hard.

> The methods section only contains variables the authors think worth controlling, and in reality you never know, and the authors never know

Maybe the format needs to change. Perhaps journals should require video, audio commentary or automated note taking for publication.

> The problem, it turned out, was not with Marcus Munafo's science, but with the way the scientific literature had been "tidied up" to present a much clearer, more robust outcome.

I've seen this time and time again while working in neuroscience, and I hear the same from friends who are still in the field.

Data is often thoroughly massaged, outliers are left out of reporting, and methods are tuned to confirm rather than falsify certain outcomes. It's very demotivating as a PhD student to see very significant results in print and then, when you perform the same study, find that reality is not as black and white as the published papers.

On this note, the majority of papers are still about reporting significant results, which leads several labs to chase the same dead ends, because none of them can publish "negative" results.

I wonder if paying grad students to write a fuller paper that includes all the steps and the negative results would help. It wouldn't have to be published right away, and perhaps it wouldn't need to be published at all. Maybe it would simply be a follow-up to the original paper, a "proof" of sorts provided by the authors. There are many students out there who would happily do this, I think. I know so many who clamor for even the slightest bit of work in their departments. I think it would also benefit their future, teaching them about reproducibility and impressing upon them the need to continue this practice down the road. The current climate of publish-or-perish isn't going away anytime soon, and neither are the clean, pretty papers with only positive results. And that's fine. Those are the quick highlights. But the full studies still need to be out there, and I think this could be a way to meet that need.

For what it's worth I see the same thing in enterprise app development.

We've been doing a lot of data visualization, and it often happens that someone comes to me with a thinly veiled task that's really about proving this or that person/process is at fault for delaying a project or something.

Sometimes, though, the numbers either don't support their opinion or even show a result they don't like, so inevitably they have me massage the graphs and filters until they see a result that looks how they want it to, and that's what gets presented in various meetings and email chains.

The information at that point isn't wrong per se, just taken out of context and shown in a persuasive (read: propaganda) rather than informative way.

Yeah, I used to do a lot of financial reporting for a medical group. It eventually got to the point that, after the second "those numbers don't look right", I started asking what they wanted the numbers to show so I didn't waste any more of my time.

It gets even worse: if you produce a follow-up paper on an improvement, you're generally expected to produce something better. If the original result doesn't hold up, the only alternative is even more fraud, I mean, data massaging.

I am a social scientist studying human behavior, and this is a huge problem in the field. My statistician friends who analyze the literature and I have basically concluded that most extremely "novel" and "surprising" findings in the literature aren't even worth trying to replicate (remember, replications cost money to run, so before you start you have to make some judgment about the likelihood of success). This is especially true of the "sexiest" sub-topics in the field, like social priming and embodied cognition. If you want to learn more about this, the place to look is Andrew Gelman's blog: http://andrewgelman.com/

At this point, since journals like Nature and Cell are so important to scientists, would it be feasible for them to simply require that any submitted paper only qualifies for publication if its results have been independently replicated?

They could even smooth the process by giving the draft a 'Nature seal of approval' that the authors could use to get other institutions to replicate their work, and add a small 'Replicated by XX' badge to each publication to reward any institution that replicated a study.

Funders of studies might improve the quality of the research they pay for by offering replication rewards, e.g. 5% of all funding goes to institutions that replicate results from research they funded.

Of course there would still be some wrinkles to iron out, but surely we could come up with a nicely balanced solution?

One heuristic I use for scientific papers is "Does it have supplementary materials and what are their quality?"

Supplementary materials are where you put raw data and the 'howtos'. They are not just a place to cram in extra figures that wouldn't fit.

I thought the reproducibility crisis was limited to the social sciences and some areas of health/medicine. This is the first article I've seen that claims it is a general problem throughout academia.

The Wiki article on the Reproducibility Crisis cites a Nature survey that makes it seem like the issue is widespread through every industry, including the hard sciences like physics and engineering: https://en.m.wikipedia.org/wiki/Replication_crisis#General

Hmmm.

Software Engineering has Continuous Integration, since it is so expensive to fix software later in the day.

Is there any such thing as Continuous Reproducibility?

Constantly checking that the science can be reproduced?

How prevalent is this in different branches of Science?

In applied mathematics, the idea of having a standard platform for releasing numerical experiments and standard datasets has come and gone over the years. My advisor said that in the early 2000s there was a push in some areas to standardize around Java applets for this in a few journals, but it never really took hold. Nowadays I would think some form of VM or container technology could probably do the trick while avoiding configuration hell. Commercial licensing for things like MATLAB or COMSOL etc. would be the real challenge for totally open validation in a lot of disciplines. Proprietary software is way more prevalent in scientific and engineering disciplines than I think many general software developers realize.

The good news is that you can't really fake proofs or formal analysis. But the truth is, many folks in the area do cherry-pick use case examples/numerical validation as much as you see in other disciplines. Perverse incentives to publish, publish, publish while the tenure clock is ticking keep this trend going, I think.

The problem is that many branches of science don't have any immediate pressure to produce something that is usable by people outside the field, and sometimes not even peers. So they do what is needed to get out papers, and bothering about eliminating false results goes against that interest.

I'm going to bet that in competitive research branches, for practical applications, that have objectively verifiable results, most studies will, in fact, be reproducible.

In the physical sciences experiments can be enormously expensive to run and doing them "continuously" is impractical. Groundbreaking work is usually verified independently, but it varies across fields. E.g. physics is usually quite careful about reproducing new physics before accepting it, while in the biological sciences it seems that work isn't always reproduced.

>Constantly checking that the science can be reproduced?

Scientific experiments usually need actual things to be manipulated in the real world. So I think a concept of Continuous Reproducibility may only be applicable to the subset of science that can be done by robots given declarative instructions.

This is ego, politics and career ambitions undermining modern science. Unfortunately, the fact that this is occurring so rampantly will bolster anti-intellectuals and give them a very potent argument to point to when presented with facts. This is a systemic failure of basic ethics that will hurt us all. The success-at-all-cost career mindset is toxic in all tracks, but this is one of the most dangerous for it to take hold in.

> will bolster anti-intellectuals and give them a very potent argument to point to when presented with facts

Case in point: the very first thing I thought of is, does this have any relevance to the field of climate science!

So... does it? Because we're told the reason we have to get on board with the program is that the people telling us the facts are scientists, and scientists are smart and trustworthy. However, we know this is not always true, don't we?

So what is a deliberately skeptical person to think?

imho the headline should be the other way around:

> Most scientific studies cannot be replicated by peers

.. which is more to the point.

I was looking for this. The current title makes it sound like most scientists are incompetent.

I have a high level question that I don't see answered, but maybe I missed it?

Are most studies conducted by folks in academia before they get tenure? That is, are most of these results that they're trying to replicate or study done by people who are rather new to doing studies? Is this even possible to know? My guess would be yes, but really I don't have much to base that on. And if that is a yes could that have something to do with the problem here?

Most people doing science don't have and probably never will have tenure.

My guess would be that most small studies are set up and conducted by people under immediate evaluation. I don't have any good guess about large ones.

Submitting this because it's come up on HN before and there are a few people who think it's limited to just social psychology or similar. But this report includes, e.g., cancer treatments.

An opening for a startup offering alternative / unattached peer review, maybe?

>The reproducibility difficulties are not about fraud, according to Dame Ottoline Leyser, director of the Sainsbury Laboratory at the University of Cambridge. That would be relatively easy to stamp out. Instead, she says: "It's about a culture that promotes impact over substance, flashy findings over the dull, confirmatory work that most of science is about."

Well, maybe I'm too much of a layman, but that doesn't quite seem to add up. Is not calling it fraud about protecting people's egos and saving face?

Or is it like an accountant who completely screwed up all their work and got the numbers wrong, but because they were a buffoon, not a fraudster? I guess that would need a different word than fraud.

A lot of it is people overhyping their results and cherry-picking their data to fit a narrative. Can you blame them? You can literally build a career off a paper or two published in Science or Nature.

Meanwhile, no one controlling funding sources or faculty appointments cares that you did amazing, rigorous work if it leads to less interesting conclusions. This is especially true if you generate null results, even though this work may have advanced your field. This puts in place a dangerous incentive system.

Another thing which is not mentioned is that the level of detail provided in many methods sections in papers is not sufficient for adequately reproducing the work. This can be due to word limit constraints or because people forget to include or aren't aware of key steps which are impacting their results. I've been on projects where seemingly irrelevant steps in our assay prep significantly impacted the resulting experiment outcomes.

It's more like all the competitors in a market lowering their safety standards to cut costs. If buyers can't accurately assess value then it turns into a bad situation for everyone.

Anyone trying to do the right thing goes out of business and someone cutting corners gets their business.

So a tragedy of the commons style "collective action" problem.

It's because there is a huge problem with the scientific publishing business, but people don't want to admit it because it makes science seem "weak", so these issues are largely ignored due to other conflicts of (self-)interest.

I'm not a scientist in anything but the colloquial sense of a curious and interested person, but when I spent time as a sysadmin at a genetics lab I actually had to read papers as part of the job.

I had previously held "science" up on a pedestal, but I quickly learned that bad science abounds even in reputable publications, and is rarely called out (mostly because scientists use publication to further careers largely based on name-on-paper count).

These days, every time I hear some scientist say "I've been published $largenumber of times," I think to myself 1/3 are probably impossible to reproduce, and 1/3 are probably "I developed this field specialized technique so I get a name drop but didn't actually participate in the study."

"Science is facing a "reproducibility crisis" where more than two-thirds of researchers have tried and failed to reproduce another scientist's experiments, research suggests. "

Ironically, it says this as a bad thing, but in an ideal world this would be 100%.

It would be like saying "2/3rds of coders have reviewed their colleagues' code and found bugs". Since bugs are basically unavoidable, the fact that 1/3 haven't found any suggests rather that they're not looking hard enough.

edit: pretty much everyone seems to have taken this the opposite way to how I intended it, but re-reading I can't figure out why that is the case. I'll try to re-phrase:

Science cannot be perfect every time. It's just too complex. This is why you need thorough peer review including reproduction. But if that peer review/reproduction is thorough, then it's going to find problems. When the system is working well, basically everyone will have at some point found a problem in something they are reproducing. This is good because that problem can then be fixed and it will become reproducible or be withdrawn. The current situation is that people don't even look for the problems and no-one can trust results.

edited again to change "peer-reviewed" -> "peer-reviewed including reproduction"

It is a terrible thing, and it is absolutely nothing like finding bugs.

Reproducibility is a core requirement of good science, and if we need to compare it to software engineering, the reproducibility crisis is like the adage "many eyes make all bugs shallow", where the assumption that there are many eyes even looking is often untrue. Most studies are never reproduced, but are held as true under the belief that if someone tried, they could.

EDIT: You claimed that in an ideal world, 100% of experiments/studies would not be reproducible. This denotes a profound misunderstanding of the scientific process and of the whole basis of reproducibility. In an ideal world, 100% of studies would be vetted through reproduction, and 100% of them would be reproducible. This is essentially the fundamental assumption of the scientific process.

No, I claimed that all scientists would have had the experience of not reproducing something. Because if they do it a lot, as part of a regular process, then they will eventually find something that doesn't work because the original scientist didn't document a step correctly, or misread the results, or just got lucky due to random chance.

Just like all developers will eventually find a bug in code they code review. This is different from all code they review having bugs.

I don't think that analogy is at all accurate and I think the conclusion that you reach from it is completely incorrect.

I think I must have expressed myself poorly, as I think my conclusion is the same as the article suggests i.e. that the science/code shouldn't be considered "done", until it's been peer-reviewed, since it's easy to fool yourself and others if you're not actually reviewing and testing your code.

What did you take away from my comment?

Peer review does not imply reproducibility, and it's the latter that is the problem.

I can confirm, as a reviewer, that your methodology and analysis look sensible, but the flaws may be deeper. The fact that you didn't publish the 19 other studies that failed and this is the "lucky one", or that you simply cherry-picked the data, is not something I can see as a reviewer.

This is especially true if the experiment is nontrivial to re-do.
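
To put a rough number on the "lucky one" scenario above, here is a minimal sketch. It assumes the usual 0.05 significance threshold, that all 20 studies (the 19 unpublished plus the published one) test an effect that does not exist, and that they are independent; none of these figures come from the article, and the function name is made up for illustration.

```python
def prob_at_least_one_false_positive(n_studies: int, alpha: float) -> float:
    """Chance that at least one of n independent studies of a non-existent
    effect still comes out 'significant' at threshold alpha."""
    # Each study avoids a false positive with probability (1 - alpha);
    # all n avoid it with probability (1 - alpha) ** n.
    return 1.0 - (1.0 - alpha) ** n_studies


if __name__ == "__main__":
    # Assumed values: 20 attempts, alpha = 0.05.
    p = prob_at_least_one_false_positive(20, 0.05)
    print(f"{p:.2f}")  # prints 0.64
```

So under those assumptions, running the study 20 times and publishing only the hit yields a "significant" result about two times out of three, and nothing in the manuscript would show it.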

" Peer review does not imply reproducibility, and it's the latter that is the problem."

I think this is the key to it. I'm suggesting that reproducibility should be part of considering something peer-reviewed, but of course, as currently practised, that isn't true.

Of course in a software metaphor, that would probably cover both code review and QA, which is sometimes done by a different job role which further muddies the water.

Science publishing is based on a system of review by peers. All evident bugs (and many not-so-evident ones) should be caught before they appear in a journal. It is totally different from standard journalism.

If there are errors in a study's methods that make it unreplicable, then it shouldn't have passed peer review or been published.

The whole point of an experiment is to isolate a single variable so you can test a falsifiable statement about it.

It's the exact opposite of building a system, which is what coders do.

