I wish science were that simple. The methods section only contains the variables the authors thought were worth controlling. In reality you never know which ones matter, and neither do the authors.
Secondly, I wish people would say "I replicated the methods and got a solid negative result" instead of "I can't replicate this experiment". Because most of the time, when you are doing an experiment you have never done before, you just fuck it up.
Here is an example: we study memory using mice. Mice don't remember that well if they are anxious. Here are variables we have to take care of, but they are never going to make it into the methods section:
Make sure the animal facility hasn't cleaned their cages.
But make sure the cages are otherwise relatively clean.
Make sure they don't fight each other.
Make sure the (usually false) fire alarm hasn't sounded in the last 24 hours.
Make sure the guy who was installing the microscope upstairs has finished making noise.
Make sure there are no irrelevant people talking/laughing loudly outside the behaviour space.
Make sure the finicky equipment works.
Make sure the computer doesn't start fucking updating Windows halfway through the experiment.
Make sure the animals love you.
The list goes on. If any one of these happens, the animals are anxious, and they don't remember. That's why, if your lab just starts doing something you haven't done for years, you fail. And replicating other people's experiments is hard.
Maybe the format needs to change. Perhaps journals should require video, audio commentary or automated note taking for publication.
I've seen this time and time again while working in neuroscience and hearing the same from friends that are still in those fields.
Data is often thoroughly massaged, outliers are left out of reporting, and methods are tuned to confirm rather than falsify certain outcomes. It's very demotivating as a PhD student to see very significant results, but when you perform the same study, you don't find reality to be as black and white as published papers.
On this note, the majority of papers are still about reporting significant results, leading to several labs chasing dead ends, as none of them can publish "negative" results.
We've been doing a lot of data visualization and it often happens that someone comes to me with a thinly veiled task that's really to prove this or that person/process is at fault for delaying a project or something.
Sometimes though the numbers either don't support their opinion or even show a result they don't like and so inevitably they have me massage the graphs and filters until they see a result that looks how they want it to and that's what gets presented at various meetings and email chains.
The information at that point isn't wrong per se, just taken out of context and shown in a persuasive (read: propaganda) rather than informative way.
They could even smooth the process by giving the draft a 'Nature seal of approval' that the authors could use to get other institutions to replicate their work, and add a small 'Replicated by XX' badge to each publication to reward any institution that replicated a study.
Funders of studies might improve the quality of the research they paid for by offering replication rewards. I.e. 5% of all funding goes towards institutions who replicated results from research they funded.
Of course there would still be some wrinkles to iron out, but surely we could come up with a nicely balanced solution?
Supplementary materials are where you put raw data and the 'howtos'. They are not just a place to cram in extra figures that wouldn't fit.
The Wiki article on the Reproducibility Crisis cites a Nature survey that makes it seem like the issue is widespread through every industry, including the hard sciences like physics and engineering: https://en.m.wikipedia.org/wiki/Replication_crisis#General
Software Engineering has Continuous Integration, since it is so expensive to fix software later in the day.
Is there any such thing as Continuous Reproducibility?
Constantly checking that the science can be reproduced?
How prevalent is this in different branches of Science?
The good news is that you can't really fake proofs or formal analysis. But the truth is, many folks in the area do cherry-pick use case examples/numerical validation as much as you see in other disciplines. Perverse incentives to publish, publish, publish while the tenure clock is ticking keep this trend going, I think.
I'm going to bet that in competitive research branches, for practical applications, that have objectively verifiable results, most studies will, in fact, be reproducible.
Scientific experiments usually need actual things to be manipulated in the real world. So I think a concept of Continuous Reproducibility may only be applicable to a subset of science that can be done by robots given declarative instructions.
Case in point: the very first thing I thought of is, does this have any relevance to the field of climate science?
So... does it? Because we're told the reason we have to get on board with the program is that the people telling us the facts are scientists, and scientists are smart and trustworthy. However, we know this is not always true, don't we?
So what is a deliberately skeptical person to think?
> Most scientific studies cannot be replicated by peers
.. which is more to the point.
Are most studies conducted by folks in academia before they get tenure? That is, are most of these results that they're trying to replicate or study done by people who are rather new to doing studies? Is this even possible to know? My guess would be yes, but really I don't have much to base that on. And if that is a yes could that have something to do with the problem here?
My guess would be that most small studies are set and conducted by people under immediate evaluation. I don't have any good guess about large ones.
Well, maybe I'm too much of a layman, but that doesn't quite seem to add up. Is not calling it fraud about protecting people's egos and saving face?
Or is it like if an accountant completely screwed up all their work and got the numbers wrong, but it was because they were a buffoon, not a fraudster? I guess that would need a different word than fraud.
Meanwhile, no one controlling funding sources or faculty appointments cares that you did amazing, rigorous work if it leads to less interesting conclusions. This is especially true if you generate null results, even though this work may have advanced your field. This puts in place a dangerous incentive system.
Another thing which is not mentioned is that the level of detail provided in many methods sections in papers is not sufficient for adequately reproducing the work. This can be due to word limit constraints or because people forget to include or aren't aware of key steps which are impacting their results. I've been on projects where seemingly irrelevant steps in our assay prep significantly impacted the resulting experiment outcomes.
Anyone trying to do the right thing goes out of business and someone cutting corners gets their business.
So a tragedy of the commons style "collective action" problem.
I'm not a scientist in anything but the colloquial term used as description for a curious and interested person, but when I spent time as a sysadmin at a genetics lab I actually had to read papers as part of the job.
I had previously held "science" up on a pedestal, but I quickly learned that bad science abounds even in reputable publications, and is rarely called out (mostly because scientists use publication to further careers largely based on name-on-paper count).
These days, every time I hear some scientist say "I've been published $largenumber of times," I think to myself 1/3 are probably impossible to reproduce, and 1/3 are probably "I developed this field specialized technique so I get a name drop but didn't actually participate in the study."
Ironically, it says this as a bad thing, but in an ideal world this would be 100%.
It would be like saying "2/3rds of coders have reviewed their colleagues' code and found bugs". Since bugs are basically unavoidable, the fact that 1/3 haven't found any points more in the direction that they're not looking hard enough.
edit: pretty much everyone seems to have taken this the opposite way to how I intended it, but re-reading I can't figure out why that is the case. I'll try to re-phrase:
Science cannot be perfect every time. It's just too complex. This is why you need thorough peer review including reproduction. But if that peer review/reproduction is thorough, then it's going to find problems. When the system is working well, basically everyone will have at some point found a problem in something they are reproducing. This is good because that problem can then be fixed and it will become reproducible or be withdrawn. The current situation is that people don't even look for the problems and no-one can trust results.
edited again to change "peer-reviewed" -> "peer-reviewed including reproduction"
Reproducibility is a core requirement of good science, and if we need to compare it to software engineering, the reproducibility crisis is like the adage "many eyes make all bugs shallow", when the assumption that there are many eyes even looking is often untrue. Most studies are never reproduced, but are held as true under the belief that if someone tried they could.
EDIT: You claimed that in an ideal world, 100% of experiments/studies would not be reproducible. This denotes a profound misunderstanding of the scientific process, or the whole basis of reproducibility. In an ideal world, 100% of studies would be vetted through reproduction, and 100% of them would be reproducible. This is essentially the fundamental assumption of the scientific process.
Just like all developers will eventually find a bug in code they code review. This is different from all code they review having bugs.
What did you take away from my comment?
I can confirm, as a reviewer, that your methodology and analysis look sensible, but the flaws may be deeper. The fact that you didn't publish the 19 other studies that failed and this is the "lucky one", or that you simply cherry-picked the data, is not something I can see as a reviewer.
This is especially true if the experiment is nontrivial to re-do.
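To put a rough number on the "lucky one" scenario, here is a back-of-the-envelope sketch. It assumes (my assumptions, nothing stated above) that the 20 studies are independent, that there is no real effect, and that each one uses the conventional p < 0.05 threshold. The function name is made up for illustration.

```python
def chance_of_lucky_study(attempts: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `attempts` independent studies of a
    nonexistent effect comes out "significant" at threshold `alpha`.

    Assumption: each study has probability `alpha` of a false positive,
    independently of the others.
    """
    # P(no false positive in any attempt) = (1 - alpha) ** attempts
    return 1.0 - (1.0 - alpha) ** attempts


if __name__ == "__main__":
    # 19 unpublished failures plus the one that got written up
    print(f"{chance_of_lucky_study(20):.2f}")  # about 0.64
```

Under those assumptions, getting one significant result out of 20 tries is more likely than not, and the reviewer only ever sees that one.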
I think this is the key to it: I'm suggesting that reproducibility should be part of considering something peer-reviewed, but of course as currently practised, that isn't true.
Of course in a software metaphor, that would probably cover both code review and QA, which is sometimes done by a different job role, which further muddies the water.
It's the exact opposite of building a system, which is what coders do.