
Most scientists 'can't replicate studies by their peers' - DanBC
http://www.bbc.co.uk/news/science-environment-39054778
======
nacc
"all you have to do is read the methods section in the paper and follow the
instructions."

I wish science were that simple. The methods section contains only the
variables the authors think are worth controlling, and in reality you never
know them all; neither do the authors.

Secondly, I wish people would say "I replicated the methods and got a solid
negative result" instead of "I can't replicate this experiment," because most
of the time, when you are doing an experiment you've never done before, you
just fuck it up.

Here is an example: we are studying memory using mice. Mice don't remember
that well if they are anxious. Here are the variables we have to take care of
to keep the mice happy, and they are never going to make it into the methods
section:

Make sure the animal facility hasn't cleaned their cages.

But make sure the cage is otherwise relatively clean.

Make sure they don't fight each other.

Make sure the (usually false) fire alarm hasn't sounded in the last 24 hours.

Make sure the guy who was installing a microscope upstairs has finished making
noise.

Make sure there are no unrelated people talking/laughing loudly outside the
behaviour space.

Make sure the finicky equipment works.

Make sure the animals love you.

The list can go on.

Because if any one of these happens, the animals are anxious, they don't
remember, and you get a negative result that has nothing to do with your
experiment (although you may not notice this). That's why, if your lab has
just started doing something it hasn't done for years, you fail. And
replicating other people's experiments is hard.

A positive control would help to make sure your negative result is real, but
for some experiments a good positive control can be a luxury.

~~~
JumpCrisscross
> _The methods section only contains variables the authors think worth
> controlling, and in reality you never know, and the authors never know_

Maybe the format needs to change. Perhaps journals should require video, audio
commentary or automated note taking for publication.

~~~
jamessb
> Maybe the format needs to change. Perhaps journals should require video,
> audio commentary or automated note taking for publication.

A 'world view' column in Nature suggested the same things last week [1]; the
author described a paper of theirs [2]:

> Yes, visual evidence can be faked, but a few simple safeguards should be
> enough to prevent that. Take a typical experiment in my field: using a tank
> of flowing water to expose fish to environmental perturbations and looking
> for shifts in behaviour. It is trivial to set up a camera, and equally
> simple to begin each recorded exposure with a note that details, for
> example, the trial number and treatment history of the organism. (Think of
> how film directors use clapper boards to keep records of the sequence of
> numerous takes.) This simple measure would make it much more difficult to
> fabricate data and ‘assign’ animals to desired treatment groups after the
> results are known.

[1]: http://www.nature.com/news/science-lies-and-video-taped-experiments-1.21432
[2]: http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12668/full

~~~
mattkrause
I don't think this is particularly practical.

Most experiments run for _years_ (literally) and no one is going to record or
archive, let alone watch, years of footage to confirm that _one_ paper is
legit.

A brief recording showing the apparatus and the collection of a few data
points might be helpful for understanding the paper, but I can't see using it
to verify a non-trivial experiment.

~~~
hackuser
> no one is going to record or archive, let alone watch, years of footage

Recording and storing years of footage shouldn't be a significant problem with
modern tech.

Nobody has to watch years of it; they can watch the parts they are interested
in. They also can watch at 4x and search, as needed.

------
thehardsphere
An alternative headline would be "Most Published Studies Are Wrong"

Am I wrong for considering this not-quite-a-crisis? It has long been the case
that initial studies on a topic fail to get reproduced. That's usually because
if someone publishes an interesting result, other people do follow-up studies
with better controls. Sometimes those are to reproduce it, sometimes the point is
to test something that would follow from the original finding. But either way,
people find out.

I mean, I guess the real problem is that lots of sloppy studies get published,
and a lot of scientists are incentivized to write sloppy studies. But if
you're actually a working scientist, you should understand that already and
not take everything you see in a journal as actual truth, but as something
that might be true.

~~~
skosuri
Yes! This is the point that most people miss. No scientist treats published
studies as gospel. Our focus shouldn't be on exact replication, but on how
generalizable such results are. If the results don't hold in slightly altered
systems, they fall into the wastebin of ideas.

~~~
lr4444lr
Maybe not, but these studies get swept up into meta-studies and books; think
tanks and special interest groups write policy papers based on that secondary
layer, which then become position papers that inform policy makers about Big
Decisions. There's a lot at stake when only scientists are aware of the
fallacy.

~~~
archgoon
Meta-studies get done precisely because individual studies aren't reliable.

------
jobvandervoort
> The problem, it turned out, was not with Marcus Munafo's science, but with
> the way the scientific literature had been "tidied up" to present a much
> clearer, more robust outcome.

I've seen this time and time again while working in neuroscience, and I hear
the same from friends who are still in the field.

Data are often thoroughly massaged, outliers left out of the reporting, and
methods tuned to confirm, rather than falsify, certain outcomes. It's very
demotivating as a PhD student to see highly significant results, and then,
when you perform the same study, to find that reality isn't as black and white
as the published papers.

On this note, the majority of papers are still about reporting significant
results, leaving several labs chasing the same dead ends, since none of them
can publish "negative" results.

~~~
YCode
For what it's worth I see the same thing in enterprise app development.

We've been doing a lot of data visualization and it often happens that someone
comes to me with a thinly veiled task that's really to prove this or that
person/process is at fault for delaying a project or something.

Sometimes, though, the numbers either don't support their opinion or even show
a result they don't like, so inevitably they have me massage the graphs and
filters until they see a result that looks the way they want it to, and that's
what gets presented in various meetings and email chains.

The information at that point isn't wrong per se, just taken out of context
and shown in a persuasive (read: propaganda) rather than informative way.

~~~
6d6b73
I've seen something similar in my field - industrial automation and testing.
When a company wants to upgrade its testers, the testers we create are usually
much more precise than something created 20-30 years earlier. Often we have to
modify our testers to match the results generated by these old, barely working
testers. These companies ask us to do it simply because otherwise they would
need to change all of their literature and explain to their customers why the
products have slightly different specs than what they delivered last quarter.

Unfortunately, our society is built on rotten foundations.

------
benrawk
I am a social scientist studying human behavior, and this is a huge problem in
the field. My statistician friends who analyze the literature and I have
basically concluded that most extremely "novel" and "surprising" findings in
the literature aren't even worth trying to replicate (remember, replications
cost money to run, so before you start you have to make some judgment about
the likelihood of success.) This is especially true of the "sexiest" sub-
topics in the field, like social priming and embodied cognition. If you want
to learn more about this, the place to look is Andrew Gelman's blog:
[http://andrewgelman.com/](http://andrewgelman.com/)

~~~
angry-hacker
Thank you. At first I thought "pizzagate?! I don't need that BS," but this is
a very different pizzagate.

I once found a good blog about mental health and science, with a lot of snake
oil exposed about SSRIs, ADHD, etc., but I'm unable to find it now. Can anyone
help me out?

~~~
Fenrisulfr
Is it thelastpsychiatrist.com ?

------
AlexB138
This is ego, politics and career ambitions undermining modern science.
Unfortunately, the fact that this is occurring so rampantly will bolster anti-
intellectuals and give them a very potent argument to point to when presented
with facts. This is a systemic failure of basic ethics that will hurt us all.
The success-at-all-cost career mindset is toxic in all tracks, but this is one
of the most dangerous for it to take hold in.

~~~
mistermann
> will bolster anti-intellectuals and give them a very potent argument to
> point to when presented with facts

Case in point: the very first thing I thought of is, does this have any
relevance to the field of climate science?

So... does it? Because we're told the reason we have to get on board with the
program is that the people telling us the facts are _scientists_, and
scientists are _smart_ and _trustworthy_. However, we know this is not always
true, don't we?

So what is a deliberately skeptical person to think?

~~~
sambe
Is this an intentional straw man? No, you're not told to trust the people
because they have a certain job title and reputation. You're told to look
at the data, which is overwhelming and consistent across multiple years and
teams. It is the exact opposite of lack of reproducibility.

~~~
mistermann
> Is this an intentional straw man? No, you're not told to trust the people
> because they have a certain job title and trustworthiness.

Are you joking? Are you seriously making the claim that one of the persuasive
approaches used in the "public realm" (media, discussions, etc.) _isn't_ that
we should fight climate change because scientists have almost unanimously
decided it is a real thing and we must do something?

If scientists are telling us something, we sure as hell _should_ listen, at
least two reasons being they are the experts on the subject (why _wouldn't_
you listen to experts), and the subject is so immensely complicated that an
average non-scientist person wouldn't have a chance of "looking at the data"
and forming a reasonably correct opinion.

But now you are telling me _no one is suggesting I listen to scientists_? I
could _easily_ google thousands of articles/papers/blog posts/internet
discussions where people are doing just that, but you are telling me no, that
content does not exist.

What is it about this topic where otherwise reasonable people seem to go off
the rails?

~~~
sambe
Your comment suggested that the title and reputation of a scientist was the
fundamental reason you are "told" (in a somewhat conspiratorial big-brother
fashion) to listen to them. And that - because sometimes mistakes are made -
you can't trust an overwhelming consensus. That's obviously not true, and
furthermore it's not the _fundamental_ reason to listen: the fundamental
reason is that it checks out. People have gone and checked the
papers/data. There have been multiple systematic reviews of other existing
studies. It's not a single novel result. The massive consensus on this issue
_is_ the replication.

If you're not an expert and don't want to invest in becoming one, it's totally
rational to trust a network of experts to - roughly speaking - do their work
properly. I'm sure you can find plenty of people advocating that. But my
default position would be not to trust a single novel result, regardless of
how smart or prestigious the authors were. Strong claims require strong
evidence. I rarely hear any scientist or advocate saying otherwise.

------
BickNowstrom
I am somewhat disappointed with the lack of replicability in the field of
machine learning and computer science. I think there is not much excuse for
releasing an ML paper on a new algorithm or modeling technique without a link
to a source code repo. Sure, your research code may not be pretty, but that
should not be a deal-breaker. I hope reviewers start rewarding papers with
links to source code. This should also stimulate refactoring, documenting, and
cleaning up the linked source code.

Also a standard unified process for replicability, reproducibility, and reuse
is needed. Dock points for not stating random seeds, hardware used, metadata,
etc.
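
As a rough illustration of the kind of "random seeds, hardware, metadata"
reporting suggested above, here is a minimal sketch; the field names and file
name are just placeholders, not any journal's standard:

    # Minimal sketch: fix the seeds and dump environment metadata alongside
    # the results so a reviewer could re-run the experiment. Illustrative only.
    import json
    import platform
    import random

    import numpy as np

    SEED = 42  # hypothetical seed reported in the paper
    random.seed(SEED)
    np.random.seed(SEED)

    metadata = {
        "seed": SEED,
        "python": platform.python_version(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "numpy": np.__version__,
    }

    # ... run the experiment here ...

    with open("run_metadata.json", "w") as f:
        json.dump(metadata, f, indent=2)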

~~~
Tsagadai
I have tried and failed to reproduce some findings in ML papers. Sometimes
graphs have been significantly smoothed or filtered, which makes results look
better; other times core components of algorithms are not described and the
findings cannot be reproduced at all.

Source code, or at the very least proper pseudocode, should be mandatory for
all published computer science research.

------
godelski
So I can't be the only one who has noticed the correlation between this and
the field in question. Soft sciences like sociology, psychology, and medicine
seem to have the most problems with it. I'm not saying hard sciences like
physics don't, but it is less common.

The math for the soft sciences isn't as concrete and doesn't provide a good
foundation. I think there are also major problems with the use of p-values:
they are too easy to manipulate, and there is a lot of incentive to do so.
Teach a science class (even in the hard sciences) and you'll see how quickly
students try to fudge their data to match the expected result. I've seen even
professionals do this. I once talked to a NASA biologist whose chi-square
value I was trying to get, and it took a little bit of pressing because he was
embarrassed that it didn't confirm his thesis (it didn't disprove it either;
the error was just large enough to allow for the other prevailing theory). As
scientists we have to be okay with a negative result. It is still useful.
That's how we figure things out. A bunch of negatives narrows the problem. A
reduced search space is extremely important in science.
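
For readers less familiar with the test being discussed, here is a toy
chi-square goodness-of-fit check in Python; the counts are made up, and the
point is only to illustrate why a "negative" result like the biologist's is
still worth reporting:

    # Toy example: chi-square goodness-of-fit with invented counts.
    # A large p-value fails to reject the null hypothesis; it neither confirms
    # the thesis nor disproves it, and that negative result is still useful.
    from scipy.stats import chisquare

    observed = [22, 27, 24, 27]   # hypothetical observed counts
    expected = [25, 25, 25, 25]   # counts predicted by the thesis under test

    stat, p = chisquare(f_obs=observed, f_exp=expected)
    print(f"chi-square = {stat:.2f}, p = {p:.3f}")  # ~0.72, p ~ 0.87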

The other problem is incentives in funding. There is little funding to
reproduce experiments. It isn't as glorious, but it is just as important.

~~~
BeetleB
>So I can't be the only one who has noticed the correlation between this and
the field in question. Soft sciences like sociology, psychology, and medicine
seem to have the most problems with it. I'm not saying hard sciences like
physics don't, but it is less common.

It is a problem in physics, although a "different" problem. See my comment:

[https://news.ycombinator.com/item?id=13715197](https://news.ycombinator.com/item?id=13715197)

~~~
godelski
Yes, but there is a huge difference in degree of problem. That's what I'm
getting at. It exists, but in the soft sciences it is much more rampant.
Compound that with the weaker analysis, and the problem becomes that you have
to be skeptical of any result from the field.

Different degrees of the same problem.

~~~
lutusp
Another important distinction between physics and, say, psychology is that in
the latter, studies aren't testing a theory; they're testing an observation. A
particular observation sometimes leads to the widespread assumption that a
particular effect exists, without anyone trying to shape a theory about its
cause, only noting that it exists. In physics, by contrast, it's all about
fitting an observation into existing theory.

------
dekhn
So. Having been a scientist, I observed an interesting phenomenon, several
times. It's almost as if scientists enjoy leaving out critical details from
the methods section, and other scientists enjoy puzzling out what the missing
details are. I think there's a sort of assumption of competence: for any
reasonably interesting paper, the people in the field who are reading it have
the level of skill to reproduce it even with missing information.

~~~
mattnewton
Do you think that has anything to do with the incentives for reviewers, who
want shorter papers with a length limit and prefer more details on the
impact/why of the experiment?

Does the same method-hiding hold true in journals without length limits or
different review processes?

~~~
cowsandmilk
I think length limits cause problems.

In the software world, the journal Bioinformatics has 2-page application notes [1]. That is
nowhere near enough room to have a figure, describe an algorithm, and describe
results. In cases where the source code is available, I've found the actual
implementation often has steps not described at all in the algorithm. And
these differences make a clean-room implementation difficult or impossible if
you want to avoid certain license restrictions.

Since it has been a decade since I worked in a wet lab, I'm less familiar with
examples in that world, but I know not offending chemical vendors is a concern
for some people in the synthetic chemistry world. At a poster session, they'll
tell you that you shouldn't buy a reagent from a particular vendor because
impurities in their formulation kill the described reaction. They won't put
that in a paper though.

[1] https://academic.oup.com/bioinformatics/pages/instructions_for_authors

~~~
wespisea
I just heard of a new NGS file format that should fix this

------
zeristor
Hmmm.

Software Engineering has Continuous Integration, since it is so expensive to
fix software later in the process.

Is there any such thing as Continuous Reproducibility?

Constantly checking that the science can be reproduced?

How prevalent is this in different branches of Science?

~~~
thearn4
In applied mathematics, the idea of having a standard platform for releasing
numerical experiments and standard datasets has come and gone over the years.
My advisor said that in the early 2000s, there was a push in some areas to
standardize around Java applets for this in a few journals, but it never
really took hold. Nowadays I would think some form of VM or container technology
could probably do the trick while avoiding configuration hell. Commercial
licensing for things like MATLAB or COMSOL etc. would be the real challenge
for totally open validation in a lot of disciplines. Proprietary software is
way more prevalent in scientific and engineering disciplines than I think many
general software developers realize.

The good news is that you can't really fake proofs or formal analysis. But the
truth is, many folks in the area do cherry-pick use-case examples/numerical
validation as much as you see in other disciplines. Perverse incentives to
publish, publish, publish while the tenure clock is ticking keep this trend
going, I think.

~~~
SilasX
>The good news is that you can't really fake proofs or formal analysis.

It's my understanding that most published mathematical proofs aren't "hey look
at this theorem in first order logic that we reduced to symbol manipulation";
rather, they present enough evidence that other mathematicians are convinced
that such a proof could be constructed.

Is that incorrect?

~~~
tnecniv
I'm not a mathematician, but I had a math prof tell me that most publications
only contain proof sketches, not full proofs.

Can someone in the field comment?

~~~
kxyvr
I work as an applied mathematician. In general, I would say that this is
incorrect: I would contend that virtually all of the papers I read have full
proofs. That said, I can sympathize with the sentiment in a certain sense.

Just because a paper contains a proof doesn't mean that the proof is correct
nor that it's comprehensible. Further, even if a paper went through peer
review, it doesn't mean that it was actually reviewed. I'll break each of
these down.

First, a proof is just an argument that an assertion is true or false. Just
like with day to day language, there are good arguments and bad arguments.
Theoretically, math contains an agreed upon set of notation and norms to make
its language more precise, but most people don't abide by this. Simply put,
very, very few papers use the kind of notation that's read by proof-assistant
tools like Coq, which is the kind of metalanguage really required for that
level of precision.
Now, on top of the good and bad argument contention, I would also argue that
there's a kind of culture and arrogance associated with how the community
writes proofs. Some years back, I had a coauthor screaming at me in his office
because I insisted that every line in a sequence of algebraic reductions
remain in the paper with labels. His contention was that it was condescending
to him and the readers to have these reductions. My contention was that I, as
the author of the proof, couldn't figure out what was going on without them,
and if I couldn't manage without those details, I sincerely doubted the
readers could either. Around the office, there was a fair amount of
support for my coauthor and removing details of the proof. This gives an idea
of the kind of people in the community. For the record, the reductions
remained in the submitted and published paper. Now, say we removed all of
these steps. Did we still have a full proof? Technically yes, but I would call
it hateful because it would require a hateful amount of work by the readers to
figure out what was going on.

Second, peer review is tricky and often incredibly biased. Every math journal
I've seen asks the authors to submit to a single blind review meaning that the
authors don't know their reviewers, but the reviewers know the authors. If you
are well known and well liked in the field, you will receive the benefit of
the doubt if not a complete pass on submitted work. I've seen editors call and
scream at reviewers who gave "famous" people bad reviews. I feel like I was
blacklisted from one community because I rejected a paper from another
"famous" person who tried to republish one of their previous papers almost
verbatim. In short, there's a huge amount of politics that goes into the
review process. Further, depending on the journal, sometimes papers are not
reviewed at all. Sometimes, when you see the words "communicated by so-and-so"
it means that so-and-so vouched for the authenticity of the paper, so it was
immediately accepted for publication without review. Again, it varies and this
is not universal, but it exists.

What can be done? I think two things could be done immediately and would have
a positive effect. First, all reviews should be double-blind, including to the
editor: there is absolutely no good reason why the editor or the reviewers
should know who wrote the paper. Yes, they may be able to figure it out, but
beyond that, names should be stripped prior to review and re-added only
at publication. Second, arbitrary page limits should be removed. No, we don't
need rambling papers. If a paper is rambling it should be rejected as
rambling. However, it removes one incentive to produce difficult-to-follow
proofs, since all details can then remain. Virtually all papers are published
electronically. Page counts don't matter.

In the long run, I support the continued development of proof assistant tools
like Coq and Isabelle. At the moment, I find them incredibly difficult to use
and I have no idea how I'd use them to prove anything in my field, but someday
that may change. At that point, we can remove much of the imprecision that
reviewers introduce into the process.
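
For readers who haven't seen a proof assistant, here is a trivially small
example in Lean (Coq and Isabelle are similar in spirit); the point is only
that the statement and every step are machine-checked, not that real research
proofs are this easy:

    -- Toy machine-checked proof: addition on natural numbers is commutative.
    -- `Nat.add_comm` is a lemma from Lean's standard library.
    theorem my_add_comm (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b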

~~~
tnecniv
Thanks for the insight!

------
analog31
While I understand that the life sciences dominate science research right now,
it still annoys me when I read a headline about "most scientists" and the
article is exclusively about the life sciences. Even if the physical sciences
have reproducibility issues of their own, those issues may be different enough
to frustrate lumping all of the sciences together.

I suggest that future articles about the reproducibility crisis should either:
a) Specify "life science" in the title, or b) demonstrate that the
generalization is justified.

My field (physics) is certainly not perfect, but we do have a reasonable body
of reliable knowledge including reproducible effects. I work for a company
that makes measurement equipment, and we are deeply concerned with
demonstrating the degree to which measurements are reproducible.

------
tinco
At this point, since journals like Nature and Cell are so important to
scientists, could it be feasible for them to require that any submitted paper
only qualifies for publication if the results were independently replicated?

They could even smooth the process by giving the draft a 'Nature seal of
approval' that the authors could use to get other institutions to replicate
their work, and add a small 'Replicated by XX' badge to each publication to
reward any institution that replicated a study.

Funders of studies might improve the quality of the research they paid for by
offering replication rewards, i.e. 5% of all funding goes towards institutions
that replicated results from research they funded.

Of course there would still be some wrinkles to iron out, but surely we could
come up with a nicely balanced solution?

~~~
virusduck
I think you vastly underestimate how much a lot of scientific studies cost. It
would be ideal to be able to have studies replicated by a separate group, but
many labs have specialized equipment and engineering that make replication of
their studies by some random group unfeasible.

In addition, if you spend too much time trying to replicate others' work, you
have no time to work on the things that will actually continue to fund your
own research.

The best thing is to have a healthy, skeptical, but collegial, competition in
the field. That still requires more funding though!

~~~
RodericDay
> I think you vastly underestimate how much a lot of scientific studies cost

Do you have data for this claim? There are tons of extraordinarily expensive
experiments, to be sure, but there's also work with incredibly high
and time-consuming up-front design and exploration costs that is actually
almost trivial to replicate on a per-unit basis.

------
binaryzeitgeist
Specifically in the CS context, I think some version of double-blind peer code
review should be made mandatory for publication.

I've seen authors skip quite a few details that are essential to the
replication process.

In short, if research is not replicable by the peer community, it's just
useless; that's what it is.

~~~
jedberg
CS is the craziest of them all. Those should be the easiest to replicate.
"Here is the code, here is a manifest of the environment/container/disk
image/etc." You should be able to take that and run it and get the same
results.

Or are you saying that the code itself is the problem and that they've done
the equivalent of "return True" to get the result they want?
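
To make the "manifest of the environment" idea concrete, here is a rough
sketch of the kind of thing a paper could ship alongside its code; the file
name and fields are illustrative, not a standard:

    # Illustrative sketch: dump enough about the environment that someone else
    # can rebuild it and re-run the code. Assumes pip is available.
    import json
    import platform
    import subprocess
    import sys

    manifest = {
        "python": sys.version,
        "platform": platform.platform(),
        # Exact versions of every installed package.
        "packages": subprocess.run(
            [sys.executable, "-m", "pip", "freeze"],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines(),
    }

    with open("environment_manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)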

~~~
tnecniv
In my other comment I mentioned that the CS results I've struggled to
reproduce are largely ones that include enough detail for you to get the gist
of how they work, but not enough to avoid going down some rabbit holes. Also, not all
publications include code. Many venues don't require it.

------
freddref
This is a very unfortunate state of affairs for science in general, given the
number of tools we have to make an experiment more replicable.

Replication is perhaps simplest in the field of computer science, yet many
papers do not release the associated source code and configuration used in
experiments. It's very easy to make the code available to all, so I find it a
bit dishonest and unscientific not to share an experiment.

There may be other things at play that prevent methodology from being fully
shared; it can get personal. For example, experimental code is sometimes
developed quickly, and people may be reluctant to share their own messy code
for fear it might reflect badly on them.

------
JumpCrisscross
You can back the Center for Open Science, which runs these reproducibility
projects, here: https://cos.io/about/our-sponsors/

~~~
BDGC
Seconding the support for COS. When they test reproducibility, they actually
pair labs that are proficient in the particular technique with the original
authors. The two labs (the originator and reproducer) work together to try to
replicate the findings in the new environment. It's a smart way to work around
the problem of labs trying (and failing) new techniques.

------
DanBC
Submitting this because it's come up on HN before and there are a few people
who think it's limited to just social psychology or similar. But this report
includes, e.g., cancer treatments.

------
TheAceOfHearts
This isn't that surprising, at least based on my limited experience from
reading computer science research papers. My experience has been that there's
usually not enough information for you to implement something. Am I an
outlier, or have others experienced the same?

~~~
charles-salvia
I've spent hours/days/weeks implementing algorithms or data structures from
journal articles. No, you're not an outlier. A lot of times they don't include
source code, even in the age of Github. Certain details or assumptions are
often glossed over, and often the wording of a crucial technical detail is
extremely ambiguous. Sometimes the only way I'm able to successfully implement
the algorithm/data structure is through sheer luck - i.e. stumbling upon some
piece of information elsewhere that gave me the necessary insight to
understand what the author of the journal article meant.

I mean, I can understand academic time pressure and everything, but not
providing a link to source code in this day and age is almost absurd. At the
very least, it certainly doesn't encourage anyone to actually use your
research in industry.

------
saywatnow
imho the headline should be the other way around:

> Most scientific studies cannot be replicated by peers

.. which is more to the point.

~~~
StavrosK
I was looking for this. The current title makes it sound like most scientists
are incompetent.

------
maverick_iceman
I'm not sure that this is necessarily a problem. Most working scientists know
that preliminary studies are often wrong and just because it is published in
some journal doesn't mean it's true. If there's merit to it then the
scientific method will eventually settle upon the correct answer. This looks
like someone outside the field just came across this news and had a visceral
reaction, "Gasp! Most journal articles are wrong! We can't trust science!"

(Note that I'm talking only about hard natural sciences. Social sciences are
another whole can of worms.)

------
choxi
I thought the reproducibility crisis was limited to the social sciences and
some areas of health/medicine; this is the first article I've seen that claims
it is a general problem throughout all of academia.

The Wiki article on the Reproducibility Crisis cites a Nature survey that
makes it seem like the issue is widespread through every discipline, including
the hard sciences like physics and engineering:
[https://en.m.wikipedia.org/wiki/Replication_crisis#General](https://en.m.wikipedia.org/wiki/Replication_crisis#General)

~~~
neffy
Oh it's general. Apart from anything else, a majority of all scientific papers
rely on some form of software, from simple R statistical scripts to complex
programs. None of those programs ever have bugs in them, right?

Then to be fair to the authors, scientific papers follow a fairly fixed
format, often with hard limits on paper length (cf. Nature, etc.). Putting all
the detail necessary for replication, and a decent literature review, and an
overview, is simply not possible.

And then there's the meta-problem, which is general: every aspect of science,
from hiring to grant writing, to managing a PhD farm, to goofing off on
ycombinator..., essentially works against anybody trying to do detailed,
methodical, provable work, no matter how brilliant they are, because doing all
aspects of science properly is incredibly time-consuming in a world with an
ever-shrinking attention span.

~~~
miles7
It is not a completely general problem. In the field I work in (computational
physics) we very often reproduce results published by other research groups.
It's very common in our field to see plots in a new paper overlaid with data
from old papers showing where our results agree (where they disagree is not a
reproducibility issue but is instead due to our ability to do more accurate
calculations).

Here is a paper where some colleagues of mine wrote a benchmark paper
explicitly to check various methods against each other (with good agreement):
https://journals.aps.org/prx/abstract/10.1103/PhysRevX.5.041041

------
ZeroGravitas
_" Science is facing a "reproducibility crisis" where more than two-thirds of
researchers have tried and failed to reproduce another scientist's
experiments, research suggests. "_

Ironically, it says this as a bad thing, but in an ideal world this would be
100%.

It would be like saying "2/3rds of coders have reviewed their colleague's code
and found bugs". Since bugs are basically unavoidable, the fact that 1/3
haven't found any points more in the direction that they're not looking hard
enough.

edit: pretty much everyone seems to have taken this the opposite way to how I
intended it, but re-reading I can't figure out why that is the case. I'll try
to re-phrase:

Science cannot be perfect every time. It's just too complex. This is why you
need thorough peer review including reproduction. But if that peer
review/reproduction is thorough, then it's going to find problems. When the
system is working well, basically everyone will have at some point found a
problem in something they are reproducing. This is good because that problem
can then be fixed and it will become reproducible or be withdrawn. The current
situation is that people don't even look for the problems and no-one can trust
results.

edited again to change "peer-reviewed" -> "peer-reviewed including
reproduction"

~~~
endorphone
It is a terrible thing, and it is absolutely nothing like finding bugs.

Reproducibility is a core requirement of good science, and if we need to
compare it to software engineering, the reproducibility crisis is like the
adage "many eyes make all bugs shallow", when the assumption that there is
many eyes even looking is often untrue. Most studies are never reproduced, but
are held as true under the belief that if someone tried they could.

EDIT: You claimed that in an ideal world, 100% of experiments/studies would
not be reproducible. This denotes a profound misunderstanding of the
scientific process, or the whole basis of reproducibility. In an ideal world,
100% of studies would be vetted through reproduction, and 100% of them would
be reproducible. This is essentially the fundamental assumption of the
scientific process.

~~~
ZeroGravitas
No, I claimed that all scientists would have had the experience of not
reproducing something. Because if they do it a lot, as part of a regular
process then they will eventually find something that doesn't work because the
original scientist didn't document a step correctly or misread the results or
just got lucky due to random chance.

Just like all developers will eventually find a bug in code they code review.
This is different from all code they review having bugs.

~~~
endorphone
While the wording may be vague, they aren't talking about the experiences of a
subset of researchers -- they are saying that of the experiments they tried to
replicate, 2/3rds weren't reproducible. That is terrible, and has absolutely
nothing to do with finding bugs.

------
lutusp
Quote: "The issue of replication goes to the heart of the scientific process."

Indeed it does, but some fields don't think it's very important. For example,
there's a surprisingly widespread attitude among psychologists that it's
either a waste of time or an attack on the integrity of one's colleagues.

Example:
http://jasonmitchell.fas.harvard.edu/Papers/Mitchell_failed_science_2014.pdf

Author: Professor Jason Mitchell, Harvard University.

Quotes: "Recent hand-wringing over failed replications in social psychology is
largely pointless, because unsuccessful experiments have no meaningful
scientific value."

"The field of social psychology can be improved, but not by the publication of
negative findings."

"Whether they mean to or not, authors and editors of failed replications are
publicly impugning the scientific integrity of their colleagues."

"Because experiments can be undermined by a vast number of practical mistakes,
the likeliest explanation for any failed replication will always be that the
replicator bungled something along the way."

It seems not to have occurred to Professor Mitchell that the original study's
result might also have resulted from someone bungling something along the way.

------
blakesterz
I have a high level question that I don't see answered, but maybe I missed it?

Are most studies conducted by folks in academia _before_ they get tenure? That
is, are most of the results that people are trying to replicate produced by
researchers who are rather new to doing studies? Is this even possible to
know? My guess would be _yes_, but really I don't have much to base that on.
And if it is a _yes_, could that have something to do with the problem here?

~~~
wycx
Most studies are done by people who need high impact publications to secure
funding to do more studies to produce high impact publications to secure
funding to do more studies...

Not having a permanent job just serves as additional motivation to get high
impact publications to secure a job.

------
makecheck
Replication needs more incentive, too: it should be considered on par with, or
even more significant than, publishing new results in a field. The incentive should
come on both sides (rewards for labs that reproduce results of other labs, and
rewards for scientists that publish results that are paired with clear
methods).

I’m afraid that research is starting to descend into a fight for a few measly
dollars, at any cost. If the results don’t really matter, you start seeing far
less important measurements like “number of publications” taking precedence,
which is a huge problem. At some point, if your lifeline depends on bogus
metrics and all the competing labs are publishing crap that no one reads and
no one can reproduce, are you forced to _also_ publish the hell out of
everything you can think of just to “compete” and stay funded? And at some
point, are you spending more time publishing useless papers and writing grants
begging for money, than time spent doing useful research? It’s a race to the
bottom that will harm the world’s library of scientific data.

------
marknutter
This is an especially troubling problem in the soft-sciences. I recently
learned about [post-treatment bias](http://gking.harvard.edu/files/bigprobP.pdf)
which is one of the bigger problems plaguing the social sciences. Avoiding
biasing an experiment when choosing the variables you control for is insanely
difficult to the point where I wonder if it's even possible.

The scary thing is that people will regularly cite soft-science publications
that align with whatever political agenda they may have, and anyone who dares
contradict the authority of those studies is shouted down as "anti-
intellectual" or other such nonsense.

I used to abhor people for leaning on their bibles to push their agendas but
I'm starting to see how secular people are basically doing the same thing.

------
jamesash
I wish they used more specific language than the blanket term "scientists".
Reproducibility is highly field-dependent. In organic chemistry, for example,
the journal Organic Syntheses only publishes procedures that have been checked
for reproducibility in the lab of a member of the board of directors.
[http://www.orgsyn.org](http://www.orgsyn.org) A few years ago someone
published a paper in a highly esteemed journal that was largely thought to be
bunk (oxidation of alcohols by sodium hydride). A prominent blogger live-
blogged his (unsuccessful) attempt to reproduce the experiment, as did others.
The oxidation was in fact due to the presence of oxygen, not due to sodium
hydride. The paper was retracted.

------
abledon
And so many arguments here on HN demand people provide links to studies to
back up their discussion. Sigh.

------
j_m_b
One heuristic I use for scientific papers is "Does it have supplementary
materials, and what is their quality?"

Supplementary materials are where you put raw data and the 'how-tos'. They are not
just a place to cram in extra figures that wouldn't fit.

~~~
wycx
Why is it that journals find it impossible to append the pdf supplement to the
pdf of the paper?

I recall seeing only a couple of journals that manage to do this for errata.

------
Akarnani
An interesting solution here is structuring research grants for outcome
replicability. The article implies a real % of grant money is wasted on
results that cannot be replicated.

It could look like this: X0% of grant money is held for future researchers who work on
replicating the outcome. And, wouldn't it be cool if Y% of the grant was held
back to be awarded to the researcher whose results were replicated?

This spend, while increasing grant sizes, has the effect of creating
replicable science, which does more for Science and Society than small grants
that end up producing non-replicable results.

------
digitalmaster
This is what happens when you ignore the incentives. I think forcing
professionals in any field to "just be moral", "just do the right thing",
"just ignore the inverse financial incentive"... this expectation is in itself
immoral; human willpower has its limitations and, just like everything else,
we are very good at justifying immoral behavior when being moral contradicts
our short-term needs.

Regulation is indeed complicated and imperfect... but the solution isn't a
simple one, and it's certainly not complete deregulation.

------
intrasight
Important scientific results should be replicated. My understanding is that
the scientific community now recognizes this and is trying to put in place the
institutional frameworks to make this possible. Assuming that gets put in
place, the next task is obviously choosing which experiments are important
enough to justify replicating. It would be cool to apply some AI so as to remove
the politics from that decision.

------
pmontra
How about adding reproducibility to the h-index? Let's say that a reproducible
paper is worth at least three times as much as a non-reproducible one, maybe
more (because it's going to take more time to write). Would that give
scientists an incentive to detail their methods so other scientists can
reproduce their findings? Would that give rise to reproducibility rings to
game the system?
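
Purely as a sketch of this idea (the weighting factor and the publication
record below are invented), a reproducibility-weighted h-index could be
computed like this:

    # Toy sketch: weight each paper's citations by whether it has been
    # independently reproduced, then compute an h-index on the weighted scores.
    def weighted_h_index(papers, multiplier=3):
        """papers: list of (citations, reproduced) tuples."""
        scores = sorted(
            (c * multiplier if reproduced else c for c, reproduced in papers),
            reverse=True,
        )
        h = 0
        for rank, score in enumerate(scores, start=1):
            if score >= rank:
                h = rank
        return h

    # Invented publication record: reproduced papers count triple.
    papers = [(30, True), (12, False), (5, True), (2, False)]
    print(weighted_h_index(papers))  # prints 3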

------
dgellow
Are there projects (companies, organizations, something else) trying to solve
the problem or improve the situation? Maybe focused on a specific field, as
I'm sure there are a lot of domains where reproduction is just too expensive or too
complex.

Reproducibility is a very important part of the scientific method; I would
love to contribute to or work on that kind of project.

------
MikeGale
So these results can't be replicated in controlled situations. Some of this
applies to how our bodies work, with many more "uncontrolled variables" than
experimenters even thought of.

In which universe should we guide our behavior and believe things based on
"science" like that?

Thought Experiment: Compare a mass murderer to a scientist in cases like this.

------
hellofunk
I noticed this phenomenon recently when reading a variety of articles on
evolutionary programming techniques. Each paper would present its benchmarks
in comparison to other techniques, which were also described in other papers.
Often, a paper would say "we could not reproduce the results in X paper."

------
abhirag
NPR's Planet Money did a podcast on this:
http://www.npr.org/sections/money/2016/01/15/463237871/episode-677-the-experiment-experiment

------
vivekchandsrc
As an experimental researcher for the last 16 years, I don't find this
surprising. The publishing industry has become a scam, wherein journals like
Nature and its sister journals charge thousands of dollars for open access.
They don't care whether the work is reproducible...

------
arca_vorago
It's because there is a huge problem with the scientific publishing business,
but because people don't want to admit it (it makes science seem "weak"),
these issues are largely ignored, along with other conflicts of (self-)interest.

I'm not a scientist in anything but the colloquial term used as description
for a curious and interested person, but when I spent time as a sysadmin at a
genetics lab I actually had to read papers as part of the job.

I had previously held "science" up on a pedestal, but I quickly learned that
bad science abounds even in reputable publications, and is rarely called out
(mostly because scientists use publication to further careers largely based on
name-on-paper count).

These days, every time I hear some scientist say "I've been published
$largenumber of times," I think to myself 1/3 are probably impossible to
reproduce, and 1/3 are probably "I developed this field specialized technique
so I get a name drop but didn't actually participate in the study."

------
EternalData
Information is always incentivized one way or another, and people will keep
things that contradict their narrative and incentives out of studies. That's a
key flaw in peer-reviewed research that I think needs to be addressed.

------
arbuge
And even if they could, they have no time or funding to do so. You get your
PhD / postdoc paycheck / tenure / etc by publishing new and original research,
not by attempting to replicate previous results.

------
astevic
I think [https://www.sevenbridges.com](https://www.sevenbridges.com) is trying
to solve this problem.

------
daughart
Direct replication is a waste of time. Move forward; later, when there is
better technology/data, it will either agree or disagree.

~~~
physicsyogi
Direct replication can be very important, at least in math and the hard
sciences.

In physics labs, students conduct experiments that once warranted a Nobel
prize. And some of the problem sets that physics grad students work on repeat
work that once won a Nobel prize.

When I was in physics grad school, a scandal erupted when Jan Hendrik Schön's
"breakthroughs" results on semiconductor nanostructures couldn't be
replicated. [1] He'd received a fair bit of acclaim, had one a couple of
prizes, and I heard there was even Nobel buzz for him. His papers were in
Science, Nature, and Phys. Rev. Letters. Several groups tried and failed over
and over to confirm his breakthroughs. It turned out that he had falsified
data. 28 of his journal articles were retracted by the publishers and others
are still considered suspect.

When a theory or experiment comes along that generates the kind of excitement
and interest that can lead to new technology or prestigious grants and awards,
replication is important. Science stands on the shoulders of what has come
before. We need to know that we're building on a solid foundation.

[1]
[https://en.wikipedia.org/wiki/Schön_scandal](https://en.wikipedia.org/wiki/Schön_scandal)

------
ylem
I think this can be very field dependent. I am a physicist and perhaps I am
overly optimistic. However, I find that the existence of supplemental
materials is helping things a lot. Generally, in a paper (in experimental
physics), you are trying to tell a story about your measurements and the
understanding that you gain from them. So, it's nice to be able to tell that
story, where you focus on the most relevant data. However, with supplemental
materials, you can include a lot of the sanity checks that you did (sometimes
at the prompting of referees). For example, maybe I want to tell a story about
how, by applying a voltage to some material, I can influence its magnetic
properties. There are a lot of
baseline measurements that you perform these days to determine that. In the
past, with page limits, you couldn't include all of that information, but now
you can include a great deal more.

In my particular field, my raw data is online, so that can be checked--though
without more metadata, it would be of limited use to someone without my
notebook and annotations. A lot of the code for the reduction tools that we
use is also open sourced as well. There have been some moves in science
towards reproducible data analysis, but the problem is
infrastructure/ecosystem. For example, suppose that I use Python code to do
some statistical analysis on my data; it could be hard for someone else to
reproduce that, say, 20 years from now, because they won't just need the
library I wrote, but the entire ecosystem it lives in. I don't have a good
answer for that.
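
There is no full answer to the ecosystem problem, but one small mitigation
(purely a sketch; the file names are hypothetical) is to archive the exact
library versions and a checksum of the raw data next to the published
analysis:

    # Sketch: record data and dependency provenance alongside the analysis.
    import hashlib
    import json
    from importlib import metadata

    def sha256(path):
        """Checksum a file so future readers can verify they have the same data."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    provenance = {
        "raw_data_sha256": sha256("raw_data.dat"),   # hypothetical data file
        "numpy": metadata.version("numpy"),           # exact library versions used
        "scipy": metadata.version("scipy"),
    }

    with open("provenance.json", "w") as f:
        json.dump(provenance, f, indent=2)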

But, I think that for high profile results (again, I'm optimistic) in physics,
there's an incentive for other groups to try to reproduce them to see if they
hold or not. There have been cases where someone honestly thought that an
effect was there (and it was reproducible), but it was found out later that it
was due to something extrinsic--for example, many sources of boron that you
buy commercially have trace amounts of magnetic impurities, so it took some
time before someone realized that this was the cause of a weak signal that
people were seeing.

In some communities, such as crystallography, you have to submit a file which
shows your results and it is automatically checked for chemical consistency. I
think this can help weed out some sloppy errors. But, it is still possible to
make mistakes.

Also, with journals like Nature Scientific Reports, it becomes feasible to
publish results that aren't so exciting, but are technically correct (it takes
a lot of time to write a paper and an even longer time to publish it and the
cost benefit analysis makes it difficult at times to publish everything, so
lowering the barrier to publication to technical correctness rather than
excitement helps people to publish null results so other people don't waste
their time).

There's also the question of where to draw the line as a referee. If someone
is publishing a methods paper where they have made a new analysis technique
that is going to be implemented in software that most people are not going to
check, then I feel obligated to check their derivations, reproduce integrals,
etc. For other papers, looking for consistency between different measurements
and the literature is probably sufficient.
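
As a trivial illustration of "reproducing an integral" rather than taking it
on faith (the integral below is a textbook Gaussian, not from any particular
paper), a symbolic check might look like this:

    # Sketch: re-derive a claimed integral symbolically instead of trusting it.
    import sympy as sp

    x, a = sp.symbols("x a", positive=True)

    claimed = sp.sqrt(sp.pi) / (2 * sp.sqrt(a))                 # value stated in a paper
    computed = sp.integrate(sp.exp(-a * x**2), (x, 0, sp.oo))   # re-derived here

    assert sp.simplify(computed - claimed) == 0
    print(computed)  # sqrt(pi)/(2*sqrt(a))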

There's still a lot of work to do, but I don't think things are completely
disastrous in physics. I recently had a colleague retire and he was very proud
of the fact that his data had never been wrong. The interpretation might have
changed with time, but the data itself was right. I think that's the best we
can hope for...

------
Tloewald
Clickbait from the BBC. "2/3 of scientists have failed to replicate a study"
is not a statement that reinforces the headline. If 2/3 of people have had
trouble sleeping that does not mean most people can't sleep.

If you've ever tried a simple science experiment out of a book, you know it's
easy to screw one up no matter how sound the underlying science is, just
because you messed something up. A bleeding-edge study is several degrees of
difficulty beyond that.

~~~
lutusp
> ... it's easy to screw one up no matter how sound the underlying science
> because you messed something up.

Yes, but when a replication fails, the question one wants to ask is whether it
was the original study or the replication effort that caused the problem. J.
B. Rhine's seemingly solid psychic-abilities studies couldn't be replicated,
but this was because he kept tossing results that didn't fit his expectations,
something that didn't come out until after he passed on.

------
ddebernardy
Might this be solvable if high profile journals began to only publish
_reproduced_ results (by independent teams)?

------
DrNuke
Room for a startup offering alternative / independent peer review, maybe?

------
huula
It's Schrödinger's cat. When you observe it, its state collapses.

------
SFJulie
Why are they called scientists then?

If you can't write a good modus operandi you are just a fraud.

If you cannot reproduce a well-written modus operandi, you are a fraud.

Thus take the failures, divide by 2 (the odds of wrong articles are about the
same as of wrong manipulators), and you get about 33% lousy scientists.

------
return0
I think the problem is concentrated in the life sciences; other disciplines
like physics have much stricter statistical significance criteria. I wonder if a
quick-but-dirty solution could be to just require stricter statistical
significance criteria.
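
To make the gap in criteria concrete, here is a rough comparison between a
conventional p < 0.05 threshold, roughly a two-sided 2-sigma cut, and the
one-sided 5-sigma convention used for discovery claims in particle physics:

    # Illustration: the p-values implied by 2-sigma vs. 5-sigma thresholds.
    from scipy.stats import norm

    p_two_sigma = 2 * norm.sf(2)   # two-sided, ~0.046 (close to the usual 0.05)
    p_five_sigma = norm.sf(5)      # one-sided, ~2.9e-7 (particle-physics discovery)

    print(f"2-sigma (two-sided): p = {p_two_sigma:.3g}")
    print(f"5-sigma (one-sided): p = {p_five_sigma:.3g}")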

------
edblarney
" more than two-thirds of researchers have tried and failed to reproduce
another scientist's experiments, research suggests."

I understand there can be many reasons for this - but that doesn't take away
from how incredibly damning this is.

------
_Codemonkeyism
Simple. You can only publish to Nature etc. if your results are replicated
twice by others and you open up all your raw data.

A short time later, only real science will be published.

The amount of published results will go down massively, and Nature and the
others will not make as much money.

Now you know the reason why publishers publish results that can't be
replicated.

Or listen to anything from Feynman about real science e.g.

[https://www.youtube.com/watch?v=tWr39Q9vBgo](https://www.youtube.com/watch?v=tWr39Q9vBgo)

~~~
_Codemonkeyism
This comment got voted down.

Days later Nature posts "No publication without confirmation"
http://www.nature.com/news/no-publication-without-confirmation-1.21509

------
droopyEyelids
>The reproducibility difficulties are not about fraud, according to Dame
Ottoline Leyser, director of the Sainsbury Laboratory at the University of
Cambridge. That would be relatively easy to stamp out. Instead, she says:
"It's about a culture that promotes impact over substance, flashy findings
over the dull, confirmatory work that most of science is about."

Well, maybe I'm too much of a layman, but that doesn't quite seem to add up.
Is not calling it fraud about protecting people's egos and saving face?

Or is it like if an accountant completely screwed up all his work and got the
numbers wrong, but it was because they were a buffoon, not a fraudster? I
guess that would need a different word than fraud.

~~~
Thriptic
A lot of it is people overhyping their results and cherry-picking their data
to fit a narrative. Can you blame them? You can literally build a career off a
paper or two published in Science or Nature.

Meanwhile, no one controlling funding sources or faculty appointments cares
that you did amazing, rigorous work if it leads to less interesting
conclusions. This is especially true if you generate null results, even though
this work may have advanced your field. This puts in place a dangerous
incentive system.

Another thing which is not mentioned is that the level of detail provided in
many methods sections in papers is not sufficient for adequately reproducing
the work. This can be due to word limit constraints or because people forget
to include or aren't aware of key steps which are impacting their results.
I've been on projects where seemingly irrelevant steps in our assay prep
significantly impacted the resulting experiment outcomes.

~~~
droopyEyelids
Do your motivations matter if you do incorrect work? In the example I gave, an
accountant can definitely face _heavy_ pressure from his employer to "make the
numbers work".

But if the numbers superficially "work" without adding up, who cares what the
motivation was? That is buffoonery or fraud.

~~~
DanBC
> Do your motivations matter if you do incorrect work?

YES! If I'm careless my results don't match the data and someone can catch my
mistake. If I'm trying to defraud you I'll fix this problem by making the data
fit, and it's much harder for people to find the mistake.

If I'm a careless accountant we can audit my spreadsheets and find the errors.

If I'm a crooked accountant I'll have deliberately hidden the "error" in shell
companies or offshore accounts, and this will be resistant to lower levels of
scrutiny.

------
lngnmn
Publications, funding and status are the goals, not validity. Science became
theology and metaphysics - juggling fancy concepts and theories,
producing chimeras from "advanced statistics" and probabilistic models, etc.
The main assumption is that no one would even try to validate or reproduce the
results, because they are too messy and too costly (labor-intensive).

"Scientist" and "researches" do not even realize that things like "dimensions"
are mere concepts and does not exist in reality. Most of "scientists" could
not tell an abstract concept from an existent phenomena. Most of them cannot
explain the Correlation Is Not Causation principle. They believe that time
really exist and could be slowed down or accelerated. Or it could be part of a
crystal.)

The "studies" contains naive logical flaws, non-validated references, use of
statistical inferences as logical inferences, use of statistics to establish
causality and even Excel errors (unwanted auto-correction, rounding etc.)

This is only the tip of the iceberg - there will be turtles^W unreplicable
metaphysics-like crap all the way down.

