
Survey of scientists sheds light on reproducibility crisis (2016) - nabla9
https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
======
keldaris
As a working scientist, I feel both sides of the problem every day. Most
papers I come across turn out to be difficult if not impossible to reproduce,
and I'm sure some of my own papers have fallen into that group at some point
(although given that I'm a theorist, what "reproducible" means can be a bit
fuzzy sometimes). At the same time, I'm confused every time I see people
wondering about the scale of the problem or what to do about it. There is
absolutely no mystery here whatsoever.

Scientists are generally fairly smart people. Put smart people in a
hypercompetitive environment and they will quickly identify the traits that
are being rewarded and optimize for those. What does academia reward? Large
numbers of publications with lots of citations; nothing else matters. So,
naturally, scientists flock to hip buzzword-filled topics and churn out papers
as quickly as they can. Not only is there absolutely no reward for easy
reproducibility, it can actually harm you if someone manages to copy you
quickly enough and then beat you to the next incremental publishable discovery
in the neverending race. This is absurd and counterproductive to science in
general, but such is the state of the academic job market.

Anyone who purports to somehow address the situation without substantially
changing the fundamental predicates for hiring and funding in the sciences is
just wasting their breath. As long as the incentives are what they are,
reproducibility will be a token most scientists pay tribute to in name only.

~~~
cgiles
Biology postdoc here, seconded.

But what really gets me is the disconnect between "most scientists agree there
is a reproducibility crisis" and "most scientists believe most of the papers
they read are basically true and reproducible". This was mentioned in the
survey and it conforms to my informal experience of attitudes.

I do not know how you square that circle. Maybe, because of the pressures you
mention, we are all supposed to engage in an informal convention of pretending
to believe most previously published work is true, was done correctly, and is
reproducible even if we know damn well how unlikely that is. I find it hard to
do.

One day the public is going to cotton on to all of this. I cringe every time I
hear extremely authoritative lectures on "what scientists say" about highly
politicized public policy matters. These are not my fields, but if they are as
prone to error, bias, irreproducibility, etc, as my own, I'd exercise some
humility. It is one thing for us scientists to lie -- errr, project
unwarranted confidence -- to each other for the sake of getting grants, but it
is quite another to do it directly to the public.

But when the public _does_ figure it out, what do you think will happen to
funding? It will get tighter, and make the problem even worse. We need reform
from within before the hammer falls, and quickly.

~~~
irq11
_”One day the public is going to cotton on to all of this. I cringe every time
I hear extremely authoritative lectures on "what scientists say" about highly
politicized public policy matters....I'd exercise some humility.”_

You shouldn’t be cringing. You should be educating people about how science
actually works, and how it simply _doesn’t matter very much_ whether any
particular paper is reproducible. It’s a straw-man argument, because most
papers aren’t _worth_ reproducing. I don’t know many good scientists who take
what they read (in any journal) at face value. If you do, you've been misled
somewhere during your training (fwiw, I also have a PhD). At best, papers are
sources of ideas. Interesting ideas get tested again. Most get ignored. Even
if a few hokum theories become popular for a while, eventually they’re
revealed for what they are.

The tiny percentage of subjects that rise to the level of public policy
discussion end up being so extensively investigated that reproduction of
results is essentially guaranteed. And yeah, you hear lots of silly noise from
university PR departments, but that stuff is a flash in the pan.

For example, nobody legitimate is in doubt of the broader facts of global
climate change or evolution or vaccination, even if 95% of (say) social
science results turn out to be complete bunk. Yet climate deniers, anti-
vaxxers and “intelligent design” trolls absolutely _love_ it when this
distinction is ignored, because it allows them to confuse the public on the
legitimacy of science _as a process._

~~~
inimino
It's true that science doesn't value every published paper equally, but it's
also true that publish or perish is creating ever-growing mountains of
worthless papers. This is a real problem, and drags the quality of everything
down.

Besides, the fact that there isn't one reputable journal in most fields that
remains untarnished by the replication crisis is both a practical problem and
a problem of public trust. A lot of this BS science is paid for directly by
the public's tax dollars, or else by their student loans. I wouldn't expect
the public to be so forgiving if 95% of it is bunk.

~~~
irq11
_“it's also true that publish or perish is creating ever-growing mountains of
worthless papers.”_

Is this true? Prove your claim.

_”This is a real problem, and drags the quality of everything down.”_

Let’s assume your first assertion is true. Is it automatically true that your
second claim follows? Why?

I see no evidence that the individual productivity of scientists has changed
much in the last 30 years, nor do I notice much of a change in the aggregate
quality of science. Crappy science existed hundreds of years ago, and it
continues to exist today. The main difference, as far as I can tell, is that
we have a lot _more_ scientists now.

In any case, these are just assertions, not arguments.

~~~
inimino
There have always been weaker scientists, but there hasn't always been the
economic incentive to publish in order to maintain a teaching position. This
is a relatively recent (few decades) thing and is due to structural factors in
academia and society. If you require proof, I'm not really sure what to say to
you, as evidence is not hard to find, but if you don't already see it, I'm
unlikely to change your mind.

~~~
irq11
If you want to claim that “publish or perish” (which, btw, has been a part of
academic life essentially forever) is somehow _recently_ affecting the volume
of papers being produced, you should be able to provide evidence of that in a
straightforward manner. One obvious test: is the per-capita rate of
publication increasing? (my experience says “no”, but I’m open to contrary
evidence.)

You have a hypothesis of what’s going on, but you’ve provided no evidence for
that hypothesis, and when challenged to provide some, you tell other people
it’s _their_ job to do it for you.

It’s not my job to prove your extraordinary claims.

~~~
inimino
The number of publications per year per academic seems to me to have increased
over the last 50 years. I don't have a citation.

Regardless, my original claim was that the absolute number of papers is
growing, and most of them are trash. I think the sheer volume of trash has
consequences that were not so serious 100 years ago, even if the percentage of
trash was the same. I strongly suspect the percentage of trash has been going
up, as well.

My argument is that "publish or perish" makes less and less sense the more
active scientists and researchers there are, even if the average quality and
the rate of publication per academic were constant, because the appetite and
rate at which research can be assimilated by society is limited, and does not
scale with population, while the number of scientists does.

I don't think these claims are extraordinary, and if you do, I'm not going to
go looking for extraordinary evidence to try to convince you. I don't think
I'm the only one that sees these effects, however.

~~~
irq11
_”The number of publications per year per academic seems to me to have
increased over the last 50 years. I don't have a citation.”_

Yeah, that’s not evidence.

The absolute number of papers _is_ growing - along with the number of working
scientists. There’s been huge growth in academic science since the 1970s.

------
javajosh
One factor that might increase reproducibility is to enhance negative
reputational effects of publishing an un-reproducible study. That is, it
should be so rare, so _outré_, to publish something unreproducible that your
professional reputation is severely damaged, and if you do it again, you will
leave academia.

It sounds harsh, but severe self-regulation is what prevents the catastrophic
failure of any institution that is really only answerable to itself.
Professors aren't doing any favors by rubberstamping weak papers, or giving
weak students PhD's (or BS's for that matter).

EDIT: down-votes and no replies, as expected. Hey academics, sometimes to save
the patient you have to amputate the limb. I care about science and want it to
survive, so you guys continuing to be nice to each other even when you publish
BS isn't serving that purpose.

~~~
flukus
The problem is that not being reproducible is not necessarily a failure and
does not necessarily indicate a bad actor. A study may be bunk, but it may
still show a worthwhile direction for others to investigate further. The
problems come when a non-reproducible study gains some authority, gets cited
in other studies, or is published in the media without proper vetting.

It's probably worth punishing people that consistently publish unreproducible
work, but it shouldn't be a rare career-destroying event like you are
suggesting.

~~~
javajosh
_> not being reproducible is not necessarily a failure and does not
necessarily indicate a bad actor_

I disagree with the first part and agree with the second. I would argue that
failure to reproduce is perhaps the worst kind of scientific failure, because
the activity cannot be called science anymore. And no, I don't mean anything
personal when I say 'failure', you don't have to be malicious to do bad
science -- after all 'doing bad science' is the _default_ human condition.

An analogy with classical music: a musician that fails to reproduce the score
with his violin consistently will lose his job quickly. It doesn't mean he's
a bad person, but it does mean he's not good enough for the orchestra. The
metaphor for the "reproducibility crisis" in an orchestra is what happens if
they let bad players stay, the orchestra sounds worse and worse, and finally
the audience stops coming. In the apocalyptic scenario, cultural forces cause
ALL orchestras to stop honestly evaluating the skill of their players, and all
orchestras simultaneously lose their ability to accurately play any but the
simplest music.

Standards are painful for those that can't meet them, but they make the world
better, overall.

~~~
inimino
You can judge a musician in a 20-minute audition, but you can't tell if
research will reproduce without doing the work. However, there are papers that
are so vague and poorly written that they cannot possibly be reproducible, and
this is becoming the norm in some fields, so I think you're right that
standards would help there.

Also, sometimes strange things just happen randomly (like the CERN faster-
than-light neutrinos) and there's no shame in publishing results that aren't
correct as long as you're honest about it. Some early-stage, low-power studies
are going to randomly show results that don't reproduce, and that's also fine.
Suppressing these results would hurt progress just as much as making the
opposite error.

------
filleokus
Kinda-OT but: during my studies I've read papers on everything from
organisational theory (MBA stuff) to pretty "pure" CS (like compiler
construction etc), and have in that process read so many papers, in many
different fields, that just don't provide any insight or are barely
comprehensible.

Of course, sometimes it can probably be blamed on the reader (me) for not
having enough insight, but many times the papers are just crap.

My naive imagination of academic papers before university was that they were
like those I now read in the __top__ journals / conferences in each field:
concise understandable writing with actual insights and clear methodology that
can be repeated. But papers like these are probably less than 1% of
everything published. Many papers seem to be almost approaching the level of
the Bogdanov theses [0].

To relate to the topic: I'm wondering if we'll see a shift in the academic
culture and system, where the pressure to publish is lowered (somehow), and
where focus is on quality and actually providing insight and reproducibility.

The worst possible thing is that bad science becomes influential on false
grounds, which needs to be avoided. But increasing the signal-to-noise ratio
is probably not a bad idea either. And I suspect that they will both turn out
to be improved by fixing the underlying incentive problems.

[0]:
[https://en.wikipedia.org/wiki/Bogdanov_affair](https://en.wikipedia.org/wiki/Bogdanov_affair)

~~~
allovernow
The bar for publication (like much of our media) used to be far higher.
Particularly with the push to get everyone through college, even at the cost
of lowering standards, there's a sort of regression to the mean with respect
to publication quality. It happens with any difficult, technical field that
initially attracts the more capable in society.

Hell, look at what the internet was back when it was still mostly nerds with
passion.

~~~
watwut
College students don't produce papers. Published papers and how many students
go through college are unrelated.

~~~
alexgmcm
But you have to go through college to do a PhD so I guess they are correlated.

------
allovernow
I really think there's a lost opportunity in the typical American master's
program. Typically, for a master of science you need to perform original
research for your thesis, ostensibly to prepare you for the process of
publication. But I think a master's could be a far more useful stepping stone
between undergraduate academics and PhD or industry research if it were taken
as a chance to have graduates reproduce important research. The grade would be
based on the merits of the reproduction, which would be an excellent final
introduction to research work, while also hugely benefiting the community by
verifying previous research. Right now there simply isn't any incentive to
reproduce anything but the biggest results.

------
ne0ncowb0y
Worth mentioning that scientific research, even if difficult to reproduce, is
still better than, say, superstitious crap. Looking at you, chemtrail antivax
climate deniers.

~~~
hyperdunc
Is it though? Some ideas are so obviously nonsense to most of us, whereas bad
research can have a more robust veneer of legitimacy - which makes that
research a potentially useful tool for bad actors. The internet is full of
authoritative sounding tripe backed up by "science".

~~~
asdff
Bad research doesn't usually hold up. Even good research doesn't always hold
up over time. Was Bohr a bastard for getting the atomic model wrong, or just
offering his best interpretation given what was known at the time?

~~~
pdonis
_> Bad research doesn't usually hold up._

Your faith in the reliability of the scientific process is touching, but
naive. Bad research is driving public policy in many areas.

 _> Even good research doesn't always hold up over time._

In the sense that it gets superseded by better models (as Bohr's original
model of the atom did), sure. That's to be expected in a healthy scientific
field. But one of the things that's needed to keep a field healthy in that
respect is the ability to do controlled experiments with high accuracy. That's
how scientists figured out that Bohr's original model of the atom wasn't right
-- it couldn't match the results of the experiments as they got more accurate.

------
macawfish
My feeling is that this reproducibility crisis points to deeper stuff just
waiting to be collectively understood about how to use math in experimental
sciences.

So many researchers rely on assumptions of linear dynamics. So many
experiments and studies are designed without consideration of observer
effects.

Is it any wonder that a model might fit one day but not the next?

There is so much to be learned from applied topology, dynamical systems theory
and (quantum) information theory, but methods from these disciplines are only
just barely starting to become more widely accessible.

~~~
skybrian
Making the model more complicated doesn't add external validity. It seems like
the problem often comes from trying to make do with collecting a minimum
amount of data, collected in the most convenient way. (For example, doing
experiments on local college students.)

It seems like a practice of at least collecting the data at two different
colleges might help? But this would require more coordination.

~~~
macawfish
Just because alternative methods aren't widespread, it doesn't necessarily
mean they are "more complicated", just that they rely on different assumptions.

I'm really excited, for example, about "model-free" time series methods
emerging from non-linear topological data analysis. Takens' embedding theorem
shows that low dimensional "shadows" of high dimensional attractors can be
constructed from a time series alone, and that they can reveal deep, useful
facts about the system a time series is part of, even if that system defies
geometric modeling. These kinds of methods are radically different from
conventional statistical methods.

see here:
[https://www.youtube.com/watch?v=NrFdIz-D2yM](https://www.youtube.com/watch?v=NrFdIz-D2yM)

Previously, researchers found themselves flailing around with geometry, trying
to come up with models that accurately described the dynamical nature of the
systems they were studying (e.g. fish populations). A model might work well
for a few years, then stop working. These topological methods cut through and
get straight to the underlying relationships that drive various observables in
a complex system, without relying on fickle geometric assumptions about how
exactly those relationships will be expressed.
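
For anyone curious, the core construction behind these methods (the delay
embedding from Takens' theorem) is only a few lines. A minimal sketch, with
the series, embedding dimension, and lag all chosen arbitrarily for
illustration:

```python
import numpy as np

def delay_embed(x, dim=3, tau=5):
    """Takens delay embedding: map a scalar series x to points in R^dim,
    where point k is (x[k], x[k+tau], ..., x[k+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

# Toy series: a noisy sine. Its delay embedding traces out a closed loop,
# recovering the cyclic "attractor" without fitting any geometric model.
t = np.linspace(0, 20 * np.pi, 2000)
x = np.sin(t) + 0.01 * np.random.default_rng(0).normal(size=t.size)
emb = delay_embed(x, dim=3, tau=25)
print(emb.shape)  # (1950, 3): 1950 embedded points in 3 dimensions
```

In practice the lag and dimension are chosen from the data (e.g. via mutual
information and false nearest neighbors) rather than hard-coded as here.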

------
stewbrew
IMHO the simple problems with reproducibility (of statistical studies) can be
solved with pre-registration, which should become a standard. The rest really
is a system crisis.

Imagine: Scientist A uses (a statistical) method M to assert X. People praise
A until scientist B uses method M and cannot assert X (which BTW usually
doesn't imply X is wrong). People wait for scientist C to use method M and re-
check assertion X. People then take the majority vote (or meta-analysis) for
true.

That's the way much of science works today. It becomes a problem only in the
following situations:

0\. People hunt for statistically significant results.

1\. X is mostly irrelevant to anyone who does not belong to the in-group of A,
B, C, and peers. So nobody else really notices or cares about X.

2\. There is no other way to observe any meaningful consequence of X other
than by method M -- which eventually results in #1.

3\. Method M is really expensive and complex, an almost impossible
undertaking, so B or C most likely won't get funding.

4\. Everything is ok, actually, but the people who pay for the study don't
care about A's methodological fineprint as long as the results play well with
their other goals.

The presented list of "factors that build reproducibility" focuses on #0, for
which pre-registration is a simple and clean solution. IMHO the list is much
too narrow and focused on academic practice, though.
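
The damage done by #0 is easy to see in a toy simulation: test many outcomes
on pure noise and report whichever comes out "significant", and the
false-positive rate inflates far above the nominal 5%. A minimal sketch
(sample sizes and counts invented for illustration):

```python
import random, math

def false_positive(n=30, z_crit=1.96):
    # One "experiment" on pure noise: the true effect is exactly zero,
    # so any significant result is a false positive. We test whether the
    # sample mean differs from 0, with known sd=1 for simplicity.
    xs = [random.gauss(0, 1) for _ in range(n)]
    mean = sum(xs) / n
    return abs(mean) * math.sqrt(n) > z_crit

random.seed(0)
studies = 10_000
outcomes_per_study = 10  # the "hunting": measure 10 things, report the best

honest = sum(false_positive() for _ in range(studies)) / studies
hunted = sum(
    any(false_positive() for _ in range(outcomes_per_study))
    for _ in range(studies)
) / studies

print(f"one pre-registered outcome: {honest:.2f}")  # roughly 0.05
print(f"best of 10 outcomes:        {hunted:.2f}")  # roughly 0.40
```

Pre-registration blocks exactly this move: the single outcome is fixed before
the data arrive, so the reported rate stays near the nominal 5%.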

------
ecmascript
Maybe because some of the "science" today is more about spreading some
political view rather than anything else.

For example, there are a lot of gender studies that are presented as a science
in my country which I believe is complete bullshit and an ideology.

People are pushing shit like that today and I believe we need more of the hard
sciences and less of the soft sciences.

------
m0zg
Another thing I'd like to ask of my fellow colleagues: please detail in your
papers, at least to some extent, the things that you've tried that didn't
work. I see this in my field (computer vision / deep learning) from time to
time, but very rarely.

There are typically ablation studies which aim to determine how much each of
the _successful_ improvements contributes to the result, but there's almost
never any mention of things that looked promising on paper but didn't pan out
in practice, nor is there any discussion of the reasons why, even though the
authors often have a good idea post-facto.

------
analog31
Reproducibility is referred to as a "crisis," but I'd like to know if it's
really a new thing. What if we tried to replicate the studies that surrounded
great developments such as electromagnetism or thermodynamics? Did those
advances emerge from an unbroken series of reproducible studies, or from a
tangle of good and bad results?

Before looking for root causes that invariably turn a cynical eye towards the
motivations of scientists, let's make sure the effects don't precede the
causes.

~~~
javajosh
I really like this idea of having students (not just advanced ones) reproduce
classic experiments (ideally _before_ getting the theory, so the class can
attempt to puzzle out what's happening). My sense is that early science was
very hands-on and readers of such results were also doers. Plus, doing the
experiment before theory puts them in the same spot as the scientist, and
might give them a little more respect for an achievement that is really quite
magical (and too often is taught in a kind of arrogant hindsight that implies
the discovery is/was sneeringly obvious to any human that draws breath).

------
Gatsky
Hmm... 40% of irreproducibility is due to fraud? Who in their right mind would
put in all the effort that science requires if they think everyone else's work
is commonly malicious bullshit?

Obviously this survey is biased towards people with a certain outlook.

~~~
fullshark
Academics thinking little of other academics is not surprising.

------
pochamago
Is there room for a genre of "failed to reproduce" journals? Contrarianism
seems like it might be powerful enough to support a journal whose only purpose
is to dispute major findings.

~~~
oldgradstudent
The problem is that it's very easy to fail to reproduce a result.

Maybe even easier than the p-hacking used to produce an irreproducible result.
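
One concrete way a replication can honestly fail: run it with too few
samples. In this toy simulation (effect size and sample sizes invented for
illustration), a real effect that a well-powered study detects almost every
time is missed about half the time by a small replication:

```python
import random, math

def significant(n, effect, z_crit=1.96):
    # One study: n samples from N(effect, 1), testing whether the mean
    # differs from 0 (known sd=1 for simplicity).
    xs = [random.gauss(effect, 1) for _ in range(n)]
    mean = sum(xs) / n
    return abs(mean) * math.sqrt(n) > z_crit

random.seed(1)
trials = 5000
effect = 0.5  # a real, medium-sized effect

power_large = sum(significant(80, effect) for _ in range(trials)) / trials
power_small = sum(significant(15, effect) for _ in range(trials)) / trials

print(f"n=80 replication detects the effect: {power_large:.0%}")  # ~99%
print(f"n=15 replication detects the effect: {power_small:.0%}")  # ~49%
```

So a "failed to reproduce" journal would need to demand at least as much
statistical power from the replication as the original had, or it would
mostly publish underpowered noise.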

------
mensetmanusman
This is why I went to industry after MIT: science is really real when it
manifests as a useful technology that someone will pay for. Also, I didn’t
want to spend so much time writing grants...

------
JumpCrisscross
Is a 30% reproduction rate unusual?

We know 19th-century science was productive. Was it similarly plagued by
irreproducibility?

I’m sceptical of this argument. But we need a baseline to render judgement.

------
chrisbrandow
As a chemistry PhD, I'd say this sure looks a lot like irreproducibility is
directly proportional to the complexity of the underlying phenomena being
studied.

------
sabas123
As somebody only used to CS research, can anyone explain to me how much the
cost of reproducing is factored into unreproducibility?

------
neutronman
There is a startup called [http://www.myire.com](http://www.myire.com) that is
working on this issue. They have a platform that provides an A-Z approach to
publishing research.

Their marketing is probably the worst in class but the CEO is a passionate
developer who has worked hard on this.

~~~
vermilingua
Really doubt that this is an issue that can be solved by one startup. It's a
procedural and cultural issue that affects the whole scientific world, not a
problem that can be solved by business or technology.

------
TheRealNGenius
From my experience, math/computer-science type research is almost always
exactly reproducible.

~~~
dragonwriter
> math/computer-science type research is almost always exactly reproducible.

Math, including computer science, is almost entirely pure logic, not empirical
science (despite the name of the latter), so it's not even in the same
epistemological domain where reproducibility is conceptually a concern.

------
naveen99
On the other hand we have the government saying they don’t want other
countries reproducing our results in AI, cryptography, genetics...

------
lowdose
How would the state of social science progress if every academic had all of
Facebook's data available to them?

------
throwno
Weird how there can be a reproducibility crisis, but also climate change is
100% unquestionable science.

------
agumonkey
there were a few articles mentioning the use of software like git, simple
formats, Jupyter notebooks, etc etc

is there anything of the sort that helped a bit?

very lightly funny to see this when nix and guix (and pure fp) are popping up
at the same time in a very different context.

------
tomlockwood
I wish more people would map their understanding of the reproducibility crisis
to things like, say, the Sokal Squared hoax. The current pressure to publish
regularly in academia is very high, and it results in poor work and journals
that don't do their due diligence, in every field.

------
m3kw9
Is there an incentive to make certain experiments harder to reproduce?

~~~
JorgeGT
One explanation that I've heard (often in terms of global south vs. global
north) is to prevent better-funded labs from quickly mirroring your
setup/method and, given their greater means/staff, depleting that line of
research, leaving your lab with no publications, patents, etc. and therefore
no money. Science as a whole may benefit, but individual scientists still
usually appreciate keeping their jobs.

~~~
asdff
Big labs don't work any faster than small labs; they just have more people to
chase more projects. Generally it's the same number of people per project as
in a small lab. Chances are if you got scooped it was just because someone got
started with the idea before you did. After all, the future directions at the
end of the paper are usually active research by the time you are drafting and
submitting for review.

Getting 'scooped' isn't so bad in biology at least. If you were thinking about
XYZ and someone publishes X and Z, great. Cite that paper, now you don't have
to bother so much validating X and Z and can focus on strengthening your Y
argument or adding argument W to your paper.

~~~
JorgeGT
> Big labs don't work any faster than small labs

Not true, at least in my field. If $BigLab can throw 100k CPU cores at a
simulation, it is going to finish way faster than $SmallLab's 1k. If $BigLab
builds 10 dedicated test rigs, they are going to finish that parametric study
way faster than $SmallLab with 1 shared test rig.

Getting scooped may not be a big problem for papers (except when you start
getting rejections for lack of novelty), but it is a big problem for patents,
licenses, royalties and such things that allow small labs to survive when
their governments have little to no money to spend.

------
cft
Note a spike in Earth science at 30% confidence. Probably reflects a lot of
vested interest funding politically beneficial results in this discipline.

------
MilnerRoute
This is from 2016.

Can we get one of the moderators to add (2016) to the headline?

~~~
tlb
Done, thanks.

------
jokoon
I guess it's time to evaluate and grade scientific findings?

It would be interesting to see how reproducibility varies by field,
university, country, etc. Although I guess scientific reputation would already
give a clue over the quality of scientific work?

