

Psychologists Strike a Blow for Reproducibility (2013) - jcr
http://www.nature.com/news/psychologists-strike-a-blow-for-reproducibility-1.14232/

======
a_bonobo
This article is a bit old (26 November 2013); for recent news on
reproducibility, check the Reproducibility Initiative.

It's a movement by several companies and groups.

Nature recently joined it, so if you want to publish a paper next year with
them you have to at least complete the checklist to make reproducibility
easier:
[http://www.nature.com/nnano/journal/v9/n12/full/nnano.2014.2...](http://www.nature.com/nnano/journal/v9/n12/full/nnano.2014.287.html)

They have also started to reproduce volunteers' publications; here's a very
recent paper detailing their replication efforts:
[http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjourna...](http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0114614)

Article summarizing the paper: [http://www.nature.com/news/parasite-test-
shows-where-validat...](http://www.nature.com/news/parasite-test-shows-where-
validation-studies-can-go-wrong-1.16527)

------
Fede_V
Yeah - this is a sample of the 'gold standard' of psychology papers, and only
10/13 could be reproduced.

The reasons for shoddy reproducibility are p-value hacking, intense pressure
to publish at all costs, and a premium on 'gladwellesque' results where a
simple theory seemingly explains a lot.

Andrew Gelman and Uri Simonsohn have both written a lot about this.
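
To see why p-hacking is so corrosive, here's a toy simulation (my own sketch,
not taken from any study discussed here): if a "study" measures 20 independent
outcomes under a true null and reports whichever one clears p < 0.05, the
false-positive rate is roughly 1 - 0.95^20 ≈ 64%, not the nominal 5%:

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(42)
    n_studies, n_outcomes, n, alpha = 2000, 20, 30, 0.05

    hits = 0
    for _ in range(n_studies):
        # The null is true for every outcome: both groups come from the
        # same distribution, so any "significant" result is a false positive.
        a = rng.normal(size=(n_outcomes, n))
        b = rng.normal(size=(n_outcomes, n))
        p = ttest_ind(a, b, axis=1).pvalue
        hits += p.min() < alpha  # report only the "best" of 20 outcomes

    print(f"false-positive rate with cherry-picking: {hits / n_studies:.2f}")
    # prints roughly 0.64 (about 1 - 0.95**20), versus the nominal 0.05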

~~~
ekidd
_Yeah - this is a sample of the 'gold standard' of psychology papers, and only
10/13 could be reproduced._

Having watched scientists at work, I'd say reproducing 10 of 13 high-profile
studies sounds pretty reasonable. Given p≤0.05, you'd expect 19/20 to be
reproducible. Then you add in other factors:

- Surprising positive results get published more readily than negative ones.

- Some results may be sensitive to tiny changes in methodology.

- And more: "Why Most Published Research Findings are False"
[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/)

Some of these effects can be reduced with pre-registration of studies, and
with other methodological improvements. But in general, even when everybody
plays by the rules, a fair bit of garbage is going to slip through, because
it's _hard_ to eliminate all sources of error and bias.

So when a scientific field can say, "Hey, 3/4ths of our really interesting
results are real!", that's about what I expect when the process is _working_.

~~~
capnrefsmmat
> Given p≤0.05, you'd expect 19/20 to be reproducible.

No, that's not the case; follow your link to "Why Most Published Research
Findings are False" to see why that's not the case. The "positive predictive
value" of a discovery, even without any bias, is very different from what the
p value would imply.
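
Concretely, in Ioannidis's terms: if a fraction \pi of tested hypotheses are
actually true, tests have power 1-\beta, and the significance threshold is
\alpha, then (writing the paper's pre-study-odds formula in probability form)

    PPV = \frac{(1-\beta)\,\pi}{(1-\beta)\,\pi + \alpha\,(1-\pi)}

The p value only controls \alpha; unless \pi is fairly large, the PPV lands
well below the "19/20" intuition.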

~~~
ekidd
Yes, this is an excellent point. To take an extreme example, if there are no
true, interesting results to be discovered in a field of research, then any
study claiming such a result is by definition false. Analogously, if none of
your employees do drugs, all positive drug tests are false positives.
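
A minimal numeric sketch of that base-rate logic (the priors and power below
are illustrative assumptions, not values from any study in the thread):

    def ppv(prior, power=0.8, alpha=0.05):
        """Probability that a 'significant' result reflects a true effect."""
        true_pos = power * prior
        false_pos = alpha * (1 - prior)
        total = true_pos + false_pos
        return true_pos / total if total else 0.0

    for prior in (0.0, 0.01, 0.1, 0.5):
        print(f"prior={prior:>4}: PPV={ppv(prior):.2f}")
    # prior= 0.0: PPV=0.00  <- no drug users, so every positive is false
    # prior= 0.1: PPV=0.64  <- far from the 0.95 a p-value seems to promise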

------
alexandros
While I'm very happy to see more attempts at replication, I am quite shocked
to hear the grandstanding that "reproducibility is not as much of a problem".

Seen as a meta-experiment, it is incredibly weak, with a tiny, biased sample:
high-profile results whose procedures are simple enough to be combined into a
single study.

As such, it can't come even close to supporting what they're (whoever "they"
actually are; it may not be the actual researchers) trying to claim. What
about less high-profile results? What about more complex setups? If 20% of
your most rock-solid results are not reproducible, I wouldn't be so quick to
celebrate.
Imagine if 20% of basic physics or maths results weren't actually true...

~~~
tmalsburg2
Why did they include studies that have already been replicated innumerable
times? There is no value in doing so, apart from biasing the results toward an
outcome more favourable to psychology.

~~~
darkxanthos
It's important to read the study's goals... But I agree. If we really want to
understand how well studies are being conducted, there should be a random
sampling of studies of different behavioral effects.

------
riffraff
This could be equivalently titled "20% of the most well known psychology
results are impossible to reproduce".

Also, among the ones that did reproduce are the Kahneman ones, and _he_ was
the one to point out that most experiments are never reproduced, so there was
a higher chance that his results would be reproducible.

~~~
stdbrouw
Your equivalent title would be equally misleading. A finding is not
"impossible to reproduce" or bogus just because one particular replication
attempt fails to lead to identical conclusions.

~~~
riffraff
I understood the point to be that this was not "one particular attempt" but
rather a larger set of attempted reproductions; have I misread?

~~~
stdbrouw
I guess it's debatable. You could consider a single experimental design,
administered by many different labs across many different countries, to be
"many replications", or you could argue that by using the exact same
questionnaire and keeping other conditions as comparable as possible, it's
just one big geographically dispersed replication.

(If you look at [http://www.talyarkoni.org/blog/wp-
content/uploads/2013/12/ma...](http://www.talyarkoni.org/blog/wp-
content/uploads/2013/12/manylabsresults.png) you see that the results for the
3 "failed" experiments did see some effect at some labs. It's only when
tallying up all the results that they had to conclude that replication had
failed for these three.)
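
For what "tallying up" can look like in practice, here's a minimal sketch of
fixed-effect (inverse-variance) pooling, with made-up per-lab numbers; the
Many Labs paper's actual aggregation may differ. A couple of labs can show a
nominal effect while the pooled confidence interval still spans zero:

    import numpy as np

    def pooled_effect(effects, ses):
        # Weight each lab's estimate by the inverse of its variance.
        w = 1.0 / np.asarray(ses) ** 2
        est = np.sum(w * np.asarray(effects)) / np.sum(w)
        se = (1.0 / np.sum(w)) ** 0.5
        return est, se

    # Hypothetical labs: one sees a clear effect, the rest see little.
    est, se = pooled_effect([0.40, 0.05, -0.10, 0.02, 0.08],
                            [0.15, 0.10,  0.12, 0.09, 0.11])
    print(f"pooled: {est:.3f} +/- {1.96 * se:.3f}")  # ~0.059 +/- 0.096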

This isn't just semantics, though. For example, if one question on the
questionnaire influences how survey takers answer subsequent questions, then
we might well fail to replicate the results of an older study. You might then
conclude that the older study isn't as generalizable as you thought it would
be and that you need more research to figure out when exactly the effect
occurs or not. You might even say that it's likely that the original study was
just a fluke. But it doesn't mean the original hypothesis has been
definitively and unequivocally refuted.

~~~
darkxanthos
On the idea of replicating the experiment: replicating it as closely as
possible is exactly what the statistical methods assume. The other concept
you're speaking to seems more like establishing causation, or just generally
understanding when the observed effects do or do not occur. That seems better
as a separate study altogether.

~~~
stdbrouw
You're right. I was referring specifically to the fact that in this
replication study, they've actually bunched together many different studies
into a single questionnaire that tests them all. So it's not a straight-up
replication, but then, in the social sciences replications almost never are.

------
tokenadult
The submission is from 26 November 2013. (I read the article when it was first
published and have read related articles about the Many Labs Replication
Project before.) The article kindly submitted here is by experienced science
writer Ed Yong and links to some helpful background reading, including his
report about Daniel Kahneman's open letter from 2012.

 _The Journal of Open Psychology Data_ published findings of the replication
study mentioned here,[1] and the PsycNET site of the American Psychological
Association provides a citation to the published version of the study findings
in a psychology journal.[2] Improving replicability in science is an ongoing
effort not just in psychology but in most branches of science, and is
critically important in medical studies.

[1]
[http://openpsychologydata.metajnl.com/article/view/jopd.ad/1...](http://openpsychologydata.metajnl.com/article/view/jopd.ad/11)

[2]
[http://psycnet.apa.org/journals/zsp/45/3/142/](http://psycnet.apa.org/journals/zsp/45/3/142/)

------
darkxanthos
All of their data and methods are online here:
[https://osf.io/ebmf8/](https://osf.io/ebmf8/)

This is a very well done study, and almost every criticism I've read in the
comments is addressed in the write-up.

------
A_COMPUTER
The reproducibility initiative and its supporters have been called
"replication bullies."

[http://www.sciencemagazinedigital.org/sciencemagazine/23_may...](http://www.sciencemagazinedigital.org/sciencemagazine/23_may_2014?folio=788#pg16)

I kind of see their point, but in the end if your study can't be replicated,
you have to take your lumps.

------
UhUhUhUh
<rant> And I think that psychology began to die when it became obsessed with
statistics. Researching the mind has become an endless, bottom-up process with
very few ideas and no grand idea at all. I'm not even talking about
practice... Being a psychologist myself, I'm bored through my skull with this
illusion of objectivity (the general linear model is only a theory) and the
pathetic little results it methodically cranks out. The Rorschach is a test
with very low statistical properties. I love it; it is useful. And I can prove
it one case at a time. That's why I am learning computing and leaving the
tedious gravy train of meta-analyses, rotated factor analyses and manualized,
empirically validated methods behind me.</rant>

~~~
jules
Great idea, let's go on our gut feelings rather than science. Let's apply this
to other fields too. Driving a car will become much more exciting when it can
explode at any second. Homeopathy is medicine with very low statistical
properties, but it's so much fun! I love it, it's so useful! And the car
engineers and pharmaceutical scientists will no longer have to do that boring
math and statistics. Let's leave empirically validated methods behind us!

Hint: if the statistics show that the results are pathetic, then the results
_are_ pathetic. Ignoring the statistics merely sweeps that under the rug; it
doesn't fix it.

~~~
UhUhUhUh
An even better idea: let's live according to probabilities. That will be fun
too. And there are tons of positive results that are pathetic. Study shows
that impact of hammer on foot causes sensation of pain compared to control
group (p<.001). Obsession with statistics is the culprit, not statistics. It's
just a tool like any other. Not a TOE.

~~~
pmoriarty
Like many "soft sciences", psychology has had a serious case of physics envy
since at least the days of Freud, and it has only gotten worse since. There
were some brief respites now and again, with the likes of Jungian and
Humanistic psychology. But these have largely been stamped out by scientistic,
mechanistic, reductionist approaches.

