
The Unintended Consequences of Trying to Replicate Research - tokenadult
http://www.slate.com/articles/technology/future_tense/2016/04/the_unintended_consequences_of_trying_to_replicate_scientific_research.html
======
aab0
Doesn't wash. If replication attempts are subject to publication bias, we are
still better off because the subsequent meta-analyses overcome sampling error
better and are able to show publication bias using funnel plots or other
techniques like p-uniform, and sometimes can even correct the bias to give you
a less biased and more accurate estimate. In contrast, you cannot show
publication bias for a particular novel result nor can you easily correct for
it. So given the choice between 5 studies on a single hypothesis (all
afflicted by publication bias) and 5 studies on 5 hypotheses (with publication
bias), you are better off in the former scenario. (In the latter scenario, you
could try to use informative priors estimated from field-wide demonstrations
of publication bias, but at least so far, this is an extremely unpopular
approach.)
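The core of the argument above can be made concrete with a minimal sketch (Python stdlib only; the effect size and per-study standard error are assumed numbers, not from the article): inverse-variance pooling of k estimates of the *same* hypothesis shrinks the standard error by sqrt(k), which is what lets a meta-analysis of replications overcome sampling error in a way that 5 one-off studies of 5 different hypotheses cannot.

```python
import random
import statistics

random.seed(0)

TRUE_EFFECT = 0.2   # hypothetical true effect size (assumption)
SE = 0.15           # assumed per-study standard error
N_STUDIES = 5

# Simulate 5 replication estimates of the same hypothesis.
estimates = [random.gauss(TRUE_EFFECT, SE) for _ in range(N_STUDIES)]

# With equal variances, fixed-effect (inverse-variance) meta-analysis
# reduces to the plain mean, and the pooled SE shrinks by sqrt(N).
pooled = statistics.mean(estimates)
pooled_se = SE / N_STUDIES ** 0.5

print(f"single-study SE: {SE:.3f}")
print(f"pooled estimate: {pooled:.3f} +/- {pooled_se:.3f}")
```

Detecting the bias itself (funnel-plot asymmetry, p-uniform) needs the studies' precisions as well, but the precision gain above is the part that a collection of unrelated novel results can never give you.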

~~~
jessriedel
Yes, a much better headline would be "The value of study replications is
positive but suppressed without some combination of pre-registration and
result-blind refereeing".

~~~
frozenport
Pre-registration doesn't help because it simply gives the authors a narrative
to fake. As pointed out in the article, result-blind refereeing is officially
claimed by most journals, although from the rejections I have received this
does not appear to be the case in practice.

~~~
jessriedel
> Pre-registration doesn't help because it simply gives the authors a
> narrative to fake.

Like timroy, I don't know what your objection to pre-registration is. I'm
definitely talking about pre-registering the analysis, not just the data
collection.

> As pointed out in the article, result-blind refereeing is officially claimed
> by most journals,

Did I miss something? That doesn't seem to be what the article is saying:

> One proposed remedy is to modify the peer review process so that reviewers
> grade manuscripts on the quality of their introductions and methods sections
> rather than the novelty of the findings. Similarly, journals could assess
> submissions based solely on the rigor of their methods—as does Plos One.

PLoS One is an anomaly. This is not normal.

------
jessriedel
Tangential point: Because of frictions and biases, the need for high-powered
and replicated research is even greater if you want to translate the science
into real-world action. I highly recommend this post by Holden Karnofsky of
GiveWell (and formerly of Bridgewater):

[http://blog.givewell.org/2016/01/19/the-importance-of-gold-standard-studies-for-consumers-of-research/](http://blog.givewell.org/2016/01/19/the-importance-of-gold-standard-studies-for-consumers-of-research/)

GiveWell is an organization that tries to estimate the impact per
philanthropic dollar of various charities, and they partially rely on
controlled trials of health interventions in the developing world.

> Chris Blattman worries that there is too much of a tendency toward large,
> expensive, perfectionist studies, writing: "...each study is like a lamp
> post. We might want to have a few smaller lamp posts illuminating our path,
> rather than the world’s largest and most awesome lamp post illuminating just
> one spot. I worried that our striving for perfect, overachieving studies
> could make our world darker on average."

> My feeling – shared by most of the staff I’ve discussed this with – is that
> the trend toward “perfect, overachieving studies” is a good thing...

> _Bottom line_. Under the status quo, I get very little value out of
> literatures that have large numbers of flawed studies – because I tend to
> suspect the flaws of running in the same direction. On a given research
> question, I tend to base my view on the very best, most expensive, most
> “perfectionist” studies, because I expect these studies to be the most fair
> and the most scrutinized, and I think focusing on them leaves me in better
> position than trying to understand all the subtleties of a large number of
> flawed studies.

> If there were more diversity of research methods, I’d worry less about
> pervasive and correlated selection bias. If I trusted academics to be
> unbiased, I would feel better about looking at the overall picture presented
> by a large number of imperfect studies. If I had the time to understand all
> the nuances of every study, I’d be able to make more use of large and flawed
> literatures. And if all of these issues were less concerning to me, I’d be
> more interested in moving beyond a focus on internal validity to broader
> investigations of external validity. But as things are, I tend to get more
> value out of the 1-5 best studies on a subject than out of all others
> combined, and I wish that perfectionist approaches were much more dominant
> than they currently are.

------
itsdrewmiller
Seems like replication attempts should have reverse publication bias by
default - they are much more interesting when they contradict the original
paper.

------
egjerlow
I really look forward to the day (because it seems it must come) when the
whole publication system is restructured into an open system. I have no idea
what it will look like (i.e., how merit will be calculated in such a system),
but for the reasons mentioned in this article and several others, I think it
will be a glorious day for science!

------
frozenport
At the heart of the problem is that "replication" experiments face the same
"novelty" or "sensationalist" demands as regular research papers.

It's like the difficulty in creating "anti-art" and finding it exhibited in a
museum.

We need to get rid of this format and focus on deliverables, and on more
modern forms of discourse such as collective knowledge curation: wiki pages,
git repositories, or patents.

~~~
collyw
I was thinking that this might cause some sort of self-correcting mechanism to
come into place. If a study is useful, and others want to build upon it, it
will need to be replicated.

If, on the other hand, it's just a novelty study that doesn't lead to any
further studies (and happens to be flawed), it will be forgotten about.

Curious if others think this would be the case?

------
ewjordan
An overaggressive pattern recognizer coupled with an overaggressive double-
checker (something that pattern matches for incorrect results) is still a
stronger system than the overaggressive pattern matcher alone. And depending
on the shape of the data, it can be far more effective than a single
statistically correct pattern recognizer. There's an interesting bit of math
here that I'm not sure has been laid out in any accessible fashion yet, but
I'd have to check around a bit.

The human brain more or less works because of this - it overtrains on every
signal it takes in, recognizing patterns that it has no statistical
justification to pick out. But since it's simultaneously (over)training the
process of _rejecting_ those false patterns, it all works out okay, and it's
actually far more effective than if it only made "proper" inferences from the
data coming in, at least in the world we live in.
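One way to make the composition concrete (a toy illustration with assumed detection rates, not the math the comment alludes to): chain an overaggressive detector with an independent overaggressive veto stage and compare precision against the detector alone. True-positive rates multiply, but so do false-positive rates, and the false-alarm rate falls much faster.

```python
BASE_RATE = 0.10   # assumed fraction of inputs containing a real pattern

# Overaggressive detector: catches nearly everything, many false alarms.
DET_TPR, DET_FPR = 0.99, 0.30
# Overaggressive independent double-checker applied to the detector's hits:
# keeps most real hits, vetoes most false alarms.
CHK_KEEP_TRUE, CHK_KEEP_FALSE = 0.95, 0.10

def precision(tpr, fpr, base=BASE_RATE):
    """Fraction of flagged items that are real patterns."""
    hits = base * tpr
    false_alarms = (1 - base) * fpr
    return hits / (hits + false_alarms)

p_alone = precision(DET_TPR, DET_FPR)
p_combined = precision(DET_TPR * CHK_KEEP_TRUE, DET_FPR * CHK_KEEP_FALSE)
print(f"detector alone:    precision {p_alone:.2f}, recall {DET_TPR:.2f}")
print(f"with double-check: precision {p_combined:.2f}, "
      f"recall {DET_TPR * CHK_KEEP_TRUE:.2f}")
```

With these assumed numbers, precision roughly triples while recall drops only a few percent; whether this beats a single well-calibrated recognizer depends on the base rate and the independence of the two stages, which is presumably the interesting math.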

~~~
furyofantares
> There's an interesting bit of math here that I'm not sure has been laid out
> in any accessible fashion yet, but I'd have to check around a bit.

I'd love it if you would.

------
nonbel
This is more of a problem if you use a weak definition of replication such as:
"statistically significant effect in the same direction". If instead the
effect sizes need to be similar to each other (+/- some uncertainty), then a
correspondence between the multiple results means much more.
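The two criteria can be sketched as follows (a hypothetical illustration: the effect sizes, standard errors, and the z-test on the difference of two independent estimates are all illustrative assumptions, not anything specified in the thread):

```python
import math

def weak_replication(orig_d, rep_d, rep_se):
    """Weak criterion: replication is significant (p < .05)
    in the same direction as the original."""
    z = rep_d / rep_se
    return abs(z) > 1.96 and (rep_d > 0) == (orig_d > 0)

def strong_replication(orig_d, orig_se, rep_d, rep_se):
    """Stronger criterion: the two effect sizes agree within
    sampling error (difference not significant at p < .05)."""
    z_diff = (orig_d - rep_d) / math.sqrt(orig_se**2 + rep_se**2)
    return abs(z_diff) < 1.96

# Original claims d = 0.80; a large replication finds only d = 0.15.
print(weak_replication(0.80, 0.15, 0.05))          # → True  (significant, same sign)
print(strong_replication(0.80, 0.10, 0.15, 0.05))  # → False (effect sizes disagree)
```

A much smaller but still "significant, same-direction" effect passes the weak test while clearly failing the effect-size test, which is the gap the comment is pointing at.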

Of course, if people will just game this system and only publish results that
fit _whatever_ criteria, then you will get a biased literature. Requiring a
more precise range of results to qualify just makes gaming more difficult.

I don't see how people gaming the system is a consequence of replicating each
others results though. That seems like it is due to deeper cultural problems.

