
Scientists Replicated 100 Psychology Studies, Fewer Than 50% Got the Same Results - workerIbe
http://www.smithsonianmag.com/science-nature/scientists-replicated-100-psychology-studies-and-fewer-half-got-same-results-180956426/
======
mc32
One of the things that bothers me about the news, say, npr, is that they often
cite studies to bolster their positions, but they never mention the confidence
in the studies or alternate theories which may explain some of the undesired
state. No, it's always framed in terms that will whet the appetites of their
middle-brow supporters and pander to their pet agendas, be it water rationing,
gentrification, organics, immigration issues, taxation, etc.

I mention npr because I listen to them but I'm assured by others it's the same
case with news on the right and abortion, religion, guns and the like.

Is there a way to escape the echo chambers?

~~~
joshstrange
If you find a way then let me know...

I find the best I can do is listen to/read multiple sources (even ones that I
strongly disagree with most of the time) and try to separate bias from facts.
This is by no means a guaranteed way to get the real story as I'm sure even
while trying my personal biases push me to believe some things as "facts" that
others would not. It reminds me of a character in the book "Foundation" who
says he reads various historians' accounts of events and from those tries to
decide what really happened. When pushed on this and asked why he doesn't
visit the places himself to see which account is right, he is taken aback and
responds with something along the lines of "how crude," all while clinging to
the notion that what he does is the "scientific method".

Great things can be done by building on those who came before you (I do it
daily when I write code, even though I couldn't begin to write ASM and don't
have a good mental model of the lower-level stuff beneath where I work), but there is a
danger in that as well. I worry about the "echo chamber" quite a bit (as I
used to hold wildly different views when I was growing up and living in the
echo chamber that was my household/parents) and I worry that the same thing
could be happening again.

Sometimes I wonder if Aaron Swartz had it right by ignoring the news [0]
altogether... Voting is where I disagree with him as while a "voting guide"
would be nice, finding a completely unbiased one is probably impossible. The
closest I think you can get is a quiz/survey on stances you personally take
and then matching you up with politicians that vote the same way
(consistently, which is a whole other issue) like ISideWith [1].

[0]
[http://www.aaronsw.com/weblog/hatethenews](http://www.aaronsw.com/weblog/hatethenews)

[1] [https://www.isidewith.com/](https://www.isidewith.com/)

~~~
marcosdumay
> I find the best I can do is listen to/read multiple sources (even ones that
> I strongly disagree with most of the time) and try to separate bias from
> facts.

Journalism is an area where past performance is relevant. Verify some of the
reported facts yourself, and you'll have a nicely selected set of sources that
you can trust to some degree.

~~~
romaniv
This is why we need effective media watchdogs that record past performance in
a consumable fashion. I don't know (m?)any websites that do it well, though.

------
arca_vorago
After spending time sysadmining in biotech and as a side-effect reading lots
of papers in the field, my confidence in the validity of many scientific
papers has shot way down. Coming from outside the scientific/academic arena,
my view is that the system has become a resume/job padding tool for far too
many people, who have simply learned how to A) hit all the main logic points
the publishers/journals look for so they get published, and B) find ingenious
ways to get their names attached to papers even if they were barely involved
at all.

So tired of hearing "X has been cited in over Y Z-journal papers!" as a way of
touting someone's scientific merit.

~~~
DanAndersen
I suppose it's like any other metric (like software developers preening their
GitHub activity logs so they always look active and constantly working). With
so many PhDs and not enough positions, those in charge of hiring need ways to
quantify the relative worth/ROI of applicants. This leads to a focus on things
like impact factor, which correlate with but don't equate to what we actually
care about. As with everything, you get what you measure.

I'd like to learn more about the history of the Scientific Revolution, the
Royal Society and all that. Probably the politics and human nature are constant
(the Newton/Leibniz conflict is well known), but I wonder if there was an impact
on the quality of science by the fact that a lot of the research was done by
old rich aristocrats whose livelihoods didn't depend on their scientific
results -- although I guess their reputations did depend on it. Is that
similar to or different from modern academics with tenure? Does the fact that
there's a tenure selection process end up selecting good academics, or those
who can work the system?

------
jtedward
Hello everybody, I was an intern at the Center for Open Science, which organized
this reproducibility study. It's kind of strange to me that the COS's first
mention on HN is this study when they have been working on a large open source
project for a few years now, but it's obviously a welcome surprise. Anyway
check it out,
[https://github.com/CenterForOpenScience/osf.io](https://github.com/CenterForOpenScience/osf.io)

------
afarrell
Here is the paper (with ~190 co-authors!):
[http://www.sciencemag.org/content/349/6251/aac4716](http://www.sciencemag.org/content/349/6251/aac4716)

------
dj-wonk
I wish more scientists studied meta-analysis
([https://en.wikipedia.org/wiki/Meta-analysis](https://en.wikipedia.org/wiki/Meta-analysis)), an area of study
specifically tasked with the challenge of synthesizing various study results.
Roughly speaking, each study is treated as a (weighted) observation and then
combined using statistics.

To put it another way, interpreting empirical results requires (or should
require) statistics from bottom to top.
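
To make that concrete, here is a minimal sketch of a fixed-effect
(inverse-variance-weighted) meta-analysis in Python; the effect sizes and
standard errors are made-up illustration numbers, not taken from any real
studies:

    # Toy fixed-effect (inverse-variance) meta-analysis; numbers are illustrative only.
    import numpy as np
    from scipy.stats import norm

    effects = np.array([0.30, 0.12, 0.45])  # hypothetical per-study effect estimates
    ses = np.array([0.10, 0.08, 0.20])      # hypothetical per-study standard errors

    weights = 1.0 / ses**2                   # more precise studies get more weight
    pooled = np.sum(weights * effects) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))
    z = pooled / pooled_se
    p = 2 * norm.sf(abs(z))                  # two-sided p-value for the pooled effect

    print(f"pooled effect = {pooled:.3f} +/- {pooled_se:.3f}, p = {p:.4g}")

A random-effects model would additionally estimate a between-study variance and
fold it into each weight, which is usually more appropriate when the studies
differ in design or population.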

~~~
astazangasta
To put it yet another way, overemphasis on statistics is a huge part of the
problem.

It is extremely difficult to get statistics right. Am I applying a relevant
test? Even if it is relevant, is it too sensitive? Is my dataset unbiased or
have I controlled for all of the significant biases?

Multiple test corrections are also some of the least useful statistical
procedures you can perform. They're basically only good for reducing confidence
to reasonable levels. Hundreds of bad tests cannot yield a useful scientific
inference through sheer volume.
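
As a toy illustration of what such a correction actually does (hypothetical
p-values, plain Bonferroni): it simply raises the bar each individual test must
clear; it cannot turn many weak results into a strong one.

    # Bonferroni correction applied to a batch of hypothetical p-values.
    p_values = [0.004, 0.03, 0.04, 0.20, 0.45]  # made-up results from 5 tests
    alpha = 0.05
    per_test_alpha = alpha / len(p_values)      # each test must now clear 0.01

    significant_before = sum(p < alpha for p in p_values)
    significant_after = sum(p < per_test_alpha for p in p_values)
    print(f"significant before correction: {significant_before}, after: {significant_after}")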

Meta-analysis is interesting, but it's ultimately a shoddy attempt to patch
over the much larger problem of widespread methodological failure. We can't do
bad science for decades and then expect meta-analysis to yield new insights;
at best it can serve to expose the poor quality of that work.

~~~
dj-wonk
> Meta-analysis is interesting, but it's ultimately a shoddy attempt to patch
> over the much larger problem of widespread methodological failure.

I think I know what you are getting at. Let me try to unpack what I mean, and
I would like to see if you would agree...

I agree that "widespread methodological failure" is a problem with science in
the real world.

However, by your writing, it would seem that you attack meta-analysis as a
technique.

1. Meta-analysis (MA) is a solid technique and not shoddy. I'm not saying it
works equally well in all situations, but I am saying that it works better
than not using it. :)

2. MA cannot "correct" one underlying study that is flawed. (This is obvious;
doing statistics with a sample size of one is silly.)

3. Nor can MA correct for a systemic problem in underlying studies.

4. However, in the same way that statistics can be resilient to random
errors in measurement, MA _can_ handle studies of varying quality, provided
that they do not have significant errors pointing in the same direction.
(This is the often claimed, but less-often tested, assumption that errors are
randomly distributed.)

------
thaumaturgy
The Reddit thread on this article is pretty good:
[https://www.reddit.com/r/science/comments/3imphg/scientists_...](https://www.reddit.com/r/science/comments/3imphg/scientists_replicated_100_recent_psychology/)

There's a comment from one of the paper's co-authors that will probably need
repeating here
([https://www.reddit.com/r/science/comments/3imphg/scientists_...](https://www.reddit.com/r/science/comments/3imphg/scientists_replicated_100_recent_psychology/cuhtyv6)):
"To those wanting to dismiss psychological science as "cult science" based on
these findings, note how ironic your response is. You're discrediting the very
people whose data you are using to back up your claim. This massive,
groundbreaking project was conducted on psychological science by psychological
scientists. In my view, psychological scientists are among the most dedicated
and rigorous scientists there are. No other field has had the courage to
instantiate a project like this. And I am sure that many of you would be
shocked to find out how low the reproducibility rates are in other fields.
Problems of non-reproducibility, publication bias, data faking, lack of
transparency, and the like plague every scientific field. The people you are
labeling as "cult" scientists are leading the movement to improve science of
all types in a much needed way."

Also, the current top comment quotes the following from the paper: "Any
temptation to interpret these results as a defeat for psychology, or science
more generally, must contend with the fact that this project demonstrates
science behaving as it should. Hypotheses abound that the present culture in
science may be negatively affecting the reproducibility of findings. An
ideological response would discount the arguments, discredit the sources, and
proceed merrily along. The scientific process is not ideological. Science does
not always provide comfort for what we wish to be; it confronts us with what
is"

And another co-author chimed in
([https://www.reddit.com/r/science/comments/3imphg/scientists_...](https://www.reddit.com/r/science/comments/3imphg/scientists_replicated_100_recent_psychology/cui2nir)):
"The real interesting part is seeing how other disciplines hold up in terms of
reproducibility. A new project has been started: Reproducibility Project:
Cancer Biology, they will try to replicate 50 studies. I am very curious how
this will turn out, I highly encourage other disciplines to also start a
reproducibility project to test how consistent their findings actually will
be. I don't see these results as discouraging, instead, I see it as a big step
in developing scientific methods. Now we know which methods and standards
might be wrong, we can try to fix it (for example by developing guidlines)."

~~~
leereeves
Worth repeating:

> Problems of non-reproducibility, publication bias, data faking, lack of
> transparency, and the like plague every scientific field.

~~~
paulmd
It's a particular problem with the "soft" sciences, though. For example,
psychology is tough because you can't read people's minds; you have to go by
how something affects people's behavior in some measurable way or how they feel
about it, and there's a tremendous number of confounding variables. Economics is
another tough one, because experiments are not isolated from exogenous
conditions and thus cannot really have a control. The whole field is also
highly intertwined with psychology. I'd throw poli-sci in that category too.

At least in physics somebody else can set up the same experiment and run it
again. Sure, the same tendencies to publication bias, etc apply, but usually
within a few years to a decade such things are discovered, corrected, and the
problem identified. It's much harder to identify what resulted in the change
of outcome for a soft-science experiment.

~~~
thaumaturgy
This is addressed a bit in the Reddit thread, where the comments from
practicing scientists seem to mostly agree with you. Mathematicians are
claiming they aren't affected at all, physicists are claiming that disproving
a major theory in physics would get you a Nobel instead of scorn, and the
bioscientists are saying it's a bigger problem in their field.

That said, I think it's dangerous to give any branch of science an automatic
pass on this. The fundamental causes of the Decline Effect aren't limited to
the "soft" sciences, they just currently seem to have a more susceptible
culture.

------
DanAndersen
Replications often don't happen because they're not considered "new," and who
wants to pay for something that's not new? Who can get accolades and titles
and journal articles and tenure and HIGHER IMPACT FACTORS out of the drudgery
of replication?

There are a lot of science popularizers out there who focus a lot on the power
of science, and its ability to self-correct. And given their audience, it's
probably a good message to get out there. It's a lot better than most other
approaches we've had throughout history. But I think they can give the wrong
impression of science (often spoken of as SCIENCE!!!! with exclamation marks)
as this process delivered to humanity on stone tablets from on high, and with
a little too much trust in the idea that "the arc of science is long but it
bends toward correctness."

Science is a frustratingly, painfully, agonizingly _human_ process, made up of
people in various organizations, with their own lives and goals and
objectives. It's a process where there's been enough of a correlation between
what its participants want, and what accuracy/truth/improvement demands, that
it's worked pretty well so far, and it gets good results eventually. But there
are also perverse incentives in organizations and personal failings in its
participants. Things like funders' quest for short-term results, the desire to
only back proven winners, the drive to publish or perish, etc etc.

Goodhart's law: "When a measure becomes a target, it ceases to be a good
measure."

I guess it's a little like the slow, slow process of natural selection. Given
enough time, you can get some pretty impressive, complex stuff out of it, but
you can also get stuck in a local maximum for millions of years.

I don't know what the solution is. Perhaps it would be nice for graduation
requirements for master's/PhD programs to require attempted replication of
some existing works -- but then there's the risk of selecting things that are
easy to replicate. Or maybe some mandatory percentage of research grants going
toward a common replication pool. Setting up a good incentives structure is
hard, because any institution that constrains itself by following new rules
"for the good of science" will find itself falling behind in comparison with
institutions who only work on the new exciting stuff and get more of the media
attention.

I feel fortunate that I'm in computer science, where a lot of our work is
inherently easily replicable due to the nature of code (of course, once it
gets into user studies there's so much wiggle room that the same problems
arise), and I always appreciate when a research group releases testable,
compilable, documented code alongside their papers. I'd like to see that sort
of practice made mandatory in CS, but when official prestige comes from
journals only, the incentive structures don't reward CS researchers for making
maintainable, understandable code -- just code that gets the results and
pretty pictures. Academic code is infamous for being poorly-architected.

----

Along some of these lines, it's always a good read to go through Feynman's
"Cargo Cult Science":
[http://neurotheory.columbia.edu/~ken/cargo_cult.html](http://neurotheory.columbia.edu/~ken/cargo_cult.html)

~~~
astazangasta
There is a profound problem with the scientific process that I have been
worrying about that rarely gets mentioned:

The appeal and power of science is that you propose a model and then test it
with evidence. With strong evidence you can validate the predictive efficacy
of your model.

However, the model is not "true"; it is only predictive. Furthermore, it's
only predictive to the extent that the evidence backs it up.

The problem with this process is that it hangs _entirely_ on the evidence. The
models themselves do NOT emerge from the scientific process; they emerge from
whatever - prejudice, confabulation, logic, supposition, etc. In other words,
the original conjecture of a scientific model is merely a product of the pre-
existing beliefs of the investigator.

Coupling poor evidence to this sort of conjecture produces "science" that is
simply supposition or bias with a veneer of credibility. Given the widespread
abuse of methodology (p-hacking, poor understanding of statistics, small or
unrepresentative sample sizes, not reporting negative experiments, bad
experimental design) this means a lot of "science" is merely varnished
prejudice.

~~~
dj-wonk
I don't see this as a problem. Perhaps I define some concepts differently.

As I see it, the scientific method is a good way to test a model against
reality. It is less suited to be a "generative" process for making models.
(There are other ways to test models too: internal consistency,
interpretability, and so on.)

Iteration, intuition, creativity, variation, intelligence, luck, guesswork,
and more are what create theories.

I'm not saying one could not use more "technical" or "rigorous" ways to
construct models -- there are, of course, such methods. I'm just saying that
"science" (in the Chalmers sense) is mostly about testability of a model
against evidence.

~~~
astazangasta
> Iteration, intuition, creativity, variation, intelligence, luck, guesswork,
> and more are what create theories.

These are all very positive. How about: racism, misogyny, bias, jealousy, and
stubbornness. These, also, can feed into the creation of theories, and do.

And, again, the point is that this process is only as good as the evidence;
when the evidence is lacking, we're left with only the investigator's
expertise to justify the model. That's not science.

~~~
dj-wonk
Excellent point. Yes, the sum total of human experience filters into model-
making. People often construct theories that correspond with their
preconceived biases.

I would like to think that (a) "opening the lid" on how these models work and
(b) testing them does a decent job of revealing and removing inaccurate or
not-useful aspects. What do you think are some good ways to minimize the
negative influences on models?

------
sctb
Yesterday's discussion:
[https://news.ycombinator.com/item?id=10131387](https://news.ycombinator.com/item?id=10131387)

------
castratikron
What's the significance of the picture at the top of the article?

~~~
bjterry
It is a stock photo that has the title "Personality Disorder"[1] and the tag
"psychology." So the editors (or whoever) searched Shutterstock for
"psychology," scrolled through, liked that one, and chose it for the article. I
doubt there is any greater significance than that.

1: [http://www.shutterstock.com/pic-174194297/stock-photo-person...](http://www.shutterstock.com/pic-174194297/stock-photo-personality-disorder.html?src=jQL9o-dXz-L7wrc2j9AEAg-1-29)

------
joesmo
It's psychology. How is this surprising or even news at all?

~~~
onezeno
Just because something isn't surprising does not mean it isn't newsworthy.

