
Replication Controversy in Psychology - nkurz
http://www.slate.com/articles/health_and_science/science/2014/07/replication_controversy_in_psychology_bullying_file_drawer_effect_blog_posts.single.html
======
wyager
Now, I'm just a layman, so I could be completely off the mark about this, but
this is how it seems to me.

Doesn't every single generation of psychologists pretty much completely reject
the findings of the previous generation? All of Freud's stuff is now
considered total bullshit. All of the stuff psychologists said in the 50s
about homosexuality being a mental illness is now considered bullshit.

In other fields of science, like chemistry or physics, this has _never_
happened since the introduction of the scientific method. Scientists have been
wrong lots of times, but never has the majority of the body of scientific
knowledge been thrown out. Usually the changes are something small, e.g.
"Einstein was wrong about hidden variable theory in some respect." Even the
most drastic changes in scientific thought are usually just the narrowing down
from a number of existing theories to a smaller number of theories. In most
fields of science, we seem to converge on the correct result over time, with
very small deviations.

Psychology doesn't seem to be converging at all. Am I wrong? If not, why is
this?

~~~
pcrh
> All of Freud's stuff is now considered total bullshit.

This isn't true. Freud's enduring contribution to psychology isn't stuff like
penis envy in women or interpretation of dreams, but that thing which you now
likely consider a fact: recognition of the existence of the subconscious.

The impact of the subconscious, and the impact of one's psychological history
on later behaviour (and the fact that recognizing that impact can assist
treatment), was not a recognized idea before he promulgated it.

~~~
andreasvc
You shouldn't confuse the subconscious with the unconscious. The former is
part of Freud's legacy and rejected by most mainstream psychology, while the
latter is widely recognized as an important concept.

~~~
pcrh
Fair enough; a confusion of terms. But isn't it still true that Freud was the
first, or among the first, to propose that the unconscious mind influences the
conscious one? And also that one can learn of someone's unconscious processes
by conversing with them?

------
idlewords
An additional problem with research into things like social priming (and the
attempts to replicate it) is that they all use a sample population of American
undergraduates.

This paper goes into interesting detail about why this is a terrible
methodology:
[http://hci.ucsd.edu/102b/readings/WeirdestPeople.pdf](http://hci.ucsd.edu/102b/readings/WeirdestPeople.pdf)

~~~
autodidakto
Thanks. I've been wondering for a while: how much of what we know about human
psychology is based on studies performed only on 20-year-olds?

~~~
tedks
...well, quite a lot actually, because 20 year olds are more like other humans
than different.

Humans don't have life stages. We don't wrap ourselves in cocoons and change
from grubs into butterflies from ages 16-20. For most of human existence,
20-25 year olds were the substantial power holders.

There's a branch of psychology called developmental psychology. It has to do
with how humans change over time. If developmental psychologists only studied
undergraduates, we wouldn't have a good grasp of the area, but they don't.
It's pretty easy to get kids and elderly people to study. It's even pretty
easy to get adults to study. It's more expensive than undergraduates, but it's
doable.

Beyond that, what sort of differences do you expect to see in areas like, say,
input attention, working memory, fluid intelligence, social behavior,
perception, motor control, etc.? 18-22 year olds are pretty much normal adults
in all of those regards.

The thing that makes this really insignificant is that we know so little about
how minds work. Psychology has only really existed since Skinner and
behaviorism made it a real science. If I spent 10 seconds I could think of
nice, clever-sounding retorts as to why the perception system of a 22 year old
is vastly different from the perception system of a 32 year old, but in
reality we know almost nothing and learning anything is good. As Heinlein
said, it's wrong to think the world is a sphere, but it's much better than
thinking the world is flat.

These discussions about replication always sadden me, because they miss the
point by so much. Psychology is one of the least respected and yet most vital
sciences. The problem is that everyone is a lay expert, because everyone has a
mind and thinks they understand how minds work. Psychology has taught us so
many valuable, horrible and beautiful things about ourselves.

If you're reasonably intelligent, you can come up with reasons not to believe
anything. It's easy to discredit things. It's hard to build things.

(It's also false that psychology studies are only performed on undergrads,
usually undergrads are a good pilot testing base and then you move on to
externally recruited populations, unless there's really no reason to do so
based on the field.)

~~~
pacaro
This may be a "get off my lawn" comment, but my understanding is that
pediatric psychiatrists and psychologists will treat patients up to 25 years
old. The boundary between childhood and adulthood is labile and varies from
person to person. From my own experience, re-reading books like "One Hundred
Years of Solitude" or "The Unbearable Lightness of Being" at ten-year
intervals shows me how much I have changed.

~~~
tedks
...but the optic nerve of a 20 year old is the same from a scientific
perspective in almost all respects to the optic nerve of a 30 year old. Maybe
if that optic nerve signaled some horribly deep work of literature like 100
years of solitude, the older brain behind it would interpret it differently,
but that nerve is the same.

When you remember the amazing swelling feelings of being ~adult~ you had when
you last read 100 years of solitude, your semantic and episodic memory systems
are, functionally, identical to what they were when you read it the first
time. Maybe your fluid intelligence is a little less fluid, but that doesn't
really start until you've aged a lot.

I'm sure you've changed as a person. I'm sure you have a wealth of new
experiences and are entirely different from your past self. This is irrelevant
to the physics, chemistry, and psychology of your body.

~~~
pacaro
In general I agree with you. It is important to separate experience from
capability (there's probably a better word here).

Much of the social psychology may not fit easily into that division though. To
pick on a common whipping boy in these arguments, Milgram's obedience study,
what is the role of experience in any set of choices like those given to the
participants. The asymmetric responsibilities when there is a power dynamic is
something that some people learn somewhere along the line.

I'm rambling a little, but I feel that some of the observed effects in these
experiments may be things for which experience provides a learned immunity.

~~~
tedks
Milgram's obedience study was tested on adult men, between ages 30-40 IIRC.
There were 19 later experiments, you wouldn't call them replications because
they weren't exactly the same, but they were all consistent, and they sampled
diverse (for the 50's) populations.

If you didn't know that, and were assuming it was tested on undergrads
exclusively, you should recognize that your assumption was wrong, and
propagate that through your belief graph.

------
danielweber
I was very surprised not to see Jason Mitchell at Harvard University's attack
on the concept of replication:
[http://wjh.harvard.edu/~jmitchel/writing/failed_science.htm](http://wjh.harvard.edu/~jmitchel/writing/failed_science.htm)

The thing is amazing to read, in the "someone who thinks he's a scientist
actually wrote this?" sense, like this gem:

 _Because experiments can be undermined by a vast number of practical
mistakes, the likeliest explanation for any failed replication will always be
that the replicator bungled something along the way._

~~~
tedks
He is a scientist; unless you're also a domain expert in his field with at
least as much expertise as him, or a newcomer with radically impressive work,
you can't just not call him a scientist because he disagrees with you. If a
domain expert disagrees with you, it is probably because you are wrong. That's
just the base rate and it's probably true here as well.

Traditionally, replication hasn't been done in psychology because the
experiments have to be set up in really clever complicated ways in order to
tease out effects. Even the most dedicated replicator can't ever fully
replicate a psychology experiment. You'd have to ship the actors used as
confederates across the country to do that and of course nobody does that.
It's foolish to think any psych study has ever been replicated.

The way psychologists deal with this is the concept of converging operations.
This has to do more with being clever and considering implications of a thing
you're trying to study with respect to other theories you have much more
evidence for and in different situations. If you think stereotype threat is a
thing, for example, it would make sense for stereotype boost to exist if the
stereotype is positive. If a priming effect speeds up the recognition of
"doctor" after a subject has been shown "nurse", maybe recognition for
"warmth" will be slower after a subject has been shown "frozen". If people
only encode based on schema variables, they'll be unable to remember
information unrelated to a schema even if they're provided with congruent
encoding cues.

Converging operations is just studying the same concept from different angles
and seeing if you keep seeing consistent results. If the results aren't
consistent the hypothesis you're investigating will, in fact, die. Psychology
is an advancing science as of right now. If people were just publishing random
noise there'd be no reason for concepts to fall out of favor.

I think people on hacker news tend to wildly underestimate the degree of skill
involved in experimental psychology. So much of statistics, even, has come out
of psychology because of the need to refine and clarify what effect you're
seeing.

Alternatively, if you think because of this existing academic psychology is
lacking, you're in a _fantastic_ place to be. Do literally anything to get
some seed money, just working in a standard SDE job would be fine, and then
run some small experiments (and replicate them, of course) on learning or
addiction. Then write the next Farmville or build the next Facebook. If
there's really that much low-hanging fruit, go out and pluck it. Put up or
shut up.

~~~
yummyfajitas
_Traditionally, replication hasn't been done in psychology because the
experiments have to be set up in really clever complicated ways in order to
tease out effects. Even the most dedicated replicator can't ever fully
replicate a psychology experiment. You'd have to ship the actors used as
confederates across the country to do that and of course nobody does that._

Now wait a second. When a psychology study comes out, it claims "This
experiment shows stereotype threat reduces performance of Honduran men on a
Math test." Such studies rarely claim "This experiment shows stereotype threat
reduces the performance of Honduran men on a Math test when Jill the skinny
experimenter repeats the code words."

If using different experimenters yields a different result, it means the
effect being described is probably not as robust as the experimenters claim.
The cause might be Jill's shifty eyes rather than stereotype threat.

That's a replication failure, in the sense that it shows the claimed effect is
far weaker (or causally different) than the original study claimed.

~~~
tedks
I would say that it's typically assumed that there are confounding factors in
any experiment. An experimenter has to try to minimize that, but you're still
not going to be able to replicate a full experiment.

If you have more than one researcher administering a test in a stereotype
threat experiment, that'll reduce the likelihood of that being the confound.
If other studies (not replications, but their own experiments, with their own,
different approaches to study the problem) agree with the first experiment,
the effect is likely real.

I would also say that you're viewing the entire system far too
antagonistically. Nobody goes into academic psychology to get rich off of it.

~~~
droopyEyelids
But if you're not minimizing the confounding factors enough that the result
can be repeated, your experiment failed. Your hypothesis has neither been
provably confirmed nor denied. At that point you are, at best, still gathering
evidence, and publishing results would be an error.

------
ajarmst
Ioannidis' "Why most published research findings are false"
([http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fj...](http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.0020124))
is particularly relevant here, I think.

~~~
ajarmst
Note, in particular, his "Corollary 4: The greater the flexibility in designs,
definitions, outcomes, and analytical modes in a scientific field, the less
likely the research findings are to be true. "
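
Corollary 4 is easy to demonstrate with a toy simulation (a minimal sketch,
not from the paper; the group sizes, number of outcomes, and normal
approximation to the t-test are all illustrative assumptions). Generate pure
noise, then compare a pre-registered single outcome against the "flexible"
strategy of measuring several outcomes and reporting whichever came out
significant:

```python
import math
import random

def two_sample_p(a, b):
    # Two-sided p-value via a normal approximation to the two-sample t-test
    # (adequate for groups of 30; a sketch, not a stats library).
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(42)
TRIALS, N, OUTCOMES = 2000, 30, 5  # illustrative choices

strict = flexible = 0
for _ in range(TRIALS):
    # No real effect exists: both groups are drawn from the same distribution.
    ps = [two_sample_p([random.gauss(0, 1) for _ in range(N)],
                       [random.gauss(0, 1) for _ in range(N)])
          for _ in range(OUTCOMES)]
    if ps[0] < 0.05:       # pre-registered: one outcome, decided in advance
        strict += 1
    if min(ps) < 0.05:     # flexible: report whichever outcome "worked"
        flexible += 1

print(f"one fixed outcome:  {strict / TRIALS:.1%} false positives")
print(f"best of {OUTCOMES} outcomes: {flexible / TRIALS:.1%} false positives")
```

The fixed-outcome rate stays near the nominal 5%, while picking the best of
five independent outcomes pushes the false-positive rate past 20%, with no
fraud anywhere in the process.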

------
east2west
This covers all the issues surrounding reproducibility pretty well. It is
astounding that scientists want to hide behind their expertise when their
results and conclusions are being tested. Biologists who are under a similar
cloud tried the same tactics too. They have some secret sauce that makes their
lab and experiment special and unique, beyond the comprehension of mere
mortals. Of course, as publicly funded scientists they have a moral if not
legal obligation to disclose their "secret sauce" if it is that important.
Why publish a paper but leave the public in the dark about the most important
factor? At best it is an oversight, at worst it is dishonest.

Experiment and reason underpin modern science. Common sense says
irreproducible experiments call their conclusions into question. Scientific
expertise is no escape from reason. A fancy degree from a "top" university
does not make anyone infallible. Well-controlled experiments are the final
arbiter.

~~~
MarkMc
Am I being naive to think that scientific papers should be written in a way
such that people outside the area of expertise can reproduce it? You shouldn't
need to be a psychologist to run a psychology experiment. You shouldn't need
to be a physicist to run a physics experiment.

~~~
east2west
As much as I hate that scientific papers don't give good introductions for the
public, specialists do have their value. An educated, intelligent reader has
limits when it comes to frontline research. It is like asking someone to fly a
787 jetliner without ever having touched an airplane before. Prior knowledge
and experience are important.

What I absolutely reject is the false exclusivity that some scientists hide
behind. A professional statistician can spot statistical errors even if he or
she has never done a cancer study before. A rational person can discern the
trustworthiness of a scientific report if enough details and background are
well explained. I always find that the popular press lacks a healthy degree of
contempt for specialists. It actually has improved somewhat recently. Still,
sometimes newspaper reports read like press releases.

~~~
MarkMc
I accept that there is some skill and training required to run an experiment -
particularly with hard sciences - but generally far less than is required to
design it and analyze the results.

To extend your metaphor, it would be like saying that only people who design
jetliners should be able to fly them.

------
danso
Controversies like these make me glad I work in the field of
programming...while arguments over what algorithm or frameworks work best are
less sexy than what the sciences cover, at least with narrow claims, it's
easier to concretely argue for one or the other...and then of course, if we're
talking about open-source, then the errors of ambiguity are even scarcer.

It seems like a large part of the problem, at least in these psychological
experiments, is that there isn't a structured/uniform way to describe
preconditions, methodology, assumptions, and measurement practices...OK, so
you used 40 undergraduate students for your original experiment...how did you
pick them out? How were their characteristics accounted for (e.g.
age/gender/major/health/etc)? Did any of them ever see the Trainspotting scene
before? Have any of them ever seen a movie of equal grossness? How many
minutes did you wait after showing them the Trainspotting scene? In what order
did you hand out the questionnaire? How did you seat the students? etc. etc.
etc. etc.

A totally honest researcher might have trouble enumerating and enforcing all
of the different controls, never mind _communicating_ them to other
researchers. From the research papers I've read, the ability to communicate
these facts doesn't seem to be of uniform quality.

~~~
agent00f
> Controversies like these make me glad I work in the field of
> programming...while arguments over what algorithm or frameworks work best
> are less sexy than what the sciences cover, at least with narrow claims,
> it's easier to concretely argue for one or the other

Yeah, it's pretty ironic when the field with (Comp) Science in the name isn't
really a science but math.

~~~
gipp
It is a science. A formal science. Same as math and logic, for example.

~~~
kylebrown
Hal Abelson disagrees with you. "Computer Science is a terrible name for this
business. First of all, it's not a science. It might be engineering, or it
might be art. But we'll actually see that computer -- so-called science --
actually has a lot in common with magic."[1]

1.
[https://www.youtube.com/watch?v=zQLUPjefuWA](https://www.youtube.com/watch?v=zQLUPjefuWA)

~~~
cfallin
Add to that list Fred Brooks (IBM System/360, Brooks' Law):
[http://www.cs.unc.edu/~brooks/Toolsmith-CACM.pdf](http://www.cs.unc.edu/~brooks/Toolsmith-CACM.pdf)

Relevant quote: "... the scientist builds in order to study; the engineer
studies in order to build. ... I submit that by any reasonable criterion the
discipline we call 'computer science' is in fact not a science but a
synthetic, an engineering, discipline. We are concerned with making things, be
they computers, algorithms, or software systems."

------
ninguem2
If the standard in Psychology research is the so-called p-value being less
than 0.05, it means that they publish results whenever the chance of the
findings being a coincidence is less than 5%. It stands to reason that 5% of
the published results in Psychology will be coincidences and not real.

~~~
capnrefsmmat
That's exactly the opposite of what p values mean. Other replies addressed the
publication bias problem, but even absent that, p < 0.05 usually implies a
probability much higher than 5% that any single "significant" result is false:

[http://www.statisticsdonewrong.com/p-value.html](http://www.statisticsdonewrong.com/p-value.html)

The p value describes the probability that a _nonexistent effect_ will falsely
be called significant. You can't turn that around (without Bayes' theorem and
a prior) to calculate the probability that the effect is nonexistent.

To compound this, most studies are underpowered -- they don't collect enough
data to detect the effect they're looking for. So a statistically
insignificant result usually does not mean the effect _does not_ exist.
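
The Bayes-theorem point is just arithmetic, and it's worth running the numbers
once. This sketch uses illustrative assumptions (10% of tested hypotheses are
real effects, 35% power), not figures from any particular field:

```python
# How many "p < 0.05" results are actually false, given a base rate and power?
prior_true = 0.10  # assumed: fraction of tested hypotheses that are real effects
power = 0.35       # assumed: chance an underpowered study detects a real effect
alpha = 0.05       # significance threshold

true_hits = prior_true * power          # real effects correctly flagged
false_hits = (1 - prior_true) * alpha   # null effects falsely flagged
false_discovery_rate = false_hits / (true_hits + false_hits)

print(f"{false_discovery_rate:.0%} of 'significant' results are false")
# → 56% of 'significant' results are false
```

Even with the threshold at 5%, a majority of the significant findings under
these assumptions are noise, which is the gap between "p < 0.05" and "95%
likely to be real."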

~~~
neumann
There is a great (cheesy) explanation of the statistical 'significance' of
p-values: [http://theconversation.com/the-problem-with-p-values-how-significant-are-they-really-20029](http://theconversation.com/the-problem-with-p-values-how-significant-are-they-really-20029)

------
michaelfeathers
Maybe the answer is to perform distributed research by default. Design
experiments and have them conducted by unrelated teams in different locations.

~~~
couchand
That's a great idea but you really only need half of it. Design the
experiments in enough detail that unrelated teams _could_ conduct them, then
publish the results no matter the outcome. That's probably enough to catch
most of the loose research out there.
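
The publish-everything half of the proposal matters because selective
publication doesn't just hide null results; it inflates the published effect
sizes too. A toy simulation (illustrative assumptions: a small true effect of
0.2, known unit variance, 30 subjects per group, 1000 labs) shows why:

```python
import math
import random

random.seed(1)
TRUE_EFFECT, N, LABS = 0.2, 30, 1000  # illustrative choices

def study():
    # One two-group study of a small real effect; variance is known to be 1,
    # so the standard error of the mean difference is sqrt(2/N).
    a = [random.gauss(TRUE_EFFECT, 1) for _ in range(N)]
    b = [random.gauss(0, 1) for _ in range(N)]
    diff = sum(a) / N - sum(b) / N
    se = math.sqrt(2 / N)
    p = math.erfc(abs(diff / se) / math.sqrt(2))  # two-sided z-test p-value
    return diff, p

results = [study() for _ in range(LABS)]
all_mean = sum(d for d, p in results) / LABS

# File drawer: only the significant studies reach print.
published = [d for d, p in results if p < 0.05]
pub_mean = sum(published) / len(published)

print(f"true effect {TRUE_EFFECT}, mean over all labs {all_mean:.2f}, "
      f"mean over published labs {pub_mean:.2f}")
```

Averaged over every lab the estimate is unbiased, but the published subset
only contains the studies lucky enough to clear the significance bar, so the
literature reports an effect roughly two to three times larger than the truth.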

------
misnome
I was astounded to read:

> [Psychology] is actually leading the way in tackling a problem that is
> endemic throughout science [replication]

"Throughout science" presumably is meant to apply to everything including
physics, biology, etc. In fact, the article explicitly recognises "Failures to
replicate" in several other fields (medicine, biology, observational
astronomy) which seems to contradict itself - it's a big special thing when a
psychology journal publishes replications, and they are leading the way!
Except here is a load of uncontroversial non-replications from other fields.

Perhaps I'm just being spoilt - being in High Energy Physics, almost everyone
I work with has an understanding of the statistics involved, and extremely
skeptical outlooks on everything that is anything less than perfectly rigorous
- along with (usually) multiple separate experiments all measuring (or capable
of measuring) the same thing.

One big example I can think of was DAMA
([http://en.wikipedia.org/wiki/DAMA/NaI](http://en.wikipedia.org/wiki/DAMA/NaI)),
which found evidence for dark matter, and a successive experiment by the same
team
([http://en.wikipedia.org/wiki/DAMA/LIBRA](http://en.wikipedia.org/wiki/DAMA/LIBRA))
has found the same result - but nobody "believes" the result until it is
measured by somebody else, even though nobody can find a cause for the false
positive (assuming it is false).

Finally, I link (again) the Feynman 1974 Caltech commencement address, where
he actually talks about this reproducibility issue:
[http://neurotheory.columbia.edu/~ken/cargo_cult.html](http://neurotheory.columbia.edu/~ken/cargo_cult.html)

------
plg
As long as an academic researcher's personal career progress (and pocketbook)
depend on a seemingly never-ending flow of "novel" and "significant" findings
this will continue.

