
The Stanford Prison Experiment was flawed - ColinWright
http://bps-research-digest.blogspot.com/2014/07/what-textbooks-dont-tell-you-one-of.html
======
tokenadult
Articles about issues like this have been submitted several times to Hacker
News, and I'm glad to see that this article is getting some discussion. Right
after I saw the submission here, I looked up articles on the replication
problem in psychology, and I came back to the PsychFileDrawer.org top 20 list
of studies that are nominated as needing more replication.[1] I'd like to see
more replication of several of those studies myself.

Another recent commentary that I don't recall seeing submitted to Hacker News
is "Psychology's real replication problem: our Methods sections,"[2] which
suggests (quite plausibly to me) that many publications in psychology journals
describe the methods of the study so inadequately that it is hard to know
whether or not the study can be replicated.

Uri Simonsohn, a scholar of how scientific research is conducted and of the
statistical errors that show up in many peer-reviewed publications, has a
whole website about "p-hacking" and how to detect it.[3] Simonsohn is a
professor of psychology with a better-than-average understanding of
statistics, and he and his colleagues are concerned with making scientific
papers more reliable. You can use the p-curve software on that site for your
own investigations into the p values found in published research. Many of the
interesting issues brought up by comments on the article kindly submitted here
become much clearer after reading Simonsohn's various articles[4] about p
values, what they mean, and other aspects of interpreting published
scientific research. And I think Hacker News readers who have thought deeply
about statistics will be delighted by the sense of humor with which Simonsohn
and his colleagues make pointed remarks about experimental methods in their
papers.
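To make "p-hacking" concrete, here is a minimal simulation of one tactic that
p-curve is designed to catch (this is an illustration, not Simonsohn's p-curve
software itself; the function name and parameters are hypothetical): a
researcher who peeks at the data after every batch of subjects and stops as
soon as p < .05, even though the null hypothesis is true.

```python
import math
import random

def peeking_false_positive_rate(looks=5, n_per_look=20, alpha=0.05,
                                trials=4000, seed=1):
    """Fraction of null experiments declared 'significant' when the
    experimenter tests after every batch and stops at the first p < alpha."""
    random.seed(seed)
    z_crit = 1.96  # two-sided critical value for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        total, n = 0.0, 0
        for _ in range(looks):
            for _ in range(n_per_look):
                total += random.gauss(0, 1)  # null is true: mean 0, sd 1
                n += 1
            z = (total / n) * math.sqrt(n)  # z = sample mean / (1/sqrt(n))
            if abs(z) > z_crit:
                rejections += 1  # "significant" -- stop and publish
                break
    return rejections / trials

print(peeking_false_positive_rate(looks=1))  # honest single test: about 0.05
print(peeking_false_positive_rate(looks=5))  # peeking five times: roughly 0.14
```

The inflation grows with the number of peeks, which is why p-curve examines
the distribution of reported p values rather than taking each one at face
value.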

Simonsohn provides an abstract (which links to a full, free download of a
funny, thought-provoking paper)[5] with a "twenty-one word solution" to some
of the practices most likely to make psychology research papers unreliable. He
also has a paper posted on evaluating replication results[6] with more
specific tips on that issue.

Abstract: "When does a replication attempt fail? The most common standard is:
when it obtains p > .05. I begin here by evaluating this standard in the
context of three published replication attempts, involving investigations of
the embodiment of morality, the endowment effect, and weather effects on life
satisfaction, concluding the standard has unacceptable problems. I then
describe similarly unacceptable problems associated with standards that rely
on effect-size comparisons between original and replication results. Finally,
I propose a new standard: Replication attempts fail when their results
indicate that the effect, if it exists at all, is too small to have been
detected by the original study. This new standard (1) circumvents the problems
associated with existing standards, (2) arrives at intuitively compelling
interpretations of existing replication results, and (3) suggests a simple
sample size requirement for replication attempts: 2.5 times the original
sample."
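The "2.5 times the original sample" rule can be sanity-checked with a
normal-approximation sketch of the paper's "small telescopes" idea (the paper
itself works with t distributions and a 33%-power benchmark; the function
names here are hypothetical):

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def d33(n_per_group, alpha=0.05):
    """Effect size that would give the ORIGINAL two-sample study only
    33% power -- Simonsohn's benchmark for 'too small to have been
    detected by the original study' (normal approximation)."""
    return (Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(0.33)) / math.sqrt(n_per_group / 2)

def replication_power(n_orig, multiple=2.5, alpha=0.05):
    """Power of the replication's one-sided test that the true effect
    is below d33, when the true effect is actually zero."""
    se = math.sqrt(2 / (multiple * n_orig))  # std. error of effect estimate
    return Z.cdf(d33(n_orig) / se - Z.inv_cdf(1 - alpha))

print(round(replication_power(20), 2))   # about 0.78
print(round(replication_power(200), 2))  # same value: only the ratio matters
```

With 2.5× the original sample, a replication has roughly 80% power to show
that the effect is smaller than anything the original study could plausibly
have detected, which (under these simplifying assumptions) is where the rule
of thumb comes from.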

I should add that slamming the entire discipline of psychology for sloppy
methodology goes a bit too far. I have learned about most of the publications
that take psychology most to task from working psychology researchers. There
are whole departments of psychology[7] that largely have a scientific
orientation and are trying to improve the discipline's methodology. Crap
psychology abounds, but it is gradually being displaced by science-based
psychology built on sound methods. It is of course more methodologically
difficult to study the behavior of our fellow human beings than to study
clouds or volcanoes or insects, but many scientifically oriented psychologists
are working on the problem with good methods and sound statistical analysis.
Some thoughtful psychologists have been prompted to stress careful replication
by the failed studies that have come before.[8]

[1]
[http://www.psychfiledrawer.org/top-20/](http://www.psychfiledrawer.org/top-20/)

[2] [http://psychsciencenotes.blogspot.com/2014/05/psychologys-real-replication-problem.html](http://psychsciencenotes.blogspot.com/2014/05/psychologys-real-replication-problem.html)

[3] [http://www.p-curve.com/](http://www.p-curve.com/)

[4] [http://opim.wharton.upenn.edu/~uws/](http://opim.wharton.upenn.edu/~uws/)

[5]
[http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2160588](http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2160588)

[6]
[http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2259879](http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2259879)

[7]
[http://www.psych.umn.edu/research/areas/pib/](http://www.psych.umn.edu/research/areas/pib/)

[8]
[http://www.psychologicalscience.org/index.php/publications/observer/2012/february-12/psychologys-woes-and-a-partial-cure-the-value-of-replication.html](http://www.psychologicalscience.org/index.php/publications/observer/2012/february-12/psychologys-woes-and-a-partial-cure-the-value-of-replication.html)

~~~
tlarkworthy
The replication problem is more general than psychology; it's a problem
across biology too. It is perhaps more acute in psychology, where the
statistical methods lag a bit, but there are serious problems with the volume
of science produced and the statistics used to analyse the aggregate. I feel
like we might have a foundational crisis emerging.

[http://neuroblog.stanford.edu/?p=3451](http://neuroblog.stanford.edu/?p=3451)

~~~
cyorir
The neuroscience review cited in this blog post is originally from Nature
Reviews Neuroscience [0]. While I agree with the general notion that
neuroscientists need to get better at using statistical methods and tools, I
wonder if the author of the blog you linked (Zalocusky) makes the problem out
to be somewhat different from how it is described in the original review.

Specifically, Zalocusky tries to link the neuroscience results to an earlier
article by Ioannidis [1]. Whereas [0] concludes that neuroscience results are
unreliable and hard to reproduce, Zalocusky applies the older methods and
results from [1] to argue that neuroscience results are not only unreliable
but also false, which is a stronger statement. To support this, Zalocusky
adds a couple of back-of-the-napkin calculations.

I think a bit more analysis and caution is necessary before making the leap
from Ioannidis' 2013 claim about neuroscience research to Zalocusky's stronger
claim about neuroscience research.

Incidentally there was a lot of discussion about Ioannidis' 2005 paper, some
which can be seen by Ioannidis' 2007 response to earlier criticism of the
paper [2]. When evaluating claims about entire fields of research, it is
important to be careful about how we interpret these claims.

edit: We need to be especially careful when using the word false. Does that
mean "not true"? Or "insufficient to describe the truth"? Or "directly
opposite to the truth"?

[0]
[http://www.nature.com/nrn/journal/v14/n5/full/nrn3475.html](http://www.nature.com/nrn/journal/v14/n5/full/nrn3475.html)

[1]
[http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124](http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124)

[2]
[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1896210/](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1896210/)

~~~
tlarkworthy
Yeah I agree caution in interpretation is important.

However, given enough scientists, all three forms of false are being
published in parallel (as well as genuinely true results), which is the key
problem. Without bounds on publication bias and publication quantity, you
can't rigorously associate a probability of truth with a p-value* in
isolation.

(hmm, I suppose meta-studies are a kind of remedy to that so maybe it will all
work out anyway)

* or whatever
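The point that a p-value in isolation carries no probability of truth can be
put in numbers with the standard Ioannidis-style calculation (a simplified
sketch: the 2005 model also includes a bias term, omitted here):

```python
def ppv(prior, power=0.8, alpha=0.05):
    """Positive predictive value: the probability that a 'significant'
    finding is true, given the prior probability that hypotheses of
    this kind are true."""
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# The same p < .05 means very different things depending on the prior:
for prior in (0.5, 0.1, 0.01):
    print(prior, round(ppv(prior), 2))  # prints 0.94, 0.64, 0.14
```

Even at 80% power, a significant result in a literature where only 1 in 100
tested hypotheses is true is itself true only about 14% of the time; the
p-value alone tells you none of this.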

------
gergles
Well, define 'flawed'. It is flawed if you attempt to interpret it as being
representative of how people in the general population would behave in a
setting without an active experimenter interfering, sure.

It does still reveal the influence that one person can have on a group of
people and how much can be changed about behavior. It would be unwise to throw
away discussions of the experiment because people can't properly place the
data in context or evaluate the strengths/weaknesses of the experimental
design and implementation (which is what Psychology courses should be teaching
you to do in the first place.)

~~~
watwut
The linked article criticizes textbooks for not covering criticism of the
experiment. The full title is: "What the textbooks don't tell you - one of
psychology's most famous experiments was seriously flawed".

The title on HN has been shortened, and information was lost in the process.

~~~
ColinWright
The title was too long, and it wasn't clear what should or should not be
included. I did my best.

<shrug />

~~~
judk
Note that this is exactly what is wrong with how the experiment is discussed
in text books and pop culture!

------
asgard1024
Can someone explain why the SPE is not relevant to Abu Ghraib? While you can
argue the experiment was flawed, I am not sure its results are.

I ask myself whether all the outrage about the SPE (and also the Milgram)
experiments being unethical (or flawed) isn't really about something that we
don't want to know about human nature.

~~~
bjourne
You shouldn't put the SPE and the Milgram experiments in the same basket.
Milgram's experiments involved hundreds of participants from different
demographics (mostly males, though), and the conclusions he drew were backed
up by his data. The setup he used is today deemed unethical by most, but
similar studies have reached the same conclusions.

A little-known fact is that when the studied persons were free to set the
voltage to administer, Milgram found that none of them increased it. His
conclusion was that most people are not sadists and do not enjoy hurting
others.

Edit: Maybe I spoke too soon, see: [http://www.psmag.com/navigation/health-and-behavior/electric-schlock-65377/](http://www.psmag.com/navigation/health-and-behavior/electric-schlock-65377/)
But the experiments were still very different in the number of test subjects
and in how scientifically they were conducted.

~~~
asgard1024
I have Milgram's book Obedience at home, although I have only skimmed through
it. I think the situation is similar; Zimbardo would probably have done more
experiments as well had the first result not been so horrific.

What I am a bit worried about is that we are not able to replicate (because
of the ethical problems), in one way or another, the results of these
studies. How can future generations become educated about these problems,
especially if we cast doubt over their results? Are our descendants doomed to
figure these things out the hard way?

------
frobozz
I vaguely recall seeing a similar thing about Milgram's Obedience experiments.
The experimenters allegedly broke protocol by prompting the subjects too many
times or bullying them, rather than using the prescribed neutral prompts.

However, I can't remember where I heard this.

~~~
trhway
>The experimenters allegedly broke protocol by prompting the subjects too many
times or bullying them

and thus showed that "prompting the subjects too many times or bullying them"
still makes people follow the prompts and deliver the shock where the
"prescribed neutral prompts" weren't enough.

------
marincounty
In my life, I have found that at least 95% of the people I have interacted
with will commit immoral acts, lie, cheat, etc., as long as it's legal. They
won't jaywalk, but they will commit horrid, spineless acts in order to make a
buck or secure their financial future. It would be hard to reproduce this
study because so many people know about it, but I doubt the outcome would be
much different. Man is spineless. I guess if you do show conviction and stand
up for what is right, you run the risk of homelessness? Too bad! Oh, and just
doing the right thing when people are looking, or when it's convenient,
doesn't count.

~~~
triangleman
First of all, have you read the article? When you say "I doubt the outcome
would be much different" are you basing that on the same prejudices that
Zimbardo had when he ran the experiment the first time?

Also, watch out for sloppy thinking: If people will commit immoral acts as
long as it's legal, that makes them heartless, not spineless. Make sure you
know what your standard is, because it will be used to judge you.

------
mbateman
The most surprising thing is that the textbooks haven't been updated. When I
was a psych prof this was a commonplace among all my colleagues and even
students. It's even common to use this "experiment" as a case study in class
to discuss problematic experimental design.

I'm not sure why textbooks haven't been updated, but I doubt the hypotheses
mentioned in the article (ignorance, that the authors believe the study, and
pressure to keep textbooks short).

------
pessimizer
This blog article doesn't tell me that the SPE was flawed, either. All of those
words criticizing other sources, but nothing summarizing the factual basis for
those criticisms other than links to papers that .001% of the people reading
this blog entry will complete.

My guess at why textbooks ignore the criticism is because it is not
convincing.

------
truantbuick
SPE is an absolute mess as an experiment, and shouldn't really be regarded as
science.

However, you really don't even have to go so far into the technical details
to find it flawed. The whole premise seems absurd to me.

You get some students, tell some of them to act like prison guards and the
rest to act like prisoners, and then it's supposed to be surprising that they
follow through on this?

"Well, we didn't expect them to be so sadistic!"

There's nothing I read that was more sadistic than a fairly typical hazing
ritual. All participants knew this was ultimately voluntary and could quit at
any time.

------
NoMoreNicksLeft
It might be convenient if the Stanford Prison Experiment continued to be
discredited, often and harshly. Who would want to be in charge if all the
peons were constantly mumbling about how being in charge turned leaders into
petty tyrants?

It'd just be awkward and uncomfortable.

------
robg
Not flawed per se, if we see Zimbardo, like Milgram, as the guy in the white
coat. If anything, it's a case study in what "encouragement" means in a
social context, like Abu Ghraib or the Holocaust. No study is perfect; it
can't be. By trying to control for variables, new ones are introduced, and
objectivity is both lost and gained. No surprise there.

------
esquivalience
Zach Weinersmith of SMBC put this very well in one frame and a caption:
[http://www.smbc-comics.com/?id=3025](http://www.smbc-comics.com/?id=3025)

~~~
mathattack
This issue is true with much of social science. It would be great to see more
replication (or non-replication) experiments to confirm what we think we do or
don't know.

~~~
DanBC
The linked comic is true - there are many problems with social science
studies.

But, since you specifically mention replication and non-replication -- how
well is this covered in the rest of science? How well is the null-hypothesis
covered?

(I should say that I'm not trying to suggest that physics is as sloppy as
psychology sometimes appears to be.)

(EDIT: spelling)

~~~
mathattack
It's a problem everywhere. I think Physics is somewhat spared because the gap
between observation and math is less. (String theory perhaps being an
exception) I think biochemistry runs into these issues more because it is
harder to "prove" things with math. In general the less you can rely on
supporting math or theory, the more you need to be careful about replicating
experiments to come up with the truth.

~~~
cyorir
I think this is misunderstanding the role of math in science. Math is not used
to "prove" scientific results, per se. Math is used to construct models, and
prove that certain results would be expected from a given model. That doesn't
mean the models work in the real world; all sciences use experiments to test
models to see if they describe actual observations.

In this sense, you always need to be careful about replication, regardless of
field. If you can't show replication, then there is no guarantee that the
model will always match observations, no matter how much math went into the
construction of that model.

In Physics the main problem is isolating variables, and setting up appropriate
conditions. The good thing about Physics is that once the environment is set
up and the experiment is well designed, it is easy to re-run the experiment
for more trials. With chemistry and biology, this is usually the case as long
as you have clean reagents in a clean lab, but you are introducing more
factors of what you have to control as the complexity of your system scales
up.

It is hardest to replicate results involving humans, because with humans you
have so much complexity that you know you can never control all influencing
factors. Usually in the social sciences, the hope is that these factors
cancel out across a large, non-homogeneous sample. The problem, as
experiments like the SPE demonstrate, is that it is hard to build an ideal
sample and, if the ethics or practicality of your methods are questioned, to
rerun the experiment to replicate results.

------
hellodevnull
The half-life of knowledge in psychology is thought to be one year, so by now
most of the famous experiments (and their findings) in the field are
considered flawed.

[http://en.wikipedia.org/wiki/Half-
life_of_knowledge](http://en.wikipedia.org/wiki/Half-life_of_knowledge)

~~~
M4v3R
The very link from Wikipedia you used says that half-life of knowledge in
psychology is five years. Still, that's not very long either.

~~~
hellodevnull
Mistype. The point is don't take academic psychology seriously.

~~~
seanflyon
Sadly this extends past psychology to other scientific disciplines.
Well-respected scientific journals have, for example, published studies
showing that many foods both dramatically increase and dramatically decrease
your risk of cancer. There is a strong incentive to find extreme results, and
too few peers actually reproduce results.

------
dmfdmf
Who cares? Isn't this an experiment validating a known conclusion anyway? Evil
exists in the hearts of many. Give these bad people unlimited power and they
will take out their frustrations on their fellow man. We see this all around
us, if one is paying attention. The disgruntled Russian doorman, the lady who
runs the local Homeowners' Association and likes to boss people around to
soothe her pseudo-ego, the cop who shoots the non-aggressive dog or chokes out
a cuffed arrestee because he can. And let's not forget the biggest experiment
of all -- Nazi Germany where millions died. Now there is an experiment that
went on way too long and should have been aborted. I just hope we learn the
right lessons.

NB: I am not saying everyone is evil.

~~~
JulianK
I think the reason why the experiment, if true, is so fascinating is because
it's about normal people put into an abnormal situation rather than inherently
bad people being given power.

What I take away from the experiment is that given the right situation and
pressures, most people would do things they thought they would never otherwise
do. More importantly, I think it's probably a mistake to think that you're the
exception and are morally incorruptible.

~~~
dmfdmf
This is exactly the conclusion they want you to reach. Who are you to judge?
Who are you to condemn anyone since you are evil too? This is the charter of
liberty for evil and all evil needs to survive, even prosper. This
"experiment" is the biblical tenet of Original Sin masquerading as science and
I reject it. If your moral principles cannot be practiced consistently by all
men then something is wrong with your principles, not human nature.

------
meowface
Are there any similar critiques of the Milgram experiment?

edit: Also, what the hell happened to HN comments sections lately?

~~~
unhammer
Well, there are interesting follow-ups at least:
[http://onlinelibrary.wiley.com/doi/10.1111/jopy.12104/abstract](http://onlinelibrary.wiley.com/doi/10.1111/jopy.12104/abstract)
(summarised at [http://www.psychologytoday.com/blog/the-green-mind/201406/are-polite-people-more-violent-and-destructive](http://www.psychologytoday.com/blog/the-green-mind/201406/are-polite-people-more-violent-and-destructive) )

Some nice quotes from the discussion (citations removed here for readability):

 _As expected, Conscientiousness and Agreeableness predicted the intensity of
electric shocks administered to the victim. Second, we showed that
disobedience was influenced by political orientation, with left-wing political
ideology being associated with decreased obedience. Third, we showed that
women who were willing to participate in rebellious political activities such
as going on strike or occupying a factory administered lower shocks._

 _All these results suggest that situational context, even though a powerful
determinant of behavior, does not necessarily overwhelm individual-level
behavioral determinants. It is interesting to note that personality traits
such as Agreeableness and Conscientiousness, which are widely related to
positive outcomes such as better mental health, longevity, academic
performance, parenting, reduced aggression, and prosocial behavior, may also
have darker sides in that they can lead to destructive and immoral obedience._

------
oldmanjay
anecdotally, the people I know who pursue psychology vocationally are not the
most rigorous thinkers. more than half have a serious belief in astrology, for
instance.

which is to say, I don't know what people expect, but I'm never surprised when
I find out that any given psychology experiment is a fanciful construct of
lies and half-truths.

~~~
fit2rule
Psychology is a modern religion masquerading as a science. It's as much a
means of social control as anything else, and it is promoted by governments
because it gives them a way of subverting the general will of the people:
that they not have religious dogma enforced upon them by the state.
Psychology is little more than cultural dogma, categorized in a way that
allows application for political purposes, and anyone who disagrees that
Psychology is a true science is, of course, .. "crazy" ..

Just count the number of times you hear someone say "that guy is certifiably
insane, he should be committed to an asylum" any time someone comes up with a
non-popular point of view .. this is the government religion at work.

~~~
thomholwerda
Read some neuroscience textbooks, for example, and then come back to me about
how it's a "religion".

This has to be one of the worst comments I've seen in ages. My god.

~~~
fit2rule
Such as what do you recommend I read?

Neuroscience is just an attempt by the psychiatric movement to evade its
religious/government roots. Neuroscience will also, eventually, be recognized
as a belief system of enforcement, merely a tool for those who wish to
socially engineer the next generation of society for .. whatever .. ends.

------
mathetic
The title of the article is seriously flawed. Every textbook and every
respectable course spends more time criticising the Stanford Prison
Experiment than discussing its results.

~~~
joshuahedlund
The text of the article contains evidence which supports the title's assertion
and contradicts your assertion that "every textbook" criticizes the
experiment. Do you have evidence to support your assertion?

> So, have the important criticisms and reinterpretations of the SPE been
> documented by key introductory psychology textbooks? Griggs analysed the
> content of 13 leading US introductory psychology textbooks, all of which
> have been revised in recent years, including: Discovering Psychology
> (Cacioppo and Freberg, 2012); Psychological Science (Gazzaniga et al, 2012);
> and Psychology (Schacter et al, 2011).

> Of the 13 analysed texts, 11 dealt with the Stanford Prison Experiment,
> providing between one to seven paragraphs of coverage. Nine included
> photographic support for the coverage. Five provided no criticism of the SPE
> at all. The other six provided only cursory criticism, mostly focused on the
> questionable ethics of the study. Only two texts mentioned the BBC Prison
> Study. Only one text provided a formal scholarly reference to a critique of
> the SPE.

