Another recent commentary that I don't recall seeing submitted to Hacker News is "Psychology's real replication problem: our Methods sections," which suggests (quite plausibly to me) that many publications in psychology journals describe the methods of the study so inadequately that it is hard to know whether or not the study can be replicated.
Uri Simonsohn, a scholar of how scientific research is conducted and of the statistical errors that show up in many peer-reviewed scientific publications, has a whole website about "p-hacking" and how to detect it. Simonsohn is a professor of psychology with a better than average understanding of statistics. He and his colleagues are concerned about making scientific papers more reliable. You can use the p-curve software on that site for your own investigations into p values found in published research. Many of the interesting issues brought up by comments on the article kindly submitted here become much clearer after reading Simonsohn's various articles about p values and what they mean, and other aspects of interpreting published scientific research. And I think Hacker News readers who have thought deeply about statistics will be delighted by the sense of humor with which Simonsohn and his colleagues make their pointed remarks about experimental methods.
Simonsohn provides an abstract (which links to a full, free download of a funny, thought-provoking paper) with a "twenty-one word solution" to some of the practices most likely to make psychology research papers unreliable. He also has a paper posted on evaluating replication results with more specific tips on that issue.
Abstract: "When does a replication attempt fail? The most common standard is: when it obtains p > .05. I begin here by evaluating this standard in the context of three published replication attempts, involving investigations of the embodiment of morality, the endowment effect, and weather effects on life satisfaction, concluding the standard has unacceptable problems. I then describe similarly unacceptable problems associated with standards that rely on effect-size comparisons between original and replication results. Finally, I propose a new standard: Replication attempts fail when their results indicate that the effect, if it exists at all, is too small to have been detected by the original study. This new standard (1) circumvents the problems associated with existing standards, (2) arrives at intuitively compelling interpretations of existing replication results, and (3) suggests a simple sample size requirement for replication attempts: 2.5 times the original sample."
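The arithmetic behind that last recommendation is trivial, but worth sketching because people often under-budget replications. This is a hedged illustration, not Simonsohn's code; the only assumption is the 2.5 multiplier stated in the abstract above:

```python
import math

def replication_sample_size(original_n: int, multiplier: float = 2.5) -> int:
    """Simonsohn's rule of thumb: run the replication with ~2.5x the
    original sample, so it has enough power to detect effects small
    enough that the original study could only barely have seen them."""
    return math.ceil(multiplier * original_n)

# A typical small-sample original study:
print(replication_sample_size(20))   # 50
print(replication_sample_size(37))   # 93 (round up; you can't recruit 92.5 people)
```

The point of the multiplier is that a replication the same size as the original is badly powered to distinguish "no effect" from "a real but smaller effect than originally reported."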
I should add that slamming the entire discipline of psychology for sloppy methodology goes a bit too far. I have learned about most of the publications that take psychology to task from working psychology researchers. There are whole departments of psychology that largely have a scientific orientation and are trying to improve the discipline's methodology. Crap psychology abounds, but it is gradually being displaced by science-based psychology built on sound methodologies. It is of course more methodologically difficult to study the behavior of our fellow human beings than to study clouds or volcanoes or insects, but many scientifically oriented psychologists are working on the problem with good methods and sound statistical analysis. Some thoughtful psychologists have been prompted to stress careful replication by the failed studies that have come before.
I ask because the topical breadth of your comments is absolutely astounding to me.
Please don't take this in too saccharine or fawning of a manner, but I could only hope to be as generally knowledgeable and contributory as you appear to be were I to buckle down and work my inquisitive ass off for the next 5 decades.
Anyways, I always look forward to your comments. Thanks for them, regardless of how you manage to dish them out.
If you have a blog, I'd be a happy subscriber.
Edit: I read your extensive profile, which answers my questions as to your experience. No need to reply if there's nothing of interest to add :-)
These problems of reproducibility occur in basic cancer research articles as well. Amgen proved this and wrote a paper detailing the issues.
Academics are valued for their papers, citations and grants. And papers are difficult to publish if one fails to find an effect. The researcher has some incentive to find the right numbers because their job is, in a way, on the line.
There's lots of writing about the tension between "publish or perish"  and scientific integrity. It manifests in p-values often, but p-values are the dominant statistical tool right now in most fields. I think you will see the same tension regardless of your tools for doing science.
 Drug development: Raise standards for preclinical cancer research, http://www.nature.com/nature/journal/v483/n7391/full/483531a...
I have always despised this semi-excuse. It should not excuse, or even explain anything at all. It is a great deal harder to study the interior of Europa than it is to study how humans behave, but nobody would say that "crap astronomy abounds". At least not without some serious backing evidence; and they certainly wouldn't simply concede it to the hypothetical masses who already suspect it.
"It is hard to find the funds to perform this experiment, so excuse us for making shit up!" is something that would never fly, but "It is hard to find ways to perform this experiment ethically, so excuse us for making shit up!" is heard all the fucking time.
Specifically, Zalocusky is trying to link the neuroscience results to an earlier article by Ioannidis. Whereas that article concludes that neuroscience results are unreliable and hard to reproduce, Zalocusky applies the older methods and results from Ioannidis to argue that neuroscience results are not only unreliable but also false, which is a stronger statement. To support this, Zalocusky adds in a couple of back-of-the-napkin calculations.
I think a bit more analysis and caution is necessary before making the leap from Ioannidis' 2013 claim about neuroscience research to Zalocusky's stronger claim about neuroscience research.
Incidentally, there was a lot of discussion about Ioannidis' 2005 paper, some of which can be seen in Ioannidis' 2007 response to earlier criticism of the paper. When evaluating claims about entire fields of research, it is important to be careful about how we interpret these claims.
edit: We need to be especially careful when using the word false. Does that mean "not true"? Or "insufficient to describe the truth"? Or "directly opposite to the truth"?
However, given enough scientists, all three forms of false are being published in parallel (as well as genuinely true results), which is the key problem. Without bounds on publication bias and publication quantity, you can't rigorously derive a probability of truth from a p-value* in isolation.
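This point can be made concrete with the standard Ioannidis-style back-of-the-envelope calculation from the 2005 paper mentioned above: the positive predictive value PPV = (1−β)R / ((1−β)R + α), where α is the significance threshold, 1−β is power, and R is the prior odds that a tested effect is real. A sketch (ignoring bias, which only makes things worse):

```python
def ppv(alpha: float, power: float, prior_odds: float) -> float:
    """Positive predictive value: the probability that a 'significant'
    finding reflects a true effect, given the test's error rates and the
    prior odds R that tested hypotheses are real. Publication bias and
    flexible analysis would lower this further."""
    true_positives = power * prior_odds
    false_positives = alpha  # expected rate per tested null hypothesis
    return true_positives / (true_positives + false_positives)

# Well-powered study of a plausible hypothesis (1-in-4 prior odds):
print(round(ppv(alpha=0.05, power=0.80, prior_odds=0.25), 2))  # 0.8

# Underpowered exploratory study of a long shot:
print(round(ppv(alpha=0.05, power=0.20, prior_odds=0.05), 2))  # 0.17
```

The same p < .05 cutoff yields very different probabilities of truth depending on power and prior odds — which is exactly why a p-value in isolation can't tell you whether a published result is true.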
(hmm, I suppose meta-studies are a kind of remedy to that so maybe it will all work out anyway)
* or whatever
It does still reveal the influence that one person can have on a group of people and how much can be changed about behavior. It would be unwise to throw away discussions of the experiment because people can't properly place the data in context or evaluate the strengths/weaknesses of the experimental design and implementation (which is what Psychology courses should be teaching you to do in the first place.)
The title on HN has been shortened, and information got lost in the process.
But there's no good control to make scientific conclusions on why the abuses occurred.
I am asking myself whether all the outrage about the SPE (and also Milgram) experiments being unethical (or flawed) isn't really about something we'd rather not know about human nature.
A little-known fact is that Milgram found that when the studied persons were free to set the voltage they administered, none of them increased it. His conclusion was that most people are not sadists and do not enjoy hurting others.
Edit: Maybe I spoke too soon, see: http://www.psmag.com/navigation/health-and-behavior/electric... But the experiments were still very different in the number of test subjects and in how scientifically they were conducted.
What I am a bit worried about is that we are not able to replicate, in one way or another, the results of these studies (because of the ethical problems). How can future generations become educated about these problems, especially if we cast doubt on the results? Are our descendants doomed to figure these things out the hard way?
Things like Abu Ghraib are really severe. They don't just apply to "terrorists" in military detention. See Winterbourne View. This was a hospital providing long-term care for people with learning disabilities. Residents were the victims of horrible treatment from people who were supposed to care for them. See also the poor care provided by some staff at West Staffs hospital.
Since our aim is to stop and reduce incidents of abuse we need good quality research to make sure that we're not making things worse.
 Using troubling but easy shorthand. Sorry.
 Cultural note: It was a private hospital, but most of the people there would have had their care paid for by the state. This happens quite a lot in mental health treatment - most eating disorder in-patient beds will be private; most medium and high secure "forensic" beds will be private.
> Zimbardo played a key role in encouraging his "guards" to behave in tyrannical fashion.
How do we know that the Abu Ghraib guards weren't told to be tyrannical by the CIA or other management figures? No one was really blaming the guards themselves for causing it to happen.
What it does support is that if you put people in charge of other people, and only punish the people in charge for lapses in order, and not for cruelty, then a large percentage of them will become cruel to prevent lapses in order.
However, I can't remember where I heard this.
and thus showed that "prompting the subjects too many times or bullying them" still makes people follow the prompts and deliver the shock, whereas the "prescribed neutral prompts" weren't enough.
Also, watch out for sloppy thinking: If people will commit immoral acts as long as it's legal, that makes them heartless, not spineless. Make sure you know what your standard is, because it will be used to judge you.
I'm not sure why textbooks haven't been updated, but I doubt the hypotheses mentioned in the article (ignorance, that the authors believe the study, and pressure to keep textbooks short).
My guess at why textbooks ignore the criticism is because it is not convincing.
However, you really don't even have to go so far into the technical details to find it flawed. The whole premise seems absurd to me.
You get some students, tell some of them to act like prison guards and the rest to act like prisoners, and then it's supposed to be surprising that they follow through on this?
"Well, we didn't expect them to be so sadistic!"
There's nothing I read that was more sadistic than a fairly typical hazing ritual. All participants knew this was ultimately voluntary and could quit at any time.
It'd just be awkward and uncomfortable.
But, since you specifically mention replication and non-replication -- how well is this covered in the rest of science? How well is the null-hypothesis covered?
(I should say that I'm not trying to suggest that physics is as sloppy as psychology sometimes appears to be.)
In this sense, you always need to be careful about replication, regardless of field. If you can't show replication, then there is no guarantee that the model will always match observations, no matter how much math went into the construction of that model.
In Physics the main problem is isolating variables, and setting up appropriate conditions. The good thing about Physics is that once the environment is set up and the experiment is well designed, it is easy to re-run the experiment for more trials. With chemistry and biology, this is usually the case as long as you have clean reagents in a clean lab, but you are introducing more factors of what you have to control as the complexity of your system scales up.
It is hardest to replicate results involving humans, because with humans you have a lot of complexity, so you know you can never control all influencing factors. Usually in the social sciences, it is easier to hope these factors cancel out with a large, non-homogeneous sample. The problem, as experiments like the SPE demonstrate, is that it is hard to build an ideal sample and, if the ethics or practicality of your methods are questioned, to rerun the experiment to replicate results.
Also, what the hell is up with a name like "Weinersmith"? Is that a nom de plume or something?
NB: I am not saying everyone is evil.
What I take away from the experiment is that given the right situation and pressures, most people would do things they thought they would never otherwise do. More importantly, I think it's probably a mistake to think that you're the exception and are morally incorruptible.
edit: Also, what the hell happened to HN comments sections lately?
Some nice quotes from the discussion (citations removed here for readability):
As expected, Conscientiousness and Agreeableness predicted the intensity of electric shocks administered to the victim. Second, we showed that disobedience was influenced by political orientation, with left-wing political ideology being associated with decreased obedience. Third, we showed that women who were willing to participate in rebellious political activities such as going on strike or occupying a factory administered lower shocks.
All these results suggest that situational context, even though a powerful determinant of behavior, does not necessarily overwhelm individual-level behavioral determinants. It is interesting to note that personality traits such as Agreeableness and Conscientiousness, which are widely related to positive outcomes such as better mental health, longevity, academic performance, parenting, reduced aggression, and prosocial behavior, may also have darker sides in that they can lead to destructive and immoral obedience.
which is to say, I don't know what people expect, but I'm never surprised when I find out that any given psychology experiment is a fanciful construct of lies and half-truths.
Yeah, I'd say.
Sometimes anecdotes are useful, revealing something interesting about a person's experience in the world. Here, you've done something very unfair: extrapolate from a very small perspective into a very large idea of what the world is like.
Psychology is a rich field with diverse practitioners. Rational, data-driven, empirical methods underpin psych research just as they do anywhere else in the sciences.
And where this isn't true, where Freudians and Jungians, etc., hold sway, there's something interesting going on there, too. Mushy and soft and philosophical and silly as it might be.
Don't discount the value of centuries of inquiry and practice because you know a half dozen idiot psychologists that believe in astrology. This is not a representative or useful sample.
Just count the number of times you hear someone say "that guy is certifiably insane, he should be committed to an asylum" any time someone comes up with a non-popular point of view.. this is the government religion at work.
England has government spending on psychology - mostly in the form of health spending on psychological talking therapies.
In England you can't easily be detained under the Mental Health Act for having a "non popular point of view". There are a number of checks and balances built into the system. While the system isn't perfect it is very unlikely that a person would be detained just for an unpopular opinion. So "The Government" is not out to get you.
Of course, your comment talks about the general population. Yep, you're right, many people in the general population display shocking ignorance about mental health problems and they are happy to blame mental illness when a person displays challenging behaviours or opinions. That's not the fault of government propaganda, that's the fault of ignorance about mental health problems.
My point is this: general society hasn't noticed this infiltration of their governments, and is, at this point, perfectly okay with it. So it doesn't really matter if it's a science or not - Psychology has been accepted as the official state religion, for determining just how compliant any individual is with the groupthink du jour, and that's all that really matters ...
Please could you point to a single "government appointed psychologist" saying [blah] about some figure? Because while I'm sure they're common in oppressive regimes I haven't heard of one recently in non-oppressive regimes.
> if an 'expert' (high priest) comes along with their statement that the individual has any one of the invented maladies in their bible (DSM-IV), then that's the end of that guy. Take him away!
There are abuses of the various rules for detaining and treating someone against their will. I mentioned that in my first reply. But it's not as easy as you suggest and hasn't been for some years now.
If you're saying that psychology had severe abuse in the system early on then we both agree.
I agree with you - it's rare to see this abused these days. Though I remember a particularly nasty case from my childhood - a man in Oslo was committed with a diagnosis of paranoid delusions in 1971 after having been involved in a political struggle related to a school closure. His claims about violations of various laws in the process intended to close down the school were used as evidence of his supposed delusions. He was not released from the hospital until 1985. At which point he refused to leave the hospital, and camped outside it until near his death in 1996, while campaigning to have his diagnosis reversed.
The utterly bizarre situation was that he insisted that if the diagnosis stood, then he should be admitted, but the hospital refused - claiming he was too well to be admitted even voluntarily, while still insisting he still suffered from paranoid delusions.
In 1988 an investigation indicated that he was likely sane when admitted, though he may at some point later have suffered from mental problems caused or exacerbated by his forced hospitalisation.
In 1995 the relevant government department gave him a formal apology, admitting that the claims he first made in 1968, which were used as evidence of delusions in order to have him forcibly admitted, were in fact true... Nobody was willing to admit that the hospitalisation was politically motivated, but given the facts it's hard to conclude otherwise.
It shows how insidious claims like these could be: Everything he did to fight his hospitalisation and diagnosis was used as additional evidence of his supposed insanity.
I like to think that cases like this don't occur anymore..
I recall that in a TV discussion of the scandal a psychiatry professor was extremely critical of psychiatric profiles. I think she even said that she would not allow anyone to create one of herself.
Just look what they did to Forrestal in the early years of the takeover of the United States by the military-industrial complex:
(See also - http://psychiatricnews.wordpress.com/)
>If you're saying that psychology had severe abuse in the system early on then we both agree.
Nope. Still the same as it ever was.
This has to be one of the worst comments I've seen in ages. My god.
Neuroscience is just an attempt by the psychiatric movement to evade its religious/government roots. Neuroscience will also, eventually, be recognized as a belief system of enforcement, merely a tool for those who wish to social engineer the next generation of society for .. whatever .. means.
> So, have the important criticisms and reinterpretations of the SPE been documented by key introductory psychology textbooks? Griggs analysed the content of 13 leading US introductory psychology textbooks, all of which have been revised in recent years, including: Discovering Psychology (Cacioppo and Freberg, 2012); Psychological Science (Gazzaniga et al, 2012); and Psychology (Schacter et al, 2011).
> Of the 13 analysed texts, 11 dealt with the Stanford Prison Experiment, providing between one and seven paragraphs of coverage. Nine included photographic support for the coverage. Five provided no criticism of the SPE at all. The other six provided only cursory criticism, mostly focused on the questionable ethics of the study. Only two texts mentioned the BBC Prison Study. Only one text provided a formal scholarly reference to a critique of the SPE.