
Many Psychology Findings Not as Strong as Claimed, Study Says - mcgwiz
http://www.nytimes.com/2015/08/28/science/many-social-science-findings-not-as-strong-as-claimed-study-says.html
======
hackuser
Take the time to read the story (many comments here reflect nothing more than
the headline); it is far more complex and interesting than the headline
suggests. For one thing, almost all of the studies' effects were reproduced,
but they were generally weaker.

* Most importantly, from the Times: _Strictly on the basis of significance — a statistical measure of how likely it is that a result did not occur by chance — 35 of the studies held up, and 62 did not. (Three were excluded because their significance was not clear.) The overall “effect size,” a measure of the strength of a finding, dropped by about half across all of the studies. Yet very few of the redone studies contradicted the original ones; their results were simply weaker._

* Also: _The research team also measured whether the prestige of the original research group, rated by measures of expertise and academic affiliation, had any effect on the likelihood that its work stood up. It did not._

* And: _The only factor that did [affect the likelihood of successful reproduction] was the strength of the original effect — that is, the most robust findings tended to remain easily detectable, if not necessarily as strong._

* Finally: _The project’s authors write that, despite the painstaking effort to duplicate the original research, there could be differences in the design or context of the reproduced work that account for the different findings. Many of the original authors certainly agree._

* According to several experts, there is no reason to think the problems are confined to psychology, and it could be worse in other fields. The researchers chose psychology merely because that is their field of expertise.

* I haven't seen anything indicating the 100 studies are a representative sample of the population of published research, and at least one scientist raised this question.

~~~
glial
> The only factor that did [affect the likelihood of successful reproduction]
> was the strength of the original effect — that is, the most robust findings
> tended to remain easily detectable, if not necessarily as strong.

This is probably just regression to the mean. The comment above suggests to me
that the tendency for the findings in replicated experiments to be weaker does
NOT necessarily come from any flaw in the experimental design, but from the
criteria for findings to be published.

You would expect any given effect to show some variation around a mean effect
size. My lab and your lab might arrive at slightly different results, varying
around some mean/expected result. If your lab's results meet statistical
significance, you get to publish. If my lab's don't, I don't get to. So the
published results are the studies that, on average, show a stronger effect
than you might see if you ran the study 100 times.

> Yet very few of the redone studies contradicted the original ones; their
> results were simply weaker.

If a third lab replicates the experiment, their results are more likely to be
close to the (possibly non-publishable) mean value than the (publishable)
outlier value. So on average, repeating an experiment will give you a LESS
significant result.

If the strength of the original effect (and thus probably the mean effect
strength over many repeated experiments) is larger, the chance of replicated
experiments also being statistically significant is higher.

In other words, these new results are very predictable and don't _necessarily_
indicate that anything is wrong.
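
A minimal sketch of the selection effect described above, with hypothetical numbers: a true standardized effect of 0.3, thirty subjects per group, and the usual z > 1.96 cutoff standing in for "statistically significant".

    import numpy as np

    rng = np.random.default_rng(0)

    true_effect = 0.3      # true standardized effect size (hypothetical)
    n = 30                 # subjects per group (hypothetical)
    n_studies = 100_000    # original studies simulated

    # Approximate standard error of a two-sample standardized effect
    se = np.sqrt(2 / n)

    # Each lab observes the true effect plus sampling noise
    observed = rng.normal(true_effect, se, n_studies)

    # "Publish" only the results that clear the significance bar
    published = observed[observed / se > 1.96]

    # Replicate each published study once, under identical conditions
    replications = rng.normal(true_effect, se, published.size)

    print(f"true effect:              {true_effect:.2f}")
    print(f"mean published effect:    {published.mean():.2f}")     # inflated
    print(f"mean replication effect:  {replications.mean():.2f}")  # near truth
    print(f"replications significant: {(replications / se > 1.96).mean():.0%}")

Run as-is, the mean published effect comes out at roughly double the true one, and only a minority of replications clear the significance bar, which mirrors the roughly halved effect sizes the article reports. Raising true_effect raises the replication significance rate, matching the point above about robust findings remaining detectable.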

~~~
adrianN
For an example of this effect in physics, look at the Millikan experiment:

[https://en.wikipedia.org/wiki/Oil_drop_experiment#Millikan.2...](https://en.wikipedia.org/wiki/Oil_drop_experiment#Millikan.27s_experiment_as_an_example_of_psychological_effects_in_scientific_methodology)

~~~
leni536
Oh, the Millikan experiment. I had to do it in a university lab as a four-hour
experiment. It's impossible to gather enough data in that short a time, and
your eyes figuratively fall out from staring into the microscope measuring the
drops' velocities. I can assure you that this is the worst experiment one can
do as a student.

------
legitster
As a marketer, I have long since given up on psychology as a field. The
practitioners are too wishy-washy, the experiments too prone to confirmation
bias, and the results too unusable.

I feel behavioral economists cover many of the same subjects, but their work
is much more interesting and scientific. The experiments are tighter and ask
better questions. And they focus more on specific characteristics of
decision-making, and less on asking big questions.

~~~
vacri
> _And they focus more on specific characteristics in decision making, and
> less on asking big questions._

It sounds more like behavioural economists happen to do the studies relevant
to your marketing; that's not a flaw in psychology itself. Psychology is an
incredibly broad field, and it asks questions both big and small. Plenty of
psychology studies ask very specific questions looking for very specific
effects.

~~~
will_work4tears
My place of employment was awarded a grant, and did a study to show that teens
were using mobile devices more often than they did years ago (I don't remember
the timeframe). I believe they got something like a 200k+ grant for this, with
a team of something like ten PhD psychologists, to show something we already
knew.

I'm not saying psychology as a field is pointless or anything, just that it
does occasionally produce some silly studies.

~~~
vacri
You're probably oversimplifying the study considerably, but even so, things
that "everyone knows" still need to be formally studied. All across science,
non-intuitive results happen all the time. In psychology they're even more
common, since biology doesn't play by the relatively clean rules of materials
physics.

After all, "everyone knows" that blacks are inferior and only useful as slaves
(they even want to be, deep down); "everyone knows" that women are too
temperamental to vote sensibly; "everyone knows" that people of that religion
over there eat babies and we should destroy them before they destroy us...

For a more recent example, "everyone knows" that young people use condoms more
often now, given the higher levels of sex education about pregnancy and STIs
they've grown up with - yet regional studies often show significantly
decreased levels of condom use. Another is the assumption that the current
crop of young'uns are fantastic at understanding computers because they grew
up with them, yet this hasn't been borne out in studies. It turns out that
consuming content on a device doesn't mean you understand how it works any
better.

Such studies are _particularly_ important for ferreting out what's happening
with people who aren't society's favourites - we all know what a man is,
right? Always looking to get laid, not afraid to get physical, plays sport,
drinks beer. Most men are like that, right? Not really; there's actually a
huge variety of interests and desires. Studies of 'obvious!' things are just
as necessary as studies of fringe things, because sometimes the results are
really quite unexpected.

~~~
will_work4tears
"After all, "everyone knows" that blacks are inferior and only useful as
slaves (they even want to be, deep down); "everyone knows" that women are too
temperamental to vote sensibly; "everyone knows" that people of that religion
over there eat babies and we should destroy them before they destroy us..."

This is simply not true at all these days, and I don't like that HN users
would push a normal conversation into one where you imply your "opponent" is
racist, sexist, or a religious bigot. Might as well have called me a Nazi or
brought Hitler into the conversation.

And I don't believe you need psychologists or a peer-reviewed study to discern
mobile platform usage; we have other ways of getting those metrics.

~~~
afarrell
She was saying that society in general was racist back in the day, when people
believed in conditions such as Drapetomania.

~~~
will_work4tears
Thanks for the downvote. If she was saying that, why did she say "everybody
knows" rather than "everybody knew"? No, I see a distinct verbal jab in that
statement, but whatever. I'd downvote her if I could, but you in power like to
keep everybody but those in the clique down by downvoting even when we
contribute to the conversation. Hey, I have 351 karma, why don't you get your
friends over here and bring me down to nothing!

------
Lawtonfogle
Perhaps the biggest takeaway I found while studying psychology is that a lot
of research in the field just can't be trusted until you've vetted it.
Political pressures are too great a corrupting factor. Words would be
redefined. Conclusions would be overstated or overapplied. This is on top of
the already existing 'publish or perish' issue that affects science as a
whole.

The closer a subfield was to neurology (like physiological psychology), the
better it became. The closer it was to sociology (like IO psychology) or to a
politically charged issue, the worse it became.

This also applies to psychiatry. I remember reading some of the papers
published in connection with the DSM-V, and at some points they looked like
little more than a peer-reviewed version of two siblings fighting (though that
was the worst case, not the average).

One big thing is to look at how the researchers defined their terms, and at
how those terms translate when multiple languages are involved.

~~~
_delirium
> Words would be redefined. Conclusions overstated or over applied.

Even neuroscience has a hearty helping of this. The data itself is one thing,
but what it's sold as showing is quite another. For example, if you read a
claim that neuroscience has discovered something about "addiction", "free
will", "motivation", "friendship", "love", etc., dig into how they've chosen
to define those terms for the purpose of the study.

There's often a bit of concept-laundering going on, where a common term is
defined using a very narrow (and often conveniently chosen) definition to show
a result, but then the implications of that result are discussed with
reference to the original, more general sense of the term. News articles about
neuroscience are the worst, of course, but even quite a few scientists
themselves do this.

------
afarrell
Here is the paper the Reproducibility Project just released, "Estimating the
reproducibility of psychological science":
[http://www.sciencemag.org/content/349/6251/aac4716](http://www.sciencemag.org/content/349/6251/aac4716)

------
chockablock
Here's a nice, approachable treatment of some of the ways psych studies can go
wrong; it also offers concrete suggestions for fixes:

    
    
      Joseph P. Simmons, Leif D. Nelson, and Uri Simonsohn
      "False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant"
      Psychological Science 22(11): 1359-1366, November 2011
    

Free full-text (no paywall):
[http://pss.sagepub.com/content/22/11/1359](http://pss.sagepub.com/content/22/11/1359)

(via:
[http://bahanonu.com/brain/#c20150315](http://bahanonu.com/brain/#c20150315) )
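
One of the paper's central demonstrations is that undisclosed flexibility - for instance, peeking at the data as it comes in and stopping as soon as p < .05 - inflates the false-positive rate well past the nominal 5%. A minimal sketch of that effect under one assumed peeking schedule (the batch sizes and number of peeks are hypothetical; assumes numpy and scipy):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    n_sims = 10_000
    false_positives = 0

    for _ in range(n_sims):
        # No real effect: both groups are drawn from the same distribution
        a = rng.normal(0, 1, 10)
        b = rng.normal(0, 1, 10)
        # "Flexibility": re-test after each batch of 10, stop at p < .05
        for _ in range(5):
            if stats.ttest_ind(a, b).pvalue < 0.05:
                false_positives += 1
                break
            a = np.append(a, rng.normal(0, 1, 10))
            b = np.append(b, rng.normal(0, 1, 10))

    print(f"false-positive rate: {false_positives / n_sims:.1%}")

Even this single degree of freedom yields a simulated rate around two to three times the nominal 5%; the paper shows that combining several such choices pushes it far higher.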

------
simulate
There has been a backlash against attempts at replication in psychology:
[http://phenomena.nationalgeographic.com/.../failed.../](http://phenomena.nationalgeographic.com/.../failed.../)
[http://andrewgelman.com/2013/12/17/replication-backlash/](http://andrewgelman.com/2013/12/17/replication-backlash/)
and even a backlash to the backlash:
[http://phenomena.nationalgeographic.com/.../failed.../](http://phenomena.nationalgeographic.com/.../failed.../)

~~~
Asbostos
One of the weird criticisms I've heard (and reflected in your first link) is
"but the reproduced study had some slight difference". If a slight difference
can make an effect disappear, then it's not as interesting or general an
effect as the original study claimed to show! Why didn't they already test for
the effect of slight differences themselves?

~~~
lqdc13
You can't evaluate every single slight variation in your testing environment.
That's just the nature of the field.

------
martingoodson
The replication effect sizes are almost all positive [1]. This study
represents very strong validation that the published results were
qualitatively true, in the main.

The regression-to-the-mean effect is unsurprising and doesn't diminish this
finding. Given the difficulty of performing this kind of research, this is a
very positive result for the field.

[1][http://m.sciencemag.org/content/349/6251/aac4716/F1.expansio...](http://m.sciencemag.org/content/349/6251/aac4716/F1.expansion.html)

------
Apes
For anyone interested in more information on this topic, the book "Psychology
Gone Wrong" by Tomasz Witkowski and Maciej Zatonski is a very in-depth look at
the problems in this field.

------
RA_Fisher
This is how I evaluate studies: "Are the data and code made available?" It's a
simple request, and this study is the first I've ever seen that offers both.
Bravo!

------
wesleytodd
"Study says other studies are flawed"....

Didn't read the article, but couldn't help but comment on the title, good
laughs.

~~~
WalterGR
"Many Findings About Psychology Findings Not Being as Strong as Claimed Not as
Strong as Claimed, Study Says"

~~~
thehypotemoose
This chain of logic was covered in a 2011 fiction piece published in Science's
competing journal, Nature:
[http://www.nature.com/nature/journal/v477/n7363/full/477244a...](http://www.nature.com/nature/journal/v477/n7363/full/477244a.html)

“Although it is nonsensical to rely on evidence provided by human-based
research when judging whether humans are themselves inept, in doing so, the
editors (all human, I note) provide a perfect example of the feebleness of
human reasoning, thereby validating their claims.”

------
mamon
Psychology is more like religion than science.

------
cm2187
And of the 35 that turned up the same results, a fair chunk might have done so
just by chance.

