
Beneath the Replication Crisis - danielam
https://societyinmind.com/2019/08/13/beneath-the-replication-crisis/
======
mensetmanusman
It is hard for me to wrap my mind around.

Imagine finishing graduate school and then suddenly learning that over half of
what you learned is probably not true... but not knowing which half.

~~~
cerealbad
people are self-interested not truth interested. most scientific fields are
initiated by a small handful of geniuses, but over time become a type of
cloistered bureaucracy. your standard university education teaches you there
are two paths - you make significant contributions and are remembered long
after you die, or you plug in as a fresh node into the existing structure and
are rewarded with a comfortable life, just check your ethics at the door.

this is largely due to the failure of idealism as a philosophical framework to
set up any type of enduring political, economic or secular social structure.
the blame lies on hegel or perhaps marx's interpretation of hegel getting
stuck in praxis quagmires. which is why the turn towards asia and the new age
movement which emerges from it in the early 1900s is essentially anti-science.
the spiritual successor of communism is the idea of fusing the east and west
traditions to form the new enlightened human, who sees no race, religion or
gender but simply behaves as the universal light and creator. it has largely
been a failure, as folding china into the world community has been against the
pragmatic self-interest of various sensitive american industries - weapons,
space, communication, high finance, which can exert sufficient power on the
american leadership structures to avoid any type of idealism about world
peace. there are no more enemies or allies just competitive co-morbidity,
actors fraying the ropes they are tugging on until something breaks, like the
ussr in the 80s.

it's only a problem if you think humans are equal. if you shift to the view that
humans are unequal and life is deeply unfair, then it's normal to see the next
100 years as a struggle for dominance between superstition and science,
between a world led by north america or eurasia, between the new dominoes of
fascist nationalism and disinterested international capitalism. it's a hot
peace, which will end up with one temporary victor, possibly colonizing mars -
or just landing there and coming back, before a new power emerges to challenge
the old empire. i would argue that this is a by-product of the 'correct'
interpretation of hegel, the one that isn't taught directly in schools, the
master and slave dialectic.

~~~
__MatrixMan__
Wow, that's a lot to attribute to Hegel. I've never read him. Where should I
start in order to not absorb the wrong interpretation that is apparently so
prevalent?

I have a historical bogeyman of my own: I think Newton--perhaps accidentally--
encoded a lot of his own non-mathematical perspective into his work on
Calculus. These biases got baked into our current theory of "real numbers"
(which are pretty spooky, once you get to know them, and don't strike me as
befitting their name). This was done primarily because the pure mathematicians
in the centuries following Newton couldn't justify his results, which was an
embarrassment since his work was so useful that it obviously was true. As you
say:

> people are self-interested not truth interested

So now we have this element of arbitrariness baked into our numbers and the
theories that underpin the sort of methods that the article refers to here:

> Faith in this methodology certainly unites a much larger number of research
psychologists than does any kind of commitment to a particular theoretical
framework

Somewhere between physics and psychology, an assumption that worked for Newton
stopped working for us, but we didn't notice because we had nothing else to
compare it to.

I similarly extrapolate this accusation to wider political spaces (e.g. the
failure of standardized testing to make the kind of differences we wanted it
to, or the propensity of our economic system to generate jobs that don't
actually matter).

I see some parallels between our reactions to this piece, so I want to read
Hegel and see if we're just similarly out there, or if we're out there in
similar ways.

------
tomlock
> Combined with industry-wide pressures to publish, the replication crisis was
> inevitable.

> The replication crisis, if nothing else, has shown that productivity is not
> intrinsically valuable.

I think this is important to focus on - the point of universities has become
to produce profit, and to give people degrees that are profitable, and to
appear to be able to do those things. This has very little to do with
producing research with verifiable results. It's much more to do with getting
students into the funnel by making people with tenure appear as productive as
possible.

~~~
nine_k
AFAICT this is not about profit in a commercial sense, as in selling goods.

It's more like overfitting the target function of publishing impactful
research. A bit of p-hacking, a bit of cutting corners in experimental setup,
a sloppy null hypothesis check, and you honestly believe you see an effect!
Everyone is happy: you, your adviser, the lab's administration, the journal where
you publish the paper.

But if you carefully check for everything, then find no effect, you kill an
interesting hypothesis, your paper is hard to publish, "you are not making
progress", and nobody is happy.

Crooked incentives, crooked results :(
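
To make that concrete, here is a minimal simulation (my own sketch, not from
the article): measure ten pure-noise outcomes per study and report whichever
one happens to come out significant. With no real effect anywhere, roughly
four studies in ten still produce a "finding".

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_studies, n_outcomes, n = 10_000, 10, 30

    false_positives = 0
    for _ in range(n_studies):
        # two groups, no real effect on any of the measured outcomes
        a = rng.normal(size=(n, n_outcomes))
        b = rng.normal(size=(n, n_outcomes))
        pvals = [stats.ttest_ind(a[:, k], b[:, k]).pvalue
                 for k in range(n_outcomes)]
        if min(pvals) < 0.05:  # report only the outcome that "worked"
            false_positives += 1

    print(f"'significant' findings with no effect: {false_positives / n_studies:.0%}")
    # with 10 independent outcomes this is roughly 1 - 0.95**10, i.e. ~40%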

~~~
tomlock
I agree that what you say is happening is happening. I think there was some
great meta-analysis that showed that p-values were not following a
distribution that was statistically possible - like on OkCupid, where people
who are over 5'10" round up to 6 ft.

But I think the underlying reason for the push to publish things - and
impactful things are easier to publish - is profit. More hireable grads, more
tenured professors publishing papers.
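
If I recall the idea correctly, the check works because of how p-values should
be distributed. A rough sketch of it (mine, not the meta-analysis itself),
assuming simple two-group t-tests:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, reps = 30, 20_000

    def simulate_pvalues(effect):
        # p-values from honest two-group t-tests, n = 30 per group
        return np.array([
            stats.ttest_ind(rng.normal(0.0, 1.0, n),
                            rng.normal(effect, 1.0, n)).pvalue
            for _ in range(reps)
        ])

    for label, effect in [("no effect", 0.0), ("real effect (d = 0.5)", 0.5)]:
        p = simulate_pvalues(effect)
        significant = p[p < 0.05]
        print(f"{label}: share of significant p-values in (.04, .05) = "
              f"{np.mean(significant > 0.04):.2f}")

    # Under the null, p-values are uniform, so about 20% of "significant" ones
    # land in (.04, .05); with a real effect they pile up near zero and that
    # share drops. A literature whose p-values bunch just under .05 matches
    # neither pattern - the statistical version of everyone being exactly 6 ft.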

~~~
pryffwyd
I have read that a study is very unlikely to replicate if p is juuust under
0.05, but very likely to replicate if just over 0.05. The first is a good sign
of p-hacking, while the second is a good sign of a real effect with a sample
size that wasn't quite big enough.
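
That intuition is easy to sanity-check with a toy simulation (my own
assumptions: a modest real effect of d = 0.4, original samples of 30 per
group, replications with 100 per group):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    def p_two_sample(effect, n):
        # one study: two groups of size n, true mean difference = effect (SD = 1)
        return stats.ttest_ind(rng.normal(0.0, 1.0, n),
                               rng.normal(effect, 1.0, n)).pvalue

    # honest but underpowered studies of a real effect often land just over .05...
    originals = np.array([p_two_sample(0.4, 30) for _ in range(20_000)])
    near_misses = int(np.sum((originals > 0.05) & (originals < 0.10)))
    # ...yet a better-powered replication finds the effect most of the time
    rep_rate = np.mean([p_two_sample(0.4, 100) < 0.05 for _ in range(near_misses)])
    print(f"real effect, original p just over .05: replication rate ~ {rep_rate:.0%}")

    # a null result that only reached p < .05 through hacking has nothing for
    # the replication to find, so it replicates at roughly the 5% chance rate
    null_rate = np.mean([p_two_sample(0.0, 100) < 0.05 for _ in range(20_000)])
    print(f"no real effect: replication rate ~ {null_rate:.0%}")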

------
repolfx
The article argues the replication crisis is somehow unique to psychology, but
it's not.

As the Scott Alexander essay makes clear, it also affects psychiatry, and it's
apparently the case that many areas of medicine have this problem. Biotech
studies are also rife with replication failures.

Even AI research has had replication failures and that's based on running
theoretically deterministic software on theoretically deterministic machines!

Some of this is that accurately describing studies and replicating them is
hard. But some of it is that academics aren't incentivised to find the truth,
but rather, to make it appear that academics know a lot of things.

~~~
rleigh
I'm unsurprised by the replication crisis. I've seen scientists p-hacking, and
even choosing inappropriate statistical tests because it gave a "better"
result. Example: using a non-parametric U-test when a parametric t-test gave a
non-significant result - along with low numbers of replicates.
Statistical analysis is only useful and meaningful if your data is of good
quality, and you use a statistical test appropriate for the data. If your data
is of marginal quality, then I'm afraid that it's simply unworthy of
publication. But when your career hangs in the balance, stuff like this gets
through.
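
The test-shopping part is easy to demonstrate. A quick sketch (hypothetical
data, not from any real lab), reporting whichever of the two tests gives the
smaller p-value on the same null data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    reps, n = 20_000, 8  # low numbers of replicates, as described above

    honest = shopped = 0
    for _ in range(reps):
        a = rng.normal(size=n)  # two groups with no real difference
        b = rng.normal(size=n)
        p_t = stats.ttest_ind(a, b).pvalue
        p_u = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
        honest += p_t < 0.05
        shopped += min(p_t, p_u) < 0.05

    print(f"false-positive rate, t-test only:     {honest / reps:.1%}")
    print(f"false-positive rate, best of t and U: {shopped / reps:.1%}")
    # each test on its own respects the 5% error rate; reporting whichever one
    # "worked" quietly inflates it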

The major problem with current scientific practice is that good practice is
actively penalised. I used to work in industry, and I was shocked at the lax
standard of work, particularly with respect to accuracy and precision, of wet
lab scientists in university research settings. I mentioned this to a few
postdocs over the years and was told (paraphrasing) that "if it's good enough
to publish, then it's good enough for me", which, if you think about it, is
actually quite a low bar. Most of the people were fully aware they were doing
sloppy work, but didn't care.

How can it be improved? I think there are two sides to this coin. Firstly,
good practice has to be encouraged and rewarded, and sloppiness penalised.
That requires a culture change in the laboratories. Too many PIs don't care
about what happens in their labs so long as "good" results are being generated
by their underlings. They don't look after instrument calibration and ensure
that people are working to GLP standards. In industrial labs, we had to send
samples off to reference labs, analyse random samples provided to us, and
undergo inspections and audits to prove we were providing correct analyses.
Maybe academic labs should be obligated to prove themselves as well, or lose
their funding?

Secondly, a project delivering negative results should not be a career-ending
move. Failing experiments does not necessarily mean one is a bad scientist.
But right now, the incentives are to spin all results in a positive light,
even if it means publishing bad science, because that's what it takes to keep
the funding coming in. Success should be rewarded, but I think our criteria
for what counts as success need to be recalibrated to stop charlatans from
abusing the system for their own benefit. Publishing a paper isn't enough;
it's got to be
replicable independently.

~~~
repolfx
I think you're right, and your experiences of corporate vs academic research
quality match my own, more or less (in different fields).

But this does lead to the question: why not just reduce academic funding,
matched by corresponding cuts in corporation taxes? That would obviously not
lead to a 1:1 transfer of research funding or anything even close to it, but
if the replication crisis suggests anything at all, it's that there are too
many scientists chasing too little real knowledge, with too few reality
checks of the sort industrial labs require. If more funding came from
industry, the quality of science might be higher.

------
lidHanteyk
The point about not having models is a very important one. Psychology is so
sure that the mind exists and has certain structural properties, but to what
extent is it empirical?

~~~
x220
Physicists are so sure that the theory of gravity is true, but to what extent
has this been verified? Have they seen the gravity particle yet?

~~~
geomark
Isn't the most useful model just the formula for the force of gravity? Sure,
there is lots of interest in understanding an underlying mechanism. But a
useful model abstracts that away so you can predict behaviors.
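
For what it's worth, that model fits in one line and predicts plenty while
saying nothing about an underlying mechanism (a toy example of mine, using
standard Earth-Moon figures):

    # Newton's law of universal gravitation: force between two point masses
    G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

    def gravitational_force(m1_kg, m2_kg, r_m):
        return G * m1_kg * m2_kg / r_m**2

    # Earth-Moon attraction, roughly 2e20 N
    print(gravitational_force(5.972e24, 7.348e22, 3.844e8))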

~~~
x220
Exactly, that's the point I'm trying to make.

------
interblue
Regarding Bem's precognition experiments, an extract from Wikipedia:

In a 2017 follow-up article in Slate magazine on the "Feeling the Future"
experiments, Bem is quoted as saying, “[...] If you looked at all my past
experiments, they were always rhetorical devices. I gathered data to show how
my point would be made. I used data as a point of persuasion, and I never
really worried about, ‘Will this replicate or will this not?’”[42] While
fellow psychologist Stuart Vyse sees this statement as coming "remarkably
close to an outright admission of p-hacking", he also notes that Bem "has been
given substantial credit for stimulating the movement to tighten the standards
for research" such as that taking place in open science.[43]

– [42]: https://redux.slate.com/cover-stories/2017/05/daryl-bem-proved-esp-is-real-showed-science-is-broken.html
– [43]: https://web.archive.org/web/20180805142806/https://www.csicop.org/specialarticles/show/p-hacker_confessions_daryl_bem_and_me

Also from one of the article's sources:

"There is some evidence, however, for the hypothesis that people can feel the
future with emotionally valenced nonerotic stimuli, with a Bayes factor of
about 40. Although this value is certainly noteworthy, we believe it is orders
of magnitude lower than what is required to overcome appropriate skepticism of
ESP." –
[https://link.springer.com/article/10.3758%2Fs13423-011-0088-...](https://link.springer.com/article/10.3758%2Fs13423-011-0088-7)

A lower threshold for statistical significance won't solve all the problems of
p-hacking, and the implementation of Bayesian methods doesn't seem too
promising either. It will be interesting to see how the field of psychology
changes over time.
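
To put a rough number on the ESP part: the Bayes factor of ~40 is from the
quoted paper, but the prior odds below are just an illustrative assumption of
mine.

    # posterior odds = Bayes factor x prior odds
    bayes_factor = 40            # reported evidence for "feeling the future"
    prior_odds = 1 / 1_000_000   # illustrative prior odds in favour of ESP

    posterior_odds = bayes_factor * prior_odds
    print(f"posterior odds in favour of ESP: 1 to {1 / posterior_odds:,.0f}")
    # => 1 to 25,000 - "noteworthy", yet orders of magnitude short of
    # overcoming reasonable skepticism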

------
StefanKarpinski
I find this article quite hard to read (poor writing, grammatical errors
making sentences incoherent), but the key thesis seems to be that because
psychology has embraced empiricism as its philosophy of science, and not
required any theory to explain and interpret empirical results, it is uniquely
susceptible to replication failure. In a sense, the replication crisis in
psychology is strong evidence that empiricism is an incorrect theory of
science: if it were correct, psychology would be doing great—just as well as
theory-laden sciences like physics, chemistry and biology. Instead, what we
see is that the more a science takes coherent, broad theories seriously, the
better it fares as a scientific endeavor. This matches a Popperian/Deutschian
philosophy of science. The most interesting thing about the crisis is that it
is, effectively, a scientific experiment about the nature of science.

