
The decline effect and the scientific method (2010) - ForrestN
http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer?currentPage=all
======
aamar
For those who faintly recognize the author's name... this piece was written
by Jonah Lehrer, whose writings were later discredited for a combination of
plagiarism, inaccuracy, and methodological weakness. Interestingly, in the
ensuing reassessment of Lehrer's writings, this article has survived basically
unscathed, but I think this discussion (including some statements from the
interviewed scientists) is very useful:

http://www.lastwordonnothing.com/2012/11/05/jonah-lehrer-nature-of-truth/

~~~
rubidium
One of the nice quotes from that link, by Rich Palmer, one of the authors of
the scientific work referenced:

 _If there’s a lesson here, it’s about a widespread human failing. Most people
would rather some other clever person distill down all the complex details
into a good story for them, preferably in excellent prose. But those distilled
stories should never be treated as a substitute for original research results.
If anyone really wants ‘the truth’, they’re going to have to slog through an
awful lot of turgid and arcane original research and draw their own
conclusion._

------
exratione
Of all the plans for mining new truth that human ingenuity has devised and
actually implemented to date, the scientific method comes closest to
perfection. All of the proposed improvements I know of basically amount to
ways of running the scientific method faster (e.g. strong AI).

The utility of the scientific method is that it works even when run by self-
interested, flawed, irrational humans. Technology moves ahead. People live
rather than die, and become far wealthier than their ancestors. That this
process kicked into high gear, after thousands of years of ragged, slow, and
erratic progress, at around the time and place where the scientific method
was finally formalized and that formalism successfully popularized, does not
seem to me to be a coincidence.

------
ims
I think the consensus on why this happens boils down to regression to the
mean, selective experimentation (exciting things over boring things like
attempting to duplicate results), and selective publishing (e.g. positive
results are much more interesting to journals than hypotheses being rejected).

Did I miss any reasons?
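
An illustrative toy simulation (entirely my own; the effect size, sample
size, and threshold are made-up numbers) of how the last two combine with
regression to the mean: if journals only see studies that cleared p < 0.05,
the published effects start out inflated, and faithful replications then
"decline" back toward the truth.

    # Sketch: publication bias + regression to the mean -> decline effect.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    true_effect, n = 0.2, 20            # small true effect, small studies

    published, replications = [], []
    for _ in range(10000):
        study = rng.normal(true_effect, 1.0, n)
        if stats.ttest_1samp(study, 0.0).pvalue < 0.05 and study.mean() > 0:
            published.append(study.mean())      # only this reaches journals
            replications.append(rng.normal(true_effect, 1.0, n).mean())

    print(f"true effect:           {true_effect:.2f}")
    print(f"mean published effect: {np.mean(published):.2f}")     # inflated
    print(f"mean replication:      {np.mean(replications):.2f}")  # ~ truth

The "decline" here is pure selection: nothing about the effect changed
between the original studies and their replications.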

~~~
Jach
I'd add bad math. Fisher significance testing is on shaky ground[0], both
theoretically (significance testing violates the likelihood principle) and
practically (just by the nature of a small sample size, a lot of null
hypotheses can be rejected, while with a larger sample size hardly any are).
Hypothesis testing is little better. The "Bayesian revolution" has barely
begun and has yet to influence a majority of researchers.

[0] http://uncertainty.stat.cmu.edu/ Chapter 12.

~~~
simonster
I don't think this is quite right. If the null hypothesis is true and you
conduct the same study an infinite number of times, you will reject the null
hypothesis 5% of the time at alpha = 0.05 regardless of sample size, unless
there is something wrong with your sampling procedure or hypothesis test. This
is the point of null hypothesis significance testing.

In practice, if you fix p=0.05 and increase your n, the probability that you
will find a statistically significant result often _increases_ because your
power increases, and in many situations, the probability that the null
hypothesis is true is close to zero. (Andrew Gelman uses the example of asking
whether there are significant differences between voting patterns of men and
women.)

On the other hand, effect size estimates become more accurate as the sample
size increases. This mitigates the above issue, provided you actually report
your effect size. It also means that small sample studies that report
statistically significant results are more likely to overestimate their effect
size, which is especially problematic if you are applying null hypothesis
significance testing when you know the null hypothesis is false.
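
Both points are easy to check numerically. A rough sketch (my own; the
effect sizes and sample sizes are arbitrary):

    # Sketch of both points: alpha is independent of n under a true null,
    # and significant small-n studies overestimate a small true effect.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    trials = 5000

    # Point 1: null true -> rejection rate stays near alpha for any n.
    for n in (10, 100, 1000):
        rejected = sum(stats.ttest_1samp(rng.normal(0, 1, n), 0).pvalue < 0.05
                       for _ in range(trials))
        print(f"n={n:4d}  false-positive rate: {rejected / trials:.3f}")

    # Point 2: with a small true effect, only the lucky small-n studies
    # reach significance, so their effect estimates are inflated.
    true_effect = 0.1
    for n in (20, 2000):
        means = rng.normal(true_effect, 1 / np.sqrt(n), trials)  # sampling dist.
        significant = means > 1.96 / np.sqrt(n)                  # one-sided z cut
        print(f"n={n:4d}  mean significant estimate: "
              f"{means[significant].mean():.2f}  (true: {true_effect})")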

------
ndonnellan
Wouldn't it be great if educational institutions allowed "un-theses" to be
accepted for degree requirements? That is, if you rigorously invalidated an
existing theory. The university equivalent of the "Black Team".

~~~
daughart
Why do you think this wouldn't be accepted? In my field, biology, if a
person were to disprove an existing theory it would be important work. My
colleague was just scooped in Science on work that disproves the ribosome
spacing hypothesis for translation initiation and codon optimization.

------
DougN7
This is just an anecdote, but I consistently see the same effect when doing
A/B testing. The initial hypothesis always does fantastically for the first
week or so, but if I let it run long enough, it always drops to zero. I've
been scratching my head about it for a few years.

~~~
ploomans
I think the underlying cause is similar to the publication bias they mention
in the article.

For almost all A/B tests, A is actually not much different from B. But
because of the small sample size you have after the first week, you will
see, at random, a rather strong positive or negative result. And now the
publication bias/selection bias kicks in. If you see strong negative results
in the first week, you quickly give up and start a new A/B test. If the
initial results are positive, you get excited and keep testing, but in most
circumstances you then get regression to the mean at larger sample sizes.
This would most likely also have happened to the experiments you terminated
early, but you selected those away, so in your memory regression to the mean
seems to happen only to the tests with good initial results.

Most A/B tests, run for a long enough time, will show no significant
differences. Blogs may give a different impression, but that again is
explained by publication bias.

I always compare A/B testing to genetic mutations. Almost none have a strong
impact on the fitness of an animal, but once in a very long while you get a
positive one. Luckily they accumulate, and you can get some impressive
results with A/B testing (a.k.a. natural selection).
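
A small simulation (mine; the conversion rate, traffic, and cutoff are
invented) makes this concrete: A and B convert identically, yet the tests
that looked like big winners after week one show roughly zero lift on the
data that comes in afterwards.

    # Sketch: identical A/B variants, week-1 screening, then regression
    # to the mean on the later data.
    import numpy as np

    rng = np.random.default_rng(2)
    tests, week1, later = 2000, 100, 1900     # visitors per arm

    p = 0.05                                  # same rate for A and B
    lift_w1 = (rng.binomial(week1, p, tests)
               - rng.binomial(week1, p, tests)) / week1
    lift_later = (rng.binomial(later, p, tests)
                  - rng.binomial(later, p, tests)) / later

    # Selection bias: you only keep watching the week-1 "winners".
    winners = lift_w1 >= np.quantile(lift_w1, 0.9)
    print(f"week-1 lift of winners: {lift_w1[winners].mean():+.4f}")
    print(f"their later lift:       {lift_later[winners].mean():+.4f}")  # ~ 0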

------
skilesare
I think this article is interesting in that it mostly talks about experiments
that are aiming to capture hard correlations in incredibly complex systems.
The scientific method is an amazing tool that we've been using over the last
350 years to pretend that the world is a machine and thus glean incredibly
insightful information. In the end, though, the world probably isn't a
machine, and as systems become more and more complex they can become even
harder to machinify. (Christopher Alexander has a lot more to say on the
issue: http://www.katarxis3.com/Alexander.htm)

------
tunesmith
Isn't this the same sort of thing that was discussed in the recent A/B
testing articles? A test that demonstrates amazing results is going to be
more visible than ones that demonstrate weak results. It's kind of like
stopping your A/B test the moment you see the strongest evidence: you're
missing the point.
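
The cost of stopping at the strongest evidence can be quantified with a
quick sketch (my own, with illustrative numbers): peeking after every batch
of visitors and stopping the first time p < 0.05 inflates the false-positive
rate well past the nominal 5%, even when there is no effect at all.

    # Sketch: "optional stopping" inflates the false-positive rate.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    trials, peeks, batch = 1000, 20, 50

    stopped_early = 0
    for _ in range(trials):
        data = np.empty(0)
        for _ in range(peeks):
            data = np.concatenate([data, rng.normal(0, 1, batch)])
            if stats.ttest_1samp(data, 0).pvalue < 0.05:  # peek, then stop
                stopped_early += 1
                break

    # No true effect anywhere, yet far more than 5% of tests "win".
    print(f"nominal alpha: 0.05, actual: {stopped_early / trials:.2f}")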

------
cardiffspaceman
The real problem with the FA is the ending, which strongly agrees with the
kind of people epitomized in fiction by Sheldon Cooper's mother in her
response to her son's statement of a scientific law as fact: "and THAT is
YOUR opinion."

------
sjbach
Warning: article written by Jonah Lehrer. It's probably still a good read, but
that spoils it for me.

