
I disagree with Turing and Kahneman regarding the strength of statistical evidence - libovness
http://andrewgelman.com/2014/09/03/disagree-alan-turing-daniel-kahneman-regarding-strength-statistical-evidence/#.VAcdcfPG37I.twitter
======
blauwbilgorgel
Turing believed in fairy tales. Gödel believed in ghosts.

This was also around the start of the Cold War, when the US suspected that the
USSR was funneling millions into ESP research.

One of the most publicized projects was the Stargate Project: _Even though a
statistically significant effect has been observed in the laboratory, it
remains unclear whether the existence of a paranormal phenomenon, remote
viewing, has been demonstrated._

I think Turing was closer to the machine learning camp than the statistics
camp. On that note I'll quote a competitor in the MLSP 2014 Schizophrenia
Detection Challenge:

 _To the people that are going to write papers for this one...

What really strikes me is the fact that stats fail really hard in this
problem. I have a couple of 2-variable combinations that score around 0.87 on
the training set with logistic regression and a couple of 3-variable
combinations with training AUC 0.9 (ish). All results were "statistically
significant" at 0.001 (not even 0.01). I have tried the same selections with
SAS, SPSS, R and scikit (with regularization). All results are consistent (and
similar) across all packages, yet again they scored around 0.5 (random) on the
public and private leaderboards. This makes me think about all the PhD theses
and medical science papers I've seen carried out on Mickey Mouse data sets,
claiming that statistical significance gives credibility to their findings...
Is machine learning more reliable than stats? I say, if you can't predict it
consistently on a hold-out set, then you've got nothing, whatever the t, F,
and chi-squared distributions say._ KazAnova -
[https://www.kaggle.com/users/111640/kazanova](https://www.kaggle.com/users/111640/kazanova)
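KazAnova's point is easy to reproduce: when you screen many candidate features on a small training set, the winner will look "significant" on that set even when the data are pure noise, and then score ~0.5 on fresh data. A minimal sketch (all sizes hypothetical, labels and features independent by construction):

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_feat = 80, 80, 1000  # hypothetical sizes; data is pure noise

def auc(x, y):
    """Mann-Whitney AUC of scores x against binary labels y."""
    pos, neg = x[y == 1], x[y == 0]
    return (pos[:, None] > neg[None, :]).mean()

# Labels and features are independent: there is nothing real to find.
y_tr = np.repeat([0, 1], n_train // 2)
y_te = np.repeat([0, 1], n_test // 2)
X_tr = rng.normal(size=(n_train, n_feat))
X_te = rng.normal(size=(n_test, n_feat))

# "Feature selection": keep the single feature that best separates
# the classes on the training set (in either direction).
train_aucs = np.array([auc(X_tr[:, j], y_tr) for j in range(n_feat)])
best = np.argmax(np.abs(train_aucs - 0.5))
sign = 1.0 if train_aucs[best] >= 0.5 else -1.0

train_auc = auc(sign * X_tr[:, best], y_tr)
test_auc = auc(sign * X_te[:, best], y_te)

print(f"training AUC: {train_auc:.2f}")  # well away from 0.5
print(f"hold-out AUC: {test_auc:.2f}")   # back to coin-flipping
```

The selection step is the culprit: the maximum over a thousand noise features is guaranteed to look impressive in-sample, which is exactly why a hold-out set is the only honest referee here.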

~~~
raverbashing
Well, fairy tales exist (as a work of fiction). Fairies on the other hand...

And unfortunately, Gödel believing in ghosts was the least of his problems.

------
eemax
If you dismiss any positive results in parapsychology as false on reductionist
grounds, the whole field becomes a sort of meta-control group for statistics
that are measuring real things.

[http://lesswrong.com/lw/1ib/parapsychology_the_control_group...](http://lesswrong.com/lw/1ib/parapsychology_the_control_group_for_science/)

Anyway, this article singles out Turing and Kahneman, but this seems to be a
fairly common pitfall among, well, almost everyone who uses statistics.

~~~
spindritf
Exactly, the problem is that parapsychology and other junk (let's be
honest...) clears the same bar that we set for social science in general.

[http://slatestarcodex.com/2014/04/28/the-control-group-is-out-of-control/](http://slatestarcodex.com/2014/04/28/the-control-group-is-out-of-control/)

~~~
DanBC
...or a bunch of science in general.

[http://www.bbc.co.uk/programmes/b04f9r4k](http://www.bbc.co.uk/programmes/b04f9r4k)

------
Homunculiheaded
To be fair to Kahneman, by 2012 he did come around and recognize that there
were serious issues with "priming" research[0].

The chapter on priming in "Thinking, Fast and Slow" completely ruined the
book for me. The very premise of the book is (paraphrased) "I will teach you
to overcome bias and think rationally", and yet here is an entire chapter that
should make anyone with a critical eye strongly question the research. Even if
one is ignorant of the issues these experiments have with reproducibility, the
experiment design itself is terrible, and there seems to be no way to draw the
wild conclusions that are drawn. Not to mention that if priming actually worked
as claimed, you would see it become a hotter topic among marketers than SEO. I
fail to see how a world expert on cognitive bias can fail to question such an
obvious fault in a topic that he covers in his own book.

[0] [http://www.nature.com/news/nobel-laureate-challenges-psychologists-to-clean-up-their-act-1.11535](http://www.nature.com/news/nobel-laureate-challenges-psychologists-to-clean-up-their-act-1.11535)

~~~
d136o
Indeed, I had a very strong feeling that the quote came from the book and
waited until I had a chance to log into amazon and search for the text within
the book.

In pages 56-57 of this NYTimes best seller you will find this gem. I remember
wincing when I read it, repelled by the idea that I "[have] no choice but to
accept that the major conclusions of these studies are true."

I picked up the book with an open mind but was turned off by citation after
citation of what felt like probably very weak experiments.

The worst part is that this type of book, which must have sold millions (?) of
copies, perpetuates possibly mistaken ideas and overstates the strength of the
"effects" studied.

------
unclebunkers
The reason we can stand on the shoulders of giants and improve is that we do
not carry their personal ignorances. We have our own ignorances, which will be
dismissed in coming generations. This doesn't make the giants any less giant,
only more human.

------
DanBC
> One frustration I’ve had in recent discussions regarding controversial
> research is the seeming unwillingness of researchers to entertain the
> possibility that their published findings are just noise. Maybe not, maybe
> these are real effects being discovered, but you should at least consider
> the possibility that you’re chasing noise. Despite what Turing and Kahneman
> say, you can keep an open mind.

The other thing researchers do is look for effect X, fail to find it, but do
find effect Y. They don't publish about X because journals aren't interested
in null results. They do publish about Y because, well, money.

BBC Radio 4 did a reasonable programme about this:
[http://www.bbc.co.uk/programmes/b04f9r4k](http://www.bbc.co.uk/programmes/b04f9r4k)

~~~
csours
Title is likely a reference to Firesign Theatre's "Everything you know is
wrong"
[https://www.youtube.com/watch?v=YKZtt2yEwfs](https://www.youtube.com/watch?v=YKZtt2yEwfs)

------
mnarayan01
The quote from Turing, most relevantly:

    Unfortunately the statistical evidence, at least for telepathy, is overwhelming.

doesn't seem to go anywhere near as far as the author implies. Further, the
source for the quote is a link to a paper which apparently just attributes the
quote to Turing...but I don't even know that for sure since the link 404s.
This is particularly...problematic...as the immediately preceding quote:

    You have no choice but to accept that the major conclusions of these studies are true.

(which does appear to go as far as the author implies) is actually attributed
to Kahneman much later.

~~~
ertdfgcb
Here's the source for the Turing quote:
[http://cogprints.org/499/1/turing.html](http://cogprints.org/499/1/turing.html)

------
mathattack
Like you, I am more in Twain's camp: "There are three kinds of lies: lies,
damned lies, and statistics."

Statistics are great for providing evidence to counteract our intuition and
for showing empirical holes in logic. That said, they should (like most
everything else) always be considered provisional truths as opposed to final
truths. Experiments can be misdesigned. And (God forbid!) statistics can be
misinterpreted.

~~~
grayclhn
Of course, there's Mosteller's reply: “It’s easy to lie with statistics, but
it’s easier to lie without them.”

~~~
willhinsa
Except, the lies created by abusing statistics are pernicious and more
difficult to exterminate.

~~~
voxic11
Do you have any evidence of that? Lies that rely on empirical evidence can be
shown to be false, while lies not based on empirical evidence can never be
shown to be false.

~~~
npunt
_Some_ lies not based on empirical evidence can never be shown to be false.
However, there's tons of them that CAN be shown to be false.

I just bought a house in San Francisco! They're so cheap!

(That was a lie, based on no evidence, that can be disproven with evidence.
People do this all the time)

------
micro_cam
Clearly the only scientific conclusion to draw is that there is less magic in
the world than there used to be.

In all seriousness though, having worked analyzing data in research, one
overwhelming issue is that statisticians are often only involved in the
analysis and not the experimental design, resulting in studies that are
plagued with noise and other confounding factors.

It is hard to know how this would affect a study of ESP and the like, since
even a batch of non-randomly selected people who could reliably do this would
be significant; but nonetheless, replication has failed.

------
clairity
at the risk of wading into epistemology (of which i'm certainly no expert),
inferential statistics never claims to determine truth. it's a way of taking
large sample sets and examining them mathematically to find little patterns in
the data that may or may not _support_ (but never confirm) a hypothesis. it's
a powerful tool but that's all it does.

it's still up to our brains to try to make that last leap between data and
prediction, which is something our brains are exceedingly good at, if
necessarily flawed (in a schrödinger's cat kind of way).

gödel (among others) helped us find the limits of logical systems like this.
i've mentioned it before (in
[https://news.ycombinator.com/item?id=8169025](https://news.ycombinator.com/item?id=8169025))
but i'm really looking forward to the development of non-logic based math
(quantum computing?) that's more akin to how our brains work and allows for
multiple, conflicting "truths" to exist simultaneously. our consciousness
lives in the rational world but the rest of our brains seems to be quite
content with an ambiguous, entangled reality.

~~~
t__r
What do you mean by non-logic? I think that's hard to define. There are
various approaches in logic-based AI that aim to somehow be more akin to how
we handle multiple conflicting truths. A popular formalism is abstract
argumentation theory, which is the study of evaluating the acceptability of
multiple conflicting arguments. It has been shown that this is closely related
to non-monotonic logics, such as LP with negation as failure and various types
of defeasible reasoning. Proof theoretically, it replaces the notion of proof
with that of a successful strategy in a two-person dialogue game.

------
tokenadult
The Dutch skeptical researchers whose work Gelman comments on here are
exceptionally willing to share author manuscripts for free downloads on the
World Wide Web. However, the particular paper Gelman comments on most here, a
paper still in press (not yet formally published), has turned into a dead
link. I get a 404 message when I follow the link from Gelman's blog post.
Searching around on the lead author's (Eric-Jan Wagenmakers's) website, I see
the notice

"Wagenmakers, E.-J., Wetzels, R., Borsboom, D., Kievit, R., & van der Maas, H.
L. J. (in press). A skeptical eye on psi. In May, E., & Marwaha, S. (Eds.),
Extrasensory Perception: Support, Skepticism, and Science. ABC-CLIO. NB. I was
requested to remove this preprint temporarily, until the publisher has given
explicit consent to publish it here. If you seek a preprint please send me an
Email."

And I'll note for the record that many of the papers by Eric-Jan Wagenmakers
and his co-authors continue to be available on his website,[1] whether
published yet or not, but please let him carry out his agreements with
publishers so that those papers stay available. The work of that group of
authors is very important and will do much to improve the science of
psychology.

Gelman also comments favorably in his blog post about the work of Uri
Simonsohn and his colleagues, who have devoted much thought to the issue of
"p-hacking." Simonsohn is a professor of psychology with a better than average
understanding of statistics. He and his colleagues are concerned about making
scientific papers more reliable. Many of the interesting issues brought up by
the comments on the article kindly submitted here become much clearer after
reading Simonsohn's various articles[2] about p values and what they mean, and
other aspects of interpreting published scientific research.

Simonsohn provides an abstract (which links to a full, free download of a
funny, thought-provoking paper)[3] with a "twenty-one word solution" to some
of the practices most likely to make psychology research papers unreliable. He
has a whole site devoted to avoiding "p-hacking,"[4] an all too common
practice in science that can be detected by statistical tests. You can use the
p-curve software on that site for your own investigations into p values found
in published research.
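One common form of p-hacking is optional stopping: keep collecting data, test after every batch, and stop as soon as p < .05. A sketch of how badly this inflates the false positive rate under the null (a one-sample z-test with known unit variance; all sizes illustrative):

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def p_hacked_experiment(max_n=100, peek_every=10):
    """Collect data under the null (true mean 0) but test after every
    batch and stop as soon as p < .05 -- classic optional stopping."""
    data = np.empty(0)
    for _ in range(max_n // peek_every):
        data = np.append(data, rng.normal(size=peek_every))
        # one-sample two-sided z-test of mean 0 (variance known to be 1)
        z = data.mean() * math.sqrt(len(data))
        p = 1 - math.erf(abs(z) / math.sqrt(2))
        if p < 0.05:
            return True   # "significant" result gets published
    return False

n_sims = 2000
rate = sum(p_hacked_experiment() for _ in range(n_sims)) / n_sims
print(f"false positive rate with peeking: {rate:.1%}")  # well above the nominal 5%
```

Each individual test is valid at the 5% level; it is the stop-when-significant rule that turns ten honest looks at the data into a far larger chance of a spurious "discovery" — which is exactly the pattern p-curve analysis is designed to detect in a published literature.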

He also has a paper on evaluating replication results[5] (an issue we discuss
from time to time here on Hacker News) with more specific tips on that issue.

Abstract: "When does a replication attempt fail? The most common standard is:
when it obtains p>.05. I begin here by evaluating this standard in the context
of three published replication attempts, involving investigations of the
embodiment of morality, the endowment effect, and weather effects on life
satisfaction, concluding the standard has unacceptable problems. I then
describe similarly unacceptable problems associated with standards that rely
on effect-size comparisons between original and replication results. Finally,
I propose a new standard: Replication attempts fail when their results
indicate that the effect, if it exists at all, is too small to have been
detected by the original study. This new standard (1) circumvents the problems
associated with existing standards, (2) arrives at intuitively compelling
interpretations of existing replication results, and (3) suggests a simple
sample size requirement for replication attempts: 2.5 times the original
sample."
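The arithmetic behind the "2.5 times the original sample" rule can be checked directly. As I understand Simonsohn's setup (the framing below is a simplification using a normal approximation; the original sample size is hypothetical), a replication fails when it can reject d33, the effect size the original study had 33% power to detect; with 2.5× the original n, a true effect of zero leads to rejecting d33 about 80% of the time:

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def phi_inv(p, lo=-10.0, hi=10.0):
    """Inverse standard normal CDF by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n_orig = 50  # hypothetical original sample size

# d33: the effect the original study had 33% power to detect
# (two-sided test at alpha = .05, normal approximation)
d33 = (1.96 + phi_inv(0.33)) / math.sqrt(n_orig)

# Replication with 2.5x the sample; one-sided test of H0: effect = d33.
# If the true effect is zero, how often does the replication reject d33?
n_rep = 2.5 * n_orig
power = phi(d33 * math.sqrt(n_rep) - 1.645)
print(f"power to declare the replication a failure: {power:.2f}")
```

Note that n_orig cancels out of the final expression, which is why the rule can be stated as a simple multiplier rather than a per-study power calculation.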

[1]
[http://ejwagenmakers.com/papers.html](http://ejwagenmakers.com/papers.html)

[2] [http://opim.wharton.upenn.edu/~uws/](http://opim.wharton.upenn.edu/~uws/)

[3]
[http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2160588](http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2160588)

[4] [http://www.p-curve.com/](http://www.p-curve.com/)

[5]
[http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2259879](http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2259879)

------
drpgq
I guess at the end of the day, the need to publish something is so strong that
statistical flukes end up being published. It's too bad that once something's
out there it can take a long time to be overturned.

------
cLeEOGPw
Many very smart people today believe in time travel. Many top scientists
believe P ≠ NP, and so on. It's not abnormal to believe things that have not
yet been proven. It's quite reasonable.

------
eruditely
When you're going to grandstand, you should have a higher content-to-paragraph
ratio. What is this?

------
bayesianhorse
Bayesians for the win!

