
Distinguishing Cause From Effect Using Observational Data - cyang08
https://medium.com/the-physics-arxiv-blog/cause-and-effect-the-revolutionary-new-statistical-test-that-can-tease-them-apart-ed84a988e#.80rcx6pq8
======
gwern
[https://arxiv.org/abs/1412.3773](https://arxiv.org/abs/1412.3773)
[https://medium.com/the-physics-arxiv-blog/cause-and-effect-t...](https://medium.com/the-physics-arxiv-blog/cause-and-effect-the-revolutionary-new-statistical-test-that-can-tease-them-apart-ed84a988e)

~~~
sctb
Thanks, we updated the link from [http://www.vocativ.com/335705/correlation-causation](http://www.vocativ.com/335705/correlation-causation) to this.

------
cschmidt
There is a popular science book on this topic as well:

Why: A Guide to Finding and Using Causes

[https://www.amazon.com/Why-Guide-Finding-Using-Causes-ebook/...](https://www.amazon.com/Why-Guide-Finding-Using-Causes-ebook/dp/B0184Q1RSA)

It is sitting on my bookshelf, but I haven't managed to get to reading it yet.

------
kem
This has been discussed in the stats literature for a while now. It's an
interesting idea but makes lots of assumptions about the nature of noise
versus signal. It could be really useful in some situations, but in others it
would be totally useless, depending on how realistic the assumptions are in
any given scenario.

------
throwwit
Is the reasoning behind the method a corollary to compressed sensing?

------
LoSboccacc
so p = 0.8? Wasn't 0.95 the standard for claims once?

~~~
aab0
No, because a classification accuracy is not a p-value. By construction, a
random guesser would achieve 50% accuracy in guessing whether A->B or B->A for
each cause-effect pair in the dataset. So getting >50% accuracy is the goal
here.
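The 50% chance baseline is easy to sanity-check with a quick simulation (not from the paper, just an illustration):

```python
import random

random.seed(0)
n_pairs = 10_000
# A guesser that picks a causal direction uniformly at random is correct
# on each pair with probability 1/2, so its accuracy converges to 50%.
correct = sum(random.random() < 0.5 for _ in range(n_pairs))
print(correct / n_pairs)  # close to 0.5
```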

~~~
te
Interestingly, the authors do acknowledge on p. 46 that their sample size is
too small to obtain a statistically significant result:

A rough estimate how large the CauseEffectPairs benchmark should have been in
order to obtain significant results can easily be made. Using a standard
(conservative) Bonferroni correction, taking into account that we compared
37 methods, we would need about 120 (weighted) pairs for an accuracy of 65% to
be considered significant (with two-sided testing and 5% significance
threshold). This is about four times as much as the current number of 37
(weighted) pairs in the CauseEffectPairs benchmark. Therefore, we suggest
that at this point, the highest priority regarding future work should be to
obtain more validation data, rather than developing additional methods or
optimizing computation time of existing methods. We hope that our publication
of the CauseEffectPairs benchmark data inspires researchers to collaborate on
this important task and we invite everybody to contribute pairs to the
CauseEffectPairs benchmark data.
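Their back-of-the-envelope estimate can be reproduced with a normal approximation to a two-sided binomial test against chance (a rough sketch; it ignores the pair weighting, so the exact number differs slightly from theirs):

```python
from math import ceil
from statistics import NormalDist

def required_pairs(accuracy=0.65, alpha=0.05, n_methods=37):
    """Rough sample size for a two-sided test of classification accuracy
    against chance (0.5), Bonferroni-corrected across n_methods
    comparisons, using a normal approximation to the binomial."""
    alpha_corrected = alpha / n_methods                 # Bonferroni correction
    z = NormalDist().inv_cdf(1 - alpha_corrected / 2)   # two-sided critical value
    # Require (accuracy - 0.5) * sqrt(n) / 0.5 >= z and solve for n.
    return ceil((z * 0.5 / (accuracy - 0.5)) ** 2)

print(required_pairs())  # ~115, in line with the paper's estimate of ~120
```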

~~~
aab0
If you want to distinguish a particular method, but you can definitely tell
that overall, the methods are collectively outperforming chance and so in this
dataset, it _is_ possible to infer the direction of causation.

