

Cause And Effect: A New Statistical Test That Can Tease Them Apart - mazsa
https://medium.com/the-physics-arxiv-blog/cause-and-effect-the-revolutionary-new-statistical-test-that-can-tease-them-apart-ed84a988e
======
learnstats2
This statistical test for causation (X->Y) is based on the idea that X and Y
each contain noise - noise present in X flows causally to Y but noise present
in Y won't flow back to X.

But, even if true, it isn't clear that this makes for a good test. For
example, it's plausible that Y could have a damping effect and remove noise,
which would reverse the results of the test.

"They say the additive noise model is up to 80 per cent accurate in correctly
determining cause-and-effect." This has been exaggerated by Medium from
"accuracies between 65% and 80%" in the original article.

But a coin-flip model should be 50% accurate. 65% accuracy is unconvincing.
The journal article's conclusion admits that their results are not
statistically significant in any sense. As such, the results do not even meet
the weakest possible scientific standard. They couldn't reproduce earlier
published results in this field (typical of publication bias).

Their final paragraph concludes that there is surely a method of doing this,
but they just haven't found that method here.

In my opinion, the results do not support that conclusion.

~~~
Matumio
I think we are talking about independent samples of X and Y, not about time
series. If X causes Y, then you model y = f(x, u), where U is a random
variable independent of X (think: unexplained, e.g. noise). I don't think
there can be any dampening effect in this setup. This model is generic: you
can find an f(x, u) for any relationship between X and Y. But you may get a
much simpler noise model (like additive gaussian noise) in one direction. It's
no proof, but a strong hint (Occam's razor). There is also the family of
algorithms like IC* and FCI that can recover a causal graph from statistical
dependencies between random variables. As output you get a set of causal
graphs that are still plausible given the observed dependencies, including
constraints about the presence or absence of latent common causes.
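Matumio's point can be sketched numerically. Below is a minimal, hypothetical illustration of the additive-noise idea (not the paper's actual method): fit a curve in each direction and check how dependent the residuals remain on the putative cause. Real implementations use a proper independence test such as HSIC; a crude correlation-based proxy stands in here.

```python
import numpy as np

def dependence_proxy(x, resid):
    """Crude stand-in for an independence test: correlate the squared
    residuals with x and with x^2 (real ANM methods use HSIC or similar)."""
    xc = x - x.mean()
    r2 = resid**2 - (resid**2).mean()
    return abs(np.corrcoef(xc, r2)[0, 1]) + abs(np.corrcoef(xc**2, r2)[0, 1])

def anm_score(x, y, deg=3):
    """Fit y = f(x) + e with a polynomial and score how dependent the
    residuals e still are on x (lower = more plausible causal direction)."""
    resid = y - np.polyval(np.polyfit(x, y, deg), x)
    return dependence_proxy(x, resid)

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 5000)
y = x**3 + rng.normal(0.0, 1.0, 5000)   # ground truth: X -> Y

forward, backward = anm_score(x, y), anm_score(y, x)
print("X->Y" if forward < backward else "Y->X")  # should print X->Y here
```

In the forward direction the residuals are just the injected noise, independent of X; in the backward direction no additive model fits cleanly, so the residuals stay visibly dependent on Y.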

~~~
rcthompson
i.i.d. variables are uncorrelated.

~~~
Matumio
pardon, I meant independent samples of course (edited)

------
cafebeen
This isn't as generally useful as the title suggests... due to these
assumptions:

"that X and Y are dependent (i.e., P(X,Y) ≠ P(X)P(Y)), there is no confounding (common
cause of X and Y), no selection bias (common effect of X and Y that is
implicitly conditioned on), and no feedback between X and Y (a two-way causal
relationship between X and Y)"

~~~
gwern
> This isn't as generally useful as the title suggests...

My understanding is that the more general case of causal discovery, where you
have lots of variables, is already somewhat solved by Pearlian techniques (if
you have A/B/C you can conditionalize on each and see what graph fits best -
all independent, C confounding A & B, A -> B -> C, etc). But these techniques
break down when it's just A and B: A->B and B->A look symmetrical - the
problem is so simple that it's hard, because there's no C which can help break
the symmetry and suggest something about the underlying causal graph.

OP is interesting because it points out that even without a C, A and B often
_aren't_ symmetrical in one particular way. So now pairs can be attacked as
well as triplets and bigger; and maybe it points the way to new ways to tackle
more complicated datasets.
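The conditioning trick gwern describes can be illustrated with a toy example (an assumption-laden linear sketch, not a full IC*/FCI implementation): when C is a common cause of A and B, A and B are marginally correlated but nearly independent once C is regressed out.

```python
import numpy as np

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c
    (a simple linear conditional-independence proxy)."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

rng = np.random.default_rng(1)
c = rng.normal(size=10000)          # common cause
a = 2 * c + rng.normal(size=10000)  # C -> A
b = -c + rng.normal(size=10000)     # C -> B

marginal = np.corrcoef(a, b)[0, 1]  # clearly nonzero
conditional = partial_corr(a, b, c) # near zero once C is conditioned on
print(round(marginal, 2), round(conditional, 2))
```

With only A and B observed there is no C to condition on, which is exactly why the pairwise case is hard and why the noise asymmetry in OP is interesting.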

~~~
cafebeen
Certainly interesting--wish the article mentioned the Pearlian techniques. My
issue is with the overstated claims, e.g.

"Statisticians have always thought it impossible to tell cause and effect
apart using observational data. Not any more."

Maybe including "bivariate" in the paper's title could have helped.

------
panarky
Here's a tool Google built called CausalImpact to go beyond correlation and
get at cause and effect in time-series data.

[http://google-opensource.blogspot.com/2014/09/causalimpact-n...](http://google-opensource.blogspot.com/2014/09/causalimpact-new-open-source-package.html)

And their related research into using Bayesian structural time-series models
to infer cause and effect.

[http://research.google.com/pubs/pub41854.html](http://research.google.com/pubs/pub41854.html)
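This is not CausalImpact's actual API; the core idea behind it (and interrupted time-series analysis generally) can be sketched in a few lines: fit the relationship between the target and a control series on the pre-intervention period, forecast the counterfactual post-period, and read the effect off the gap.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pre, n_post = 200, 50

# A control series x that tracks the target y before the intervention
x = rng.normal(size=n_pre + n_post).cumsum()
y = 1.5 * x + rng.normal(scale=0.5, size=n_pre + n_post)
y[n_pre:] += 3.0  # intervention lifts y by 3 after t = 200

# Fit on the pre-period only, then forecast the post-period counterfactual
slope, intercept = np.polyfit(x[:n_pre], y[:n_pre], 1)
counterfactual = slope * x[n_pre:] + intercept
effect = (y[n_pre:] - counterfactual).mean()
print(round(effect, 1))  # average lift; should be close to the injected +3
```

CausalImpact itself uses Bayesian structural time-series models rather than a plain regression, which gives credible intervals around the counterfactual instead of a point estimate.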

------
NoMoreNicksLeft
> Another dataset relates to the daily snowfall at Whistler in Canada and
> contains measurements of temperature and the total amount of snow. Obviously
> temperature is one of the causes of the total amount of snow rather than the
> other way round.

This isn't obvious to me at all.

It's true that rainfall causes trees (and that drought can kill them). But
it's less obviously true that trees (in massive numbers) can affect the
regional climate enough to cause rain. They do this by pumping water out of
the ground and increasing humidity, by changing wind patterns, etc.

When trees cause rain, it's a lesser effect than when rain causes trees, but
it's still there.

So when someone tells me that it's obvious that hundreds of thousands of tons
of frozen, powdered water lying on the ground doesn't cause the temperature,
I have to wonder if they're a retard.

------
mazsa
"The key assumption is that the pattern of noise in the cause will be
different to the pattern of noise in the effect. That’s because any noise in X
can have an influence on Y but not vice versa."[...] "That’s a fascinating
outcome. It means that statisticians have good reason to question the received
wisdom that it is impossible to determine cause and effect from observational
data alone." [https://medium.com/the-physics-arxiv-blog/cause-and-effect-t...](https://medium.com/the-physics-arxiv-blog/cause-and-effect-the-revolutionary-new-statistical-test-that-can-tease-them-apart-ed84a988e)

~~~
PeterWhittaker
I don't think it's hype or overstating things to suggest that this may be the
most significant advance in practical statistics and methodologies for
scientific investigation in years, perhaps decades.

Like many brilliant ideas, it seems so obvious in retrospect, another great
"Why didn't I think of that?" moment.

~~~
jacobn
"We simplify the causal discovery problem by assuming no confounding,
selection bias and feedback."

That's a pretty big caveat.

Not saying the paper is not important, just that it's not (yet?) a full
solution to the overall conundrum.

~~~
UhUhUhUh
Particularly since the confounding issue is enormous in science, and it sits
at the core of the example the article gives in its introduction... It would
be an achievement in itself to build an experiment without confounders.

------
righttoremember
In econometrics this approach is called "identification through functional
form" because it relies on assumptions about the exact distribution of some
of the variables.

The main problem is that it requires making assumptions that are very hard or
impossible to test. Nonetheless it's an interesting idea, but I doubt this
method can replace randomized trials or instrumental variables except in a
tiny fraction of cases.

------
jsprogrammer
>Obviously temperature is one of the causes of the total amount of snow rather
than the other way round.

Can someone explain how this is 'obvious'?

How can this be a claimed scientific way to tell cause and effect and then
drop a sentence like that in the middle of the explanation?

Even if you accept that it's true that temperature determines snowfall, it
seems there is likely some feedback loop in there. The fallen snow doesn't
just disappear, wouldn't it affect later measured temperatures? Remove a bunch
of (cold) snow from an area and the average temperature of the area should
increase faster than if you had left the snow, no?

~~~
raverbashing
" The fallen snow doesn't just disappear, wouldn't it affect later measured
temperatures?"

Unless your weather station is totally covered by snow, no.

And the effect of snow on the total thermal capacity of a city is small. Just
compare its mass to the mass of "all things" (buildings, etc.)

~~~
FreeFull
Snow and ice do have an overall effect on climate, due to their high albedo
reflecting light that would otherwise be absorbed and turned into heat.

------
snowwrestler
This strikes me as a fairly useless test, because it only works in situations
where you are sure there are only 2 variables, and you're trying to determine
which one is dependent. Such a situation only happens in a carefully
controlled experiment--and in those situations, you can easily determine
causation by creating counterfactual tests.

What people really want to know is whether statistics alone can be used to
exclude hidden shared causes from an uncontrolled data set. Even the article
itself uses such an example: the impact of hormone replacement on heart
disease.

This test does not further that goal. I remain convinced that it is
impossible. In fact to my understanding, that is the origin of the scientific
method: rather than accepting conclusions from the first data set, science
constructs hypotheses and tests to exclude hidden causes.

------
Zak
Ideally, upon finding correlated variables, one would perform an experiment,
changing one to see if it causes the other(s) to change. Looking at noise
enables the same principle to be applied when the researcher lacks the ability
to perform such an experiment.

------
fitshipit
It's like all statistical tests -- it works really well (provably well) when
the assumptions it requires hold. However, it's usually impossible to know if
those assumptions hold without holding the desired answer in the first place.
That's why nonparametric tests are so popular (not saying they have much to do
with the article at hand, but people are definitely willing to get less
definitive results in exchange for making fewer assumptions).

------
xtacy
Nice article. The idea of testing whether X caused Y by exploiting the fact
that the relationship is not symmetrical has also been used in the "pseudo-
causality" Granger causality test:
[http://en.wikipedia.org/wiki/Granger_causality](http://en.wikipedia.org/wiki/Granger_causality)

Also, causality in reality can be quite complicated if there are feedback
loops: X-causes-Y-causes-X.
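The Granger idea can be sketched with plain least squares (a toy version, not the full F-test): X "Granger-causes" Y if lagged values of X reduce the error of predicting Y beyond what Y's own lags achieve, while the reverse gain stays negligible.

```python
import numpy as np

def rss_gain(target, driver, lag=1):
    """Relative reduction in residual sum of squares of an AR(1) fit of
    the target when the driver's lagged values are added as a regressor
    (the core quantity behind a Granger-style test)."""
    t, t1, d1 = target[lag:], target[:-lag], driver[:-lag]
    own = np.column_stack([np.ones_like(t1), t1])
    both = np.column_stack([np.ones_like(t1), t1, d1])
    rss = lambda X: np.sum((t - X @ np.linalg.lstsq(X, t, rcond=None)[0])**2)
    return (rss(own) - rss(both)) / rss(own)

rng = np.random.default_rng(3)
n = 5000
x = rng.normal(size=n)
y = np.zeros(n)
for i in range(1, n):  # ground truth: X drives Y with a one-step lag
    y[i] = 0.8 * y[i - 1] + 0.7 * x[i - 1] + 0.3 * rng.normal()

print(rss_gain(y, x) > rss_gain(x, y))  # expect True: X "Granger-causes" Y
```

Note this only captures predictive asymmetry in time series; it says nothing about the i.i.d. bivariate case the paper addresses, and it is fooled by hidden common causes just like any other observational method.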

------
raverbashing
Would be interesting to test this in data such as these:
[http://www.tylervigen.com/](http://www.tylervigen.com/)

------
streptomycin
Reminds me of
[http://www.pnas.org/content/104/16/6533.full](http://www.pnas.org/content/104/16/6533.full)
- interesting, but probably only applicable to very simple systems. If you
have various complex interconnections between components, simple A -> B
reasoning is not helpful.

------
Xcelerate
Heh, so maybe we can finally figure out if the "random" correlations in
quantum mechanics are really random or if there's a cause.

(I'm joking of course, but has anyone ever actually rigorously analyzed
quantum random data?)

~~~
bdcs
[http://noosphere.princeton.edu/](http://noosphere.princeton.edu/) - The
Global Consciousness Project "When human consciousness becomes coherent, the
behavior of random systems may change. Random number generators (RNGs) based
on quantum tunneling produce completely unpredictable sequences of zeroes and
ones. But when a great event synchronizes the feelings of millions of people,
our network of RNGs becomes subtly structured. We calculate one in a trillion
odds that the effect is due to chance. The evidence suggests an emerging
noosphere or the unifying field of consciousness described by sages in all
cultures."

Looks pretty, shall we say, wacky, but they are supposedly finding
correlations in quantum RNGs.

------
keithpeter
Has anyone zipped up the data sets referenced in the paper in a handy file at
all? Just before I start right clicking...

------
RoboTeddy
From the paper
([http://arxiv.org/abs/1412.3773](http://arxiv.org/abs/1412.3773)),

> Concluding, our results provide evidence that distinguishing cause from
> effect is indeed possible by exploiting certain statistical patterns in the
> observational data. However, the performance of current state-of-the-art
> bivariate causal methods still has to be improved further in order to enable
> practical applications.

------
kevinfindlay
So is it worth investing time in this to see if there are any practical
applications?

------
yarrel
Sonitus post hoc ergo sonitus propter hoc.

------
dang
We changed the URL from
[http://arxiv.org/pdf/1412.3773v1.pdf](http://arxiv.org/pdf/1412.3773v1.pdf)
because, with some exceptions (such as computing), HN tends to prefer the
highest-quality general-interest article on a topic with the paper linked in
comments.

This comes up often enough that it is a good case for linking related URLs
together, which is something we intend to work on in the new year.

~~~
_delirium
I think with science that can be a dangerous preference, because of the
tendency of the general-interest articles to overstate (and/or incorrectly
state) the results. This one is better than most, especially compared to the
junk that comes out of university press offices. But it still somewhat
oversells the result compared to the more modest claims of the paper. The
paper's introduction and conclusion are pretty accessible, imo, and a better
representation of the findings than the Medium blogpost is.

~~~
dang
That's why I said "tends to prefer". There's no rigid policy—it's a lot of
case by case, and feedback from HN users makes the biggest difference in
deciding them. You may be right on this one, but it's a marginal call, and the
HN comments have clarified the situation quite a bit either way.

There is indeed an industry based on blowing scientific findings out of all
proportion, and we do a lot to keep most of that at bay. But the solution
isn't as simple as always linking to the papers, and there's no hope of
keeping HN immune from this systemic dysfunction, only of correcting its worst
excesses.

------
_almosnow
Interesting, a year ago this was one of the challenges on Kaggle: given a set
of sample pairs, determine which one of them (if any) is causing the other.

