
If correlation doesn’t imply causation, then what does? (2012) - Rickasaurus
http://www.michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/
======
kevinalexbrown
Good reading, in addition to Pearl himself, is Cosma Shalizi.

Ref list:
[http://vserver1.cscs.lsa.umich.edu/~crshalizi/notebooks/caus...](http://vserver1.cscs.lsa.umich.edu/~crshalizi/notebooks/causality.html)
Chapters:
[http://www.stat.cmu.edu/~cshalizi/uADA/13/lectures/ch22.pdf](http://www.stat.cmu.edu/~cshalizi/uADA/13/lectures/ch22.pdf)
[http://www.stat.cmu.edu/~cshalizi/uADA/13/lectures/ch23.pdf](http://www.stat.cmu.edu/~cshalizi/uADA/13/lectures/ch23.pdf)
[http://www.stat.cmu.edu/~cshalizi/uADA/13/lectures/ch24.pdf](http://www.stat.cmu.edu/~cshalizi/uADA/13/lectures/ch24.pdf)

Incidentally, Shalizi is a great source for going back to the basics. His
course at CMU "Advanced Data Analysis from an Elementary Point of View" has
course notes and exercises available gratis:
[http://www.stat.cmu.edu/~cshalizi/uADA/13/](http://www.stat.cmu.edu/~cshalizi/uADA/13/).
And any time I need to get a reading list, the best place to start is usually
his 'notebooks' page.

~~~
saint-loup
I concur. His notebooks page is a wonderful trove of resources on many (many,
many) subjects, often with interesting comments.
[http://vserver1.cscs.lsa.umich.edu/~crshalizi/notabene/](http://vserver1.cscs.lsa.umich.edu/~crshalizi/notabene/)

------
cousin_it
Some time ago I tried to come up with the simplest possible explanation for
Simpson's paradox. This was the result:

1) Imagine that most women with a certain disease survive, while most men die.

2) Imagine that most women with the disease take a certain medicine, while
most men don't.

3) Imagine that the medicine has absolutely no effect. Women just happen to
have better innate resistance to the disease, and also just happen to buy the
medicine more because it's marketed to women.

Now if you do a statistical analysis without counting men and women
separately, you will conclude that the medicine is very correlated with
survival!

Note that there's no way to know in advance that you should slice the
population along such-and-such variables, which can be a lot more subtle than
just gender. Also note that the example works even if the medicine has a
slight negative effect, i.e. you can reverse the direction of correlations by
choosing to slice or not to slice.

I think such results make it clear that you can't easily trust conclusions
from statistics. One minute you're thinking that cholesterol causes heart
disease, and the next minute you're asking yourself, what if cholesterol is
part of the body's response to heart disease? That's why we need randomized
controlled studies, and theories of causality.
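The reversal described above can be verified with simple arithmetic. Here is a minimal sketch (the survival and uptake rates are invented for illustration; note that within each group the medicine has exactly zero effect):

```python
# Simpson's paradox: within each group the medicine does nothing,
# yet the pooled numbers show a strong association with survival.
# All rates below are made-up illustrative values.

groups = {
    # group: (population share, survival rate, fraction taking medicine)
    "women": (0.5, 0.9, 0.8),  # better innate resistance, mostly take it
    "men":   (0.5, 0.3, 0.2),  # worse resistance, mostly don't
}

def pooled_survival(takes_medicine):
    """Survival rate among (non-)takers, ignoring gender entirely."""
    survivors = people = 0.0
    for share, survival, uptake in groups.values():
        frac = uptake if takes_medicine else 1.0 - uptake
        survivors += share * frac * survival
        people += share * frac
    return survivors / people

print(pooled_survival(True))   # takers:     ~0.78
print(pooled_survival(False))  # non-takers: ~0.42
```

Takers survive at roughly 78% versus 42% for non-takers, even though the medicine does literally nothing within either group; the taker group is simply dominated by women.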

~~~
TillE
> most women with the disease take a certain medicine, while most men don't

Unless "most" = nearly all, this kind of situation is not difficult to
untangle if you have sufficient and appropriate data.

~~~
ChikkaChiChi
The point is well made. Any sort of deviation from control that exhibits a
pattern will directly influence the outcome of your results.

------
snowwrestler
This article is largely focused on epidemiological studies. These are a very
well-known type of scientific investigation, particularly among the tech
crowd, because it is tech-intensive to manage and analyze large volumes of
data.

But, epidemiology is not all of science, it's just one way to do science. So
this statement is wrong, or, at least, not complete:

> The standard scientific answer to this question is that (with some caveats)
> we can infer causality from a well designed randomized controlled
> experiment.

The more general scientific answer is that we need an hypothesis that can be
tested and disproven by observation. Such observations can come from a
randomized controlled trial, but a surprising amount of important scientific
observations do not. For example, a major confirming test of relativity was to
measure the apparent displacement of stars due to gravitational lensing during
an eclipse. Obviously there is no way to create a random double blind study of
this. (Unless one were to stare too long without eye protection! [rimshot])

But wait, I could hypothesize that saying "blutarski" when flipping a coin
will make it land heads up, and if I only do the observation once, my stupid
hypothesis might get confirmed. Right?

In this case, yes, a lengthy trial would help provide much better
observations. But the more general answer is that a reasonable hypothesis must
propose a physical system that could plausibly cause the predicted result.
There is no known physical system that connects "blutarski" with coins, or
Facebook traffic with Greek bonds. But there is a plausible physical mechanism
by which smoke inhalation could lead to lung cancer.

This is what scientists are talking about when they say things like "there's
more to science than curve-fitting." But most people do not understand the
difference, and that's why we see things like global climate circulation
models get mixed up with historical climate reconstruction, or paleontology
get mixed up with evolutionary biology.

~~~
Goladus
This is an excellent explanation.

I'm not sure what you mean by paleontology being mixed up with evolutionary
biology, though.

~~~
snowwrestler
A lot of critics of evolutionary theory believe that the fossil record is the
most important piece of supporting evidence--hence the focus on "missing
links" between species.

The scientific reality is that Darwin formulated his theory by looking at
_living_ species, and the modern study of evolutionary processes does too--
primarily viruses, bacteria, and insects, because their lifespans are so
short.

------
da-bacon
Also see the follow up with responses from Judea Pearl:
[http://www.michaelnielsen.org/ddi/guest-post-judea-pearl-on-correlation-causation-and-the-psychology-of-simpsons-paradox/](http://www.michaelnielsen.org/ddi/guest-post-judea-pearl-on-correlation-causation-and-the-psychology-of-simpsons-paradox/)

~~~
kenjackson
Thanks. It's a treat to see this article. I've long wanted to read Pearl's
book on this topic, but haven't had the chance to carve out the time. I'm
going to dive into this article now.

------
tlarkworthy
This is the fundamental reason why general AI might not be possible on a
computer without a body. To infer causality, you must form a hypothesis, _then
design an experiment_ to confirm/deny it. Passively observing the world
can't disambiguate between complex correlation and causality (even with a
fancy calculus). You _need_ action to learn the intricacies of the world.

Think how the discovery of electricity led to electronics. There is nothing
like electronics in the natural world; the only route to electronics is via an
iteratively refined causal model of the universe.

~~~
gwern
> This is the fundamental reason why general AI might not be possible on a
> computer without a body.

'Body' is a rather misleading term to use for what a general AI _might_ need.

And I say 'might' because part of the motivation for the Pearlean program is
for discovering under what data and conditions one can infer causality without
randomized interventions.

------
nbouscal
Relevant XKCD: [http://xkcd.com/552/](http://xkcd.com/552/)

~~~
surement
Thanks! That's a great one.

------
joe_the_user
Correlation + plausibility based on your knowledge of the world implies
causation (obviously to the appropriate degree).

It's the flip side of "extraordinary claims require extraordinary evidence".

Facebook driving Greek debt is implausible and two vaguely shaped curves
aren't enough. A formula that predicts to many decimal places over a fair
period, prospectively, would be really weird but hard to ignore.

Spanish debt, gold prices or something related on the other hand, would
require proportionately less evidence for causation.

This disturbs people because it implies you can get stuck in a pathological
view of the world and statistics and evidence won't get you out. Sorry about
that.

Edit: And this is pretty much the Bayesian approach. It's just that Bayesian
statistics and Pearl's arguments are themselves just models of the world. You
can have others, but, even more, you need more of an argument than "stuff that
seems implausible needs more evidence", and all of the more elaborate stuff is
going to be specific to a given situation and thus might not be applicable to
a different one.

~~~
raverbashing
"Facebook driving Greek debt is implausible"

Yes, but in reality _you don't know that_.

Plausibility is a subjective measure, and while I would say that, yes, it can
be a hint, you cannot disregard something merely because it's implausible.

~~~
Robin_Message
> a hint, you cannot disregard something merely because it's implausible

Coming back to Bayesian statistics, the word for this is _prior_, and it's not
that you disregard evidence, it's that you can quantify both your existing
beliefs about reality _and_ the change in your beliefs according to the
evidence you see.

Firstly, our hint: we have a prior assumption of the probability that the
popularity of Facebook is driving up Greek debt (call it P(Fg)). For the sake
of argument, I'm going to make this 0.001 (I'd probably estimate less). Then,
we observe a correlation between these two things.

Once we see this correlation, we need to calculate two things: 1. The
probability of observing that correlation (call that P(C)). Note that the more
extraordinary the correlation, the less probable it is, and the smaller this
term would be. In this case, the graph matches vaguely, so I'm going to give
it a probability of 0.1.

2. Given a world where there is a causation, what is the probability we'd see
this correlation (Q: I'm not 100% on this part). Now, Greek debt could
plausibly be driven by other things, which would mask the Facebook effect, so
there's no guarantee there would be a correlation. This term is called P(C |
Fg), and I have no idea what value to give it. Let's try 0.5.

What we want to know is: P(Fg | C), that is, the probability of a connection
given we have observed a correlation.

Boom! P(Fg | C) = P(C | Fg) x P(Fg) / P(C)

So our posterior probability (after observing this correlation) changes from
0.001 to 0.001 x 0.5 / 0.1 = 0.005
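The calculation above is just Bayes' theorem applied once; making it explicit in code shows how little the correlation moves a tiny prior (all three probabilities are the commenter's guesses, not measured values):

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    return likelihood * prior / evidence

# P(Fg) = 0.001, P(C | Fg) = 0.5, P(C) = 0.1, guessed as above.
p_fg_given_c = posterior(prior=0.001, likelihood=0.5, evidence=0.1)
print(p_fg_given_c)  # ~0.005
```

The correlation quintuples our belief, but 0.5% is still firmly implausible, which matches the intuition that a vaguely matching graph shouldn't convince anyone.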

------
memla
No amount of correlation seems to _imply_ causation. Suppose there is some
event X that always causes two other events, Y and Z. Let's also say that Y
and Z are exclusively caused by X. Then Y has a 100% perfect correlation with
Z, but ex hypothesi it is not caused by it.

------
SagelyGuru
It is dangerous to assume causality from any data alone. (Data and statistics
are over-rated nowadays.) You need to do the harder work of discovering the
proper mathematical model (equation) relating explicitly the dependent
(caused) variables to the independent (causing) variables. In the absence of
such a verified and proven model, you just cannot take the shortcut of pulling
causality out of statistics, like a rabbit out of a hat. Incidentally, the
"Simpson's paradox" to which so much attention is given here is a trivial
illustration of the fact that you cannot meaningfully add percentages from
differing amounts. Something every school kid ought to learn.

~~~
AlexFinks
But why is it dangerous, on balance, to make assumptions of causality from
data and statistics alone?

Animals, such as rats and ravens, face this problem all the time, and yet they
can meaningfully affect the world in a manner that implies causal
understanding, and a sensitivity to the difference between mere correlation
and a correlation with causal potential.

Humans do the same as well, naive people who have never learned about
experimental design, or have never learned the concept of correlation, also
make useful judgments on the causal model behind ordinary problems and events.

How did these machines make actionable judgments on causality with nothing
more than noisy inputs to their sensory systems? Through what technique did
they discern the difference between mere correlation, and a correlation with
exploitable causality?

~~~
radarsat1
> naive people who have never learned about experimental design, or have never
> learned the concept of correlation, also make useful judgments on the causal
> model behind ordinary problems and events.

They also are often... racist. Or hold whatever other stereotypes to heart.
Racism is just a good example of an extreme position to hold which is often
due to assumptions of causality.

"Lots of minorities are in prison. There is a high correlation between being a
minority and being in prison. Therefore being a minority leads to being a
criminal."

This completely ignores external reasons why minorities might end up in prison
more often than others. For instance, it could be that minorities have an
equal amount of criminal activity as the general population, but are more
likely to end up in prison because of it. Correlation does not imply
causation.

I think the number of social issues that arise due to assumptions of causation
is quite high, actually, and often leads to poor decision making in policy.
That is why it is "dangerous."

------
nhebb
> North: Democrat (145/154, 94 percent), Republican (138/162, 85 percent)

> South: Democrat (7/94, 7 percent), Republican (0/10, 0 percent)

> Overall: Democrat (152/248, 61 percent), Republican (138/172, 80 percent)

This example is often cited for Simpson's Paradox, but because the South
headcount for Republicans (10) and Democrats (94) had such a wide disparity
and the Republican headcount was so low, it always seemed more like twisting
the data to fit a political narrative than to make a statistical argument.
Like other statistical methods, there should be a minimum population size for
the paradox to be meaningful.
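For what it's worth, the quoted reversal does check out arithmetically; here's a quick script using exactly the counts quoted above:

```python
# (yes votes, total votes) per region, for each party (figures as quoted).
votes = {
    "Democrat":   {"North": (145, 154), "South": (7, 94)},
    "Republican": {"North": (138, 162), "South": (0, 10)},
}

for party, regions in votes.items():
    yes = sum(y for y, t in regions.values())
    total = sum(t for y, t in regions.values())
    print(f"{party}: {100 * yes / total:.0f}% overall")
    for region, (y, t) in regions.items():
        print(f"  {region}: {100 * y / t:.0f}%")
```

Democrats lead within both regions yet trail overall, purely because the two parties' headcounts are distributed so differently between North and South, which is where the 10-person Republican South sample does its damage.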

~~~
haldujai
[http://www.bmj.com/content/309/6967/1480.full](http://www.bmj.com/content/309/6967/1480.full)
[https://en.wikipedia.org/wiki/Simpson's_paradox#Kidney_stone...](https://en.wikipedia.org/wiki/Simpson's_paradox#Kidney_stone_treatment)

These have larger sample sizes; all would be valid for statistical analysis.
That said, the paradox isn't really dependent on sample size, as it is not so
much a statistical test as a trend that can be observed when looking at
subgroups compared to overall averages.

In the above example open surgery was better for both small and large kidney
stones and is a prime example of Simpson's paradox. However, one should note
that Simpson's paradox can also be subject to further confounding variables.
Perhaps patients that were suitable for open surgery were in better medical
condition and therefore the open surgery sample was non-random and caused the
higher success rate.

The presence of Simpson's paradox, similar to correlation, also does not imply
causation.

------
javert
> We can’t have X causing Y causing Z causing X! At least, not without a time
> machine.

You should be able to have cyclic graphs, though. W could cause X which causes
Y which causes Z which causes more X.

~~~
bigfudge
Perhaps you just need to include time. Does X_1 causing Y causing Z causing
X_2 solve your problem without requiring cyclic graphs?

~~~
javert
Yes, but then the nodes in the graph become even more disconnected from the
concretes in reality that they are supposed to represent.

I am actually highly skeptical of the entire notion of the article, but I
haven't had time to finish the article yet. My skepticism comes from the fact
that causal relationships can be explained by identifying and understanding
the causal factors at play and drawing relevant conclusions. For example,
smoking causes lung cancer because exposure to toxic chemicals causes genetic
mutations. Existing mathematics (such as basic statistics, including
correlation) can be used as additional empirical evidence to support such an
explanation.

------
graycat
> what does?

(1) A clear mechanism. Data. My car won't run. Cause. The universal joint at
the differential for the rear wheels failed leaving the rear end of the drive
shaft on the ground.

(2) A solid scientific theory. Data. I let go of the 2 x 4, and it hit my foot
and hurt. Cause. Newton's law of gravity.

(3) Other. Data. There is a correlation between smoking and lung cancer.
Cause. Guess that there are some chemicals in cigarette smoke that cause lung
cancer. Without actually finding the chemicals, basically test the heck out of
the connection, i.e., look for and reject (statistically as in an hypothesis
test) other candidate causes, see if cigarette smoke does cause mutations
(causing mutations are easier to test for than causing cancer, and nearly
every chemical that causes cancer also causes mutations; so, if a chemical
doesn't cause mutations, then it likely doesn't cause cancer; if the chemicals
in cigarette smoke do cause mutations, then can't reject that they cause
cancer and have to keep entertaining that the chemicals might cause cancer and
have to keep testing), reject _spurious_ correlations, do some more tests that
might reject causality and observe that they do not reject, look for other
causes, work hard, get tired, give up, and finally conclude that we have done
enough work and it's time to stop smoking.

~~~
new_test
Doesn't sound like you understand what the discussion is about.

~~~
graycat
I responded directly to the question in the title of the post here at HN:

> If correlation doesn’t imply causation, then what does?

not directly to the article or the discussion here on HN.

I don't think that the article makes much sense.

I gave a very simple answer, (1) and (2). For (3), that's a mess and closer to
the discussion.

My view is that for _causality_ , what I gave with (1) and (2), simple,
childishly simple, dirt simple, is, unfortunately, in reality, about all there
is to the poor, struggling subject.

Put more respectfully, beyond my simplistic (1) and (2), working with
_causality_ is super difficult and quite unpromising as in mostly just f'get
about it. For my (3) it boils down to ways to reject causality and then we
accept causality once we get so tired trying to reject it we just give up and
accept it. Why? Because without something detailed and mechanical, say, from
chemistry, biochemistry, and the forbiddingly complicated, detailed
biochemistry of cells, we lack anything very solid to call _causal_.
E.g., in my (1), with the universal joint that failed, we have a simple
explanation that makes a solid _cause_ ; getting something so simple and solid
for a cause of cancer from smoking will be super difficult. Don't worry, I
don't smoke, but neither do I claim really to know a _cause_ of cancer.

 _Causality_ is a great, intuitive idea for humans and animals, but looked at
in detail it's tough to establish in all but some narrow situations.

For _causal networks_ , _path analysis_ , _Markov random fields_ , directed
acyclic graphs, lots of diagrams with circles and arrows, f'get about it.

For getting _causality_ out of data analysis, mostly just a fool's errand --
f'get about it.

There is a significant reason I concentrated on (1) something mechanical and
(2) something from classic physics -- those two darned near cover what can be
done with _causality_. For the biological sciences -- causality is really
important but really tough. For the social sciences, they try and try, and my
wife did in her Ph.D. in essentially mathematical sociology and my brother did
in his Ph.D. in political science, but, net, watching my wife and brother
struggle with trying to make causality work in social science, where it's so
easy to make it with (1) and (2), I just said f'get about it.

You can entertain my views and my first very short post as a contribution to
the discussion based on a lot of background and a claim that more is a fool's
errand or just chalk it up to my ignorance.

One more point: I didn't even mention correlation. Why? Because correlation is
so far from causality that it's hardly worth even mentioning.

------
logical42
Correlation does imply causation. It just doesn't necessarily establish it as
a fact.

~~~
JoshTriplett
Correlation correlates with causation, but "correlation correlates with
causation" doesn't imply "correlation implies causation".

~~~
logical42
Depends what you mean by 'imply'.

~~~
foobarbazqux
[http://dictionary.reference.com/browse/imply](http://dictionary.reference.com/browse/imply)

The third meaning is the one that "correlation does not imply causation"
refers to. There is no other meaning for that word in that context.

~~~
mortehu
As described in the article, "correlation does not imply causation" would hold
true even if you take "imply" to mean "is weak evidence for". This is due to
Simpson's paradox[1], which says that correlation can be inverted when you
take into account an additional distinguishing factor in your data.

A specific example. Look at [2] and take the "y" axis to be "cigarettes per
day" and the "x" axis to be "life expectancy in years". The overall trend is
that more cigarettes leads to a shorter life. However, if we find that the
blue group are people who do not inhale, and the red group are people who do
inhale, the correlation is reversed for all smokers -- smoking more leads to a
longer life.

1. [https://en.wikipedia.org/wiki/Simpson's_paradox](https://en.wikipedia.org/wiki/Simpson's_paradox)

2. [https://en.wikipedia.org/wiki/File:Simpson%27s_paradox_conti...](https://en.wikipedia.org/wiki/File:Simpson%27s_paradox_continuous.svg)
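The continuous picture in [2] is easy to reproduce with synthetic data: two subgroups, each with a positive within-group slope, whose pooled fit is negative. The numbers below are invented purely to produce the reversal:

```python
import numpy as np

# Two subgroups with the same positive slope but offset intercepts.
x_blue = np.linspace(0.0, 1.0, 5)
y_blue = x_blue + 3.0               # within-group slope: +1
x_red = np.linspace(2.0, 3.0, 5)
y_red = x_red - 3.0                 # within-group slope: +1

# Pooling ignores the grouping and fits one line to everything.
x_all = np.concatenate([x_blue, x_red])
y_all = np.concatenate([y_blue, y_red])
pooled_slope = np.polyfit(x_all, y_all, 1)[0]

print(np.polyfit(x_blue, y_blue, 1)[0])  # positive: more x, more y
print(np.polyfit(x_red, y_red, 1)[0])    # positive: more x, more y
print(pooled_slope)                      # negative: the trend reverses
```

In the cigarette reading of the axes, each subgroup (inhalers, non-inhalers) shows one trend while the pooled scatter shows the opposite one.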

~~~
foobarbazqux
Where do hypotheses come from if correlation isn't weak evidence for
causation?

~~~
mortehu
Usually, you have some additional information, aside from the correlation
itself. For example, instead of just knowing "some variable X correlates with
some variable Y", you may know what X and Y actually are, and some facts about
similar entities.

~~~
foobarbazqux
But don't you have that additional information in the Simpson's paradox
examples?

------
skatenerd
[https://en.wikipedia.org/wiki/David_Hume](https://en.wikipedia.org/wiki/David_Hume)

------
zalmkleurig
[http://en.m.wikipedia.org/wiki/Granger_causality](http://en.m.wikipedia.org/wiki/Granger_causality)

------
im3w1l
>Obviously, it’d make no sense to have loops in the graph:

I stopped reading here. Not saying models without loops cannot be useful
though.

~~~
davorak
Here is my understanding, as a non-expert. The obviousness is supposed to come
from the fact that if `a` causes `b`, `b` cannot have caused `a`, because that
would need some form of time travel.

`b` could cause `a2`, though, which is very similar to `a` but is separated in
time from `a`.

~~~
im3w1l
Yeah that makes sense. But then he uses the model wrongly, since he puts
"hidden factor" "smoking" "lung cancer" in the vertices, and not "hidden
factor (t=0)", "smoking(t=1)" and "lung cancer(t=2)". Further if used like
that, then a possible hidden factor could actually be "lung cancer(t=0)",
since it could (conceivably) cause both "smoking(t=1)" and "lung cancer(t=2)".

~~~
davorak
> But then he uses the model wrongly, since he puts "hidden factor" "smoking"
> "lung cancer" in the vertices, and not "hidden factor (t=0)", "smoking(t=1)"
> and "lung cancer(t=2)".

The direction of the arrows shows causation, which also implies time has
passed, though the amount of time is not specified in the model he is using
(though it could be added). He has a section later in the article about why he
left it absent earlier and covers how it could be included.

> Further if used like that, then a possible hidden factor could actually be
> "lung cancer(t=0)", since it could (conceivably) cause both "smoking(t=1)"
> and "lung cancer(t=2)".

If lung cancer actually caused smoking then yes, that would be a simple
solution. If that were the answer, though, the very next question would be
what caused "lung cancer(t=0)". "Smoking(t=-1)", if it existed, would be
suspect, and of course another hidden factor (or several) as well.

------
maaku
A predictive model and Bayes theorem.

------
gazd
Isolated correlation?!

------
wissler
David Hume said it first:
[http://en.wikipedia.org/wiki/Constant_conjunction](http://en.wikipedia.org/wiki/Constant_conjunction)

------
dschiptsov
multiple causation.))

------
nsxwolf
This is seriously in need of a tl;dr

~~~
jrn
I've read the book twice, in a scattered fashion, and couldn't give you a
tl;dr.

~~~
physicslover
I read chapters of it a few years back, too, but have forgotten the details.
All in all, I was pretty convinced of the soundness of the approach at the
time.

My recollection is that this framework of causal inference allows one to ask
questions about a probabilistic model that one can then try and measure to
test causality.

These questions are interventions or assertions (the do operators) that
something happened.

So one would start out with a graphical model like in the smoking example,
which defines a probability model, then make do-assertions on the model for
various candidate causes, see what that would imply about the change in
probabilities, and then design an experiment to measure and confirm them.
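A minimal numeric sketch of that idea, assuming the simplest possible graph: a hidden factor Z driving both smoking X and cancer Y, with, in this toy version, no direct effect of X on Y at all. Every probability below is invented for illustration:

```python
# Back-door adjustment (Pearl): P(y | do(x)) = sum_z P(y | x, z) P(z),
# versus observational       P(y | x)      = sum_z P(y | x, z) P(z | x).
# Toy numbers: the hidden factor z drives both x and y; x itself does nothing.

p_z = {0: 0.7, 1: 0.3}            # marginal of the hidden factor
p_x1_given_z = {0: 0.2, 1: 0.8}   # P(X=1 | z)

def p_y1_given_xz(x, z):
    # In this toy model Y depends only on the hidden factor.
    return 0.7 if z == 1 else 0.1

def p_x(x):
    return sum((p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]) * p_z[z]
               for z in (0, 1))

def p_y1_given_x(x):
    """Observational: conditioning on x shifts the distribution of z."""
    return sum(p_y1_given_xz(x, z)
               * (p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z])
               * p_z[z]
               for z in (0, 1)) / p_x(x)

def p_y1_do_x(x):
    """Interventional: do(x) leaves z at its marginal distribution."""
    return sum(p_y1_given_xz(x, z) * p_z[z] for z in (0, 1))

print(p_y1_given_x(1), p_y1_given_x(0))  # ~0.48 vs ~0.16: strong association
print(p_y1_do_x(1), p_y1_do_x(0))        # ~0.28 vs ~0.28: zero causal effect
```

Observationally, smokers in this toy world get cancer about three times as often, yet the interventional answer correctly reports no effect, because the adjustment formula keeps z at its population distribution instead of the smoker-skewed one.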

~~~
jrn
I thought it was a good approach as well, but I have a hard time fitting it
into the math I know. I've been learning category theory so I may take another
stab at placing it. But this all takes time.

And the book's layout is fairly disjointed.

I may just begin experimenting with the theory, rather than placing it; my
life is short.

