
Causality in machine learning - maverick_iceman
http://www.unofficialgoogledatascience.com/2017/01/causality-in-machine-learning.html
======
nkurz
I worked on a project a few years ago for a UC Berkeley biostatistics
professor who believes he has a theoretically proven approach to using
non-randomized observational data to create an asymptotically efficient,
unbiased
estimator of counterfactual treatments. My impression as a programmer who has
only dabbled in machine learning is that this is a phenomenal claim far beyond
the state of the art.

My job was to take an inefficient proof-of-concept R package, and make it
computationally and memory efficient enough to run on real-world datasets. I
failed totally. The obvious explanation would be that I just wasn't able to
understand the math involved well enough to implement the algorithm despite
immersing myself in it for months. My personal guess though is that both the
paper and the reference implementation were flawed in some way that made the
task impossible.

Anyway, the paper is here:

Targeted Maximum Likelihood Estimation for Dynamic and Static Longitudinal
Marginal Structural Working Models

Petersen, Schwab, Gruber, Blaser, Schomacher, and van der Laan; J Causal
Inference 2015

[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4405134/pdf/nih...](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4405134/pdf/nihms675179.pdf)

And the R package here:
[https://github.com/joshuaschwab/ltmle](https://github.com/joshuaschwab/ltmle).

My understanding was never perfect, but my belief is that there is a kernel of
insight in this approach that has not yet been explored in machine learning.
Alternatively, maybe it works as is, and just needs a better implementation.
I'd love to see someone implement this approach, or fix it, or discredit it.
As it is, I think it's potentially incredibly valuable work that is getting
very little attention.

~~~
nurettin
This is very interesting from a development perspective. You seem to have
tested the working parts of the algorithm and broken it down into multiple
passes where appropriate; I am surprised your work did not reach completion.
You did all the right things.

What exactly was considered wrong with the output? How did this fail?

~~~
nkurz
_How did this fail?_

I'm still trying to understand this, and I worry that much of it is due to my
personal failings. I think I'm much worse than most people at making headway
on problems when I know that my understanding is flawed, and I rationalize
this as part of my moral code as a sort of "first do no harm". But I'm scared
to discard my moral instincts for sake of convenience, out of fear that I
wouldn't know where to stop.

In this case, I couldn't find an interpretation of the paper that matched my
interpretation of the sample code, and I couldn't find an interpretation of
either that seemed plausibly correct. The flaws in each seemed so obvious to
me that I had to presume that I was not interpreting either correctly, and I
concluded that I must be missing something essential.

There wasn't much in the way of test structure other than eyeballing the
results and declaring them OK. I was scared that if I inadvertently
implemented something incompatible with both the paper and the code, my errors
would never be caught. None of the grad students seemed to fully understand
the paper, and the professor wasn't familiar with the R code. My clumsiness
with the terminology of the field made it difficult for me to communicate with
the professor.

I feel terrible about the whole thing, but don't know what in particular I
should have done differently.

~~~
shurtler
Hey, the key assumption (the "identification" assumption) is in Section 2.4 of
the paper you referred to. I do not have time to go through the notation, but
it seems to be a fairly standard "no unmeasured confounders" assumption - so
you need to ASSUME that nothing influences both the treatment and the outcome
you are interested in. Then, of course, you do not need to randomize, but can
use observational data.

I know that Petersen and van der Laan are well-respected researchers in
causality who know exactly what they are doing. E.g., Petersen has a course
(I think at Berkeley) that uses causal graphs which were pioneered by Judea
Pearl (see comments below). I can only second the recommendation to dig into
his work.

------
dajohnson89
The unofficial blog phenomenon intrigues me. This is a great post, but what
happens when an unofficial blog (or Twitter account) posts something
controversial? Breaks the law? What recourse does the company (or govt agency)
have?

I'm not really denouncing the unofficial blogs, but I can see it making PR
people uncomfortable, and understandably so. On a personal level, it seems
like an easy way to use the name of your employer (btw, how do we know they're
really employees of X) as a springboard for popularity, without the huge
responsibility that comes along with officially representing the organization.

------
nonbel
Causality has never been of much interest to me. I never understood why others
make such a big deal out of it. Maybe it is a real thing, maybe an illusion,
definitely some kind of heuristic... but shouldn't we be searching for useful
"laws" (eg F ~ m1*m2/r^2) rather than "cause"? I know some have denied
causality a place in science:

"In the following paper I wish, first, to maintain that the word "cause" is so
inextricably bound up with misleading associations as to make its complete
extrusion from the philosophical vocabulary desirable; secondly, to inquire
what principle, if any, is employed in science in place of the supposed "law
of causality" which philosophers imagine to be employed; thirdly, to exhibit
certain confusions, especially in regard to teleology and determinism, which
appear to me to be connected with erroneous notions as to causality."
[http://www.hist-analytic.com/Russellcause.pdf](http://www.hist-analytic.com/Russellcause.pdf)

~~~
nerdponx
It's critically important whenever you are being asked to develop a policy of
some kind, whether it's public policy or a business procedure. If you only
understand relationships and not the causal direction of those relationships,
you can end up making very bad decisions. E.g., "black men drop out of school
more often than white men, therefore we shouldn't waste time educating black
men."

~~~
nonbel
But the alternative I proposed is coming up with quantitative laws, not vague
astrology like "Race A does this more often than Race B, therefore we need to
do less of C".

That is an interesting connection to make though. If people view _wild
speculation_ as the alternative focus of research, incorporating the concept
of causality would at least slow them down. This would reduce the amount of
misinformation generated and number of destructive policies implemented.
Still, I think causality offers more than that.

~~~
nerdponx
How do you distinguish a "law" from "astrology" in this case?

~~~
nonbel
A law will make some precise quantitative prediction, "astrology" will make
vague predictions like "A is correlated with B", or "A is positively
correlated with B".

Here is a spielraum[1] of possible experimental results, with the * indicating
the region consistent with a law:

    
    
      1) |------*------|
    

Here it is for a vague speculation that there is "some correlation" (ie
astrology):

    
    
      2) |******-******|
    

You can see it will be much easier to find evidence consistent with #2 than
#1, even if there is nothing to the ideas at all.

[1] Paul Meehl. 1990. Appraising and Amending Theories: The Strategy of
Lakatosian Defense and Two Principles That Warrant It. Psychological Inquiry
1990, Vol. 1, No. 2, 108-141.
[http://rhowell.ba.ttu.edu/meehl1.pdf](http://rhowell.ba.ttu.edu/meehl1.pdf)
[figure 3]
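
To make the asymmetry concrete, here is a small simulation (my own
illustration, not from Meehl's paper): fitting pure noise, the vague claim
"A is positively correlated with B" comes out "confirmed" roughly half the
time, while a precise quantitative prediction like "the slope is 2.0 +/- 0.1"
essentially never is. The numbers and thresholds here are made up for
illustration.

```python
import random
import statistics

# Illustration of Meehl's "spielraum" point: under pure noise (true slope 0),
# a vague directional claim is confirmed by chance about half the time,
# while a precise quantitative law almost never is.

random.seed(0)

def sample_slope(n=50):
    """Fit the slope of y on x for independent noise (true slope = 0)."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

trials = 2000
slopes = [sample_slope() for _ in range(trials)]

vague_hits = sum(s > 0 for s in slopes)            # "some positive correlation"
precise_hits = sum(1.9 < s < 2.1 for s in slopes)  # "slope is 2.0 +/- 0.1"

print(f"vague claim confirmed:   {vague_hits / trials:.2%}")
print(f"precise claim confirmed: {precise_hits / trials:.2%}")
```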

------
tbonza
Correlation is not causality. I'm so glad someone put this article together.

------
lngnmn
Causality must be defined as a fully-observable model and then proved by a
replicable experiment, it cannot be established by any number of mere
observations. Statistics or probability are unrelated to causality _in
principle_.

~~~
randcraw
"Causal inference in statistics: An overview", Judea Pearl

[http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf](http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf)

"If correlation doesn’t imply causation, then what does?", Michael Nielsen

[http://www.michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/](http://www.michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/)

~~~
lngnmn
> If correlation doesn’t imply causation, then what does?

Experiment. Proof of _implementation_, the way molecular biologists do it.

~~~
pc2g4d
The answer is more complicated than that. Pearl has shown that in some
situations it's possible to answer causal questions without running
experiments. Basically, you can ask causal questions using the "do-calculus"
he developed:

P(dog barks | do(kick the dog))

The `do` modifies your model to treat "kick the dog" as observed, breaking its
dependence on any other variables, e.g. the doorbell ringing or the cat
hissing. With those links broken, the model becomes simpler, and often the
causal question can be answered using observational rather than experimental
quantities.

At least, that's the idea. Not sure how broadly accepted it is, but Pearl is a
towering figure and his work seemed robust at least to my not-so-expert eyes.
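
For a sense of what "breaking the dependence" means computationally, here is a
toy sketch of Pearl's back-door adjustment with made-up numbers (mine, not
Pearl's): a confounder Z (the doorbell) influences both the kick and the bark,
so the observational P(bark | kick) differs from the interventional
P(bark | do(kick)), which averages over Z's natural distribution instead of
its distribution among kicked dogs.

```python
# Toy back-door adjustment. Z = doorbell rings (confounder),
# X = dog gets kicked, Y = dog barks. All probabilities are invented.
# do(X=1) severs Z -> X, so P(Y=1 | do(X=1)) = sum_z P(Y=1 | X=1, z) * P(z),
# which generally differs from the observational P(Y=1 | X=1).

P_z = {0: 0.7, 1: 0.3}                      # P(doorbell)
P_x_given_z = {0: 0.1, 1: 0.8}              # P(kick | doorbell)
P_y_given_xz = {(0, 0): 0.05, (0, 1): 0.6,  # P(bark | kick, doorbell)
                (1, 0): 0.4,  (1, 1): 0.9}

# Observational: P(Y=1 | X=1) = sum_z P(Y=1 | X=1, z) * P(z | X=1)
p_x1 = sum(P_x_given_z[z] * P_z[z] for z in (0, 1))
p_z_given_x1 = {z: P_x_given_z[z] * P_z[z] / p_x1 for z in (0, 1)}
observational = sum(P_y_given_xz[(1, z)] * p_z_given_x1[z] for z in (0, 1))

# Interventional: P(Y=1 | do(X=1)) via back-door adjustment over Z
interventional = sum(P_y_given_xz[(1, z)] * P_z[z] for z in (0, 1))

print(f"P(bark | kick)     = {observational:.3f}")
print(f"P(bark | do(kick)) = {interventional:.3f}")
```

With these numbers the observational quantity overstates the causal effect,
because kicked dogs are disproportionately dogs that also heard the doorbell.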

~~~
lngnmn
That would still be a strong correlation, but not causation. One could add a
weight to each observable state and select for the biggest weight so far, but
it would have nothing to do with the actual causes. Would a dead dog bark when
kicked?

There are some principles which cannot be undone by any amount of hipsterism
and sophisticated sectarian bullshitting. There is no way to jump from
observation to causality without knowing the implementation. At least in this
particular universe.

In the realm of models (or ideas) it could seem doable, but the map is not the
territory; a model does not represent reality until proven experimentally.

~~~
mhermher
Here's the canonical example:

A is correlated to C. B is correlated to C. A is NOT correlated to B.

How is that possible?

The argument is that, if causation is unidirectional and acyclical, then the
only causal structure that leads to the above is that A and B both cause C.

There's no other way to do it. And you can derive causation from observational
data!

Of course in real life there are a bunch of problems with this, particularly
measurement error and hidden unmeasured or unmeasurable variables. But those
more or less exist with experimental data as well: you can only conclude
things about observed variables, and a very large number of unmeasured
variables could throw off your inference.

Anyway, I more or less held your belief on this until I read Pearl's (and
colleagues') material. There's much more to it than the canonical example
above.
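
The forward direction of the canonical example is easy to check by simulation
(the structure and numbers below are my own illustration): generate
independent A and B, let C depend on both, and the predicted correlation
pattern appears — A and B each correlate with C, but not with each other.

```python
import random
import statistics

# Simulate the collider structure A -> C <- B with independent A and B.
# Expected pattern: corr(A, C) and corr(B, C) clearly nonzero,
# corr(A, B) near zero.

random.seed(1)

n = 20000
A = [random.gauss(0, 1) for _ in range(n)]
B = [random.gauss(0, 1) for _ in range(n)]
C = [a + b + random.gauss(0, 0.5) for a, b in zip(A, B)]

def corr(xs, ys):
    """Pearson correlation of two equal-length samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

print(f"corr(A, C) = {corr(A, C):.2f}")   # strong
print(f"corr(B, C) = {corr(B, C):.2f}")   # strong
print(f"corr(A, B) = {corr(A, B):.2f}")   # near zero
```

This only demonstrates that the collider structure produces the pattern; the
inferential claim in the comment above is the converse — that under
acyclicity, no other structure can.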

~~~
lngnmn
Assuming unproven causality is exactly how uneducated primitive people used to
form their religious beliefs.

Churches are related to people. People are related to the Sun. The Sun is
caused by the prayers of the people in churches. The Sun is in the sky today
because someone somewhere prayed.

