
Causal Inference Book - onuralp
https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
======
ehudla
There's also this primer, recently out: [https://www.amazon.com/Causal-Inference-Statistics-Judea-Pearl/dp/1119186846](https://www.amazon.com/Causal-Inference-Statistics-Judea-Pearl/dp/1119186846)

------
lpage
There are very interesting and fairly recent results on causal discovery
under additive noise models [1]. Although such models aren't universally
applicable, the underlying concept is intuitive and a good fit for many
problem domains. On the time series front, Google open-sourced CausalImpact
[2] for Bayesian structural time-series modeling a few years ago. Looks like
RankScience [3] is putting that research to good use.

I'm surprised that causal discovery doesn't get more play. Aside from the
direct applications to scientific research, causality is a strong consistency
hint for assessing models/features learned from data.

[1]
[https://news.ycombinator.com/item?id=8776582](https://news.ycombinator.com/item?id=8776582)

[2] [https://opensource.googleblog.com/2014/09/causalimpact-new-open-source-package.html](https://opensource.googleblog.com/2014/09/causalimpact-new-open-source-package.html)

[3]
[https://news.ycombinator.com/item?id=13552862](https://news.ycombinator.com/item?id=13552862)
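
To make the additive-noise idea concrete: assume the effect is a (possibly
nonlinear) function of the cause plus independent noise. Then residuals from
a regression in the causal direction are independent of the regressor, while
residuals in the anti-causal direction generally are not. Here's a minimal
illustrative sketch in Python - the toy data, polynomial regressor, and
function names are all mine, not from [1]:

```python
import numpy as np

def dist_corr(x, y):
    """Distance correlation (Szekely et al.) as a generic dependence score."""
    def centered(v):
        d = np.abs(v[:, None] - v[None, :])
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    a, b = centered(x), centered(y)
    dcov2 = (a * b).mean()
    return np.sqrt(dcov2 / np.sqrt((a * a).mean() * (b * b).mean()))

def anm_direction(x, y, degree=4):
    """Additive-noise test: regress each way, score residual dependence.
    Lower score = residuals more independent = more plausible direction."""
    def score(cause, effect):
        fit = np.polyval(np.polyfit(cause, effect, degree), cause)
        return dist_corr(cause, effect - fit)
    fwd, bwd = score(x, y), score(y, x)
    return ("x->y" if fwd < bwd else "y->x"), fwd, bwd

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1500)
y = x ** 3 + 0.1 * rng.uniform(-1, 1, 1500)  # true mechanism: x -> y
direction, fwd, bwd = anm_direction(x, y)
```

Real implementations use kernel independence tests (HSIC) and flexible
regressors; a polynomial fit plus distance correlation is just the cheapest
stand-in for the same idea.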

~~~
dhfhduk
The additive-noise theory you mention is clever, but it doesn't strike me as
robust at all. It's been a while since I looked at that literature, but when
I did it seemed really unrealistic: at some level it reduced to assuming that
all deviations from normality are interpretable in terms of the desired
causal inferences. There might be some scenarios where you could reduce the
variables involved to the point where that idea is feasible, but otherwise it
seemed really unbelievable to me.

Part of me wants to dive into this causality modeling, because it seems up my
alley, but I'm very sceptical of it showing anything definitive. I do
observational research, but short of a priori randomization, I'm sceptical of
any claims to causality. Even then, with experiments, I'm deeply sceptical
unless something has been replicated across various secondary conditions by
multiple distinct groups.

Modern causality theory and modeling has definitely raised the bar in terms of
what we say about data, and I love it, but sometimes I wonder if causality is
a red herring. Even with hard experimental evidence, I'm tempted to not
interpret it beyond "when someone does X, this tends to happen."

~~~
gwern
It does seem too good to be true, but they've compiled a real-world dataset of
causal relationships, and the additive-noise and fancier ML algorithms _do_
seem to infer the right direction well above chance, so there's at least
something there.

------
benrawk
Check this one out, it is the classic book on causal inference:
[https://www.amazon.com/Experimental-Quasi-Experimental-Designs-Generalized-Inference/dp/0395615569](https://www.amazon.com/Experimental-Quasi-Experimental-Designs-Generalized-Inference/dp/0395615569)

~~~
apathy
Nah, Robins started (asterisk) all this
causal/propensity-scored/pretend-experimental mania.

It's what drove me back to hard interventional experiments and eventually to
adaptive clinical trial design.

(Asterisk): ok more like Pearl started it and Robins made it more practical
with doubly robust designs etc. I still find most of the mechanics rather
shady.

~~~
rrherr
> I still find most of the mechanics rather shady.

Interesting, can you expand on this? I have no experience with causal
inference and would like to learn more. Thanks!

~~~
apathy
Look up confounding by indication and some of the power/sensitivity studies
for so-called doubly robust estimators. Counterfactuals are reasonable. A lot
of "let's turn this observational study into a designed experiment with math"
approaches turn out not to be. That's the gist of it; if you want rigor, you
should read the papers. I don't have a reference to hand at the moment (on
phone) but it shouldn't take more than a few minutes of searching Google
Scholar to hit the appropriate vein. The bottom line is simply TANSTAAFL.
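
For anyone wondering what a doubly robust estimator actually looks like:
AIPW (augmented inverse probability weighting) combines an outcome model
with a propensity model, and stays consistent if either one is correctly
specified. A toy sketch in Python, with simulated data and oracle nuisance
models - everything below is illustrative, not from any particular paper:

```python
import numpy as np

def aipw_ate(y, t, ps, mu1, mu0):
    """Augmented IPW ("doubly robust") estimate of the average treatment
    effect: outcome-model prediction plus an IPW correction on its residuals."""
    return np.mean(mu1 - mu0
                   + t * (y - mu1) / ps
                   - (1 - t) * (y - mu0) / (1 - ps))

rng = np.random.default_rng(1)
n = 20000
x = rng.normal(size=n)                        # confounder
ps = 1.0 / (1.0 + np.exp(-x))                 # true propensity P(T=1|X)
t = rng.binomial(1, ps)                       # confounded treatment
y = 2.0 * t + x + rng.normal(size=n)          # true ATE = 2
ate = aipw_ate(y, t, ps, mu1=2.0 + x, mu0=x)  # oracle nuisance models
naive = y[t == 1].mean() - y[t == 0].mean()   # biased by confounding
```

The critique above bites when both nuisance models are estimated from the
same shaky observational data; with oracle models the math is unimpeachable,
which is exactly the free lunch you don't get in practice.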

~~~
rrherr
Have you seen the new paper, “Human Decisions and Machine Predictions”?
[http://scholar.harvard.edu/files/sendhil/files/w23180.pdf](http://scholar.harvard.edu/files/sendhil/files/w23180.pdf)

I'm wondering whether their methodology is reasonable.

From the abstract: “Millions of times each year, judges must decide where
defendants will await trial—at home or in jail. By law, this decision hinges
on the judge’s prediction of what the defendant would do if released. … Yet
comparing the algorithm to the judge proves complicated. … We only observe
crime outcomes for released defendants, not for those judges detained. This
makes it hard to evaluate counterfactual decision rules based on algorithmic
predictions. … We deal with these problems using different econometric
strategies, such as quasi-random assignment of cases to judges. … A policy
simulation shows crime can be reduced by up to 24.8% with no change in jailing
rates, or jail populations can be reduced by 42.0% with no increase in crime
rates. Moreover, we see reductions in all categories of crime, including
violent ones. Importantly, such gains can be had while also significantly
reducing the percentage of African-Americans and Hispanics in jail. … While
machine learning can be valuable, realizing this value requires integrating
these tools into an economic framework: being clear about the link between
predictions and decisions; specifying the scope of payoff functions; and
constructing unbiased decision counterfactuals.”

~~~
apathy
They seem to be fixated on how shiny and new L1 penalties are, in 2014. Greg
Ridgeway started using gradient boosting machines (GBM) for propensity
scoring in the early 2000s, and I didn't see them cite him, so I kind of hate
them already. On the other hand, at least GBM works well.

I'm no economist, though. Perhaps this is novel at NBER. It's just odd to see
someone acting like using an ensemble to enable data-driven model selection is
something new.

nb. I didn't read the entire 76-page paper (partly because it's obscenely
verbose); I gave it a quick skim, and these are my from-the-hip remarks. If
they suck, I'll refund every cent you paid me ;-)

------
hollerith
It makes me feel sorry for the researchers that Harvard puts ads at the bottom
of their web page.

------
jwtadvice
I read through about 10 pages of the first book ("without models"). It struck
me as very nearly identical to current statistics practice. It clearly
differentiated itself as discussing counterfactuals (the data needed to
actually determine causality) but I could not find the section of the book
that described how counterfactual data can be inferred from missing data
(without it being "turtles all the way down").

Does anyone in this area have a succinct way to explain how counterfactual
data can be inferred by these techniques - and how traditional statistics
practice is not able to perform this inference?

~~~
apathy
Part II describes what you are asking for.

I wish abbreviations like inverse probability weighting and marginal
structural models were expanded in the second book. It's annoying to have to
look up "IP weighting" only to discover "oh, it's IPW, god damn you [authors]
to hell".

MSMs are interesting. Now I remember why Robins' name stuck in my head. The
book shows all the math explicitly, which is nice, and it delves into causal
inference for time-to-event data, which is also nice.
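
For readers who hit the same abbreviation wall: IP weighting reweights each
subject by the inverse of the probability of the treatment they actually
received, creating a pseudo-population in which treatment is unconfounded;
the marginal structural model is then fit in that pseudo-population. A toy
sketch with the true propensity known - illustrative only, not the book's
code:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50000
x = rng.normal(size=n)                        # confounder
ps = 1.0 / (1.0 + np.exp(-1.5 * x))           # P(T=1|X)
t = rng.binomial(1, ps)
y = 1.0 * t + 2.0 * x + rng.normal(size=n)    # true causal effect = 1

# Inverse probability weights: 1 / P(T = observed treatment | X)
w = t / ps + (1 - t) / (1 - ps)

# Weighted (Hajek) means estimate the saturated MSM E[Y^t] = b0 + b1 * t
ey1 = np.sum(w * t * y) / np.sum(w * t)
ey0 = np.sum(w * (1 - t) * y) / np.sum(w * (1 - t))
ipw_effect = ey1 - ey0                        # deconfounded, ~1
naive = y[t == 1].mean() - y[t == 0].mean()   # biased upward by confounding
```

In practice the propensity is estimated rather than known, weights are
stabilized, and near-zero propensities blow up the variance - which is where
most of the real-world grief lives.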

Now I'm curious whether they look at piecewise constant survival models for
time-varying coefficients. It's mentioned, but I didn't read enough to see if
it's treated in detail. If it is, everyone who does A/B testing should read
the book, because this is one of those "little details" from biostatistics
that becomes super important at big retailers (like, say, Amazon, where the
principal economist at the time pointed it out to me).
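
On piecewise constant survival models: within each pre-specified interval
the hazard MLE is simply events divided by person-time, which is what makes
the approach so handy for time-varying effects. A toy sketch on uncensored
simulated data - illustrative only, not from the book:

```python
import numpy as np

def piecewise_hazard(times, events, cuts):
    """MLE of a piecewise-constant hazard: within each interval defined by
    `cuts`, the estimate is (number of events) / (total person-time)."""
    edges = np.concatenate(([0.0], np.asarray(cuts, float), [np.inf]))
    rates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        exposure = np.clip(times, lo, hi) - lo  # person-time spent in [lo, hi)
        d = np.sum(events & (times >= lo) & (times < hi))
        rates.append(d / exposure.sum())
    return np.array(rates)

rng = np.random.default_rng(3)
times = rng.exponential(scale=2.0, size=100000)  # constant true hazard 0.5
events = np.ones_like(times, dtype=bool)         # no censoring
rates = piecewise_hazard(times, events, cuts=[1.0, 2.0])
```

With a constant true hazard all three interval estimates land near 0.5; in
an A/B test you'd instead expect the early intervals to differ from the late
ones, which is exactly the "little detail" a single pooled rate hides.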

~~~
apathy
Answered my own question -- Part III of this book is what I am waiting for. I
was going to send the authors a note, which seems silly, but my name is in an
awful lot of standard textbooks just because I sent in corrections or notes,
and for some reason I find that satisfying. Like I made some sort of a
difference to some students somewhere.

~~~
diego898
Sorry if I missed something - where do you see a part III?

~~~
apathy
It's mentioned at the end of Part II, in chapter 17, on survival models.

------
fulafel
Haven't read the book, but the concept of causal inference from event data
really deserves more attention. Automatic/assisted cause analysis in complex
systems has huge potential.

~~~
onuralp
Do you have any particular example or domain in mind?

Regarding the automatic analysis, the Automatic Statistician (Tenenbaum &
Ghahramani et al.), which is supposed to automate / assist exploratory
analysis, comes to mind. Not sure if it'd be fair, but I tend to lump some
probabilistic programming platforms - such as BayesDB - into the same bucket.
While these are exploratory tools, they are ideally (pragmatically) meant to
help identify associations and contexts in which correlation is highly
suggestive of causation.

Another area of interest: one of the comments below mentioned adaptive
clinical trial design. I know that Bayesian trials are intensively implemented
in some cancer research centers such as MD Anderson (off-topic: curious to
what extent IBM Watson was used as part of this protocol). I would love to
know whether adaptive trials are increasingly popular or widely used in pharma
industry.

~~~
fulafel
I was thinking about debugging & failure modes, in distributed systems.
There's potential to automatically test hypotheses by starting systems from
known states.

------
onuralp
I have seen an earlier version of this book recommended here on HN, and
thought that some might be delighted to know that there is a revised version
available to download.

------
brians
Surely the title should be "causal"?

~~~
eanzenberg
Yes-- And don't call me Shirley!

