A Philosopher Reviews Judea Pearl's “The Book of Why”

epistasis · on Sept 5, 2019

I like the review, but the criticism of distinguishing causality from counterfactual reasoning feels weak to me. Do we actually care about the counterfactual reasoning most? Of course. But establishing causality as its own thing before counterfactuals is necessary in the way that Pearl has structured his math. And even the grammar of human languages enforces this separation of concepts. Do you need causality to even have the concept of counterfactual? Of course. Is there value in describing that causality before going on to counterfactuals? Yes I think so, because of the difficulties of current counterfactual algorithms.

Separating out causality opens the door to doing counterfactual reasoning in better ways. Since whatever method is in the Book of Why confused the reviewer, it's probably not the best way to look at counterfactuals, so the causality bits can be taken without establishing the counterfactual methods as the be-all end-all.

For that matter, DAGs versus cyclic directed graphs or time-based DAGS is still a big concern too. Many if not most of our causal reasoning will have loops, when time is not accounted for, and DAG formulations make this difficult to unroll. There may be big improvements in causal modeling, without considering the counterfactuals even.

Also, in Pearl's prior 2000 book on Causality, it's clear that Pearl gave plenty of credit to Spirtes, and it always seemed that they were working in parallel and on very similar problems; I'm not sure how much Spirtes took from Pearl, but Pearl makes clear that his ideas are heavily informed by Spirtes' work.

empath75 · on Sept 5, 2019

I am not sure how to define "cause" without introducing counterfactuals.

"This happened because I did that" seems to imply "Had I not done that, this would not have happened" -- a counterfactual.

entropicdrifter · on Sept 5, 2019

I'm not sure that's always the case.

Couldn't you have a scenario where your action was the direct cause of an outcome that might have come about some other way without your involvement at all?

For instance, if a ball rolls down a hill because you pushed it, but a breeze blows a moment after you pushed the ball, if that breeze was strong enough to push the ball down the hill too, then you have a scenario where "that ball went down the hill because I pushed it" is true, but "had I not pushed the ball, it would not have rolled down this hill" is not true.

hyperpape · on Sept 5, 2019

You're 100% correct.

The point isn't just about causality. As an undergraduate, one of my philosophy professors (Bill Lycan) told a class "never attempt to define anything in terms of counterfactuals. No matter what it is, there will be decisive counterexamples."

To be super-clear, this is not a statement about Pearl, and it's not denying that counterfactuals are interesting, just a claim that you can't define much of anything in terms of them.

P.S. He actually said "analyze", but the way philosophers use that term, it's appropriate to make the substitution to avoid confusion.

skybrian · on Sept 5, 2019

Yes, what you're describing is basically an "or" gate. The counterfactual is "what if neither cause happened?"

shkkmo · on Sept 5, 2019

right, but the potential existence of multiple potential causes (an "or" gate) is what makes this reasoning a fallacy:

> "This happened because I did that" seems to imply "Had I not done that, this would not have happened" -- a counterfactual.

The point is that constructing the counterfactual requires a more than just knowledge of a single causal link, but a broader set of causal knowledge about many causal links.

Language is messy and ambiguous and we do often colloquially use "X caused Y" to imply the truth of the counter factual "If X had not happened Y would not have happened". Interestingly, a different tense such as "X causes Y" generally does not have the same counter factual implication.

ssivark · on Sept 6, 2019

That seems like an unintentional straw man. The actual counterfactual is: "had I not pushed the ball, it would not have rolled down this hill [at the moment it did]". That says nothing about whether some other cause could not push the ball over at a later instant. Equivalently (if you don't like separating out the causes in time), if formulated as an OR operation on a bunch of causes, the counter factual would be: "Had I not pushed the ball, either the ball would not have rolled down the hill, or there would have been another cause for pushing it down the hill."

epistasis · on Sept 5, 2019

Maybe this is a problem with terminology then; because these definitions are clearly distinct and have entirely separate methods when you formalize them in math.

Perhaps we should call Pearl's Rung 3 counterfactuals "subjunctive" or something entirely else to distinguish the concepts and methods of Rung 2 and Rung 3. But these rungs are extremely different in formulations and models, and are clearly distinct. If you want to call rung 2 "simple counterfactuals," well I'm not super into that sort of debate as long as we all agree to the meaning of the terms. And if philosophy finds the distinction between Pearl's Rung 2 and Rung 3 difficult to accept, then it may also mean that philosophy has not yet discovered what Pearl has formulated.

longtom · on Sept 6, 2019

No, these two statements are not equivalent. Perhaps what you have in mind is the contrapositive "This didn't happen, so I didn't do it". More formally, A -> B is equivalent to -B -> -A, but it is not equivalent to -A -> -B.

rocqua · on Sept 5, 2019

The DAG, directed graph, or time based DAG question is quite interesting. It seems to me though that time is so interwoven with causality that it is superfluous to include it. Hence, at a glance, I would argue for normal Directed graphs (allowing cycles) with the implicit understanding that following edges means time increases

mannykannot · on Sept 5, 2019

I am not sure Maudlin's example of counterfactual causal reasoning is counterfactual -- though, for all I know, it might be rung-three causal reasoning by Pearl's definition. All the subject seems to be doing is searching the tree of possible outcomes in order to select the most desirable one. She could only say “had I let go of the lamp it would have shattered” after having chosen an alternative course of action and then done it, and by then, of course, it can have no causal role in the decision-making.

mark_l_watson · on Sept 5, 2019

I have only read part of this book, so far.

I think counter factual reasoning or thought experiments are fundamental to the way we humans think about the world.

epistasis · on Sept 5, 2019

As I understand it, that counterfactual reasoning is fundamental to human thought seems to be one of the main motivating factors of Pearl's entire causality push, and precisely why it's not OK to ignore causality in science and statistics and machine learning.

ngcc_hk · on Sept 6, 2019

The key is between hume rejection and data only, there are spectrum of links. Double bind rct ...

adolph · on Sept 5, 2019

See also a review by Andrew Gelman:

https://statmodeling.stat.columbia.edu/2019/01/08/book-pearl...

stewbrew · on Sept 5, 2019

Well, his a priori is rather skewed, though.

ml_basics · on Sept 5, 2019

I see what you did there

enjoylife · on Sept 6, 2019

> “but given my own background I could not but wonder how much farther Pearl would have gotten had he had the training I did as a philosopher.”

The review had some good call outs but given the above quote and a few surrounding criticisms, this review is more pretentious than anything.

Luc · on Sept 6, 2019

The reviewer is Tim Maudlin, one of the foremost philosophers of physics. It's kind of hard to find someone better placed than him to review a book about causality, and his comment seems entirely appropriate.

ncmncm · on Sept 6, 2019

This presents the RCT criterion uncritically, albeit citing a case where it would be hard to apply.

But RCT fails -- gives a nonsense result -- when the hypothesis under test is incoherent. This wouldn't matter, except that RCT is routinely used in such circumstances, and the results treated as gospel by people in positions of authority.

Consider: Outcome X may have six causes A-F. RCT tests B, and finds that varying B only affects one in six cases. With infinitely many trials, the relationship resolves, but with one trial the difference is indistinguishable from noise.

Substitute a medical symptom for X, and a medical treatment that addresses one of six causes, for B. After one RCT, B is "shown" to be ineffective.

The problem is not B. The problem is that X is ill-defined. X could be a mental illness, or tumors in a given organ. How often do we read "anti-depressants shown ineffective"? The only way we have to distinguish one variety of depression from the next is which treatment works.

The problem is not limited to medicine.

fluentmundo · on Sept 6, 2019

A nice point. But this is not a criticism of the RCT so much as a criticism of how its results are used or interpreted.

ncmncm · on Sept 8, 2019

It is a criticism of the notion of RCT as the unimpeachable "gold standard" of evidence. The limits of a tool are the most important thing to learn about it, and, for RCT, few can be bothered.

fluentmundo · on Sept 10, 2019

But it's not a limitation of the tool (the tool does provide gold-standard evidence); it's the stupidity of the researcher using the tool. (The tool works perfectly well and does, in fact, constitute the gold standard of causal evidence.) In your depression example, it's the stupidity of a researcher who fails to consider why a 1-in-6 success might be significant, or fails to consider that the umbrella disease "depression" can be multiply realized by different physiological mechanisms, which in turn require different treatments.

Your beef isn't with RCTs, or with the notion of RCT as evidence. Your beef is with researchers who don't know how to think.

Your complaint is like saying, we should stop considering hammers to be the gold-standard of nail-hitters because some idiots use them to try to turn screws.

ncmncm · on Sept 11, 2019

My beef is with the public and policymakers convinced they have no need to understand the method's limitations, "because it's the gold standard", thus never misleading.

coldtea · on Sept 5, 2019

>Well, there are some caveats even here. The real gold standard is a double-blind experiment, in which neither the subjects nor the experimenters know who is in which group. In the case of car color, we would literally have to blind the drivers, which would of course raise the accident rate considerably.

Not really. It's just enough that we don't tell the red car drivers that they're part of a special group. Just let them think we have equal number of different colors assigned to different drivers (and don't let them see what the others got). Then, the fact that their assigned color happens to be red will hold no significance to them related to the test then...

tfowler · on Sept 6, 2019

What if being assigned a red car caused drivers to drive more carefully because they believe that red cars attract the attention of law enforcement more so than other colors?

anthony_doan · on Sept 6, 2019

> Just let them think we have equal number of different colors assigned to different drivers (and don't let them see what the others got).

They, the drivers, shouldn't know anything about the experimental design in the first place. Let alone equal number of different colors.

mistermann · on Sept 5, 2019

Recent appearance on Sam Harris podcast, I quite enjoyed it....

https://samharris.org/podcasts/164-cause-effect/

#164 - Cause & Effect - A Conversation with Judea Pearl

August 5, 2019

In this episode of the Making Sense podcast, Sam Harris speaks with Judea Pearl about his work on the mathematics of causality and artificial intelligence. They discuss how science has generally failed to understand causation, different levels of causal inference, counterfactuals, the foundations of knowledge, the nature of possibility, the illusion of free will, artificial intelligence, the nature of consciousness, and other topics.

Judea Pearl is a computer scientist and philosopher, known for his work in AI and the development of Bayesian networks, as well as his theory of causal and counterfactual inference. He is a professor of computer science and statistics and director of the Cognitive Systems Laboratory at UCLA. In 2011, he was awarded with the Turing Award, the highest distinction in computer science. He is the author of The Book of Why: The New Science of Cause and Effect (coauthored with Dana Mackenzie) among other titles.

Twitter: @yudapearl

tu7001 · on Sept 5, 2019

I would strongly recommended this podcast.

terminlvelocity · on Sept 6, 2019

I really enjoyed hearing Judea Pearl being interviewed, as I am most of the way through "The Book of Why" and have learned a lot from it. I did feel that Sam steered the conversation a bit too much towards his favorite topics (like free will) and wish there was a bit more discussion of philosophy/history of science, but it was still a great listen.

I first learned of Judea Pearl by stumbling across the transcript of a talk he gave while I was researching DAGs: http://singapore.cs.ucla.edu/LECTURE/lecture_sec1.htm . The way he grounded his talk in the history of thought hooked me, and the talk serves as a good general overview for those deciding if they want to pick up the book.

anthony_doan · on Sept 6, 2019

Post is sweet.

It point out the cliche, "correlation is not equal to causation."

It is a real thing but throwing that quote around without reading into the research or having proper foundation to assess a conducted research is just as bad.

guerrilla · on Sept 6, 2019

Yes, indeed, its also interesting to notice that people struggle when one asks "Why isn't it?" One often gets strange statistical answers that don't really get to the root of anything.

conjectures · on Sept 6, 2019

I mostly agree with Pearl.

However, the account of traditional statistics given in this article is misleading. Randomisation is a major part of traditional stats, and it is inherently a causal hypothesis: breaking the links between unobserved covariates and treatment regimes.

An important alternate contemporary causal inference framework by Rubin has origins in a 1923 thesis...

...but the content of Pearl's approach seems superior; if you ignore the academic spats.

fluentmundo · on Sept 6, 2019

>> Randomisation is a major part of traditional stats, and it is inherently a causal hypothesis

Yes, randomization is central to classical statistics, but no, it is not inherently causal. Drawing a random sample from a bivariate distribution (X,Y) is key to doing a lot (though not all) of classical statistical inference (think of estimating slopes in regression), but the randomization does not imply anything about the causal relationship between X and Y. When you speak of randomization in the context of "treatment regimes," you are thinking about randomized controlled trials, which the piece does analyze explicitly, in some detail. So in this sense the account given in the essay is not misleading.

conjectures · on Sept 7, 2019

I'm afraid you're mistaken. Randomisation allows one to make the strong causal assumption that the treatment regime allocation is unrelated to any of the other variables, observed or unobserved.

Anyway, the section you're pointing to agrees with me. It just happens to be overlooked when they summarise...

fluentmundo · on Sept 17, 2019

No, I am not mistaken; you are confused. And your confusion is very pervasive in the technical community. You're not talking about randomization in general when you speak of a "treatment regime." You're talking about randomization in a causal experiment such as a randomized controlled trial. But "randomization" is a broader thing than randomization in a controlled experiment. The assumption of randomization is made for almost all classical statistical inference, which has nothing at all to do with causation.

Say you want to do basic linear regression: you want to estimate the slope for Y regressed on X. The most stringent form of inference works like this: you draw a random sample (X_i, Y_i), modeled as n independent and identically distributed realizations from the joint distribution (X,Y). Etc. This is certainly a stochastic model; we require randomization (or some approximation of it) to do inference. But it has nothing whatsoever to do with causality.

mistrial9 · on Sept 5, 2019

I had a chance to see Judea Pearl speak at a high-end mathematics conference a few years ago, and asked him personally afterwards about a few things in his talk.. I did not get much satisfaction in the admittedly brief exchange, and therefore unfortunately my impression was that there was some smoke-and-mirrors aspect to his talk, I would guess for some competitive reasons I don't know.. however, the company on stage at that event were of the highest rigor, so take this as you will...

kunkelast · on Sept 9, 2019

This is the content I like to see at HN most of all. Good article, thanks for sharing.

melling · on Sept 5, 2019

I’m 30 pages into the book. It hasn’t grabbed me yet.

Worth finishing?

terminlvelocity · on Sept 6, 2019

The writing style does not change much, and I do find that it's taking me longer than usual to work my way through the book, especially since I'm very interested in the subject matter. That being said, there are some good nuggets of information a little further in. Understanding some of the patters that pop up in the causal scenarios he lays out and being able to think about these with a shorthand or graphically has changed the way I think about complex situations.

ylem · on Sept 6, 2019

I read it in a beach in Greece. It really opened up some new thoughts to me that are relevant to some of the problems that I am thinking about in applying ML in science. It made me want to go back and read some of his more technical work. It's not the easiest of reading, but I definitely found it worthwhile!

ngcc_hk · on Sept 6, 2019

very good article. Highly recommended