The Book of Why: The New Science of Cause and Effect [pdf] (berkeley.edu)
192 points by elcritch 6 days ago | 38 comments

Great book that clearly explains the perils of inferring causation and how you can deal with them constructively.

For those who want more he has a more technical book "Causality: Models, Reasoning and Inference" which is also excellent.

Pearl is a legend in the field, who wrote the seminal "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference".

There is a school of thought that there is nothing new here. Pearl is very honest and open about who invented what and of the mistakes in his earlier work. While no one finding is entirely new here, the overall package adds up to a lot IMHO. And this after having intensively studied statistics and probability over many years. It really changed my approach to and understanding of causality. And most importantly it gave me reliable intuitions on the subject. After his books, things seem obvious to me that others struggle with.

I like Pearl's work a lot more than his opinions on who invented what, and who is right about what. For that reason, I would wholeheartedly recommend his books but I would also recommend skipping any sideshows about related literature.

Fact is, Pearl selectively picks literature concerned with causality and, in particular, literature not successfully tackling the subject. He does ignore many other approaches to the issue, especially parallel developments to tackle problems in fields that he specifically critiques. In other words: Pearl, next to being a great researcher, is also a showman who knows how to build a following.

The essence of the debate is this: Neither Pearl's framework nor anyone else's captures all valid approaches to causal inference. One can construct cases in Rubin's framework that DAG can not solve and vice-versa. The downside to Pearl's approach is that it is - right now - more difficult to implement. The cases where DAG undoubtedly succeeds better than other approaches are, in a sense, unlikely to succeed as practical research projects.

That being said, a great strength of such graphical models is that they allow quite sophisticated reasoning in several well-known simple but non-intuitive cases. Such reasoning otherwise requires an immense amount of experience and / or education on the pitfalls of causal inference. That is also a reason why I would like to see this framework taught more in schools.
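To make one of those simple but non-intuitive cases concrete, here is a minimal sketch of a Simpson's-paradox reversal and the backdoor adjustment that a DAG would prescribe. It assumes the graph Z -> X, Z -> Y, X -> Y, where Z is a confounder such as case severity. The counts are illustrative, and the names `counts`, `naive_rate`, and `adjusted_rate` are hypothetical helpers, not anything from the book.

```python
# counts[(treatment, z)] = (successes, total). Illustrative numbers only.
# z is the confounder stratum (0 = mild cases, 1 = severe cases).
counts = {
    ("A", 0): (81, 87),
    ("A", 1): (192, 263),
    ("B", 0): (234, 270),
    ("B", 1): (55, 80),
}
TREATMENTS = ("A", "B")
STRATA = (0, 1)


def naive_rate(treatment):
    """P(Y=1 | X=treatment): pools the strata and ignores the confounder."""
    successes = sum(counts[(treatment, z)][0] for z in STRATA)
    total = sum(counts[(treatment, z)][1] for z in STRATA)
    return successes / total


def adjusted_rate(treatment):
    """Backdoor adjustment: sum_z P(Y=1 | X=treatment, Z=z) * P(Z=z)."""
    population = sum(n for _, n in counts.values())
    rate = 0.0
    for z in STRATA:
        # Marginal weight of the stratum, taken over both treatments.
        p_z = sum(counts[(t, z)][1] for t in TREATMENTS) / population
        successes, n = counts[(treatment, z)]
        rate += (successes / n) * p_z
    return rate


for t in TREATMENTS:
    print(t, round(naive_rate(t), 3), round(adjusted_rate(t), 3))
```

With these counts the pooled comparison favors B, while the adjusted comparison favors A, which is also what each stratum shows on its own. The graph is what tells you that adjusting for Z is the right move; with a different graph (say, Z as a mediator or collider) the same arithmetic would be the wrong thing to do.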

All in all, as another great post in this thread has pointed out, much of the debate is in violent agreement on base issues. Once this issue transcends the egos involved, much progress will be made and that is, in my view, very exciting.

> One can construct cases in Rubin's framework that DAG can not solve and vice-versa


I'll try to dig it up, saw it a while ago on a forum.

Another thing might be that in Rubin's framework it's immediately straightforward to do semi-parametric estimation and get consistency and all that. I'd say in practice that's probably not the first thing to do for DAGs, where writings are focused on toy models (the question then would be: how do I get to the correct DAG?).

Edit: This was posted itt, it has some examples of Rubin's framework (potential outcomes) that can not be identified in the DAG framework https://arxiv.org/pdf/1907.07271.pdf

For balance and discussion, Andrew Gelman's critical review of 'The Book of Why': https://statmodeling.stat.columbia.edu/2019/01/08/book-pearl...

Thanks for sharing this! I knew that Gelman and Pearl have been going back and forth for years, and it's great to see this.

I fall heavily on the Pearl side of this debate, because I'm interested in scaling to thousands to millions of variables, and though graphs are not a great tool, they are pretty much the only one that I think we have for scaling that direction.

That said, I think their disagreements don't have much effect on practice, and have as much to do with their different academic lineages as anything else. I expect that any dispute will be resolvable as the science progresses.

> I fall heavily on the Pearl side of this debate, because I'm interested in scaling to thousands to millions of variables, and though graphs are not a great tool, they are pretty much the only one that I think we have for scaling that direction.

Are you able to elaborate?! It'd be really interesting to hear what/if tooling exists for these graphs. I just found the book recently and am listening to the Audible version. It also reaffirms my desire for a "Bayesian inference" spreadsheet like tool. Something that'd help organize a few dozen thoughts and ideas for researchers/engineers.

> a "Bayesian inference" spreadsheet like tool


R package "dagitty" is excellent.


I've used libDAI in the past, but I'm not sure if it's maintained anymore. I've come across several other toolkits that looked interesting since then, but never had the right project. There's a group at (CMU?)/Pitt that I think now has a Python package for causal inference in particular. (Causality is far more complicated and harder to discover than a lot of the stuff I work towards though, which is all in biology.)

Thousands and millions of variables approaches deep learning w/ softmax. It is difficult to see how it would work any other way. I mean would you want to have a causal model of particles?

Why would Pearl's approach be better at scaling? If anything, the Rubin causal model scales wonderfully with more variables.

This is a strange debate. They seem to be in violent agreement.

Balance is the wrong word. You might want to check first who AG's doctoral advisor was. Surprise, it was Rubin.

Recognizing that not many will care about my opinion on the subject, I will link to this excellent (and IMO accurate) paper by Guido Imbens:


One of the problems in groups is that there is a belief that the loudest and most self-confident individual is correct. I believe that is the case with Pearl. He attracts a following based on his personality, but when it comes to actually doing empirical research, he comes up short.

"Separate from the theoretical merits of the two approaches, another reason for the lack of adoption in economics is that the DAG literature has not shown much evidence of the alleged benefits for empirical practice in settings that resonate with economists. ... In contrast in the DAG literature, TBOW, [Pearl, 2000], and [Peters, Janzing, and Schölkopf, 2017] have no substantive empirical examples, focusing largely on identification questions in what TBOW refers to as “toy” models. Compare the lack of impact of the DAG literature in economics with the recent embrace of regression discontinuity designs imported from the psychology literature, or with the current rapid spread of the machine learning methods from computer science, or the recent quick adoption of synthetic control methods developed in economics [Abadie and Gardeazabal, 2003, Abadie, Diamond, and Hainmueller, 2010]. All three came with multiple concrete and detailed examples that highlighted their benefits over traditional methods. In the absence of such concrete examples the toy models in the DAG literature sometimes appear to be a set of solutions in search of problems, rather than a set of clever solutions for substantive problems previously posed in social sciences, bringing to mind the discussion of Leamer on the Tobit model ([Leamer, 1997])."

i get the sense that back in the 90s, statisticians must have been really rude and dismissive to Judea Pearl, and he might not have gotten over it.

it seems very plausible to me that at the time, many people outside CS would have considered a graph to be a type of illustration not a "legitimate" mathematical structure, and wouldn't have understood or cared about computational complexity arguments about performing inference or computing independence.

it does seem to be the case that for various reasons, for economics and population-level social science, there's limited advantage to using DAGs.

I have also noticed as a causal inference outsider in machine learning land, people use DAGs or SWIGs to write down assumptions, and then translate to potential outcomes to derive estimators. And for people who do automated causal discovery, I haven't heard of anyone trying to use a potential-outcomes-based representation for their system.

so i buy Pearl's argument that DAGs are the "right" data structure for representing causality, both for humans and computers.

But many social scientists may not have any reason to care about this, because automated causal discovery is hopeless for their applications, and for methodological reasons any reasonable model has to be such that representing things directly as potential outcomes is manageable anyway.
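As a rough illustration of what "computing independence" from a graph looks like in code, here is a minimal sketch of a d-separation check using the standard ancestral-moral-graph construction. The function name `is_d_separated` and the edge-list representation are assumptions made for this sketch, not the API of any particular causal inference package.

```python
from collections import deque
from itertools import combinations


def is_d_separated(edges, xs, ys, zs):
    """Return True if xs and ys are d-separated by zs in the DAG `edges`.

    `edges` is a list of (parent, child) pairs; xs, ys, zs are sets of nodes.
    """
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
        parents.setdefault(u, set())

    # 1. Restrict to xs, ys, zs and all of their ancestors.
    keep, stack = set(), list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # 2. Moralize: link each node to its parents and link co-parents,
    #    then treat every link as undirected.
    neighbors = {n: set() for n in keep}
    for child in keep:
        ps = parents.get(child, set())
        for p in ps:
            neighbors[child].add(p)
            neighbors[p].add(child)
        for a, b in combinations(ps, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    # 3. Remove the conditioning set and test whether xs can still reach ys.
    start = xs - zs
    seen, queue = set(start), deque(start)
    while queue:
        node = queue.popleft()
        if node in ys:
            return False
        for nxt in neighbors[node] - zs - seen:
            seen.add(nxt)
            queue.append(nxt)
    return True


collider = [("X", "C"), ("Y", "C")]   # X -> C <- Y
chain = [("X", "M"), ("M", "Y")]      # X -> M -> Y
print(is_d_separated(collider, {"X"}, {"Y"}, set()))   # independent marginally
print(is_d_separated(collider, {"X"}, {"Y"}, {"C"}))   # conditioning on a collider opens the path
print(is_d_separated(chain, {"X"}, {"Y"}, {"M"}))      # conditioning on a mediator blocks it
```

The point of the sketch is only that the question "are these variables independent given those?" becomes a plain graph-reachability problem, which is the kind of computation that is awkward to state directly in potential-outcomes notation.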

> it does seem to be the case that for various reasons, for economics and population-level social science, there's limited advantage to using DAGs.

A tangent: on the one hand, I do think DAGs are the best thing since sliced bread, and I wish more people would embrace them and add them to their toolbox as a fundamental modelling tool. On the other hand, the next step is the realization that DAGs aren't sufficient. Lack of cycles is nice for analysis and implementation, but it's also a drawback. The world runs on feedback loops, and yet this seems to be a secret restricted to specialists deep inside their respective fields. I wish the public were more accustomed to working with dynamic models.

(I don't think cyclic graphs help much with causality analysis, as presumably cycles would imply time travel.)

yeah, i've wondered about this too.

the natural thing to do is to just unroll the models in time. the classic case of this is a hidden Markov Model, but you can easily have a more complicated time-indexed DAG.
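A minimal sketch of that unrolling, assuming every edge in the feedback loop acts with a lag of exactly one time step (a modelling assumption, not something the graph itself tells you). The helper names `unroll` and `is_acyclic` are made up for this example; `graphlib` is in the Python standard library from 3.9 on.

```python
from graphlib import CycleError, TopologicalSorter


def unroll(edges, steps):
    """Turn each edge u -> v into (u, t) -> (v, t + 1) for t = 0 .. steps - 2."""
    return [((u, t), (v, t + 1)) for t in range(steps - 1) for (u, v) in edges]


def is_acyclic(edges):
    """True if the directed graph given as (source, target) pairs has no cycle."""
    predecessors = {}
    for u, v in edges:
        predecessors.setdefault(v, set()).add(u)
        predecessors.setdefault(u, set())
    try:
        # static_order() is lazy, so force it to trigger cycle detection.
        tuple(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False


# A two-variable feedback loop: not a DAG as written.
feedback = [("price", "demand"), ("demand", "price")]
unrolled = unroll(feedback, steps=3)

print(is_acyclic(feedback))   # the loop itself has a cycle
print(is_acyclic(unrolled))   # the time-indexed version does not
print(unrolled)
```

Every edge in the unrolled graph points strictly forward in time, so no cycle can form regardless of how tangled the original loop is. The cost is that the graph grows linearly with the number of time steps, and the one-step-lag assumption has to be defended separately.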

There are also some interesting connections to reinforcement learning (MDP can be expressed as DAG e.g. https://www.microsoft.com/en-us/research/wp-content/uploads/..., and then many RL algorithms are equivalent to causal inference estimators).

it seems like people deal with equilibria and feedback loops in a pretty ad-hoc way though.

That's a long paper and I am interested to read it.

From what I have heard, there is more adoption of Pearl's methods in Epidemiology and in Social Sciences, in contrast with Econometrics.

It reminds me of Freeman Dyson's "Birds and Frogs". For most statisticians and economists, regression is their go-to tool, and the potential outcomes framework is much more natural to them. You could say the same about the synthetic control methods you mentioned.

I find this dialog to be reminiscent of the controversy surrounding Wolfram's A New Kind of Science. Perhaps one could frame it as the theorists versus the pragmatists.

I disagree with that characterization of Pearl's work. Pearl's methods are actually already being used in many fields like Epidemiology.

I didn't mean to imply that theorists don't produce valuable work, or that theory does not necessarily lead to practice. The distinction I intended is that some people prefer to work off of an underlying model, while others are content so long as they get the results they desire.

For example, contrast symbolic processing with artificial neural nets. Symbolic processing has a very solid philosophical basis and it can be used to solve many meaningful problems. Some problems, however, are so complex or nuanced that there are insufficient computing resources to implement a solution based on symbolic processing. Artificial neural networks can be used to address those complex problems, yet we lack the theory (at this time) to really understand the full limits or capabilities of complex artificial neural networks.

My reference to Wolfram was about his proposal to cast everything as automata. Even though that does not seem practical for all or even most problems, it provides a certain comfort, much like lambda calculus provides. Beyond the comfort of a solid philosophical grounding, automata also give us a way to approach many problems (such as simulation) in a principled manner.

Put another way, some people like to understand first, and use that understanding to discover results. Other people want the results first, and then seek to understand based on the results they found.

If you’re interested in learning more about causality, you might be interested in my blog series working through Pearl's “Causal Inference in Statistics: A Primer”: https://github.com/DataForScience/Causality (Jupyter notebooks and links to blog posts)

Nice Causal Inference book from Hernan and Robins (pdf available for download): https://www.hsph.harvard.edu/miguel-hernan/causal-inference-...

Hernan demonstrates how to put this to work in epidemiology. Quite interesting. Pearl's work should be seen in context though as an effort to create reasoning machines (AI), i.e., in automating reasoning.

If you're interested in learning more about causal inference and 'do calculus', you might like this course and free textbook:


Meta question: what's the workflow for creating PDFs like this? Is the PDF generated from markup-style text with a certain application or process?

I notice a lot of academic papers have a particular 'feel' to them, using a subset of typography styles and presentational guidelines, especially where mathematical formulae are presented to the reader.

Note: I do not have an especially academic background, nor any experience in university (tertiary) education. Please excuse my unintended ignorance on this subject. Thank you.

This is probably done with LaTeX or one of its variants

See here: https://en.m.wikipedia.org/wiki/LaTeX

Most STEM papers are written with LaTeX; this book likely is too.

Thanks for sharing, this makes me want to move the book up my list of to-reads. As a side-note, I’m blown away to learn that Judea Pearl is the father of Daniel Pearl — can’t imagine the amount of composure it must take to continue to operate at the highest levels in his field after experiencing grief of that nature.

It's a great book; an enjoyable read even for someone like me who's mostly afraid of statistics. Pearl's enthusiasm for the topic is palpable, and he seems to have the academic pedigree to match the occasionally lofty claims.

The book makes a very clear case in walking the layperson through various causal techniques and underlining how they differ from traditional methods. Few popular science books actually go into such detail.

It's a good book, but some may be turned off by the first chapter or so that reads like he's trying to rub his victories into the faces of his academic adversaries. It had more than a little air of gloating to it, in my opinion, and the book would have been better without that. (Though he spent considerable time on the earlier precedents for his work, which I had previously thought was entirely novel.)

There are, again, principles from real philosophy, which state that there is and always will be a gap in principle between an observation (and language) based model and What Is.

The classic example is the inability to tell why the sun will rise tomorrow merely from observations of it doing so and any kind of statistics.

The "why" is in understanding the nature of the process we call the Sun and realizing that such a process cannot be stopped or even changed in 24 hours.

Another modern illustration of the same principle is the in-principle inability to infer the actual wiring of a processor from the level of the code it executes.

This, by the way, is the very same Upanishadic principle of inability to infer the true nature of Brahman from the level of human intellect, conditioned by a language and experience.

Some things will remain only guesses, models and "scientific", (or rather sectarian) consensus.

Just noting: paper (and book) came out in 2018. The book is well worth reading if you missed it then.

"Why" is not a query of causality, but rather purpose. The proper word would be "How".


Aristotle holds that there are four kinds of answers to "why" questions:

Matter (the material cause of a change or movement): The aspect of the change or movement that is determined by the material that composes the moving or changing things. For a table, such might be wood; for a statue, such might be bronze or marble.

Form (the formal cause of a change or movement): A change or movement caused by the arrangement, shape, or appearance of the thing changing or moving. Aristotle says, for example, that the ratio 2:1, and number in general, is the cause of the octave.

Agent (the efficient or moving cause of a change or movement): Consists of things apart from the thing being changed or moved, which interact so as to be an agency of the change or movement. For example, the efficient cause of a table is a carpenter, or a person working as one, and according to Aristotle the efficient cause of a boy is a father.

End or purpose (the final cause of a change or movement): A change or movement for the sake of a thing to be what it is. For a seed, it might be an adult plant; for a sailboat, it might be sailing; for a ball at the top of a ramp, it might be coming to rest at the bottom.

I think if you remove consciousness from the equation, identifying causality would be a reasonable response to why. However, there is an underlying assumption that causality actually exists. In that case, I think the existence of causality would have to be taken as axiomatic.
