For those who want more he has a more technical book "Causality: Models, Reasoning and Inference" which is also excellent.
Pearl is a legend in the field, who wrote the seminal "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference".
There is a school of thought that there is nothing new here. Pearl is very honest and open about who invented what and of the mistakes in his earlier work. While no one finding is entirely new here, the overall package adds up to a lot IMHO. And this after having intensively studied statistics and probability over many years. It really changed my approach to and understanding of causality. And most importantly it gave me reliable intuitions on the subject. After his books, things seem obvious to me that others struggle with.
Fact is, Pearl selectively picks literature concerned with causality and, in particular, literature not successfully tackling the subject.
He does ignore many other approaches to the issue, especially parallel developments to tackle problems in fields that he specifically critiques. In other words: Pearl, next to being a great researcher, is also a showman who knows how to build a following.
The essence of the debate is this: neither Pearl's framework nor anyone else's captures all valid approaches to causal inference. One can construct cases in Rubin's framework that DAGs cannot solve, and vice versa. The downside to Pearl's approach is that it is, right now, more difficult to implement. The cases where DAGs undoubtedly succeed better than other approaches are, in a sense, unlikely to succeed as practical research projects.
That being said, a great strength of such graphical models is that they allow quite sophisticated reasoning in several well-known simple but non-intuitive cases. Such reasoning otherwise requires an immense amount of experience and / or education on the pitfalls of causal inference. That is also a reason why I would like to see this framework taught more in schools.
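To make one of those simple but non-intuitive cases concrete, here is a minimal sketch of collider bias ("explaining away"). The setup is my own toy assumption, not something from the book: X and Y are independent fair coins, and C = X or Y is their common effect. Everything is computed exactly from the joint table, so no sampling is involved.

```python
from itertools import product

# Toy assumption: X and Y are independent fair coins, C = X or Y (a collider).
joint = {}
for x, y in product([0, 1], repeat=2):
    joint[(x, y, int(x or y))] = 0.25

def prob(event, given):
    """Exact conditional probability P(event | given) from the joint table."""
    num = sum(p for k, p in joint.items() if event(*k) and given(*k))
    den = sum(p for k, p in joint.items() if given(*k))
    return num / den

def y_is_1(x, y, c):
    return y == 1

# Without conditioning on C, X tells us nothing about Y.
p_y_x1 = prob(y_is_1, lambda x, y, c: x == 1)
p_y_x0 = prob(y_is_1, lambda x, y, c: x == 0)

# Conditioning on the collider C = 1 makes X and Y dependent.
p_y_x1_c1 = prob(y_is_1, lambda x, y, c: x == 1 and c == 1)
p_y_x0_c1 = prob(y_is_1, lambda x, y, c: x == 0 and c == 1)

print(p_y_x1, p_y_x0)        # equal: marginally independent
print(p_y_x1_c1, p_y_x0_c1)  # different: dependent given C = 1
```

In the graph X -> C <- Y this is read off directly: the path between X and Y is blocked until you condition on C. Without the picture, you have to already know to look for it.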
All in all, as another great post in this thread has pointed out, much of the debate is in violent agreement on base issues. Once this issue transcends the egos involved, much progress will be made and that is, in my view, very exciting.
Another thing might be that in Rubin's framework it's immediately straightforward to do semi-parametric estimation and get consistency and all that. I'd say in practice that's probably not the first thing to do for DAGs, where writings are focused on toy models (the question then would be: how do I get to the correct DAG?).
Edit: This was posted in this thread; it has some examples from Rubin's framework (potential outcomes) that cannot be identified in the DAG framework.
I fall heavily on the Pearl side of this debate, because I'm interested in scaling to thousands to millions of variables, and though graphs are not a great tool, they are pretty much the only one that I think we have for scaling that direction.
That said, I think their disagreements don't have much effect on practice, and have as much to do with their different academic lineages as anything else. I expect that any dispute will be resolvable as the science progresses.
Are you able to elaborate? It'd be really interesting to hear whether tooling exists for these graphs, and what it is. I just found the book recently and am listening to the Audible version. It also reaffirms my desire for a "Bayesian inference" spreadsheet-like tool: something that'd help organize a few dozen thoughts and ideas for researchers/engineers.
One of the problems in groups is that there is a belief that the loudest and most self-confident individual is correct. I believe that is the case with Pearl. He attracts a following based on his personality, but when it comes to actually doing empirical research, he comes up short.
"Separate from the theoretical merits of the two approaches, another reason for the lack of adoption in economics is that the DAG literature has not shown much evidence of the alleged benefits for empirical practice in settings that resonate with economists.... In contrast in the DAG literature, TBOW, [Pearl, 2000], and [Peters, Janzing, and Schölkopf, 2017] have no substantive empirical examples, focusing largely on identification questions in what TBOW refers to as “toy” models. Compare the lack of impact of the DAG literature in economics with the recent embrace of regression discontinuity designs imported from the psychology literature, or with the current rapid spread of the machine learning methods from computer science, or the recent quick adoption of synthetic control methods developed in economics [Abadie and Gardeazabal, 2003, Abadie, Diamond, and Hainmueller, 2010]. All three came with multiple concrete and detailed examples that highlighted their benefits over traditional methods. In the absence of such concrete examples the toy models in the DAG literature sometimes appear to be a set of solutions in search of problems, rather than a set of clever solutions for substantive problems previously posed in social sciences, bringing to mind the discussion of Leamer on the Tobit model ([Leamer, 1997])."
it seems very plausible to me that at the time, many people outside CS would have considered a graph to be a type of illustration not a "legitimate" mathematical structure, and wouldn't have understood or cared about computational complexity arguments about performing inference or computing independence.
it does seem to be the case that for various reasons, for economics and population-level social science, there's limited advantage to using DAGs.
I have also noticed as a causal inference outsider in machine learning land, people use DAGs or SWIGs to write down assumptions, and then translate to potential outcomes to derive estimators. And for people who do automated causal discovery, I haven't heard of anyone trying to use a potential-outcomes-based representation for their system.
so I buy Pearl's argument that DAGs are the "right" data structure for representing causality, both for humans and computers.
But many social scientists may not have any reason to care about this, because automated causal discovery is hopeless for their applications, and for methodological reasons any reasonable model has to be such that representing things directly as potential outcomes is manageable anyway.
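For anyone who hasn't seen the "write the assumptions as a DAG, then derive an estimator" workflow, here is a rough sketch under stated assumptions. The graph (Z -> X, Z -> Y, X -> Y, all binary) and every probability below are invented for illustration. With that graph, Z satisfies the backdoor criterion, so P(Y | do(X=x)) = sum over z of P(Y | X=x, Z=z) P(Z=z).

```python
# Hypothetical toy model (all numbers invented for illustration).
# Assumed DAG: Z -> X, Z -> Y, X -> Y, all variables binary.
P_Z1 = 0.5
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (1, 0): 0.2, (0, 1): 0.6, (1, 1): 0.7}

def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1

def p_x_given_z(x, z):
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1 - p1

def p_y1_given_x(x):
    """Observational P(Y=1 | X=x): Z ends up weighted by P(Z | X=x)."""
    num = sum(P_Y1_GIVEN_XZ[(x, z)] * p_x_given_z(x, z) * p_z(z) for z in (0, 1))
    den = sum(p_x_given_z(x, z) * p_z(z) for z in (0, 1))
    return num / den

def p_y1_do_x(x):
    """Backdoor adjustment P(Y=1 | do(X=x)): Z is weighted by P(Z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * p_z(z) for z in (0, 1))

naive = p_y1_given_x(1) - p_y1_given_x(0)
adjusted = p_y1_do_x(1) - p_y1_do_x(0)
print(f"naive difference:    {naive:.2f}")
print(f"adjusted difference: {adjusted:.2f}")
```

In this made-up model X raises P(Y=1) by 0.1 within each stratum of Z, and the adjusted difference recovers that, while the naive comparison is inflated by the confounder. With only one covariate the potential-outcomes version is just as easy to write down, which is roughly the point being made above about social science models.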
A tangent: on the one hand, I do think DAGs are the best thing since sliced bread, and I wish more people would embrace them and add them to their toolbox as a fundamental modelling tool. On the other hand, the next step is the realization that DAGs aren't sufficient. Lack of cycles is nice for analysis and implementation, but it's also a drawback. The world is running on feedback loops, and yet this seems to be a secret restricted only to specialists deep inside their respective fields. I wish the public were more accustomed to working with dynamic models.
(I don't think cyclic graphs help much with causality analysis, as presumably cycles would imply time travel.)
the natural thing to do is to just unroll the models in time. the classic case of this is a hidden Markov Model, but you can easily have a more complicated time-indexed DAG.
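A minimal sketch of what unrolling in time looks like, assuming every edge acts with a one-step lag. The price/demand loop and the function names are hypothetical, just for illustration. It uses the standard library's graphlib (Python 3.9+) to check acyclicity.

```python
from graphlib import CycleError, TopologicalSorter

def unroll(edges, steps):
    """Map each lagged edge u -> v to (u, t) -> (v, t + 1) for t in 0..steps-2.

    Returns a dict from each time-indexed node to its set of parents.
    """
    nodes = {n for edge in edges for n in edge}
    parents = {(n, t): set() for n in nodes for t in range(steps)}
    for t in range(steps - 1):
        for u, v in edges:
            parents[(v, t + 1)].add((u, t))
    return parents

def is_acyclic(parents):
    """True if the graph (node -> set of parents) has a topological order."""
    try:
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False

# Hypothetical feedback loop: price affects demand, demand affects price.
feedback = [("price", "demand"), ("demand", "price")]
static = {"price": {"demand"}, "demand": {"price"}}

unrolled = unroll(feedback, steps=3)
print(is_acyclic(static))    # the loop as drawn is cyclic
print(is_acyclic(unrolled))  # the time-indexed version is a DAG
print(unrolled[("price", 2)])
```

The cycle disappears because every edge now points strictly forward in time, which also answers the time travel worry. The cost is that the graph grows with the number of time steps, and you have to commit to a lag structure.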
There are also some interesting connections to reinforcement learning (MDP can be expressed as DAG e.g. https://www.microsoft.com/en-us/research/wp-content/uploads/..., and then many RL algorithms are equivalent to causal inference estimators).
it seems like people deal with equilibria and feedback loops in a pretty ad-hoc way though.
From what I have heard, there is more adoption of Pearl's methods in Epidemiology and in Social Sciences, in contrast with Econometrics.
It reminds me of Freeman Dyson's "Birds and Frogs". For most statisticians and economists, regression is their go-to tool, and the potential outcomes framework is much more natural to them. You could say the same about the synthetic control methods you mentioned.
For example, contrast symbolic processing with artificial neural nets. Symbolic processing has a very solid philosophical basis and it can be used to solve many meaningful problems. Some problems, however, are so complex or nuanced that there are insufficient computing resources to implement a solution based on symbolic processing. Artificial neural networks can be used to address those complex problems, yet we lack the theory (at this time) to really understand the full limits or capabilities of complex artificial neural networks.
My reference to Wolfram was about his proposal to cast everything as automata. Even though that does not seem practical for all or even most problems, it provides a certain comfort, much like lambda calculus provides. Beyond the comfort of a solid philosophical grounding, automata also give us a way to approach many problems (such as simulation) in a principled manner.
Put another way, some people like to understand first, and use that understanding to discover results. Other people want the results first, and then seek to understand based on the results they found.
I notice a lot of academic papers have a particular 'feel' to them, using a subset of typography styles and presentational guidelines, especially where mathematical formulae are presented to the reader.
Note: I do not have an especially academic background, nor any experience in university (tertiary) education. Please excuse my unintended ignorance on this subject. Thank you.
See here: https://en.m.wikipedia.org/wiki/LaTeX
The book makes a very clear case in walking the layperson through various causal techniques and underlining how they differ from traditional methods. Few popular science books actually go into such detail.
The classic example is the inability to tell why the sun will rise tomorrow merely from observations of it doing so and any kind of statistics.
The "why" is in understanding the nature of the process we call the Sun, and realizing that such a process cannot be stopped or even changed in 24 hours.
Another modern illustration of the same principle is the in-principle inability to infer the actual wiring of a processor from the level of the code it executes.
This, by the way, is the very same Upanishadic principle of inability to infer the true nature of Brahman from the level of human intellect, conditioned by a language and experience.
Some things will remain only guesses, models, and "scientific" (or rather sectarian) consensus.
Aristotle holds that there are four kinds of answers to "why" questions:
Matter (the material cause of a change or movement): The aspect of the change or movement that is determined by the material that composes the moving or changing things. For a table, such might be wood; for a statue, such might be bronze or marble.
Form (the formal cause of a change or movement): A change or movement caused by the arrangement, shape, or appearance of the thing changing or moving. Aristotle says, for example, that the ratio 2:1, and number in general, is the cause of the octave.
Agent (the efficient or moving cause of a change or movement): Consists of things apart from the thing being changed or moved, which interact so as to be an agency of the change or movement. For example, the efficient cause of a table is a carpenter, or a person working as one, and according to Aristotle the efficient cause of a boy is a father.
End or purpose (the final cause of a change or movement): A change or movement for the sake of a thing to be what it is. For a seed, it might be an adult plant; for a sailboat, it might be sailing; for a ball at the top of a ramp, it might be coming to rest at the bottom.