Hacker News new | past | comments | ask | show | jobs | submit login

I am at loss at what you want to say to me, but let me reiterate:

Any learning model by itself is a statistical model. Statistical models are never automatically causal models, albeit causal models are statistical models.

Several causal models can be observationally equivalent to a single statistical model, but the substantive (inferential) implications on doing "an intervention" on the DGP differ.

It is therefore not enough to validate and run on a model on data. Several causal models WILL validate on the same data, but their implications are drastically different. The data ALONE provides you no way to differentiate (we say, identify) the correct causal model without any further restrictions.

By extension, it is impossible for any ML mechanism to predict unobserved interventions without being a causal model.

ML and AI models CAN be causal models, which is the case if they are based on further assumptions about the DGP. For example, they may be graphical models, SCM/SEM etc. These restrictions can be derived algorithmically, based on all sorts of data, tuning, coding and whatever. It really doesn't change the distinction between causal and statistical analysis.

The way these models become causal is based on assumptions that constitute a theory in the scientific sense. These theories can then of course also be validated. But this is not based on learning from historical data alone. You always have to impose sufficient restrictions on your model (e.g. the DGP) to make such causal inference.

This is not new, but for your benefit, I basically transferred the above from an AI/ML book on causal analysis.

AI/ML can do causal analysis, because its statistics. AI/ML are not separate from these issues, do not solve these issues ex-ante, are not "better" than other techniques except on the dimensions that they are better as statistical techniques, AND, most importantly, causal application necessarily implies a theory.

Whether this is implicit or explicit is up to the researcher, but there are dangers associated with implicit causal reasoning.

And as Pearl wrote (who is not a fan of econometrics by any means!), the issue of causal inference was FIRST raised by econometricians BASED on combining the structure of economic models with statistical inference. In the 1940's.

I mean I get the appeal to trash talk social sciences, but when it comes to causal inference, you probably picked exactly the wrong one.

You are free to disregard economic theory. But you can not claim to do causal analysis without any theory. Doing so implicitly is dangerous. Furthermore, you are wrong in the sense that economic theory has put causal inference issues at the forefront of econometric research, and is therefore good for science even if you dislike those theories.




And by the way, I can come up with a good number of (drastic, hypothetical) policy interventions that would break your inference about a market crash - an inference you only were able to make once you saw such a market crash at least once.

If this dependence is broken, your non-causal model will no longer work, because the relationship between yield curve and market crash is not a physical constant fact. What you did to make it a causal inference is implicitly assume a theory about how markets work (e.g. - as they do right now -) and that it will stay this way. Actually, you did a lot more, but that's enough.

Now, you and me, we can both agree that your model with yield curves is good enough. We could even agree that you would have found it before the financial crashes, and are a billionaire. But the commonality we agree upon is a context that defines a theory.

Some alien that has been analyzing financial systems all across the universe may disagree, saying that your statistical model is in fact highly sensitive to Earth's political, societal and natural context.

Such is the difficulty of causal analysis.


> By extension, it is impossible for any ML mechanism to predict unobserved interventions without being a causal model.

In lieu of a causal model, when I ask an economist what they think is going to happen and they aren't aware of any historical data - there is no observational data collected following the given combination of variables we'd call an event or an intervention - is it causal inference that they're doing in their head? (With their NN)

> Now, you and me, we can both agree that your model with yield curves is good enough.

Yield curves alone are insufficient due to the rate of false positives. (See: ROC curves for model evalutation just like everyone else)

> We could even agree that you would have found it before the financial crashes,

The given signal was disregarded as a false positive by the appointed individuals at the time; why?

> Some alien that has been analyzing financial systems all across the universe may disagree,

You're going to run out of clean water and energy, and people will be willing to pay for unhealthy sugar water and energy-inefficient transaction networks with a perception of greater security.

That we need Martian scientist as an approach is, IMHO, necessary because of our learned biases; where we've inferred relations that have been reinforced which cloud our assessment of new and novel solutions.

> Such is the difficulty of causal analysis.

What a helpful discussion. Thanks for explaining all of this to me.

Now, I need to go write my own definitions for counterfactual and DGP and include graphical models in there somewhere.


A further hint, here is a great book about causal analysis from a ML/AI perspective

https://mitpress.mit.edu/books/elements-causal-inference

I feel like you will benefit from reading this!

It's free!


> In lieu of a causal model, when I ask an economist what they think is going to happen and they aren't aware of any historical data - there is no observational data collected following the given combination of variables we'd call an event or an intervention - is it causal inference that they're doing in their head? (With their NN)

It's up for debate if NN's represent what is going on in our heads. But let's for a moment assume it is so.

Then indeed, an economist leverages a big set of data and assumptions about causal connections to speculate how this intervention would change the DGP (the modules in the causal model) and therefore how the result would change.

An AI could potentially do the same (if that is really what we humans do), but so far, we certainly lack the ability to program such a general AI. The reason is, in part, because we have difficulty creating causal AI models even for specialized problems. In that sense, humans are much more sophisticated right now.

It is important to note that such a hypothetical AI would create a theory, based on all sorts of data, analogies, prior research and so forth, just like economists do.

It does not really matter if a scientist, or an AI, does the theorizing. The distinction is between causal and non-causal analysis.

The value of formal theory is to lay down assumptions and tautological statements that leave no doubt about what the theory is. If we see that the theory is wrong, because we disagree on the assumptions, this is actually very good and speaks for the theory. Lot's of social sciences is plagued by "general theories" that can never really shown to be false ex ante. And given that theories can never be empirically "proven", only validated in the statistical sense, this leads to a many parallel theories of doubtful value. Take a gander into sociology if you want to see this in action.

Secondly, and this is very important, is that we learn from models. This is not often recognized. What we learn from writing down models is how mechanics or modules interact. These interactions, highly logical, are USUALLY much less doubtful than the prior assumptions. For example, if price and revenues are equilibrium phenomena, we LEARN from the model that we CAN NOT estimate them with a standard regression model!

This is exactly what lead to causal analysis in this case, because earlier we would literally regress price on quantity or production on price etc. and be happy about it. But the results were often even in the entirely wrong direction!

Instead, looking at the theory, we understood the mechanical intricacies of the process we supposedly modeled, and saw that we estimated something completely different than what we interpreted. Causal analysis, among other things, tackles this issue by asking "what it is really that we estimate here?".




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: