Economists and social scientists try to do non-experimental causal inference. Maybe they're not good at it, maybe the very problem is unsolvable, but it's not because they don't know how Random Forests or RNNs work. Economists already know that students from single-parent families do worse at school than students from married families. If the problem is just to predict individual student results, the number of parents in the household is certainly a good predictor. The problem facing economists is: would encouraging marriage or discouraging divorce improve student results? Nothing in PyTorch or TensorFlow will help with the answer.
Medicine (and also social sciences) is indeed more complex; but classification and prediction are still the basis for making treatment recommendations, for example.
Still, the task really is the same. A NN (like those that Torch, Theano, TensorFlow, and PyTorch produce; now with the ONNX standard for neural network model interchange) learns complex relations and really doesn't care about causality: minimize the error term. Recent progress in reducing the size of NN models e.g. for offline natural language classification on mobile devices has centered around identifying redundant neuronal connections ("from 100GB to just 0.5GB"). Reversing a NN into a far less complex symbolic model (with variable names) is not a new objective. NNs are being applied for feature selection, XGBoost wins many Kaggle competitions, and combinations thereof appear to be promising.
Actually testing second-order effects of evidence-based economic policy recommendations is certainly a complex highly-multivariate task (with unfortunate ideological digression that presumes a higher-order understanding based upon seeming truisms that are not at all validated given, in many instances, any data). A causal model may not be necessary or even reasonably explainable; and what objective dependent variables should we optimize for? Short term growth or long-term prosperity with environmental sustainability?
... "Please highly weight voluntary sustainability reporting metrics along with fundamentals" when making investments and policy decisions?
Were/are the World3 models causal? Many of their predictions have subsequently been validated. Are those policy recommendations (e.g. in "The Limits to Growth") even more applicable today, or do we need to add more labeled data and "Restart and Run All"?
From https://research.stlouisfed.org/useraccount/fredcast/faq/ :
> FREDcast™ is an interactive forecasting game in which players make forecasts for four economic releases: GDP, inflation, employment, and unemployment. All forecasts are for the current month—or current quarter in the case of GDP. Forecasts must be submitted by the 20th of the current month. For real GDP growth, players submit a forecast for current-quarter GDP each month during the current quarter. Forecasts for each of the four variables are scored for accuracy, and a total monthly score is obtained from these scores. Scores for each monthly forecast are based on the magnitude of the forecast error. These monthly scores are weighted over time and accumulated to give an overall performance.
> Higher scores reflect greater accuracy over time. Past months' performances are downweighted so that more-recent performance plays a larger part in the scoring.
The #GlobalGoals Targets and Indicators may be our best set of variables to optimize for from 2015 through 2030; I suppose all of them are economic.
The issue is the following: In economics, one is interested in an underlying parameter of a complex equilibrium system (or, if you wish, a non-equilibrium complex system of multi-agentic behavior).
This may be, for example, some pricing parameter for a given firm - say - how your sold units react to setting a price.
Economics faces two basic issues:
First, any predictive model (like a NN or simple regression) that takes price as an input will not correctly estimate the sensitivity of revenue to price. It is actually usually the case that the inference is reversed, i.e. the estimated effect has the wrong sign.
A model where price is input, and sold units or revenue is output (or vice-versa) will predict (you can check that using pretty much any dataset of prices and outputs) that higher prices lead to higher outputs, because that is the association in the data.
Of course we know that in truth, prices and outputs are co-determined. They are simultaneous phenomena, and regressing one on the other is not sufficient to "causally identify" the correct effect.
This is independent of how sophisticated your model is otherwise. Fitting a better non-linear representation does not help.
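To make the simultaneity point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: a toy linear demand curve and supply curve with made-up intercepts, slopes, and shock sizes, and the hypothetical helper names `simulate_market` and `ols_slope`. It regresses equilibrium quantity on equilibrium price and compares the fitted slope to the true demand slope.

```python
import numpy as np

def simulate_market(n=50_000, seed=0, demand_slope=-1.0, supply_slope=1.0,
                    demand_sd=2.0, supply_sd=0.5):
    """Draw equilibrium (price, quantity) pairs from a toy linear market.

    Demand: q = 10 + demand_slope * p + u
    Supply: q =  2 + supply_slope * p + v
    All numbers are made up for illustration.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, demand_sd, n)  # unobserved demand shocks
    v = rng.normal(0.0, supply_sd, n)  # unobserved supply shocks
    # Solve demand = supply for the market-clearing price.
    price = (10.0 - 2.0 + u - v) / (supply_slope - demand_slope)
    quantity = 10.0 + demand_slope * price + u
    return price, quantity

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

price, quantity = simulate_market()
naive_slope = ols_slope(price, quantity)
print(f"true demand slope: -1.0, naive OLS slope: {naive_slope:.2f}")
```

With demand shocks larger than supply shocks, as assumed here, the fitted slope is positive even though the true demand slope is negative. A more flexible model fitted to the same (price, quantity) pairs would reproduce the same association.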
The solution is of course to reduce down these "endogenous" phenomena to their basic ingredients. Say you have cost data, and some demand parameters. Then, using a regression model (or NN) to predict the vector of endogenous outcome variables will work, and roughly give you the right inference.
Then, as a firm, you are able to use these (more) exogenous predictive variables to find your correct pricing.
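As a hedged sketch of what using "more exogenous" variables can look like, the simulation below adds an observed cost shifter to a toy market and uses it as an instrument for price. The model, its coefficients, and the variable names are all assumptions for illustration. The key assumption, which the data cannot verify, is that cost moves supply but is unrelated to demand shocks.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

cost = rng.normal(0.0, 1.0, n)  # observed cost shifter, ASSUMED exogenous
u = rng.normal(0.0, 2.0, n)     # unobserved demand shocks
v = rng.normal(0.0, 0.5, n)     # unobserved supply shocks

# Toy structural model (made-up coefficients):
#   demand: q = 10 - 1.0 * p + u
#   supply: q =  2 + 1.0 * p - 1.5 * cost + v
# Setting demand equal to supply gives the equilibrium price.
price = (10.0 - 2.0 + u - v + 1.5 * cost) / 2.0
quantity = 10.0 - 1.0 * price + u

def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b)[0, 1]

# Naive regression of quantity on price: contaminated by demand shocks.
ols_slope = cov(price, quantity) / cov(price, price)
# Instrumental-variable (Wald) estimate: only uses price variation driven by cost.
iv_slope = cov(cost, quantity) / cov(cost, price)

print(f"true demand slope: -1.0, OLS: {ols_slope:.2f}, IV: {iv_slope:.2f}")
```

The IV estimate recovers the demand slope only because the simulation builds in the exclusion restriction. Swapping the regression for a NN would not remove the need for that assumption.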
This is not new, pops up everywhere in social science, is the basis of a gigantic literature called econometrics, and really has nothing to do with how you do the prediction.
The only things that NNs add are better predictions (better fitting) and the ability to deal with more data. As this inferential problem shows, using more (and more fine-grained) data is indeed crucial to predicting what a firm should do.
BUT, it is crucial to understand and reason about the underlying causality FIRST, because otherwise even the most sophisticated statistical approach will simply give you wrong results.
Secondly, the counterfactual data for economic issues is usually very scarce. The approach taken by machine learning is problematic, not only because of potentially wrong inference, but also because two points in time may simply not be based on comparable data-generating processes.
In fact, this is exactly the blindness that led to people missing the financial crisis. Of course, with enough data, and long enough samples, one should expect to become pretty good at predicting economic outcomes. But experience has shown that in economics, these data are simply too scarce. The unobserved variation between two quarters, two years, two countries, two firms (etc.) is simply very large and has fat tails. This leads to spontaneous breakdowns of such predictive models.
Taking these two issues together, we see that better non-linear function approximation is not the solution to our problems. Instead, it is a methodological improvement that must be used in conjunction with what we have learned about causality.
Indeed, the literature is moving in a different direction. Good economic science nowadays means identifying effects via natural experiments and other exogenous shifts that can plausibly show causality.
Of course such experiments are more rare, and more difficult, the larger the scale becomes. Which is why Macroeconomics is arguably the "worst science" in economics, while things like auctions and microstructure of markets are actually surprisingly good science (nowadays).
Doors are wide open for ML techniques, but really only to the point that they are useful in operationalizing more and better data.
Anyone trying to understand economic phenomena needs to be keenly aware of how inference can be done, which requires an understanding (or an approach to) - that is, a theory - of the underlying mechanisms.
Whether it is subsidies of farmers, education, tax reduction, minimum wage, austerity measures... history is full of deliciously wrong predictions and policy measures.
Almost all of them can be reduced to the simple fact that the DGP (data-generating process) is not stable when varying the policy, and that simple fact is due to people being deliberatively reactive.
In other words, you are missing data. Data about human behavior that is simply not observed, because it didn't happen, or because it happens inside people! And then, no matter how well you fit your conditional expectation (or other moment, or whatever you fit), the errors are simply not predictable.
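A toy sketch of this failure mode, under loudly stated assumptions: the hypothetical function `outcomes` hard-codes a behavioural response whose slope flips when the policy regime changes, and a cubic polynomial stands in for "any flexible predictor". None of the numbers come from real data.

```python
import numpy as np

rng = np.random.default_rng(4)

def outcomes(x, regime, rng):
    """Hypothetical DGP: people respond differently once the policy changes."""
    slope = 1.0 if regime == "old" else -0.5
    return slope * x + rng.normal(0.0, 0.2, x.size)

# Fit a flexible predictor on data observed under the old policy only.
x_train = rng.uniform(0.0, 1.0, 5_000)
y_train = outcomes(x_train, "old", rng)
coefs = np.polyfit(x_train, y_train, deg=3)

def rmse(x, y):
    """Root-mean-square prediction error of the fitted polynomial."""
    return float(np.sqrt(np.mean((np.polyval(coefs, x) - y) ** 2)))

x_test = rng.uniform(0.0, 1.0, 5_000)
rmse_same_regime = rmse(x_test, outcomes(x_test, "old", rng))
rmse_new_regime = rmse(x_test, outcomes(x_test, "new", rng))

print(f"held-out RMSE, old policy: {rmse_same_regime:.2f}")
print(f"held-out RMSE, new policy: {rmse_new_regime:.2f}")
```

The model validates fine on held-out data from the old regime, which is all that backtesting can check. The reaction to the new policy was never in the training data, so no amount of fitting recovers it.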
We miss the counterfactual data, AND we aren't even smart enough to use all the data that we have. The less we theorize, the less we use prior logic, the more we run into these paradoxes where our policy does the exact opposite of what we intended.
This is pretty much the only real constant you can find in the last 100 years of social science.
It is therefore entirely correct that social science focuses more and more on causality - and where it can be identified. Yes, it is much harder, and the opportunities to do it correctly are scarce, but necessary. In this, trusting in more data and AI is precisely the wrong approach.
> In fact, this is exactly the blindness that led to people missing the financial crisis
ML was not necessary to recognize the yield curve inversion as a strongly predictive signal correlating to subsequent contraction.
An NN can certainly learn to predict according to the presence or magnitude of a yield curve inversion and which combinations of other features.
- [ ] Exercise: Learning this and other predictive signals by cherry-picking data and hand-optimizing features may be an extremely appropriate exercise.
"This field is different because it's nonlinear, very complex, there are unquantified and/or uncollected human factors, and temporal"
Maybe we're not in agreement about whether AI and ML can do causal inference just as well if not better than humans manipulating symbols with human cognition and physical world intuition. The time is nigh!
In general, while skepticism and caution are appropriate, many fields suffer from a degree of hubris which prevents them from truly embracing stronger AI in their problem domain. (A human person cannot mutate symbol trees and validate with shuffled and split test data all night long)
> Anyone trying to understand economic phenomena needs to be keenly aware of how inference can be done, which requires an understanding (or an approach to) - that is, a theory - of the underlying mechanisms.
I read this as "must be biased by the literature and willing to disregard an unacceptable error term"; but also caution against rationalizing blind findings which can easily be rationalized as logical due to any number of cognitive biases.
Compared to AI, we're not too rigorous about inductive or deductive inference; we simply store generalizations about human behavior and predict according to syntheses of activations in our human NNs.
If you're suggesting that the information theory that underlies AI and ML is insufficient to learn what we humans have learned in a few hundred years of observing and attempting to optimize, I must disagree (regardless of the hardness or softness of the given complex field). Beyond a few combinations/scenarios, our puny little brains are no match for our department's new willing AI scientist.
> An NN can certainly learn to predict according to the presence or magnitude of a yield curve inversion and which combinations of other features.
> - [ ] Exercise: Learning this and other predictive signals by cherry-picking data and hand-optimizing features may be an extremely appropriate exercise.
If the financial crisis has not yet occurred, how will the NN learn a relationship that does not exist in the data?
The exercise of cherry picking data and hand-optimizing is equivalent to applying theory to your statistical problem. It is what is required if you lack data points - using ML or otherwise. Nevertheless, we (as in humans) are bad at it.
Speaking of the financial crisis. It was not AIs that picked up on it, it was some guys applying sophisticated and deep understanding of causal relationships. And that so few people did this shows how bad we humans are at doing this implicitly and automatically by just looking at data!
> Maybe we're not in agreement about whether AI and ML can do causal inference just as well if not better than humans manipulating symbols with human cognition and physical world intuition. The time is nigh!
> In general, while skepticism and caution are appropriate, many fields suffer from a degree of hubris which prevents them from truly embracing stronger AI in their problem domain. (A human person cannot mutate symbol trees and validate with shuffled and split test data all night long)
ML and AI certainly can do causal inference. But then you have to do causal inference.
Again, prediction on historical data is not equivalent to causal analysis, and neither is backtesting or validation. At the end of the day, AI and ML improve on predictions, but the distinction of causal analysis is a qualitative one.
> I read this as "must be biased by the literature and willing to disregard an unacceptable error term"; but also caution against rationalizing blind findings which can easily be rationalized as logical due to any number of cognitive biases.
No. My point is that for causal analysis, you have to leverage assumptions that are beyond your data set. Where these come from is beside the point. You will always employ a theory, implicitly or explicitly.
The major issue is not that we use theories, but rather that we might do it implicitly, hiding the assumptions about the DGP that allow causal inference. This is where humans are bad. Theories are just theories. With precise assumptions giving us causal identification, we are in a good position to argue where we stand.
If we just run algorithms without really understanding what is going on, we are just repeating the mistakes from the last forty years!
> If you're suggesting that the information theory that underlies AI and ML is insufficient to learn what we humans have learned in a few hundred years of observing and attempting to optimize, I must disagree (regardless of the hardness or softness of the given complex field). Beyond a few combinations/scenarios, our puny little brains are no match for our department's new willing AI scientist.
All the information theory I have seen in any of the Machine Learning textbooks I have picked up is methodologically equivalent to statistics.
In particular, the treatment of information theory in the standard textbooks (Elements, Murphy, etc.) would only allow causal identification under the exact same conditions that the statistics literature treats.
I fail to see the difference, or what AI in particular adds. The issue of causal inference is a "hot topic" in many fields, including AI, but the underlying philosophical issues are not exactly new. This includes information theory.
You seem to think that ML has somehow solved this problem. From my reading of these books, I certainly disagree. Causal inference is certainly POSSIBLE - just as in statistics, but ML doesn't give it to you for free!
In particular, note the following issue: To show causal identification, you need to make assumptions on your DGP (exogenous variation, timing, graphical causal relations ... whatever). Even if these assumptions are very implicit, they do exist.
Just by looking at data, and running a model, you do not get causal inference. It cannot be done "within" the system/model.
If you bake these things into your AI, then it, too, makes these assumptions. There really is no difference. For example, I could imagine an AI that can identify likely exogenous variations in the data and use them to predict counterfactuals. That's probably not too far off, if it doesn't exist already. But this is still based on the assumption that these variations are, indeed, exogenous, which can never be proven within the DGP.
In contrast, I find that most "AI scientists" care very much about prediction, and very little about causal inference. I don't mean this subfield doesn't exist. But it is a subfield. In contrast, for many non-AI scientists, causal inference IS the fundamental question, and prediction is only an afterthought. ML in practice involves doing correct experiments (A/B testing), at best. It will sooner or later also adopt all other causal inference techniques. But, my point stands: I have yet to see what ML adds.
AI, ML and stats will merge, if they haven't already. The distinction will disappear. I believe the issues will not. I employ a lot of AI/ML techniques in my scientific work. Never have they solved the underlying issue of causal inference for me!
All tools are misapplied; including economics professionals and their advice.
Here's a beautiful Venn diagram of "Colliding Web Sciences" which includes economics as a partially independent category: https://www.google.com/search?q=colliding+web+sciences&tbm=i...
Why are theoretic models hand-wavy? "That's just because noise, the model is correct." No, such a model is insufficient to predict changes in dependent variables when in the presence of noise; which is always the case. How does validating a causal model differ from validating a predictive model with historical and future data?
Yield-curve inversion as a signal can be learned by human and artificial NNs. Period. There are a few false positives in historical data: indeed, describe the variance due to "noise" by searching for additional causal and correlative relations in additional datasets.
I searched for "python causal inference" and found a few resources on the first page of search results:
CausalImpact (Python port of the R package):
"What is the best Python package for causal inference?"
Search: graphical model "information theory" [causal]
Search: opencog causal inference
https://www.google.com/search?q=opencog+causal+inference (MOSES, PLN,)
If you were to write a pseudocode algorithm for an econometric researcher's process of causal inference (and also their cognitive processes (as executed in a NN with a topology)), how would that read?
(Edit) Something about the sufficiency of RL (Reinforcement Learning) for controlling cybernetic systems.
Google's CausalImpact model, despite having been featured on Google's AI blog, is a statistical/econometric model (essentially the same as https://www.jstor.org/stable/2981553). It leaves it up to the user to find and designate a set of control variables, which has to be designated by the user to be unaffected by the treatment. This is not done algorithmically, and has very little to do with RNNs, Random Forests or regression regularization.
> If you were to write a pseudocode algorithm for an econometric researcher's process of causal inference (and also their cognitive processes (as executed in a NN with a topology)), how would that read?
(1) Set up a proper RCT, that is, randomly assign the treatment to different subjects
(2) Calculate the outcome differences between the treated and untreated
For A/B testing your website, the work division between (1) and (2) might be 50-50, or at least of similar orders of magnitude.
For the questions that academic economists wrestle with, say, estimating the effect of increasing school funding / decreasing class size, the effect of shifts between tax deductions vs tax credits vs changing tax rates or bands, or the different outcomes on GDP growth and unemployment of monetary vs fiscal expansion, (1) would be 99.9999% of the work, or completely impossible.
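The two steps above are short enough to sketch directly. This is a toy simulation, not anyone's actual study: the function name `run_rct`, the baseline scores, and the true effect of 2.0 are all made up, and the point is only that random assignment makes the plain difference in means an estimate of the treatment effect.

```python
import numpy as np

def run_rct(baseline, true_effect, rng):
    """Randomly treat half the subjects and compare mean outcomes."""
    n = baseline.size
    # Step 1: random assignment, independent of anything about the subject.
    treated = rng.permutation(n) < n // 2
    # Simulated outcomes (in a real trial these would be observed, not generated).
    outcome = baseline + true_effect * treated + rng.normal(0.0, 1.0, n)
    # Step 2: difference in mean outcomes between treated and untreated.
    diff = outcome[treated].mean() - outcome[~treated].mean()
    std_err = np.sqrt(outcome[treated].var(ddof=1) / treated.sum()
                      + outcome[~treated].var(ddof=1) / (~treated).sum())
    return float(diff), float(std_err)

rng = np.random.default_rng(2)
baseline = rng.normal(50.0, 10.0, 20_000)  # hypothetical pre-treatment scores
estimate, std_err = run_rct(baseline, true_effect=2.0, rng=rng)
print(f"estimated effect: {estimate:.2f} (standard error {std_err:.2f})")
```

The code for the second step is a single subtraction. The hard part is the line that assigns `treated` at random, which has no counterpart when the treatment is a tax regime or a monetary policy.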
Faced with the impracticality/impossibility of proper experiments, academic micro-economists have typically resorted to Instrumental Variable regressions. AFAICT finding (or rather, convincing the audience that you have) a proper instrument is not very amenable to automation or data mining.
In academic macro-economics (and hence at Serious Institutions such as central banks and the IMF), the most popular approaches to building causal models in the last 3 or 4 decades have probably been
1) making a bunch of unrealistic assumptions about the behaviour of individual agents (microfoundations/DSGE models)
2) making a bunch of uninterpretable and unverifiable technical assumptions on the parameters in a generic dynamic stochastic vector process fitted to macro-aggregates (Structural VAR with "identifying restrictions")
3) manually grouping different events in different countries from different periods in history as "similar enough" to support your pet theory: lowering interest rates can lead to a) high inflation, high unemployment (USA 1970s), b) high inflation, low unemployment (Japan 1970s), c) low inflation, high unemployment (EU 2010s), d) low inflation, low unemployment (USA, Japan past 2010s)
I really don't see how a RL would help with any of this. Care to come up with something concrete?
Here are some tools for causal inference (and a process for finding projects to contribute to instead of arguing about insufficiency of AI/ML for our very special problem domain here). At least one AGI implementation doesn't need to do causal inference in order to predict the outcomes of actions in a noisy field.
Weather forecasting models don't / don't need to do causal inference.
> A/B testing
Is a multi-armed bandit feasible for the domain? Or, in practice, are there too many concurrent changes in variables to have any sort of controlled experiment? Then aren't you trying to do causal inference with mostly observational data?
> I really don't see how a RL would help with any of this. Care to come up with something concrete?
The practice of developing models and continuing on with them when they seem to fit and citations or impact reinforce is very much entirely an exercise in RL. This is a control system with a feedback loop. A "Cybernetic system". It's not unique. It's not too hard for symbolic or neural AI/ML. Stronger AI can or could do [causal] inference.
Any learning model by itself is a statistical model. Statistical models are never automatically causal models, albeit causal models are statistical models.
Several causal models can be observationally equivalent to a single statistical model, but the substantive (inferential) implications on doing "an intervention" on the DGP differ.
It is therefore not enough to validate and run a model on data. Several causal models WILL validate on the same data, but their implications are drastically different. The data ALONE provides you no way to differentiate (we say, identify) the correct causal model without any further restrictions.
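A minimal sketch of observational equivalence, assuming two made-up linear-Gaussian models: in one X causes Y, in the other Y causes X. The function names and the `do_x` argument are hypothetical, chosen here only to mimic an intervention that sets X by force.

```python
import numpy as np

RHO = 0.5  # assumed correlation between X and Y in both models

def sample_x_causes_y(n, rng, do_x=None):
    """Model A: X -> Y. Passing do_x overrides X from outside the system."""
    x = rng.normal(0.0, 1.0, n) if do_x is None else np.full(n, float(do_x))
    y = RHO * x + rng.normal(0.0, np.sqrt(1 - RHO**2), n)
    return x, y

def sample_y_causes_x(n, rng, do_x=None):
    """Model B: Y -> X. Forcing X cuts the arrow into X and leaves Y untouched."""
    y = rng.normal(0.0, 1.0, n)
    if do_x is None:
        x = RHO * y + rng.normal(0.0, np.sqrt(1 - RHO**2), n)
    else:
        x = np.full(n, float(do_x))
    return x, y

rng = np.random.default_rng(3)
n = 200_000

# Observational data: both models imply the same joint distribution.
cov_a = np.cov(*sample_x_causes_y(n, rng))
cov_b = np.cov(*sample_y_causes_x(n, rng))

# Interventional data: set X = 2 and look at the mean of Y.
mean_y_a = sample_x_causes_y(n, rng, do_x=2.0)[1].mean()
mean_y_b = sample_y_causes_x(n, rng, do_x=2.0)[1].mean()

print("observational covariances:\n", cov_a.round(2), "\n", cov_b.round(2))
print(f"E[Y | do(X=2)]  model A: {mean_y_a:.2f}, model B: {mean_y_b:.2f}")
```

Any predictor fitted to the observational samples scores the same under both models, yet they disagree about what happens when X is set to 2. Choosing between them takes an assumption from outside the data.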
By extension, it is impossible for any ML mechanism to predict unobserved interventions without being a causal model.
ML and AI models CAN be causal models, which is the case if they are based on further assumptions about the DGP. For example, they may be graphical models, SCM/SEM etc. These restrictions can be derived algorithmically, based on all sorts of data, tuning, coding and whatever. It really doesn't change the distinction between causal and statistical analysis.
The way these models become causal is based on assumptions that constitute a theory in the scientific sense. These theories can then of course also be validated. But this is not based on learning from historical data alone. You always have to impose sufficient restrictions on your model (e.g. the DGP) to make such causal inference.
This is not new, but for your benefit, I basically transferred the above from an AI/ML book on causal analysis.
AI/ML can do causal analysis, because it's statistics. AI/ML are not separate from these issues, do not solve these issues ex-ante, are not "better" than other techniques except on the dimensions that they are better as statistical techniques, AND, most importantly, causal application necessarily implies a theory.
Whether this is implicit or explicit is up to the researcher, but there are dangers associated with implicit causal reasoning.
And as Pearl wrote (who is not a fan of econometrics by any means!), the issue of causal inference was FIRST raised by econometricians BASED on combining the structure of economic models with statistical inference. In the 1940's.
I mean I get the appeal to trash talk social sciences, but when it comes to causal inference, you probably picked exactly the wrong one.
You are free to disregard economic theory. But you can not claim to do causal analysis without any theory. Doing so implicitly is dangerous.
Furthermore, you are wrong in the sense that economic theory has put causal inference issues at the forefront of econometric research, and is therefore good for science even if you dislike those theories.
If this dependence is broken, your non-causal model will no longer work, because the relationship between yield curve and market crash is not a physical constant fact. What you did to make it a causal inference is implicitly assume a theory about how markets work (e.g. - as they do right now -) and that it will stay this way.
Actually, you did a lot more, but that's enough.
Now, you and me, we can both agree that your model with yield curves is good enough. We could even agree that you would have found it before the financial crashes, and are a billionaire.
But the commonality we agree upon is a context that defines a theory.
Some alien that has been analyzing financial systems all across the universe may disagree, saying that your statistical model is in fact highly sensitive to Earth's political, societal and natural context.
Such is the difficulty of causal analysis.
In lieu of a causal model, when I ask an economist what they think is going to happen and they aren't aware of any historical data - there is no observational data collected following the given combination of variables we'd call an event or an intervention - is it causal inference that they're doing in their head? (With their NN)
> Now, you and me, we can both agree that your model with yield curves is good enough.
Yield curves alone are insufficient due to the rate of false positives. (See: ROC curves for model evaluation just like everyone else)
> We could even agree that you would have found it before the financial crashes,
The given signal was disregarded as a false positive by the appointed individuals at the time; why?
> Some alien that has been analyzing financial systems all across the universe may disagree,
You're going to run out of clean water and energy, and people will be willing to pay for unhealthy sugar water and energy-inefficient transaction networks with a perception of greater security.
That we need a Martian-scientist approach is, IMHO, because of our learned biases: we've inferred relations that have been reinforced, and these cloud our assessment of new and novel solutions.
> Such is the difficulty of causal analysis.
What a helpful discussion. Thanks for explaining all of this to me.
Now, I need to go write my own definitions for counterfactual and DGP and include graphical models in there somewhere.
I feel like you will benefit from reading this!
It's up for debate whether NNs represent what is going on in our heads. But let's for a moment assume it is so.
Then indeed, an economist leverages a big set of data and assumptions about causal connections to speculate how this intervention would change the DGP (the modules in the causal model) and therefore how the result would change.
An AI could potentially do the same (if that is really what we humans do), but so far, we certainly lack the ability to program such a general AI. The reason is, in part, because we have difficulty creating causal AI models even for specialized problems. In that sense, humans are much more sophisticated right now.
It is important to note that such a hypothetical AI would create a theory, based on all sorts of data, analogies, prior research and so forth, just like economists do.
It does not really matter if a scientist, or an AI, does the theorizing. The distinction is between causal and non-causal analysis.
The value of formal theory is to lay down assumptions and tautological statements that leave no doubt about what the theory is.
If we see that the theory is wrong, because we disagree on the assumptions, this is actually very good and speaks for the theory. Lots of social science is plagued by "general theories" that can never really be shown to be false ex ante. And given that theories can never be empirically "proven", only validated in the statistical sense, this leads to many parallel theories of doubtful value.
Take a gander into sociology if you want to see this in action.
Secondly, and this is very important, we learn from models. This is not often recognized. What we learn from writing down models is how mechanics or modules interact. These interactions, highly logical, are USUALLY much less doubtful than the prior assumptions.
For example, if price and revenues are equilibrium phenomena, we LEARN from the model that we CAN NOT estimate them with a standard regression model!
This is exactly what led to causal analysis in this case, because earlier we would literally regress price on quantity or production on price etc. and be happy about it. But the results were often even in the entirely wrong direction!
Instead, looking at the theory, we understood the mechanical intricacies of the process we supposedly modeled, and saw that we estimated something completely different than what we interpreted.
Causal analysis, among other things, tackles this issue by asking "what is it really that we estimate here?".