Some of the data we have isn't training data. Purely data-driven models tend to be ensnared by Goodhart's law.
For example, suppose we're issuing 30-year term loans and we have some data that shows that people with things like country club memberships and foie gras on their credit card statements have a higher tendency not to miss payments. So we use that information to make our determination.
But people know we're doing this, and the same data is externally available, so they start wasting resources on extravagant luxuries in order to qualify for a loan or a lower interest rate, which only makes it more likely that they ultimately default. That consequence doesn't enter the data set until years have passed and the defaults actually occur; in the meantime we're using flawed reasoning to issue loans. When we finally figure that out after ten years, the new data we switch to will offer a fresh set of criteria for people to game, because the data is always from the past rather than the future.
We've already seen the kind of damage this can do. Politicians see data showing that college-educated people are better off, so they subsidize college loans, only to discover that the signal a degree once sent to employers is diluted as degrees become more common. Subsidizing loans also results in price inflation, and making a degree a prerequisite for jobs that shouldn't require one creates incentives for degree mills that pump out credentials but not real education.
To get out of this we have to consider not only what people have done in the past but how they are likely to respond to a given policy change. We have no historical data on that response before the policy is enacted, so we need to make those predictions based on logic in addition to data, or we go astray.
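A toy simulation of the loan example, in case it helps make the mechanism concrete. Everything here is an assumption made up for illustration: the "approve if luxury spending exceeds a threshold" rule, the logistic default model and its coefficients, and the idea that half of the rejected applicants overspend just enough to clear the bar. It is a sketch of the feedback loop, not a model of any real lender.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def default_prob(wealth, wasted):
    # Assumed model: wealth lowers default risk, wasted money raises it.
    return sigmoid(-1.5 - 1.5 * wealth + 1.0 * wasted)


def approved_default_rate(gaming, n=10000, threshold=0.5, seed=1):
    """Mean default probability among applicants the rule approves."""
    rng = random.Random(seed)
    probs = []
    for _ in range(n):
        wealth = rng.gauss(0, 1)
        # Historically, luxury spending is just a noisy proxy for wealth.
        spending = wealth + rng.gauss(0, 0.5)
        coin = rng.random()  # drawn every time so both worlds share a population
        wasted = 0.0
        if gaming and spending < threshold and coin < 0.5:
            # The applicant overspends just enough to clear the known bar.
            wasted = threshold - spending + 0.1
            spending += wasted
        if spending > threshold:
            probs.append(default_prob(wealth, wasted))
    return sum(probs) / len(probs)


rate_before = approved_default_rate(gaming=False)  # rule is still a secret
rate_after = approved_default_rate(gaming=True)    # rule is public and gamed
print(f"default rate among approved, before gaming: {rate_before:.3f}")
print(f"default rate among approved, after gaming:  {rate_after:.3f}")
```

The rule and the historical correlation it was built on are identical in both runs. The only thing that changes is that applicants respond to the rule, which is exactly the part the historical data cannot show.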
"Pete, it's a fool who looks for logic in the chambers of the human heart."
Logically, we might have said "prohibition will reduce substance abuse harms" but the actual data indicates that margins increased. Then, we look at the success of Portugal's decriminalization efforts and cannot at all validate our logical models.
Similarly, we might've logically claimed that "deregulation of the financial industry will help everyone" or "lowering taxes will help everyone", and the data does not support those claims.
So, while I share the concerns about Responsible AI and encoding biases (and second-order effects of making policy recommendations according to non-causal models without critically, logically thinking first), I am very skeptical about our ability to deduce causal relations without e.g. blind, randomized, longitudinal, interventional studies (which are unfortunately basically impossible to do with [economic] policy because there is no "ceteris paribus").
"Causal Inference Book" https://news.ycombinator.com/item?id=17504366
> Causal inference (Causal reasoning) https://en.wikipedia.org/wiki/Causal_inference ( https://en.wikipedia.org/wiki/Causal_reasoning )
This is also a strong argument for "laboratories of democracy" and local control -- if everybody agrees what to do then there is no dispute, but if they don't then let each local region have their own choice, and then we get to see what happens. It allows more experiments to be run at once. Then in the worst case the damage of doing the wrong thing is limited to a smaller area than having the same wrong policy be set nationally or internationally, and in the best case different choices are good in different ways and we get more local diversity.
Maybe we're at a local optimum, though. Maybe this is a sign that we should just double down, surge on in there and get the job done by continuing to do the same thing and expecting different results. Maybe it's not the spec but the implementation.
Recommend a play according to all available data, and logic.
> This is also a strong argument for "laboratories of democracy" and local control -- if everybody agrees what to do then there is no dispute, but if they don't then let each local region have their own choice, and then we get to see what happens. It allows more experiments to be run at once. Then in the worst case the damage of doing the wrong thing is limited to a smaller area than having the same wrong policy be set nationally or internationally, and in the best case different choices are good in different ways and we get more local diversity.
"Adjusting for other factors," the analysis began.
- [ ] Exercise / procedure to be coded: Brainstorm and identify [non-independent] features that may create a more predictive model (a model with a lower error term). Search for confounding variables outside of the given data.
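One possible way to code a small version of that exercise, using only the standard library. The data is synthetic and every name and coefficient is an assumption for illustration: `z` stands in for a confounder outside the given data (say, wealth), `x` for an observed feature (luxury spending), and `y` for the outcome (a repayment score). By construction the true effect of `x` on `y` is negative, but `z` drives both.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def mse(y, pred):
    return mean([(yi - pi) ** 2 for yi, pi in zip(y, pred)])


rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]                 # hidden confounder
x = [zi + rng.gauss(0, 0.5) for zi in z]                # observed feature
y = [2.0 * zi - 0.5 * xi + rng.gauss(0, 0.5) for zi, xi in zip(z, x)]

# Naive model: regress y on x alone (ordinary least squares).
naive_slope = cov(x, y) / cov(x, x)
naive_icpt = mean(y) - naive_slope * mean(x)
naive_mse = mse(y, [naive_icpt + naive_slope * xi for xi in x])

# Adjusted model: regress y on x and z (two-regressor normal equations).
sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
sxy, szy = cov(x, y), cov(z, y)
det = sxx * szz - sxz ** 2
adjusted_slope = (sxy * szz - szy * sxz) / det          # coefficient on x
z_coef = (szy * sxx - sxy * sxz) / det                  # coefficient on z
adj_icpt = mean(y) - adjusted_slope * mean(x) - z_coef * mean(z)
adjusted_mse = mse(
    y, [adj_icpt + adjusted_slope * xi + z_coef * zi for xi, zi in zip(x, z)]
)

print(f"naive slope on x:    {naive_slope:+.2f}  (MSE {naive_mse:.2f})")
print(f"adjusted slope on x: {adjusted_slope:+.2f}  (MSE {adjusted_mse:.2f})")
```

Adding the confounder both lowers the error term and flips the sign of the coefficient on `x`. The catch is that this only works here because the simulation hands us `z`; with real data, finding it is the hard part of the exercise.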