First, RL and causal inference do fundamentally different things. RL is trying t...

Eridrus · on May 25, 2018

Not an RL expert, but Model-Based RL is a thing, where you try to train a model of how actions affect the world, and then use that model to choose/influence your actions.

But I don't think it's true that we always need a model, or at least I don't necessarily think we always need a human understandable model.

Your doctor example is weird to me tbh. A non-causal ML approach would seek to determine whether a patient has a disease based on some symptoms, and then send them to a doctor based on those results, sidestepping the need for causal models.

To rephrase it in a way that makes a bit more sense to me: let's assume we want to know if a specific procedure would be good for a patient (basically the same example). With a non-causal approach we would want to predict whether a patient would have a better outcome from undergoing the procedure than from not undergoing it.

A natural way to solve this (to me) would be to build one model that estimates the probability of various outcomes from the procedure, and one that estimates the probability of various outcomes from not undergoing the procedure.
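A minimal sketch of that two-model idea, to make it concrete. Everything here is hypothetical: the simulated patients, the single `severe` covariate, the assumed effect sizes, and a per-group mean standing in for a real model. Note that it only recovers the benefit because the one thing driving who gets treated (severity) is observed and fed to both models.

```python
import random
from collections import defaultdict


def simulate_patients(n=20000, seed=0):
    """Hypothetical data: (severe, treated, good_outcome) triples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.5
        # In this toy world, severe cases get the procedure more often.
        treated = rng.random() < (0.8 if severe else 0.2)
        base = 0.3 if severe else 0.7
        lift = 0.3 if severe else 0.1  # assumed true benefit of the procedure
        good = rng.random() < base + (lift if treated else 0.0)
        rows.append((severe, treated, good))
    return rows


def fit_outcome_model(rows):
    """Stand-in 'model': P(good outcome | severe), estimated as a group mean."""
    total = defaultdict(int)
    good = defaultdict(int)
    for severe, _, outcome in rows:
        total[severe] += 1
        good[severe] += outcome
    return {s: good[s] / total[s] for s in total}


def estimate_benefit(rows):
    """One model fit on treated patients, one on untreated; compare predictions."""
    treated_model = fit_outcome_model([r for r in rows if r[1]])
    untreated_model = fit_outcome_model([r for r in rows if not r[1]])
    return {s: treated_model[s] - untreated_model[s] for s in (False, True)}


if __name__ == "__main__":
    benefit = estimate_benefit(simulate_patients())
    print("estimated benefit, mild cases:  ", round(benefit[False], 3))
    print("estimated benefit, severe cases:", round(benefit[True], 3))
```

For a given patient you would then recommend the procedure when the predicted benefit for their covariates is positive.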

Or if you're working in the world of Neural Nets/Deep RL, have a model that takes all the non-intervention data as input and outputs the expected outcomes from the procedure and the expected outcomes from not doing the procedure, and when you train it, you only supervise the outcomes that you had data for.
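Roughly what "only supervise the outcomes you had data for" could look like, as a pure-Python sketch rather than a real deep learning setup. The `TwoHeadNet` class, its layer sizes, and the squared-error loss are all assumptions for illustration: a shared tanh layer feeds two linear heads (index 0 = no procedure, index 1 = procedure), and each training step only computes loss for the head matching the action that was actually taken.

```python
import math
import random


class TwoHeadNet:
    """Tiny net: shared tanh layer, one linear head per action."""

    def __init__(self, n_inputs, n_hidden=4, seed=0):
        rng = random.Random(seed)

        def init():
            return rng.uniform(-0.5, 0.5)

        self.W = [[init() for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.heads = [[init() for _ in range(n_hidden)] for _ in range(2)]
        self.head_bias = [0.0, 0.0]

    def hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def predict(self, x):
        """Return [expected outcome without procedure, with procedure]."""
        h = self.hidden(x)
        return [sum(v * hj for v, hj in zip(head, h)) + c
                for head, c in zip(self.heads, self.head_bias)]

    def train_step(self, x, action, outcome, lr=0.01):
        """One SGD step on squared error for the observed action's head only."""
        h = self.hidden(x)
        head = self.heads[action]
        pred = sum(v * hj for v, hj in zip(head, h)) + self.head_bias[action]
        d = 2.0 * (pred - outcome)  # dLoss/dpred; the other head adds no loss
        # Gradient into the shared layer, computed before the head is updated.
        d_pre = [d * v * (1.0 - hj * hj) for v, hj in zip(head, h)]
        for j, hj in enumerate(h):
            head[j] -= lr * d * hj
        self.head_bias[action] -= lr * d
        for j, dj in enumerate(d_pre):
            for i, xi in enumerate(x):
                self.W[j][i] -= lr * dj * xi
            self.b[j] -= lr * dj
        return (pred - outcome) ** 2  # loss measured before this update
```

The unobserved head's own weights never receive a gradient for that example, but both heads still share (and jointly shape) the hidden layer, which is the main difference from training two separate models.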

This ignores the Bayesian/distributional shift issue, but I don't think the do-calculus has a real answer to that either.

I would be interested in knowing if this ad-hoc modelling approach is any different from the causal modelling that Pearl is arguing for, or if causal modelling is more necessary when you have more complicated causal relationships than a single intervention.

Darmani · on May 25, 2018

> A non-causal ML approach would seek to determine whether a patient has a disease based on some symptoms, and then send them to a doctor based on those results, sidestepping the need for causal models.

I saw an ML presentation a few months ago, on training a decision tree to do the same thing as a neural net, so we can understand what the neural net was doing.

They used this on a neural net trying to diagnose people with diabetes. It showed that having any other diagnosis would increase its probability of diagnosing them with diabetes. Why? Because it meant they're more likely to have gone to the doctor to get diagnosed. (Along with detecting general health indicators that weren't screened out.)

You can try to partition your data into intervention/non-intervention, or do something else to try to stop your model from detecting spurious correlations. Causal models make this more formal: they tell you which things you should include/exclude, give you formulas for adjusting them out, and tell you how much bias you introduce by failing to do so.
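As an illustration of one such formula, here is a sketch of the backdoor adjustment, P(y | do(x)) = sum over z of P(y | x, z) P(z), on simulated data. The setup is hypothetical: a single binary confounder `z` that is fully observed and is assumed to be the only common cause of treatment `x` and outcome `y`. The probabilities are chosen so the true average effect is 0.2, while the naive treated-vs-untreated comparison lands near 0.02.

```python
import random


def simulate(n=20000, seed=1):
    """Hypothetical (z, x, y) rows: z confounds treatment x and outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5
        x = rng.random() < (0.8 if z else 0.2)  # z makes treatment more likely
        p = (0.3 if z else 0.7) + ((0.3 if z else 0.1) if x else 0.0)
        y = rng.random() < p
        rows.append((z, x, y))
    return rows


def mean_outcome(rows, x, z=None):
    """Empirical P(y | x) or, if z is given, P(y | x, z)."""
    selected = [r[2] for r in rows if r[1] == x and (z is None or r[0] == z)]
    return sum(selected) / len(selected)


def naive_effect(rows):
    """Plain difference in outcome rates; mixes the effect of x with that of z."""
    return mean_outcome(rows, True) - mean_outcome(rows, False)


def adjusted_effect(rows):
    """Backdoor adjustment: average the per-stratum effect, weighted by P(z)."""
    effect = 0.0
    for z in (False, True):
        p_z = sum(1 for r in rows if r[0] == z) / len(rows)
        effect += p_z * (mean_outcome(rows, True, z) - mean_outcome(rows, False, z))
    return effect


if __name__ == "__main__":
    data = simulate()
    print("naive:   ", round(naive_effect(data), 3))
    print("adjusted:", round(adjusted_effect(data), 3))
```

The causal graph is what tells you that `z` is the right thing to stratify on; the code itself cannot know that.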

The theory of causal inference is also immune to distributional shift, and serves as a nice guide for what actual systems should do (usually: fail to return an answer).

(Yes, I've fully drunk the Pearl Kool-Aid.)

Eridrus · on May 25, 2018

Thanks for the example, it does motivate it a bit better when there are more complicated (but still relatively simple) causal relationships.

thanatropism · on May 25, 2018

I think what offpolicy was trying to clumsily say is that policy evaluation (I come from the economic policy econometrics world originally) can be used for RL.

Maybe it can, but isn't Bayesian stuff really costly most of the time?

sjg007 · on May 25, 2018

How? AFAIK policy analysis is based on asking causal and counterfactual questions, primarily by trying to find a quasi-control population that can be a proxy for the intervention and then regressing against it vs the observational data. I forget the name specifically. Causal models would represent this explicitly, and you could reason about the model to see if your economics question is well posed. Nobody does this now, but it is important to do because it lays out the assumptions so they can't be hidden behind politics.

RL is for training system parameters based on positive or negative reinforcement from a critic, and it is formalized as a Markov decision process. RL does have the idea of policy search, but that is separate from economic policy evaluation.
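For anyone who hasn't seen the MDP framing, a minimal tabular Q-learning sketch. The environment is a made-up 4-state chain (not from any library) where the only reward comes from reaching the rightmost state; the "policy" here is just "which action to take in each state", which is a different object from an economic policy being evaluated.

```python
import random

N_STATES = 4        # states 0..3; state 3 is terminal (hypothetical toy chain)
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Deterministic transition; reward 1.0 only on reaching the terminal state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from reward alone, acting uniformly at random."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            a = rng.randrange(2)  # random behaviour policy (off-policy learning)
            nxt, reward = step(state, ACTIONS[a])
            target = reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


def greedy_policy(q):
    """Best action per non-terminal state according to the learned values."""
    return [ACTIONS[max(range(2), key=lambda a: q[s][a])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```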