>We often hear that AI systems must provide explanations and establish causal relationships, particularly for life-critical applications.
>Yes, that can be useful. Or at least reassuring.
>But sometimes people have accurate models of a phenomenon without any intuitive explanation or causation that provides an accurate picture of the situation.
It goes on to argue mostly against the need for intuitive explanations, not the establishing of causal relationships.
>Now, if there ever was a life-critical physical phenomenon, it is lift production by an airliner wing.
>But we don't actually have a "causal" explanation for it, though we do have an accurate mathematical model and decades of experimental evidence.
The physical models we have are causal ones. The intuitive abstractions like Bernoulli's principle may not work, but analysis based on Navier-Stokes sure does. You plug your shape (cause) into the equations and see what forces (effect) occur. That's causation.
>You know what other life-critical phenomena we don't have good causal explanations for?
> The mechanism of action of many drugs (if not most of them).
Using an industry that's nearly synonymous with the randomized controlled trial as a refutation of the need for a causal relationship is crazy talk. The mechanism may be missing, but the causal explanation is that, via a series of RCTs, it's established that the drug causes the effects.
I get that half of this is trying to go against a perceived need for intuitive explanations, but it weirdly lumps causation in there.
"How much does this input seem to confuse this output? What is the pattern across inputs for how this model is systematically confused?"
Causality -> counterfactuals
"How would the outcome be different if x was different? If I acted differently, would I get a more favorable outcome?"
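That counterfactual question can be phrased as a toy structural causal model (a sketch; `outcome` and its structural equation are made up for illustration):

```python
# Toy structural causal model: outcome = f(action, noise).
# A counterfactual asks: holding the noise (the background conditions)
# fixed, how would the outcome differ under a different action?

def outcome(action, noise):
    # Hypothetical structural equation: acting adds a fixed benefit.
    return 2.0 * action + noise

observed_noise = 1.5   # background conditions inferred from the factual world
factual = outcome(action=0, noise=observed_noise)
counterfactual = outcome(action=1, noise=observed_noise)  # do(action = 1)

print(factual, counterfactual)    # 1.5 vs 3.5
print(counterfactual - factual)   # individual-level causal effect: 2.0
```

The "would I get a more favorable outcome?" question is exactly the comparison on the last line, which a purely correlational model never has to answer.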
You're right to say these are two different things. They are.
And they're different still from interpretability, i.e., "What are the explicit patterns that this model is seeking in the data?"
DL practitioners routinely mix up explainability and interpretability, but I would never in a million years have expected LeCun to be so intellectually dishonest as to lump causality in there with them.
And then he fails to acknowledge that we don't even have intuitive explanations for a lot of AI models! For a layperson, it's just a complete black box; even in cases where we do have intuitive explanations, I'm not sure experts are at all effective at translating them into something a layperson can understand.
I'm not a physicist or aeronautical engineer, but I can grok intuitive explanations for how airplanes work without much trouble.
The physical laws are correlational, not causal. F = ma doesn't tell you whether acceleration causes force or vice versa. Causation is a humanly imposed concept. That's why we need to invent things like the chronology protection conjecture.
If our physical laws couldn't predict what effect a given cause has, we'd say they're wrong. Like if I couldn't swap out one mass on a spring (in a known scenario with an ideal spring, blah blah) for another mass (cause) and predict how much lower the new mass will hang (effect), we'd say we don't understand spring physics.
There are unresolved foundational questions like those discussed in your link, but those aren't practically relevant, just like missing foundations of set theory wouldn't prevent us from balancing checkbooks. There's some notion of arithmetic I'm using for my checkbook, just like there's some notion of causation when I apply a 10 N force (cause) to my 1 kg mass and get 10 m/s² of acceleration (effect). Foundationally formalizing it is neat, but unnecessary.
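The spring prediction is literally two lines of arithmetic (Hooke's law; the spring constant here is made up):

```python
# Hooke's law sketch for the mass-on-a-spring example: swap the mass (cause)
# and predict the new static extension (effect). Numbers are illustrative.
g = 9.81      # gravitational acceleration, m/s^2
k = 200.0     # spring constant, N/m (assumed)

def extension(mass_kg):
    # Static equilibrium: k * x = m * g
    return mass_kg * g / k

x1 = extension(1.0)   # original mass
x2 = extension(2.0)   # swapped-in mass
print(x2 - x1)        # predicted extra sag, ~0.049 m
```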
Pretty much all of the rest of the time, they're still compatible with this definition.
In ANNs, OTOH, even though the systems can be very deterministic, there is very little to work with in terms of hypotheses. An explanation of the sort "you have cancer because neurons 10, 18, and 19 fired" is not satisfactory enough to pass the human test. It may be that for some complicated problems, searching for patterns in the neurons in order to explain them will prove futile. Not that people should give up on that, but not everything has a neat closed-form explanation. LeCun mentioned above that you may have recurrent relationships (which also occur in quantum systems), and these muddy the waters a lot, making it difficult to establish cause and effect. It is also a major pain in neuroscience, where real neurons are seen as an evolving dynamical system.
Take something more dynamic, like a ball being thrown from a level surface (uniform gravity, ideal vacuum, etc.). If I throw it with some speed, it'll follow a perfect symmetric parabola and land with that exact same speed. The speed it lands with (y) is exactly identical to the speed I threw it with (x). It's basically time symmetry, but simpler. The equation here is the symmetric y = x, so it won't help you define causation. But clearly the usual version of causation says that my cause is x and the effect is y.
Maybe where we differ is that you say it's "the causal model is assumed, then observations are made and a law is formed which can be used to make predictions," while I'd describe it as "interventions are made, then observations are made and a law is formed which can be used to make predictions (and the law itself doesn't depend on the intervention taken)." That's what makes the y = x scenario not symmetric. X is my intervention.
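The y = x point can be made concrete in a few lines (a toy sketch; the variable names are just illustrative):

```python
# The law itself is symmetric: it only constrains which (x, y) pairs occur.
def law_holds(x, y):
    return x == y

# The asymmetry enters through the intervention: the experimenter sets x.
x = 5.0        # do(x = 5.0): the intervention
y = x          # the law then predicts y
assert law_holds(x, y)

# Reading the same equation as "y determines x" describes a different
# experiment (setting y directly), not the one actually performed.
print(x, y)
```

The law is direction-free; the experiment isn't, because one of the two variables is the one you chose.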
the causality is imposed post-hoc depending on your application, but it's not explained by the law itself. alternatively, causality is a prerequisite in order to have physical laws at all.
Sure, but that doesn't mean we should be okay with not understanding why those neurons fired.
That is: if an ANN can reliably figure out if someone has cancer based on various inputs, then it should be possible to isolate the ANN's rationale for that determination. That probably ain't an easy task by any means, but unless ANNs are magic spells not subject to our physical laws, it is possible nonetheless.
You mean if I extend the spring, the weight stretching it will increase?
So, which was the cause? You cannot tell. You simply can do the math and see that it matches the experiment. Only later do humans try to call it cause and effect, because the math and the experiment match so often and so well.
But this all can change if we find an experiment breaking the models.
Before relativity, most people thought time was constant. They were wrong.
So the above poster is correct: what we usually call cause and effect are simply strong correlations that match (current) models.
That can all be changed in future understanding of Nature.
>For any well understood physical situation describing X and Y, the laws of physics tell you what happens to Y when you do X.
Not true. If I create an electron-positron pair from a photon collision and measure one particle, will it be spin up? No way to tell. Only aggregates over large enough systems give any reliable answer to such questions.
If I look for radioactive decay from 10 U-235 atoms over the next 5 minutes, will I see one? Again, there is no yes/no answer, only probabilistic ones. There is no underlying causality, only purely random events with no detectable cause. These are the dice that Einstein didn't like.
There's plenty of similar questions that don't have a simple answer - only answers about large aggregates.
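The U-235 example is easy to make quantitative: with a half-life of roughly 7.04e8 years, the theory assigns the five-minute window a probability, never a yes/no:

```python
import math

half_life_s = 7.04e8 * 365.25 * 24 * 3600   # U-235 half-life in seconds
lam = math.log(2) / half_life_s             # per-atom decay constant
atoms, t = 10, 5 * 60                       # ten atoms, five minutes

# Probability of at least one decay in the window:
p = 1 - math.exp(-atoms * lam * t)
print(p)   # ~1e-13: effectively "no", but only ever probabilistically
```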
Here's a recent result showing causality is nowhere near the neat and tidy thing you think it is. These results are all over modern physics.
For example, the most accurate physical theories, such as QED, get the right answer by summing forwards and backwards in time. In QED, the "future" affects results today as strongly as the "past". Causality is a psychological interpretation; it is not in the math or the underlying theories.
As far as I know, all quantum field theories (which underlie all of physics at the moment) have such ambiguity or uncertainty about causality.
The laws are math models that give answers. But the lowest-level laws are time reversible or time agnostic for the most part, and the foundational theories require travelling forwards and backwards in time to get the correct experimental values.
Similarly, the laws of physics at the lowest level are not causal, but probabilistic. Only when aggregated do some experiments seem causal.
From another direction, there's a massive body of literature on what is causal, and can you detect it. Read, for example, Judea Pearl's monograph "Causality" or some of his other stuff, or simply browse wikipedia starting with him.
For example, when you drop a ball, it falls to earth. Quantum mechanically, there is a probability it simply quantum tunnels to another galaxy. That it most often falls to earth becomes a law, but it is imprecise and not completely correct. It's an approximation.
So every law of physics is merely a strong correlation.
And there's currently plenty of experiments trying to disentangle these issues with causality and locality.
So sure, at the freshman physics level causality is a simple thing. But Nature does not follow almost any of those rules with certainty or unerring rigor. Those are approximations and simplifications.
The question is do you need a detailed causal explanation you can understand.
In the case of lift, you can simulate an aircraft model and calculate the change of vertical momentum of the particles going above the wing and below the wing. Usually 80% of the lift comes from above the wing; if you are modelling a brick, it's the other way around. But do aircraft designers need to know that, or are they just satisfied with less?
Aircraft designers are happy to test something in a wind tunnel and establish that the change in shape caused a change in performance. But they knew to try that particular new shape because of all the physical (causal) models at various levels of intuitive understandability.
But then the article indeed jumps from that observation to the claim that causation isn't something we should worry about - which feels just wrong to me (and I think a lot of people).
I mean, all sorts of intuitive explanations feel like causal explanations but, of course, are much more sketchy. But this feeling of causality matters, imo; the whole network of explanations that together satisfy a human seems to make things robust, not fragile, whereas AI predictions and models are renowned for being fragile. Still, I think a big part of the situation is that we haven't fully characterized what humans do here.
I'd argue that it absolutely is simple causality. Sure we have to flail around in the dark a little to get to a better plane, but they physically test things during development. That's simple causality right there. I guess I'm arguing that engaging in active (not watching someone else do it) trial and error is a case of causal reasoning.
That is, sure, they don't tell you why the specific lift properties are there. However, they do enable a practitioner to make a change and know what impact it will have on the lift.
This argument is complete nonsense. Navier-Stokes has a rigorous derivation based on extremely high-fidelity assumptions, such as conservation of mass, momentum, and energy. We understand these assumptions, and we understand the regimes in which using N-S would result in catastrophe (such as rarefied gases, relativistic velocities, etc.).
Neural networks require data; Navier-Stokes does not need to be 'trained'. Deep networks have very little a-priori knowledge baked in (from a Bayesian perspective there are intrinsic priors such as translation invariance). They are admittedly extremely useful, because they are high dimensional (and so are universal approximators) and can be trained efficiently.
Furthermore, you can develop an intuitive approach to many fluid flows. I can provide a much better estimate of the drag profile for a given wing geometry than an untrained person. No such analog is possible with deep nets, which are significantly more opaque in terms of dynamics and non-linear response.
The only way his comments make any sense is if you assume he isn't talking about physical models, like Navier-Stokes, and is instead considering turbulence models, such as RANS or LES. These are parameterized models used for turbulence modeling, and they offer little physical intuition. However, that is not the same as saying we do not have high confidence in physics-based systems such as Navier-Stokes.
Source: I have a PhD in CFD and several ML publications.
He's making the point that the ML field's obsession with causal inference (and causal discovery) is overrated, precisely because our gold standards of interpretable, safety-critical systems (airplane flight) are based on Navier-Stokes/CFD. Planes were made to fly based on empirical validation of these models, long before we gained a more detailed understanding of causality (the equations and models themselves are time reversible, implying that they contain only imperfect knowledge of causality).
And their success is so repeatable that if they fail once we make a Really Big Deal out of it. I don’t think any ML model is close to that level of engineering rigor, let alone deep learning.
Moving on to aerodynamics, we have a pretty good causal model and can simulate the system given a pattern of boundary conditions. Further, some people (who have studied aero/CFD) can even intuitively predict approximately what happens (otherwise we would have a hard time designing planes!). It just so happens that it's not as simple as high school physics, and cannot be compressed into perfect+simple rules (there's a trade-off between those two).
Speaking in the context of time reversibility, you are being fast and loose, and using the word “causality” in a sense that is irrelevant to the rest of your comment.
I think you missed the point. Navier-Stokes does not explain lift, nor does Bernoulli's law or Newton's third. Like the friction under an ice skate, it's a phenomenon we take for granted in science and engineering without a rigorous explanation.
>Neural networks require data. The Navier stokes does not need to be 'trained'
But you can, and not just conceivably but practically, train a neural net to approximate Navier-Stokes to a high degree of accuracy with a fraction of the computation of a full 3D numerical model. Yes, it will functionally be a black box, but LeCun seems to be arguing that its opacity doesn't make it any less valid than our opaque intuition of flight, which does not stop us from flying thousands of jets every day.
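As a toy illustration of the surrogate idea (not actual Navier-Stokes; `expensive_solver` below is a made-up smooth function standing in for a simulation run):

```python
import numpy as np

# Sketch of surrogate modeling: fit a tiny neural net to samples from an
# "expensive" function, then evaluate the cheap net instead. Everything
# here is illustrative; real CFD surrogates are far more involved.
rng = np.random.default_rng(0)

def expensive_solver(alpha):
    # Hypothetical smooth response, e.g. "lift" vs. angle of attack
    return np.sin(alpha) * (1.0 - 0.1 * alpha ** 2)

x = rng.uniform(-1.0, 1.0, size=(256, 1))   # sampled operating regime
y = expensive_solver(x)

# One hidden layer, trained by plain full-batch gradient descent
W1 = rng.normal(0.0, 1.0, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0.0, 0.1, (32, 1)); b2 = np.zeros(1)
lr = 0.05
for _ in range(5000):
    h = np.tanh(x @ W1 + b1)
    err = (h @ W2 + b2) - y
    dh = (err @ W2.T) * (1.0 - h ** 2)              # backprop through tanh
    W2 -= lr * (h.T @ err) / len(x); b2 -= lr * err.mean(0)
    W1 -= lr * (x.T @ dh) / len(x); b1 -= lr * dh.mean(0)

def surrogate(alpha):
    return np.tanh(alpha @ W1 + b1) @ W2 + b2

probe = np.array([[0.5]])
print(expensive_solver(probe), surrogate(probe))   # close inside [-1, 1]
```

The catch, as the replies below note, is the domain of applicability: the surrogate is only trustworthy inside the regime it was trained on.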
In physics and engineering, the challenge is to make mathematical models of the world that accurately predict how it works. Once we've done that we can declare success.
I agree with Yann that it is not useful to focus on "intuitive" explanations and that a fixation on qualitative causal reasoning is largely useless, e.g., very few physicists care about the "why" of quantum theory.
But there is a big difference between how we can explain how a plane flies and how we can explain how a neural network works:
- The airplane flying doesn't have a "simple" explanation but it can be explained based on fundamental physical principles and mathematics. These models seem to have nearly unlimited ability to extrapolate (within their known bounds of validity).
- The neural network giving accurate predictions can only (for the most part) be explained empirically. The models fail to extrapolate and worse, most of the time we can't even trust the model to know when it's extrapolating.
> But you can, and not just conceivably, but practically train a neural net to approximate navier Stokes to a high degree of accuracy with a fraction of the computation of a 3D finite model.
This is a lovely idea, but as someone currently doing research in this space (deep learning for CFD), I don't think it's true.
There are loads of neural nets that can accurately approximate Navier-Stokes within some limited domain of applicability (e.g., for flow over a parameterized set of airfoils), but so far nothing is close to matching the generalization power of Navier-Stokes.
With enough epicycles one can make the heliocentric model as accurate as one wants at forecasting the positions of planets. A system of DNN sequence models will do just as well, if not better.
Since we regard Einstein highly (at least more highly than an undergrad who can train a TensorFlow model on planetary position data), it makes me think there is more to physics than having a mathematical model with good predictive accuracy.
The only way I know to make neural nets work well for this problem is to build lots of physics into the model architecture, e.g., conservation of energy:
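A sketch of what "building in physics" can look like at the loss level (the `physics_informed_loss` name and all numbers are invented for illustration; real physics-informed architectures often enforce the constraint structurally rather than via a penalty):

```python
import numpy as np

# Penalize violations of a conservation law alongside the data-fit term.
def physics_informed_loss(pred_energy, target_energy, lam=10.0):
    data_term = np.mean((pred_energy - target_energy) ** 2)
    # Conservation penalty: total energy should stay constant in time
    conservation_term = np.var(pred_energy.sum(axis=1))
    return data_term + lam * conservation_term

# Two hypothetical predicted trajectories of (kinetic, potential) energy:
conserving = np.array([[1.0, 2.0], [1.5, 1.5], [2.0, 1.0]])  # total is 3.0
leaky      = np.array([[1.0, 2.0], [1.0, 1.5], [1.0, 1.0]])  # total drifts

print(physics_informed_loss(conserving, conserving))  # 0.0: fits and conserves
print(physics_informed_loss(leaky, conserving))       # penalized on both terms
```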
The argument for explainability depends on the risk of harm from an AI model's decision. Explaining why airplanes fly is moderately interesting, but why the 737 MAX crashed is much more interesting and needed. While the former is probably only needed by people studying aerodynamics, the latter is meant for passengers, regulators, airlines, governments, etc.
Here is a tweet thread we posted in the past: https://twitter.com/krishnagade/status/1182317508667133952
I nominate this as the worst analogy of the year.
Airplanes are rigorously tested under the same conditions they will operate in. AI by definition is tested under conditions that are different from the environment it will operate in, because that's the whole point of AI - we want algorithms that adapt themselves to novel information.
When people ask for explainable models, what they really want (in my opinion) is calibrated and robust uncertainty estimates .
Good uncertainty estimates would let them know when to trust a model's prediction and when to ignore it.
For example, a model trained to predict dog breeds should know nothing about cat breeds, and there should be some way to quantify when it doesn't know!
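One simple (and admittedly imperfect) way to quantify "I don't know" is predictive entropy; a sketch with made-up logits:

```python
import numpy as np

# Flag "I don't know" via the entropy of the softmax output. In practice a
# plain softmax is often still overconfident on out-of-distribution inputs,
# which is exactly why calibration techniques exist.
def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def flag_unknown(logits, threshold=0.8):
    p = softmax(logits)
    entropy = -np.sum(p * np.log(p + 1e-12))
    max_entropy = np.log(len(p))          # entropy of the uniform distribution
    return entropy / max_entropy > threshold

confident_dog = np.array([8.0, 0.5, 0.2])   # one breed clearly dominates
confused_cat  = np.array([1.1, 1.0, 0.9])   # nothing stands out
print(flag_unknown(confident_dog))   # False: trust the prediction
print(flag_unknown(confused_cat))    # True: abstain
```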
I've been doing a review of techniques that are becoming more popular in this area:
The underlying reason why high confidence is not enough is that even strong, confident correlations can be misleading when seen in a causal light. A black box model trained to predict credit performance might be very confident in rejecting loans for applicants from "poorer" zip codes and approving those from "richer" zip codes, even though those are not actual causes. Somebody could therefore exploit the system by renting an address in a rich neighborhood for a couple of months when taking out a big loan (the analogue of adversarial examples).
Your example points to models that provide low quality uncertainty estimates, but that's not true for all deep learning models.
I believe it's these low quality uncertainty estimates that lead people to look toward "explainability" as a solution, but for the majority of use cases, I think people just want better uncertainty estimates so that they can "know when their model doesn't know".
There are techniques now to get higher quality, calibrated, uncertainty estimates that don't suffer from the problems you mentioned and I've outlined these solutions in my posted link above.
Additionally if you're interested, there is some nice recent research from google on the subject:
and from oxford:
> calibrated and robust uncertainty estimates

NOT low quality and uncalibrated uncertainty estimates.
Could you explain what you mean by “calibrated” and briefly summarize the essential idea behind what allows the learning of robust uncertainty estimates, if not a causal understanding?
If you haven’t already, look up work by Schölkopf, Janzing, Peters and co. (over the last decade) for a justification of why causal reasoning is exactly what you want if you want to generalize across covariate/dataset shift (which is basically what the Google blog post is about).
Considering that ML systems have access to orders-of-magnitude greater quantities and resolution of information than we do, why do we expect their decision structure to be closely linked to our own decision process?
My counterpoint is that when a model trained on dog breeds misidentifies a breed with a high certainty factor, the model should be able to justify (explain) why it made that judgement.
All the complications come in the exact details of how they deflect air down. How much comes from lower-than-ambient pressure above the wing redirecting the slipstream versus higher-than-ambient pressure below the wing doesn't fundamentally change the answer, though those details certainly matter (especially when designing a wing).
In contrast, even for a single "AI", how it responds differently to different input is unlikely to be even remotely explainable by the same high level principles, and it's not clear that the details don't matter.
Now granted even with that magic, there would be downward air movement relative to the airfoil, due to the airfoil rising, but that would be an effect of the lift, not a cause.
Now I could well be overlooking something here, and having invoked magic to violate conservation of momentum, this thought experiment is not rigorous, and is discounting the fact that the equal time hypothesis is unfounded, and even wrong. (But in reality the air above moves faster, so we would see a larger effect than predicted, even before accounting for redirected airflow.)
Of course there is no magic canceling out any downward momentum of the air from the top of the wing, and planes normally have a nose-up attitude, so the bottom of the wing also directly deflects air downward; downward directed airflow is definitely also a cause of lift.
LeCun is also heavily biased against Bayes.
The point is though, the functions we're trying to learn, like image classifiers, have no responsibility to us to be understandable. In fact, the brightest minds of several generations have worked hard on trying to write down rules to do image classification, and they never came up with anything that works.
There is a huge space of functions that are beyond what a human can understand, where we can't write down rules to express the function. This is precisely where we need to use machine learning to find the functions. Lack of explainability is not a bug, the entire point of a neural nets is to find functions that are beyond what we can understand.
There are a whole set of mathematical functions that are beyond the range of a typical human mind to understand. But there are proofs that exist that explain how they are correct.
I don't think it's a big ask for a new model to be able to justify its own determinations.
Of course, that makes things slightly more difficult for all those in research and those selling snake oil and everyone in between. But that's what the difference between science and alchemy is isn't it?
You are a black box AI. I can nevertheless trust you to classify dogs vs cats.
We're talking about algorithms that are based on a simplified neural architecture, with no redundancy and no self-reflection, and that are still quite immature.
Nevertheless we're being asked to trust a black box AI, that you cannot interrogate?
Yes, of course, what's the worst that can happen?
On the other hand, if you place Magnus Carlsen against AlphaZero in a game of chess, I will bet on AlphaZero. If however you reduce the complexity of AlphaZero down to a level that it can produce an explanation I can understand, I would instead bet on Magnus Carlsen.
Of course we should care about the quality of AI systems, but chasing a human understandable explanation is just the wrong way to go about it, since it in many cases necessarily limits quality of the decisions.
You don’t need to reduce complexity to induce explainability. You just need to decompose the function into smaller parts which you can understand.
Contrastive LRP, for example, is a function decomposition technique for explaining deep neural networks with high fidelity.
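For a flavor of how relevance propagation works, here is the basic LRP epsilon rule for a single linear layer (a minimal sketch of the underlying idea, not Contrastive LRP itself; the numbers are made up):

```python
import numpy as np

# LRP epsilon rule for one linear layer: relevance flowing into the layer's
# outputs is redistributed to its inputs in proportion to each input's
# contribution z_ij = a_i * w_ij.
def lrp_epsilon(activations, weights, relevance_out, eps=1e-6):
    z = activations[:, None] * weights                    # contributions z_ij
    denom = z.sum(axis=0) + eps * np.sign(z.sum(axis=0))  # stabilized totals
    return (z * (relevance_out / denom)).sum(axis=1)      # relevance per input

a = np.array([1.0, 2.0])                 # input activations
W = np.array([[1.0, 0.0],                # 2 inputs -> 2 outputs
              [0.0, 3.0]])
R_out = np.array([1.0, 6.0])             # relevance at the outputs
R_in = lrp_epsilon(a, W, R_out)
print(R_in)   # relevance is (approximately) conserved: sums to ~7
```

Applying this layer by layer, from the output back to the pixels, yields a per-input relevance map.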
Here's a paper that you may not have read.
You can't know something if it isn't understandable. Knowledge requires some sort of understanding.
That is an exact, reproducible procedure that tells you how the neural network works. But you are a human being, with a short-term memory capacity of about 7 items, and 100 million parameters is too much for a human to really understand.
The point is that there are strong reasons to believe that no procedure for classifying cat vs dog is small enough that humans can wrap their heads around it. And why is this a problem? The human vision system is exactly the same, complete black box, yet we rely on it every day.
It shares some features, I grant you, but decoding cats/dogs by welding a classifier to the equivalent of V3/V4 isn't what a mammal does.
Furthermore: a "conscious" short-term memory of 7-10 items is correct, so we break issues down into manageable chunks, and it's turtles all the way down.
Comparing the product of >200M years of evolution vs a decade or so of human endeavour is a strawman.
Contrastive LRP would be a good starting point for you for generating high fidelity explanations at any point in the network.
However, for many that argue for explainable AI, CLRP falls way short of what they want. In particular, the symbolic AI crowd would scoff at it. This is the crux of the issue in my eyes, that the symbolic AI crowd has taken "explainability" as a way to justify methods that don't work.
I have no issue with methods that allow greater understanding of neural net internals, that's essentially what all neural net researchers spend all their time on (and it's the path towards better performing methods).
My friend, you have not even scratched the surface. First off, an elucidation of the inputs and the procedure by which an output is generated does not an explanation of the system make.
When I look at an image classifier, I want to know what features it's using to make a determination of being cat-like. That way, I can compare that with my own experience to make sure that if I cut someone loose with a cat/dog detector, no one gets the idea that a young bear is a dog. Your trivial AI cat/dog detector may identify cat/dog-like features in a still, but that's not equivalent to being able to distill the essence of cat/dog from the reality and common experience of the world around us. If you're going to try to sell me on a system that purportedly knows what something is, I expect it to actually represent the level of intelligence you make it out to possess. The neural networks we manufacture are orders of magnitude narrower than what ML people seem to want the lay person to give them credit for.
Think of it this way:
As a programmer, I am expected to be able to create an accurate enough representation of what is going on in a complex system that a non-programmer can connect what the system is doing to whether or not it is doing what it should be. Given enough time and patience on the non-programmer's part, I should be able to transfer and walk through enough information where the non-programmer suddenly becomes a novice programmer because they have had the same foundational skill and knowledge structures communicated to them.
No one will be satisfied with "I chucked this data in, therefore it's a cat/dog detector now. No more questions." Especially when you start applying that to decisions of life-altering importance. You must be 10% smarter than the piece of equipment for it to be the lynchpin in a life-critical application. That means being able to explain what your system does, how changes to inputs will affect it, what its error margins are, what the safe operating conditions are, when it's plain flat out wrong, and as much as possible, why.
Until such knowledge can be sufficiently communicated, I see no reason to take even the most well-known luminary trying to handwave explicability as anything but trying to avoid having to uncover enough of the mystery of what they are working on in order to meet what has been accepted as sufficient due diligence.
To do so is patently unwise, and implicitly accepts far more egg breaking to make an omelette than we (those whose lives will be in the system's hands) should be willing to entertain.
That was exactly my point. Are we talking past each other?
My point here, if you wish to engage with it, is that when we evaluate trust in an AI system, we care about how good it is. And it is the case that quality is very often anti-correlated with explainability.
Suppose your life depends on winning a game of Go. Would you want AlphaZero on your side, or a Go engine that would present you with a list of the options it evaluated, so you can verify its decision? Of course, AlphaZero would beat the latter program every time.
If this desire for explainability is taken seriously, the result is that we'll end up picking methods that perform worse, and this will cause real world harm as AI becomes a larger part of life critical systems.
People will keep asking "why" and keep escalating, but that's a flawed approach to evaluating a model. Every answer is literally constructed and thus can't be trusted on its own. The best that an evaluator can do is to evaluate the training dataset.
Well argued! Yes, it is true that we don’t have causal relationships for a number of phenomena. But the absence of evidence is not the evidence of absence. So corner cases of failure may exist in all these phenomena which can cost lives. Does this mean that we halt the fast paced progress of AI research or any other scientific pursuit? No! But leveraging empirical evidence that can’t be fully explained in situations where lives are at stake, should require a very high bar of regulation. Responsible scientists and engineers agree that causality is important to understand and they do all they can to understand how systems work. However there are likely many among us who do not employ similar standards to the application of ill understood techniques. When it comes to regulation, we must pay heed to the worst in us.
Actually, there is no clear definition of causality in classical and quantum physics, simply because the elementary equations of motions are time reversible (if you also reverse charge and parity). For every phenomenon that can occur, the same phenomenon can occur backwards in time (with corresponding anti-particles).
Take a Feynman diagram where an electron and a positron annihilate to produce a gamma ray photon. It can be interpreted as a gamma ray photon spontaneously creating an electron-positron pair. It's the same diagram where time goes right to left instead of left to right.
How can one possibly say that A causes B, when B could very well have caused A in a time-reversed viewpoint?
Even worse, most physical phenomena have loopy causal graphs. Motion "causes" friction. But friction limits the velocity of motion. Most differential equations have coupled terms with loopy interactions in which quantity x affects quantity y and vice versa. You rarely have just y(t+dt)=f(x(t)) in physics. More often than not, you have the coupled equations y(t+dt)=f(x(t)) and x(t+dt)=g(y(t)).
In these (frequent) cases, x cause y, but y also causes x.
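The motion/friction loop is easy to see in discretized form (a forward-Euler sketch with made-up numbers):

```python
# Coupled update: velocity "causes" friction, and friction "causes" the
# change in velocity. Neither is the cause in isolation; they form a loop.
dt, steps = 0.01, 1000
v, x = 10.0, 0.0      # initial velocity (m/s) and position (m), made up
mu = 0.5              # friction coefficient, made up

for _ in range(steps):
    friction = -mu * v        # friction caused by motion...
    v = v + friction * dt     # ...and motion reduced by friction
    x = x + v * dt

print(v, x)   # velocity decays toward zero as the loop plays out
```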
There is something like that going on in fluid dynamics, which is why it's difficult to come up with "simple" causal explanations.
Only when collective phenomena are considered does the "arrow of time" appear to have a definite direction (with the 2nd law of thermodynamics).
This is super misguided. Causation doesn't need physical grounding. The usual definitions are grounded in counterfactuals and interventions.
It's like defining the causal effect as the difference in potential lines of code I write today, depending on whether I do or don't eat breakfast this morning, but then arguing that under classical mechanics the universe is deterministic so I couldn't possibly take an intervention to not eat breakfast. Really, it's just another argument that counterfactuals are counter factual, and therefore don't have a clear definition. But yeah, that's in the name already.
Regardless, physical equations are pretty much all causal, because they are stable under intervention. If I intervene to do whatever, they still hold. That lets me cause a change in friction by intervening and swapping out a material.
For example, looking at whether I write this comment now instead of getting back to work after lunch, you could imagine that I'm more inclined to slack off after a heavy meal. We have two hypotheticals. Scenario 1) I eat curry for lunch. Scenario 2) I eat salad for lunch. In scenario 1, I am a bit sleepy and log on to HN and write this. In scenario 2 (hypothetically) I just get right back to work and this comment never exists. The causal effect (hypothetically) is that having curry instead of salad for lunch made this comment exist.
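The two-scenario comparison can be written down as a toy potential-outcomes calculation (all numbers here are hypothetical): each lunch has an outcome under curry and an outcome under salad, only one of which is ever observed, and randomizing the "treatment" recovers the average effect.

```python
import random

random.seed(0)

# Hypothetical potential outcomes for 1000 lunches, as pairs of
# (comment written if curry, comment written if salad). In about 30%
# of lunches, curry would produce a comment and salad would not.
units = [(1, 0) if random.random() < 0.3 else (0, 0) for _ in range(1000)]

# The average treatment effect, computable only because we wrote down
# both potential outcomes for every unit (impossible in real life,
# where at most one outcome per unit is observed).
true_ate = sum(curry - salad for curry, salad in units) / len(units)

# Randomized experiment: flip a coin per lunch, observe one outcome.
treated, control = [], []
for curry_outcome, salad_outcome in units:
    if random.random() < 0.5:
        treated.append(curry_outcome)
    else:
        control.append(salad_outcome)

estimate = sum(treated) / len(treated) - sum(control) / len(control)
print(abs(estimate - true_ate) < 0.1)  # randomization recovers the ATE
```

The same formalism doesn't care which direction time runs; it only needs well-defined potential outcomes and a way to intervene.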
Compare that to the effect of the comment I wrote before lunch. We can still talk about the causal effect of lunch on what I wrote before lunch. Assuming that physics works the way I think it does, the causal effect is that nothing changes, but it's still a thing you can formalize. You could imagine an episode of star trek involving time travel, where my actions today do affect things yesterday. Starfleet statisticians could still run time-reversed randomized trials, they'd just have to be really careful about their experimental designs.
> We can still talk about the causal effect of lunch on what I wrote before lunch.
To the extent that the situation occurs in the physical universe, that is meaningless. The future can never be a causal parent of the past. That said, there is much more to the typical causality question than just the direction of time.
The difference between the “time reversibility” in physics and the typical ML example is that once you’ve decided on something as the effect, the other thing must be the cause. So the problem is quite simple if you’re not trying to figure out the nature/direction of time, and especially not with microscopic physics.
Anyone curious about the relation between time reversibility in physics and causal reasoning could look at Schölkopf and Janzing (2016) for an interesting line of thought.
Somebody should tell Yann Lecun that a little knowledge is dangerous.
That's a testable statement. The reason it's testable is because we can define a hypothetical causal effect of the present on the past, and then do experiments and show that it's always zero. I only mean that we can talk about it, not that we can show it to be anything but a non-effect, so I'm not sure I see where we're in disagreement.
There is an old joke about a mathematician and an engineer invited to kiss someone they both fancied if they got there first. The catch was they could only move half the distance towards the inviter each period.
While the mathematician fretted about Zeno's paradox preventing convergence, the engineer got close enough to the inviter, eventually, to take them up on the offer.
LeCun is the mathematician in this analogy.
Note: he responded to Mukherjee's comment here. Why?
Mukherjee made a comment.
LeCun then responded.
The causal reason that LeCun responded is because Mukherjee commented.
There may be a host of associated factors -- perhaps LeCun only monitors comments while bouncing on a pogo stick in a large conference room and simultaneously sipping a latte. It doesn't matter. Because even if LeCun were bouncing on a pogo stick in a conference room and sipping his latte, he would not have made the response had Mukherjee not first commented.
So, we have clear existence of causality. If need be, we could build an entire framework with that observational evidence.
So why not just stipulate that the equation is one-directional, and that there is another (inverse, but otherwise identical) equation for the reverse process?
"How does lithium treat bipolar disorder?"
The thing is, this isn't an insignificant question. There has been quite a bit of debate about whether various drugs reverse the basic process underlying a given psychiatric disorder or whether they produce a different change that allows a person to function. The phrase "chemical imbalance" has been based on the supposition that various drugs that change brain chemical distributions "cure" various conditions - but the question of whether these drugs are directly reversing a condition or adding something else seems important, even if we assume the drugs are broadly useful for helping people function in society.
Firstly, the Navier-Stokes equations existed before mechanized flight was invented. They were not devised specifically to reason about flight, in the way that some of the deep learning "theory" is being projected. They have also worked remarkably well (at subsonic speeds) for almost every situation involving flight.
ML has time and again proven to not be internally consistent, with contradictions on nearly every corner.
Batch norm was "theoretically" thought to be an essential part of neural networks, until suddenly it wasn't. Dropout was considered essential to avoid overfitting, until it wasn't. We still do not know whether an information bottleneck is good or bad. ADAM's formula was incorrect and no one realized that for six years.
> The mechanism of action of many drugs (if not most of them).
> An example? How does lithium treat bipolar disorder?
This example does more to disprove his point than support it. Medical and nutritional sciences are among the least understood fields out there, with some of their "fundamentals" flipped on their heads over the last 50 years. The only reason these "sciences" continue being used is because medicine is essential. A reasonably effective solution with side effects is still better than dead people. It is a begrudging compromise, not an example to be emulated.
AI is not essential. AI imposes on your life without consent. AI will soon be ingrained into every single aspect of your life.
Yann seems to be conflating explainability with causality. Explainability can also mean fully observed correlation. Explainability can mean predictable and reproducible behavior of ML models given a hypothesis. Explainability can mean the ability to ascertain whether a change in model performance was because of the hypothesis or a way to exploit an unintended aspect of the data/model architecture.
Explainability fundamentally allows ML researchers to make strides in the field in a meaningful way, and not blindly throw different computational structures at thousands of GPUs and let luck of the draw decide what works and when.
Looking back at ideas such as Transformers and ResNets, there was literally no way for the authors to guess that these new computational structures would revolutionize the field. They could easily have been ideas someone else had already tried, or left to rot in someone's TO-TRY list. Explainability and some theoretical logic around NN development would allow for a systematic way to go about research in the ML community as a whole. That's unlikely to happen, but I would rather see people strive for it than not.
To him, all the precursors to clinical trials (selection of a molecule from a restricted set of molecules which satisfy certain causal criteria), followed by extensive multi-million-dollar experiments to safeguard against the last 1-2% of uncertainty, is equivalent to a barely preprocessed neural network.
I mean let’s ignore the basic principles of pharmacology and medicine and just run every possible molecule through humans since that is a valid approach according to him.
For those of a more mathematical bent, check out "Causality" by the same author, or "Causal Inference for Statistics, Social, and Biomedical Sciences" by Imbens and Rubin.
But it's so easy to prove gravity's properties (on Earth) by consistent results with experimentation, that it's literally a science project in every elementary school.
We get a glimpse of it from recent language models: it has become rather easy to blurt out language that is convincingly and comfortingly coherent, and it can be nudged to point in one direction or another. That doesn't mean it's true.
It takes that long and costs that much because there is no safe way to do this with human physiology other than to carefully, slowly and expensively try it and measure the consequences.
We can take any AI system and model its behaviour with statistical certainty much more cheaply and quickly, and be more confident in its future behaviour.
Remember, more often than not AI and ML are fancy words for some form of regression. If we run hundreds of millions of tests of such a system in simulation for a very wide range of contexts/inputs (which is cheap to do), we can have a much higher degree of confidence in a short space of time about the behaviour of that system than we will for any drug test, even if we still don't fully understand the causality.
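A toy version of that argument (the "model" and spec here are hypothetical stand-ins): sample a wide range of inputs cheaply and bound the observed failure rate empirically.

```python
import random

# Sketch of simulation-based confidence: hammer a model with random
# inputs drawn from a wide context range and measure how often it
# violates its spec. Model and spec below are hypothetical stand-ins.
def failure_rate(model, check, trials=100_000, seed=0):
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        x = rng.uniform(-10, 10)
        if not check(x, model(x)):
            failures += 1
    return failures / trials

# Stand-in "model": should always return a value within 1 of its input.
model = lambda x: x + 0.5
spec = lambda x, y: abs(y - x) <= 1.0
print(failure_rate(model, spec))  # 0.0 across 100k sampled contexts
```

Even with zero observed failures, this only bounds the failure rate statistically; it says nothing about inputs outside the sampled distribution, which is the crux of the next point.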
But we are quickly moving beyond the types of applications where you can simply test in simulation. Self driving cars are probably right at the edge of this. It's what comes next that worries me. And I don't pretend to know what it will be.
So even when we do not understand the precise impact of drugs on humans and there is no safer mechanism to test, we leave only 0.5% of candidate molecules to empirical/statistical evidence in the form of clinical trials.
On the other hand if drug discovery was treated as a pure AI problem, we would have thousands of unverified and unsafe molecules in clinical trials.
Causal principles get us to 99.5% of the way in drug discovery. Unfortunately not so in AI.
You're still left with double-blind trials and having to get large sample groups to try those molecules though.
And it's for that reason that drug discovery is always likely to be quite slow, complex and expensive - the efficiency gains will be pushed towards the top of the funnel to make new ideas reasonable to explore, I would imagine.
My point was that when you're not dealing with human physiology and instead dealing with problems that are more tractable through AI - i.e. using regression to tune algorithms through patterns in data - you are going to get quicker and more impactful returns without the same complexity.
And - critically - it's OK to often trust the AI solution you have without understanding causality. If you later find it's doing something odd that is undesirable, you can use that data to help tune the algorithm again without having to understand the causal relationship.
Put another way, you can teach an AI to get better without necessarily understanding the subject completely yourself.
Finding something odd in an algorithm (especially a deep neural network) is hard because they fail in just so many ways. For example, LeNet on MNIST almost always gives high-confidence predictions for random tensors (torch.randn). Most ImageNet models fail in the presence of just 20-30% salt-and-pepper noise. (Both of these are problems solvable through simple preprocessing techniques.)
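For the salt-and-pepper case, the "simple preprocessing" fix is essentially a median filter; a minimal pure-Python sketch:

```python
from statistics import median

# 3x3 median filter: the classic preprocessing fix for salt-and-pepper
# noise. Each interior pixel is replaced by the median of its window,
# which discards isolated extreme values.
def median_filter(img):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out

# A flat gray patch corrupted by one "salt" pixel (255) and one
# "pepper" pixel (0); the filter restores both to the background value.
noisy = [[128] * 5 for _ in range(5)]
noisy[2][2] = 255   # salt
noisy[1][3] = 0     # pepper
clean = median_filter(noisy)
print(clean[2][2], clean[1][3])  # 128 128
```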
There's a lot of good points in there from both sides, but what really stuck with me is that given the choice between an explainable model and a black box model that works better (more accurate predictions), most people choose the black box model.
And there is no reason to expect that a deep learning model has to be unexplainable. He’s putting up a mystical interpretability vs accuracy tradeoff which does not exist.
There is a ton of research out there making deep neural networks slightly to significantly interpretable.
People love to be told what to do. Even in the face of overwhelming real-world evidence, people blindly follow their GPS, and ignoring it can cause a lot of stress.
The Deductive Nomological Model (Hempel and Oppenheim, 1948) tries to explain a phenomenon using a deductive argument where the premises include particular facts and a general lawlike statement (like a law of nature) and the conclusion is the thing to be explained.
The Statistical Relevance Model (Wesley Salmon) attempts to fix some shortcomings in the DN model that allowed explanations using particular facts and general laws that were not at all relevant to the phenomenon being explained. The idea is that you can explain why X hasn't become pregnant by saying that X has taken birth control, and people who take birth control do not become pregnant, and that would fit the DN model, but this explanation is not statistically relevant if X is male.
Unificationist accounts (Philip Kitcher) seek to unify scientific explanations under a common umbrella, as was done with, e.g., electromagnetism. If it is possible to have a unified theory of something, each element becomes more explainable based on its position within that unified theory.
Pragmatic and psychological accounts tend to fit more closely with the kinds of rationalizations that we've seen as some explanations of AI. They can be fictional, but they don't have to be.
IMO we don't currently have an adequate account of explanation within the philosophy of science that works for deep neural networks. This is what my dissertation research focuses on.
I also see this very dogmatic mindset that Deep Learning will do both prediction and interpretability.
What is stopping you from building two models: a statistical regression model for interpretability/explainability and a deep learning model for prediction?
For example, each coefficient in a regression has a t-test for the significance of its correlation with the response. You don't have anything like that in deep learning. Also, I've seen many MLers use logistic regression as a classifier while ignoring the probability aspect; the Titanic dataset highlights the different mindsets of statisticians and ML people. ML will often see this as a classification problem: dead or not dead. A statistician will phrase it as "What's the probability of this person dying with these covariates?"
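As a concrete sketch of that coefficient t-test (toy data, pure Python): for simple linear regression y = a + b*x, the slope gets a t-statistic b / se(b) that is compared against a t distribution with n-2 degrees of freedom.

```python
import math

# t-statistic for the slope of a simple linear regression y = a + b*x.
# A large |t| means the slope is significantly different from zero.
def slope_t_stat(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(resid_ss / (n - 2) / sxx)  # standard error of slope
    return b / se_b  # compare against t distribution with n-2 df

# Toy data, roughly y = 2x with small noise (all numbers illustrative).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
t = slope_t_stat(xs, ys)
print(t > 2.45)  # True: |t| is far above the 5% critical value for 6 df
```

Nothing in a deep network's weights comes with an analogous, distributionally justified significance statement per parameter.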
You know why this matters? It really matters in the health/medical/social sciences. Often, inference is what they want: they need to know what affects your health, not just to shove in tons of data and covariates/features. Not only that, you may not even have enough data for these data-hungry ML models.
Another example: biostatisticians figure out the threshold between the benefit of taking an invasive procedure versus not taking it. We figure it out by giving a percentage, and the doctors and experts will tell you where the threshold is: 20%? 40%? It's certainly not the 50% default that many MLers use.
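That threshold falls straight out of the relative costs of the two errors; a minimal sketch with hypothetical cost numbers:

```python
# The probability cutoff for recommending an invasive procedure follows
# from the costs of the two mistakes, not from a default 0.5. Treat when
# p * cost_missed_disease > (1 - p) * cost_unneeded_procedure, i.e. when
# p exceeds c_fp / (c_fp + c_fn). All cost numbers are hypothetical.
def decision_threshold(cost_unneeded_procedure, cost_missed_disease):
    return cost_unneeded_procedure / (cost_unneeded_procedure + cost_missed_disease)

# If missing the disease is four times as costly as an unneeded
# procedure, the procedure is warranted at 20% risk, not 50%.
print(decision_threshold(1.0, 4.0))  # 0.2
```

This is why the statistician's "what's the probability?" framing matters: you can't re-derive the threshold from a hard dead/not-dead classification.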
> We often hear that AI systems must provide explanations and establish causal relationships, particularly for life-critical applications.
Yes, that can be useful. Or at least reassuring.
To me this is just an excuse not to learn statistics. He should really look into propensity modeling under the Rubin-Neyman causality model. This is where statistics is going, after regression, for observational data.
For all the criticism I have of ML, I think it's just the mindset. ML algorithms have a place and they're very good in certain domains such as NLP and computer vision. But to act as if they're the end-all be-all, when statistical models have been there and used extensively in the biostatistics and econometrics fields, is just hubris and ignorance.
While ML is making excuses about causality, econometricians and statisticians are working to build causal models. IIRC, econometrics is doing structural equation modeling while statisticians are going for the Rubin-Neyman model. There is debate over which model is better, but that's ongoing; we'll wait and see from all the research papers.
> Please submit the original source. If a post reports on something found on another site, submit the latter.
The concept of what counts as a causative explanation can be more expansive, and it varies between disciplines. See the work of Nancy Cartwright.
TL;DR we've been explaining "causes" without Bayes Nets for a while; Bayes Nets unsubtly disregard the common-sense logic scientists use in their practice, including the way that explanations tend to be qualified by context.
Our overestimation of the comprehensibility of the world may very well be some version of the Drunkard's search principle. We are much more likely to know about what's comprehensible than what's not.
Compare with efficiency vs efficacy.
"We often hear that AI systems must provide explanations and establish causal relationships, particularly for life-critical applications.
Yes, that can be useful. Or at least reassuring.
But sometimes people have accurate models of a phenomenon without any intuitive explanation or causation that provides an accurate picture of the situation. In many cases of physical phenomena, "explanations" contain causal loops where A causes B and B causes A.
A good example is how a wing causes lift. The computational fluid dynamics model, based on Navier-Stokes equations, works just fine. But there is no completely-accurate intuitive "explanation" of why airplanes fly.
Is it because of Bernoulli principle?
Because a wing deflects the air downwards?
Because the air above the wing wants to keep going straight, but by doing so creates a low-pressure region above the wing that forces the flow downwards and sucks the wing upwards?
All of the above, but none of the above by itself.
Now, if there ever was a life-critical physical phenomenon, it is lift production by an airliner wing.
But we don't actually have a "causal" explanation for it, though we do have an accurate mathematical model and decades of experimental evidence.
You know what other life-critical phenomena we don't have good causal explanations for?
The mechanism of action of many drugs (if not most of them).
An example? How does lithium treat bipolar disorder?
We do have considerable empirical evidence provided by extensive clinical studies.
This is not to say that causality is not an important area of research for AI.
But sometimes, requiring explainability is counterproductive."
There are two types of explanations here: (1) why did the data come to be as it is, (2) why did my ML make the prediction it did.
Science looks for the answer to (1), and causal models are a great way to think about it. Science and engineering, when they go hand in hand, build a machine by saying "Here is data, let me do science to understand nature's underlying laws, and my machine shall be based on those laws". The machine is inherently explainable because it's based on scientific laws.
In the ML world, we can bypass the "learn scientific laws" part, and jump straight to "build a machine based on data". So the best answer to (2) has got to be "my ML made the prediction it did because of its training data". As Pearl said, ML is just curve fitting, so the only way to "explain" a ML prediction is to say "here are the points that the curve was fitted to". Prediction is just reading a value off the curve. Think the machine is biased? Look for bias in the training dataset! Think the machine is inaccurate? Look for sparsity or conflict in the training dataset!
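In its most literal form, that kind of explanation is trivial for a nearest-neighbor "curve fit": the explanation of a prediction is exactly the training points it was read off from. A toy sketch:

```python
# "Explain by pointing at the training data": for a k-nearest-neighbor
# regressor, a prediction is the average of the k nearest training
# points, so those points ARE the explanation.
def predict_and_explain(train, query, k=3):
    # train: list of (x, y) pairs; returns (prediction, supporting points)
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    prediction = sum(y for _, y in nearest) / k
    return prediction, nearest

train = [(0, 0.0), (1, 1.1), (2, 1.9), (3, 3.2), (4, 4.0)]
pred, support = predict_and_explain(train, query=2.4)
print(pred, support)  # the "explanation" is the three nearest points
```

For deep models the curve fitting is far less local, but the same principle applies: the answer to "why this prediction?" bottoms out in "because of these training data".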
So the consequence of LeCun's distinction is that, when the GDPR calls for explainability of ML decision making, it is really calling for sharing of the training data. Facebook, watch out!