How Judea Pearl Became One of AI's Sharpest Critics (theatlantic.com)
308 points by stevenwoo 9 months ago | 121 comments



I really like that people asking fundamental questions about how AI can be pushed forward beyond just ML seem to be gaining a little traction again, at least in the public eye. Even Kissinger opined on the topic just a few days ago:

https://www.theatlantic.com/magazine/archive/2018/06/henry-k...

I agree with Pearl that there's something deeply misguided about thinking that intelligence truly can be solved as a data optimization problem. I'd be happy if we'd see more research about AI at a symbolic level again.

The biggest hindrance seems to be funding. ML is really successful in a lot of commercial domains, general AI is a big moonshot. With most researchers moving from long term university positions to the business sector I'm concerned about this sort of research, not just in computer science.


> I agree with Pearl that there's something deeply misguided about thinking that intelligence truly can be solved as a data optimization problem.

When all you have is a hammer... I think we’re really thinking about the whole thing backwards, as if intelligence only comes from the cortex of the brain. I think human level intelligence is fundamentally an emotional state of being - our core lizard brain values aren’t something to be swept aside - they’re a fundamental part of real intelligence. It’s like trying to build a car starting from the body and electronics rather than the motor.


Well, if a computer scientist only has a bunch of data, is there any conceivable way to reason about that data that doesn't ultimately boil down to a data optimization problem?

I hear that correlation does not imply causation but also that you can't distinguish correlation from causation merely with a stream of data. Is there any way out of this situation with just data?


I think apart from the work Pearl and others did with the do-calculus, all you can do is suggest experiments. You can only ever have a true ai if it can interact and ask questions about the environment.


Well, even if an AI can interact with its environment, how can those interactions not simply also be correlations between input and output? We have, for example, recurrent neural networks, which seem to do this already.

It seems like saying "it's not optimization" has to boil down to "it's not generic optimization, it's a special kind of optimization." But maybe there's some formalism I'm missing.


As a thought experiment, imagine an AI is trying to work out whether a rooster crowing makes the sun rise or the other way around (because they are always correlated in its experience). It can run the experiment of putting the rooster in a bag and seeing whether the sun still rises. It can refine its model after doing this experiment (all with optimization). It is more about untangling correlations that come from latent variables that the system doesn't know about yet.


Well, that sounds an awful lot like an optimization problem, and is actually reminiscent of a Machine Learning technique called Active Learning: the model suggests data points for which it has low confidence (say, images that resemble both a cat and a dog according to its own understanding), and a human labels those (or in your case, the AI itself runs an experiment to get the ground truth) to gain the most information.
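
A minimal sketch of that query loop, under loud assumptions: the task is a hypothetical 1-D one in which everything at or above a hidden threshold is a "dog", and `HIDDEN_THRESHOLD`, `oracle`, and `active_learn` are illustrative names, not part of any library. The learner always asks about the point it is least sure of, which here is the midpoint of the interval that could still contain the threshold.

```python
# Hypothetical 1-D task: points at or above a hidden threshold are "dog",
# points below it are "cat". The learner only knows the threshold is in [lo, hi].
HIDDEN_THRESHOLD = 0.37  # assumed ground truth, never shown to the learner


def oracle(x):
    """Stands in for the human labeler (or for running the experiment)."""
    return x >= HIDDEN_THRESHOLD


def active_learn(n_queries):
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        query = (lo + hi) / 2  # the point the learner is least sure about
        if oracle(query):
            hi = query  # threshold is at or below the query point
        else:
            lo = query  # threshold is above the query point
    return lo, hi


lo, hi = active_learn(10)
print(f"threshold lies in [{lo:.4f}, {hi:.4f}] after 10 labels")
```

Each label halves the remaining uncertainty, so ten labels do the work that roughly a thousand evenly spaced labels would. Note that nothing here deals with hidden confounders; the query only fills in a label the learner lacks.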


It's not quite the same as active learning. Experimentation and active learning are quite different in ways I didn't fully appreciate at the beginning of my career, and they come in the form of (conditional) independence relations, which you can get around with either structural assumptions or randomized intervention.

It's the entire reason why we randomize in experiments. In active learning, the machine doesn't know what happens when the rooster is in a bag, so decides to try it. In a randomized experiment, the machine knows what happens when the rooster is in a bag, but thinks there might be other factors at work, so it decides to try it in a way that it can be sure that all other factors are equal (at least on average).
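
To make the "all other factors are equal on average" point concrete, here is a small simulation sketch. Everything in it is an assumption chosen for illustration: a hidden binary factor that raises the outcome by 2.0, a true treatment effect of 1.0, and the made-up names `TRUE_EFFECT` and `run_study`. In the observational arm the hidden factor also drives who gets treated; in the randomized arm a coin flip decides.

```python
import random

TRUE_EFFECT = 1.0  # assumed causal effect of the treatment on the outcome


def run_study(n, randomized, seed=0):
    """Return the difference in mean outcome between treated and untreated units."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        hidden = rng.random() < 0.5  # latent factor the analyst never observes
        if randomized:
            treat = rng.random() < 0.5  # coin flip, independent of everything else
        else:
            treat = rng.random() < (0.8 if hidden else 0.2)  # hidden factor drives uptake
        outcome = TRUE_EFFECT * treat + 2.0 * hidden + rng.gauss(0.0, 1.0)
        (treated if treat else control).append(outcome)
    return sum(treated) / len(treated) - sum(control) / len(control)


observational = run_study(50_000, randomized=False)
experimental = run_study(50_000, randomized=True)
print(f"observational estimate: {observational:.2f}")
print(f"randomized estimate:    {experimental:.2f}")
```

Under these assumptions the observational difference should come out well above the true effect (about 2.2 in expectation), while the randomized one should sit close to 1.0, because the coin flip balances the hidden factor across both groups.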


To be considered a generalized AI, it does not just have to run experiments, it has to create hypotheses and devise experiments that will both test them and reject the null hypotheses, which brings us right back to reasoning about causality.


Yes. I completely agree. I think he (Pearl) is being unfairly provocative and there has been a lot of progress in these areas.


The thing is that "not knowing causality" along with "lack of understanding" feels very plausible as an explanation of what current AI is missing. And while AI has made lots of progress, it certainly feels like this progress is all of a certain kind (though that could be the human reflex to discount any problem solved by a robot).

The problem might be that "understanding" isn't something we really have a good working definition of.


Your definition of data is too narrow. The result of any experiment is just data. All the human brain has to work with is just data. Either there is enough information to distinguish correlation from causation analytically, or humans are actually incapable of distinguishing the two. That doesn't mean current analytic techniques are advanced enough to do it, just that it is possible for machines if it is possible for man.


> Either there is enough information to distinguish correlation from causation analytically, or humans are actually incapable of distinguishing the two.

Well, what I'd like to know is how you distinguish correlation from causation with any amount of data. What definition, in terms of data, do you use?

Note, probabilistic and algorithmic theories of causation have a long history and many problems, see:

https://plato.stanford.edu/entries/causation-probabilistic/


We can call it a data optimization problem, but that does not tell us anything about how we go about solving it, or even what we are optimizing for.

Alternatively, perhaps what's being optimized is so general that generalized AI is not, and cannot be, 'just' an optimization problem in any useful sense. Something like the biologically-inspired 'maximize lifetime production of descendants' does not get us very far.


"We can call it a data optimization problem, but that does not tell us anything about how we go about solving it, or even what we are optimizing for."

If you have a time-ordered stream of data, the problem of predicting the output given previous input naturally suggests itself. There are other approaches, of course.

"Alternatively, perhaps what's being optimized is so general that generalized AI is not, and cannot be, 'just' an optimization problem in any useful sense."

- Perhaps you mean this as "the problem is so complex that it can't be naively approached only with conventional, local optimization tools". Because a huge optimization problem is still an optimization problem in some meaningful senses, I mean I could consider my life as a "fun" or "flow" or "enlightenment" optimization problem.

In general, it seems like a lot of reactions to particular AI methods go from "this approach isn't enough to get us all the way there" to "real AI is fundamentally, absolutely different from this, this approach is useless", a jump I don't think is justified.


One thing that I never see brought into the equation but I believe is fundamental to human "general" intelligence is emotional response. It's hugely instrumental in terms of us deciding what's important to investigate, which input to analyze and how to mold our output. The importance of emotion in guiding and driving our learning is most evident in children, during the first four or five years of life.

I think a huge leap forward in the field could be made by modeling an emotional system similar to ours. It also seems like it would be an incredibly difficult and fuzzy problem to solve.

Addendum: After thinking about it for a few minutes more, I bet an implicit understanding of cause and effect could be derived by a system designed to try to optimize its emotional state.


Not just emotion, but also aversion to certain physical states - such as hunger or pain.

Because of that, I think a "true AI" - at least one without a simulated humanlike body - would have rather different desires. Optimization towards those desires rather than humanlike desires would likely result in something that distinctly does not act human.

I can't help but wonder if we've ignored or disregarded any first steps to a general AI because the resulting building blocks didn't match any of our expectations from real-world models of instinct/intelligence.


I do think the impact of emotions on our thinking is understated, but I also think that focusing on this aspect is going at it backwards. A value system is necessary for emotions, otherwise happy and sad mean nothing. I think this is what you mean when you talk about "modeling emotions". At the same time, some intelligence seems like a prerequisite as blindly following a value system will lead to massive hedonistic AI orgies. Er, I mean, an override mechanism is required, and intelligence is the best possible solution, so we're back at the fundamental problem. I think there is a lot more intelligence behind even our most primitive urges than we realize. Hope that made sense, typing on mobile.


> A value system is necessary for emotion

I think a value system is mostly a post hoc justification of our emotional states. I view emotions as primarily a tight biochemical feedback loop. Think more of the four “f”s than, say, platonic feelings of love.


I think it’s the opposite - they come from innate chemical values (procreation, socialization, base metabolic needs) and emotions are created upwards from reaction to stimuli as it pertains to those values. Maybe we’re saying the same thing, but the starting point is important as it pertains to any artificial intelligence being able to be created.


Or perhaps another angle can be explored - better than human intelligence directly instead of copying humans.

There are certain machines already better than humans (or any other life form) at physical tasks, so why do we need to have human intelligence as the base scale to measure machine intelligence?

I mean, the early man didn’t create his tools to compete with human nails.


Isn't this the theory of consciousness presented in After On? I am curious which people are formally studying emotional theories of consciousness. I'd love to read more.


You've read After On? How was it? I picked up a copy a couple of weeks ago but haven't gotten around to reading it yet. Is it worth bumping it up the priority queue a bit?


It could start with a value system rather than raw emotions - at some point it might decide that it doesn’t really want to do some work. The idea of controllability in all senses I think complicates the issue, but it’s definitely a necessary safety net.


More than funding, the issue is data. Untangling correlation from causation usually requires running additional experiments.

It's important for people to understand these limits of current ML, but they are happening for a good reason. You can apply associative reasoning in many environments that don't support causal reasoning, and we are still scratching the surface of the value discovering associations can create.


Nature herself "solved" intelligence as a data optimization problem. We can too.


I don't disagree. However, the next time you find yourself pondering the question "Why did Bob do X? That's so bizarre, and it makes no sense", you should consider what the repercussions may be if Bob were not just your awkward acquaintance but rather an autonomous computer system distributed across the globe that's assigning credit scores to people. Is there really room for 'Well it just felt right at the time' in those situations?

Humans are often quite bright, but we're also known to do the wrong thing for no discernible reason. This is to be expected when there's no fundamental formal system behind behavior, and behavior is instead driven by a black-box neural network.


There are problems with that attitude. First, it offers no scientific insight. We don't learn anything about the fundamental nature and skeleton of minds by simply replicating evolutionary processes. It does not grant us knowledge.

The equivalent in engineering would be to throw over trees and rocks in the hope of building a bridge. Clearly, that is unsatisfactory, we strive to understand the meaning of systems so that we can reason about them and alter them in predictable and fundamental ways.

Secondly, we don't know how likely it is that evolution produces intelligence. Maybe we're the only intelligent spot in the universe and it's an aberration. It took 4 billion years as well.

That seems to be a fundamentally impoverished way to go about things. We shouldn't forego the ability to understand minds at a deep level just because we have made practical strides in closed domains. That would be to mistake a trojan horse for an actual horse.


>The equivalent in engineering would be to throw over trees and rocks in the hope of building a bridge.

Imagine if aliens dropped a machine learning computer on earth in the 17th century.

Maybe we'd have never bothered to derive the laws of classical mechanics.


I don't follow. Classical mechanics are still very useful, no?


Yes, but we might have never created mental models and theories if we had a magical machine that could predict the outcomes of physical systems.

The machines might have replaced classical mechanics, but the downside is that the machine would be a black box, and we would never have really understood how it derived its results.


Fair. That assumes we wouldn't try to understand the black box. I suspect somebody would have, but I have nothing to base that suspicion on.


What was the data optimization problem? To pass our genes on? Because lots of critters have been doing it far longer than the big brained ones. Bacteria and viruses are more successful than any life forms, and they're not running any neural equivalent of data optimization.


Well, if you consider nature as an agent, we are part of it. And if we do successfully leave the planet, that will be nature successfully leaving a planet. Which will get nature closer to surviving a sun collapse. Something the rest of nature here is not likely to do.

And, in the scale, nature has had to deal with many stars collapsing.


Sure, I don't concede the point.

But even if it were true, do you want to spend billions of years solving it that way?


Nature has a huge environment to run its agents in. AI can only access games. Robots IRL are too expensive for optimisation. Solution: better simulators.


We have an intelligence, while Nature doesn't. We can do better than Nature.


Worth noting that there has been a fair bit of good research in causal machine learning in the last year or so, for example "Implicit Causal Models for Genome-wide Association Studies" (https://arxiv.org/pdf/1710.10742.pdf).

The key point of this paper is that neural networks really are very good at "curve fitting" and that this curve fitting in the context of variational inference has advantages for causal reasoning, too.

Neural networks can be used in a variety of structures, and these structures tend to benefit from the inclusion of powerful trainable non-linear function approximators. In this sense, deep learning will continue to be a powerful tool despite some limitations in its current use.

I think Pearl, who's obviously remained very influential for many practitioners of machine learning, knows the value of "curve fitting". However I think it's a bit hard for a brief interview to sit down and have a real conversation about the state of the art of an academic field and the "Deep Learning is Broken" angle is a bit more attractive.


It's worth considering that anywhere in graphical models where coefficients of any sort are learned, they can be augmented by neural networks (as in the last decade of natural language processing, where the SOTA of almost all problems has been successfully neuralized).

I wonder if Deep Belief Machines and their flavor of generative models, which seem closer in nature to Pearl's PGMs, have a chance to bridge the gap involved.

Edit, as an aside: Given the enormously high dimensionality of personal genomes and the incredibly small sample size, for over a decade I've failed to put any trust in GWAS studies and found my suspicion supported on a number of occasions, considering difficulty in reproducibility likely brought about by the above problem. Is there any reason to think that improved statistical methods can possibly surmount the fundamental problem of limited sample size and high dimensionality?


Numerous important biomedical findings have resulted from GWAS. Most GWAS today are inherently reproducible since their hits usually come from multi-stage designs with independent samples. Sample sizes are no longer "incredibly small" either; large GWAS often have on the order of hundreds of thousands of patients. Some have over a million.

I suppose the most important idea is that GWAS aren't really supposed to show causality. "Association" is in the name. GWAS are usually hypothesis generating (e.g., identification of associated variants) and then identified variants can be probed experimentally with all of the tools of molecular biology.

In summary, GWAS have their problems, but I think your statement is a bit too strong.


Mendelian randomization is a good technique to start thinking about causality for epidemiological studies.

This is a good paper that demonstrates the approach: https://www.nature.com/articles/srep16645 Millard, Louise AC, et al. "MR-PheWAS: hypothesis prioritization among potential causal effects of body mass index on many outcomes, using Mendelian randomization." Scientific reports 5 (2015): 16645.
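
For readers who want the core idea in code, here is a toy sketch of one common Mendelian randomization estimator, the Wald ratio, on simulated data. It is not the method of the paper above. All numbers and names (`simulate`, `cov`, the 0.3 effect, the 0.3 allele frequency) are assumptions for illustration, and the sketch relies on the usual instrument assumptions: the variant affects the outcome only through the exposure and is independent of the unobserved confounder.

```python
import random


def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)


def simulate(n, true_effect=0.3, seed=0):
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        allele = (rng.random() < 0.3) + (rng.random() < 0.3)  # 0, 1 or 2 copies
        u = rng.gauss(0.0, 1.0)  # unobserved confounder of exposure and outcome
        exposure = 0.5 * allele + u + rng.gauss(0.0, 1.0)
        outcome = true_effect * exposure + u + rng.gauss(0.0, 1.0)
        g.append(allele)
        x.append(exposure)
        y.append(outcome)
    return g, x, y


g, x, y = simulate(100_000)
naive_slope = cov(x, y) / cov(x, x)  # ordinary regression slope, confounded by u
wald_ratio = cov(g, y) / cov(g, x)   # genotype used as an instrument
print(f"naive slope: {naive_slope:.2f}, Wald ratio: {wald_ratio:.2f}")
```

In this simulated setup the naive slope should be inflated by the confounder, while the Wald ratio should land near the assumed effect of 0.3, because the genotype is assigned independently of `u`.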


Thousands of samples and millions of dimensions still doesn’t strike me as an easy problem, but it makes sense to me that downstream molecular biology can verify putative associations. Thank you for weighing in.


There are a lot of efforts in developing models that understand causal relationships within the mainstream machine learning community, mostly to train models that don't require a lot of training examples. Deep learning usually requires a lot of data and trained models are not easily transferable to other tasks. Yet humans tend to transfer their knowledge from other tasks pretty easily to seemingly unrelated tasks. This seems to be due to our mental models surrounding causal relationships. One example of such efforts is schema networks. It is a model-based approach to RL that exhibits some of the strong generalization abilities that can be key to human-like general intelligence. https://www.vicarious.com/2017/08/07/general-game-playing-wi...


I can't help but see this as another example of the pattern in which a big name in a field gets up and says that the current direction of their field (deep learning) is great and all but not really making progress on the big question (intelligence), and that to solve that question we need to solve another big question (what is causality) before we can make true progress. Other examples to me are Chomsky on consciousness and its implications for language, and Einstein on causality w.r.t. quantum theory. This isn't to say the big name is wrong, just to point out a potential pattern.


Just want to point out that in general it's a lot harder to recognize the correct direction than to make progress in a direction.


Pearl's words from the Introduction of "BAYESIANISM AND CAUSALITY, OR, WHY I AM ONLY A HALF-BAYESIAN":

"I turned Bayesian in 1971, as soon as I began reading Savage’s monograph The Foundations of Statistical Inference [Savage, 1962]. The arguments were unassailable: (i) It is plain silly to ignore what we know, (ii) It is natural and useful to cast what we know in the language of probabilities, and (iii) If our subjective probabilities are erroneous, their impact will get washed out in due time, as the number of observations increases.

Thirty years later, I am still a devout Bayesian in the sense of (i), but I now doubt the wisdom of (ii) and I know that, in general, (iii) is false."


For (ii) what do you use instead of probabilities? And for (iii) what changed for you to think this doesn't improve over time?


Subjective probabilities are based on the model. Increasing observations won't help if you have the wrong model to begin with. So we need causal methods to ask if the model is correct. We also need methods to propose new models or rebuild if it is wrong as well.


For (ii) I'm guessing Pearl's do-calculus, and I agree with sjg007 regarding (iii).


OP was quoting.


In history we call this determinism. The more you know about a historic choice and the complex mechanisms around it, the more it makes perfect sense while at the same time leaving you absolutely clueless about the why.

Christianity being chosen by the Roman Empire is the typical example. To most people the choice makes perfect sense, because we look back at what it brought with it. But when you put yourself in the heads of the decision makers and look at all the options they had, well, it makes no sense at all.

A lot of machine learning tells us trends, but it tells us nothing about the why, and I completely agree with the article about how useless that data is. I mean, it’s probably great at harmless things, but when my elaborate online profile still can’t figure out why I happen to read a cultural, artsy but somewhat conservative newspaper, despite the fact that my data shows the algorithm that I really really shouldn’t be doing that, well, then we simply can’t use ML for any form of decision making or even as an advisor. At least not in the public sector.


Yeah, I think it's worth also asking whether humans /actually/ are any good at answering the 'why' with anything but bullshit. I would argue that we're pretty good at understanding causality in very limited circumstances (the window broke because that kid threw the ball), and extremely overconfident in our ability to understand causation in a much broader range of circumstances (the stock price went up because...). This overconfidence drives a lot of the decisions we make, for better or for worse.

It's an area where if we push hard on AI, we'll likely have to come to terms with how bad we are in this area, and ask ourselves whether we feel comfortable deploying 'thinking machines' with similar levels of incompetence and/or arrogance.


I bought Judea Pearl’s new book The Book of Why last night after reading this article. So far I love the book. I manage a machine learning team at work and I appreciate Pearl’s discussion of how deep learning and statistics won’t lead to strong AI.


When I saw another one of these publicity articles, I basically ran to buy the book. It's really nice to have a book that will help me get the intuition and history of causal modeling rather than just giving theorems about graph structure under intervention.


I agree, it is nice to have a very clear high level approach to causal reasoning. I find his other books to be ‘slow going’ so I hope that after reading the Why book, I will have an easier time absorbing his earlier work.


I just started reading The Book of Why too, so far so good. I pre-ordered it once I found out it was coming. I've been telling people my view of it is it's like the primer to the primer (Causal Inference in Statistics: A Primer) to a subject-introduction paper (An Introduction to Causal Inference) to the OG math book (Causality). I'm hoping to eventually get back to the nice Causality hardcover I've had on my shelf for too long.


For me, AI is now in the same category as religion; I don't talk about it because nothing good ever comes out of these types of "discussions."

We should be more open minded and humble about how to approach this problem but almost everyone seems to have a strong opinion about it creating a very low signal to noise ratio.


I don't think you have to go as far as comparing it to religion. Diagnosing the field as being (currently) overhyped should be enough to explain your observations. However, Kissinger's article [1] raises worthwhile and important questions that really should be discussed broadly.

[1] https://www.theatlantic.com/magazine/archive/2018/06/henry-k...


The core issue is trust, explanation is one part of trust, but there are deeper issues. After all, if I explain that I have made this clinical decision because it results in lower mortality and you point out that the mortality statistic is to shit, and then I point out that we can't do the experiment required to work out the mortality statistic properly because that would mean potentially killing children... we have an issue.

We trust doctors and pilots, they offer partial explanations that we can somewhat understand, but they are backed by experience and qualification. Their perspective is informed by science - some good, some bad. Most of us don't think about that.

We have a perspective based on our cultural and social background, the machine must understand this and provide alternative explanations to suit us.

I have written a long article on this all, but I can't finish the game theory off!


Sounds like an interesting article


Yeah, if only I could sort out the sums...

But then.. I guess that's the point!


I don't necessarily agree with the assertion there has been no progress on algorithms that can propose experiments. Isn't this exactly what Bayesian Optimization with regret minimization is all about? Also seems strange to say that AI is just curve fitting in a pejorative sense. Isn't that all of science? Curve fitting is hard!


>Isn't [curve fitting] all of science?

No! Astronomical observations can be shoehorned into geocentrism by adding more and more epicycles. That's curve fitting. At some point you have to realize the Earth revolves around the sun. Currently ML is on a dangerous path because any disagreement with empirical evidence can just be waved away with more data, more computation power, etc. In that sense, it's practically unfalsifiable.

http://wiki.c2.com/?AddingEpicycles


Only if you don't have a complexity penalty in your fit. This is a model selection problem and you should have a prior on the structure of the model that leads to something like the BIC. Curve fitting is hard. The elimination of epicycles was due to Tycho Brahe collecting more data and a lower complexity model being proposed to explain the data.
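
A rough sketch of what such a penalty looks like in practice, assuming Gaussian noise so that the BIC reduces (up to a constant) to n·ln(RSS/n) + k·ln(n). The data, the straight-line ground truth, and the helper names `fit_polynomial` and `bic` are all made up for illustration. Each extra polynomial term plays the role of one more epicycle.

```python
import math
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations; returns coefficients."""
    m = degree + 1
    # Augmented matrix [A^T A | A^T y] for the basis 1, x, x^2, ...
    rows = [[sum(x ** (i + j) for x in xs) for j in range(m)]
            + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(m):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][m] / rows[i][i] for i in range(m)]


def bic(xs, ys, degree):
    """Return (BIC score, residual sum of squares) for a polynomial of this degree."""
    coeffs = fit_polynomial(xs, ys, degree)
    rss = sum((y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
              for x, y in zip(xs, ys))
    n = len(xs)
    return n * math.log(rss / n) + (degree + 1) * math.log(n), rss


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.1) for x in xs]  # assumed truth: a straight line

scores = {d: bic(xs, ys, d) for d in range(6)}
best_degree = min(scores, key=lambda d: scores[d][0])
for d, (score, rss) in scores.items():
    print(f"degree {d}: RSS={rss:.3f}  BIC={score:.1f}")
print("degree chosen by BIC:", best_degree)
```

The residual error can only go down as terms are added, which is the epicycle trap. The BIC column charges ln(n) per extra coefficient, so a higher degree wins only if it improves the fit by more than that charge.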


If someone can ELI5 to me what is Pearl's do-calculus, that'd be quite great.

I have tried to build an understanding of it since he got the Turing Award, but have failed so far.


Rain causes people to carry umbrellas. Rain causes puddles. Probabilistically, there's a strong connection between me carrying an umbrella and you seeing puddles (assuming we're both in SF or whatever).

To be a Bayesian, you could model this as a conditional probability: p(you see a puddle | I carry an umbrella). It will look like a strong connection, but it isn't a causal one. If it were, then I could stop carrying an umbrella and clear away the puddles. That intervention of me changing my umbrella carrying behavior is what causation is all about. If we change this, does that change?

So then you talk about the probability of you seeing a puddle given some intervention that forces me to carry my umbrella regardless of anything else. We see that if you force me, independent of rain, to carry an umbrella or not to, then the connection between the umbrella and puddles is gone. p(puddles | do(umbrella)) != p(puddles | umbrella). do(X) means to take an intervention and force X regardless of other things.

By contrast, you can talk about the connection between rain and puddles. If there were some hypothetical weather machine where we could force rain or sunshine, then you'd see that intervening and forcing rain (a.k.a. do(rain)) still keeps the relationship with puddles. p(puddles | do(rain)) still shows a connection. That is a causal connection.

It's all about counterfactual "what if I changed X?" questions. Using that idea, you can get all sorts of cool theory.
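
The umbrella example can be sketched as a small simulation. The probabilities (30% chance of rain, 90% umbrella use when it rains, and so on) and the function names `simulate` and `p_puddles` are invented for illustration. Passing `force_umbrella` stands in for do(umbrella): the umbrella is set by fiat and no longer listens to the rain.

```python
import random


def simulate(n, force_umbrella=None, seed=0):
    """Sample (umbrella, puddles) pairs; force_umbrella plays the role of do(umbrella)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rain = rng.random() < 0.3
        if force_umbrella is None:
            umbrella = rng.random() < (0.9 if rain else 0.1)  # rain causes umbrellas
        else:
            umbrella = force_umbrella  # intervention: ignore the weather
        puddles = rng.random() < (0.95 if rain else 0.05)  # rain causes puddles
        rows.append((umbrella, puddles))
    return rows


def p_puddles(rows, umbrella):
    """Fraction of samples with puddles among those with the given umbrella value."""
    matching = [puddles for (u, puddles) in rows if u == umbrella]
    return sum(matching) / len(matching)


observed = simulate(100_000)
seen_with = p_puddles(observed, True)       # estimates p(puddles | umbrella)
seen_without = p_puddles(observed, False)   # estimates p(puddles | no umbrella)
forced_with = p_puddles(simulate(100_000, force_umbrella=True), True)
forced_without = p_puddles(simulate(100_000, force_umbrella=False), False)

print(f"observed: {seen_with:.2f} vs {seen_without:.2f}")
print(f"forced:   {forced_with:.2f} vs {forced_without:.2f}")
```

With these assumed numbers the observed rows should show a large gap between umbrella and no-umbrella days, while the forced rows should show essentially none: puddles appear at the base rate either way, because the intervention cut the only path connecting umbrellas to puddles.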


Thank you!


It's a way to take interventions, for example what would happen if we ban cigarettes, and model them with observational data. The do-calculus is just three rules which say when we can do this. It essentially boils down to a few substitution rules.
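
One well-known result of this kind is the backdoor adjustment: when an observed variable z accounts for all confounding between x and y, p(y | do(x)) equals the sum over z of p(y | x, z)·p(z). The sketch below checks that on simulated data. It is not the full do-calculus, and the variables, probabilities, and names (`observe`, `naive`, `adjusted`) are assumptions made up for the example.

```python
import random


def observe(n, seed=0):
    """Purely observational data in which z confounds both x and y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.random() < 0.4
        x = rng.random() < (0.8 if z else 0.2)
        y = rng.random() < (0.1 + 0.3 * x + 0.4 * z)  # assumed true effect of x: +0.3
        data.append((z, x, y))
    return data


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive(data, x):
    """Plain conditional probability p(y | x)."""
    return mean(y for (_, xi, y) in data if xi == x)


def adjusted(data, x):
    """Backdoor adjustment: sum over z of p(y | x, z) * p(z)."""
    total = 0.0
    for z in (False, True):
        p_z = mean(zi == z for (zi, _, _) in data)
        p_y_given_xz = mean(y for (zi, xi, y) in data if zi == z and xi == x)
        total += p_y_given_xz * p_z
    return total


data = observe(200_000)
naive_effect = naive(data, True) - naive(data, False)
adjusted_effect = adjusted(data, True) - adjusted(data, False)
print(f"naive difference:    {naive_effect:.2f}")
print(f"adjusted difference: {adjusted_effect:.2f}")
```

No intervention is ever performed here. The adjusted estimate should still come out near the assumed effect of 0.3, while the naive difference should be noticeably larger because z pushes x and y in the same direction.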


Some readers might enjoy looking at code -- I came across this repo:

https://github.com/akelleh/causality


>The language of algebra is symmetric: If x tells us about y, then y tells us about x. I’m talking about deterministic relationships. There’s no way to write in mathematics a simple fact—for example, that the upcoming storm causes the barometer to go down, and not the other way around.

Is this true? It kind of blows my mind if it is.


He's written an entire book [0] on how to add notions of causality to the existing algebraic framework for expressing correlation. He alludes to the core idea in the next part of the interview:

"Mathematics has not developed the asymmetric language required to capture our understanding that if x causes y that does not mean that y causes x. It sounds like a terrible thing to say against science, I know. If I were to say it to my mother, she’d slap me."

His home page [1] links to several presentations (e.g. [2]) where he lays out the key ideas.

[0] "Causality: Models, Reasoning and Inference", Pearl

[1] http://bayes.cs.ucla.edu/jp_home.html

[2] http://bayes.cs.ucla.edu/IJCAI99/ijcai-99.pdf


What about the various concurrency algebras that basically have some composable notion of "happened before"? Aren't these developments at least 30 years old?


As a former mathematician, I was at first a little offended and dismissive of his claim. But perhaps what one can say is that mathematicians don't seem to distinguish "causation" from "implication". After all, if the barometer goes down, that does imply a storm is coming (perhaps with some increased probability), but it still doesn't cause the storm to come (even with increased probability).

In a simplified closed system, where all you have are barometers and storms, maybe there is no difference between implication and causation; all you know is these variables are correlated. Perhaps once you take every atom in the universe into account, the two start to look the same.


That can't be right, because you can take the barometers out of the closed system, and it will still storm. Correlation isn't causation, and for good reasons.


> In a simplified closed system, where all you have are barometers and storms, maybe there is no difference between implication and causation; all you know is these variables are correlated. Perhaps once you take every atom in the universe into account, the two start to look the same.

Implication is a very mathematical thing. It is as if you know that y=f(x), and then you write x=g(y), where g is the inverse of f. It works both ways: there is no cause and no effect, just a link between two variables. If you use math to reason about causal links in reality, you need to use some implicit knowledge which is not represented in the formula. It doesn't mean that math is bad. Geometry favors Euclidean space, while we know from Einstein that our space is not Euclidean; that doesn't mean that geometry is bad. Euclidean geometry just solves some specific problems and doesn't solve others.

A causal link reflects the ability to change the dependent variable by changing the independent one. It is not something like a "fundamental property of the Universe"; it is our subjective way to structure information about reality. It seems to me that physicists believe the opposite, that causation is an inherent property of reality. Maybe they are right in their field, but it doesn't work in everyday life. A causal link is an abstraction that helps us know what we can do to change outcomes.

In this sense there are no causal link between barometer readings and a storm: if you change barometer readings to reflect a fine weather the storm will come anyway. Maybe there is causal link between atmosphetic pressure and a storm? I do not know it, because I see no way to change atmospheric pressure and I'm not educated well enough to understand scientific weather models. Though it is relatively safe for me to believe that low atmospheric pressure causes storm: causational link or correlational one -- it will not change my behaviour, because I cannot change atmospheric pressure. If I'll find a way, than it would be cruicial to figure out the kind of the link, because I'll be able to break something if I'm wrong. But I said that it is relatively safe, because if I suppose that link is causational, I would use that link differently while reasoning about the weather, it will change my other beliefs and probably it will change my behaviour somehow.

So, the main idea is: causation is just our way to structure reality. We are free to choose which links are causal, it is all up to us. If we think it will help us to reason about reality, then we should speak about casuality. And the most important difference between causation and correlation is the ability to change dependant variable by changing independant. If we can change dependant variable that way, than we should mark link as causational. If we cannot, that we should think about that link as about correlational. Implication just do not draw this difference.


It's a very interesting topic. Passive observation can never determine causality... But active experimentation with control variables can! This is super important: you need an active agent running experiments to narrow down the causal structure, e.g. a robot. More data collection doesn't help without a feedback loop.


Robot, yes, or a sim. Simulators are like dynamic datasets. The "feedback loop" is also called the scientific method. Propose hypothesis, experiment, refine hypothesis.

People needed thousands of years to put together a few causal concepts about the world. AI would need its playtime too. It's not like a single person can come up with a causal model of the world by himself/herself. Just keep in mind that when comparing human intelligence with AI, so as not to ask of AI what no human can do.


There is one catch: if you have written (or have access to the internals of) a simulator that is faithful to what you want to simulate, you already have the cause-and-effect relationships worked out.


Not necessarily. For Go it is easy to simulate, hard to win. Many other things are easy to simulate at low level but have complex higher order behaviour (such as AutoML and ML graph compilation for GPU). A learning approach based on simulation is capable of finding super-human solutions, given enough compute.


Oh yes ! You are right. I was not thinking of artificial systems with well defined dynamics.


This is not true! There are techniques to determine causality passively. Here is one such example, which takes advantage of the behavior of noise to choose between "temperature causes altitude" and "altitude causes temperature": [PDF] http://papers.nips.cc/paper/3548-nonlinear-causal-discovery-...


I think that paper is using a heuristic that, for some problems, is a good guess at the causal direction.

Use your common sense though. Do you think we would have built the modern world (e.g. electricity) by passive observation of the world, or does really getting to the truth require controlled experiments?


All experiments require repeat measurements for confidence. None of this means that passive methods are impossible, which is what you were saying earlier.


That paper had no proof attached, because it's a heuristic and can easily be fooled by a bad setting. So maybe some causal relationships that are obvious can be learnt passively. But what I am saying is entirely consistent with what Pearl is saying in the article. To get to the next level of AI you need an experimenting agent. Passive methods in general are impossible for untangling causal relationships.


Seems like it's a fallacy of limiting oneself to stationary reasoning. Differential equations and difference equations that capture temporal and causal behaviors are a thing. A large part of control theory is model identification, which tries to fit a temporal model to data.

In a difference equation x(t+1) = f(x(t),u(t)), u causes changes to x and not the other way around


If we don't know that the true equation is x(t+1) = f(x(t),u(t)) but have to infer it from data, then observing that such an equation seems to hold true for some particular u and x does not mean that u causes changes to x.

It may simply be a predictor of x, as in the 'barometer readings may precede and imply a storm but not cause one' example given by others above.


Differential equations say nothing about causality; they just describe what happens. Please remember that Pearl literally wrote the book on this topic before assuming he missed an elementary counterexample.


I feel like reversibility is a big complication here. Very interesting and subtle area.


I don't see the relevance. You can have (non-differential) algebraic equations that involve a "time" parameter. They can be non-reversible in the sense that I think you mean: knowing a point "back" in time tells you values in the future, but knowing a value in the future does not (uniquely) determine the solution before that point. [For example x^2 = t^2 for t <= 0 and x^2 = 0 for t >= 0. There are two continuous solutions x(t) which are equal at, say, t=1.]

For differential or non-differential equations, they're still just describing how something is and not the causality behind it. It's always possible that the equation is merely a result of hidden variables and there is no causal relationship between any two points of the solution in any meaningful sense.


I meant that in how people talk about "causality", it is always poorly defined. It seems like there is an implicit assumption about irreversible systems at play.


Did you mean that, under reversibility + determinism, the future “causes” the past just as much as the past “causes” the future?


For some closed physical systems this is pretty much true, but I think the way people think about the world is at odds with this. That is why we find things like this so surprising https://youtu.be/p08_KlTKP50

It gets even murkier when you think of statements like "Hitler coming into power caused world war 2". There are so many things going on in that system that it couldn't possibly be true (e.g. if you change Hitler out for another person maybe world war 2 still happens), but works as a plausible line of causal reasoning for a lot of people.


I think there was some inevitable loss of nuance as that got translated into layman's terms.

For all practical purposes, you can write causality like

p(x|y) = 1 and time(y) < time(x)

I.e. causality is just when one event always happens after another event. Any additional requirements for causality are basically philosophy.

But typical ML systems don't construct networks of causal relations, is basically what he's getting at from my reading of it


> For all practical purposes, you can write causality like

> p(x|y) = 1 and time(y) < time(x)

This isn't true at all. For a counterexample, x and y may both have a common cause.

Pearl's work is on this and extends the language to talk about p(y | do(x)), meaning that you talk about what happens to y when you take some hypothetical intervention to change x. Causation framed in terms of intervention talks about "what if it had been this instead of that?" and is probably the most common model of causation.

For more info, look up the Rubin causal model, the potential outcomes framework, and Pearl's "do" notation.
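
To make the gap between p(y | x) and p(y | do(x)) concrete, here is a minimal simulation sketch. The model is an assumption made up for illustration (a hidden low-pressure variable drives both the barometer and the storm, and the probabilities 0.3, 0.9 and 0.05 are arbitrary), as are the names `simulate` and `p_storm_given_low`. Forcing the barometer stands in for the intervention.

```python
import random

def simulate(n, force_barometer=None, seed=0):
    """Sample (barometer_low, storm) pairs from a toy common-cause model.

    Assumed structure: low pressure causes both a low barometer reading
    and a storm. Passing force_barometer overrides the reading, which
    mimics the intervention do(barometer = value).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        low_pressure = rng.random() < 0.3        # hidden common cause
        barometer_low = low_pressure             # the barometer tracks pressure
        if force_barometer is not None:
            barometer_low = force_barometer      # the intervention cuts that link
        storm = rng.random() < (0.9 if low_pressure else 0.05)
        samples.append((barometer_low, storm))
    return samples

def p_storm_given_low(samples):
    """Fraction of storms among samples where the barometer reads low."""
    storms = [storm for barometer_low, storm in samples if barometer_low]
    return sum(storms) / len(storms)

observed = p_storm_given_low(simulate(100_000))
intervened = p_storm_given_low(simulate(100_000, force_barometer=True))
print(f"P(storm | barometer low)     ~ {observed:.2f}")
print(f"P(storm | do(barometer low)) ~ {intervened:.2f}")
```

Under these assumptions the conditional probability comes out near 0.9, while the interventional one falls back to the base rate of storms (about 0.3), since pushing the needle down does nothing to the weather.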


What if you replace it with

p(x|y) = 1, p(x|!y) = 0, and time(y) < time(x)

That rules out the rooster counter-example. If y is a boolean, I guess the only thing you can "do" to it is negate it.


This is called Granger-causality (and work on it led to a Nobel prize, so it's important and useful)... it's stronger than just correlation, and way easier to determine than true causation, but it's possible that z causes both x and y, and z's effect on x is just more delayed than its effect on y.

But it at least rules out x causing y, which is something.


> but it's possible that z causes both x and y, and z's effect on x is just more delayed than its effect on y.

This is in fact the case with the barometer falling before a storm. Both the falling barometer and the subsequent rain and wind of a storm are consequences of an uneven distribution of heat and moisture in the atmosphere approaching equilibrium under the constraints of Earth's gravity and Coriolis force.


Still doesn't work. Suppose I flip a coin and write the result in two places. I write it on sheet y then sheet x. We have that X == Y, so p(x|y) = 1, p(x|!y) = 0, and time(y) < time(x), but neither causes the other. I can write more later if you have interest, but I gotta run.
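
A quick sketch of the coin example, in case it helps. The variable names are made up for illustration, and "overwriting sheet y" is my stand-in for an intervention on y.

```python
import random

rng = random.Random(1)
trials = 10_000

# Observation: one coin flip is copied to sheet y first, then to sheet x.
observed = []
for _ in range(trials):
    coin = rng.random() < 0.5
    y = coin
    x = coin
    observed.append((y, x))

x_when_y = [x for y, x in observed if y]
x_when_not_y = [x for y, x in observed if not y]
p_x_given_y = sum(x_when_y) / len(x_when_y)
p_x_given_not_y = sum(x_when_not_y) / len(x_when_not_y)

# Intervention: write True on sheet y no matter what the coin showed.
forced = []
for _ in range(trials):
    coin = rng.random() < 0.5
    y = True          # sheet y no longer depends on the coin
    x = coin          # sheet x still copies the coin
    forced.append((y, x))

p_x_given_do_y = sum(x for _, x in forced) / trials

print(p_x_given_y, p_x_given_not_y, p_x_given_do_y)
```

The two conditionals come out as exactly 1 and 0, yet forcing y leaves x at roughly a coin flip, which is what "neither causes the other" means here.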


p(sun rises|rooster crows)=1

Is the rooster causing the dawn?


I think you'd have a fair argument if you considered a complement, i.e. P(sun rises|no rooster crowing) = 0.


Except if the rooster is dead or you put the rooster in a dark room.


p(sun rises|rooster crows)=1 remains valid when “rooster crows” is false.


I might be wrong but x -> y (read as x implies y) is something taught in discrete mathematics classes. I'm not too sure what he means by mathematics does not have notation for causality. https://en.wikipedia.org/wiki/Material_conditional


A barometer changing implies the storm is coming. It doesn't, however, cause the storm to come.
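
One way to see that the material conditional carries no causal direction: it is just a truth function, and it is logically equivalent to its contrapositive. A small sketch (the helper `implies` and the variable names are mine, not anything from the article):

```python
from itertools import product

def implies(x, y):
    """Material conditional: false only when x is true and y is false."""
    return (not x) or y

for barometer_falls, storm_comes in product([False, True], repeat=2):
    forward = implies(barometer_falls, storm_comes)
    # Contrapositive: "no storm" implies "the barometer did not fall".
    contrapositive = implies(not storm_comes, not barometer_falls)
    print(barometer_falls, storm_comes, forward, contrapositive)
```

The last two columns agree on every row, so "barometer falls -> storm" says exactly as much as "no storm -> barometer didn't fall". Nobody would read the second as the absence of a storm causing the needle to stay put.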


Can't this be modeled with Bayesian math? The chance that the storm causes the pressure to drop is .99. There could be some slight uncertainty included.


No because conditioning is fundamentally a 'selection' operation. It selects/filters the full data on, say, a predicate (say gender==male) and analyses that subset. This is fundamentally different from an intervention where you turn someone into 'male' who wasn't male to begin with. Those who were turned male might behave differently from those who were male without any intervention -- the sub-population you selected for defining conditional probabilities.

These two scenarios have the potential to exhibit different behavior -- probabilistic models without a notion of intervention or counterfactuals will only capture the former. But just like this 'selection' operator, you can define an analogous operator for selecting on those that you intervened on; then you are in the realm of causality.


he's ignoring expressions. it's not true.


It kinda sounds like, despite his accolades, his experience is limited to a strict set of mathematics. For example, logical implication is what he suggests doesn't exist, but it's over in the "logic" category, not pure mathematics like algebra. Association rule mining [0] is a whole category about finding such asymmetric implications.

Also:

> But I’m asking about the future—what next? Can you have a robot scientist that would plan an experiment and find new answers to pending scientific questions? That’s the next step.

Nearly 10 years ago [1], a robot called Adam both made and tested hypotheses about yeast. Certainly not a general AI, or even an award-winning massive breakthrough, but it's a good step in a direction that he doesn't think exists yet.

---

[0] https://en.wikipedia.org/wiki/Association_rule_learning

[1] http://www.sciencemag.org/news/2009/04/robotic-scientists-ma...


Chris Manning gave a really interesting technical yet pretty approachable talk on machine reasoning at ICLR this year. If you're interested in this topic, I recommend watching!

https://www.youtube.com/watch?v=jpNLp9SnTF8


You don't need to go that far. If you want to replace an algorithm in a regulated industry like insurance (say, premium calculation based on risk score), you need to show the audit trail of how you arrived at the result. You can't have a probabilistic model that builds bias without any explanation.

Only a few ML algorithms, like decision trees, can show the causal relationship today. It is very hard to do that in a neural network with multiple layers.


Isn't there a lot going on already in insurance based on algorithms - and no explanation? I'm looking at scoring/rating... People getting a lower credit rating for living in the wrong neighborhood and other things, well that would at least be an identifiable reason, but I think that you cannot find out how exactly they arrived at your personal score? How transparent are banks and insurances for consumers, today?

Also on the HN front page right now is a link to a Guardian article, "how to disappear from the internet", and the top comment in the forum there, about the commenter's difficulties dealing with the results of identity theft and credit card debt, also shows a complete lack of transparency.


The lack of transparency is for a good reason: if a model's parameters become known, they can be gamed and lose their predictive power.

Not only do banks etc keep their models secret from customers, they keep them secret from other departments. The credit risk strategy team, for instance, won't want to risk customer service staff 'helping' customers alter their application details to get their scores over a cut-off.

(I used to run credit risk strategy, fraud, collections, operations etc for two credit card companies)


Giving customers of banks transparency of the models as a third party could be a good application of causal reasoning. Each individual customer of a bank only has their own parameters, and a yes / no output from the bank.

A third party designed to help customers get approved could aggregate data across multiple customers, generate hypotheses of what changes would artificially lower the bank's perceived risk for a customer (which would also require it understand what sort of changes customers can make easily), and test those hypotheses to refine a model.

It could optimise for revenue, paying customers for information, and receiving income if it succeeds in getting them approved.


It may not be transparent to the consumers. But, these algorithms are designed by actuaries and everyone in the company understands how it works. You need to have an audit trail of how the premium is calculated.

But in deep learning algorithms, features and coefficients are not determined by humans. In most cases, they can't even be understood by humans. Without this understanding, I highly doubt they will be accepted in regulated industries.


The problem is that deep learning is not powered by an intelligent system able to suggest a way to control and filter input data. Today humans are the ones that filter and decide which loss function to use, and in order to achieve GAI, deep learning must be provided with a meta-deep-learning framework able to filter and adjust the loss function. Perhaps a feedback network trying to evolve from a small model to a more general one, using some kind of generative grammar for filtering and controlling the input. A transfer knowledge graph powered by deep learning that selects a generative grammar for designing the most valuable filtering and objective function, to produce a deep learning system that learns by itself.


Honest question: If two models are each trained with a single training set, tested with a single test set, and used only to predict a single stream of data and predict it approximately equally as well, is there any formal way to say one model is engaging in "casual reasoning" and another isn't?


causal, casual, same thing


Once ML models are routinely provided time-based features; particularly deltas between events and the 3rd derivative rate-of-change; they are going to identify some amazing causal relationships that humans are incapable of today.


Why 3rd derivative? Do you think that because we don't typically work with 3rd derivatives to explain physical phenomena, there is a lot we are missing? Or are there particular scenarios where you see the 3rd derivative as being critical to causality? 1st derivative = direct causality, 2nd derivative = ability to effect change, 3rd derivative = ?? tertiary causes.

Wait. Doesn't a derivative mathematically define a causal relationship?

EDIT: nevermind re: derivative = causal ... that's just a correlation relationship. dx/dt. Still I'm curious as to what is special about the 3rd derivative (besides jerk).


3rd derivative is a simple filter on acceleration. Obvious feature in something like fault prediction on heavy machinery.

What you really want, though, is the ability to decouple and infer relationships between short and long term features (something like the cepstrum transform from speech analysis).
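
For anyone wondering what a 3rd-derivative feature looks like in practice, here is a rough sketch using repeated finite differences. The sample signal (position = t^3), the step size and the helper name `diff` are all assumptions for illustration; real sensor data would be noisy and would need smoothing first.

```python
def diff(values, dt):
    """Forward finite difference: approximate derivative of evenly spaced samples."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]

dt = 0.1
t = [i * dt for i in range(50)]
position = [ti ** 3 for ti in t]     # made-up signal with a constant third derivative

velocity = diff(position, dt)        # 1st derivative
acceleration = diff(velocity, dt)    # 2nd derivative
jerk = diff(acceleration, dt)        # 3rd derivative

print(len(position), len(jerk))
print(min(jerk), max(jerk))
```

Each differencing pass drops one sample, so the jerk series is three samples shorter than the input. For this cubic the values stay close to 6 throughout, up to floating-point error.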


Aren’t we all?



