
How Judea Pearl Became One of AI's Sharpest Critics - stevenwoo
https://www.theatlantic.com/technology/archive/2018/05/machine-learning-is-stuck-on-asking-why/560675/?single_page=true
======
Barrin92
I really like that people asking fundamental questions about how AI can be
pushed forward beyond just ML seem to be gaining a little traction again, at
least in the public eye. Even Kissinger opined on the topic just a few days
ago:

[https://www.theatlantic.com/magazine/archive/2018/06/henry-k...](https://www.theatlantic.com/magazine/archive/2018/06/henry-kissinger-ai-could-mean-the-end-of-human-history/559124/)

I agree with Pearl that there's something deeply misguided about thinking that
intelligence truly can be solved as a data optimization problem. I'd be happy
if we'd see more research about AI at a symbolic level again.

The biggest hindrance seems to be funding. ML is really successful in a lot of
commercial domains; general AI is a big moonshot. With most researchers moving
from long-term university positions to the business sector, I'm concerned about
this sort of research, and not just in computer science.

~~~
quotemstr
Nature herself "solved" intelligence as a data optimization problem. We can
too.

~~~
Barrin92
There are problems with that attitude. First, it offers no scientific insight.
We don't learn anything about the fundamental nature and skeleton of minds by
simply replicating evolutionary processes. It does not grant us knowledge.

The equivalent in engineering would be to throw trees and rocks over a river in
the hope of building a bridge. Clearly that is unsatisfactory; we strive to
understand the meaning of systems so that we can reason about them and alter
them in predictable and fundamental ways.

Secondly, we don't know how likely it is that evolution produces intelligence.
Maybe we're the only intelligent spot in the universe and it's an aberration.
It took 4 billion years as well.

That seems to be a fundamentally impoverished way to go about things. We
shouldn't forego the ability to understand minds at a deep level just because
we have made practical strides in closed domains. That would be to mistake a
trojan horse for an actual horse.

~~~
gerbilly
>The equivalent in engineering would be to throw trees and rocks over a river
in the hope of building a bridge.

Imagine if aliens dropped a machine learning computer on earth in the 17th
century.

Maybe we'd have never bothered to derive the laws of classical mechanics.

~~~
taeric
I don't follow. Classical mechanics is still very useful, no?

~~~
gerbilly
Yes, but we might never have created mental models and theories if we had a
magical machine that could predict the outcomes of physical systems.

The machine might have replaced classical mechanics, but the downside is that
it would be a black box, and we would never really have understood how it
derived its results.

~~~
taeric
Fair. That assumes we wouldn't try to understand the black box. I suspect
somebody would have, but I have nothing to base that suspicion on.

------
ml_thoughts
Worth noting that there has been a fair bit of good research in causal machine
learning in the last year or so, for example "Implicit Causal Models for
Genome-wide Association Studies"
([https://arxiv.org/pdf/1710.10742.pdf](https://arxiv.org/pdf/1710.10742.pdf)).

The key point of this paper is that neural networks really are very good at
"curve fitting" and that this curve fitting in the context of variational
inference has advantages for causal reasoning, too.

Neural networks can be used in a variety of structures, and these structures
tend to benefit from the inclusion of powerful trainable non-linear function
approximators. In this sense, deep learning will continue to be a powerful
tool despite some limitations in its current use.
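
A minimal sketch of that idea (not the paper's method; it assumes sklearn and a
made-up two-variable example): a neural net does the "curve fitting" for one
mechanism inside a known causal structure, and the fitted mechanism is then
reused to answer an interventional query.

```python
# Z confounds X and Y. We fit the mechanism y = f(x, z) by curve
# fitting, then estimate E[Y | do(X=1)] by fixing X=1 for everyone
# while keeping the natural distribution of Z.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=n)                               # confounder
X = 0.8 * Z + rng.normal(size=n)                     # X <- Z
Y = np.sin(X) + 0.5 * Z + 0.1 * rng.normal(size=n)   # Y <- X, Z

f = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                 random_state=0)
f.fit(np.column_stack([X, Z]), Y)

# Interventional vs. observational estimates differ because of Z:
print("E[Y | do(X=1)] ~", f.predict(np.column_stack([np.ones(n), Z])).mean())
print("E[Y | X ~ 1]   ~", Y[np.abs(X - 1) < 0.1].mean())
```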

I think Pearl, who's obviously remained very influential for many
practitioners of machine learning, knows the value of "curve fitting".
However, it's a bit hard for a brief interview to sit down and have a real
conversation about the state of the art of an academic field, and the "Deep
Learning is Broken" angle is a bit more attractive.

~~~
stochastic_monk
It's worth considering that anywhere in graphical models where coefficients of
any sort are learned, they can be augmented by neural networks (as in the last
decade of natural language processing, where the SOTA on almost all problems
has been successfully neuralized).

I wonder if Deep Belief Networks and their flavor of generative models, which
seem closer in nature to Pearl's PGMs, have a chance to bridge the gap.

Edit, as an aside: given the enormously high dimensionality of personal
genomes and the incredibly small sample sizes, for over a decade I've failed
to put any trust in GWAS, and I've found my suspicion supported on a number of
occasions, with difficulties in reproducibility likely brought about by the
above problem. Is there any reason to think that improved statistical methods
can surmount the fundamental problem of limited sample size and high
dimensionality?

~~~
et2o
Numerous important biomedical findings have resulted from GWAS. Most GWAS
today are inherently reproducible, since their hits usually come from multi-
stage designs with independent samples. Sample sizes are no longer "incredibly
small" either; large GWAS often have on the order of hundreds of thousands of
patients, and some have over a million.

I suppose the most important idea is that GWAS aren't really supposed to show
causality. "Association" is in the name. GWAS are usually hypothesis
generating (e.g., identification of associated variants) and then identified
variants can be probed experimentally with all of the tools of molecular
biology.

In summary, GWAS have their problems, but I think your statement is a bit too
strong.

~~~
samwalrus
Mendelian randomization is a good technique to start thinking about causality
for epidemiological studies.

This is a good paper that demonstrates the approach:
[https://www.nature.com/articles/srep16645](https://www.nature.com/articles/srep16645)
Millard, Louise AC, et al. "MR-PheWAS: hypothesis prioritization among
potential causal effects of body mass index on many outcomes, using Mendelian
randomization." Scientific reports 5 (2015): 16645.

------
hackguru
There are a lot of efforts within the mainstream machine learning community to
develop models that understand causal relationships, mostly in order to train
models that don't require a lot of training examples. Deep learning usually
requires a lot of data, and trained models are not easily transferable to
other tasks. Yet humans transfer their knowledge pretty easily even to
seemingly unrelated tasks, which seems to be due to our mental models of
causal relationships. One example of such efforts is schema networks, a
model-based approach to RL that exhibits some of the strong generalization
abilities that may be key to human-like general intelligence.
[https://www.vicarious.com/2017/08/07/general-game-playing-wi...](https://www.vicarious.com/2017/08/07/general-game-playing-with-schema-networks/)

------
323454
I can't help but see this as another example of the pattern in which a big
name in a field gets up and says that the current direction of their field
(deep learning) is great and all but not really making progress on the big
question (intelligence), and that to solve that question we need to solve
another big question (what is causality?) before we can make true progress.
Other examples, to me, are Chomsky on consciousness and its implications for
language, and Einstein on causality w.r.t. quantum theory. This isn't to say
the big name is wrong, just to point out a potential pattern.

~~~
p1esk
Just want to point out that, in general, it's a lot harder to recognize the
correct direction than to make progress in a direction.

------
eksemplar
In history we call this determinism: the more you know about a historical
choice and the complex mechanisms around it, the more it makes perfect sense,
while at the same time leaving you absolutely clueless about the why.

Christianity being chosen by the Roman Empire is the typical example. To most
people the choice makes perfect sense, because we look back at what it brought
with it. But when you put yourself in the heads of the decision makers and
look at all the options they had, well, it makes no sense at all.

A lot of machine learning tells us about trends, but it tells us nothing about
the why, and I completely agree with the article about how useless that data
is. It's probably great at harmless things. But when my elaborate online
profile still can't figure out why I happen to read a cultural, artsy, but
somewhat conservative newspaper, despite the fact that my data shows the
algorithm I really, really shouldn't be doing that, then we simply can't use
ML for any form of decision making, or even as an advisor. At least not in the
public sector.

~~~
sdenton4
Yeah, I think it's worth also asking whether humans /actually/ are any good at
answering the 'why' with anything but bullshit. I would argue that we're
pretty good at understanding causality in very limited circumstances (the
window broke because that kid threw the ball), and extremely overconfident in
our ability to understand causation in a much broader range of circumstances
(the stock price went up because...). This overconfidence drives a lot of the
decisions we make, for better or for worse.

If we push hard on AI in this area, we'll likely have to come to terms with
how bad we are at it, and ask ourselves whether we feel comfortable deploying
'thinking machines' with similar levels of incompetence and/or arrogance.

------
joaorico
Pearl's words from the Introduction of "BAYESIANISM AND CAUSALITY, OR, WHY I
AM ONLY A HALF-BAYESIAN":

"I turned Bayesian in 1971, as soon as I began reading Savage’s monograph The
Foundations of Statistical Inference [Savage, 1962]. The arguments were
unassailable: (i) It is plain silly to ignore what we know, (ii) It is natural
and useful to cast what we know in the language of probabilities, and (iii) If
our subjective probabilities are erroneous, their impact will get washed out
in due time, as the number of observations increases.

Thirty years later, I am still a devout Bayesian in the sense of (i), but I
now doubt the wisdom of (ii) and I know that, in general, (iii) is false."

~~~
evanmoran
For (ii) what do you use instead of probabilities? And for (iii) what changed
for you to think this doesn't improve over time?

~~~
sjg007
Subjective probabilities are based on the model, and increasing observations
won't help if you have the wrong model to begin with. So we need causal
methods to ask whether the model is correct, and we also need methods to
propose new models, or to rebuild the model when it is wrong.
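
A tiny simulation of that point (my own illustration, plain numpy): when the
model omits a confounder, piling on observations only makes you more confident
in a biased answer; Pearl's (iii) fails.

```python
# X has no causal effect on Y; both are driven by a hidden Z.
# The naive regression slope converges to 0.5, not to the true
# causal effect of 0, no matter how large n gets.
import numpy as np

rng = np.random.default_rng(1)
for n in [100, 10_000, 1_000_000]:
    Z = rng.normal(size=n)        # hidden common cause
    X = Z + rng.normal(size=n)    # X <- Z
    Y = Z + rng.normal(size=n)    # Y <- Z (no X -> Y arrow)
    slope = np.cov(X, Y)[0, 1] / np.var(X)
    print(f"n={n:>9}: estimated 'effect' of X on Y = {slope:.3f}")
```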

------
mark_l_watson
I bought Judea Pearl’s new book The Book of Why last night after reading this
article. So far I love the book. I manage a machine learning team at work and
I appreciate Pearl’s discussion of how deep learning and statistics won’t lead
to strong AI.

~~~
eli_gottlieb
When I saw another one of these publicity articles, I basically ran to buy the
book. It's really nice to have a book that will help me get the intuition and
history of causal modeling rather than just giving theorems about graph
structure under intervention.

~~~
mark_l_watson
I agree, it is nice to have a very clear high level approach to causal
reasoning. I find his other books to be ‘slow going’ so I hope that after
reading the Why book, I will have an easier time absorbing his earlier work.

~~~
Jach
I just started reading _The Book of Why_ too, so far so good. I pre-ordered it
once I found out it was coming. I've been telling people my view of it is it's
like the primer to the primer ( _Causal Inference in Statistics: A Primer_ )
to a subject-introduction paper ( _An Introduction to Causal Inference_ ) to
the OG math book ( _Causality_ ). I'm hoping to eventually get back to the
nice _Causality_ hardcover I've had on my shelf for too long.

------
sgt101
The core issue is trust; explanation is one part of trust, but there are
deeper issues. After all, if I explain that I have made this clinical decision
because it results in lower mortality, and you point out that the mortality
statistic is shit, and I point out that we can't do the experiment required to
work out the mortality statistic properly because that would mean potentially
killing children... we have an issue.

We trust doctors and pilots; they offer partial explanations that we can
somewhat understand, but they are backed by experience and qualifications.
Their perspective is informed by science - some good, some bad. Most of us
don't think about that.

We have a perspective based on our cultural and social background; the machine
must understand this and provide alternative explanations to suit us.

I have written a long article on all of this, but I can't finish off the game
theory!

~~~
udfalkso
Sounds like an interesting article

~~~
sgt101
Yeah, if only I could sort out the sums...

But then.. I guess that's the point!

------
shahbaby
For me, AI is now in the same category as religion; I don't talk about it
because nothing good ever comes out of these types of "discussions."

We should be more open-minded and humble about how we approach this problem,
but almost everyone seems to have a strong opinion about it, creating a very
low signal-to-noise ratio.

~~~
tempodox
I don't think you have to go as far as comparing it to religion. Diagnosing
the field as being (currently) overhyped should be enough to explain your
observations. However, Kissinger's article [1] raises worthwhile and important
questions that really should be discussed broadly.

[1]
[https://www.theatlantic.com/magazine/archive/2018/06/henry-k...](https://www.theatlantic.com/magazine/archive/2018/06/henry-kissinger-ai-could-mean-the-end-of-human-history/559124/)

------
salty_biscuits
I don't necessarily agree with the assertion that there has been no progress
on algorithms that can propose experiments. Isn't this exactly what Bayesian
optimization with regret minimization is all about? It also seems strange to
call AI just curve fitting in a pejorative sense. Isn't that all of science?
Curve fitting is hard!
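
For a flavor of "an algorithm that proposes experiments", here is a toy sketch
(a UCB bandit rather than full Bayesian optimization with a Gaussian-process
surrogate, but the same regret-minimization idea): the algorithm itself
decides which arm to try next, balancing exploration against exploitation.

```python
# UCB1 on a 3-armed Bernoulli bandit: pick the arm with the best
# upper confidence bound, observe, update. Pulls concentrate on
# the best arm while regret grows only logarithmically.
import numpy as np

rng = np.random.default_rng(2)
true_means = np.array([0.3, 0.5, 0.7])   # unknown to the algorithm
counts = np.ones(3)                      # one initial pull per arm
sums = rng.binomial(1, true_means).astype(float)

for t in range(2, 5000):
    ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
    arm = int(np.argmax(ucb))            # "propose the next experiment"
    counts[arm] += 1
    sums[arm] += rng.binomial(1, true_means[arm])

print("pulls per arm:", counts)          # most pulls go to arm 2
```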

~~~
xamuel
>Isn't [curve fitting] all of science?

No! Astronomical observations can be shoehorned into geocentrism by adding
more and more epicycles. That's curve fitting. At some point you have to
realize the Earth revolves around the sun. Currently ML is on a dangerous path
because any disagreement with empirical evidence can just be waved away with
more data, more computation power, etc. In that sense, it's practically
unfalsifiable.

[http://wiki.c2.com/?AddingEpicycles](http://wiki.c2.com/?AddingEpicycles)

~~~
salty_biscuits
Only if you don't have a complexity penalty in your fit. This is a model
selection problem, and you should have a prior on the structure of the model,
which leads to something like the BIC. Curve fitting is hard. The elimination
of epicycles was due to Tycho Brahe collecting more data and a lower-
complexity model being proposed to explain the data.
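
A quick numerical version of that argument (my own example, plain numpy): raw
fit error always improves as you add "epicycles", but a BIC-style complexity
penalty bottoms out at the true model size.

```python
# Fit polynomials of increasing degree to data from a cubic.
# RSS keeps shrinking with degree; BIC = n*ln(RSS/n) + k*ln(n)
# is minimized near the true degree 3.
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(-2, 2, size=n)
y = x**3 - x + rng.normal(scale=0.5, size=n)

for deg in range(1, 9):
    coeffs = np.polyfit(x, y, deg=deg)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    bic = n * np.log(rss / n) + (deg + 1) * np.log(n)
    print(f"degree {deg}: RSS={rss:8.1f}  BIC={bic:8.1f}")
```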

------
BenoitP
If someone can ELI5 Pearl's do-calculus for me, that'd be quite great.

I have tried to build an understanding of it since he got the Turing Award,
but have failed so far.

~~~
imh
Rain causes people to carry umbrellas. Rain causes puddles. Probabilistically,
there's a strong connection between me carrying an umbrella and you seeing
puddles (assuming we're both in SF or whatever).

To be a bayesian, you could model this as a conditional probability: p(you see
a puddle | I carry an umbrella). It will look like a strong connection, but it
isn't a causal one. If it were, then I could stop carrying an umbrella and
clear away the puddles. That intervention of me changing my umbrella carrying
behavior is what causation is all about. If we change this, does that change?

So then you talk about the probability of you seeing a puddle given some
intervention that forces me to carry my umbrella regardless of anything else.
We see that if you force me, independent of rain, to carry an umbrella or not
to, then the connection between the umbrella and puddles is gone. p(puddles |
do(umbrella)) != p(puddles | umbrella). do(X) means to take an intervention
and force X regardless of other things.

By contrast, you can talk about the connection between rain and puddles.
If there were some hypothetical weather machine where we could force rain or
sunshine, then you'd see that intervening and forcing rain (a.k.a. do(rain))
still keeps the relationship with puddles. p(puddles | do(rain)) still shows a
connection. That is a causal connection.

It's all about counterfactual "what if I changed X?" questions. Using that
idea, you can get all sorts of cool theory.
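
The umbrella story is easy to check by simulation (a sketch with made-up
probabilities): conditioning reads the world as it is, while do() overrides
the mechanism that normally sets the variable.

```python
# Rain causes umbrellas and puddles. P(puddle | umbrella) is high,
# but P(puddle | do(umbrella)) falls back to the baseline rate,
# while P(puddle | do(rain)) keeps the causal connection.
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

def world(do_umbrella=None, do_rain=None):
    rain = rng.random(n) < 0.3 if do_rain is None else np.full(n, do_rain)
    umbrella = (rng.random(n) < np.where(rain, 0.9, 0.1)
                if do_umbrella is None else np.full(n, do_umbrella))
    puddle = rng.random(n) < np.where(rain, 0.8, 0.05)
    return umbrella, puddle

umb, pud = world()
print("P(puddle | umbrella)     =", pud[umb].mean())                   # ~0.65
print("P(puddle | do(umbrella)) =", world(do_umbrella=True)[1].mean()) # ~0.28
print("P(puddle | do(rain))     =", world(do_rain=True)[1].mean())     # ~0.80
```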

~~~
BenoitP
Thank you!

------
QueensGambit
You don't need to go that far. If you want to replace an algorithm in a
regulated industry like insurance (say, premium calculation based on a risk
score), you need to show an audit trail of how you arrived at the result. You
can't have a probabilistic model that builds in bias without any explanation.

Only a few ML algorithms, like decision trees, can show the causal relationship
today. It is very hard to do that in a neural network with multiple layers.
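
For what that audit trail can look like in practice, here is a hedged sketch
(invented features and data; note that the tree's printed rules explain the
decision path, without guaranteeing the splits are causal):

```python
# A decision tree's fitted rules can be exported as readable
# if/else text and attached to each premium decision.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(5)
n = 1000
age = rng.integers(18, 80, size=n)
prior_claims = rng.poisson(0.5, size=n)
X = np.column_stack([age, prior_claims])
high_risk = (prior_claims >= 2) | (age < 25)   # toy ground truth

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, high_risk)
print(export_text(tree, feature_names=["age", "prior_claims"]))
```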

~~~
ItsMe000001
Isn't there already a lot going on in insurance based on algorithms, with no
explanation? I'm thinking of scoring/rating... People getting a lower credit
rating for living in the wrong neighborhood would at least be an identifiable
reason, but I think you cannot find out _how_ exactly they arrived at your
personal score. How transparent are banks and insurers to consumers today?

Also on the HN front page right now is a link to a Guardian article, "how to
disappear from the internet", and the top comment there, about the commenter's
difficulties dealing with the results of identity theft and credit card debt,
also shows a complete lack of transparency.

~~~
stevesimmons
The lack of transparency is for a good reason: if a model's parameters become
known, they can be gamed and lose their predictive power.

Not only do banks etc keep their models secret from customers, they keep them
secret from other departments. The credit risk strategy team, for instance,
won't want to risk customer service staff 'helping' customers alter their
application details to get their scores over a cut-off.

(I used to run credit risk strategy, fraud, collections, operations etc for
two credit card companies)

~~~
A1kmm
Giving bank customers transparency into the models, via a third party, could
be a good application of causal reasoning. Each individual customer of a bank
only has their own parameters and a yes/no output from the bank.

A third party designed to help customers get approved could aggregate data
across multiple customers, generate hypotheses of what changes would
artificially lower the bank's perceived risk for a customer (which would also
require it understand what sort of changes customers can make easily), and
test those hypotheses to refine a model.

It could optimise for revenue, paying customers for information, and receiving
income if it succeeds in getting them approved.
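
A toy version of that probing loop (everything here is invented:
`bank_decision` stands in for the bank's opaque model, which the prober can
only query through customers' yes/no outcomes):

```python
# Nudge one actionable feature at a time across many customers and
# measure how often the bank's decision flips from "no" to "yes".
import numpy as np

rng = np.random.default_rng(6)

def bank_decision(income, debt, tenure):   # hidden from the prober
    return (income - 2 * debt + 0.5 * tenure) > 30

customers = {
    "income": rng.uniform(20, 80, size=500),
    "debt":   rng.uniform(0, 20, size=500),
    "tenure": rng.uniform(0, 10, size=500),
}
base = bank_decision(**customers)

for feature, delta in [("income", 5), ("debt", -5), ("tenure", 2)]:
    nudged = dict(customers)
    nudged[feature] = customers[feature] + delta
    flipped = (bank_decision(**nudged) & ~base).mean()
    print(f"nudge {feature} by {delta:+}: {flipped:.1%} newly approved")
```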

------
daenz
>The language of algebra is symmetric: If x tells us about y, then y tells us
about x. I’m talking about deterministic relationships. There’s no way to
write in mathematics a simple fact—for example, that the upcoming storm causes
the barometer to go down, and not the other way around.

Is this true? It kind of blows my mind if it is.

~~~
qmalzp
As a former mathematician, I was at first a little offended by, and dismissive
of, his claim. But perhaps what one can say is that mathematicians don't seem
to distinguish "causation" from "implication". After all, if the barometer
goes down, that does imply a storm is coming (perhaps with some increased
probability), but it still doesn't cause the storm to come (even with
increased probability).

In a simplified closed system, where all you have are barometers and storms,
maybe there is no difference between implication and causation; all you know
is these variables are correlated. Perhaps once you take every atom in the
universe into account, the two start to look the same.
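
You can see the symmetry directly with a computer algebra system (a made-up
linear "law" relating the two, just for illustration):

```python
# An equation solves equally well in either direction; nothing in
# the algebra says which side is the cause. do() is what adds the
# asymmetry.
import sympy as sp

storm, barometer = sp.symbols("storm barometer")
law = sp.Eq(barometer, 1013 - 20 * storm)

print(sp.solve(law, barometer))  # barometer from storm
print(sp.solve(law, storm))      # storm from barometer, equally valid
```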

~~~
goatlover
That can't be right, because you can take the barometers out of the closed
system, and it will still storm. Correlation isn't causation, and for good
reasons.

------
fizx
Chris Manning gave a really interesting, technical yet pretty approachable
talk on machine reasoning at ICLR this year. If you're interested in this
topic, I recommend watching!

[https://www.youtube.com/watch?v=jpNLp9SnTF8](https://www.youtube.com/watch?v=jpNLp9SnTF8)

------
dontwantai
The problem is that deep learning is not powered by an intelligent system able
to suggest a way to control and filter input data. Today humans are the ones
that filter and decide which loss function to use, and in order to achieve GAI
deep learning must be provided with a meta-deep learning framework able to
filter and adjust the loss function. Perhaps a feed-back network trying to
evolve from a small model to a more general one using some kind of generative
grammar for filtering and controling the input. A transfer knowledge graph
powered by deep learning that select a generative grammar for designing the
most valuable filtering and objective function to produce a deep learning
system that learn by itself.

------
joe_the_user
Honest question: if two models are each trained with a single training set,
tested with a single test set, and used only to predict a single stream of
data, and they predict it approximately equally well, is there any formal way
to say one model is engaging in "casual reasoning" and the other isn't?

~~~
visarga
causal, casual, same thing

------
matchagaucho
Once ML models are routinely provided with time-based features, particularly
deltas between events and the 3rd-derivative rate of change, they are going to
identify some amazing causal relationships that humans are incapable of
finding today.

~~~
daveguy
Why 3rd derivative? Do you think that because we don't typically work with 3rd
derivatives to explain physical phenomenon that there is a lot we are missing?
Or are there particular scenarios where you see 3rd derivative as being
critical to causality? 1st derivative = direct causality, 2nd derivative =
ability to affect change, 3rd derivative = ?? tertiary causes.

Wait. Doesn't a derivative mathematically define a causal relationship?

EDIT: nevermind re: derivative = causal ... that's just a correlation
relationship. dx/dt. Still I'm curious as to what is special about the 3d
derivative (besides jerk).

~~~
scottlocklin
3rd derivative is a simple filter on acceleration. Obvious feature in
something like fault prediction on heavy machinery.

What you really want, though are the ability to decouple and infer
relationships between short and long term features (something like the
cepstrum transform from speech analysis).
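
As a concrete version of the feature (a made-up signal; `np.gradient` does the
finite differencing): successive derivatives of a position trace give
velocity, acceleration, and jerk, and a creeping fault shows up in the jerk
channel.

```python
import numpy as np

t = np.linspace(0, 10, 1001)
dt = t[1] - t[0]
position = np.sin(t) + 0.5 * (t > 5) * (t - 5) ** 3   # fault starts at t=5

velocity = np.gradient(position, dt)
acceleration = np.gradient(velocity, dt)
jerk = np.gradient(acceleration, dt)                  # 3rd derivative

print("max |jerk| before t=5:", np.abs(jerk[t < 5]).max())   # ~1
print("max |jerk| after  t=5:", np.abs(jerk[t >= 5]).max())  # ~4
```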

------
guscost
Aren’t we all?

