
Theoretical Impediments to Machine Learning with Seven Sparks from Causality - magoghm
https://arxiv.org/abs/1801.04016
======
Animats
His best observation is _" Anthropologists like N. Harari, and S. Mithen are
in general agreement that the decisive ingredient that gave our Homo sapiens
ancestors the ability to achieve global dominion, about 40,000 years ago, was
their ability to choreograph a mental representation of their environment,
interrogate that representation, distort it by mental acts of imagination and
finally answer “What if?” kind of questions."_ I can agree with that.

I've been saying something like that since the 1980s, but less abstractly.
I've argued that "common sense" is the ability to examine a proposed course of
short term action and predict generally what will happen. I used to work on
this at a low level, along the lines of "much of life is about getting through
the next 15 seconds in the real world without falling down or bumping into
anything". That's needed to survive in the real world. That led me into
automatic driving, legged running, grasping by touch, and similar low level
problems. Most of the brain in lower level mammals manages things at that
level. Once you've got that, maybe some higher level can back-seat drive the
short-term system to achieve higher level goals. I argued for getting the
lower level right first. AI still isn't very good at this, which is why mobile
robots are not yet useful.

It's clear that machine learning as we know it today has real problems doing
strong AI. We all know that. But this paper does not demonstrate that the
author's pet approach is any better. There are no examples. No working
systems. Also, trying to hammer the world into predicate calculus just doesn't
work. I went through Stanford at the peak of that idea in the mid-1980s, and
watched all the big names hit a wall.

~~~
pacala
Trying to hammer the world into predicate calculus just doesn't work. Trying
to hammer the world into "thought vectors" doesn't work either. Where to?

~~~
Animats
Geometry? Maybe probabilistic geometry, like SLAM.

------
BenoitP
Some say some books need to be chewed and digested, but that it is worth the
effort. Well, I've been chewing on this from time to time since his 2011
Turing Award, and I have yet to be able to build an understanding around it.
Understanding Michael Stonebraker's work was much easier: build great
databases. Several times. Get prizes.

It feels like this causality formulation begs for modelling, and for a program
to be able to execute and test its own do-calculus actions; to provide the
world with a compelling 'Hello World'. Maybe emit some counterfactuals too.

Reinforcement Learning seems to yield way better results at exploring the
world; at producing actions that help model and test (and then influence) the
world.

Why isn't it used more? What's the story around it?

~~~
curuinor
Turbo codes, belief propagation, and Bayes nets are more significant in
industry.

------
chewyshine
Judea Pearl is tough going for me. I've never met him but he comes across as
arrogant and condescending through his writing. I also find his writing very
tough to follow. Even so, his Do-calculus is a valuable perspective on causal
inference. Dawid is my go-to source for clear exposition of causal concepts.
Hard to find a clearer thinker and better writer in this area.

~~~
dfan
I agree that this article came off as a bit condescending.

Pearl has a more general-audience book coming out this year coauthored with
Dana Mackenzie called The Book of Why: The Science of Cause and Effect. I'm
hoping that it will be a more gentle introduction to the topic. I'll check out
Dawid in the meantime.

------
orbital-decay
>Such systems cannot reason about interventions and retrospection and,
therefore, cannot serve as the basis for strong AI.

Can't they, though? Humans can't reason about quite a lot of their own
behavior either. Another question, as a complete outsider to the topic: can't
reasoning be implemented as a higher order abstraction over a statistical
model?

------
YeGoblynQueenne
>> To appreciate the extent of this denial, readers would be stunned to know
that only a few decades ago scientists were unable to write down a
mathematical equation for the obvious fact that “mud does not cause rain.”
Even today, only the top echelon of the scientific community can write such an
equation and formally distinguish “mud causes rain” from “rain causes mud.”
And you would probably be even more surprised to discover that your favorite
college professor is not among them.

I know who Judea Pearl is, but this is just conceited. "Only the top echelon
of the scientific community"? What, of _any_ field?

More importantly, the scientific community can express "mud does not cause
rain" in a formal manner and they have been able to do this since the
beginning of the last century:

    
    
      ¬Causes(mud, rain) ∧ Causes(rain, mud)
    

It's not an equation as such (it's a theory), but it does the job in a formal
language with Turing-equivalent expressive power. So does this:

    
    
      p(mud | rain) = 1.0
      p(rain | mud) = p(rain) 
    

Additionally, in modern machine learning there are theoretical results that
prove the learnability of all computable functions by various different
classes of algorithm: inverse deduction for the symbolists (like myself),
backpropagation for the connectionists, and so on [1]. It's really hard to
reconcile such theoretical results with the claim that there are, basically,
things you can't learn with, say, neural networks or genetic algorithms, but
can learn with counterfactuals.
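For what it's worth, the distinction Pearl is after can be made concrete with
a toy simulation. This is just a sketch of my own (the two-variable model,
the `sample` function, and the 0.5 rain rate are all made up for
illustration): in a model where rain causes mud, intervening on mud leaves
rain at its base rate, while conditioning on mud does not.

```python
import random

random.seed(0)

def sample(do_mud=None, do_rain=None):
    # Tiny structural model: rain -> mud (mud is caused by rain).
    rain = (random.random() < 0.5) if do_rain is None else do_rain
    mud = rain if do_mud is None else do_mud
    return rain, mud

N = 10_000

# Intervening on mud does nothing to rain: p(rain | do(mud)) = p(rain).
rain_given_do_mud = sum(sample(do_mud=True)[0] for _ in range(N)) / N

# Intervening on rain forces mud: p(mud | do(rain)) = 1.
mud_given_do_rain = sum(sample(do_rain=True)[1] for _ in range(N)) / N

# Observationally, conditioning on mud "predicts" rain perfectly.
obs = [sample() for _ in range(N)]
rain_given_mud = sum(r for r, m in obs if m) / max(1, sum(m for _, m in obs))

print(rain_given_do_mud, mud_given_do_rain, rain_given_mud)
```

Whether you need the do-notation to say this, or whether an ordinary theory
plus a simulator already says it, is of course exactly the point in dispute.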

_____________

[1] Nice little talk by Pedro Domingos summarising his "Master Algorithm"
ideas, including a bit about Judea Pearl's favourite type of machine learning:

[https://www.youtube.com/watch?v=B8J4uefCQMc](https://www.youtube.com/watch?v=B8J4uefCQMc)

~~~
tr352

      >  p(mud | rain) = 1.0
      >  p(rain | mud) = p(rain)
    

I'm not sure these statements say what you think they do. Assuming p(rain) >
0, the first statement

    
    
      p(mud | rain) = 1.0
    

is equivalent to

    
    
      p(mud & rain) = p(rain).
    

The second statement therefore implies

    
    
      p(rain | mud) = p(mud & rain).
    

Thus

    
    
      p(rain & mud) / p(mud) = p(mud & rain).
    

This, however, leads to the surprising conclusion

    
    
      p(mud) = 1.0.
    

According to your theory, it is always muddy.

~~~
ryanmonroe
I think what's meant is something like p(mud_{t+1} | rain_{t}) = 1 and
p(rain_{t+1} | mud_{t}) = p(rain). Say during 10 units of time it rains at
points 1, 3, 4, 6, and 10 and it's muddy IFF it rained in the previous period.
Then the above probability statements are true and it's only muddy 40% of the
time.
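A few lines of Python confirm this reading of the example (the variable names
are mine; the rain times and the muddy-iff-it-rained-last-period rule are
from the comment above):

```python
# Rain at t = 1, 3, 4, 6, 10 over ten periods; muddy at t iff it rained at t-1.
T = range(1, 11)
rain = {1, 3, 4, 6, 10}
mud = {t for t in T if t - 1 in rain}  # {2, 4, 5, 7}

# Restrict conditioning to t = 1..9 so that t+1 stays inside the window.
p_mud_given_rain = sum(t + 1 in mud for t in rain if t < 10) / sum(t < 10 for t in rain)
p_rain_given_mud = sum(t + 1 in rain for t in mud if t < 10) / sum(t < 10 for t in mud)
p_rain = len(rain) / 10  # 0.5
p_mud = len(mud) / 10    # 0.4

print(p_mud_given_rain, p_rain_given_mud, p_rain, p_mud)
```

So with the time indices made explicit, both probability statements hold and
it's muddy only 40% of the time; the "always muddy" conclusion only follows
when mud and rain are read as simultaneous events.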

