
Predictive Learning [pdf] - aaronyy
https://drive.google.com/file/d/0BxKBnD5y2M8NREZod0tVdW5FLTQ/view
======
pakl
LeCun has identified a real problem for AI -- the need to understand the real
world, and the link between intelligence and prediction over time. But the
tools he is using are not the right ones.

Deep conv nets were not designed with prediction over time in mind. Here's one
reason: deep convolutional nets do not handle dynamical information in their
lowest layers. By design, conv layers and pooling layers immediately begin
discarding spatial information that could be used for building up predictions.

In contrast, it is possible to introduce recurrent/feedback connections at the
very first layers of the network. These initial layers can begin building up
predictions at the pixel, color, and lighting level (see, for example, our
recent preprint [1]).
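
To make the contrast concrete, here is a minimal toy sketch (plain NumPy, not
code from our paper) of a recurrent first layer that keeps full pixel
resolution and learns to predict the next frame from the current frame plus
its own hidden state; the drifting-dot "video" and all sizes are purely
illustrative.

    # Toy sketch (not PVM code): a recurrent "first layer" that keeps full
    # pixel resolution and learns to predict the next frame -- nothing is
    # discarded by pooling.
    import numpy as np

    rng = np.random.default_rng(0)

    H, W = 16, 16            # small grayscale patch
    n_in = H * W             # flattened pixel input
    n_hid = 64               # recurrent hidden state size

    W_xh = rng.normal(0, 0.1, (n_hid, n_in))   # input  -> hidden
    W_hh = rng.normal(0, 0.1, (n_hid, n_hid))  # hidden -> hidden (recurrence)
    W_hy = rng.normal(0, 0.1, (n_in, n_hid))   # hidden -> predicted next frame

    def frame(t):
        """Toy 'video': a bright dot drifting right and wrapping around."""
        f = np.zeros((H, W))
        f[H // 2, t % W] = 1.0
        return f.ravel()

    h = np.zeros(n_hid)
    for t in range(400):
        x, target = frame(t), frame(t + 1)
        h = np.tanh(W_xh @ x + W_hh @ h)     # recurrent state update
        y_pred = W_hy @ h                    # pixel-level prediction
        err = y_pred - target
        W_hy -= 0.1 * np.outer(err, h)       # delta rule on the readout only
        if t % 100 == 0:
            print(f"t={t:3d}  prediction MSE = {np.mean(err ** 2):.4f}")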

My colleague, who, as some of you know, enjoys blogging, wrote a more thorough
post in response to LeCun's recent CMU lecture on the same topic as these
slides [2].

[1] [https://arxiv.org/abs/1607.06854](https://arxiv.org/abs/1607.06854)

[2] Blog: "A few comments on the Yann LeCun lecture at CMU, 11.2016"
[http://blog.piekniewski.info/2016/11/21/yann-lecun-cmu-11-2016-comments/](http://blog.piekniewski.info/2016/11/21/yann-lecun-cmu-11-2016-comments/)

~~~
eli_gottlieb
Yeah, not to be too dismissive or clichéd, but these slides come off as if
LeCun just now got around to reading some Andy Clark, or some of the
neurosci/cogsci papers by Tenenbaum or Friston. The idea that the human brain
has to work via active prediction rather than passive signal processing has
been well established in cogsci and neurosci for a while now.

The interesting question, then, is how we make machine-learning systems do
prediction well. Probabilistic/generative models have been "wandering in the
desert" for a while now because Monte Carlo methods are _just so slow_ ,
especially for high-dimensional, hierarchical prediction problems like we want
to solve in machine learning. On the upside, Stan now has automatic
variational inference for continuous probability models, and work on things
like the "concrete distribution"
([https://arxiv.org/abs/1611.00712](https://arxiv.org/abs/1611.00712)) can
help us continuously approximate discrete probabilistic reasoning. Maybe as
these techniques move into the mainstream in systems like Venture or Picture
we can start to scale up predictive/generative/probabilistic modelling to
match optimization-based connectionist methods?
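
For anyone curious what that relaxation actually does, here is a rough toy
sketch (my own, not from the paper): Gumbel noise plus a temperature-scaled
softmax gives a differentiable stand-in for sampling a discrete choice, and
the samples sharpen toward one-hot vectors as the temperature drops.

    # Toy sketch of the concrete (Gumbel-softmax) relaxation; names and
    # numbers are my own, not from the linked paper.
    import numpy as np

    rng = np.random.default_rng(1)

    def sample_concrete(logits, temperature):
        """Draw one relaxed 'one-hot' sample from Concrete(logits, temperature)."""
        gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
        y = (logits + gumbel) / temperature
        y = np.exp(y - y.max())              # numerically stable softmax
        return y / y.sum()

    logits = np.log(np.array([0.7, 0.2, 0.1]))   # a 3-way discrete choice

    for temp in (2.0, 0.5, 0.1):
        samples = np.stack([sample_concrete(logits, temp) for _ in range(5000)])
        # as the temperature drops, samples approach one-hot vectors whose
        # average matches the underlying probabilities (0.7, 0.2, 0.1)
        print(f"temperature={temp}: mean sample = {samples.mean(axis=0).round(3)}")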

~~~
felippee
The idea is indeed not new; Andy Clark's review paper was very inspiring to
us. But it had not been detailed to the point of implementation/scaling. PVM
is an attempt to implement it in a "connectionist" way, but frankly all I need
are associative memories, and I don't care how they are implemented. So a
"probabilistic PVM" is totally feasible. In fact, we discuss in the paper
various ways in which a PVM-like meta-architecture can be implemented.
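
To illustrate the "implementation doesn't matter" point, here is roughly the
simplest associative memory one can write down -- a linear least-squares map
from keys to values (a toy sketch of mine, not PVM code). Any module exposing
this kind of store/recall interface could in principle play the same role
inside a PVM-like meta-architecture.

    # Toy hetero-associative memory: store key -> value pairs, recall values
    # from (possibly noisy) keys via a linear least-squares map.
    import numpy as np

    rng = np.random.default_rng(2)

    class LinearAssociativeMemory:
        def __init__(self, key_dim, value_dim):
            self.W = np.zeros((value_dim, key_dim))

        def store(self, keys, values):
            """Fit W so that W @ key ~= value for every stored pair."""
            self.W = values.T @ np.linalg.pinv(keys.T)

        def recall(self, key):
            return self.W @ key

    # store 20 random key -> value associations
    keys = rng.normal(size=(20, 50))
    values = rng.normal(size=(20, 10))
    mem = LinearAssociativeMemory(key_dim=50, value_dim=10)
    mem.store(keys, values)

    # recall from a noisy version of one of the stored keys
    noisy_key = keys[3] + 0.1 * rng.normal(size=50)
    err = np.linalg.norm(mem.recall(noisy_key) - values[3]) / np.linalg.norm(values[3])
    print(f"relative recall error from a noisy key: {err:.3f}")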

------
daveytea
He also presented at CMU Robotics a few weeks ago (and used these slides).
Video here: [https://youtu.be/IbjF5VjniVE](https://youtu.be/IbjF5VjniVE)

~~~
rdudekul
Another good one here:
[https://www.youtube.com/watch?v=Gwad1cWMcC0](https://www.youtube.com/watch?v=Gwad1cWMcC0)

------
lowglow
For anyone who doesn't know Yann LeCun: he's the head of AI research at
Facebook, but, perhaps surprisingly, he's consistently and refreshingly
straightforward about the current hype driving AI and its technologies. He
deserves respect for this alone.

~~~
hackinthebochs
Not to mention the father of CNNs.

~~~
escap
it's a bit more complicated:
[https://en.wikipedia.org/wiki/Convolutional_neural_network#History](https://en.wikipedia.org/wiki/Convolutional_neural_network#History)

[https://www.quora.com/Who-invented-convolution-neural-networks](https://www.quora.com/Who-invented-convolution-neural-networks)

~~~
jamesblonde
Who are you? Jürgen Schmidhuber?

------
earthly10x
Memory Modules in vector space remind me of some earlier work described as
Associative Memory Modules (AMMs):
[http://sumve.com/biomimetic-cognition/biomimetic-api.html](http://sumve.com/biomimetic-cognition/biomimetic-api.html)

------
chessweb01
If page 33 depicts the workings of the brain at a very high level, the world
model (or simulator) residing inside the agent must contain a model/simulator
of the agent itself.

Could this give rise to self-perception or consciousness?

~~~
splike
I happened to be reading The Selfish Gene by Richard Dawkins and Gödel,
Escher, Bach by Douglas Hofstadter at the same time, and both of them point to
exactly this as the reason for consciousness. I was stunned by how both
reached the same conclusion -- that consciousness arises from recursion of
self-perception -- from very different starting points.

Also, if anyone is watching Westworld (_spoilers_), it funnily enough seems to
come to the same conclusion. What finally gives the androids consciousness is
some kind of recursive idea of listening to themselves.

~~~
TFortunato
Re Westworld: the theory of consciousness the show explores is laid out in
more detail in Jaynes' _The Origin of Consciousness in the Breakdown of the
Bicameral Mind_, as alluded to both in the show and in the title of the final
episode. I've just picked it up, and it's a pretty interesting read so far.
I've also noticed that a lot of little details from the book are used in the
show, such as referring to memories as "reveries" at points, and talking of
minds as "hosts" of consciousness. I may need to re-watch the show after
finishing the book!

I found a pdf version of the book here if you are interested.
[http://selfdefinition.org/psychology/Julian-Jaynes-Origin-of-Consciousness-Breakdown-of-Bicameral-Mind.pdf](http://selfdefinition.org/psychology/Julian-Jaynes-Origin-of-Consciousness-Breakdown-of-Bicameral-Mind.pdf)

------
ehudla
Interesting how what "common sense" means has evolved since the days of
symbolic AI (cf. Cyc). The history presented begins rather late; show some
love to the founders of the field...

~~~
ehudla
For Cyc, see here:
[https://news.ycombinator.com/item?id=11300567](https://news.ycombinator.com/item?id=11300567)

------
raverbashing
Very interesting set of slides

It seems the current state of prediction is only slightly better than the
state of image recognition was before multi-layer (deep) NNs.

There might still be a theoretical jump that's needed.

~~~
felippee
Yes, the field has emerged out of MNIST/ImageNet, and that is what those
algorithms are optimised for. For modelling actual dynamics a different design
is necessary. It happens that the design that makes sense also seems to agree
very well with the observed biology of the cortex. You can find links to our
Predictive Vision Model in this thread, as well as a few additional thoughts
here:
[http://blog.piekniewski.info/2016/11/30/learning-physics-is-the-way-to-go/](http://blog.piekniewski.info/2016/11/30/learning-physics-is-the-way-to-go/)

------
vowelless
Will the other slides be posted? And how about videos?

------
sushirain
My summary of the slides: to achieve "common sense", Unsupervised Learning is
the next step. Use all the information in the data to predict everything you
can, including the past. Use adversarial networks to do that.
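
A rough toy sketch of that adversarial-prediction idea (my own reading of the
slides, not LeCun's code; in PyTorch): instead of a plain MSE loss, a
discriminator learns to tell real futures from predicted ones, and the
predictor is trained to fool it. Here the "future" is just the next sample of
a noisy sine wave, so the example stays tiny.

    # Toy adversarial next-step predictor on a noisy sine wave.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    predictor = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
    discriminator = nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 1))

    opt_p = torch.optim.Adam(predictor.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    def batch(n=64):
        """Windows of 4 past samples of a noisy sine, plus the true next sample."""
        t0 = torch.rand(n, 1) * 20
        t = t0 + torch.arange(5.0) * 0.3
        x = torch.sin(t) + 0.05 * torch.randn(n, 5)
        return x[:, :4], x[:, 4:]

    for step in range(2000):
        past, future = batch()
        # discriminator: real (past, future) vs fake (past, predicted future)
        fake = predictor(past).detach()
        d_loss = (bce(discriminator(torch.cat([past, future], 1)), torch.ones(64, 1))
                  + bce(discriminator(torch.cat([past, fake], 1)), torch.zeros(64, 1)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()
        # predictor: make the discriminator label its prediction as "real"
        g_loss = bce(discriminator(torch.cat([past, predictor(past)], 1)), torch.ones(64, 1))
        opt_p.zero_grad()
        g_loss.backward()
        opt_p.step()
        if step % 500 == 0:
            print(f"step {step}: d_loss={d_loss.item():.3f}  g_loss={g_loss.item():.3f}")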

------
edtufte
These slides remind me of the worst PowerPoints ever.

~~~
thomasahle
Really? They have pictures and everything.

~~~
MawNicker
A _wonderful_ demonstration of copyright violation/enforcement.

