
MIT AGI: Conversation with Yoshua Bengio [video] - AlanTuring
https://www.youtube.com/watch?v=azOmzumh0vQ
======
iandanforth
Some of his points I want to emphasize, and a few I'd like to discuss further.

1\. The method of training in current use won't lead to general systems.

It doesn't matter how large your image set is; realistically, you'll never
learn all of language and motor behavior from it.

Multi-modal learning, where an agent interacts with an environment, will be
critical for general systems, since we commonly expect general systems to be
able to explore and problem-solve. That requirement to explore and
problem-solve needs to be built into training.

I go a bit further and believe that you need a fairly high-fidelity world in
which physically modeled agents are given progressively more difficult tasks
if you want to achieve this generalization.

As he notes at the end of the talk some variant of reinforcement learning will
be key for this.
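For readers unfamiliar with the setup, the agent-environment loop at the core of reinforcement learning can be sketched in a few lines. The toy environment and random policy below are hypothetical stand-ins, not any particular system:

```python
import random

# Minimal sketch of the agent-environment interaction loop that
# underlies reinforcement learning. ToyEnvironment and the random
# policy are illustrative placeholders only.
class ToyEnvironment:
    """A 1-D world: the agent starts at 0 and is rewarded for reaching 5."""
    def __init__(self):
        self.position = 0

    def step(self, action):          # action is -1 or +1
        self.position += action
        reward = 1.0 if self.position == 5 else 0.0
        done = self.position == 5
        return self.position, reward, done

random.seed(0)
env = ToyEnvironment()
total_reward = 0.0
for _ in range(100):                 # one episode of at most 100 steps
    action = random.choice([-1, 1])  # random policy; learning would improve this
    state, reward, done = env.step(action)
    total_reward += reward
    if done:
        break
```

A learning algorithm would replace the `random.choice` policy with one updated from the reward signal; that feedback loop is what exploration and problem-solving training looks like at its simplest.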

2\. Disentangling representations will be useful for both generality and
preventing catastrophic forgetting.

Current algorithms update all parameters by default which means that all
knowledge is vulnerable to loss. If you can disentangle representations (and I
disagree that this can't be done in pixel space) then you can selectively
prevent updates to some parameters and better prevent accidental forgetting.
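A minimal sketch of that idea, assuming we can already identify which parameters encode old knowledge (the `protected` mask here is hypothetical): gradients to those parameters are zeroed, so updates can't overwrite them.

```python
import numpy as np

# Hypothetical sketch: protect a subset of parameters from updates by
# masking their gradients. `protected` marks weights assumed to encode
# previously learned (disentangled) knowledge.
rng = np.random.default_rng(0)

weights = rng.normal(size=(4, 4))
protected = np.zeros_like(weights, dtype=bool)
protected[:2, :] = True  # pretend the first two rows encode old knowledge

def masked_update(weights, grads, protected, lr=0.1):
    """Apply a gradient step only to unprotected parameters."""
    grads = np.where(protected, 0.0, grads)
    return weights - lr * grads

grads = np.ones_like(weights)
new_weights = masked_update(weights, grads, protected)
# Protected rows are unchanged; unprotected rows moved by lr * grad.
```

The hard part, of course, is the disentangling itself, i.e. knowing which parameters to mask; this sketch only shows why disentangled representations make selective protection possible.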

Concretely, mimicking the sparsity of activation seen in the brain by
implementing lateral inhibition (and further exploring top-down attentional
sparsification mechanisms) should be useful in this area. (Some good papers on
this in 2018!)
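One simple computational stand-in for lateral inhibition is a k-winners-take-all rule, sketched below; the function name and values are illustrative, not taken from any specific paper:

```python
import numpy as np

# Hypothetical sketch of activation sparsity via k-winners-take-all:
# only the k largest activations survive, the rest are suppressed,
# roughly mimicking lateral inhibition between neighboring units.
def k_winners_take_all(activations, k):
    out = np.zeros_like(activations)
    top = np.argsort(activations)[-k:]  # indices of the k largest values
    out[top] = activations[top]
    return out

acts = np.array([0.1, 0.9, 0.3, 0.7, 0.05])
sparse = k_winners_take_all(acts, k=2)
# Only the two largest activations (0.9 and 0.7) remain nonzero.
```

Because most units are silent on any given input, a gradient step touches far fewer parameters, which is the connection back to catastrophic forgetting.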

3\. The ability for agents to learn from teachers and the ability to teach
agents interactively will improve generality and learning speed.

Agents today are largely trained from scratch. Some are fine-tuned from other
agents, but relatively few are instructed in the way you would a dog or a
child. Cultural transmission of information underlies a lot of what we think
of as modern human intelligence so it's likely that systems that can benefit
from this method of skill acquisition will have an advantage.

A point not mentioned was the crucial distinction between knowledge and
abilities which are innate to biological creatures (developed over
evolutionary timescales) and knowledge gained during a single lifetime.

Right now we try to teach agents everything and don't give them the benefit of
starting with evolved bodies, reflexes, and adapted perceptual abilities. So
even before we get to the cultural transmission of knowledge through teaching,
I think we need to (in parallel) be developing base agents that are "evolved"
with solutions to lower-level problems built in, so that the networks don't
have to re-learn how to catch themselves when falling every single time.

~~~
yazr
Could you give your opinion on StarCraft 2 mastery as the next level?

So this includes opponents, long-term goals, short-term combinatorial
optimizations, etc. But it is still a very simple, discrete, well-defined
world with clear rewards.

The consensus is that brute-force DRL is insufficient.

My question is: if we add some currently known modules such as relational
reasoning, memory, attention, and hierarchical attention (maybe even human
demonstrations), plus DeepMind-scale computing, can we expect to see a
superhuman SC2 agent soon?

Or are we really waiting for a more radical mechanism?

~~~
iandanforth
Given the performance of the Dota 2 bots, I don't see any reason why PPO plus
a fancy network architecture won't solve this as well. I expect OpenAI to
publish on this in 2019 if not sooner.

------
joe_the_user
So my impression of the way face recognition works is that the neural-net
part of the system recognizes the faces in a photo, recognizes the features on
the faces, and provides a map of the features to the system. The system takes
these parameters and stores and recovers a given face with them. The ability
to recognize individual faces comes about because, conveniently, these
parameters don't change much (though clearly it hinges on this). Of course,
there are a lot of small enhancements to this process, and these evolve, but
this is still the basic outline.

Which is to say that convolutional neural networks, the most common and
successful neural networks, fundamentally only operate by recognizing a broad
category (face, nose-on-face, mouth-on-face). This limitation seems as
inherent in NNs operating with a black-box/training-set/testing-set cycle as
it is in any architecture consideration. If all you can "tell" the NN is "this
is X, this is not X", then it's plausible that's all it will give back.
Moreover, "X/not-X" testing allows a LOT of training to happen in an
unambiguous fashion, which seems necessary when what you're doing is gradually
pushing a giant bunch of semi-arbitrary detectors into place.

Machine learning theorists can do a lot with nothing but this category-
recognition tool - like having a Go program recognize "good moves" and "good
positions". Note that the more elaborate "neural net within neural net"
approaches involve _vast_ levels of computing power - you're brute-forcing
brute-force. AlphaGo Zero probably involved the largest amount of computing
power ever harnessed for a single-use app, with FLOP counts close to those
estimated for the human brain, though such estimates are always debatable
(perhaps why we didn't see many of its games).

The thing is, there are plenty of ordinary computing algorithms that do "more"
than neural nets - perform logical operations, or sort and search by vague
similarity, or classify on a spectrum of multiple qualities (rather than the
binary X or not-X of a neural network). It is just that these break down at an
"industrial scale", whereas neural nets shine at this scale.

However, despite these virtues, it seems likely we'll see neural nets
approaching their particular limits reasonably soon, if they haven't (in at
least a sense) done so already.

And it seems like any "real" extension/alternative will have to be able to use
big data but use it in a more sophisticated fashion than neural nets.

Oh, this relates to the video mostly in that when Bengio talks about how to
extend neural networks, he's mostly just mentioning extensions in terms of
adding abstract qualities like "causation". One can certainly see crude,
simplistic things in computation that could be called "working with
causation", but you don't have that yet at the level of a large-scale neural
network.

~~~
mgurlitz
NNs are not binary -- they're fundamentally analog, so for classification
problems, they produce probabilities that a test case is in each possible
class. A binary "X/not-X" test often comes from applying a threshold to the
NN's output.
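A minimal sketch of that pipeline (the logit values here are made up for illustration): softmax turns raw outputs into class probabilities, and the binary "X/not-X" call is just a threshold applied afterwards.

```python
import numpy as np

# Sketch, not tied to any particular library: a classifier's raw
# outputs (logits) become class probabilities via softmax; the binary
# decision is a post-hoc threshold on those probabilities.
def softmax(logits):
    z = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return z / z.sum()

logits = np.array([2.0, 0.5])  # hypothetical raw outputs for [X, not-X]
probs = softmax(logits)        # probabilities summing to 1

is_x = probs[0] > 0.5          # the "binary" answer lives outside the network
```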

Quoting from the Wikipedia article on the universal approximation theorem:
"neural networks can represent a wide variety of interesting functions." While
they may be much better at pattern recognition, it's possible to produce
almost anything with one, including Go moves, if you can devise a method to
interpret the outputs.

~~~
iandanforth
>> NNs are not binary -- they're fundamentally analog

That's incorrect.
[https://arxiv.org/abs/1602.02830](https://arxiv.org/abs/1602.02830)
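For context, the deterministic binarization in that paper can be sketched roughly as a sign function applied to weights and activations. This is a simplification of the actual training procedure, which keeps real-valued weights around for the gradient updates:

```python
import numpy as np

# Rough sketch of deterministic binarization as used in binarized
# neural networks: values are constrained to {-1, +1} via a sign-like
# function in the forward pass. Simplified for illustration.
def binarize(x):
    return np.where(x >= 0, 1.0, -1.0)

w = np.array([0.3, -1.2, 0.0, 2.5])
wb = binarize(w)  # every weight collapses to -1 or +1
```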

~~~
candiodari
You could say they're "probabilistically binary", that would at least be
intuitively accurate.

------
JabavuAdams
Great talk! What's my best option for getting / making a transcript of this
talk?

~~~
make3
does this work for you ? [https://ccm.net/faq/40644-how-to-get-the-transcript-
of-a-you...](https://ccm.net/faq/40644-how-to-get-the-transcript-of-a-youtube-
video)

~~~
JabavuAdams
That does! Did not know one could do that. This is super useful for my
learning, thanks!

