
Escaping the Local Minimum: Where AI Has Been and Where It Needs to Go - kennethfriedman
http://kennethfriedman.org/projects/escaping-local-min/
======
Animats
The big difference this time is that AI makes money. This matters. The first
two AI booms never made it to profitability, or produced much usable
technology. This time, there are applications. As a result, far more people
are involved. This AI boom is at least three orders of magnitude bigger than
the first two.

I've had similar criticisms to the parent author for years, but I thought of
it as a hubris problem. In each AI boom, there was a good idea, which promoter
types then blew up into Strong AI Real Soon Now. The arrogance level of the
first two AI booms was way out of line with the results achieved. This time,
it's more about making money, and much of the stuff actually works. Machine
learning may hit a wall too, but it's useful.

The field isn't going to get trapped in a local minimum with neural nets
because the field is too big now. When AI was 20 people each at Stanford, MIT,
and CMU, that could happen. With 50,000 people taking machine learning
courses, there are enough people for some to focus on optimizing existing
technologies without taking away from new ideas.

We're going to get automatic driving pretty soon. That's working now, with
cars on the road from about a half dozen groups. Not much question about that.

The author rehashes symbolic systems and natural language understanding as
areas of recommended work. This may or may not be correct. Time will tell. He
omits, though, the "common sense" problem. There's been work on common sense,
but mostly as a symbolic or linguistic problem. Yet the systems that really
need common sense are the ones that operate in the real, physical world. What
happens next? What could go wrong? What if this is tried? That's what Google's
self driving car project is trying to deal with. Unfortunately, Google doesn't
say much about how they do this. That project, though, is really working on
common sense.

Incidentally, Danny Hillis did not found Symbolics. He founded Thinking
Machines, which built the Connection Machine, a big SIMD (single instruction,
multiple data) computer with 65,536 simple processors, each executing the same
instruction on different data.

~~~
kennethfriedman
Author here. Wow, fantastic feedback! Thank you for taking the time to read it
and respond.

I have a few responses. As an undergrad, I won't assume my background is
strong enough to offer direct opinions, so apologies in advance that my
responses are mostly pointers to other people.

Professor Patrick Winston[0] at MIT would argue that expert systems, the
successes of the '80s, were also about making money. Obviously not as much as
today, but that is because the tech sector has infiltrated more areas and
markets thanks to the exponential growth of processing power. It would be very
interesting to compare AI's value-add to the world between 1980 and today when
adjusted for the "inflation" of Moore's law.

It's true I didn't consider the number of people in the field currently
compared to the past, and that's an interesting point.

The instructor of the class that I wrote this paper for, Joscha Bach[1], would
argue that real physical world results are not very significant, since the
real world can be thought of as a simulation itself.

The idea of "what happens next? What could go wrong?" with self driving cars
is interesting. A question arises, should you be able to ask the car, "why did
you just stop short?" and receive an answer? This is a question Gerald Sussman
has been discussing recently[2]. If we do want systems that can explain their
behavior, then they must speak in human-language and therefore must be
symbolic at some level. In general, the idea of "Common Sense" seems too much
of (what Minksy would describe as) a suitcase word[3] -- because the
definition is so abstract, it's not worthy of debate without defining the
term.

Great call on Hillis, thanks! I completely mixed up Symbolics and Thinking
Machines. Fixed!

[0]: [http://people.csail.mit.edu/phw/](http://people.csail.mit.edu/phw/)
[1]: [http://cognitive-ai.com/](http://cognitive-ai.com/)
[2]: [https://vimeo.com/151465912](https://vimeo.com/151465912)
[3]: [https://alexvermeer.com/unpacking-suitcase-words/](https://alexvermeer.com/unpacking-suitcase-words/)

~~~
Animats
_" A question arises, should you be able to ask the car, "why did you just
stop short?" and receive an answer?"_

I can't get that answer from my horse, but he has the common sense to stop
before getting into trouble. Mammals with 99% DNA commonality with humans can
run their lives successfully but can't talk. If we can get to good mammal-
level AI performance, we should understand how to get to human. Right now,
we're still having trouble getting to lizard level. Even OpenWorm [1] isn't
working yet.

[1] [http://www.openworm.org/](http://www.openworm.org/)

~~~
kennethfriedman
Ah, now we get into a pretty interesting debate: is there a fundamental
difference between human cognition and that of other animals?

But specifically for the horses example: I think it would be pretty hard to
defend the case that horses have commonsense, unless you use the definition of
commonsense as "basic survival skills." Horses can walk around until they find
food, and they can run if they see a fast-moving object. But I can't think of
any examples that seem like commonsense, in the definition of "sound judgment
in practical matters."

When thinking of it from a bottom-up approach, Rodney Brooks[0] comes to mind.
He tried to build rat-like creatures in the early '90s, with the goal that
modeling rat behavior would enable modeling human behavior. However, the
results were unsuccessful, and the implementations did not scale well. (Which
is another case for the humans-are-fundamentally-different side of the
argument.)

[0]:
[https://www.cs.nyu.edu/courses/fall01/G22.3033-012/readings/...](https://www.cs.nyu.edu/courses/fall01/G22.3033-012/readings/representation.ps)

~~~
LionessLover
As someone who has a basic neuroscience background, your comment is a sequence
of empty statements.

> Ah, now we get into a pretty interesting debate: is there a fundamental
> difference between human cognition and other animals?

Actually, that's not an interesting debate at all, unless you like vacuous
debates. You will have to be very - _very_ - specific if you want that
"debate" to be grounded in actual science.

> pretty hard to defend the case that horses have commonsense

Given that there is no neuroscience-based knowledge of what "common sense"
even means, you are making up your own criteria.

I don't even see what point you are trying to make in your last paragraph.

> (Which is another case for the humans-are-fundamentally-different side of
> the argument)

Eh.. what? You name some random experiment, don't even say much about it at
all, and then try to draw a general conclusion. And of course, as in all your
other statements, you refrain from any specifics but remain a "politician",
just playing with words that don't mean anything specific.

~~~
kennethfriedman
Thanks for the comment, but let's try to keep this discussion friendly and not
ad hominem.

I'd politely disagree that my comment was a sequence of empty statements, but
allow me to provide some clarifying details that might help your
understanding.

The debate I describe is actually critical. If humans are not fundamentally
different, then the field should be able to model simpler animals (such as a
rat) and slowly build up to a model of human-level intelligence. On the other
hand, if there is a fundamental difference between humans and other animals,
then simply modeling other animals will not scale, and will leave the field
wanting.

I don't think we have to get too specific to debate this. I would point to
Winston, Tattersall, and Chomsky as three widely respected individuals who
present the case that symbolic language (and the uniquely human ability to
combine two concepts into a new concept, indefinitely) is the keystone that
separates humans from other animals.

In your second criticism, we agree exactly. As you can see in my previous
comment, I agree that "commonsense" is an arbitrary term. Here, I was simply
providing an example of how debating commonsense is not a useful exercise.

Finally, in my last paragraph I was providing an example of how attempting to
use "simple animals" as a basis for modeling human behavior has not been
effective. The previous commenter said that we haven't reached "lizard level"
and pointed to the OpenWorm project. I pointed to a related project, from
Brooks, that had a similar mission. It did not work, perhaps because modeling
simple animals (such as a rat or lizard) won't scale to humans. I did not mean
to draw a general conclusion (since I said that it is simply another case, not
a proof).

As to your final line, let's keep this a lively discussion and not a personal
attack.

------
visarga
My understanding is that local minima are not that problematic when the search
space is high-dimensional; in high-dimensional spaces, almost all critical
points are saddle points rather than poor local minima. The 2D example is not
realistic.

Here is a quote from a paper I randomly sampled on arXiv:

> For an expected loss function of a deep nonlinear neural network, we prove
> the following statements under the independence assumption adopted from
> recent work: 1) the function is non-convex and non-concave, 2) every local
> minimum is a global minimum, 3) every critical point that is not a global
> minimum is a saddle point, and 4) the property of saddle points differs for
> shallow networks (with three layers) and deeper networks (with more than
> three layers).

[https://arxiv.org/abs/1605.07110v1](https://arxiv.org/abs/1605.07110v1)

Also, if your critique is related to the perceived shortcomings of
backpropagation, keep in mind that reinforcement learning is also a kind of
backpropagation of a reward, but this time the reward is much sparser and
lower-dimensional. Thus, it sits somewhere in between supervised and
unsupervised learning: it does not enjoy the full supervision of a target for
every example, but it still learns from an external critic.
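
To make the "sparser reward" point concrete, here is a minimal sketch of my
own (not from the article or the paper linked above): a REINFORCE-style update
on a toy three-armed bandit, where a single 0/1 reward scales the entire
gradient of the policy.

```python
# Toy REINFORCE on a 3-armed bandit; all numbers here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
true_payoffs = np.array([0.2, 0.5, 0.8])  # hidden reward probabilities
logits = np.zeros(3)                      # policy parameters
lr = 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(2000):
    probs = softmax(logits)
    action = rng.choice(3, p=probs)
    reward = float(rng.random() < true_payoffs[action])  # sparse 0/1 signal

    # gradient of log pi(action) with respect to the logits (softmax policy)
    grad_logp = -probs
    grad_logp[action] += 1.0

    # the single scalar reward scales the whole gradient -- far less
    # information per step than a full supervised target would provide
    logits += lr * reward * grad_logp

print(softmax(logits))  # typically ends up favoring arm 2, the best one
```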

The way forward is to implement reinforcement learning agents with memory and
attention. These systems are neural Turing machines; they can compute in a
sequence of steps.

~~~
kennethfriedman
Author here, thanks for the comment!

While this is an interesting read (and NTMs are great), it is not particularly
relevant to the model I describe in the paper.

I think you are misunderstanding the generality of my paper: I am not
discussing a particular method of deep learning. I am using the idea of
gradient descent as a metaphor for the field of AI itself.

As described in the second paragraph of the "Gradient Descent" section, this
analogy is not high-dimensional. In fact, it is only three-dimensional: the
distance the field is from General AI, time, and a hypothetical "method of
attack".

~~~
phreeza
I think the reasoning still holds. If the dimensions are reasonably
independent of each other, the probability of a local minimum occurring in any
kind of high-dimensional surface falls rapidly as the number of dimensions
increases. And I would argue that machine learning methods form a pretty
high-dimensional search space; describing them along a single "method of
attack" dimension is probably not right.
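
As a rough illustration of that independence argument (a toy sketch of my own,
not something from the article): if the curvature along each dimension at a
critical point is independently "up" or "down" with probability 1/2, the
chance that the point is a true local minimum rather than a saddle shrinks
like 0.5^d.

```python
# Toy simulation of the independence heuristic for critical points.
import numpy as np

rng = np.random.default_rng(1)

def fraction_of_minima(d, trials=100_000):
    # sign of curvature along each of the d dimensions; +1 means "curves up"
    signs = rng.choice([-1, 1], size=(trials, d))
    # a point is a local minimum only if every dimension curves up
    return np.mean(np.all(signs > 0, axis=1))

for d in [1, 2, 5, 10]:
    print(d, fraction_of_minima(d), 0.5 ** d)
```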

~~~
jal278
But by that line of reasoning, shouldn't we never hit dead ends in AI research
at all -- why has AI progress been so difficult, then? Wouldn't any field of
research with many dimensions of variation never get stuck on its path towards
its ultimate goals, ever?

Couldn't some objective functions be structurally more difficult than others
to optimize? No matter how high-dimensional the search space, trying to create
a gaming laptop in the Middle Ages would have been a pretty frustrating
experience.

~~~
phreeza
Because there are plenty of saddle points. Gradient descent slows down quite a
bit at those, too.
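
For example (using my own toy function, nothing from the thread): plain
gradient descent on f(x, y) = x^2 - y^2, started just off the saddle axis,
crawls for many steps with a tiny gradient before it finally escapes along y.

```python
# Gradient descent stalling near the saddle point of f(x, y) = x**2 - y**2.
import numpy as np

def grad(p):
    x, y = p
    return np.array([2 * x, -2 * y])  # gradient of x^2 - y^2

p = np.array([1.0, 1e-6])  # start almost exactly on the saddle's stable axis
lr = 0.1
for step in range(1, 121):
    p = p - lr * grad(p)
    if step % 20 == 0:
        # the x-direction is minimized quickly, but motion along the escape
        # direction y only grows geometrically from a tiny seed, so the
        # gradient norm stays small for a long stretch before blowing up
        # (this toy f is unbounded below, so the iterate eventually diverges)
        print(step, p, np.linalg.norm(grad(p)))
```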

------
nl
Have you read Pedro Domingos's[1] "The Master Algorithm: How the Quest for the
Ultimate Learning Machine Will Remake Our World"?

You should. It directly addresses the idea of blending different fields of AI.

[1]
[http://homes.cs.washington.edu/~pedrod/](http://homes.cs.washington.edu/~pedrod/)

~~~
kennethfriedman
No, I haven't seen this before, but it looks like a great read. Thanks for the
recommendation! I'll check it out.

~~~
nl
(Also, I think it's really good the way you have been engaged and open to new
ideas and criticisms here. That isn't always easy, and it's very refreshing to
see.)

------
PaulHoule
Rule-based systems have to get ergonomic; intelligent systems by definition
are not stupid -- i.e., if the behavior of the system is unacceptable, you need
to be able to patch it quickly, not add another 150 million training examples.

The #1 threat to AI right now is Kaggleism, that is, the belief that training
data is more valuable than talent, algorithms, and all the rest.

------
munawwar
I'd say that speeding up learning times for image-related "AI" technology is
important as well. Think of it: my nephew (1.5 years old) sees a cat just two
times and is able to identify cats, whereas these neural nets need huge
training sets plus performant machines and GPUs.

~~~
sushirain
If we trained a net on many classes except for cats, then it would require
only a few examples of a cat to recognize cats.
------
sushirain
Local minimum? In the next 10 years many deep learning applications will
materialize: speech recognition reaching human level in production, autonomous
trucks on highways, autonomous cars for consumers, AR, personal robot
cleaners. The following 10 years will not fail us either.

~~~
kennethfriedman
Author here. Thanks for commenting, interesting point!

However, as you can see in the section "1960s", many similar predictions were
made 50 years ago. The point I address in this paper is that optimism is easy
when things are going well.

This is a great example. Lots of people are saying today that we will have X
in 10 years. But in 2006, no one was saying we would have X in 20 years.

I'm not saying you're wrong, but I am saying it's worth questioning why you
are so optimistic, and whether it's because we aren't looking at the big
picture.

~~~
sushirain
The difference from the 60's mis-predictions is that today we already have
speech and visual object recognition systems working with 95% accuracy and we
only need to get them to 99%. This gap will be closed by data collection and
faster processors, no algorithmic leaps necessary.

