Escaping the Local Minimum: Where AI Has Been and Where It Needs to Go (kennethfriedman.org)
96 points by kennethfriedman on May 30, 2016 | 50 comments

The big difference this time is that AI makes money. This matters. The first two AI booms never made it to profitability, or produced much usable technology. This time, there are applications. As a result, far more people are involved. This AI boom is at least three orders of magnitude bigger than the first two.

I've had similar criticisms to the parent author for years, but I thought of it as a hubris problem. In each AI boom, there was a good idea, which promoter types then blew up into Strong AI Real Soon Now. The arrogance level of the first two AI booms was way out of line with the results achieved. This time, it's more about making money, and much of the stuff actually works. Machine learning may hit a wall too, but it's useful.

The field isn't going to get trapped in a local minimum with neural nets because the field is too big now. When AI was 20 people each at Stanford, MIT, and CMU, that could happen. With 50,000 people taking machine learning courses, there are enough people for some to focus on optimizing existing technologies without taking away from new ideas.

We're going to get automatic driving pretty soon. That's working now, with cars on the road from about a half dozen groups. Not much question about that.

The author rehashes symbolic systems and natural language understanding as areas of recommended work. This may or may not be correct. Time will tell. He omits, though, the "common sense" problem. There's been work on common sense, but mostly as a symbolic or linguistic problem. Yet the systems that really need common sense are the ones that operate in the real, physical world. What happens next? What could go wrong? What if this is tried? That's what Google's self driving car project is trying to deal with. Unfortunately, Google doesn't say much about how they do this. That project, though, is really working on common sense.

Incidentally, Danny Hillis did not found Symbolics. He founded Thinking Machines, which built the Connection Machine, a big SIMD (single instruction, multiple data) computer with 1024 dumb processors each executing the same instruction on different data.

> The field isn't going to get trapped in a local minimum with neural nets because the field is too big now. When AI was 20 people each at Stanford, MIT, and CMU, that could happen. With 50,000 people taking machine learning courses, there are enough people for some to focus on optimizing existing technologies without taking away from new ideas.

Riffing off the OP's argument of AI as gradient descent, one could say the 50,000 new people aren't necessarily doing purely random-restart hillclimbing so much as sequential monte carlo, sampling around a distribution laid out by the main pillars of the ML community. That is to say, it should only be robust against weak local minima. Hopefully we should be self-aware enough to know when we're trapped in a strong one, and await a true random-restart.
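To make the distinction above concrete, here is a minimal sketch, not anything from the article. It assumes a made-up 1-D `loss` with a shallow basin and a deeper one, and compares a purely local search (small perturbations around the current point) with random-restart hill climbing. The landscape, step scale, and restart count are all illustrative assumptions, and the local search is only a stand-in for "sampling around where the field already is", not a real sequential Monte Carlo method.

```python
import random

def loss(x):
    # Hypothetical 1-D landscape (an assumption for illustration):
    # a shallow basin near x = 2 and a deeper basin near x = -2.
    return (x * x - 4.0) ** 2 + x

def hill_climb(x, rng, steps=2000, scale=0.1):
    # Greedy local search: propose a small Gaussian move, keep it only if it improves.
    best = loss(x)
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, scale)
        value = loss(candidate)
        if value < best:
            x, best = candidate, value
    return x, best

def random_restart(rng, restarts=20, low=-3.0, high=3.0):
    # Draw all starting points first, then run an independent climb from each
    # and keep the best (x, loss) pair found.
    starts = [rng.uniform(low, high) for _ in range(restarts)]
    return min((hill_climb(s, rng) for s in starts), key=lambda r: r[1])

rng = random.Random(0)
print("local search from x=2.5:", hill_climb(2.5, rng))
print("random restarts:", random_restart(rng))
```

Started in the shallow basin, the local search stays there, because the barrier is dozens of step widths away. The restarts only need one starting point to land on the other side.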

No, we don't even need to be very clever: evolution is no better able to evade weak local minima than any but the dumbest search strategies humans use, and it quite easily discovered a neural algorithm that led to intelligence when scaled sufficiently.

> when scaled sufficiently

Whilst technically correct, this obscures the truly colossal scale at which evolution computes.

If we consider each organism as a processor executing a single fitness observation, we can fit hundreds of thousands of such processors (e.g. bacteria) on the few square millimetres of a pin head. These processors are spread (less densely) across the 510 million square kilometre surface of the Earth. They are suspended in oceans, which have an average depth of 3.7 kilometres; and in the atmosphere, at least up to 10 kilometres (based on our observations so far).

Whilst the generation time (AKA execution time) of such organisms varies wildly, it is very commonly less than an hour. Yet even with these truly staggering resources at its disposal, evolution still took 3.3 billion years of wall clock time to come up with multicellular organisms.

Relying on evolution to come up with meta-level optimisation mechanisms like intelligence is also rather flaky; we only have one example of biological evolution to study, acting on one representation (DNA + RNA + proteins). This just-so-happened to stumble upon aerobic respiration, multicellularity, neurons and brains, but it's certainly not a "goal" of the system, and sampling bias prevents us knowing how likely an outcome that is (i.e. whether it's an attractor). Still, organisms with nervous systems are a vanishingly small proportion of all lifeforms; and in fact, every "advance" is dwarfed by the success of single celled microbes.

We could always leave an evolving system running for longer, but setting up conditions which lead to open-ended evolution is still an unknown area; simulations always tend to level off/saturate at some point. As a "last resort" we could provide the system with a mechanism to sample organisms completely at random, and use that as proof that anything is possible given enough time; but in that case why use evolution at all, when we can just perform that sampling directly, or alternatively just enumerate organisms?

I nodded along with pretty much everything you said, so I think we're arguing different points but not necessarily disagreeing.

I agree, evolution's scale is colossal, but restrict your view to the tiny bit of the evolutionary machine that is implicated in figuring out how to arrange neurons (which evolution "discovered" previously, to be fair, with incredible effort) in a fashion that renders their behavior intelligent, and it becomes a vastly smaller effort.

These beings are all rather large, at least worm-sized or greater. Their lifecycles are measured in days to years, not minutes or hours. And they only had 250 million years or so to make the leap from "bunch of neurons hard-coded to do specific jobs" to "generally intelligent arrangement of neurons capable of higher level thought". All of this cuts the scale by at least 5-10 orders of magnitude compared to what you correctly point out as the overall colossal scale of evolutionary computation. And to me, it says that compared to all the other amazingly difficult stuff that evolution has discovered, intelligence was a damn easy find, especially since there's almost no fitness benefit - hell, worms probably do better than we do as far as evolutionary fitness goes, they're tiny, numerous, and reproduce like crazy.

My money is on the fact that human intelligence guiding a search cuts another couple orders of magnitude off of how much of a long-shot discovering an intelligent algorithm would be via a random search. I could definitely be wrong, it's very hard to be precise when you're vaguely arguing about orders of magnitude, but to me it doesn't seem like the algorithmic "magic" in the brain can possibly be very complex if evolution was able to get there, there just aren't many other circumstances where evolution stumbles on something that damn clever.

> My money is on the fact that human intelligence guiding a search cuts another couple orders of magnitude off of how much of a long-shot discovering an intelligent algorithm would be via a random search.

I certainly agree that directing/biasing evolutionary processes can be a reasonably efficient search strategy. The difficulty is that it still seems to be a fallback, suited to problems where we don't have a more informed strategy. For example, if we can calculate gradients, we're probably better off doing gradient descent. If we can perform deduction/induction on symbols, we're probably better off doing that. Those problems where evolutionary processes seem well suited are those where we might not know how to bias the search.

I think the best place for such approaches at the moment is meta-level algorithms, where "smarter" algorithms (like gradient descent) are applied to the underlying problems we care about, but the parameters, policies, etc. evolve on top (e.g. step sizes, which algorithms to use, when to restart, scheduling concurrent attempts, etc.).

That's the approach taken by NEAT for example, where a genetic algorithm comes up with neural network topologies, whilst those neural networks are trained using backpropagation.
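A toy sketch of that layering, under loud assumptions: the inner problem is a hypothetical one-parameter quadratic solved by plain gradient descent, and the outer loop is a very simple mutate-and-keep-the-best evolutionary search over the step size. This is not NEAT and not any library's API; the function names and constants are made up for illustration.

```python
import math
import random

def gradient_descent(step_size, steps=50):
    # Inner "smart" optimizer: minimise f(w) = (w - 3)^2 starting from w = 0.
    w = 0.0
    for _ in range(steps):
        w -= step_size * 2.0 * (w - 3.0)  # f'(w) = 2 (w - 3)
    err = w - 3.0
    return err * err  # final loss; multiplication avoids OverflowError on divergence

def evolve_step_size(rng, generations=30, population=8, start=0.001):
    # Outer evolutionary loop: log-normal mutations of the step size,
    # keeping a candidate only if the inner optimizer ends with a lower loss.
    best, best_loss = start, gradient_descent(start)
    for _ in range(generations):
        for _ in range(population):
            candidate = best * math.exp(rng.gauss(0.0, 0.5))
            candidate_loss = gradient_descent(candidate)
            if candidate_loss < best_loss:
                best, best_loss = candidate, candidate_loss
    return best, best_loss

print("hand-picked step size 0.001, final loss:", gradient_descent(0.001))
print("evolved (step size, final loss):", evolve_step_size(random.Random(1)))
```

The evolutionary part never sees a gradient. It only sees how well the gradient-based learner did, which is the division of labour described above.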

FWIW, I'm in no way suggesting that evolutionary strategies are the way forward in the search for AGI, though I suppose they might be part of the solution.

My 95% bet is that a random or evolutionary search might be how people optimize an AGI, not how they find it. They'll get there via a minor but innovative twist on the RNN work we've already seen (first people need to realize that backprop through time is a complete dead end, and they should be looking closer at reinforcement learning methods and figuring out how to bring them online).

Easily? It took 4 billion years and multiple global extinction events, and it has only happened once. Flight, the eye, air breathing all evolved multiple times independently, within distantly related evolutionary lines. Intelligence is rare in nature, in spite of its advantages, probably because it is a very distant jump from local maxima.

Cephalopods are reasonably smart and their intelligence evolved mostly independently. That's reassuring.


Do you define human intelligence as the only real intelligence, then? Or are you talking about some super-early primitive predecessor of a brain? (I'm sorry, I have no idea whether the latter has only evolved once.)

That's the real question though, isn't it? I think what we are talking about is the kind of intelligence that could qualify as a "general AI" if implemented in a computer. Although I'm sure that there are many different types of intelligence that could match up, in nature there really is only one that we know about.

I don't think such a thing as general AI, or even general intelligence, is sensible to talk about. There are always tradeoffs: an algorithm that is good at finding some sorts of patterns necessarily has to be worse at finding other sorts.

There are actually ways to formalize and prove that, but it's intuitively obvious that an algorithm that can answer a whole set of questions can't be faster than an algorithm specialized to answer only a subset of those questions.

Author here. Wow, fantastic feedback! Thank you for taking the time to read it and respond.

I have a few responses. As an undergrad, I won't assume my background is strong enough for direct opinions, so apologies in advance as my responses are mostly pointers to other people.

Professor Patrick Winston[0] at MIT would argue that expert systems, the successes of the '80s, were also about making money. Obviously not as much as present day, but that is due to the tech sector infiltrating more areas/markets simply because of the exponential growth in processing power. It would be very interesting to compare AI's value-add to the world between 1980 and today when adjusted for the "inflation" of Moore's law.

It's true I didn't consider the number of people in the field currently compared to the past, and that's an interesting point.

The instructor of the class that I wrote this paper for, Joscha Bach[1], would argue that real physical world results are not very significant, since the real world can be thought of as a simulation itself.

The idea of "what happens next? What could go wrong?" with self driving cars is interesting. A question arises: should you be able to ask the car, "why did you just stop short?" and receive an answer? This is a question Gerald Sussman has been discussing recently[2]. If we do want systems that can explain their behavior, then they must speak in human language and therefore must be symbolic at some level. In general, the idea of "Common Sense" seems too much of (what Minsky would describe as) a suitcase word[3] -- because the definition is so abstract, it's not worthy of debate without defining the term.

Great call on Hillis, thanks! I completely mixed up Symbolics and Thinking Machines. Fixed!

[0]: http://people.csail.mit.edu/phw/ [1]: http://cognitive-ai.com/ [2]: https://vimeo.com/151465912 [3]: https://alexvermeer.com/unpacking-suitcase-words/

"A question arises, should you be able to ask the car, "why did you just stop short?" and receive an answer?"

I can't get that answer from my horse, but he has the common sense to stop before getting into trouble. Mammals with 99% DNA commonality with humans can run their lives successfully but can't talk. If we can get to good mammal-level AI performance, we should understand how to get to human. Right now, we're still having trouble getting to lizard level. Even OpenWorm [1] isn't working yet.

[1] http://www.openworm.org/

Ah, now we get into a pretty interesting debate: is there a fundamental difference between human cognition and other animals?

But specifically for the horses example: I think it would be pretty hard to defend the case that horses have commonsense, unless you use the definition of commonsense as "basic survival skills." Horses can walk around until they find food, and they can run if they see a fast moving object. But I can't think of any examples that seem like commonsense, in the definition of "sound judgment in practical matters."

When thinking of it from a bottom-up approach, Rodney Brooks[0] comes to mind. He tried to build rat like creatures in the early 90s, with the goal that modeling rat behavior would enable modeling human behavior. However, the results were unsuccessful, and the implementations did not scale well. (Which is another case for the humans-are-fundamentally-different side of the argument)

[0]: https://www.cs.nyu.edu/courses/fall01/G22.3033-012/readings/...

As someone who has a basic neuroscience background, your comment is a sequence of empty statements.

> Ah, now we get into a pretty interesting debate: is there a fundamental difference between human cognition and other animals?

Actually, that's not an interesting debate at all, unless you like vacuous debates. You will have to be very - very - specific if you want that "debate" to be grounded in actual science.

> pretty hard to defend the case that horses have commonsense

Given that there is no neuroscience-based knowledge of what "common sense" even means, you are making up your own criteria.

I don't even see what point you are trying to make in your last paragraph.

> (Which is another case for the humans-are-fundamentally-different side of the argument)

Eh.. what? You name some random experiment, don't even say much about it at all, and then try to draw a general conclusion. And of course, as in all your other statements, you refrain from any specifics but remain a "politician", just playing with words that don't mean anything specific.

Thanks for the comment, but let's try to keep this discussion friendly and not ad hominem.

I'd politely disagree that my comment was a sequence of empty statements, but allow me to provide some clarifying details that might help your understanding.

The debate I describe is actually critical. If humans are not fundamentally different, then the field should be able to model simpler animals (such as a rat), and slowly build up to a model of human-level intelligence. On the other hand, if there is a fundamental difference between humans and other animals, then simply modeling other animals will not scale, and will leave the field wanting.

I don't think we have to get too specific to debate this. I would point to Winston, Tattersall, Chomsky as three widely respected individuals who present the case that symbolic language (and the uniquely human ability to combine two concepts into a new concept, indefinitely) is the keystone that separates humans from other animals.

In your second criticism, we agree exactly. As you can see in my previous comment, I agree that "commonsense" is an arbitrary term. Here, I was simply providing an example of how debating commonsense is not a useful exercise.

Finally, in my last paragraph I was providing an example of how attempting to use "simple animals" as a basis for modeling human behavior has not been effective. The previous commenter said that we haven't reached "lizard level" and pointed to the OpenWorm project. I pointed to a related project, from Brooks, that had a similar mission. It did not work, perhaps because modeling simple animals (such as a rat or lizard) won't scale to humans. I did not mean to draw a general conclusion (since I said that it is simply another case, not a proof).

As to your final line, let's keep this a lively discussion and not a personal attack.

You've never heard common sense called "horse sense"?

The original 1 hp self driving vehicles.

They too were saddled with drunks and had to know the correct way home without direction.

> However, the results were unsuccessful, and the implementations did not scale well.

Can you define "unsuccessful" and "did not scale well"? From what you've said, I cannot see any relation to the "humans-are-fundamentally-different side of the argument". You just make it sound like a failed experiment, without saying why it failed.

Nothing to debate. There isn't. Let's stop wasting brainpower on this. Too much empty and meaningless philosophical discussion has unfortunately been revived by recent advances in ML/AI.

This is sad. Especially because I believe philosophy could have a meaningful contribution to the progress of human understanding. It was a "science of interfaces" in the beginning, and now with more and more specialization we also end up with more and more "interfaces" between sub-sub-fields until 90% of everything will be "an interface between something and something else". Too bad we'll have to re-invent it under a different name, and probably disguise it as some sub-sub-field of engineering, to make sure we sever the connection with all people willing to waste everyone's time with empty-talk and empty-think...

You should absolutely be able to find out why your car stopped and what it is waiting for. Not doing that would be a huge fail from the self driving car companies and would make it a lot harder to get people to trust self driving cars.

Yet animals and humans DO communicate to each other. Especially horses.

Happens all the time.

I am hungry. I am angry. I need comfort. I want to go somewhere. I am pleased to see you.

And so on. It is elemental communication but it is real.

Funny, I believe that much of what differentiates us from machines is the fear of death that underlies every part and every layer of our system, and our ancestry back to bacteria.

You may have heard of how new neural network architectures are designed? GDGS? Gradient Descent by Grad Student?

It's a joke, but funny because it is true.

That's one of the artifacts of the number of people in the field.

>> The big difference this time is that AI makes money.

This is not in support of the parent's position, but expert systems also made money and in fact continue to do so today, as they're still in wide use in industry, for instance in airliner maintenance, fraud detection, etc. Obviously those are legacy systems and sometimes they're only de facto expert systems (as in, they're a big database of rules with an inference engine, though nobody calls them an "expert system"). Still, the technologies invented back then did make a lot of money for many people. But the market for expert systems went bust and people lost their money, and that's what really killed the field.

The number of people working on AI then and now is not very easy to compare. There was no Google, Facebook or even Apple and Microsoft in the first AI boom. IBM was around and it did do a lot of research in GOFAI, particularly logic programming. I've read IBM papers discussing Prolog type systems for instance. And let's not forget that Deep Blue was essentially an expert system: it searched a database of domain knowledge compiled from the opinions of experts. So, mutatis mutandis (if I may), there was interest in GOFAI from the industry and there was money invested in it. The difference in that respect with what's going on today is not that clear-cut.

>> This time, it's more about making money, and much of the stuff actually works. Machine learning may hit a wall too, but it's useful.

I'm not going to disagree with that, not entirely. That machine learning is useful so far, there's no doubt, but it remains to be seen how useful it is in the long term. Like with expert systems, the obstacles may be more political than anything else. I'm finishing a Masters in AI (as in, lots of machine learning) that was paid for by my former employer, a big financial corporation, and yet there was no interest in machine learning in the whole company for most of the time I was there. Maybe that says more about my ability to sell stuff (it's approximately 0) but I really did get the feeling that the corporate world doesn't understand the tech and doesn't care to understand the tech, so whether machine learning rises or falls will entirely depend on politics and not on how well it works, or doesn't.

Can you elaborate on what is the "common sense problem"?

As far as I know, local minima are not that problematic when the state space is high-dimensional. In high-dimensional spaces, critical points are overwhelmingly saddle points. The 2D example is not realistic.

Here is a quote from a paper I randomly sampled on arxiv:

> For an expected loss function of a deep nonlinear neural network, we prove the following statements under the independence assumption adopted from recent work: 1) the function is non-convex and non-concave, 2) every local minimum is a global minimum, 3) every critical point that is not a global minimum is a saddle point, and 4) the property of saddle points differs for shallow networks (with three layers) and deeper networks (with more than three layers).


Also, if your critique is related to the perceived shortcomings of backpropagation, keep in mind that reinforcement learning is also a kind of backpropagation of a reward, but this time the reward is much sparser and low dimensional. Thus, such methods are somewhere in between supervised and unsupervised learning, not quite enjoying the full supervision of backpropagating at every example, but still learning based on an external critic.

The way forward is to implement reinforcement learning agents with memory and attention. These systems are Neural Turing Machines: they can compute in a sequence of steps.

Author here, thanks for the comment!

While this is an interesting read (and NTMs are great), it is not particularly relevant to the model I describe or this paper.

I think you are misunderstanding the generality of my paper: I am not discussing a particular method of deep learning. I am using the idea of gradient descent as a metaphor for the field of AI itself.

As described in the second paragraph of the "Gradient Descent" section, this analogy is not high dimensional. In fact, it is only three dimensional: distance the field is from General AI, time, and a hypothetical "method of attack".

I think the reasoning still holds. If the dimensions are reasonably independent of each other, the probability of a local minimum occurring in any kind of high-dimensional surface falls rapidly as the number of dimensions increases. And I would argue that machine learning methods are a pretty high-dimensional search space, describing them in a single 'method of attack' dimension is probably not right.
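A back-of-the-envelope way to see why, under the independence assumption above: treat the curvature along each dimension at a critical point as an independent coin flip. The point is a local minimum only if every flip comes up "curves upward", so the chance is about 2^-n for n dimensions. The sketch below just simulates that toy model; the 50/50 sign probability is an assumption, not a property of any real loss surface.

```python
import random

def fraction_minima(dim, rng, trials=20000):
    # Toy model (assumption): each of `dim` curvature signs at a critical
    # point is independently positive with probability 1/2. The point is a
    # local minimum only when all of them are positive.
    hits = 0
    for _ in range(trials):
        if all(rng.random() < 0.5 for _ in range(dim)):
            hits += 1
    return hits / trials

rng = random.Random(0)
for dim in (1, 2, 5, 10):
    print(dim, fraction_minima(dim, rng), 0.5 ** dim)
```

Everything that is not all-positive or all-negative in this model is a saddle, which is why saddles dominate as the dimension grows.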

But by that line of reasoning, shouldn't we never hit dead ends in AI research at all -- why has AI progress been so difficult, then? Wouldn't any field of research with many dimensions of variation never get stuck on its path towards its ultimate goals, ever?

Couldn't different objective functions be structurally more difficult than others to optimize? No matter how high-dimensional the search-space, trying to create a gaming laptop in the middle ages would have been a pretty frustrating experience.

Because there are plenty of saddle points. Gradient descent slows down quite a bit at those, too.
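A small illustration of that slowdown, assuming the textbook saddle f(x, y) = x^2 - y^2 and plain fixed-step gradient descent (both are my choices for the sketch, not anything from the article). The closer the start is to the saddle's ridge, the longer the iterate lingers near the origin before it escapes.

```python
def steps_to_escape(y0, lr=0.01, limit=10000):
    # Gradient descent on f(x, y) = x^2 - y^2, which has a saddle at (0, 0).
    # Returns the number of steps until |y| >= 1, or None if it never escapes.
    x, y = 1.0, y0
    for step in range(1, limit + 1):
        x -= lr * 2.0 * x  # df/dx =  2x: x is pulled toward the saddle
        y += lr * 2.0 * y  # df/dy = -2y: descent pushes y away, slowly at first
        if abs(y) >= 1.0:
            return step
    return None

for y0 in (1e-2, 1e-4, 1e-8, 0.0):
    print(y0, steps_to_escape(y0))
```

Starting exactly on the ridge (y0 = 0.0) never escapes in this idealized setting. In practice, noise in the gradient plays the role of the small initial offset.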

"reinforcement learning agents with memory and attention"

Here's a hybrid approach


Have you read Pedro Domingos's[1] "The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World"?

You should. It directly addresses the idea of blending different fields of AI.

[1] http://homes.cs.washington.edu/~pedrod/

No, I haven't seen this before but it looks like a great read, thanks for the recommendation! I'll check it out

(Also I think it's really good the way you have been engaged and open to new ideas and criticisms here. That isn't always easy, and very refreshing to see)

Rule based systems have to get ergonomic; intelligent systems by definition are not stupid -- i.e. if the behavior of the system is unacceptable, you need to be able to patch it quickly, not add another 150 million training examples.

The #1 threat to AI right now is kaggleism, that is, the idea that training data is more valuable than talent, algorithms and all that.

I'd say that speeding up learning times for image-related "AI" technology is important as well. Think of it: my nephew (1.5 years old) sees a cat just two times and is able to identify cats, whereas these neural nets need huge training sets plus performant machines and GPUs.

If we trained a net on many classes except for cats, then it would require only a few examples of cats to recognize cats.

Local minimum? In the next 10 years many deep learning applications will materialize: speech recognition will reach human level in production, autonomous trucks on highways, autonomous cars for consumers, AR, personal robot cleaners. The following 10 years will not fail us, too.

Author here. Thanks for commenting, interesting point!

However, as you can see in the section "1960s", many similar predictions were made 50 years ago. The point I address in this paper is that optimism is easy when things are going well.

This is a great example. Lots of people are saying today that we will have X in 10 years. But in 2006, no one was saying we will have X in 20 years.

I'm not saying you're wrong, but I'm saying it's worth questioning why you are so optimistic, and whether it's because we aren't looking at the big picture.

The difference from the 60's mis-predictions is that today we already have speech and visual object recognition systems working with 95% accuracy and we only need to get them to 99%. This gap will be closed by data collection and faster processors, no algorithmic leaps necessary.


Haha, author here. I can assure you I did proofread it multiple times, but I'm sure I didn't catch every typo. Though I would love to hear thoughts on the content rather than the writing!

Also, if anything is confusing, I'd be happy to clarify.

That was an incredible piece. What an interesting way of looking at it.

Matching it to Minsky's model really shows the limitations of current AI - but I'm going to read more about the validity of Minsky's model before setting any thoughts in stone.

Thanks for sharing that.

Thanks! I'm glad you found the perspective interesting.

It's definitely worth reading more about Minsky's models. I have a lot of references in-line, but here are a few more starting points. The near-final-draft version of his book The Emotion Machine is available for free on his MIT site[0] and WashPo gave it a very positive review[1].

It would be interesting to investigate some criticisms of the model, though I haven't found any that specifically target the 6 layer model.

[0]: http://web.media.mit.edu/~minsky/Introduction.html [1]: http://www.washingtonpost.com/wp-dyn/content/article/2006/12...

Sorry if I offended, but it was meant to be constructive. No doubt much effort went into the research, but two mistakes in the 2nd paragraph of the introduction do not reflect well on the presentation to readers like myself who are not in the field but are interested to know more.

There are a lot of typos!

Thanks for reading! I'd be happy to correct specific problems if you'd be kind enough to email them to me (see the about section of my website for my email).

If not, let's stick to substantive comments on the content rather than grammar or spelling.

Ok. To pick on one part. This argument is self-referencing and redundant:

"However, it would be useful to have a neural network that is able to perform symbolic algebra. There are two clear reasons for this desire. First, this hypothetical system would demonstrate that neural networks can be used as a substrate for previously-achieved AI systems. Second, a neural network that could perform symbolic algebra would, by definition, be able to manipulate symbols."

Then directly following the above, this sentence makes very little sense:

"This would show that high level knowledge representations can be grounded in statistical models."

What does it mean for "high level knowledge representations" to be "grounded in statistical models?" It sounds to me like you're saying that implementing symbolic algebra on NNs would prove that NNs can implement symbolic algebra.

There is some good content in the article but I find the conclusions and even the premise to be overreaching. It would read better if it were condensed into a history of AI booms and busts without making wild predictions.

To be "grounded" is to have meaning that references some external reality. He meant that a NN which can perform symbolic algebra would demonstrate that symbols are grounded in some meaning beyond the identification/pattern-matched sense. They have meaning beyond "an or symbol, an and symbol, a true symbol", with additional information in their relationships that a NN can understand well enough to use them in context.
