
A Deep Learning Dissenter Thinks He Has a More Powerful AI Approach - espeed
http://www.technologyreview.com/featuredstory/544606/can-this-man-make-ai-more-human/
======
yconst
I found the first comment to this article quite interesting:

"Marcus has a point - even is some of what is said about deep neural networks
is incorrect (for instance, they can learn and generalize from very few
example, one shot learning).

However, he got it wrong with the answer. The key for machines to reach the
symbolic abstraction level is the way we train them. All training algorithms,
whether supervised, unsupervised, or reinforcement learning with LSTMs, rely
on the assumption that there is a "utility function" imposed by some external
entity. The problem is that, by doing so, we are taking away the machines'
capacity to ask questions and create meaning.

The most important algorithm for learning is "meaning maximization", not
utility maximization. The hard part is defining what meaning is - maybe we
can't, I'm not sure. That is something I would be glad to discuss."

~~~
emcq
This is perhaps a bit philosophical, but assuming we had a "meaning
maximization" function, what stops us from writing it as a loss function and
using our current supervised machine learning frameworks?
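
To make that concrete: mechanically, any scalar objective you can differentiate
drops into the same training loop. Here's a minimal sketch; the `meaning_loss`
below is a hypothetical stand-in (it's really just logistic loss), which is
exactly the open problem, since nobody knows what to actually put there.

```python
# Minimal sketch: a custom scalar objective plugged into ordinary gradient descent.
# "meaning_loss" is a hypothetical placeholder, not an actual meaning-maximization
# objective; defining the real thing is the unsolved part.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))            # toy inputs
y = (X[:, 0] > 0).astype(float)           # toy labels
w = np.zeros(10)

def meaning_loss(w, X, y):
    # Placeholder objective: plain logistic loss standing in for "meaning".
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

def grad(w, X, y):
    # Gradient of the placeholder loss with respect to the weights.
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

for _ in range(200):                       # the usual supervised training loop
    w -= 0.5 * grad(w, X, y)

print(meaning_loss(w, X, y))
```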

~~~
sawwit
Just because we can formulate an optimization objective does not guarantee
that we will find an algorithm that solves it within a reasonable amount of
time. In the case of humans, these objectives or preferred states are possibly
very simple ones, like hunger and pain avoidance, reproduction, and curiosity;
and it is actually easy to write down an algorithm that optimizes them (if you
ignore how reality actually works): simply try out _all possible_ ways of
reacting to the environment and choose the best one.

This works in theory, but in practice you only have a limited number of
chances to try something out (because of the arrow of time). This makes
learning a necessity. You need to keep a record of all trials you have
performed so that you can reuse this information later when the same situation
reoccurs. How to do this in an optimal way is described by Bayes' theorem.
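
As a toy illustration of "keep a record of trials and reuse it via Bayes'
theorem", here is a Beta-Bernoulli sketch (the situations and actions are made
up; this is not a claim about how brains do it):

```python
# Sketch: record past trials per (situation, action) and reuse them with Bayes' rule.
# A Beta-Bernoulli model makes the update trivial: posterior updating is just counting.
from collections import defaultdict

# (successes, failures) per (situation, action), starting from a uniform Beta(1, 1) prior.
record = defaultdict(lambda: [1, 1])

def update(situation, action, succeeded):
    # Conjugate Bayesian update: increment the success or failure count.
    record[(situation, action)][0 if succeeded else 1] += 1

def best_action(situation, actions):
    # Reuse the record when the situation reoccurs: pick the action with the
    # highest posterior mean success probability.
    def posterior_mean(a):
        s, f = record[(situation, a)]
        return s / (s + f)
    return max(actions, key=posterior_mean)

update("hungry", "eat", True)
update("hungry", "sleep", False)
print(best_action("hungry", ["eat", "sleep"]))   # -> "eat"
```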

The key to AI will be a certain set of priors, biases and fixed function units
that make this computationally tractable; we'll likely need things like
invariance of the learned information to various changes so that it can be
reused in different settings, segmentation of data coming from the world into
episodes (hippocampus), attention, control (basal ganglia), mental rotation
(cortex) and path integration (hippocampus, grid cells).

~~~
emcq
That's true; there are certainly many optimization objectives that are
computationally intractable, or perhaps too abstract to be useful for learning.

However, I would argue that the prior in Bayesian modeling can be just as
nebulous and computationally intractable as an optimization objective. Like
supervised learning, Bayesian modeling is just a tool.

I'm skeptical that we will reach AI through a deep understanding or modeling
of the brain. Technology and computer science advance more quickly than the
biological sciences, at least in recent times. You might argue that a success
in robotics like [0] is a motor control system, but they built it by extending
mathematical frameworks, not by being biologically inspired, and the big wins
there didn't come from fixating on a learning framework or biological mimicry;
just as humans learning to fly didn't come about from flapping wings like a
bird. At some point we hacked an engine (invented for other purposes) onto a
wing and came up with powered flight.

As an aside, only seeing input a limited number of times would likely improve
your ability to find models that generalize, since your model must be able to
take these one-off learnings and unify them in some way to achieve high
training performance. With respect to human learning, a specific individual
only has one chance, but nature has had many. We are only a selection of those
chances that seemed to work well enough. There are many commonalities to
existence that allow this to work well in practice.

[0]
[http://groups.csail.mit.edu/rrg/papers/icra12_aggressive_fli...](http://groups.csail.mit.edu/rrg/papers/icra12_aggressive_flight.pdf)

------
vonnik
I'm glad people are working on better approaches to AI, but I don't think
articles like this, based on sources like Marcus, contribute much.

Mostly because Marcus isn't willing to talk about what he's working on right
now. All we get is hand-wavy stuff about how children learn, a tired analogy
that's been trotted out many times before. This article was too early, and it
doesn't contain real news. Reproducible results, or it didn't happen...

Fwiw, the statement below is simply untrue.

> _“If you want to get a robot to learn to walk, or an autonomous vehicle to
> learn to drive, you can’t present it with a data set of a million examples
> of it falling over and breaking or having accidents—that just doesn’t
> work.”_

The entire premise of DeepMind's work combining DL and reinforcement learning
is precisely that autonomous agents can learn from millions of examples.
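
For a concrete (if toy) version of that premise, here is a hedged sketch of
tabular Q-learning on a made-up "walk without falling over" task. DeepMind's
agents replace the table with a deep network, but the loop of learning from
millions of simulated failures is the same idea:

```python
# Toy sketch: an agent learning from many simulated failures (tabular Q-learning).
# Not DeepMind's setup; just the shape of the trial-and-error loop.
import random

N_STATES, ACTIONS = 5, [0, 1]             # 0 = cautious step, 1 = big step
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(state, action):
    # Big steps move faster but "fall over" 30% of the time (episode ends, penalty).
    if action == 1 and random.random() < 0.3:
        return None, -10.0
    nxt = min(state + (2 if action == 1 else 1), N_STATES - 1)
    reward = 10.0 if nxt == N_STATES - 1 else -1.0
    return (None if nxt == N_STATES - 1 else nxt), reward

for _ in range(100_000):                  # a hundred thousand trials; in simulation even millions are cheap
    s = 0
    while s is not None:
        a = random.choice(ACTIONS) if random.random() < eps else max(ACTIONS, key=lambda i: Q[s][i])
        s2, r = step(s, a)
        target = r + (gamma * max(Q[s2]) if s2 is not None else 0.0)
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

print([max(q) for q in Q])                # learned state values
```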

And in fact, data is not the limiting parameter here. Trying to create an AI
that doesn't need big data will be great for edge computing -- but we're
actually swimming in data and using it to set new records every year --
[http://deeplearning4j.org/accuracy.html](http://deeplearning4j.org/accuracy.html)
-- so we might as well rely on it. Ignoring big data in the pursuit of AI is
like ignoring sunlight in the pursuit of clean energy.

Deep learning is one of the best ways we have to use that data. In a sense,
deep is the new normal.

~~~
_dps
> but we're actually swimming in data ...

I think this is a "looking under the lamppost for your keys because that's
where the light is" situation. We hear about tons-of-data and deep learning
because teams like Google/FB have problems that fit those situations well.
This is not representative of the vast sea of learning applications most
organizations face.

I work on learning applications day-in-day-out. In a typical year I work with
over 50 different organizations/projects. Almost none of them have data
anywhere near what deep learning requires. Here are just a few examples from
the past year:

1) Optimizing a recruiting pipeline: maybe ~1,000s of data points per year

2) Medical billing applications: typically hundreds of data points per year

3) Novel but slow-moving financial instruments: maybe 1 data point per day

4) Automated sensor calibration where each run of the hardware costs a few
hundred dollars: depending on your budget, thousands of data points per year
are representative

I think Google/FB and the "web+mobile is everything hivemind" (I'm not saying
that applies to you) have deeply distorted people's expectations for how much
data is available to solve a typical problem.

------
wisevehicle
I hope that anyone who incorporates this concept into their AI also measures
the number of mistakes that become heuristics and biases in the AI. I would
hypothesize that there is a link between the willingness or ability of the
human mind to learn this way and the propensity to accept (even DESIRE) an
answer to a question for which, in truth, there is not adequate information to
answer correctly.

This tendency leads to both helpful and dangerous heuristics and biases. From
these biases humans build false beliefs that are damaging to themselves, their
communities, and in the long run the species as a whole. If AI is about
enabling humans to do more and better, should we not accept the failings of
the current technologies in favor of ensuring that they do not fall prey to
the same biases and heuristics that have led humans to slaughter each other
over the religious dogma of the past, destroy the environment with impunity,
and accept ideas that 'feel' correct over ideas that are correct?

~~~
rm_-rf_slash
A reason to fear human-like AI would be that it would realize how destructive
humans are and how unnecessary it perceives us to be to the evolution of the
species.

------
mindcrime
OK, to be fair, I skimmed TFA and didn't read every word. But to the extent
that I get the gist of it, I'd say this:

I don't know that anybody seriously proposes that deep learning is the be-all
end-all of AI techniques. It's VERY powerful for a lot of things, but I think
DL researchers are aware of things DL doesn't do / isn't good at. Look at the
recent book _The Master Algorithm_, which breaks down a lot of what it would
take to create a truly general-purpose learning algorithm: if you believe the
author's thesis, deep learning (or something like that) is just one piece of a
much larger picture.

And without trying to start a debate over the merits of ML versus "GOFAI" or
symbolic computation, etc., I think it's fair to say that DL doesn't really
add anything in terms of reasoning. It's great at saying "this picture has a
cat in it" or "this wav file says 'Hello, my name is mindcrime'", but that's a
pretty small part of what human intelligence can do.

~~~
cjauvin
I'm curious about that book (_The Master Algorithm_): I began reading it,
but stopped early because I got the impression it would be too "entry-level"
for me, thus a waste of my time. Should I consider keeping on with it, and
why?

~~~
mattmcknight
I would say no, if you already understand the varieties of techniques it
covers. It doesn't get more advanced. However, it is a decent historical
overview.

He also pushes the same idea over and over again that there must be a master
algorithm, when I got the feeling very early on that it wasn't a necessary
thing or a practical concept to guide research. It's like arguing that you
have to choose between electricity and magnetism, when they might just be
aspects of the same underlying forces. Also, his reasons for claiming that
certain algorithms are not the master algorithm are pretty weak and based on
current progress, not theoretical limitations.

~~~
eli_gottlieb
What does he have to say about the No Free Lunch Theorem and the bias-variance
tradeoff?

------
cshimmin
Does someone have a link to a concise technical explanation of what this is
about? I can't handle the vague long-form writing about some guy's startup and
2-year-old on my morning pass through the news. I'm reminded of the current
South Park season, feeling like this is an ad being passed off as news.

~~~
eli_gottlieb
TL;DR: Deep learning can't generalize to novel inferences as well as
probabilistic programming.

------
rdtsc
> In contrast, a two-year-old’s ability to learn by extrapolating and
> generalizing—albeit imperfectly—is far more sophisticated.

How does he know? He only sees the output, not how the brain works or how much
information has been processed beforehand. It may just be that the brain took
an easy shortcut and generated a new rule, and you could write a bit of code
to do that as well.

But isn't this rule learning just a few more layers in a learning network?
Basically, a learning network that operates on concepts, not just on raw pixels.

One thing I noticed with my kids is that they have a ridiculously good memory.
For a while they almost fooled me into thinking they could read. They would
remember a word in a book and then a month later point to it and pronounce the
word. So I thought, oh wow, you could read that. But then I realized they had
just remembered it. It seems a lot of how a brain learns is just absorbing and
processing massive amounts of data, not unlike the learning projects Google
and these other companies have.

Now, on the subject at hand, haven't there been expert systems that do what he
proposes? In a way, IBM's Watson is a successor to many of those expert
systems. So one can say this has already been implemented and tried; it works
in some instances but not others.

------
tinco
Until our current big-data AI revolution, AI was one investment catastrophe
after another. Now money is finally flowing into AI again, and intuitions like
this idea, backed by a little big-data power, can be explored again.

So even though the idea is fairly simple (I think anyone who thinks about AI
for any trivial amount of time will come up with the idea that AIs somehow
have to automatically generate imperfect generalizations), the final result
might be a truly innovative product.

Actually building an AI that does this on a scale that results in useful
applications is something that probably has not been practical for a long
time, and might or might not be now.

~~~
0xdeadbeefbabe
Do you have reference for one investment catastrophe after the other? I know
it failed to deliver on the promise of translating human language, for
example.

~~~
fractallyte
This article lays out those failures:
[https://en.wikipedia.org/wiki/AI_winter](https://en.wikipedia.org/wiki/AI_winter)

------
arnia
An interesting article, but (understandably, if frustratingly) very light on
the details. It reminds me of some of the work being done in the Artificial
General Intelligence (AGI) community. In AGI, you are looking to come up with
more universal approaches to mimicking intelligence, rather than baking an
architecture to solve one specific problem.

In particular, this reminds me of some of the operational logic work done in
OpenCog ([http://opencog.org/](http://opencog.org/)) and, especially, Pei
Wang's Non-Axiomatic Reasoning
([http://cis-linux1.temple.edu/~pwang/papers.html](http://cis-linux1.temple.edu/~pwang/papers.html)).
I've liked Non-Axiomatic Reasoning for a long time, and this sounds like it's
in the same broad area.

Pei Wang, incidentally, came up with the best operational definition of
intelligence I've seen (and one I apply to other systems and approaches):
Intelligence is the ability to act appropriately with limited knowledge and
limited resources (including time and space).

For what it's worth, my (published, although very open to change) position is
that a layered architecture (akin to old-school subsumption architectures) is
probably going to be most effective here. Combining the 'symbol creation'
abilities of a deep neural network with the evidential reasoning,
generalisation, and planning capabilities of a cognitive layer will allow us
to get the best of both worlds.

~~~
mindcrime
_For what it's worth, my (published, although very open to change) position
is that a layered architecture (akin to old-school subsumption architectures)
is probably going to be most effective here._

I have been going back and spending some time with the old "blackboard
architecture" idea lately. I harbor a suspicion that that, or something like
that, will turn out to be a useful way to integrate the capabilities of
various different elements of cognition.

~~~
arnia
Yes, although it is probably going to have a notion of flow to it too. As
someone who subscribes to the cognitive-science idea of embodiment, I think
the concepts in our heads have meaning entirely because of how they relate
senses to actions (however indirectly, and vice versa). I believe AI will be
no different.

------
murbard2
If I had to guess I would say that:

- It involves generative models (needed to make inferences from very few
examples)

- It is still a connectionist approach (typical probabilistic programming is
great if you have a lot of insight into the model, but not if you're trying to
solve a general case... unless you're doing program induction, but then you
need to represent programs...)

- It doesn't involve MCMC sampling for inference, because that's too slow or
even intractable.

Some type of variational program induction where programs are represented as
differentiable neural networks would be in that corner. Or it could be
something totally different, but speculating is fun.

~~~
BenoitEssiambre
I dunno, I still think it might be possible to make program induction
tractable through MCMC.

I've been trying (but failing) to do so by clustering similar generative rules
of different complexity together, then getting the algorithm to search the
grammar space in a way that recursively tries simpler models first, then goes
on to generate (and learn) more complex rules that are known to have output
similar to the best simpler ones.

My intuition is that you have to cluster your generative rules into a
taxonomy that your algorithm can navigate from the top. It's exponentially
inefficient to try to recognize "dog" until you have recognized that the
simpler "animal" is a good approximation.

I also think that in the ultimate solution, the output leaf statements of the
grammar will be parametrized like a normal programming language, so that, for
example, the algorithm can generate a color and a radius and then generate 100
circles referencing this single color and radius, representing repeated
patterns without having to learn their size and color individually when they
are clearly homogeneous or invariant across the bunch. The Bayesian, Occam's
razor solution to a bunch of similar things that you don't have a category for
yet is shared parameters. These parameters enable learning with very few
examples. The algorithm doesn't have to learn a full new category to make good
predictions; it can simply notice a functional parametric pattern in a part of
a scene and extrapolate immediately.
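
A rough sketch of that shared-parameter idea (the scene and numbers are made
up, and this is not my actual grammar code): if 100 circles reuse one latent
color and one latent radius, then fitting the scene means inferring two
numbers, so a handful of noisy observations is enough.

```python
# Sketch: many objects generated from a few shared parameters.
# Inferring the shared parameters from a few examples stands in for learning
# the whole repeated pattern without a dedicated category for it.
import numpy as np

rng = np.random.default_rng(1)
true_radius, true_hue = 4.2, 0.63                 # shared latent parameters of the scene

def render_circles(n, noise=0.05):
    # Every circle references the same radius and hue, plus observation noise.
    radii = true_radius + noise * rng.normal(size=n)
    hues = true_hue + noise * rng.normal(size=n)
    return radii, hues

radii, hues = render_circles(5)                   # observe only 5 of the 100 circles
# With a flat prior and Gaussian noise, the posterior mean is just the sample mean.
print("inferred radius ~", radii.mean(), "inferred hue ~", hues.mean())
```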

I haven't cracked it, however. It's grueling to debug probabilistic algorithms
compared to clear-cut binary ones!

~~~
murbard2
Having some type of hierarchical structure is probably the right inductive
bias to have (it may be one of the reasons why deep learning is so
successful), but you also need a search direction. It's hopeless to hop
randomly around a high-dimensional space until you find matches.

~~~
BenoitEssiambre
The hierarchical structure means you don't hop randomly. You recognize simple
and vague approximate models first, then bias your search towards the more
complex rules that sit under your best top-level approximations in the
hierarchy.

It's like a probabilistic binary search in the model taxonomy.

One thing that makes me hopeful that the learning process is possible is that
the non-terminal grammar rules are generic and can always become anything. You
don't tend to get stuck in local maxima, even if you only search a small part
of the program space, when any node can always be swapped for a node of any
other category.

My problem right now is that my MCMC doesn't mix well. Even though I tried to
shorten the jumps between models of different complexity for a particular
thing, once the vague model traces are well fitted to the training examples,
there is too deep a score chasm to jump to the next level of complexity.

My simulations seem to get stuck in simpler models of things. I want the
simpler models to recursively open the door to learning and recognizing
slightly more complex models just under them in the hierarchy, but this
doesn't seem to be happening. The hierarchy isn't even really forming
properly.

I might be missing something stupid. I don't consider myself an expert in
these things; I just dabble for fun. It's one of the reasons I'm discussing it
here. Maybe someone more knowledgeable will give me an aha! moment.

There are still a bunch of things I could try, but like I said, it's long and
grueling debugging work.

------
nickpsecurity
Interesting work to see, as I've promoted this view myself: specifically, that
the brain is designed to rapidly acquire knowledge with relatively little data
in childhood and then changes over time somehow. Not to mention people's
reasoning ability, which often seems closer to a self-modifying expert system,
something NNs don't resemble at all. So it all uses neurons, but different
types that smoothly integrate.

Hence, DNNs modelling a small part of the brain that doesn't even do our
thinking aren't going to make thinking machines. The machines will be really
stupid and hard to train. Humans are hard enough to train: it takes two
decades on average. The ability to automate abstraction, understanding, and
feedback via example generation, all on small amounts of data, is critical.

If that's not found, then my next question is, "Will DNNs deliver acceptable
models for the non-recognition tasks they're aiming at, or will they
experience the Second AI Winter when the truth sets in?"

------
emcq
Perhaps Gary Marcus said it best himself when critiquing Numenta, yet another
brain-inspired company [0]:

[Brain inspired] models are arguably closer to how the brain operates than
artificial neural networks. "But they, too, are oversimplified," he says. "And
so far I have not seen a knock-down argument that they yield better
performance in any major challenge area."

There are a few of these brain-inspired machine learning companies, yet I
can't think of a single one acquired by Google.

Probabilistic modeling is great when you don't have much data and want to
inject prior knowledge in a cohesive way. However, Google's approach appears
to be to create large and robust training sets to feed into somewhat
conventional supervised learning frameworks.

[0] [http://www.technologyreview.com/news/536326/ibm-tests-
mobile...](http://www.technologyreview.com/news/536326/ibm-tests-mobile-
computing-pioneers-controversial-brain-algorithms/)

~~~
daveguy
It seems that he implies that brain models aren't good because they haven't
piqued Google's interest. Almost immediately he goes on to say that his method
is great even though Google isn't really interested in it either. Seems a
little contradictory.

------
amoruso
The article doesn't give any technical details. The website of the profiled
company, Geometric Intelligence, doesn't have any technical details.

Cofounder Gary Marcus has a publication list at his academic website:

[http://www.psych.nyu.edu/gary/marcus_pubs.html](http://www.psych.nyu.edu/gary/marcus_pubs.html)

Cofounder Zoubin Ghahramani has his research available at his academic
website:

[http://mlg.eng.cam.ac.uk/zoubin/](http://mlg.eng.cam.ac.uk/zoubin/)

~~~
scottlocklin
If you look around Zoubin's website you'll find this:

[http://www.automaticstatistician.com/about/#](http://www.automaticstatistician.com/about/#)

"The current version of the Automatic Statistician is a system which explores
an open-ended space of possible statistical models to discover a good
explanation of the data, and then produces a detailed report with figures and
natural-language text. While at Cambridge, James Lloyd, David Duvenaud and
Zoubin Ghahramani, in collaboration with Roger Grosse and Joshua Tenenbaum at
MIT, developed an early version of this system which not only automatically
produces a 10-15 page report describing patterns discovered in data, but
returns a statistical model with state-of-the-art extrapolation performance
evaluated over real time series data sets from various domains. The system is
based on reasoning over an open-ended language of nonparametric models using
Bayesian inference."

------
discardorama
Talk is cheap. He is more than welcome to compete in ILSVRC, on Kaggle, or in
any other competition of his choosing. Maybe release some code on GitHub or
even an API. There are a lot of options out there to really _show_ how your
technique is better, not just talk about it.

"How humans do it" isn't necessarily the only way to solve a problem. We don't
know how we add 2 numbers in our minds; but the computer can add 2 numbers
orders of magnitude faster than us.

~~~
hyperbovine
From TFA it does not sound like he is interested in competing in ILSVRC. His
point is that fitting a 500M-dimensional model to hundreds of millions of
training examples != intelligence. That is, deep learning is incredibly good
at pattern recognition but it is not going to get us to AI as most people
understand it.

~~~
tachyonbeam
Pattern recognition is just one component of intelligence. It's a big,
important one, though. People need to stop thinking that we'll one day
discover some single algorithm for strong AI. It will be a composition of
different parts, just like the human brain.

------
spdionis
The part that has always looked strange to me about comparing AI to
humans/children is that humans have had _years_ to learn, while neural
networks get, at most, a few days?

~~~
doctorpangloss
Yeah, but it's not obvious whether every single frame of the whole movie
children have experienced is necessarily important.

To put it one way, what is the _rate_ of learning? How many _concepts_ do
children learn per unit time?

Concepts: Our computer systems take whole images and train on them, but before
a pixel-based image gets to the learning part of a child's brain, it has
probably been conceptualized in some way already.

Rates: And then, from this feature vector—maybe 1 feature vector per 10
seconds, for a child—how often is it incorporated in a learned concept? Maybe
one concept per hour? Per day? Think about how challenging it is for children
to learn vocabulary. A young child doesn't know that much.

The important thing is that a child is a self-learning machine. It chooses
what's important and has a way to shift its attention, a critical enhancement
to new (2015) AI approaches, deep learning and otherwise. It can explore its
own world and somehow choose the "training instances" that matter.

By comparison, a neural network training for days looks at millions of images.
A very high rate. But the most effective approaches (the massively-multilayer
networks, the deep in deep learning) are making concrete what a child's brain
must do already: separate perceptual components from learning components, in a
conclusively more structured and hierarchical way.

~~~
spdionis
What if you can achieve such a high learning rate only because (in ML) you
process a really narrow view of the data, one which cannot come even close to
how a human perceives things? I mean in terms of conceptualizing and
connecting things/thoughts.

------
AndrewKemendo
Gary isn't the first to think of this.

In fact, I was working with Dr. Frank Guerin [1] back in 2008 on this exact
approach. He wrote an interesting paper that approached ML from a pedagogy
perspective, titled _A Piagetian Model of Early Sensorimotor Development_ [2].
[1]
[http://homepages.abdn.ac.uk/f.guerin/pages/](http://homepages.abdn.ac.uk/f.guerin/pages/)
[2]
[http://homepages.abdn.ac.uk/f.guerin/pages/EpiRob2008.pdf](http://homepages.abdn.ac.uk/f.guerin/pages/EpiRob2008.pdf)

------
lovelearning
He "refused to explain exactly what products and applications ...for fear that
a big company like Google might gain an advantage".

Does this actually happen, or is it baseless paranoia?

~~~
tachyonbeam
In my experience, if you start spreading a good idea out there, what can
happen is that people will initially act as though your idea is outlandish and
worthless, but later on the idea will percolate in other people's minds,
they'll do something with it, and they'll claim it was purely their own
invention. Google has more resources than this guy, so they can prototype
things much quicker.

------
lacker
How exactly does someone run a company for one year during a sabbatical? It
seems like you are signaling that you don't believe it will succeed, if you
are pre-announcing that you only intend to run this new company for a year.
Has this structure ever succeeded?

------
nl
The _Talking Machines_ podcast (which is great BTW) has an interview with
Zoubin Ghahramani[1], one of the founders mentioned here.

He spoke about how he was determined to prove that non-deep-learning methods
could perform as well as deep learning at tasks like ImageNet. The interview
was in March 2015, and AFAIK they haven't published anything yet.

OTOH, his work on the Automatic Statistician sounded very interesting.

[1]
[http://www.thetalkingmachines.com/blog/2015/3/26/3mixrq61fb0...](http://www.thetalkingmachines.com/blog/2015/3/26/3mixrq61fb0tff4kn0mrkzsw2xma98)

------
meeper16
There were others thinking of this approach in AI before this guy:

"The idea for a search engine that maps associations came to Franks by way of
his three young children. He noticed how each child processed information by
taking two pieces of knowledge, combining them, and coming up with something
new. Franks wondered whether he could get a computer to do the same thing."
From:

A Search Engine that Thinks, Almost
[http://newscenter.lbl.gov/2005/03/31/a-search-engine-that-
th...](http://newscenter.lbl.gov/2005/03/31/a-search-engine-that-thinks-
almost/)

------
yeukhon
I just realized why wearables like Google Glass were such a big deal for
Google. The human brain is constantly taking in data and rationalizing the
objects we see and the things we do. We are supervised as we grow, and we
continue to rationalize things on our own terms.

With wearables, if you allow the eyes of machines to read the world with you
and hear your conversations, then with enough time your machine should be
capable of learning enough to become like you. With millions of people doing
so, you essentially create an AI that, like a human, is able to answer complex
questions.

~~~
aoeusnth1
I feel like YouTube data should be enough for doing this.

~~~
yeukhon
The problem with YouTube data is that it is either very noisy and of poor
quality, or very short and meaningless. Sure, having Google Glass on you could
produce a similar result, but the idea is that the data is more consistent
with how we grow up: we don't travel every day, and the things we encounter
every day are pretty consistent. Once you refine an AI that represents you,
you can then take others' data and rationalize it. I have never been to
France, but I have some idea of what Paris looks like as a city, the buses
they have, and how they differ from the MTA here in NY.

------
spyder
From the article:

 _" A deep-learning system can be trained to recognize particular species of
birds, but it would need millions of sample images and wouldn’t know anything
about why a bird isn’t able to fly."_

Deep-learning ≠ only image recognition

Of course, if you train it only on images it will only know what a bird looks
like, but if you train it on the properties of birds and their ability to fly,
then it will learn why a bird isn't able to fly.
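
As a toy sketch of that point (the species and numbers are illustrative, not
real measurements), a model trained on tabular bird properties instead of
pixels at least relates those properties to flight ability; whether that
counts as knowing "why" is the debate in this thread:

```python
# Toy sketch: a classifier trained on bird properties rather than images.
# Feature values are illustrative only.
from sklearn.tree import DecisionTreeClassifier, export_text

# features: [body_mass_kg, wingspan_m, wing_area_m2]
X = [[0.02, 0.25, 0.01],   # hummingbird
     [1.0,  1.2,  0.15],   # crow
     [4.0,  2.3,  0.50],   # eagle
     [35.0, 0.9,  0.05],   # emu
     [90.0, 1.0,  0.06],   # ostrich
     [4.5,  0.7,  0.07]]   # penguin
y = [1, 1, 1, 0, 0, 0]     # 1 = can fly

clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(clf, feature_names=["mass", "wingspan", "wing_area"]))
```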

------
sgt101
How weird that there isn't a single mention of SOAR in this article or the
comments. [http://soar.eecs.umich.edu/](http://soar.eecs.umich.edu/) I guess
fashion is a powerful thing!

------
_0ffh
I have read far too much about some guy's groundbreaking new theory for AI in
the past few decades.

Shut up and show us the code!

