
AI Pioneer Wants to Build the Renaissance Machine of the Future - andreshb
https://www.bloomberg.com/news/articles/2017-01-16/ai-pioneer-wants-to-build-the-renaissance-machine-of-the-future
======
cr0sh
Honestly, I think the New York Times' title of "the father of AI" is a bit
presumptuous. I'm not an expert by any means in machine learning or
artificial intelligence, but I do know a fair amount about computer history.

Potential "father of modern AI" \- even that is stretching a bit!

The fact is, the history of machine learning and artificial intelligence is
one of fits and starts; of springs and winters; of successes and failures.

If anyone could be called such, the title of "fathers of AI" would belong to
Warren McCulloch and Walter Pitts, who came up with the model for an
artificial neuron in 1943:

[https://en.wikipedia.org/wiki/Artificial_neuron](https://en.wikipedia.org/wiki/Artificial_neuron)
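For the curious, the 1943 unit is simple enough to sketch in a few lines of
Python. This is a simplified version (the original also distinguished
excitatory from inhibitory inputs); it just fires when a weighted sum of
binary inputs reaches a threshold:

    def mcculloch_pitts(inputs, weights, threshold):
        """Fire (1) iff the weighted sum of binary inputs reaches the threshold."""
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total >= threshold else 0

    # Example: an AND gate, one of the logic functions such units can compute.
    assert mcculloch_pitts([1, 1], [1, 1], threshold=2) == 1
    assert mcculloch_pitts([1, 0], [1, 1], threshold=2) == 0

Note there's no learning rule here - weights and threshold are fixed by hand,
which is exactly why later work (Rosenblatt's perceptron, backpropagation) was
needed.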

From there, it was people standing on each other's shoulders, building ever
upward and outward. These steps and systems have ultimately led to today's
deep learning networks.

Other machine learning techniques trace back to various methods in statistics
and probability theory; then you have the whole arena of computer/machine
vision research.

Right now, we're in the midst of yet another AI/ML "spring", after a fairly
long "winter". Mr. Schmidhuber can certainly claim a bit of status as one of
the many who helped institute the thaw leading to today - but he is by no
means alone (I'd argue that one of the earliest contributors to today's thaw
might be LeCun).

~~~
YeGoblynQueenne
>> If anyone could be called such, the title of "fathers of AI" would belong
to Warren McCulloch and Walter Pitts, who came up with the model for an
artificial neuron in 1943

The work of McCulloch and Pitts was seminal in the field of connectionism,
itself a sub-field of AI (and the field that encompasses neural networks, but
not machine learning in general). The McCulloch-Pitts neuron was only one of
their many contributions. Pitts in particular later published a great deal on
inductive inference, which is more straightforward Good Old-Fashioned AI.

AI goes way back, with the work of Turing and von Neumann, and even Russell
and the Logicians - not a ska band, but a group of mathematicians who wrote
about symbolic logic: Hilbert and Gödel being the most well-known among them.
We have digital computers today because these gentlemen (and some ladies
among them) came up with the maths that enabled it. In their time,
"artificial intelligence" pretty much meant a machine like the one we're
using right now to communicate.

There was a small army of bright personalities that followed in their wake.
Just off the top of my head (and completely arbitrarily; most of them are my
personal heroes of AI):

    
    
      Marvin Minsky
      Roger Schank
      E. Mark Gold
      Dana Angluin
      J. Ross Quinlan
      Peter Norvig
      Eugene Charniak
      David H. D. Warren
      Daniel Jurafsky
      Robert A. Kowalski
      John McCarthy
      Steve Russell
      Christopher D. Manning
      Terry Winograd
      Edward Feigenbaum
      Geoff Hinton
      Carl Eddie Hewitt
      Richard O'Keefe
      Pat Langley
      George F. Luger
      Ryszard S. Michalski
      Seymour Aubrey Papert
      Judea Pearl
      Fernando Pereira
      Steven Pinker
      Frank Rosenblatt
      David Everett Rumelhart
      James Lloyd McClelland
      Stuart Russell
      John Rogers Searle
      Rodney Allen Brooks
      Claude bloody Shannon
      Paul Smolensky
      Gerald Jay Sussman
      Richard S. Sutton
      Katia Sycara
      Leslie Gabriel Valiant
      Vladimir Naumovich Vapnik
      Norbert Wiener
      Joseph Weizenbaum
      Paul J. Werbos
      James H. Martin
      Hinrich Schütze
      Ehud Shapiro
      Stephen Muggleton
      Leon Sterling
      Alain Colmerauer
      ... 
    

I've taken care to list all those people with as much detail in their names as
I could find on Wikipedia. This should help you identify them and read about
their contributions to the field, which I strongly suggest anyone do before
attempting to discuss the history of AI in any detail or at any depth.

~~~
nl
Claude bloody Shannon ;)

I was going to say that I think Marvin Minsky deserves to be on that list,
especially if we are talking about the rise and fall and rise again of neural
networks. Especially the fall part.

Then I noticed he was first on the list and I'd just scrolled past him.

------
ttam
> "Juergen Schmidhuber, often referred to as the father of AI"

What?

reading further...

> The New York Times recently referred to him as a would-be father of AI.

Clicking the link to the NYTimes article

> When A.I. Matures, It May Call Jürgen Schmidhuber ‘Dad’

English-as-a-second-language person here, but is it just me, or does the
Bloomberg subtitle not reflect at all what the NYT title is saying?

~~~
kobeya
The phenomenon of "citogenesis":

[https://xkcd.com/978/](https://xkcd.com/978/)

------
blueyes
This is a sloppy piece making grand claims about a man who likes to make grand
claims about himself. By most accounts, Schmidhuber is only marginally
involved with Nnaisense, lending his name to the endeavor so that some of his
graduate students can raise money. They have no use case, no product, and
frankly no business sense, which is one reason why the only money backing
them comes from an unknown investor based in Madrid.

------
dharma1
Good to see. I had the opportunity to see Jurgen speak last year - he was
excellent: entertaining, surprisingly deep, and able to field any question
with humour and insight. He gets a lot of flak for being obsessed with
getting credit for his work, but I think he does it for historical accuracy
rather than ego.

Research-oriented AI startups are experiencing serious brain drain to large
companies because of the money those companies can afford to pay. I hope this
secures Nnaisense for a couple of years, and that they make the most of that
time.

------
ponderingHplus
I attended my first NIPS this year, and found Juergen to be a very engaging
speaker, with the RNN symposium organized by him and his colleagues being my
favorite part of the conference. A popular phrase being thrown around during
the conference was "learning to learn" or "meta-learning", with one of the
papers even being titled "Learning to learn by gradient descent by gradient
descent"[1]. Juergen seemed very passionate about the subject; he gave a cool
talk on his Gödel machine[2] and sparked interesting conversation during the
panel discussion. I wouldn't be surprised if "learning to learn" or
"meta-learning" replaces "deep learning" as the AI word of 2017.

[1][https://papers.nips.cc/paper/6461-learning-to-learn-by-gradi...](https://papers.nips.cc/paper/6461-learning-to-learn-by-gradient-descent-by-gradient-descent.pdf)

[2][https://en.wikipedia.org/wiki/G%C3%B6del_machine](https://en.wikipedia.org/wiki/G%C3%B6del_machine)
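For anyone wondering what the paper's title actually means: the idea is to
replace a hand-designed update rule like SGD with a small network that
proposes updates from gradients, and to train that network by backpropagating
through an unrolled optimization run. A toy sketch of the idea in PyTorch (my
own simplification - the paper uses a coordinatewise LSTM, not an MLP):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # The "optimizer" network: maps one gradient coordinate to one update.
    opt_net = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
    meta_opt = torch.optim.Adam(opt_net.parameters(), lr=1e-3)

    def task_loss(theta, target):
        # A toy quadratic "optimizee": drive theta toward a random target.
        return ((theta - target) ** 2).sum()

    for meta_step in range(200):
        target = torch.randn(5, 1)                    # sample a fresh task
        theta = torch.zeros(5, 1, requires_grad=True)
        meta_loss = 0.0
        for t in range(10):                           # unroll 10 optimizer steps
            loss = task_loss(theta, target)
            grad, = torch.autograd.grad(loss, theta, create_graph=True)
            theta = theta + opt_net(grad)             # the learned update rule
            meta_loss = meta_loss + loss
        meta_opt.zero_grad()
        meta_loss.backward()   # gradient descent on the optimizer itself
        meta_opt.step()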

~~~
deepnotderp
Learning to learn is a DeepMind paper, not Schmidhuber's.

~~~
ponderingHplus
Agreed, but the linked paper was one of the more talked-about ones at the
conference, and it has a fairly accessible discussion of the topic, including
a history of related work in section 1.2 with references to many of Jurgen's
papers.

------
canistr
Interestingly enough, Googling "Father of AI" yields several results,
including John McCarthy, Marvin Minsky, and Alan Turing.

Google's featured results point at McCarthy, while the first mention of
Schmidhuber is in the 9th result, from the New York Times.

------
visarga
I like his ideas, especially the one about consciousness being reinforcement
learning, and the self emerging from the process of data compression.

~~~
fnl
Not citing prior work is a grave issue in the academic world. Papers can get
accepted or rejected based on their novelty. Missing citations of work
central to a paper's topic can therefore lead to retractions. I am not an
expert on the issue in this particular case, but if Schmidhuber is right, he
has every right, from an academic perspective, to be pretty pissed off. I
certainly would recommend rejecting a paper if the author refused to cite
prior art.

~~~
visarga
In that case, the technique was similar, but the purpose it was used for was
different.

> The new approach seems similar in many ways. Both approaches use
> "adversarial" MLPs to estimate certain probabilities and to learn to encode
> distributions. A difference is that the new system learns to generate a non-
> trivial distribution in response to statistically independent, random
> inputs, while good old PM learns to generate statistically independent,
> random outputs in response to a non-trivial distribution (by extracting
> mutually independent, factorial features encoding the distribution). Hence
> the new system essentially inverts the direction of PM - is this the main
> difference? Should it perhaps be called "inverse PM"?

[http://media.nips.cc/nipsbooks/nipspapers/paper_files/nips27...](http://media.nips.cc/nipsbooks/nipspapers/paper_files/nips27/reviews/1384.html)
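To make the comparison concrete, here's a toy sketch of predictability
minimization as the review describes it (my own simplification in PyTorch,
not the 1992 formulation): an encoder produces code units, per-unit
predictors try to predict each unit from the others, and the encoder is
trained to defeat them, pushing the units toward statistical independence:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n_code = 4
    encoder = nn.Sequential(nn.Linear(8, 16), nn.Tanh(),
                            nn.Linear(16, n_code), nn.Sigmoid())
    # One predictor per code unit, each seeing the other n_code - 1 units.
    predictors = nn.ModuleList(
        [nn.Sequential(nn.Linear(n_code - 1, 8), nn.Tanh(), nn.Linear(8, 1))
         for _ in range(n_code)])
    opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    opt_pred = torch.optim.Adam(predictors.parameters(), lr=1e-3)

    def prediction_error(code):
        # Summed error of predicting each unit i from the other units.
        err = 0.0
        for i, p in enumerate(predictors):
            others = torch.cat([code[:, :i], code[:, i + 1:]], dim=1)
            err = err + ((p(others).squeeze(1) - code[:, i]) ** 2).mean()
        return err

    for step in range(1000):
        x = torch.randn(64, 8)   # stand-in for a "non-trivial distribution"
        code = encoder(x)

        # Predictors minimize the prediction error...
        opt_pred.zero_grad()
        prediction_error(code.detach()).backward()
        opt_pred.step()

        # ...while the encoder maximizes it (the adversarial part).
        opt_enc.zero_grad()
        (-prediction_error(code)).backward()
        opt_enc.step()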

And this is from Ian Goodfellow himself, giving three counter-arguments:

> Some previous work has used the general concept of having two neural
> networks compete. The most relevant work is predictability minimization
> [Schmidhuber, J. 1992]. In predictability minimization, each hidden unit in
> a neural network is trained to be different from the output of a second
> network, which predicts the value of that hidden unit given the value of all
> of the other hidden units. This work differs from predictability
> minimization in three important ways: 1) in this work, the competition
> between the networks is the sole training criterion, and is sufficient on
> its own to train the network. Predictability minimization is only a
> regularizer that encourages the hidden units of a neural network to be
> statistically independent while they accomplish some other task; it is not a
> primary training criterion. 2) The nature of the competition is different. In
> predictability minimization, two networks’ outputs are compared, with one
> network trying to make the outputs similar and the other trying to make the
> outputs different. The output in question is a single scalar. In GANs, one
> network produces a rich, high dimensional vector that is used as the input
> to another network, and attempts to choose an input that the other network
> does not know how to process. 3) The specification of the learning process
> is different. Predictability minimization is described as an optimization
> problem with an objective function to be minimized, and learning approaches
> the minimum of the objective function. GANs are based on a minimax game
> rather than an optimization problem, and have a value function that one
> agent seeks to maximize and the other seeks to minimize. The game terminates
> at a saddle point that is a minimum with respect to one player’s strategy
> and a maximum with respect to the other player’s strategy.

[http://papers.nips.cc/paper/5423-generative-adversarial-nets...](http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf)
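And for contrast with the PM sketch above, the minimax game Goodfellow
describes in point 3, in equally toy form (again my own sketch, fitting a 1-D
Gaussian, not the paper's code):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
    D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                      nn.Linear(16, 1), nn.Sigmoid())
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCELoss()
    ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

    for step in range(2000):
        real = torch.randn(64, 1) * 0.5 + 2.0   # "data": N(2, 0.5^2)
        fake = G(torch.randn(64, 4))            # generator samples from noise

        # D maximizes log D(real) + log(1 - D(fake))...
        d_loss = bce(D(real), ones) + bce(D(fake.detach()), zeros)
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # ...while G tries to fool D (non-saturating form of the game).
        # Note: the competition is the sole training criterion, per point 1.
        g_loss = bce(D(fake), ones)
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()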

~~~
fnl
Thanks for the details, very enlightening! But it's still a bit one-sided:
what are Schmidhuber's arguments on this issue?

------
cosmoharrigan
For more details, refer to Schmidhuber's own website:
[http://people.idsia.ch/~juergen/](http://people.idsia.ch/~juergen/) and his
excellent review paper, "Deep Learning in Neural Networks: An Overview":
[https://arxiv.org/pdf/1404.7828.pdf](https://arxiv.org/pdf/1404.7828.pdf)

~~~
kriro
From the linked website:

"""Since age 15 or so, the main goal of professor Jürgen Schmidhuber has been
to build a self-improving Artificial Intelligence (AI) smarter than himself,
then retire."""

I like this approach.

------
lucidrains
Schmidhuber cracks some hilarious jokes when he is on stage.

"The other day I gave a talk and there was just a single person in the
audience, a young lady. I said young lady it's very embarrassing but
apparently today I'm going to give this talk just to you. She said okay but
please hurry I gotta clean up here."

------
mikefinley
Hinton. Hands down. Just read his early work on deep nets. Unsupervised, no
ensemble, nailed MNIST and self-awareness of features. And he has a great
sense of humor.

------
choxi
They sound like a startup version of Watson; it's nice to see an AI startup
having some success in the industry.

------
partycoder
There's a long line of succession... take your number, Jurgen: McCulloch,
Pitts, Fukushima, Kohonen, Hopfield, Jordan, Elman, Werbos, LeCun, Hinton,
Bengio... to name only a very few.

You can also go even further...
[https://www.youtube.com/watch?v=laJX0txJc6M](https://www.youtube.com/watch?v=laJX0txJc6M)

