

New approach to AI to improve many Google products - mgl
http://www.technologyreview.com/news/429442/google-puts-its-virtual-brain-technology-to-work/?

======
karpathy
Let's go easy on the "virtual brain" talk and the hype. These approaches
basically all consider some simple, fixed function f that is parameterized by
some unknown numbers W, and everything comes down to finding the numbers W that
give some desired result (for the voice recognition example, correct answers on
a huge dataset of pairs f( <some sound wave features> ) = <some word>). Now, it
is hard to do this efficiently and it's all very interesting, but are we really
going to reduce AI to finding a conveniently tuned input -> output function?
This is much closer to simple regression than to a thinking brain.
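
To make the f-and-W framing concrete, here is a minimal sketch of the idea (my
own toy illustration, not Google's system; the logistic model, data, and
learning rate are all invented): a fixed function f with unknown numbers W,
tuned by gradient descent until f(input) matches the desired output.

    import numpy as np

    # Fixed, simple function family: f(x; W) = sigmoid of a linear score.
    def f(x, W):
        z = np.clip(x @ W, -30, 30)  # clip to keep exp well-behaved
        return 1.0 / (1.0 + np.exp(-z))

    # Toy stand-in for <sound wave features> -> <word> pairs:
    # 100 examples of 3 features with binary labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)

    # Everything comes down to finding the numbers W: gradient descent
    # on the cross-entropy between f(X; W) and the desired answers y.
    W = np.zeros(3)
    for step in range(1000):
        p = f(X, W)
        W -= 0.5 * X.T @ (p - y) / len(y)  # gradient of mean cross-entropy

    print("training accuracy:", ((f(X, W) > 0.5) == y).mean())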

Also, this approach is not new; it has been around since the 1980s. We do have
a richer family of models to draw from and a few new tricks, but much of the
progress comes from faster computers and infrastructure.

The correct article is as follows: Google is building an amazing piece of
infrastructure that allows them to run stochastic gradient descent across many
machines very quickly. This engineering feat is particularly convenient for a
certain family of models that have been around for the last many years and, as
it turns out, scaling up these models actually seems to work well in several
common and important scenarios. Equipped with this hammer, Google is currently
busy trying to figure out just how many of their problems are a nail.
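
For flavor, here is a toy sketch of the data-parallel idea behind that kind of
infrastructure (my guess at the general pattern, not DistBelief itself):
several simulated workers compute gradients on their own shards of the data,
and a central parameter store averages the results each round.

    import numpy as np

    # Toy "distributed" SGD on a linear least-squares problem.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 5))
    true_w = rng.normal(size=5)
    y = X @ true_w + 0.1 * rng.normal(size=1000)

    shards = np.array_split(np.arange(1000), 4)  # 4 simulated machines
    w = np.zeros(5)

    for step in range(200):
        grads = []
        for idx in shards:  # each worker: gradient of squared error on its shard
            Xs, ys = X[idx], y[idx]
            grads.append(2 * Xs.T @ (Xs @ w - ys) / len(idx))
        w -= 0.05 * np.mean(grads, axis=0)  # "server" averages and steps

    print("recovered w close to true w:", np.allclose(w, true_w, atol=0.05))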

~~~
rm999
> that have been around for the last many years

Unsupervised deep learning is at the heart of what they are doing, and it is a
relatively new method (the first I saw of it was at NIPS in December 2006). The
1980s were dominated by two-layer supervised feedforward neural networks, which
are quite different.
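
For readers unfamiliar with the distinction: the post-2006 recipe learns
features from unlabeled data first, for example with layers trained to
reconstruct their own input. A minimal single-layer sketch of that idea
(illustrative only; the 2006-era work used RBMs and stacked autoencoders with
considerably more machinery):

    import numpy as np

    # One autoencoder layer: learn features of X with no labels at all,
    # by training weights to reconstruct the input through a bottleneck.
    rng = np.random.default_rng(2)
    X = rng.normal(size=(500, 20))      # unlabeled data, 20 raw features

    W = 0.1 * rng.normal(size=(20, 8))  # encode 20 dims down to 8
    for step in range(1000):
        H = np.tanh(X @ W)              # hidden code (the learned features)
        err = H @ W.T - X               # tied-weight decoder's error
        # Gradient of reconstruction error wrt W (encoder + decoder terms):
        grad = X.T @ ((err @ W) * (1 - H**2)) + err.T @ H
        W -= 0.01 * grad / len(X)

    # np.tanh(X @ W) can now feed a supervised learner as a simpler input.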

~~~
mark_l_watson
Correct, and an excellent point. Using unsupervised learning to process very
complex input and build a simpler representation, and then running supervised
learning on that simpler representation, is useful; a small sketch of that
pipeline follows.
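
A minimal sketch of that two-stage pipeline (my own illustration: PCA stands in
for the unsupervised stage, and the low-rank data and labels are invented for
the example):

    import numpy as np

    rng = np.random.default_rng(3)
    S = rng.normal(size=(400, 5))                 # hidden low-dim structure
    A = rng.normal(size=(5, 50))
    X = S @ A + 0.1 * rng.normal(size=(400, 50))  # complex 50-dim raw input
    y = (S[:, 0] > 0).astype(float)               # labels depend on the structure

    # Stage 1 (unsupervised): PCA reduces 50 raw dims to 5 learned features.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:5].T
    Z /= Z.std(axis=0)                            # simpler, normalized input

    # Stage 2 (supervised): logistic regression on the reduced features.
    w = np.zeros(5)
    for step in range(500):
        p = 1 / (1 + np.exp(-Z @ w))
        w -= 0.1 * Z.T @ (p - y) / len(y)

    print("accuracy on reduced features:", ((p > 0.5) == y).mean())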

I wrote a commercial Go-playing program in the late 1970s that was all
procedural code. I have been thinking of hitting this problem again (on just a
9x9 board) using this combination: unsupervised learning to reduce the
complexity of the input data, and then a combination of supervised learning and
some hand-written code.

Hardware fast enough to use deep neural networks (more than one hidden layer)
was not available the last time I worked on Go programs. (There are also new
Monte Carlo techniques that are producing really good results.)

~~~
777466
Are 80s-style neural networks better at anything, or is the new stuff always
going to be better?

~~~
rm999
Keep in mind that they are a supervised technique, so they can't be directly
compared to the unsupervised learning Google is doing.

80s-style neural networks are flexible and powerful learners. In theory they
can approximate any function, and in practice they often come up with decent
solutions. They aren't perfect: there is no guarantee training will find the
best solution (it can get stuck in local minima), and they operate as a black
box, meaning we can't easily interpret what they have learned.
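
A minimal example of the 80s-style recipe (one hidden layer trained with
backpropagation; sizes, data, and learning rate invented for illustration),
learning XOR, a function no straight-line decision boundary can fit:

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR

    rng = np.random.default_rng(4)
    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)    # input -> 4 hidden units
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)    # hidden -> output

    sigmoid = lambda z: 1 / (1 + np.exp(-z))
    for step in range(5000):
        H = sigmoid(X @ W1 + b1)
        out = sigmoid(H @ W2 + b2)
        d_out = out - y                              # output error
        d_H = (d_out @ W2.T) * H * (1 - H)           # backprop through hidden layer
        W2 -= H.T @ d_out; b2 -= d_out.sum(0)
        W1 -= X.T @ d_H;   b1 -= d_H.sum(0)

    # Typically converges to ~[0, 1, 1, 0]; with an unlucky start it can
    # indeed get stuck in a local minimum, as noted above.
    print(out.round(2).ravel())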

I've heard the saying that "80s-style neural networks are usually the
second-best solution", which is oversimplified but close to correct.

------
mark_l_watson
I don't consider neural networks "real AI" (whatever that means), but they are
extremely useful, and with faster hardware I expect to see a lot more use of
them in the future. At SAIC in the 1980s we built special, expensive ECL
hardware just to get about 25 megaflops to simulate neural nets; now I probably
get better than that on my cellphone :-)

In the late 1980s I was on a DARPA 'neural network tools' advisory panel, and
at my company we used neural nets to solve some otherwise very difficult
problems. In a nutshell: one or more hidden neuron layers provide complex
decision surfaces. In a simple toy scenario with just two input variables, the
decision space is a 2D surface. In the 1980s people often used linear models,
where the decision surface in this toy example would be a straight line. With
neural networks (again in this toy scenario), a one-hidden-layer network allows
a fairly arbitrary continuous curve as a decision surface; two hidden layers
allow "islands," basically allowing arbitrary non-contiguous decision surfaces.
Sorry if this is unclear; if we were at a whiteboard together this would be
easy.
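
To make the "islands" picture concrete, here is a toy sketch in the spirit of
that whiteboard drawing (two input variables, two hidden layers, and made-up
data where class 1 is an island inside a circle):

    import numpy as np

    # Two inputs; class 1 is an "island": points inside the unit circle.
    rng = np.random.default_rng(5)
    X = rng.uniform(-2, 2, size=(600, 2))
    y = (X[:, 0]**2 + X[:, 1]**2 < 1).astype(float).reshape(-1, 1)

    sigmoid = lambda z: 1 / (1 + np.exp(-z))
    W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # hidden layer 1
    W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)   # hidden layer 2
    W3, b3 = rng.normal(size=(8, 1)), np.zeros(1)   # output

    lr = 2.0
    for step in range(5000):
        H1 = np.tanh(X @ W1 + b1)
        H2 = np.tanh(H1 @ W2 + b2)
        out = sigmoid(H2 @ W3 + b3)
        d3 = (out - y) / len(y)                     # output error
        d2 = (d3 @ W3.T) * (1 - H2**2)              # backprop to layer 2
        d1 = (d2 @ W2.T) * (1 - H1**2)              # backprop to layer 1
        W3 -= lr * H2.T @ d3; b3 -= lr * d3.sum(0)
        W2 -= lr * H1.T @ d2; b2 -= lr * d2.sum(0)
        W1 -= lr * X.T @ d1;  b1 -= lr * d1.sum(0)

    # A single straight-line boundary tops out near 80% here (the island is
    # ~20% of the square); the network can carve out the island itself.
    print("accuracy:", ((out > 0.5) == y).mean())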

