Let's go easy on the virtual-brain talk and the hype. These approaches basically all consider some simple, fixed function f that is parameterized by some unknown numbers W, and everything comes down to finding the numbers W that give some desired result (in the voice recognition example, correct answers on a huge dataset of pairs f( <some sound wave features> ) = <some word>). Now, it is hard to do this efficiently and it's all very interesting, but are we going to reduce AI to finding a conveniently-tuned input -> output function? This is much closer to simple regression than a thinking brain.
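To make that concrete, here is a toy sketch of my own (nothing from the article, and every name and number below is made up): a fixed function f with unknown weights W, tuned by plain gradient descent so that f(x; W) matches the desired labels on a synthetic dataset. It is just logistic regression, which is exactly the point.

    # Toy illustration: "learning" = tuning the numbers W of a fixed function f
    # so that f(x; W) gives the desired outputs on a (synthetic) dataset.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))              # 200 examples, 5 input features
    w_true = rng.normal(size=5)
    y = (X @ w_true > 0).astype(float)         # the desired outputs

    def f(X, W):
        # the fixed, parameterized input -> output function (logistic regression)
        return 1.0 / (1.0 + np.exp(-X @ W))

    W = np.zeros(5)                            # the unknown numbers W
    lr = 0.5
    for step in range(1000):
        grad = X.T @ (f(X, W) - y) / len(y)    # gradient of the cross-entropy loss
        W -= lr * grad                         # nudge W toward the desired answers

    print("training accuracy:", np.mean((f(X, W) > 0.5) == y))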
Also, this approach is not new; it has been around since the 1980s. We do have a richer family of models to draw from and a few new tricks, but much of the progress comes from faster computers and infrastructure.
The correct article is as follows:
Google is building an amazing piece of infrastructure that allows them to run stochastic gradient descent across many machines very quickly. This engineering feat happens to be particularly convenient for a family of models that have been around for many years, and, as it turns out, scaling these models up actually seems to work well in several common and important scenarios. Equipped with this hammer, Google is currently busy figuring out just how many of its problems are a nail.
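For a rough picture of what "stochastic gradient descent across many machines" buys you, here is a toy, single-process imitation of data-parallel SGD (my sketch, not Google's system, which is asynchronous and vastly more elaborate): each simulated "worker" computes a gradient on its own shard of the minibatch, and the averaged gradient is applied.

    # Toy data-parallel SGD on a synthetic linear-regression problem.
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(4096, 10))
    w_true = rng.normal(size=10)
    y = X @ w_true + 0.1 * rng.normal(size=4096)      # made-up regression data

    W = np.zeros(10)
    n_workers, batch, lr = 4, 256, 0.05

    for step in range(200):
        idx = rng.choice(len(X), size=batch, replace=False)
        shards = np.array_split(idx, n_workers)       # one shard per "worker"
        grads = []
        for shard in shards:                          # pretend these run in parallel
            Xs, ys = X[shard], y[shard]
            grads.append(Xs.T @ (Xs @ W - ys) / len(shard))   # squared-error gradient
        W -= lr * np.mean(grads, axis=0)              # apply the averaged gradient

    print("distance from true parameters:", np.linalg.norm(W - w_true))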
Unsupervised deep learning is the heart of what they are doing, and it is a relatively new method (the first I saw of it was at NIPS in Dec 2006). The 1980s were dominated by two-layer supervised feedforward neural networks, which are quite different.
Correct, and an excellent point. Also, using unsupervised learning to process very complex input and build simpler representations, and then applying supervised learning to that simpler input, is useful.
I wrote a commercial Go playing program in the late 1970s that was all procedural code. I have been thinking of hitting this problem again (on just a 9x9 board) using this combination: unsupervised learning to reduce the complexity of the input data, and then a mix of supervised learning and some hand-written code.
Fast hardware for deep neural networks (more than one hidden layer) was not available the last time I worked on Go programming. (There are also new Monte Carlo techniques that are producing really good results.)
Keep in mind that they are a supervised technique, so they can't be compared to what Google is doing.
80s-style neural networks are flexible and powerful learners. In theory they can learn any function, and in practice they often come up with decent solutions. They aren't perfect: there is no guarantee they will find the best solution (they can get stuck in local minima), and they operate as a black box, meaning we can't properly interpret what they have learned.
I've heard the saying "80s style neural networks are usually the second best solution", which is oversimplified but close to correct.
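For anyone who hasn't played with one, here is a minimal sketch of such an 80s-style net: a single hidden layer of sigmoid units trained by plain backprop on XOR. The seed, layer width, and learning rate are arbitrary; with some initializations it nails the function, with others it can stall, which is exactly the no-guarantee caveat above.

    # One-hidden-layer network learning XOR with plain backprop.
    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)    # XOR targets

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(3)                     # try other seeds too
    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)      # input -> hidden
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)      # hidden -> output
    lr = 0.5

    for step in range(5000):
        h = sigmoid(X @ W1 + b1)                       # forward pass
        out = sigmoid(h @ W2 + b2)
        d_out = out - y                                # backprop (cross-entropy loss)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(0)
        W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(0)

    print(np.round(out.ravel(), 2))                    # should approach [0, 1, 1, 0]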
are we going to reduce AI to finding a conveniently-tuned input -> output function?
You have missed exactly what makes this new research exciting. This is unsupervised learning. That means there is no input->output function during most of the process.
The neural net is presented with piles of undifferentiated data (e.g. YouTube videos) and given no objective function; it decides for itself what the output should be based solely on the features of the input data. Then later we define an objective function (e.g. face detection) and we discover that the neural net has already learned how to compute this function. Furthermore, when we look at the behavior of the neurons in the lower layers, we find that they correspond remarkably closely to the measured behavior of neurons in the lower layers of our own visual cortex.
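A toy stand-in for that idea (nothing like Google's scale, and using k-means instead of a deep network): the unsupervised phase sees only unlabeled points and picks its own structure; only afterwards do we reveal a labeling and check how well the learned structure already predicts it. All the data here is synthetic.

    import numpy as np

    rng = np.random.default_rng(7)
    centers = np.array([[0.0, 0.0], [5.0, 5.0]])
    X = np.vstack([c + rng.normal(size=(100, 2)) for c in centers])   # unlabeled data

    # unsupervised phase: no labels, no task, just find structure (plain k-means)
    mu = X[rng.choice(len(X), 2, replace=False)]
    for _ in range(20):
        assign = np.argmin(((X[:, None] - mu) ** 2).sum(-1), axis=1)
        mu = np.array([X[assign == k].mean(0) for k in range(2)])

    # only now do we define the "objective": a labeling held back until the end
    y = np.array([0] * 100 + [1] * 100)
    acc = max(np.mean(assign == y), np.mean(assign != y))   # cluster ids are arbitrary
    print("agreement with the later-defined labels:", acc)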
In the sentence you quoted, I was referring to my voice recognition example, which is usually trained in a supervised fashion. Unsupervised pretraining has been thought to help, but some of the most recent state-of-the-art architectures skip this step and do just fine.
Then there are unsupervised approaches such as the one Google used to find cats. Even in this case, and even though no labels are given, the way you actually train the network is by forcing it to reconstruct its input patch after it has passed through the network (the model used is called RICA, which stands for Reconstruction Independent Component Analysis). You are still learning the same function with the same parameters, but the objective is now roughly f^{-1}(f(x)) = x -- passing the data through the network and propagating it back to the input layer should reproduce the original patch well. The reason this works is that the network is an information bottleneck, so it must find efficient codes to represent the patches if it hopes to do a good job at reconstruction. And, as it turns out, little Gabor edges are a better description of natural patches than individual pixels, so that is what it learns.
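Here is a rough sketch of that tied-weight reconstruction objective (in the spirit of RICA, not the actual Google/Stanford code): the code for a patch x is Wx, the reconstruction is W^T W x, and we minimize reconstruction error plus a smooth L1 sparsity penalty by plain gradient descent. The data below is synthetic noise, so no Gabors will appear; the edge-like filters only emerge when you feed in whitened natural image patches.

    import numpy as np

    rng = np.random.default_rng(0)
    d, k, n = 16, 8, 500                     # 16-dim "patches", 8-dim bottleneck code
    X = rng.normal(size=(d, n))              # stand-in for whitened patches, one per column

    W = 0.1 * rng.normal(size=(k, d))        # tied weights: encode with W, decode with W.T
    lam, lr, eps = 0.1, 1e-3, 1e-6

    for step in range(2000):
        C = W @ X                            # codes
        E = W.T @ C - X                      # reconstruction error, W^T W x - x
        g_rec = 2 * (W @ X @ E.T + W @ E @ X.T) / n
        g_sp = lam * (C / np.sqrt(C ** 2 + eps)) @ X.T / n   # smooth-L1 sparsity gradient
        W -= lr * (g_rec + g_sp)

    print("mean squared reconstruction error:", np.mean(E ** 2))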
Oh, and yes, the network finds Gabor edges just like the V1 cells in the cortex!!! (Neuroscientists will scold you if you tell them this, by the way.) What they don't mention is that over the years there have been a gazillion different models in the literature that all give rise to Gabor edges as an efficient code for images. Simple k-means clustering does this too, as do various signal-processing approaches; there are many, many ways to arrive at it. So this piece is more of a sanity check than evidence that we are on the right track to untangling how the visual cortex works (and it still has basically nothing to do with AI or any kind of cognitive function).
What is fantastic about the times we live in is that we finally have companies with both really big data sets and the hardware infrastructure to apply all these soft computing algorithms we used to learn about. Google, Facebook, Apple (iTunes), Amazon, etc. all have amazingly deep real-world data sets just begging for multidimensional analysis.
What is close to insane is that practical analysis of big data is not even restricted to these top companies: any one of us can set up our own computing cluster with EC2 and, say, open-source Weka. I'm sure the next decades will surprise us with many breakthroughs, and the discussions on data privacy will keep moving us closer and closer to total surrender...
"Now, it is hard to do this efficiently and it's all very interesting, but are we going to reduce AI to finding a conveniently-tuned input -> output function? This is much closer to simple regression than a thinking brain."
You're certain your brain processes data through "conveniently tuned functions"? My remote knowledge of the topic has led me to believe the opposite. How inaccurate are our models of neural network behavior, and in what ways?
I don't consider neural networks "real AI" (whatever that means), but they are extremely useful, and with faster hardware I expect to see a lot more use of them in the future. At SAIC in the 1980s we built special, expensive ECL hardware just to get about 25 megaflops to simulate neural nets; now I probably get better than that on my cellphone :-)
In the late 1980s I was on a DARPA 'neural network tools' advisory panel, and at my company we used neural nets to solve some otherwise very difficult problems. In a nutshell: one or more hidden neuron layers provide complex decision surfaces. In a simple toy scenario with just two input variables, the decision space is a 2D surface. In the 1980s, people often used linear models, where the decision surface in this toy example would be a straight line. With neural networks (again in this toy scenario), one hidden layer allows a fairly arbitrary continuous curve as a decision surface; two hidden layers allow "islands," basically permitting arbitrary non-contiguous decision regions. Sorry if this is unclear; if we were at a whiteboard together this would be easy.
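Since a whiteboard isn't handy, here is a small sketch of the "islands" point: two input variables, labels that are positive only inside two separate discs, and a net with two hidden layers that has to carve out a non-contiguous decision region. The layer sizes, learning rate, and disc layout are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(2000, 2))                   # points in the 2D plane
    islands = np.array([[-1.5, -1.5], [1.5, 1.5]])           # two disc centers
    y = (np.linalg.norm(X[:, None] - islands, axis=2).min(axis=1) < 1.0).astype(float)[:, None]

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # two hidden layers, so the net can represent disjoint "islands"
    W1, b1 = 0.5 * rng.normal(size=(2, 16)), np.zeros(16)
    W2, b2 = 0.5 * rng.normal(size=(16, 16)), np.zeros(16)
    W3, b3 = 0.5 * rng.normal(size=(16, 1)), np.zeros(1)
    lr = 2.0

    for step in range(8000):
        h1 = np.tanh(X @ W1 + b1)
        h2 = np.tanh(h1 @ W2 + b2)
        out = sigmoid(h2 @ W3 + b3)
        d_out = (out - y) / len(X)                           # cross-entropy gradient
        d_h2 = (d_out @ W3.T) * (1 - h2 ** 2)
        d_h1 = (d_h2 @ W2.T) * (1 - h1 ** 2)
        W3 -= lr * h2.T @ d_out; b3 -= lr * d_out.sum(0)
        W2 -= lr * h1.T @ d_h2;  b2 -= lr * d_h2.sum(0)
        W1 -= lr * X.T @ d_h1;   b1 -= lr * d_h1.sum(0)

    print("all-negative baseline:", np.mean(y == 0))
    print("training accuracy:    ", np.mean((out > 0.5) == y))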