

Google & Stanford create a digital brain that learns to identify a human face - evo_9
http://www.extremetech.com/extreme/131717-google-and-stanford-create-a-digital-brain-that-like-an-infant-learns-to-identify-a-human-face-from-scratch

======
iandanforth
I think about this stuff a lot as part of my day job. I'll discuss the good
and the bad. I would say the most important parts of this paper are 1. the
size and 2. the number of the images. The size of the model itself isn't huge.

Most people don't realize that computer vision research is done using tiny
sets of tiny images. Think hundreds to thousands of images that are 32x32
grayscale or black and white. Google did 10 million 200x200 color images.

For a conservative comparison, human vision can be thought of as two
15-megapixel cameras capturing at 30 frames per second. (The real numbers are
higher and much more complicated.)
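
A quick back-of-the-envelope calculation makes the gap concrete (purely
illustrative, using the rough numbers above and from the article):

    # Rough data-volume comparison (estimates only, not from the paper).
    google_pixels = 10_000_000 * 200 * 200      # 10M images of 200x200 -> ~4e11 pixels
    human_pixels_per_sec = 2 * 15_000_000 * 30  # two 15 MP "cameras" at 30 fps -> ~9e8/s

    seconds = google_pixels / human_pixels_per_sec
    print(f"{seconds:.0f} seconds (~{seconds / 60:.1f} minutes)")  # roughly 7-8 minutes

By that (very rough) estimate, the eyes take in the equivalent of the entire
training set in a matter of minutes.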

I suspect that given this amount and quality of input data most machine
learning algorithms would have produced better than state of the art results.

Now the bad. Their model totally ignored time.

Most computer vision researchers work with static images. In biology there is
no such thing as a static image. Everything moves. You move, your eyes move,
the environment moves. The brain, as an exception, can understand static
images, but the rule is motion.

A majority of features in biological vision are tied to motion. Light/dark
change cells, moving edge change cells. A _huge_ portion of the information
you derive from sight comes from the temporal context of a given moment. The
change from moment A to moment B is directly encoded and learned and is often
more important than the state at A or the state at B.
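
As a toy illustration of what "directly encoding the change" can mean in code,
here is the simplest possible temporal feature, frame differencing (my own
sketch, not anything from the paper):

    import numpy as np

    def temporal_difference_features(frames):
        """Given grayscale frames of shape (T, H, W), return the per-pixel
        change between consecutive frames -- a crude 'moving edge' signal."""
        frames = np.asarray(frames, dtype=np.float32)
        return np.abs(np.diff(frames, axis=0))   # |frame[t+1] - frame[t]|

    # Toy clip: a bright square sliding one pixel to the right each frame.
    frames = np.zeros((3, 8, 8), dtype=np.float32)
    for t in range(3):
        frames[t, 2:5, 2 + t:5 + t] = 1.0

    motion = temporal_difference_features(frames)
    print(motion.shape)              # (2, 8, 8): one change map per frame pair
    print(motion.sum(axis=(1, 2)))   # nonzero exactly where something moved

A static-image model never sees anything like `motion`; it only ever gets one
frame at a time.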

I'm not faulting them; in fact, I am extremely happy with their results. But I
want to point out that these results can and will get _much_ better in the
near future as the input size increases and people start to integrate temporal
features into their models.

~~~
fghh45sdfhr3
On the one hand I am an experienced software engineer with a computer science
degree. On the other hand I was never interested in AI and I basically know
"nothing" about AI, computer vision and artificial neural networks. AI always
seemed too far off and bothered my engineering spidey sense for intractable
problems.

So I may embarrass myself with the following...

The more software engineering experience I get the more neural networks bother
me. I have not studied them in detail... and yet, they seem like a crappy
solution.

Throw a lot of resources at something, add a bit of math, and push a pile of
data through.

Does that ever lead to a result that can't be bested through simpler, more
traditional methods?

For example, the same math behind how the network is organized and learns can
be used to write algorithms which reveal patterns in data -- patterns which
can then be studied further.

Then there is the doing of things. Whatever pattern is responsible for doing
something in a network, the same function can be described much more simply by
an algorithm.

It just seems like neural networks would be interesting only if an alien
spaceship with stupefying computing resources fell on us. And it fell on us in
such a way that we could fully utilize the power of this new supercomputer
while at the same time we could not learn anything new about algorithms or
anything else. Only then does it seem like "train the neural network" is a
smart strategy.

In every other scenario it just seems like a decent, if somewhat lazy, start
to an investigation. Really just a way to kick the tires of a problem.

So did I just completely embarrass myself with my lack of understanding of
ANNs?

~~~
lmkg
If you try to design an algorithm, what you get is a model of how _you think_
your system behaves. If you train a neural network, what you get is a model of
how your system _actually behaves_. In other words, a neural network does not
embed any assumptions or preconceived notions of how the system ought to work.
Intuitively, not incorporating knowledge of how the world works ought to be a
disadvantage, but in practice it turns out to be not just an advantage, but an
embarrassingly large one.

Neural Networks aren't the only common approach in modern AI research, but the
other techniques (e.g. Support Vector Machines) have similar properties:
they're dumb methods of finding correlations in data. The field of AI is
slowly coming to grips with the fact that more training data trumps better
algorithms.

That said, I sympathize with your frustrations. You can solve a lot of
problems by throwing more cycles at them, but you don't _learn_ anything,
either about the algorithms or about your target domain. For example, a human
plays the game of Go differently than a machine. A human has an intuition of
which moves are good and which are bad, and only considers a handful; a
machine relies less on evaluating board position and more on brute-forcing
through possible board configurations. As machines get more powerful, they
will eventually catch up to humans in ability with that same approach. But
however impressive their methods become in their efficacy, we learn nothing
about how humans are so much better at evaluating board positions and
selecting moves to speculate on.

~~~
sanxiyn

      What are you doing?
      I am training a randomly wired neural net to play Tic-Tac-Toe.
      Why is the net wired randomly?
      I do not want it to have any preconceptions of how to play.
      Minsky then shut his eyes.
      Why do you close your eyes?
      Sussman asked his teacher.
      So that the room will be empty.
      At that moment, Sussman was enlightened.
    

\-- Jargon File

------
cs702
The system apparently did not require _any_ training by humans; instead, it
learned to catalog different types of images, including those of human faces,
_on its own_. (In other words, all images were unlabeled.)

Paper here: <http://arxiv.org/abs/1112.6209v3>

[UPDATE: fixed erroneous statement.]

~~~
laserDinosaur
With all the random hyphens in the text, I can't help but read it in the voice
of G-Man from Half Life. Creepy.

------
agrubb
The original work was posted and discussed here previously:

<http://news.ycombinator.com/item?id=4145558>

~~~
ya3r
And the title is absurd.

------
jmduke
This article is incorrect.

>Historically, machine learning has generally been supervised by humans. There
are plenty of examples of computers identifying human faces (or cats) with
incredible accuracy and speed — but only if human operators first tell the
computer what to look for. That the Google/Stanford system starts from scratch
and develops its own ability to classify objects is amazing.

Unsupervised machine learning has been around since long before this feat --
which is nonetheless very impressive.

~~~
marshallp
The algorithm they used did not exist before 2005. Also, since 2005 this
algorithm had mostly been trained using just one computer at a time (usually
with GPUs). The supercomputer setup is novel for this.

Most of the machine learning community had been disregarding this algorithm
since 2005, until in the past couple of years it started making dramatic
showings, such as:

\- Microsoft Research's speech recognition system in 2011 (done by one of
Geoff Hinton's students interning there)

\- NEC Labs, Ronan Collobert - real-time natural language parser - the first
real-time parser ever - 2010 (trained on one computer for 3 months using
Wikipedia)

\- Google Research's 16,000-node, billion-connection network - December 2011

~~~
jmduke
Again, I don't mean to disparage the achievement! But the article implied that
Google invented unsupervised learning, which they didn't.

------
peterhajas
The article doesn't go into depth on technical details (instead going for
Terminator-joke knee-jerkers), but I wonder how different this is when
compared to a traditional neural net.

Sure, a neural network wired up to pixels of images wouldn't be able to come
up with the word "face", but it would certainly group things correctly, right?
Using keypoint matching to "grade" guesses, you could seemingly train the
network appropriately.
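
To make that concrete, here is a toy sketch of the idea: an unsupervised
network wired to raw pixels whose hidden activations group similar inputs
without ever naming a category (random stand-in data, not the architecture
from the paper):

    import numpy as np

    # Toy single-hidden-layer autoencoder on raw pixels -- purely illustrative.
    rng = np.random.default_rng(0)
    X = rng.random((500, 64)).astype(np.float32)   # 500 fake 8x8 "images"

    n_hidden = 16
    W1 = rng.normal(0, 0.1, (64, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, 64))
    b2 = np.zeros(64)

    lr = 0.1
    for step in range(200):
        H = np.tanh(X @ W1 + b1)            # encode pixels into hidden features
        Y = H @ W2 + b2                     # reconstruct the pixels
        err = Y - X                         # reconstruction error; no labels anywhere
        dW2 = H.T @ err / len(X)
        db2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1 - H ** 2)    # backprop through tanh
        dW1 = X.T @ dH / len(X)
        db1 = dH.mean(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    # The hidden activations H are learned codes: similar images end up with
    # similar codes, which is the "grouping" -- still no word for "face" anywhere.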

Fascinating stuff. I just wish for a less fluffy article. Excited for the
white paper.

------
tluyben2
What are the current limitations of 'just' creating this on a much (much)
larger scale? For instance, with 160,000 or 1.6 million cores, would it be a
significant improvement? And for which other fields would that work? I do know
about neural net theory and I worked with them in college assignments quite a
bit 20 years ago, but I'm wondering if anyone here knows what difference this
kind of scale will make, or if it's just interesting for still images (in
which I'm very much not interested).

~~~
marshallp
The amazing thing about this algorithm is that it works for any data type (my
other comment talks about speech and NLP).

With more computers and more data it will keep getting better. I've thought a
lot about this and have been following this stuff for a couple of years, so
it's not just idle speculation. (I broke this recent story on Hacker News a
few days ago, and from here it got picked up by the New York Times and spread
everywhere else.)

------
caycep
is this covered in the Coursera.org course? :P

~~~
otoburb
Indirectly, yes. The Coursera course teaches K-means clustering as an example
of an unsupervised learning algorithm, along with a technique to work with
larger data sets in a more manageable way (e.g. PCA to reduce dimensionality).
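
As a minimal sketch of how those two building blocks chain together
(hypothetical random data, and scikit-learn stand-ins for the course's
exercises rather than anything Google actually ran):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    # Hypothetical unlabeled data: 1,000 samples with 400 features each
    # (think flattened 20x20 image patches).
    rng = np.random.default_rng(42)
    X = rng.random((1000, 400))

    # PCA to reduce dimensionality before clustering.
    X_reduced = PCA(n_components=20).fit_transform(X)

    # K-means: the unsupervised-learning example from the course.
    clusters = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X_reduced)
    print(np.bincount(clusters))   # how many samples fell into each cluster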

With the basic building blocks from the ML course you have a much better
understanding of what Google and Stanford did. Andrew Ng is also the course
instructor for the Coursera ML course[1].

I am presuming that supplemental readings and YouTube videos by Prof. Ng (e.g.
<http://www.youtube.com/watch?v=ZmNOAtZIgIk>) allow a sufficiently motivated
individual to at least partially replicate the results in the OP.

[1] Andrew Ng is also a co-founder of Coursera.

