Building high-level features using large scale unsupervised learning (research.google.com)
279 points by marshallp on June 22, 2012 | 183 comments

Can we perhaps edit "singularity is near" out of the title? This sounds impressive, but having a bunch of racks able to classify the outline of a face is vastly disconnected from machine and humanity merging.

I was going to make the same request. The singularity should be discussed where relevant, not added to everything. This paper is producing high level features from noisy data in an unsupervised fashion -- a human still needs to indicate the task it should be targeted for and a human still needs to provide labelled training data for these high level features to be of use.

This work is interesting enough to warrant detailed discussion on the topic at hand, large scale machine learning, rather than just rehashing discussions of the singularity.

Added: As I can't reply to the comment below I'll do it here =] The network provides learned representations that are discriminative. The aim of the network is to learn high level features representative of the content. One of the many features it produced was one which accurately indicated the presence of a face in the image. Note that they said train a face detector and not classify. For example, from the same network there was a feature which accurately detected cats yet they didn't explicitly train a cat detector either (see the section "Cat and human body detectors"). As the network represents the content as generic features it is clear that, if it reaches a high enough level, those features are essentially classifications themselves.

tldr; High-level features generated by this unsupervised network are so high-level that one of them aligns with "has a face in the image", others with "has cat in image", etc, but these features cannot be used without labelled training.

Actually, what's significant about this work is that labeled training data was not required:

"Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not."

I replied by adding to my comment above as it wouldn't allow me to reply earlier. Reference that.

tldr; High-level features generated by this unsupervised network are so high-level that one of them aligns with "has a face in the image", another to "has cat in image" (see the section "Cat and human body detectors") and so on. Note however that they select the "best neuron" for face classification -- the only way they can do that is via using labelled data and testing all the neurons (where each neuron's activation is a feature). Thus, these features cannot be used without labelled training.
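To make the selection step concrete, here is a minimal sketch of picking the "best neuron" with a small labelled set. The activation values, the labels and the `best_neuron` helper are made up for illustration and are not taken from the paper; the rule also assumes that a higher activation means "face".

```python
def best_neuron(activations, labels):
    """Try every neuron and every observed threshold; return the
    (neuron_index, threshold, accuracy) with the highest accuracy for the
    rule 'activation >= threshold means face'."""
    best = (None, None, -1.0)
    n_neurons = len(activations[0])
    for j in range(n_neurons):
        values = [row[j] for row in activations]
        for threshold in sorted(set(values)):
            correct = sum((v >= threshold) == bool(y)
                          for v, y in zip(values, labels))
            acc = correct / len(labels)
            if acc > best[2]:
                best = (j, threshold, acc)
    return best


# Hypothetical data: one row per labelled image, one column per neuron.
activations = [
    [0.9, 0.2, 0.5],
    [0.8, 0.7, 0.4],
    [0.1, 0.6, 0.5],
    [0.2, 0.1, 0.6],
    [0.7, 0.9, 0.4],
    [0.3, 0.8, 0.5],
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = face, 0 = no face (human-provided)

print(best_neuron(activations, labels))  # (0, 0.7, 1.0)
```

The features themselves come from unlabelled data, but the search over neurons and thresholds only works because `labels` exists, which is the point being made above.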

But the difference is that you can show it 1 billion unclassified images, then show it 1000 images you know to be faces, analyzing how its neurons respond to the known inputs to use it to classify the rest of the images.

Strictly speaking, you do need to have some labeled data at the end in order to determine how the neural net views faces, but I think that obscures what's notable about this system.

The amount of human participation involved in training is potentially six or more orders of magnitude less. That's a breakthrough, and a change in kind, not just degree.

In a more general response: I don't think what I stated obscures what's notable about the system, I feel I stated exactly what was notable and specifically avoided overstating it.

Overhyping when it comes to machine learning and AI seems to be the norm and has already hurt AI/ML severely in the past[1].

More specifically: I didn't disagree with anything you've stated, simply pointed out that labeled training data is necessary in response to the statement that it wasn't. The high-level feature extraction the paper discusses is unsupervised but the classifiers it produces are semi-supervised. It's an important distinction.

[1]: http://en.wikipedia.org/wiki/AI_winter

Having a bachelor's emphasis in AI, I think you described it perfectly. I was wondering too from their abstract how they were recognizing "faces" entirely without labels, this makes it clearer. As you said, unsupervised they can find extremely high-level categories. That is pretty impressive.

How does this work? I thought neural nets only learned when they got some kind of feedback that let them know whether their classification was right (back propagation).

The neural network in this paper, an autoencoder, doesn't require labelled data.

Autoencoders take high dimensional input, map it to a lower dimensional space and then try to recreate the original high dimensional input as closely as possible. The idea is to learn a compressed representation for the data and hope that this compressed representation works as a high level featureset.

As the model is just trying to represent the original input, no labelled data is required for the initial part. Labelled data is later introduced when the high level features are used for classification. What's most interesting about this paper is that one of the features learned by the model maps quite well to "image contains a face" without any prompting by the researchers.
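For anyone who wants to see the idea in code, here is a toy sketch. It assumes a purely linear autoencoder with no sparsity penalty, trained by plain stochastic gradient descent on synthetic data, which is far simpler than the network in the paper. All function names, the data and the hyperparameters are my own choices for illustration.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def reconstruction_loss(W_enc, W_dec, data):
    """Mean of 0.5 * squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        x_hat = matvec(W_dec, matvec(W_enc, x))
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)


def train_autoencoder(data, code_size, epochs=200, lr=0.01, seed=0):
    """Learn an encoder (input -> code) and decoder (code -> input).
    No labels are used anywhere: the target is the input itself."""
    rng = random.Random(seed)
    d = len(data[0])
    W_enc = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(code_size)]
    W_dec = [[rng.uniform(-0.1, 0.1) for _ in range(code_size)] for _ in range(d)]
    for _ in range(epochs):
        for x in data:
            h = matvec(W_enc, x)          # compressed representation
            x_hat = matvec(W_dec, h)      # attempted reconstruction
            err = [a - b for a, b in zip(x_hat, x)]
            # Backpropagate the reconstruction error through the decoder.
            grad_h = [sum(W_dec[i][j] * err[i] for i in range(d))
                      for j in range(code_size)]
            for i in range(d):
                for j in range(code_size):
                    W_dec[i][j] -= lr * err[i] * h[j]
            for j in range(code_size):
                for i in range(d):
                    W_enc[j][i] -= lr * grad_h[j] * x[i]
    return W_enc, W_dec


def make_data(n=50, seed=1):
    """Synthetic 6-dimensional points that really only have 2 degrees of freedom."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append([a, b, a, b, a, b])
    return data


data = make_data()
untrained = train_autoencoder(data, code_size=2, epochs=0)
trained = train_autoencoder(data, code_size=2)
loss_before = reconstruction_loss(*untrained, data)
loss_after = reconstruction_loss(*trained, data)
print(loss_before, loss_after)
```

The 2-number code `h` plays the role of the "high level featureset": it has to capture whatever structure lets the decoder rebuild all 6 inputs.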

For more details, check out http://www.stanford.edu/class/cs294a/sparseAutoencoder.pdf

Unsupervised training is not particularly significant, and was the original form of neural network.

"our experimental results reveal"

No, they don't. We had one of these in the 1980s.

Did we? A lot of early works in AI were ... /overstated/.[1] While a lot of concepts were created way back when, a lot of results weren't really generated. It's extremely valuable for someone to actually go and do a thing, now that we can, even if someone had the idea for the thing eons ago.

[1] http://dl.acm.org/citation.cfm?id=1045340

The early works in AI with regards to unsupervised learning were in the 1940s and 1950s. Claude Shannon had demonstrated a chess learning system which taught itself by playing him to defeat him in under two weeks as early as 1949.

No, they weren't overstated. They were hyped by a clueless press. There's a pretty critical difference. It's a bit like how the early web pioneers didn't say that the web was going to revolutionize the delivery of dog food; it was a journalist who said that.

"It's extremely valuable for someone to actually go and do a thing, now that we can"

Self organized unsupervised learning was in use for optical classification of potatoes in the feeding of Frito Lay automated processing plants in the late 1970s.

Please distinguish that you haven't actually looked for earlier examples from that you imagine none exist. Thanks.

I find both of your comments extremely condescending, both toward saalweachter and the authors of this article.

1. The fact that Claude Shannon succeeded in training a chess system has virtually no impact on saalweachter's claim that many AI results were overstated.

2. Certainly the press overstated them, which supports saalweachter's premise rather than weakening it. Even if the _implied_ claim was that _researchers_ overstated results, your argument does nothing to weaken this claim.

3. Frito Lay solved a problem several orders of magnitude easier than that of face recognition in natural images, which is still very much an open problem in computer vision.

4. Similar to 1., the Frito Lay example contributes nothing to your goal of weakening saalweachter's claim that this is valuable research--a claim which is exceedingly innocuous.

I understand that you've probably got a bone to pick against the many AI naysayers and saalweachter's comments conjured a few common misrepresentations (i.e. (a) that the "AI revolution" burnt-out because its researchers were somehow naive and (b) that neural networks are something new invented by computer vision researchers). You'd be justified in arguing against these claims, and I'm sure your father (respected AI researcher of the same name) would make them too, if saalweachter had tried to make them (which he didn't). But even if you were justified in making the argument, I would expect a less condescending one that made better use of evidence than the argument you've made here.

"I find both of your comments extremely condescending"

When a comment opens with a tone like this, I usually don't bother to respond, but I'll give you a chance, because you seem to have done a lot of honest mis-reading.

To wit, it may be of value for you to inspect your own tone, if you find public condescension inappropriate.


"1. The fact that Claude Shannon succeeded in training a chess system has virtually no impact on sallweachter's claim that many AI results were overstated."

It wasn't meant to. Saalweachter's claim was silly. Who cares if many things were overstated? That has zero bearing on that valid work was, in fact, being done.

The purpose of that statement was to remind us that as early as the 1940s, machine learning was able to defeat its own creator at what remains today regarded as a highly intellectual pursuit. My goal was to ignore the FUD of "some people got it wrong" as an attempt to suggest that there was nothing right.

Some people always get some of everything wrong. His claim is tautological and uninteresting. I was politely declining to shame him for it, but since you've presented me as having false goals, I now have no choice but to clarify.

It is generally inappropriate, for reasons like these, to chastise strangers over imagined motivations. Frequently, you don't know strangers' motivations as well as you might imagine from a simple read of a few paragraphs.


"2. Certainly the press overstated them, which supports saalweachter's premise"

You are now repeating something I said back to me. From that, you are deriving the false conclusion that because a journalist somewhere said something wrong, an important thing has been discovered.

What I'd like to point out is that the net result of observing that journalists made mistakes is still "so what?"

"Even if the _implied_ claim was that _researchers_ overstated results"

It isn't.

"your argument does nothing to weaken this claim."

You have not correctly identified what I was speaking to. This is akin to telling someone discussing environmental damage that some farmer is talking about crop yield and the speaker hasn't weakened their claim.

Again: so what? I never disputed that there are journalists who got things wrong. I'm the one who brought it up.

What does that have to do with my original discussion?


"Frito Lay solved a problem several orders of magnitude easier that of face recognition in natural images"

Discovering defects in potatoes moving at 45 miles an hour inside a water sluice from a single blurry image from a single angle in hard realtime using 1970s hardware is not several orders of magnitude easier than locating things on a face in slow time on modern hardware.

It's actually quite a bit more difficult even in fair conditions. Potato defects are under the surface, and have to be located by subtle color variation. It is not hard to find the characteristic shape and shadow of the nose.

With respect, sir, it's quite clear that this is not something you've done. You're claiming that easy things are more difficult than hard things, and you're forgetting the 40 year technology gap in between in your rush to show that a 2012 project is more impressive than a 1973 project.

To be clear, Babbage's mechanical calculator is also more impressive than an algebra solving system made in Prolog. Why? Because it's more work and it's more difficult.

Your claim of several orders of magnitude simpler suggests that you are inventing data for the sake of feeling correct in an argument, and that you do not actually have the experience to show correct guesses in this field. That, combined with a tone suggesting that you feel it appropriate to rebuke strangers in public, suggests that I don't really want to much talk to you anymore.


"Frito Lay example contributes nothing to your goal of weakening saalweachter's claim that this is valuable research"

Again, you've misidentified my goal, and the way by which you've done that is to drop a critical piece of his actual claim.

I don't know why you feel that it's okay to guess at people's goals, then tell people how morally wrong your guesses are. I really don't.

My actual goal was to point out the jarring unfamiliarity with the field that both he and you evidence:

"It's extremely valuable for someone to actually go and do a thing, now that we can, even if someone had the idea for the thing eons ago"

The thing I was focussing on was to show him that this thing that he's applauding someone for doing in 2012 for the first time now that it's practical, even though it isn't being used in industry, was actually outclassed by a much more difficult problem on much more limited hardware in realtime 40 years ago by a company that nobody would think of as a technology giant.

The goal was to display just how far out of touch saalweachter was with the state of the industry.

Please don't speak to my goals anymore. For someone who'd like to speak about condescension (when I think you actually mean arrogance,) for you to tell me what I meant and what I was getting at - incorrectly - then lambast me for it in a tone far more severe than that which you're criticizing is, I admit, difficult to swallow politely.


"I'm sure your father (respected AI researcher of the same name) would make them too"

Do not speak for, or involve, my recently deceased father in your attempt to be correct, sir. Especially not while you're telling someone else they're being rude.

"I would a less condescending one that made better use of evidence than the argument you've made here."

Unfortunately, though you suggest this, taking a brief look through your comment shows that this is not in fact correct. You have been radically uglier than that which you are criticizing, involving personal attacks, false claims of other people's intent, false claims of other people's goals, and the repeat involvement of a recently deceased relative.

I would prefer not to hear from you again. Thanks.

I think you guys are getting a little serious.

Also, this paper is about 20,000 object categories, not just 1 (faces). And the neural network is not the standard type but of the deep learning variety which has only existed since 2005 (invented by Geoff Hinton, who was also big in neural net circles in the 80s so he's not some newcomer who hasn't done his literature search). One of the coauthors of the paper is Andrew Ng, head of the Stanford AI Lab, so he's pretty legit.

"Claude Shannon had demonstrated a chess learning system which taught itself by playing him to defeat him in under two weeks as early as 1949."

I call "citation needed", and needed quite badly: Shannon was a much better chess player than any program available in 1949.

I have no interest in your presenting your unwillingness to do basic research as if it was a valid form of skepticism.

Whether or not you believe me, everyone else just went ahead and took a quick look, and learned something.

Frankly, I would be happier, given your seeming inability to be a part of this conversation in a polite way, yet also your seeming unwillingness to depart this conversation even after it was requested, that you actually believe I'm wrong, and go around "calling people on this," so that everyone has early warning just how much you actually know about this field, instead of having to wait to listen to you speak.

"Shannon was a much better chess player than any program available in 1949."

On a technicality, this is correct: he started his work on December 29, and it wasn't until five days later, January 2 of 1950, that it was able to beat him.

All the same, you have no idea what you're talking about, and are asserting your beliefs as fact.

The correct way to handle "that doesn't sound right" is a search engine, not putting your hands on your hips and telling someone they're wrong in public.

Go check now, little bird.

At one endgame, perhaps, with a winning position to start with, but not a chess program as currently understood.

Normally I advocate adherence to posting the original article title on HN, but if that had been the case I doubt this article would have ever got enough attention to be upvoted. Singularity is near is over the top.

Agreed. My first reaction was "You're half a dozen zeroes short there chief".

It does this for 20,000 different object categories - this is getting close to matching human visual ability (and there are huge societal implications if computers reach that standard).

This is the most powerful AI experiment yet conducted (publicly known).

"It does this for 20,000 different objects categories - this is getting close to matching human visual ability"

No, it isn't. This classifier cannot identify theme variations, unknown rotations, will confuse new objects for objects it already knows, is unable to cope with camera distortion, needs fixed lighting, has no capacity for weather, does not work in the time you need to run away from a tiger, requires hundreds of times more data than a human eye presents, and does a far lower quality job, all while completely losing the ability to give a non-boolean response.

To say this is approaching human abilities is to have no idea what human abilities actually are.

"This is the most powerful AI experiment yet conducted"

No, it isn't. Please stop presenting your guesses as facts. Cyc runs circles around this, as do quite a few things from the Netflix challenge, as well as dozens of other things.

I personally have run far larger unsupervised neural networks than this, and I am not a cutting edge researcher.

I'm not a Machine Learning / AI expert, so I have to ask: if running a neural network on 16,000 cores with a training set of 10 million objects isn't cutting edge research -- and if running "far larger" networks than this, as you say you have, also isn't cutting edge research -- then please tell me: what is cutting edge research?

I ask this question in all seriousness; I'd really like to know.

(And yes, I see that your username is that of a noted AI researcher. Who died in 2010. So if you're actually his beta simulation, then I'll indeed be rather impressed...)

I'm his son.

Let's take the example of The Netflix Prize, a $1 million bounty that the movie shipping organization ran several years ago. Their purpose was to improve their ratings prediction algorithm, under the pretext that people frequently ran out of ideas of what to rent, and that a successful suggestion algorithm would keep people as customers longer after that point.

So, they carefully defined the success rate of their algorithm - that is, make it predict some set of actually-rated movies X on a 1-5 half-integer scale, take the square root of (the arithmetic mean of (the square of each error from the real rating)) - which we'll call root mean square error, or RMSE - and you have your "score," where towards zero is perfect.
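The score is easier to see in code. This is a minimal sketch assuming the usual definition of RMSE (square root of the mean of the squared errors); the `rmse` helper and the ratings below are made up for illustration and have nothing to do with the actual Netflix data.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predicted and real ratings.
    0.0 means every prediction was exactly right."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predictions versus the ratings users really gave.
predicted = [3.0, 4.0, 5.0, 2.0]
actual = [3.0, 5.0, 4.0, 4.0]
score = rmse(predicted, actual)
print(round(score, 4))  # 1.2247
```

Because the errors are squared before averaging, one prediction that is off by 2 stars costs as much as four predictions that are each off by 1.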

Their predictor had a score of I think 0.973 something (it's been years, don't quote me on that.) Their challenge was simple.

Beat their score by ten percent, and you trigger a one month end-of-game. At the end of that month, whoever's best wins le prize. One million dollars, obligatory Doctor Evil finger and all.

Netflix provided (anonymized) a little over 100 million actual ratings, where all you had was a userID, a movieID, a real rating, and separately, a mapping "this movieID is this title." You were only allowed to use datasets in your solution that were freely available to everybody, and you had to reveal them and write a paper about your strategy within one month after you accept le prize, honh honh honh.

Seriously, it was awesome. They were going to do a second one, but lawyers, and the world sadded.

So, there, you've got a ten times larger dataset. So surely sixteen thousand cores is the drastic thing, right?

Well, not really. I was running my solution on 32 Teslas, which in the day were $340 in bulk and had 480 cores each. So I actually "only" had 15,360 cores, which falls a whopping four percent short of Google's approach, which several years ago cost me about the price of a recently used car, and which I was able to resell afterwards as used, but without the bulk discount, for almost exactly what I paid for them in the first place.
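The arithmetic in that paragraph, worked through as a quick sanity check. The 16,000-core figure for Google's cluster is the one quoted elsewhere in this thread; the variable names are only for illustration.

```python
cards = 32
cores_per_card = 480
price_per_card = 340    # dollars, bulk price quoted above
google_cores = 16_000   # figure quoted elsewhere in the thread

total_cores = cards * cores_per_card
shortfall = 1 - total_cores / google_cores
total_cost = cards * price_per_card

print(total_cores)          # 15360
print(round(shortfall, 2))  # 0.04
print(total_cost)           # 10880
```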


And I mean, I've got to imagine that someone else chasing that million dollar prize who thought they were going to get it invested more than I did. There were groups of up to a dozen people, data mining companies, etc.

So if one dude sitting in his then-Boise apartment can spend like $11k on a dataset ten times this size over a commercial prize?


Cyc still pantses all of us.

An image is a lot more complicated than a pair of ids and a rating. Counting the number of rows in the training database is misleading. I can build a reasonable dataset for a prediction task from a set of 100M rows from a database that I maintain in my spare time (http://councilroom.com , predict player actions given partial game states).

Don't get me wrong, the Netflix prize was cool.

What's cool about this is that Google hasn't given the learning system a high level task. They basically say, figure out a lossy compression for these 10 million images. And then when they examine that compression method, they find that it can effectively generate human faces and cats.

"An image is a lot more complicated than a pair of ids and a rating."

Predicting someone's reaction to a given movie is a lot more complicated than a pair of IDs and a rating, too, it turns out.

Let's take the speculation out of this.

You can get features of an image with simple large blob detection; four recurring Boltzmann machines with half a dozen wires each can find the corners of a nose-bounding trapezoid quite easily. They'll get the job done in less than the 1/30 sec screen frame on the limited Z80 knockoff in the original Dot Matrix Game Boy. You'll get better than 99% prediction accuracy. It takes about two hours to write the code, and you can train it with 20 or 30 examples unsupervised. I know, because I've done it.

On the other hand, getting 90% prediction accuracy from movie rating results takes teams of professional researchers years of work.


"I can build a reasonable dataset for a prediction task from a set of 100M rows from a database that I maintain in my spare time"

And you won't get anywhere near the prediction accuracy I will with noses. That's the key understanding here.

It's not enough to say "you can do the job." If you want to say one is harder than the other, you actually have to compare the quality of the results.

There is no meaningful discussion of difficulty without discussion of success rates.

I mean I can detect noses on anything by returning 0 if you ignore accuracy.


"What's cool about this is that Google hasn't given the learning system a high level task."

Yes it has. Feature detection is a high level task.


"They basically say, figure out a lossy compression for these 10 million images."

I have never heard a compelling explanation of the claim that locating a bounding box is a form of lossy compression. It is my opinion that this is a piece of false wisdom that people believe because they've heard it often and have never really thought it over.

Typically, someone bumbles out phrases like "information theory" and then completely fails to show any form of the single important characteristic of lossy compression: reconstructibility.

Which, again, is wholly defined by error rate.

Which, again, is what you are casually ignoring while making the claim that finding bounding boxes is harder than predicting human preferences.

Which is false.


"they find that it can effectively generate human faces and cats."

Filling in bounding boxes isn't generation. It's just paint by number geometry. This is roughly equivalent to using a point detector to find largest error against a mesh, then using that to select voronoi regions, then taking the color of that point and filling that region, then suggesting that that's also a form of compression, and that drawing the resulting dataset is generation.

And it isn't, because it isn't signal reductive.

Here, I made one for you, so you could see the difference. Those are my friends Jeff and Joelle. Say hi. The code is double-sloppy, but it makes the point.


See how I'm getting a dataset that isn't compression? See how that dataset is being used to make the original image, but nothing's being generated?

Same thing.

The person who invented Boltzmann machines - is - the inventor of this technique. He invented Boltzmann machines in the 80s and spent over 20 years trying to get them to actually work on difficult tasks.

Your rant about this not being compression or whatever you're trying to say is completely off the mark. You don't seem to understand what this work is about.

The Netflix challenge is a supervised learning challenge. You have lots of 'labeled data'. This technique is about using 'unlabeled' data.

(Side note: At one point, Geoff Hinton and his group using this technique had the best result in the Netflix challenge, but were beaten out by ensembles of algorithms.)

Cyc has nothing to do with this and is a huge failure at AI.

tldr; You don't seem to know what you're talking about, judging from your comments, and seem to readily discount some of the most prominent machine learning researchers in the world today. You're obscuring important results that newcomers might have found interesting to follow up on.

"and seem to readily discount the some of the most prominent machine learning researchers in the world today."

Your reading skills seem to be up to par, since I have discredited a list of zero people.

"You're obscuring important results that newcomers might have found interesting to follow up on."

Not only have I obscured no results, but this isn't actually something I have the power to do.

I appreciate your insight into the original article and your help placing it into the context of the broader field.

It's a kind claim, but I'm actually pretty much an outsider who dabbles. I haven't a clue what the state of the art is; Cyc is from the 1980s.

Ha' then I'm appreciating your honesty :-)

I'm not a large scale ML person, and not intending to take away from the achievement of the team in the OP, but experiments in large scale, unsupervised learning have been going for a long time (even using the autoencoder approach). When you think about it, large scale requires unsupervised...

Here is an old example with hundreds of millions of records and instances:


Both authors are now with Google.

Also, people here may not be as up to speed on the state of the art in face rec as they think they are. It's not as much of an unsolved problem as it was even 10 years ago.

"When you think about it, large scale requires unsupervised..."

Not necessarily. Crowdsourcing is another option, like Google's image tagging game, reCAPTCHA, et cetera.

Pay a herd of people to do things, and they'll do things for you. You don't have to pay them in money. Telling them they have a high score is often enough.

Yes, "requires" was too strong. I should have said they go well together. I was trying to get at the fact that it's highly common for large-scale work to be unsupervised.

Face recognition usually uses a hand-coded layer followed by a machine learning algorithm. This technique automatically devises that hand-coded layer. It also did this for 20,000 other categories and can also be applied equally well to audio or any other data type. Huge difference.

The large-number-of-categories result was the most novel and surprising to me.

> It does this for 20,000 different object categories

With 15.8% accuracy.

> This is the most powerful AI experiment yet conducted (publicly known).

It's only powerful because they threw more cores at it than anyone else has previously attempted. From a quick skimming of the paper, there does not appear to be a lot of novel algorithmic contribution here. It's the same basic autoencoder that Hinton proposed years ago. They just added in some speed ups for many cores.

It's a great experiment though. You shouldn't detract from its legitimate contributions by making outlandish claims.

That in itself is fairly interesting, it says we can make dramatic improvements just throwing more processing power at the problem. Whatever happens on the algorithms research side of the problem in coming years, you can count on us having access to more processing power.

I think this is the most important aspect of this paper. Throwing more computing power at the problem increases performance significantly. It is possible that our algorithms are adequate but our hardware is not.

To a computer vision researcher, 15.8% on 20k categories is phenomenal.

> This is the most powerful AI experiment yet conducted (publicly known).

That's an ill-defined statement. AI is a vast and diverse field: what makes one demonstration more "powerful" than another? There are definitely other projects that could be viewed as being in the same class of "powerful" as this cluster.

This is certainly an interesting paper, but it has to be viewed in the context of a large and active field.

That's true. Let's say in machine learning then.

Let's not, because it's still wrong on the order of thirty counterexamples.

Let's say we'll stop making broad proclamations about the global best in a field we know very little about.

Andrew Ng has set many state of the art results on various data sets using similar approaches as the one described in the paper.

Here is a reasonably approachable talk he gave about it. http://www.youtube.com/watch?v=ZmNOAtZIgIk

This is a very interesting talk.

Thank you for sharing with me. :)

In return, I will offer you two interesting non-sequiturs, because I don't have anything topical and a non-sequitur seems like it's worth half what something germane would be.


Bret Victor, "Inventing on Principle." First 5 minutes are terribly boring. Give him a chance; it's 100% worth it.



Damian Conway, "Temporally Quaquaversal Virtual Nanomachine."

It's as funny as it sounds.


I put the singularity bit in to make it relevant for those who are non-technical. This experiment is significant because it shows that large artificial neural networks can be made to work. People have tried and failed at this for decades.

This technique was "discovered" by Geoff Hinton at the University of Toronto in 2005. However, nobody had tried it (or maybe had enough funds to try it) at this scale.

If this continues to work at larger and larger scale, this would be a machine learning technique that can work accurately on tasks that are hugely important to society:

- accurate speech recognition
- human level computer vision (make human manual labor redundant)

Even so, the singularity bit is editorializing a link to a white paper on an equally significant scale. Nowhere in the link is the singularity referenced.

As for the point about it being for non-technical people, I don't understand where you're coming from. This is hacker news. If people don't understand it and don't upvote it, then that's their problem, not yours.

"I put the singularity bit in to make it relevant for those who are non-technical." Yeah, I'm sure there's a lot of those on HN...I'd expect this kind of crap in something like Wired, but not here.

People are missing the point here.

Yes, 15% accuracy doesn't seem great.

BUT the detector built its own categories(!). It managed to find 20,000 different categories of objects in Youtube videos, and one of these categories corresponded to human faces, and another to cats.

Once the experimenters found the "face detection neuron" and used it to test faces THAT neuron managed 81.7% detection rate(!).

Forget the singularity, and just think about how amazing that is. The system trained itself - without human labelling - to distinguish human faces correctly over 80% of the time.

You're in danger of missing the point too far in the other direction. The system just returns yes/no as to whether an image has a face in it, and if it was hard-coded to respond "no" it would score 64.8%.

Obviously this is extremely impressive work, and given that Google gives away 1e9 core hours a year, I'd like to see how much further they can push this network (which only used 16e3x3x24 ~ 1e6 hours). But this isn't like scoring 80% in a written exam.
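As a rough check of the arithmetic above (16e3 cores x 3 days x 24 hours), here is a minimal Python sketch. The variable names are made up for illustration, and the 1e9 core-hours figure is simply the number quoted in this comment, taken at face value.

```python
# Back-of-the-envelope check of the core-hour estimate in this comment.
cores = 16_000          # cores used for the run, as quoted above
days = 3                # training duration in days, as quoted above
hours_per_day = 24

core_hours = cores * days * hours_per_day

# Yearly give-away figure quoted above (not independently verified here).
donated_core_hours_per_year = 1e9
fraction_of_donation = core_hours / donated_core_hours_per_year

print(core_hours)                      # 1152000
print(f"{fraction_of_donation:.4%}")   # 0.1152%
```

So the "~1e6 hours" estimate holds, and the run would be on the order of a tenth of a percent of the quoted yearly figure.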

I'm also impressed by how readable the paper was. Apart from a few paragraphs of detailed maths this should be accessible to anyone who's read the wikipedia article on neural networks.


*The system just returns yes/no as to whether an image has a face in it, and if it was hard-coded to respond "no" it would score 64.8%.*

Yes, that is true. But ~80% correct is still a significant result.

I was hoping people would read beyond the 15% headline figure to understand exactly what that number meant.
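One way to put the two figures quoted in this exchange side by side (81.7% for the face neuron, 64.8% for always answering "no") is to look at how many of the baseline's errors the detector removes. This is only a sketch of that comparison; the variable names are illustrative, and "relative error reduction" is my choice of summary, not a metric from the paper.

```python
# Compare the face neuron against a classifier that always answers "no".
# Both accuracy figures are the ones quoted in this thread.
baseline_accuracy = 0.648   # hard-coded "no face" answer
detector_accuracy = 0.817   # best "face neuron"

baseline_error = 1.0 - baseline_accuracy
detector_error = 1.0 - detector_accuracy

# Fraction of the baseline's mistakes that the detector gets right instead.
relative_error_reduction = (baseline_error - detector_error) / baseline_error

print(f"gain over baseline: {detector_accuracy - baseline_accuracy:.1%}")
print(f"relative error reduction: {relative_error_reduction:.1%}")
```

Under these numbers the detector is about 17 points above the trivial baseline and removes a bit less than half of its errors, which is consistent with both "significant" and "not like scoring 80% in a written exam".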

> BUT the detector built its own categories(!).

It's not revolutionary. Clustering algorithms and neural nets are plenty.

Really, what differentiates this network is its scale.

And, thanks to scale, some of the clusters correspond to high-level concepts. According to the article, earlier attempts have mostly resulted in low-level concepts like "edge" or "blob" being detected.

Also, it was (again, from the article) plausible but not a given that high level concepts could be found from unlabeled data.

That "cat" is one of the high-level concepts you get from using random YouTube videos as raw data is both impressive, and slightly amusing.

Exactly. Regarding your remark about edge detectors: such self-organizing neural nets are organized into hierarchical layers, and early layers' units are going to learn to become detectors of statistically common components of the input image, in the same way as the initial layers of the visual system perform blob and edge detection (retina, lateral geniculate nucleus, V1). In mathematical terms, these early units learn the conditional principal components of the inputs. The layers that are built upon these detectors, if correctly organized, are going to build upon this initial abstraction and learn more complex features: for instance, to find these edges in particular positions relative to each other. Eventually, up the abstraction chain, units detect such statistically frequent features as the shape of a cat's ears (common in YouTube videos, I imagine), etc...

(Sorry I wrote that fast, I hope it's understandable)
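To make the "early units learn principal components" remark a bit more concrete, here is a toy sketch using Oja's rule, a textbook Hebbian update under which a single linear unit drifts toward the first principal component of its inputs. This is a swapped-in illustration, not the method used in the paper; the data, function names, and learning rate are all made up for the example.

```python
import math
import random

def oja_first_component(samples, learning_rate=0.005):
    """Train one linear unit with Oja's rule and return its weight vector."""
    w = [1.0, 0.0]  # arbitrary starting weights
    for x in samples:
        y = w[0] * x[0] + w[1] * x[1]  # the unit's output for this input
        # Hebbian term y*x plus a decay term y*y*w that keeps |w| near 1.
        w = [w[i] + learning_rate * y * (x[i] - y * w[i]) for i in range(2)]
    return w

def make_samples(n, seed=0):
    """Toy 2-D data: large variance along (1, 1), small variance along (1, -1)."""
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(2.0)
    samples = []
    for _ in range(n):
        major = rng.gauss(0.0, 2.0)  # dominant direction
        minor = rng.gauss(0.0, 0.3)  # weak direction
        samples.append((s * (major + minor), s * (major - minor)))
    return samples

samples = make_samples(5000)
w = oja_first_component(samples)
norm = math.hypot(w[0], w[1])
# |cosine| between the learned weights and the dominant direction (1, 1)/sqrt(2).
alignment = abs(w[0] + w[1]) / (math.sqrt(2.0) * norm)
print(w, alignment)
```

The unit is never told which direction matters; it ends up pointing along the statistically dominant one simply because that is where most of the input variance lies.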

The performance is due more to the architecture than to the scale. The scale is to handle all that data. But the feature learning performance has to do with their layered sparse learning technique which is brilliant. Although, their autoencoder neural network is actually learning a decomposition on the data so the neural part is kind of a red herring.

That you get high level features instead of edges is not the impressive part - you can just as well write a sparse non negative matrix factorization algorithm that will efficiently learn/represent eyes, lips and noses as features of faces unsupervised.

Clustering algorithms operate on features, which typically have to be designed by hand. The appeal of deep learning is that it discovers good features automatically.

Feature sensitivity is typically hand-crafted only because it's the practical thing to do. Neural nets can easily learn visual features. See the LISSOM neural nets for a good example of self-organized learning of features.

There was an interesting discussion on Quora about this recently[0]

The most relevant quote being perhaps:

"The magic of the brain is not the number of neurons, but how the circuits are wired and how they function dynamically. If you put 1 billion transistors together, you don't get a functioning CPU. And if you put 100 billion neurons together, you don't get an intelligent brain."

0. http://www.quora.com/How-big-is-the-largest-feedforward-neur...

That's an interesting discussion, but this experiment suggests exactly the opposite (perhaps that's why you included the discussion). Who knows, if we put 1 billion cores together, and fed it a massive amount of data (akin to what a baby receives as he/she matures), perhaps we would get a brain we would consider "intelligent". The fact that this system was able to pick out high-level features like "face" and "cat" without any prior training -- and with only 1000 cores, not 1 billion -- is quite suggestive that they're on to something.

EDIT: Mistyped number of cores. 1000, not 100.

"That's an interesting discussion, but this experiment suggests exactly the opposite (perhaps that's why you included the discussion)."

It absolutely does not. This experiment supports that position strongly.

What this experiment shows is that said meaningful structure can be progressively, automatically discovered.

"and with only 1000 cores, not 1 billion"

Comparing CPU cores to individual neurons is more than slightly disingenuous.

16,000 cores sounds impressive until you realize it's just five to ten modern GPUs. For Google, it's easier to just run a 1,000 machine job than requisition some GPUs.

See: http://www.nvidia.com/object/tesla-servers.html (4.5 teraflops in one card)

Reminder: GPUs will destroy the world.

What a GPU calls a "core" doesn't at all correspond to what a CPU calls a "core". Going by the CPU definition (something like "something that can schedule memory accesses") a high end GPU will only have 60 or so cores. And going by the GPU definition (an execution unit) a high end CPU will tend to have 30-something cores.

GPUs do have fundamentally more execution resources, but that comes at a price and not every algorithm will be capable of running faster on a GPU than on a CPU. If neural networks just involve multiplying lots of matrices together with little branching they might be well suited to GPUs, but most AI code isn't like that.

"What a GPU calls a "core" doesn't at all correspond to what a CPU calls a "core"."

They aren't as different as you imagine. They're general purpose programmable arithmetic units with processing rates on the order of 20-30% of CPUs, provided the limitation that they're all doing roughly the same thing.

For most machine learning tasks, that's exactly what you're doing anyway. Oh no, your neural network engine has to be parallel? C'est damage!

A CPU core isn't a general purpose programmable arithmetic unit, though. In fact what you call a "core" when you're talking about CPUs is composed of multiple such general purpose programmable units, as well as less general purpose memory load/store units that can still be used for basic arithmetic, and an instruction fetch and scheduling system. So a core in your Intel iFOO processor is structurally equivalent to what NVidia calls an SM. Now, an NVidia SM has 48 execution units to Intel's 6, but it operates at a lower frequency and doesn't have the bypass network, branch predictor, memory prefetcher, etc. that you could find in an Intel core. So there are some tasks where the Intel core will be much faster than the NVidia SM, and some tasks where the NVidia SM will be much faster. And the case here does seem like one where the GPU has an advantage. But saying that the NVidia GPU has 1526 "cores" is just dishonest.

"In fact what you call a "core" when you're talking about CPUs is composed of multiple such general purpose programmable units, as well as less general purpose memory load/store units"

So are GPU cores.

"But saying that the NVidia GPU has 1526 "cores" is just dishonest."

No, it isn't. You can run 1536 things in parallel at speeds that would have qualified as full cpu speeds several years prior.

Something isn't any less a core merely because it does less juggling magic, and that juggling magic is actually undesirable for a heavily parallelized task.

"So there are some tasks where the Intel core will be much faster than the NVidia SM, and some tasks where the NVidia SM will be much faster."

This conversation already has a context. Arguments which ignore that context completely miss the point.

If you don't understand how I achieved the amount of processing I did, that's fine. Playing games with the semantics of a "core" somehow magically requiring all the features of current Intel-strategy chips, though, is not going to convince me.

There is more to Heaven and Earth, Horatio, than is dreamt of in Intel's philosophy. This sort of no-true-Scotsman attitude towards what constitutes "a real core" is why ARM is in the process of eating Intel alive, and why Tilera stands a decent chance of doing the same thing to ARM.

This is merely extreme RISC. I realize it's sort of a tradition for the modern VLIW movement to suggest that if you can't double-backflip through a flaming hoop made out of predictive NAND gates it somehow doesn't count.

But, if you actually look, the rate of modern supercomputing going to video cards is rising dramatically.

So obviously they count as cores to somebody.

You also seem to have missed the point. It's not the core scale that we're discussing here. It's the dataset scale. The number of cores you throw at a problem is not terribly important; 20 years ago it would have been breathtaking to throw 32 cores at a problem, and now that's two CPUs.

What makes an experiment cutting edge is the nature of the experiment, not the volume of hardware that you throw at it. I was talking about the /data/ and the /problem/ . Predicting movie ratings is a hell of a lot harder than feature detection.

OpenCV has rewritten several of its algorithms for GPUs. http://www.opencv.org.cn/opencvdoc/2.3.2/html/modules/gpu/do... In general, the GPU versions are faster but you need to be cognizant of data transfer times between memory and the GPU. Relative speed also depends on which CPU and GPUs you have access to and the quality of the GPU vs CPU algorithm implementations. For example, my 2012 Macbook CPU is faster than my 2011 Macbook GPU for certain OpenCV algorithms.

The problem is not the theoretical peak teraflops. The problem is actually achieving those teraflops with useful work. Due to architecture that is easier on a CPU than on a GPU, so you can't directly compare teraflops and conclude that GPUs are superior. Getting something to run fast on a GPU is very difficult.

And actually the thing that does 4.5 teraflops in single precision does only 95 gigaflops in double precision per GPU. A good x86 CPU does ~100 gigaflops in double precision as well, and you're much more likely to actually achieve that number on an x86. Although another one on the page you linked to theoretically does 665 gigaflops double precision.

Single precision is probably fine for a neural network. Neural networks are somewhat insensitive to noise and failure and single precision adds very little noise.

I don't think that one CPU core is exactly comparable to one GPU core.

The singularity is a poorly constructed myth. It is built around the presumption that intelligence is a linear function of CPU power, and that surely as CPU power rises, so shall intelligence; the problem is, that prediction was made in the 1970s, since which CPU power has risen ten decimal orders of magnitude, and we still don't have much better speech recognition than we did back then, let alone anything even approaching simple reasoning.

The ability to detect faces is not a signal that general intelligence is right around the corner.

Well, maybe. There are a whole lot of very different things called "The Singularity" and some of them are much more reasonable than others.

There's the Campbellian Singularity, which says that we won't be able to predict what will happen next. Pretty non-controversial as far as it goes.

There's the Vingean Singularity, which says that if we ever develop AIs that can think as fast and as well as humans then due to Moore's Law they'll be thinking twice as fast as humans after 2 years, so they'll start designing chips and the period of Moore's law will fall to 1 year, and so on with us reaching infinite computing power in finite time. I think this vision is flawed.

Relatedly, there's the Intelligence Explosion Singularity (associated with Yudkowsky), which says that as soon as it's AIs designing AIs, smarter AIs will relatively quickly be able to make even smarter AIs and we'll get a "fwoosh" effect, though not to infinity in finite time. I find this unlikely, but can't rule it out.

There's one I don't have a handy name for, but let's call it the AI Revolution viewpoint, which is that AIs will cause civilization to switch to a faster mode of progress, just like the Agricultural Revolution and Industrial Revolution did. This one will only look like a singularity in hindsight, and might seem gradual to the people living through it. I think this one is pretty credible.

There's the Kurzweilian Singularity, where thanks to Accelerating Change we'll someday pass a point which will arbitrarily be called the Singularity. As far as I can tell this is Kurzweil appropriating the hot word of the moment for his ideas a la Javascript.

Then there's the Naive Singularity, which equates processing power with intelligence and then concludes that computers must be getting smarter. This is indeed totally naive and not something we should worry about. I guess the linked paper is evidence that you can substitute a faster computer for smarter AI researchers to some extent, but probably not a very large one.

Your characterization of Vinge's singularity is incorrect. I have never read anything in which he brings up infinity; I.J. Good does, though. Vinge's is actually more like your AI revolution except that it will be evident as a singularity only to those looking forward and not to those looking backwards. So instead of using the agricultural/industrial divide as an analogy he posits a human/animal divide.

As his definition of singularity is pretty strongly tied to comprehension think of it like this - the singularity is the time point after which a 10 year old unmodified human child from 1000 AD can not grow up to understand his or her surroundings.

I could have sworn that that argument came from a short piece of non-fiction Vinge wrote which was my first exposure to the whole idea, but I might be mis-remembering because it was a long time ago. Or, given that this was a long time ago, he might have changed his viewpoint.

"There are a whole lot of very different things called "The Singularity" "

If the singularity was a legitimate concept with anything approaching experimental evidence, then this could not be true. This observation of yours - with which I agree - suggests to me that The Singularity needs a pope hat.

It is instructive to notice that all of "the singularities" are the products of science fiction authors, and in the case of the original, a particularly bad one.

There is a delightful level of schadenfreude involved in observing the multiplicity of "The Singularities." In two different ways its name says "there's only one," and yet they still can't agree on topics that are critical and fundamental to the concept itself, like the definition of intelligence, or whether or not to circumcise.

Pass the sacramental chalice, please?

Vinge is a retired math professor, I.J. Good was an accomplished mathematician, Kurzweil makes hard to swallow predictions but is still an accomplished technologist and I happen to very much enjoy Vinge's writing. Regardless, the idea is worth considering independent of who is saying it.

I am also sure you know that words - such as variety and polymorphism - have different context specific meanings. Singularity in this case as in the kind of thing you can find on a variety but not on a manifold.

The idea of infinite recursive Moore's law fueled intelligence explosions leading to super human intellects by 2030 is something I assign a low probability to. I don't find it hard to believe that there is some point in the future - say 2131 - such that if anyone alive today or previously were transported there, they would never be able to understand what was going on and everyone from that time would think circles around them.

Maybe you didn't realize this, but the science fiction author bit wasn't meant as a slander. Many high quality people, and also Kurzweil, are science fiction authors.

What I was getting at was "you realize they're writing books to make people happy for money, not doing legitimate science on that day, right?"


"words - such as variety and polymorphism - have different context specific meanings."

Sure. All handwaving about the rules of language notwithstanding, though, none of The Singularities have merit or underlying measurement, even if you want to talk syntax and grammar to create a seeming of academia by proxy.


"The idea of infinite recursive Moore's law"

... is nonsense. What would "recursion" be in the context of Moore's law? Have you even thought this over?

What, Moore's Law solves itself by going deeper into itself until the datastructure is exhausted?


"fueled intelligence explosions"

The science fiction part. I mean, you might as well say "fuelled by warp drives," because there's no evidence they're going to happen either. Or unicorns.


"is something I assign a low probability to."

This suggests that you don't know what probabilities are. Probabilities are either frequentist, which cannot happen here because we have no knowledge of the rates here (this would be like calculating the frequentist probability of alien life - it's just making numbers up,) or Bayesian, where you draw probabilities from observed events, at which point the probability is exactly zero.

So, is it undefined or zero that you're promoting?


"I don't find it hard to believe that there is some point in the future - say 2131"

(rolls eyes)


"they would never be able to understand what was going on and everyone from that time would think circles around them."

It seems you don't even need to be transported into the future for that.

1.) Okay.

2.) Singularity as in breakdown not as in single. You purposely muddled the meaning to make your quip work.

3.) Moore's law fueled as in AI gets interest on their intelligence. Recursive as in AI makes smarter AI makes smarter AI...

4.) Science fiction or not, I find it unlikely.

5.) Bayesian. Look up prior.

6.) 2131 was tongue in cheek.

7.) Thanks ;) You actually never address my main point though.

"2.) Singularity as in breakdown not as in single. You purposely muddled the meaning to make your quip work."

This is a blatant falsehood. I have solely and exclusively used it as a title for Kurzweil's concept. It has no meaning; it's a name. I have muddled nothing. It is inappropriate for you to make accusations like this without evidence.


"3.) Moore's law fueled as in AI gets interest on their intelligence"

Yes, that's what I said at the outset: this whole thing is driven by the false belief that intelligence is a function of CPU time. There is no experimental evidence in history to support this, and there are 65 years of counter-examples.

Repeating it won't make it less wrong.


"Recursive as in AI makes smarter AI makes smarter AI..."


This gets to a different false presumption, namely that the ability to create an intelligence, as well as the power of the intelligence created, is a linear function of the prior intelligence.

This whole treating everything like it's a score, like it's a number you tweak upwards? It's crap.

You can't make an AI with an IQ of 106 just because you have a 104, and the guy who made the 104 had a 102.

This is numerology, not computer science.


"4.) Science fiction or not, I find it unlikely."

I can't even tell what noun you're attached to, at this point. What do you find unlikely?


"5.) Bayesian. Look up prior."

What about Bayesian, sir? I don't need to look up prior; I used it, correctly, in what I said to you. You're just telling me to look things up to pretend that there is an error there, so that you can take the position of being correct without actually having done the work.

There are zero priors of alien life, sir. That was my point, in bringing up what you're now blandly one-word repeating at me, in your effort to gin up a falsehood where none actually exists.


"7.) Thanks ;) You actually never address my main point though."

You don't appear to have one.

Maybe you've forgotten that you were replying to someone else, who already said that to you?

"Whose work do they all point to? Moore, Lanier, Holland, and Hawkins. Guess what all four of them say they think?"

Here are some pretty uninformed statements by tech luminaries: http://spectrum.ieee.org/computing/hardware/tech-luminaries-...

They mostly don't have the philosophical or synthetic chops to make intelligent statements about the singularity. Moore misunderstood his own observation for at least the first ten years after he made it, not to minimize his important contributions.

Singularity is an unfortunate term, because it's technically incorrect and logically contradictory. My work is not trying to make a singularity; it is trying to make recursively self-improving machine-human intelligence which interacts with and learns from its environment. This is not impossible; it is merely technically difficult.

It is also a hypothesis. That's all. It isn't a phenomenon, so we cannot yet be good Aristotelians and observe it. Therefore, we can't develop a science of it. You'd avoid this entire webpage's worth of argument if you'd just simply remember that the singularity is nothing more than this hypothesis.

Repeat: The singularity is not a phenomenon, nor is it a theory; it is a hypothesis.

For those who wish to believe it is a correct hypothesis, and it is the future, well, get busy doing the hard work and developing the technology to make it the future. For those who don't believe it could be a possible future, get out of our way, since you're so damn sure you're right.

In general, a singularity is a point at which an equation, surface, etc., blows up or becomes degenerate. Singularities are often also called singular points.

To call Vinge a particularly bad science fiction author says more about your critical acumen than about him. (Perhaps you're thinking about his ex-wife?)

(I read the "singularity is near" in the article title as ironic - almost parodic).

"In general, a singularity is a point at which an equation, surface, etc., blows up or becomes degenerate."

Do you also participate in discussions of Microsoft Surface by explaining that in general, a surface is a flat exterior of a coherent object?


"To call Vinge a particularly bad science fiction author"

Vinge didn't come up with the singularity; Kurzweil did. I quite like Vernor Vinge's work.


"says more about your critical acumen"

Acumen is the ability to make good numeric estimations on the spot, such as business decisions.

Although I quite enjoy Vernor Vinge's work, I also feel it important to point out that someone merely disliking an author you like would not, in fact, be a measurable slight against their intelligence, any more than liking different pizza toppings would be.


"I read the "singularity is near" in the article title as ironic"

That's interesting. If that's correct, then you have a point. (Also, bravo for being part of the one percent of the internet who knows what that word correctly means. I mean that in earnest.)

Vinge did coin the term singularity.

The term was coined by science fiction writer Vernor Vinge, who argues that artificial intelligence, human biological enhancement or brain-computer interfaces could be possible causes of the singularity.


It certainly lacks experimental evidence, but then again so does all other speculation about the future. That doesn't mean that it's illegitimate, just that it isn't scientific. On the other hand a lot of the terminology and discussion around "The Singularity" does tend to be confused, and I think that we might all be better off if we stopped using that term.

I should also point out that you're exaggerating the link between the idea and science fiction authors. Campbell was a science fiction author (well mostly an editor but close enough) and Vinge was too, though Vinge was also a CS professor. The people I'd associate with the other schools of thought aren't science fiction authors, though.

"It certainly lacks experimental evidence, but then again so does all other speculation about the future."

We disagree here. The singularity made measurable predictions, and not only has every single one that's come to pass failed without exception, but the remainder are no closer to happening than the day they were made.

Remember, originally, computers were supposed to be smarter than us back in the 90s, when people still thought The Simpsons was funny.


"I think that we might all be better off if we stopped using that term."

We agree, though I think for different reasons. If I understand you correctly, you are suggesting that we forego this term in favor of clearer, better defined ones, but keep the idea.

I think we should actually reject the concept.


"Campbell was a science fiction author (well mostly an editor but close enough) and Vinge was too, though Vinge was also a CS professor. The people I'd associate with the other schools of thought aren't science fiction authors, though."

Minsky, who associates with that school of thought, is. Kurzweil, who originated this school of thought, is. Stanislaw Ulam, Damien Broderick, Hans Moravec, Greg Egan, Nancy Kress, Larry Niven, Dean Ing, Samuel Delany, Ray Solomonoff, Pohl, Asimov, Steele, other-Steele, Yudkowsky, the founder of Singularity University (which is not legally a university) Peter Diamandis, Aubrey de Grey, et cetera.

Sort of the germane understanding is to look at their work. Whose work do they all point to? Moore, Lanier, Holland, and Hawkins.

Guess what all four of them say they think?


"I should also point out that you're exaggerating the link between the idea and science fiction authors."

I don't think that I am. Every proponent of The Singularity I am aware of writes speculative fiction for money, without exception. The list I gave above isn't even close to exhaustive.

And, again, that's not the actual point I'm making. This isn't about "the link between" The Various Singularities and speculative fiction; I'm asserting directly that every single Singular proposal is itself science fiction.

I'm not saying it's written by science fiction people. Asimov did legitimate speculative engineering, for example, in the cases of geosynchronous orbits and the space elevator, and arguably in arcological discussion. I would not say that his work was science fiction, despite the fact that he's a science fiction author.


Because there are real numbers involved. There's real math. There are real equations. He knew the fuel demands, the weight of the building, the energy requirements. He wasn't writing fun stories; he was doing real work.

Spend two weeks. You will _never_ find real work done around the singularity. It hasn't been done.

Earlier you suggested that all speculation about the future was so; I do not agree. We have several quite good designs for an arcology on Mars which would actually work. Seward's Folly is speculative, but it's actual legitimate work; it could be built.

The Singularity is just a cute story.


"The people I'd associate with the other schools of thought aren't science fiction authors, though."

Schools of thought is a great phrase to try to bring gravitas to a situation where it isn't warranted.

I'll pay more attention when I see a single work on The Singularity which could pass muster as a freshman thesis at a second string state school.

There's lots of material out there about infinite energy devices, too. Ask yourself a question: if you didn't have the Second Law of Thermodynamics, how would you tell the free energy devices apart from the legitimate ones?

How is it that you know Andrea Rossi doesn't really have cold fusion?

Try some skepticism. It's delicious, and low cholesterol.

The only originality in this work is the processing power used. The principle in itself is not original.

For a possible resemblance with the real cortical neural network working principle and face or object recognition, this is just a farce.

Regarding getting closer to the presumed singularity, this is like saying that cutting flint is close to making diamonds.

The authors didn't claim that, but the abusive use of "neural network" for such kinds of applications is doing just that. It is a dishonest abuse of people who can't tell the difference.

The true problem is that significant quality work toward modeling real cortical neural networks is drowned in a sea of such fake crap.

I think much of this is overhyped as well, but I disagree that modeling human brain structure is relevant. The term "neural network" has historical baggage (responsible for some of the hype), but these days refers to a class of mathematical approaches with only a historical connection to "neurons". Those can be interesting on their own for AI purposes, and imo accurate modeling of the human brain, while interesting for neuroscience research, is not necessarily the way forward for AI research.

My use of modeling was indeed inappropriate. What I meant is the working principle of cortical neural networks. The artificial model would then be a kind of proof of concept.

Regarding your other point, it is a matter of research strategy. I think that trying to understand the working principle of real cortical neural networks is the shortest path to AI. My impression is that the other path, which is to play around with artificial neural networks, is too hazardous.

We can draw a parallel with learning to fly. We are in a similar situation regarding how the brain works and AI. Understanding how birds fly requires real research. People seem to simply focus on flapping, while this is not the real working principle of flight.

I see a strong analogy there with artificial neural networks. The most relevant properties of cortical neural networks are ignored.

With flight, the proof of mastering it was obvious. With AI, it is less obvious. I would be glad to hear suggestions. Face recognition is the most difficult condition because this process is the end product of many prior processes like 3D perception and feature extraction. My current impression is that speech decoding would be a much better candidate. Siri shows the potential impact of such an AI product. At least the Turing test would be a direct match.

I guess I take the opposite suggestion from the flight example: we succeeded in flying once we started focusing more on the physical/mathematical research of aerodynamics and lift, and less on attempting to mimic biology, in copying the biomechanics of bird wings. We ended up producing something that flies, but not in exactly the way that birds fly. That's what I tend to view as the better route for AI as well: instead of trying to copy the details of how a brain works, focus more on first-principles mathematical/logical principles of inference, whether they're symbolic ones (e.g. theorem-proving) or statistical ones (Bayesian networks, etc.).

Admittedly this is a big area of disagreement both within and outside the field.

You'd be surprised at how little precision this much computing power gets you, even on very basic classifiers. If anything, working on this kind of stuff has given me an appreciation for just how far away the singularity may be.

15.8% accuracy != singularity is near.

No more supplements-eating Kurzweil, walking Terminators and Skynet-like BS please.

I wonder what accuracy a human would get if you trained him/her only on 10 million static 200x200 px images in complete silence.

Babies can outperform any CV model with less data than that.

I strongly doubt that. I think babies have much richer input than any CV system to date. In my opinion, movement is crucial for understanding an image.

A human baby learns in a very interesting way: by using an overcomplete basis to sparsely code data. This method has only recently started getting attention in ML.

A human baby learns from uncleaned raw data using far less energy and with better generalization than a computer, and fuses large amounts of data without suffering from dimensionality curses.

I think it is safe to say that human babies are still ahead. For now.

I agree that CV is currently behind human sight. I just argue that this is more due to a lack of input processing than to the actual classification engine. The reason they used 200x200 px images, I think, was that bigger images couldn't be analysed in sufficient quantity.

In a way, their first input-processing step for the real images was a pixelisation filter. If you feed a pixelised image to a person, you see how much information is lost. If you make single pixels occupy a significant portion of the view, a person might lose the ability to recognize the image at all. Feeding so little information to a CV system is like trying to teach a nearly blind man to see.

To improve CV we should focus on finding the best ways of converting full-resolution visual data to something of smaller volume in such a way that important features are preserved. This input data IMO should also include time. Thanks to crappy eyesight, I often recognize people, actions and objects by relying more on how they move than on how they look to me. Even with sharp eyesight, sometimes your vision just gets stuck and you can't recognize what is in the scene you are currently looking at. You can't understand what you see until something in the scene moves or you move a bit.

>>To improve CV we should focus on finding best ways of converting full resolution visual data to something of smaller volume in such way that important features are preserved.

You are exactly right! See dictionary learning, random projection, compressive sensing. As for time, perhaps you are right; I don't know. The question is: would a suitably written, video-trained classifier that preserved temporal features do better on image classification?

I think that might be the case, as I believe that there is a lot of information in movement (or in 3D that is made observable thanks to movement) that makes dividing objects into categories a much easier task. There must be a lot of shortcuts that can be made thanks to the fact that objects are not something completely arbitrary but physical, usually solid, and have to obey the relatively simple rules of reality. Once the classifier is taught with rich data, you can match a newly observed object to one of the classes relying on only a very small amount of information, for example its small-resolution 2D image.

Seeing is, in my opinion, very similar to understanding language, in the sense that the information that is transferred (the observed image or the words heard) is just small fuzzy fragments. The sender (the speaker or, in the case of vision, the physical world) has a rich model, and the recipient has a rich model of all the things they can communicate about. The actual information passed only indicates to the recipient which parts of the underlying model he should select and how he should modify them to get the message.

Building a usable model from the small fuzzy fragments of information that are passed when recognizing an image or hearing spoken words would be an incredibly hard task, and I think no biological brain could do that. I think that absorbing as much real information as possible at the time of training the classifier is absolutely crucial for achieving anything close to what humans or animals can do.

From the techniques you mentioned, dictionary learning looks most awesome to me, and most applicable to CV.

I don't think this was "complete silence" in this case. They actually trained the computer (meaning they provided data as input).

scotty79 means showing a human only the 10 million 200x200 images. We probably see a lot more than that, in 3D from different perspectives with continuity in motion, and in a lot more detail and with more possibilities for filtering.

> scotty79 means to showing a human only the 10 million 200x200 images

Then, what kind of thing are you measuring? Recognizing patterns? We know humans are very good at that. We can see thousands of different people every day, yet we can recognize a familiar face in the blink of an eye. A computer or computer network is very, very, very far from being able to do that yet.

But why are humans very good at that? Could it be because humans have gotten tons of high-resolution, 3D imagery with binaural audio? Whereas this computer got a handful of low-res still images. Is the solution to just throw more horsepower at the problem, or is there really some inherent quality of the brain that's different?

edit: this quote puts things into perspective a bit: "It is worth noting that our network is still tiny compared to the human visual cortex, which is 1,000,000 times larger in terms of the number of neurons and synapses."

Also, consider training time. I doubt babies are recognizing 20k objects at a 15% success rate after just a few days. Though to really compare that, the speed of the brain vs the speed of the supercomputer has to be normalized for, and training data and network size have to be similar as well, of course, for any real comparison.

By "silence" I meant images only, with no other input: for example, no labeling of the images in any way.

"supplements-eating Kurzweil", ROTFL!

Maybe I'm missing something here, but how exactly is it "unlabeled" data if they're specifically feeding it millions of pictures of "faces"? I mean, if you make a specific selection of the type of images you train the network on, isn't that basically equivalent to labeling them?

The aim of the paper was to produce an unsupervised system that would generate high level features from noisy data. These high level features could then be used in supervised systems where labelled data is added.

Thus, the paper is about using an unsupervised system to help a later supervised system. An advantage of this is that, as the unsupervised system isn't trained to recognise object X, it instead learns features that are discriminative. This same network could be used to recognise arbitrary objects (which is what they do later on in the paper with ImageNet).

In other words: imagine a baby. She sees 100k "images" of faces. Thanks to the statistical regularity of the world, she now has a "subsystem" that recognizes a face in the absence of her knowing what it's called. Then, when she is told "this is a face" she pins this thing pointed-to to the existing, unnamed representation.
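The "learn the structure first, attach the name later" idea can be sketched in a few lines. This is only a toy illustration, not the paper's method (the paper trains a large sparse autoencoder): it uses plain k-means on made-up one-dimensional feature values, and the helper names kmeans_1d and assign are hypothetical.

```python
def assign(value, centers):
    """Index of the center closest to value."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))

def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on scalars: assign to nearest center, then recompute centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            clusters[assign(v, centers)].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Step 1 (unsupervised): unlabeled feature values, invented for illustration.
unlabeled = [0.9, 1.1, 1.0, 4.8, 5.2, 5.0, 1.2, 4.9]
centers = kmeans_1d(unlabeled, centers=[0.0, 10.0])

# Step 2 (tiny amount of supervision): one labeled example names each cluster.
labeled = [(1.0, "not a face"), (5.0, "face")]
names = {assign(v, centers): label for v, label in labeled}

# A new, unseen value is named via the cluster it falls into.
print(names[assign(5.1, centers)])
```

The clusters exist before any label is seen; the two labeled examples only pin names onto groups the unsupervised step had already found.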

Nope. Not all the images contained faces (cats, bodies, etc.). There was no specific face-detection code. The system just learned the concepts from the data. http://en.wikipedia.org/wiki/Unsupervised_learning

You're correct: it isn't an unlabelled system, and the article author is deeply confused about basic topics in artificial intelligence.

What he's trying to talk about is "this is an unsupervised feature detector in a large dataset which is only categorized, and where no human has provided correct answers up front to verify progress."

The reason this matters (and it doesn't matter very much) is that in cases where it's prohibitive to provide training sets, such as where you don't know the good answer yourself, or where giving a decent range of good answers would be difficult, this sort of approach can still be used.

"isn't that basically equivalent to labeling them?"

Yes. It is. The original poster is confused.

What he meant to say was "there is no training set."

The authors of the paper? How can you say they are deeply confused - have you not seen their previous work and presentations? Everything else you say I agree with.

No. The author of the Y! Combinator article.

I apologize for being vague, and shall endeavor to be clearer in the future.

They are feeding it 10M frames from random YouTube videos, 1 frame per video. Only 3% of 60x60 patches from those frames contained faces.

While I'm excited about progress, 15.8% accuracy is not exactly "Singularity is near"

It detects faces with 81% accuracy. That isn't bad at all...

Also, the previous best on the same dataset was 9.3%

Before, there was room to double accuracy 3 times. Now, there's not. If I understood correctly, their approach can take advantage of parallelism. I'm not saying they can just throw 128k cores at the problem and be done, but adding 2^n resources will likely give a nice boost to results.

OTOH, it's late and I might be way off.

They just have to scale it up - more computers, more days, and the accuracy level should increase accordingly (that is my intuition and hope on this, though I could be wrong).
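To make the doubling remark concrete, here is a quick check using only the 9.3% and 15.8% figures quoted in this thread. The helper name doublings_left is made up for illustration.

```python
import math

def doublings_left(accuracy_percent: float) -> float:
    """How many times accuracy could double before hitting 100%."""
    return math.log2(100.0 / accuracy_percent)

previous_best = 9.3   # percent, previous best quoted in the thread
new_result = 15.8     # percent, new result quoted in the thread

print(f"from {previous_best}%: {doublings_left(previous_best):.2f} doublings left")
print(f"from {new_result}%: {doublings_left(new_result):.2f} doublings left")
```

Starting from 9.3% there are more than three doublings of headroom; starting from 15.8% there are fewer than three, since three doublings would exceed 100%.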

Haha, no.

--- last company was in computer vision.

"last company was in computer vision."

Did you sell, leave or did it fail? Why? I have some ideas that I think are novel applications of computer vision, and just within the range of what's feasible, but it seems that most computer vision applications look like that at first, and then after 90% done find out that the second 90% is exponentially harder and, realistically, infeasible. How could I test my ideas against that? Or am I asking from wrong premises?

Company was doing well and had a good idea. Product worked great, we did our job. The problem is, management didn't.

I left after all the other engineers did.

Second reply:

Email me if you want to discuss practical application.

This isn't singularity material. While this may not be a bog standard neural network, it has no feedback. It cannot think, because thinking requires reflection. It is trained by adjusting the weights of the connections after the fact using an equation.

Is it cool, and perhaps even useful? Yes. But don't confuse this research project for a precursor to skynet.

I put the singularity bit in to make it relevant to people who would otherwise not get the significance of this (which is that large scale neural nets can work - something people have been trying and failing at for decades).

Or you put it in as a cheap link-bait tactic.

The site is Google Research - I have no ads on it. This paper's been out for weeks but with no mentions anywhere - thought I would give it a deserving push.

"We also find that the same network is sensitive to other high-level concepts such as cat faces..."

"Our training dataset is constructed by sampling frames from 10 million YouTube videos."

If anyone is interested in reading more on this topic, there is another recent, closely related and perhaps slightly more accessible paper ( "High-Level Invariant Features with Scalable Clustering Algorithms" http://bit.ly/KDuN04 ) from Stanford that also learns face neurons from an unlabeled collection of images (disclaimer: I'm co-author). It uses a slightly different model based on layers of k-means clustering and linking, but the computation in the end is very similar.

I'm familiar with both models so I can also try to answer any questions.

I think your comment http://news.ycombinator.com/item?id=3838971 is very relevant to this discussion.

I'm seriously considering quitting my job and studying ML for a few months in a desperate attempt to get work in projects like this. I feel like I'm missing out but too dumb for traditional grad school.

That's what I did a few months ago - quit my job and decided to go to a grad school to study AI (with focus on neural nets and ML).

What have you learned so far about neural nets? I recently looked into machine learning and naively thought I could find at least one practical fun tutorial: "Here is a neural network API in C, you have to do this and that to let a simulated robot evade obstacles or learn to play Asteroids". Instead my (extremely superficial) search found that neural nets are pretty arcane, genetic algorithms get trapped in local minima, and you are faster and better off coding the logic yourself, developing a mathematical model to calculate results, instead of searching for patterns in vast sets of data.

SVMs are a relatively easy to use (but not to understand) method that yields impressive results for beginners. See the libsvm website, there is plenty of good material there (http://www.csie.ntu.edu.tw/~cjlin/libsvm/). But as stated somewhere else here, game AI is a whole different story.

If you are trying to make game AI then neural networks are a bad idea. But if you are just trying to have some fun learning then do whatever you want. That said, MLPs are hard to use properly, not a good place to start.

A good way to start "AI": write a decision tree. They will serve you well, and with boosting do even better. Basic but useful stuff: logistic regression, multi-armed bandits, weighted experts, k-nearest neighbours, k-means, kernel density estimation and naive Bayes. That covers online, ensemble, supervised and unsupervised algorithms. Good luck!

Here's a fun toy project using neural networks.


I learned that neural net research requires serious math knowledge, so that's what I'm working on currently. One fun project to start learning about NNs is the pole-balancing problem. There are solutions in various languages.

Yes, for some problems it might be faster and better to code logic yourself, but there are also tasks (such as pattern recognition) where NNs might be more effective.

Try Python's scikit-learn library.

I'm too dumb for grad school. It'll have to be self-taught.

"and studying ML for a few months"

Could anyone with expertise say if this would be enough to build a foundation? How much math background do you need?

You need at least probability/statistics, linear algebra, calculus, and numerical methods. Once you know these, machine learning is a relatively thin layer built on this math foundation. The real problem is learning the math foundation.

It might be enough to be able to use tools that other people have built, and have a rough idea of what's going on under the hood, and how to select which algorithm to use in a very general sense. It won't be enough for you to be designing your own algorithms.

If you like math, Caltech's "Learning from Data" is awesome http://work.caltech.edu/telecourse.html

Yes, often. Thanks for the links.

Fascinating, exciting times we live in. I'll upvote the link as soon as the "singularity" silliness is taken out of the title.

From the article: "It is worth noting that our network is still tiny compared to the human visual cortex, which is 1,000,000 times larger in terms of the number of neurons and synapses."

> the dataset has 10 million 200x200 pixel images downloaded from the Internet

They take the frames from YouTube. It is weird to me that YouTube (derided as a way of sharing funny cat videos) is able to contribute something actually useful to the world.

Youtube contains a lot of educational material you would not find otherwise. If I wanted to learn sewing on a sewing machine, I would just watch some video tutorials - try that with a book on sewing. Same thing with instructions on how to play instruments. Heck you can even watch videos on how to fix problems with your car engine. Many procedural instructions can't be transported properly via papers or books. It also allows asynchronous video messaging for laymen asking experts stuff which is difficult to get via written text. I bet that youtube will contribute very much to knowledge preservation and distribution in the long term.

I hope that Google look after it a bit better than they've looked after their Usenet archives. Google groups search is a particularly frustrating experience.

I agree that there is some great content on Youtube. Interesting that you mention sewing machines, because that's something I've used and they are particularly helpful. (See also all those other crafting videos; latch-hooking etc.)

As far as I can tell, this is "let's train a huge number of models and then cherry-pick the few that work well on a test set", so overfitted junk. What have I missed?

It seems like they are cherry-picking the one that works well on the training set, no? Where did you get the sense that they were doing it on test?

The training set is not labeled, so that is not possible; and they do not mention that the labeled set was split or used in validation -- it is just called "test".

"We followed the experimental protocols specified by (Deng et al., 2010; Sanchez & Perronnin, 2011), in which, the datasets are randomly split into two halves for training and validation. We report the performance on the validation set and compare against state-of-the-art baselines in Table 2. Note that the splits are not identical to previous work but validation set performances vary slightly across different splits."

As I understand, this is only about this side experiment with ImageNet data which uses logistic regression on those neurons in some cryptic way; I was trying to comprehend the core work (faces) before that.

Well, that's the record breaking bit.

We need to harness this neural net to improve kittydar:


The singularity is already here. There are black holes in our universe. This submission title really annoys me.

Even if you are trolling, I think I'll leave this here. http://en.wikipedia.org/wiki/Technological_singularity

I'm not trolling, I've read Kurzweil's book... It's extrapolating. If I was a risk assessor, I'd err on the side of a mass extinction event over us consuming the entire universe as data.

There have already been 4 extinction events, it's pretty safe to bet that it will happen again.

But they happen 50-100 million years between each other and usually take thousands of years to take full effect once they begin.

Even if technological singularity takes an extra 100-200 years to really happen, if any significant 'AI' is achieved, a lot could happen in a thousand years, let alone a million.

For what it's worth, your ability to post that depends on the fact that they were actually near-extinction events.

Is accuracy referring to recall, precision, or some other measure?

When there is no possible difference between recall and precision, you report one figure: accuracy.

Google Research and Stanford researchers test out a 1 billion connection, 9 layer, 16,000 core deep learning neural network (see Geoff Hinton and Andrew Ng talks on YouTube) to recognize 20,000 different objects in images (with low accuracy but a huge improvement over previous approaches).

This was done with no pre-labeled images (except for fine-tuning)! A brain that learned from raw images. The same algorithm can be applied to any data type (financial data, text, audio, images/video) without any human involvement (except gathering of unlabelled data and running the system). Pretty much the artificial intelligence holy grail!
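One reading of this remark: when every image gets exactly one predicted label, each mistake counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so micro-averaged precision and recall both collapse to plain accuracy. A small sketch with invented labels, where micro_metrics is a hypothetical helper:

```python
from collections import Counter

def micro_metrics(y_true, y_pred):
    """Accuracy plus micro-averaged precision and recall for single-label data."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # wrongly predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    total_tp = sum(tp.values())
    precision = total_tp / (total_tp + sum(fp.values()))
    recall = total_tp / (total_tp + sum(fn.values()))
    accuracy = total_tp / len(y_true)
    return accuracy, precision, recall

# Made-up labels for illustration only.
y_true = ["cat", "face", "cat", "car", "face", "car"]
y_pred = ["cat", "cat", "cat", "face", "face", "car"]
accuracy, precision, recall = micro_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

Per-class (macro-averaged) precision and recall can still differ from each other; the equality holds for the micro-averaged figures.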

"Singularity is near"? "Holy grail"? You may be getting a little carried away here.

The outcome shows a very nice improvement on an unsupervised classification and feature detection task, but it also highlights that unsupervised machine learning still has a long way to go. 16% accuracy from a network with 1bn connections and 100m inputs using (if my math is right) 1.15m hours of CPU time. Which of these would be the easiest way to continue making gains: investing more time/hardware, increasing the complexity of the model, or developing a new and improved algorithm altogether? All of these sound pretty intensive to me.
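The CPU-time estimate can be reproduced in a couple of lines. The 16,000-core count is quoted in this thread; the three-day training run is taken from the paper's abstract and is otherwise an assumption here, as is treating every core as busy for the whole run.

```python
cores = 16_000        # core count quoted in the thread
training_days = 3     # assumption: duration as stated in the paper's abstract
hours_per_day = 24

# Upper bound: assumes all cores were in use for the entire run.
core_hours = cores * training_days * hours_per_day
print(f"{core_hours:,} core-hours")
```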

If the algorithm keeps increasing in accuracy as you scale up computation and add more unlabeled data that is pretty amazing. You might get something that matches human performance on vision/speech recognition etc.

If you extrapolate that way you'd conclude that naive Bayes is the solution to AI. Improvements tend to tail off fairly quickly as you add more data and computation, unfortunately.

I only read the abstract, so I'm sure this is a basic/dumb question... but if you don't label images as faces or not, what makes it a face detector? :) How do you get an elbow detector or a butt detector out of the same algorithm?

Show it a zillion pictures. Then show it a face and see what gets activated. That's your face detector. Show it an elbow or butt, and see what gets activated; that's your elbow or butt detector.

It automatically creates a set of features; you can then use a final layer of machine learning on top of them to get what you want.

In machine learning, normally you have to create a set of features (called feature engineering: basically, thinking up algorithms to better represent your data). The amazing thing about deep learning is that the computer does this for you!

You just need a few 10s/100s of face/non-face images - same for 20,000 other objects - this is called fine-tuning.

For more, Andrew Ng, Geoff Hinton and Yann LeCun have given talks on this at Google and they are up on YouTube.

This paper is actually more interesting: it automatically learns a "neuron" whose firing represents a detected face, without any supervised technique. It shows the possibility of extracting complex information solely from data.
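The "see what gets activated" step can be sketched as follows. This is a simplification, not the paper's actual selection procedure: it picks the unit with the largest gap in mean activation between face and non-face images. The activation values and the function name most_selective_neuron are invented for illustration.

```python
def most_selective_neuron(face_activations, other_activations):
    """Index of the unit whose mean activation is highest on faces relative to other images."""
    num_neurons = len(face_activations[0])

    def mean(rows, j):
        return sum(row[j] for row in rows) / len(rows)

    gaps = [mean(face_activations, j) - mean(other_activations, j)
            for j in range(num_neurons)]
    return max(range(num_neurons), key=lambda j: gaps[j])

# Hypothetical activations: one row per image, one column per neuron.
face_activations = [[0.1, 0.9, 0.3],
                    [0.2, 0.8, 0.4]]
other_activations = [[0.1, 0.1, 0.5],
                     [0.3, 0.2, 0.3]]

face_neuron = most_selective_neuron(face_activations, other_activations)
print(face_neuron)
```

Note that labels are only used here to find which already-learned unit responds to faces, not to train the network.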

Is there any concept of "reward" for this thing?

Wouldn't that make training it much quicker and make it much more accurate?

Or are we trying to avoid any human interaction at all with the learning loop?

See, for example this company (one of many) that trains bees to smell certain odours.


Using "reward", or say supervised training, is easier and (near certainly) often gives better results, but unsupervised is more interesting as a research result: it tells us that we can actually extract very high-level information from the data itself, using some "obvious" rules (such as linearly mixing adjacent pixels and making the results as sparse, "Laplace-distribution-like", as possible). It is important because it proves that we may simulate brain functionality without knowing the exact structure of the brain (as we know, the brain is complex), by instead analysing the data it processes using lots of simple structures.

Wait until they become conscious and demand human rights. Then we can no longer exploit them and we're back to square one.

Seriously, a PhD thesis not far from now may have the title: "The limits of AI: how far can we exploit the machines before we are limited by machine rights"

Actually, that is not just a thesis at some point in the future, that is an active field of study and has been since the 1980's. Some keywords are "machine ethics" and "AI rights". One of my friends wrote his PhD (in the 'real' sense, i.e. from a philosophy department - just to say that this is not (only) a CS topic) thesis about a question related to this in the early 80's.

I'd say until the day they realize they're being exploited, and complain about it.

And so is a profession of "machine rights lawyer". Not to mention "Corpsicle lawyer" for representing cryogenically suspended persons. Seriously.

The added singularity is near part is silly. In ten years when that's a 500,000 (or 5 million) core neural net, the singularity is near part still won't apply.

I thought that the best so far was simulating 1/2 a mouse brain's complexity at 1/2 speed.

Where can I find this study?

He might be referring to Blue Brain, though I recall that it simulated 100k neurons from the neocortex (not quite half a mouse brain). The thing is that they simulated the physical processes in the synapses and more, for a more accurate representation, so it's actually a lot cooler than it sounds!

Here's a link I googled up: http://bluebrain.epfl.ch/cms/lang/en/pid/56882

"we trained our network to obtain 15.8% accuracy"

Film at 11

Look at it this way: for a 20,000-category classification problem, guessing randomly will give you 0.005% accuracy. Compared to chance, this is pretty good.
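The chance figure is easy to verify. The sketch below assumes the categories are equally likely, which is a simplification; the 20,000 and 15.8% numbers are the ones quoted in this thread.

```python
num_classes = 20_000        # category count quoted in the thread
reported_accuracy = 0.158   # 15.8% figure quoted from the paper

# Uniform random guessing over equally likely classes (an assumption).
chance_accuracy = 1 / num_classes
improvement_over_chance = reported_accuracy / chance_accuracy

print(f"chance: {chance_accuracy:.3%}")
print(f"reported result is about {improvement_over_chance:.0f}x chance")
```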
