This work is interesting enough to warrant detailed discussion on the topic at hand, large scale machine learning, rather than just rehashing discussions of the singularity.
Added: As I can't reply to the comment below I'll do it here =] The network provides learned representations that are discriminative.
The aim of the network is to learn high level features representative of the content.
One of the many features it produced was one which accurately indicated the presence of a face in the image.
Note that they said train a face detector and not classify.
For example, from the same network there was a feature which accurately detected cats, yet they didn't explicitly train a cat detector either (see the section "Cat and human body detectors").
As the network represents the content as generic features it is clear that, if it reaches a high enough level, those features are essentially classifications themselves.
tldr; High-level features generated by this unsupervised network are so high-level that one of them aligns with "has a face in the image", others with "has cat in image", etc, but these features cannot be used without labelled training.
"Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not."
tldr; High-level features generated by this unsupervised network are so high-level that one of them aligns with "has a face in the image", another to "has cat in image" (see the section "Cat and human body detectors") and so on.
Note however that they select the "best neuron" for face classification -- the only way they can do that is via using labelled data and testing all the neurons (where each neuron's activation is a feature).
Thus, these features cannot be used without labelled training.
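To make the "best neuron" point concrete, here is a minimal sketch of what that selection step might look like. It is not the paper's code: the function name best_neuron, the random activations, and the planted neuron 17 are all made up for illustration. The only point is that picking the neuron requires a labelled test set.

```python
import numpy as np

def best_neuron(activations, labels):
    """Return (neuron index, threshold, accuracy) for the single unit whose
    thresholded activation best matches the 0/1 labels.

    activations: (n_images, n_neurons) float array
    labels:      (n_images,) array of 0/1 (this is the labelled data)
    """
    labels = np.asarray(labels).astype(bool)
    best = (-1, 0.0, 0.0)
    for j in range(activations.shape[1]):
        column = activations[:, j]
        # Try every observed activation value as a candidate threshold.
        for t in np.unique(column):
            acc = np.mean((column >= t) == labels)
            if acc > best[2]:
                best = (j, float(t), float(acc))
    return best

# Hypothetical data: 50 "neurons" of pure noise, except that neuron 17
# happens to fire much more strongly on the positive ("face") images.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
activations = rng.normal(size=(200, 50))
activations[:, 17] += 6.0 * labels

idx, thr, acc = best_neuron(activations, labels)
print(idx, round(acc, 3))
```

The features themselves were learned without labels, but nothing in the loop above works without the labels array.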
Strictly speaking, you do need to have some labeled data at the end in order to determine how the neural net views faces, but I think that obscures what's notable about this system.
The amount of human participation involved in training is potentially six or more orders of magnitude less. That's a breakthrough, and a change in kind, not just degree.
Overhyping when it comes to machine learning and AI seems to be the norm and has already hurt AI/ML severely in the past.
More specifically: I didn't disagree with anything you've stated, simply pointed out that labeled training data is necessary in response to the statement that it wasn't. The high-level feature extraction the paper discusses is unsupervised but the classifiers it produces are semi-supervised. It's an important distinction.
Autoencoders take high dimensional input, map it to a lower dimensional space and then try to recreate the original high dimensional input as closely as possible.
The idea is to learn a compressed representation for the data and hope that this compressed representation works as a high level featureset.
As the model is just trying to represent the original input, no labelled data is required for the initial part. Labelled data is later introduced when the high level features are used for classification.
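A toy version of the idea above, as a hedged sketch: this is a tiny linear autoencoder in NumPy on synthetic data, not the sparse, multi-layer network from the paper. The data, the sizes (20 inputs, 3 hidden units), and the learning rate are all assumptions chosen to keep the example small. Note that no labels appear anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "high dimensional" data: 500 points in 20-D that really live
# on a 3-D subspace, so a 3-unit bottleneck can represent them.
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing
X /= X.std()  # rescale so plain gradient descent behaves

n_in, n_hidden = X.shape[1], 3
W_enc = rng.normal(scale=0.1, size=(n_in, n_hidden))
W_dec = rng.normal(scale=0.1, size=(n_hidden, n_in))

def reconstruction_error(X, W_enc, W_dec):
    """Mean squared difference between the input and its reconstruction."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

initial_error = reconstruction_error(X, W_enc, W_dec)
learning_rate = 0.01
for step in range(3000):
    code = X @ W_enc               # encode: 20-D -> 3-D
    recon = code @ W_dec           # decode: 3-D -> 20-D
    diff = (recon - X) / len(X)    # error signal, averaged over examples
    grad_dec = code.T @ diff
    grad_enc = X.T @ (diff @ W_dec.T)
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc
final_error = reconstruction_error(X, W_enc, W_dec)

print(initial_error, final_error)
```

The 3-D code is the "compressed representation"; a classifier trained on it afterwards is where labelled data would come in.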
What's most interesting about this paper is that one of the features learned by the model maps quite well to "image contains a face" without any prompting by the researchers.
For more details, check out http://www.stanford.edu/class/cs294a/sparseAutoencoder.pdf
"our experimental results reveal"
No, they don't. We had one of these in the 1980s.
No, they weren't overstated. They were hyped by a clueless press. There's a pretty critical difference. It's a bit like how the early web pioneers didn't say that the web was going to revolutionize the delivery of dog food; it was a journalist who said that.
"It's extremely valuable for someone to actually go and do a thing, now that we can"
Self organized unsupervised learning was in use for optical classification of potatoes in the feeding of Frito Lay automated processing plants in the late 1970s.
Please distinguish that you haven't actually looked for earlier examples from that you imagine none exist. Thanks.
1. The fact that Claude Shannon succeeded in training a chess system has virtually no impact on saalweachter's claim that many AI results were overstated.
2. Certainly the press overstated them, which supports saalweachter's premise rather than weakening it. Even if the _implied_ claim was that _researchers_ overstated results, your argument does nothing to weaken this claim.
3. Frito Lay solved a problem several orders of magnitude easier than that of face recognition in natural images, which is still very much an open problem in computer vision.
4. Similar to 1., the Frito Lay example contributes nothing to your goal of weakening saalweachter's claim that this is valuable research--a claim which is exceedingly innocuous.
I understand that you've probably got a bone to pick against the many AI naysayers and saalweachter's comments conjured a few common misrepresentations (i.e. (a) that the "AI revolution" burnt out because its researchers were somehow naive and (b) that neural networks are something new invented by computer vision researchers). You'd be justified in arguing against these claims, and I'm sure your father (respected AI researcher of the same name) would make them too, if saalweachter had tried to make them (which he didn't). But even if you were justified in making the argument, I would expect a less condescending one that made better use of evidence than the argument you've made here.
When a comment opens with a tone like this, I usually don't bother to respond, but I'll give you a chance, because you seem to have done a lot of honest mis-reading.
To wit, it may be of value for you to inspect your own tone, if you find public condescension inappropriate.
"1. The fact that Claude Shannon succeeded in training a chess system has virtually no impact on saalweachter's claim that many AI results were overstated."
It wasn't meant to. Saalweachter's claim was silly. Who cares if many things were overstated? That has zero bearing on that valid work was, in fact, being done.
The purpose of that statement was to remind us that as early as the 1940s, machine learning was able to defeat its own creator at what remains today regarded as a highly intellectual pursuit. My goal was to ignore the FUD of "some people got it wrong" as an attempt to suggest that there was nothing right.
Some people always get some of everything wrong. His claim is tautological and disinteresting. I was politely declining to shame him for it, but since you've presented me as having false goals, I now have no choice but to clarify.
It is generally inappropriate, for reasons like these, to chastise strangers over imagined motivations. Frequently, you don't know strangers' motivations as well as you might imagine from a simple read of a few paragraphs.
"2. Certainly the press overstated them, which supports saalweachter's premise"
You are now repeating something I said back to me. From that, you are deriving the false conclusion that because a journalist somewhere said something wrong, an important thing has been discovered.
What I'd like to point out is that the net result of observing that journalists made mistakes is still "so what?"
"Even if the _implied_ claim was that _researchers_ overstated results"
"your argument does nothing to weaken this claim."
You have not correctly identified what I was speaking to. This is akin to telling someone discussing environmental damage that some farmer is talking about crop yield and the speaker hasn't weakened their claim.
Again: so what? I never argued that there aren't journalists who got things wrong. I'm the one who brought it up.
What does that have to do with my original discussion?
"Frito Lay solved a problem several orders of magnitude easier than that of face recognition in natural images"
Discovering defects in potatoes moving at 45 miles an hour inside a water sluice from a single blurry image from a single angle in hard realtime using 1970s hardware is not several orders of magnitude easier than locating things on a face in slow time on modern hardware.
It's actually quite a bit more difficult even in fair conditions. Potato defects are under the surface, and have to be located by subtle color variation. It is not hard to find the characteristic shape and shadow of the nose.
With respect, sir, it's quite clear that this is not something you've done. You're claiming that easy things are more difficult than hard things, and you're forgetting the 40 year technology gap in between in your rush to show that a 2012 project is more impressive than a 1973 project.
To be clear, Babbage's mechanical calculator is also more impressive than an algebra solving system made in prolog. Why? Because it's more work and it's more difficult.
Your claim of several orders of magnitude simpler suggests that you are inventing data for the sake of feeling correct in an argument, and that you do not actually have the experience to show correct guesses in this field. That, combined with a tone suggesting that you feel it appropriate to rebuke strangers in public, suggests that I don't really want to talk to you much anymore.
"Frito Lay example contributes nothing to your goal of weakening saalweachter's claim that this is valuable research"
Again, you've misidentified my goal, and the way by which you've done that is to drop a critical piece of his actual claim.
I don't know why you feel that it's okay to guess at people's goals, then tell people how morally wrong your guesses are. I really don't.
My actual goal was to point out the jarring unfamiliarity with the field that both he and you evidence:
"It's extremely valuable for someone to actually go and do a thing, now that we can, even if someone had the idea for the thing eons ago"
The thing I was focussing on was to show him that this thing that he's applauding someone for doing in 2012 for the first time now that it's practical, even though it isn't being used in industry, was actually outclassed by a much more difficult problem on much more limited hardware in realtime 40 years ago by a company that nobody would think of as a technology giant.
The goal was to display just how far out of touch saalweachter was with the state of the industry.
Please don't speak to my goals anymore. For someone who'd like to speak about condescension (when I think you actually mean arrogance,) for you to tell me what I meant and what I was getting at - incorrectly - then lambast me for it in a tone far more severe than that which you're criticizing is, I admit, difficult to swallow politely.
"I'm sure your father (respected AI researcher of the same name) would make them too"
Do not speak for, or involve, my recently deceased father in your attempt to be correct, sir. Especially not while you're telling someone else they're being rude.
"I would expect a less condescending one that made better use of evidence than the argument you've made here."
Unfortunately, though you suggest this, taking a brief look through your comment shows that this is not in fact correct. You have been radically uglier than that which you are criticizing, involving personal attacks, false claims of other people's intent, false claims of other people's goals, and the repeat involvement of a recently deceased relative.
I would prefer not to hear from you again. Thanks.
Also, this paper is about 20,000 object categories, not just 1 (faces). And the neural network is not the standard type but of the deep learning variety which has only existed since 2005 (invented by Geoff Hinton, who was also big in neural net circles in the 80s so he's not some newcomer who hasn't done his literature search). One of the coauthors of the paper is Andrew Ng, head of the Stanford AI Lab, so he's pretty legit.
I call "citation needed", and needed quite badly: Shannon was a much better chess player than any program available in 1949.
Whether or not you believe me, everyone else just went ahead and took a quick look, and learned something.
Frankly, I would be happier, given your seeming inability to be a part of this conversation in a polite way, yet also your seeming unwillingness to depart this conversation even after it was requested, that you actually believe I'm wrong, and go around "calling people on this," so that everyone has early warning just how much you actually know about this field, instead of having to wait to listen to you speak.
"Shannon was a much better chess player than any program available in 1949."
On a technicality, this is correct: he started his work on December 29, and it wasn't until five days later, January 2 of 1950, that it was able to beat him.
All the same, you have no idea what you're talking about, and are asserting your beliefs as fact.
The correct way to handle "that doesn't sound right" is a search engine, not putting your hands on your hips and telling someone they're wrong in public.
Go check now, little bird.
This is the most powerful AI experiment yet conducted (publicly known).
No, it isn't. This classifier cannot identify theme variations, unknown rotations, will confuse new objects for objects it already knows, is unable to cope with camera distortion, needs fixed lighting, has no capacity for weather, does not work in the time you need to run away from a tiger, requires hundreds of times more data than a human eye presents, and does a far lower quality job, all while completely losing the ability to give a non-boolean response.
To say this is approaching human abilities is to have no idea what human abilities actually are.
"This is the most powerful AI experiment yet conducted"
No, it isn't. Please stop presenting your guesses as facts. Cyc runs circles around this, as do quite a few things from the Netflix challenge, as well as dozens of other things.
I personally have run far larger unsupervised neural networks than this, and I am not a cutting edge researcher.
I ask this question in all seriousness; I'd really like to know.
(And yes, I see that your username is that of a noted AI researcher. Who died in 2010. So if you're actually his beta simulation, then I'll indeed be rather impressed...)
Let's take the example of The Netflix Prize, a $1 million bounty that the movie shipping organization ran several years ago. Their purpose was to improve their ratings prediction algorithm, under the pretext that people frequently ran out of ideas of what to rent, and that a successful suggestion algorithm would keep people as customers longer after that point.
So, they carefully defined the success rate of their algorithm - that is, make it predict some set of actually-rated movies X on a 1-5 half-integer scale, take the square root of (the arithmetic mean of (the square of each error from the real rating)) - which we'll call root mean square error, or RMSE - and you have your "score," where towards zero is perfect.
Their predictor had a score of I think 0.973 something (it's been years, don't quote me on that.) Their challenge was simple.
Beat their score by ten percent, and you trigger a one month end-of-game. At the end of that month, whoever's best wins le prize. One million dollars, obligatory Doctor Evil finger and all.
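For anyone who wants the scoring spelled out, here is a small sketch that takes "root mean square error" at its name: square each error, average, then take the square root. The ratings are made up, and the 0.973 baseline is just the half-remembered figure from above, not a verified number.

```python
import math

def rmse(predicted, actual):
    """Root mean square error: sqrt of the mean of the squared errors."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up ratings, purely for illustration.
actual = [4, 3, 5, 1, 2]
predicted = [3.5, 3.0, 4.0, 2.0, 2.5]
score = rmse(predicted, actual)

# "Beat their score by ten percent" under the recalled (unverified) baseline.
baseline = 0.973
target = baseline * 0.9

print(round(score, 4), round(target, 4))
```

A lower score is better, so under these assumptions anything below roughly 0.876 would have triggered the end-of-game.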
Netflix provided (anonymized) a little over 100 million actual ratings, where all you had was a userID, a movieID, a real rating, and separately, a mapping "this movieID is this title." You were only allowed to use datasets in your solution that were freely available to everybody, and you had to reveal them and write a paper about your strategy within one month after you accept le prize, honh honh honh.
Seriously, it was awesome. They were going to do a second one, but lawyers, and the world sadded.
So, there, you've got a ten times larger dataset. So surely sixteen thousand cores is the drastic thing, right?
Well, not really. I was running my solution on 32 Teslas, which in the day were $340 in bulk and had 480 cores each. So I actually "only" had 15,360 cores, which falls a whopping four percent short of Google's approach, which several years ago cost me about the price of a recently used car, and which I was able to resell afterwards as used, but without the bulk discount, for almost exactly what I paid for them in the first place.
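The arithmetic above checks out, for what it's worth. A quick sketch using only the numbers quoted in this thread (32 cards, 480 cores each, $340 per card, and the 16,000 cores reported for Google's run):

```python
cards = 32
cores_per_card = 480
price_per_card = 340          # dollars, as quoted above

my_cores = cards * cores_per_card
google_cores = 16_000
shortfall = 1 - my_cores / google_cores
total_cost = cards * price_per_card

print(my_cores, f"{shortfall:.0%}", total_cost)
```

That gives 15,360 cores, four percent short of 16,000, for a little under $11k in cards.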
And I mean, I've got to imagine that someone else chasing that million dollar prize who thought they were going to get it invested more than I did. There were groups of up to a dozen people, data mining companies, etc.
So if one dude sitting in his then-Boise apartment can spend like $11k on a dataset ten times this size over a commercial prize?
Cyc still pantses all of us.
Don't get me wrong, the Netflix prize was cool.
What's cool about this is that Google hasn't given the learning system a high level task. They basically say, figure out a lossy compression for these 10 million images. And then when they examine that compression method, they find that it can effectively generate human faces and cats.
Predicting someone's reaction to a given movie is a lot more complicated than a pair of IDs and a rating, too, it turns out.
Let's take the speculation out of this.
You can get features of an image with simple large blob detection; four recurring boltzmann machines with half a dozen wires each can find the corners of a nose-bounding trapezoid quite easily. They'll get the job done in less than the 1/30 sec screen frame on the limited z80 knockoff in the original Dot Matrix Gameboy. You'll get better than 99% prediction accuracy. It takes about two hours to write the code, and you can train it with 20 or 30 examples unsupervised. I know, because I've done it.
On the other hand, getting 90% prediction accuracy from movie rating results takes teams of professional researchers years of work.
"I can build a reasonable dataset for a prediction task from a set of 100M rows from a database that I maintain in my spare time"
And you won't get anywhere near the prediction accuracy I will with noses. That's the key understanding here.
It's not enough to say "you can do the job." If you want to say one is harder than the other, you actually have to compare the quality of the results.
There is no meaningful discussion of difficulty without discussion of success rates.
I mean I can detect noses on anything by returning 0 if you ignore accuracy.
"What's cool about this is that Google hasn't given the learning system a high level task."
Yes it has. Feature detection is a high level task.
"They basically say, figure out a lossy compression for these 10 million images."
I have never heard a compelling explanation of the claim that locating a bounding box is a form of lossy compression. It is my opinion that this is a piece of false wisdom that people believe because they've heard it often and have never really thought it over.
Typically, someone bumbles out phrases like "information theory" and then completely fails to show any form of the single important characteristic of lossy compression: reconstructibility.
Which, again, is wholly defined by error rate.
Which, again, is what you are casually ignoring while making the claim that finding bounding boxes is harder than predicting human preferences.
Which is false.
"they find that it can effectively generate human faces and cats."
Filling in bounding boxes isn't generation. It's just paint by number geometry. This is roughly equivalent to using a point detector to find largest error against a mesh, then using that to select voronoi regions, then taking the color of that point and filling that region, then suggesting that that's also a form of compression, and that drawing the resulting dataset is generation.
And it isn't, because it isn't signal reductive.
Here, I made one for you, so you could see the difference. Those are my friends Jeff and Joelle. Say hi. The code is double-sloppy, but it makes the point.
See how I'm getting a dataset that isn't compression? See how that dataset is being used to make the original image, but nothing's being generated?
Your rant about this not being compression or whatever you're trying to say is completely off the mark. You don't seem to understand what this work is about.
The netflix challenge is a supervised learning challenge. You have lots of 'labeled data'. This technique is about using 'unlabeled' data.
(Side note: At one point, Geoff Hinton and his group using this technique had the best result in the netflix challenge, but were beaten out by ensembles of algorithms.)
Cyc has nothing to do with this and is a huge failure at AI.
tldr; You don't seem to know what you're talking about, having read your comments, and seem to readily discount some of the most prominent machine learning researchers in the world today. You're obscuring important results that newcomers might have found interesting to follow up on.
Your reading skills seem to be up to par, since I have discredited a list of zero people.
"You're obscuring important results that newcomers might have found interesting to follow up on."
Not only have I obscured no results, but this isn't actually something I have the power to do.
Here is an old example with hundreds of millions of records and instances:
Both authors are now with Google.
Also, people here may not be as up to speed on the state of the art in face rec as they think they are. It's not as much of an unsolved problem as it was even 10 years ago.
Not necessarily. Crowdsourcing is another option, like Google's image tagging game, reCAPTCHA, et cetera.
Pay a herd of people to do things, and they'll do things for you. You don't have to pay them in money. Telling them they have a high score is often enough.
With 15.8% accuracy.
> This is the most powerful AI experiment yet conducted (publicly known).
It's only powerful because they threw more cores at it than anyone else has previously attempted. From a quick skimming of the paper, there does not appear to be a lot of novel algorithmic contribution here. It's the same basic autoencoder that Hinton proposed years ago. They just added in some speed ups for many cores.
It's a great experiment though. You shouldn't detract from its legitimate contributions by making outlandish claims.
That's an ill-defined statement. AI is a vast and diverse field: what makes one demonstration more "powerful" than another? There are definitely other projects that could be viewed as being in the same class of "powerful" as this cluster.
This is certainly an interesting paper, but it has to be viewed in the context of a large and active field.
Let's say we'll stop making broad proclamations about the global best in a field we know very little about.
Here is a reasonably approachable talk he gave about it.
Thank you for sharing with me. :)
In return, I will offer you two interesting non-sequiturs, because I don't have anything topical and a non-sequitur seems like it's worth half what something germane would be.
Bret Victor, "Inventing on Principle." First 5 minutes are terribly boring. Give him a chance; it's 100% worth it.
Damian Conway, "Temporally Quaquaversal Virtual Nanomachine."
It's as funny as it sounds.
This technique was "discovered" by Geoff Hinton at the University of Toronto in 2005. However, nobody had tried it (or maybe had enough funds to try it) at this scale.
If this continues to work at larger and larger scale, this would be a machine learning technique that can work accurately on tasks that are hugely important to society:
- accurate speech recognition
- human level computer vision (make human manual labor redundant)
As for the point about it being for non-technical people, I don't understand where you're coming from. This is hacker news. If people don't understand it and don't upvote it, then that's their problem, not yours.
Yes, 15% accuracy doesn't seem great.
BUT the detector built its own categories(!). It managed to find 20,000 different categories of objects in Youtube videos, and one of these categories corresponded to human faces, and another to cats.
Once the experimenters found the "face detection neuron" and used it to test faces, THAT neuron managed an 81.7% detection rate(!).
Forget the singularity, and just think about how amazing that is. The system trained itself - without human labelling - to distinguish human faces correctly over 80% of the time.
Obviously this is extremely impressive work, and given that Google gives away 1e9 core hours a year, I'd like to see how much further they can push this network (which only used 16e3x3x24 ~ 1e6 hours). But this isn't like scoring 80% in a written exam.
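Spelling out the estimate above as a sketch: this reads "16e3x3x24" as 16,000 cores running for 3 days of 24 hours, which is an interpretation of the shorthand, and takes the 1e9 core hours a year figure as stated.

```python
cores = 16_000
days = 3
hours_per_day = 24

core_hours = cores * days * hours_per_day
yearly_giveaway = 1e9                      # core hours per year, as stated above
fraction = core_hours / yearly_giveaway

print(core_hours, f"{fraction:.2%}")
```

So the run used about 1.15 million core hours, on the order of a tenth of a percent of that yearly budget.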
I'm also impressed by how readable the paper was. Apart from a few paragraphs of detailed maths this should be accessible to anyone who's read the Wikipedia article on neural networks.
Yes, that is true. But ~80% correct is still a significant result.
I was hoping people would read beyond the 15% headline figure to understand exactly what that number meant.
It's not revolutionary. Clustering algorithms and neural nets are plentiful.
Really, what differentiates this network is its scale.
Also, it was (again, from the article) plausible but not a given that high level concepts could be found from unlabeled data.
That "cat" is one of the high level concepts you get from using random YouTube videos as raw data is both impressive, and slightly amusing.
(Sorry I wrote that fast, I hope it's understandable)
That you get high level features instead of edges is not the impressive part - you can just as well write a sparse non negative matrix factorization algorithm that will efficiently learn/represent eyes, lips and noses as features of faces unsupervised.
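As a rough sketch of the kind of algorithm meant above: this is plain non-negative matrix factorization with multiplicative updates on synthetic data. It has no explicit sparsity penalty, so it is a simpler cousin of sparse NMF, and the matrix sizes and the rank of 4 are assumptions for illustration. On real face images the rows of H would play the role of the learned "parts".

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic non-negative data: 100 "images" of 30 pixels, each a
# non-negative mix of 4 hidden "parts".
true_parts = rng.random((4, 30))
true_weights = rng.random((100, 4))
V = true_weights @ true_parts

k = 4                                  # number of parts to learn (assumed)
W = rng.random((100, k)) + 0.1         # per-image weights
H = rng.random((k, 30)) + 0.1          # learned parts
eps = 1e-9                             # avoids division by zero

def frobenius_error(V, W, H):
    """Frobenius norm of the reconstruction residual V - WH."""
    return float(np.linalg.norm(V - W @ H))

initial_error = frobenius_error(V, W, H)
for _ in range(500):
    # Multiplicative updates keep every entry of W and H non-negative.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
final_error = frobenius_error(V, W, H)

print(initial_error, final_error)
```

No labels are involved here either; the factorization only tries to reconstruct V from non-negative pieces.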
The most relevant quote being perhaps:
"The magic of the brain is not the number of neurons, but how the circuits are wired and how they function dynamically. If you put 1 billion transistors together, you don't get a functioning CPU. And if you put 100 billion neurons together, you don't get an intelligent brain."
EDIT: Mistyped number of cores. 1000, not 100.
It absolutely does not. This experiment supports that position strongly.
What this experiment shows is that said meaningful structure can be progressively, automatically discovered.
"and with only 1000 cores, not 1 billion"
Comparing CPU cores to individual neurons is more than slightly disingenuous.
See: http://www.nvidia.com/object/tesla-servers.html (4.5 teraflops in one card)
Reminder: GPUs will destroy the world.
GPUs do have fundamentally more execution resources, but that comes at a price and not every algorithm will be capable of running faster on a GPU than on a CPU. If neural networks just involve multiplying lots of matrices together with little branching they might be well suited to GPUs, but most AI code isn't like that.
They aren't as different as you imagine. They're general purpose programmable arithmetic units with processing rates on the order of 20-30% of CPUs, provided the limitation that they're all doing roughly the same thing.
For most machine learning tasks, that's exactly what you're doing anyway. Oh no, your neural network engine has to be parallel? C'est dommage!
So are GPU cores.
"But saying that the NVidia GPU has 1526 "cores" is just dishonest."
No, it isn't. You can run 1536 things in parallel at speeds that would have qualified as full cpu speeds several years prior.
Something isn't any less a core merely because it does less juggling magic, and that juggling magic is actually undesirable for a heavily parallelized task.
"So there are some tasks where the Intel core will be much faster than the NVidia SM, and some tasks where the NVidia SM will be much faster."
This conversation already has a context. Arguments which ignore that context completely miss the point.
If you don't understand how I achieved the amount of processing I did, that's fine. Playing games with the semantics of a "core" somehow magically requiring all the features of current Intel-strategy chips, though, is not going to convince me.
There is more to Heaven and Earth, Horatio, than is dreamt of in Intel's philosophy. This sort of attitude towards what constitutes the no true scotsman "a real core" is why ARM is in the process of eating Intel alive, and why Tilera stands a decent chance of doing the same thing to ARM.
This is merely extreme RISC. I realize it's sort of a tradition for the modern VLIW movement to suggest that if you can't double-backflip through a flaming hoop made out of predictive NAND gates it somehow doesn't count.
But, if you actually look, the rate of modern supercomputing going to video cards is rising dramatically.
So obviously they count as cores to somebody.
You also seem to have missed the point. It's not the core scale that we're discussing here. It's the dataset scale. The number of cores you throw at a problem is not terribly important; 20 years ago it would have been breathtaking to throw 32 cores at a problem, and now that's two CPUs.
What makes an experiment cutting edge is the nature of the experiment, not the volume of hardware that you throw at it. I was talking about the /data/ and the /problem/ . Predicting movie ratings is a hell of a lot harder than feature detection.
And actually the thing that does 4.5 teraflops in single precision does only 95 gigaflops in double precision per GPU. A good x86 CPU does ~100 gigaflops in double precision as well, and you're much more likely to actually achieve that number on a x86. Although another one on the page you linked to theoretically does 665 gigaflops double precision.
The ability to detect faces is not a signal that general intelligence is right around the corner.
There's the Campbellian Singularity, which says that we won't be able to predict what will happen next. Pretty non-controversial as far as it goes.
There's the Vingean Singularity, which says that if we ever develop AIs that can think as fast and as well as humans then due to Moore's Law they'll be thinking twice as fast as humans after 2 years, so they'll start designing chips and the period of Moore's law will fall to 1 year, and so on with us reaching infinite computing power in finite time. I think this vision is flawed.
Relatedly, there's the Intelligence Explosion Singularity (associated with Yudkowsky), which says that as soon as it's AIs designing AIs, smarter AIs will relatively quickly be able to make even smarter AIs and we'll get a "fwoosh" effect, though not to infinity in finite time. I find this unlikely, but can't rule it out.
There's one I don't have a handy name for, but let's call it the AI Revolution viewpoint, which is that AIs will cause civilization to switch to a faster mode of progress, just like the Agricultural Revolution and Industrial Revolution did. This one will only look like a singularity in hindsight, and might seem gradual to the people living through it. I think this one is pretty credible.
Then there's the Naive Singularity, which equates processing power with intelligence and then concludes that computers must be getting smarter. This is indeed totally naive and not something we should worry about. I guess the linked paper is evidence that you can substitute a faster computer for smarter AI researchers to some extent, but probably not a very large one.
As his definition of singularity is pretty strongly tied to comprehension think of it like this - the singularity is the time point after which a 10 year old unmodified human child from 1000 AD can not grow up to understand his or her surroundings.
If the singularity was a legitimate concept with anything approaching experimental evidence, then this could not be true. This observation of yours - with which I agree - suggests to me that The Singularity needs a pope hat.
It is instructive to notice that all of "the singularities" are the products of science fiction authors, and in the case of the original, a particularly bad one.
There is a delightful level of schadenfreude involved in observing the multiplicity of "The Singularities." In two different ways its name says "there's only one," and yet they still can't agree on topics that are critical and fundamental to the concept itself, like the definition of intelligence, or whether or not to circumcise.
Pass the sacramental chalice, please?
I am also sure you know that words - such as variety and polymorphism - have different context specific meanings. Singularity in this case as in the kind of thing you can find on a variety but not on a manifold.
The idea of infinite recursive Moore's law fueled intelligence explosions leading to super human intellects by 2030 is something I assign a low probability to. I don't find it hard to believe that there is some point in the future - say 2131 - such that if anyone alive today or previously were transported there, they would never be able to understand what was going on and everyone from that time would think circles around them.
What I was getting at was "you realize they're writing books to make people happy for money, not doing legitimate science on that day, right?"
"words - such as variety and polymorphism - have different context specific meanings."
Sure. All handwaving about the rules of language notwithstanding, though, none of The Singularities have merit or underlying measurement, even if you want to talk syntax and grammar to create a seeming of academia by proxy.
"The idea of infinite recursive Moore's law"
... is nonsense. What would "recursion" be in the context of Moore's law? Have you even thought this over?
What, Moore's Law solves itself by going deeper into itself until the datastructure is exhausted?
"fueled intelligence explosions"
The science fiction part. I mean, you might as well say "fuelled by warp drives," because there's no evidence they're going to happen either. Or unicorns.
"is something I assign a low probability to."
This suggests that you don't know what probabilities are. Probabilities are either frequentist, which cannot happen here because we have no knowledge of the rates here (this would be like calculating the frequentist probability of alien life - it's just making numbers up,) or Bayesian, where you draw probabilities from observed events, at which point the probability is exactly zero.
So, is it undefined or zero that you're promoting?
"I don't find it hard to believe that there is some point in the future - say 2131"
"they would never be able to understand what was going on and everyone from that time would think circles around them."
It seems you don't even need to be transported into the future for that.
2.) Singularity as in breakdown not as in single. You purposely muddled the meaning to make your quip work.
3.) Moore's law fueled as in AI gets interest on their intelligence. Recursive as in AI makes smarter AI makes smarter AI...
4.) Science fiction or not i find it unlikely.
5.) Bayesian. Look up prior.
6.) 2131 was tongue in cheek.
7.) Thanks ;) You actually never address my main point though.
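On point 5, here is a minimal sketch of what a prior does in a Bayesian estimate. It assumes a Beta prior over a yes/no event; the Beta(1, 1) choice, the function name and the trial count of 65 are illustrative assumptions, not anything stated in this thread. With zero observed occurrences, the posterior mean is small but not exactly zero:

```python
from fractions import Fraction

def posterior_mean(successes, trials, alpha=1, beta=1):
    """Posterior mean of a Bernoulli rate under a Beta(alpha, beta) prior."""
    return Fraction(alpha + successes, alpha + beta + trials)

# Hypothetical numbers: zero occurrences in 65 trials.
print(posterior_mean(0, 65))   # 1/67
```

How small the result is depends entirely on the prior you pick, which is where the real disagreement sits.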
This is a blatant falsehood. I have solely and exclusively used it as a title for Kurzweil's concept. It has no meaning; it's a name. I have muddled nothing. It is inappropriate for you to make accusations like this without evidence.
"3.) Moore's law fueled as in AI gets interest on their intelligence"
Yes, that's what I said at the outset: this whole thing is driven by the false belief that intelligence is a function of CPU time. There is no experimental evidence in history to support this, and there are 65 years of counter-examples.
Repeating it won't make it less wrong.
"Recursive as in AI makes smarter AI makes smarter AI..."
This gets to a different false presumption, namely that the ability to create an intelligence, as well as the power of the intelligence created, is a linear function of the prior intelligence.
This whole treating everything like it's a score, like it's a number you tweak upwards? It's crap.
You can't make an AI with an IQ of 106 just because you have a 104, and the guy who made the 104 had a 102.
This is numerology, not computer science.
"4.) Science fiction or not i find it unlikely."
I can't even tell what noun you're attached to, at this point. What do you find unlikely?
"5.) Bayesian. Look up prior."
What about Bayesian, sir? I don't need to look up prior; I used it, correctly, in what I said to you. You're just telling me to look things up to pretend that there is an error there, so that you can take the position of being correct without actually having done the work.
There are zero priors of alien life, sir. That was my point, in bringing up what you're now blandly one-word repeating at me, in your effort to gin up a falsehood where none actually exists.
"7.) Thanks ;) You actually never address my main point though."
You don't appear to have one.
Maybe you've forgotten that you were replying to someone else, who already said that to you?
Here are some pretty uninformed statements by tech luminaries: http://spectrum.ieee.org/computing/hardware/tech-luminaries-...
They mostly don't have the philosophical or synthetic chops to make intelligent statements about the singularity. Moore misunderstood his own observation for at least the first ten years after he made it, not to minimize his important contributions.
Singularity is an unfortunate term, because it's technically incorrect and logically contradictory. My work is not trying to make a singularity; it is trying to make recursively self-improving machine-human intelligence which interacts with and learns from its environment. This is not impossible; it is merely technically difficult.
It is also a hypothesis. That's all. It isn't a phenomenon, so we cannot yet be good Aristotelians and observe it. Therefore, we can't develop a science of it. You'd avoid this entire webpage's worth of argument if you'd just simply remember that the singularity is nothing more than this hypothesis.
Repeat: The singularity is not a phenomenon, nor is it a theory; it is a hypothesis.
For those who wish to believe it is a correct hypothesis, and it is the future, well, get busy doing the hard work and developing the technology to make it the future. For those who don't believe it could be a possible future, get out of our way, since you're so damn sure you're right.
To call Vinge a particularly bad science fiction author says more about your critical acumen than about him. (Perhaps you're thinking about his ex-wife?)
(I read the "singularity is near" in the article title as ironic - almost parodic).
Do you also participate in discussions of Microsoft Surface by explaining that in general, a surface is a flat exterior of a coherent object?
"To call Vinge a particularly bad science fiction author"
Vinge didn't come up with the singularity; Kurzweil did. I quite like Vernor Vinge's work.
"says more about your critical acumen"
Acumen is the ability to make good numeric estimations on the spot, such as business decisions.
Although I quite enjoy Vernor Vinge's work, I also feel it important to point out that someone merely disliking an author you like would not, in fact, be a measurable slight against their intelligence, any more than liking different pizza toppings would be.
"I read the "singularity is near" in the article title as ironic"
That's interesting. If that's correct, then you have a point. (Also, bravo for being part of the one percent of the internet who knows what that word correctly means. I mean that in earnest.)
The term was coined by science fiction writer Vernor Vinge, who argues that artificial intelligence, human biological enhancement or brain-computer interfaces could be possible causes of the singularity.
I should also point out that you're exaggerating the link between the idea and science fiction authors. Campbell was a science fiction author (well mostly an editor but close enough) and Vinge was too, though Vinge was also a CS professor. The people I'd associate with the other schools of thought aren't science fiction authors, though.
We disagree here. The singularity made measurable predictions, and not only has every single one that's come to pass failed without exception, but the remainder are no closer to happening than the day they were made.
Remember, originally, computers were supposed to be smarter than us back in the 90s, when people still thought The Simpsons was funny.
"I think that we might all be better off if we stopped using that term."
We agree, though I think for different reasons. If I understand you correctly, you are suggesting that we forego this term in favor of clearer, better defined ones, but keep the idea.
I think we should actually reject the concept.
"Campbell was a science fiction author (well mostly an editor but close enough) and Vinge was too, though Vinge was also a CS professor. The people I'd associate with the other schools of thought aren't science fiction authors, though."
Minsky, who associates with that school of thought, is. Kurzweil, who originated this school of thought, is. Stanislaw Ulam, Damien Broderick, Hans Moravec, Greg Egan, Nancy Kress, Larry Niven, Dean Ing, Samuel Delany, Ray Solomonoff, Pohl, Asimov, Steele, other-Steele, Yudkowsky, the founder of Singularity University (which is not legally a university) Peter Diamandis, Aubrey de Grey, et cetera.
Sort of the germane understanding is to look at their work. Whose work do they all point to? Moore, Lanier, Holland, and Hawkins.
Guess what all four of them say they think?
"I should also point out that you're exagerating the link between the idea and science fiction authors."
I don't think that I am. Every proponent of The Singularity I am aware of writes speculative fiction for money, without exception. The list I gave above isn't even close to exhaustive.
And, again, that's not the actual point I'm making. This isn't about "the link between" The Various Singularities and speculative fiction; I'm asserting directly that every single Singular proposal is itself science fiction.
I'm not saying it's written by science fiction people. Asimov did legitimate speculative engineering, for example, in the cases of geosynchronous orbits and the space elevator, and arguably in arcological discussion. I would not say that his work was science fiction, despite that he's a science fiction author.
Because there are real numbers involved. There's real math. There are real equations. He knew the fuel demands, the weight of the building, the energy requirements. He wasn't writing fun stories; he was doing real work.
Spend two weeks. You will _never_ find real work done around the singularity. It hasn't been done.
Earlier you suggested that all speculation about the future was so; I do not agree. We have several quite good designs for an arcology on Mars which would actually work. Seward's Folly is speculative, but it's actual legitimate work; it could be built.
The Singularity is just a cute story.
"The people I'd associate with the other schools of thought aren't science fiction authors, though."
Schools of thought is a great phrase to try to bring gravitas to a situation where it isn't warranted.
I'll pay more attention when I see a single work on The Singularity which could pass muster as a freshman thesis at a second string state school.
There's lots of material out there about infinite energy devices, too. Ask yourself a question: if you didn't have the Second Law of Thermodynamics, how would you tell the free energy devices apart from the legitimate ones?
How is it that you know Andrea Rossi doesn't really have cold fusion?
Try some skepticism. It's delicious, and low cholesterol.
For a possible resemblance with the real cortical neural network working principle and face or object recognition, this is just a farce.
Regarding getting closer to the presumed singularity, this is like saying that cutting flint is close to making diamonds.
The authors didn't claim that, but the abusive use of "neural network" for such kinds of applications is just doing that. It is a dishonest abuse of people who can't make the difference.
The true problem is that significant quality work toward modeling real cortical neural networks is drowned in a sea of such faker crap.
Regarding your other point, it is a matter of research strategy. I think that trying to understand the working principle of real cortical neural networks is the shortest path to AI. My impression is that the other path, which is to play around with artificial neural networks, is too hazardous.
We can draw a parallel with learning to fly. We are in a similar situation regarding how the brain works and AI. Understanding how birds fly requires true research. People seem to simply focus on flapping, while this is not the real working principle of flight.
I see a strong analogy there with artificial neural networks. The most relevant properties of cortical neural networks are ignored.
With flight, the proof of mastery was obvious. With AI, it is less obvious. I would be glad to hear suggestions. Face recognition is the most difficult test because the process is the end product of many prior processes, such as 3D perception and feature extraction. My current impression is that speech decoding would be a much better candidate. Siri shows the potential impact of such an AI product. At least the Turing test would be a direct match.
Admittedly this is a big area of disagreement both within and outside the field.
No more supplements-eating Kurzweil, walking Terminators and Skynet-like BS please.
A human baby learns from uncleaned raw data using far less energy, with better generalization than a computer, and fuses large amounts of data without suffering from dimensionality curses.
I think it is safe to say that human babies are still ahead, for now.
In a way, their first processing step on the real images was a pixelisation filter. If you feed a pixelised image to a person, you see how much information is lost. If single pixels occupy a significant portion of the view, the person might lose the ability to recognize the image at all. Feeding so little information to a CV system is like trying to teach a nearly blind man to see.
To improve CV we should focus on finding the best ways of converting full-resolution visual data to something of smaller volume in such a way that important features are preserved.
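To make the information-loss point concrete, here is a rough sketch of pixelisation as plain block averaging, using NumPy. The test image, the function names and the downsampling factors are all hypothetical choices for illustration; this is not the preprocessing used in the paper.

```python
import numpy as np

def block_downsample(image, factor):
    """Average non-overlapping factor x factor blocks of a 2-D image."""
    h, w = image.shape
    h, w = h - h % factor, w - w % factor          # crop so blocks tile evenly
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def upsample(image, factor):
    """Repeat each pixel factor times along both axes (nearest neighbour)."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)

rng = np.random.default_rng(0)
# Hypothetical test image: a smooth gradient plus fine-grained noise.
y, x = np.mgrid[0:64, 0:64]
image = (x + y) / 126.0 + 0.1 * rng.standard_normal((64, 64))

for factor in (2, 8, 32):
    small = block_downsample(image, factor)
    restored = upsample(small, factor)
    error = np.mean((image - restored) ** 2)       # what the pixelisation lost
    print(factor, small.shape, round(float(error), 4))
```

The reconstruction error can only grow as the blocks get coarser, which is the loss being described; a smarter reduction would try to keep the features that matter while still shrinking the data.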
This input data IMO should also include time. Thanks to crappy eyesight, I often recognize people, actions and objects by relying more on how they move than on how they look to me. Even with sharp eyesight, sometimes your vision just gets stuck and you can't recognize what is in the scene you are currently looking at. You can't understand what you see until something in the scene moves or you move a bit.
You are exactly right! See dictionary learning, random projection and compressive sensing. As for time, perhaps you are right; I don't know. The question is: would a suitably written, video-trained classifier that preserved temporal features do better on image classification?
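For anyone who hasn't met random projection, a minimal NumPy sketch of the idea follows: multiply the data by a random Gaussian matrix and distances between points are roughly preserved in far fewer dimensions. The data, sizes and function name here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_projection(data, out_dim, rng):
    """Project rows of data to out_dim dimensions with a Gaussian matrix."""
    in_dim = data.shape[1]
    # Scaling by 1/sqrt(out_dim) keeps squared lengths unchanged on average.
    matrix = rng.standard_normal((in_dim, out_dim)) / np.sqrt(out_dim)
    return data @ matrix

# Hypothetical data: 20 "images" flattened to 10,000 values each.
data = rng.standard_normal((20, 10_000))
projected = random_projection(data, 500, rng)

original_dist = np.linalg.norm(data[0] - data[1])
projected_dist = np.linalg.norm(projected[0] - projected[1])
print(projected.shape, projected_dist / original_dist)
```

The printed ratio should land close to 1, even though each point now takes 20 times less space. Unlike dictionary learning, nothing here is learned from the data.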
Seeing is, in my opinion, very similar to understanding language, in the sense that the information transferred (the observed image or the words heard) is just small fuzzy fragments. The sender (the speaker or, in the case of vision, the physical world) has a rich model, and the recipient has a rich model of all the things they can communicate about. The information actually passed only indicates which parts of the underlying model the recipient should select and how he should modify them to get the message.
Building a usable model from the small fuzzy fragments of information passed when recognizing an image or hearing spoken words should be an incredibly hard task, and I think no biological brain could do that. I think that absorbing as much real information as possible at the time of training the classifier is absolutely crucial for achieving anything close to what humans or animals can do.
From the techniques you mentioned, dictionary learning looks most awesome to me, and most applicable to CV.
Then, what kind of thing are you measuring? Recognizing patterns? We know humans are very good at that. We can see thousands of different people every day, yet we can recognize a familiar face in the blink of an eye. A computer or computer network is very, very, very far from being able to do that yet.
edit: this quote puts things into perspective a bit "It is worth noting that our network is still tiny compared to the human visual cortex, which is 1,000,000 times larger in terms of the number of neurons and synapses."
Thus, the paper is about using an unsupervised system to help a later supervised system. An advantage of this is that, as the unsupervised system isn't trained to recognise object X, it instead learns features that are discriminative. This same network could be used to recognise arbitrary objects (which is what they do later on in the paper with ImageNet).
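A toy sketch of that two-stage idea, with heavy caveats: PCA stands in for the unsupervised feature learner (the paper uses a large sparse autoencoder, not PCA), the data is synthetic, and all names are hypothetical. Labels are used only in the second step, to find which learned feature happens to line up with the class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two classes in 10 dimensions; labels are set aside at first.
n = 200
labels = rng.integers(0, 2, size=n)
shift = np.zeros(10)
shift[:3] = 4.0
data = rng.standard_normal((n, 10)) + labels[:, None] * shift

# Step 1 (unsupervised): learn features from the data alone via PCA.
centered = data - data.mean(axis=0)
_, _, components = np.linalg.svd(centered, full_matrices=False)
features = centered @ components.T          # one column per learned feature

# Step 2 (supervised): use labels only to find the most useful feature.
def best_feature(features, labels):
    best = (0.0, None)
    for j in range(features.shape[1]):
        predicted = (features[:, j] > 0).astype(int)
        accuracy = np.mean(predicted == labels)
        accuracy = max(accuracy, 1.0 - accuracy)   # allow either polarity
        if accuracy > best[0]:
            best = (accuracy, j)
    return best

accuracy, index = best_feature(features, labels)
print(index, accuracy)
```

The feature exists before any label is consulted; the labels only tell you which column to look at.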
What he's trying to talk about is "this is an unsupervised feature detector in a large dataset which is only categorized, and where no human has provided correct answers up front to verify progress."
The reason this matters (and it doesn't matter very much) is that that means that in cases where it's prohibitive to provide training sets, such as where you don't know the good answer yourself, or where giving a decent range of good answers would be difficult, this sort of approach can still be used.
"isn't that basically equivalent to labeling them?"
Yes. It is. The original poster is confused.
What he meant to say was "there is no training set."
I apologize for being vague, and shall endeavor to be clearer in the future.
Also, the previous best on the same dataset was 9.3%
OTOH, it's late and I might be way off.
--- last company was in computer vision.
Did you sell, leave or did it fail? Why? I have some ideas that I think are novel applications of computer vision, and just within the range of what's feasible, but it seems that most computer vision applications look like that at first, and then after 90% done find out that the second 90% is exponentially harder and, realistically, infeasible. How could I test my ideas against that? Or am I asking from wrong premises?
I left after all the other engineers did.
Email me if you want to discuss practical application.
Is it cool, and perhaps even useful? Yes. But don't confuse this research project for a precursor to skynet.
"Our training dataset is constructed by sampling frames from 10 million YouTube videos."
I'm familiar with both models so I can also try to answer any questions.
A good way to start with "AI": write a decision tree. They will serve you well, and with boosting do even better. Basic but useful stuff: logistic regression, multi-armed bandits, weighted experts, k-nearest neighbours, k-means, kernel density estimation and naive Bayes. That covers online, ensemble, supervised and unsupervised algorithms. Good luck!
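As a taste of how small some of these are, here is a minimal k-nearest-neighbours classifier in plain Python (3.8+ for `math.dist`). The toy points and labels are made up; a real project would more likely reach for a library implementation.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5)))   # a
print(knn_predict(points, labels, (5.5, 5.5)))   # b
```

There is no training step at all: the "model" is just the stored data, which is why it makes a good first algorithm to write by hand.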
Yes, for some problems it might be faster and better to code logic yourself, but there are also tasks (such as pattern recognition) where NNs might be more effective.
Could anyone with expertise say if this would be enough to build a foundation? How much math background do you need?
https://www.coursera.org/course/ml (From one of the authors of this paper!)
Prof. Hinton's videos are very watchable:
They take the frames from YouTube. It is weird to me that YouTube, (derided as a way of sharing funny cat videos) is able to contribute something actually useful to the world.
I agree that there is some great content on YouTube. Interesting that you mention sewing machines, because that's something I've used and they are particularly helpful. (See also all those other crafting videos; latch-hooking etc.)
But they happen 50-100 million years between each other and usually take thousands of years to take full effect once they begin.
Even if technological singularity takes an extra 100-200 years to really happen, if any significant 'AI' is achieved, a lot could happen in a thousand years, let alone a million.
This was done with no pre-labeled images (except for fine-tuning)! A brain that learned from raw images. The same algorithm can be applied to any data type (financial data, text, audio, images/video) without any human involvement (except gathering of unlabelled data and running the system). Pretty much the artificial intelligence holy grail!
The outcome shows a very nice improvement on an unsupervised classification and feature detection task, but it also highlights that unsupervised machine learning still has a long way to go. 16% accuracy from a network with 1bn connections and 100m inputs using (if my math is right) 1.15m hours of CPU time. Which of these would be the easiest way to continue making gains: investing more time/hardware, increasing the complexity of the model, or developing a new and improved algorithm altogether? All of these sound pretty intensive to me.
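The CPU-time figure can be checked with a quick calculation, assuming the setup as I recall it from the paper (1,000 machines with 16,000 cores in total, run for three days). Those numbers are not stated in this thread, so treat them as an assumption.

```python
# Assumed figures, recalled from the paper rather than taken from this thread.
machines = 1_000
cores_per_machine = 16
days = 3

core_hours = machines * cores_per_machine * days * 24
print(core_hours)   # 1152000
```

That is about 1.15 million core-hours, which matches the estimate above.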
In machine learning, you normally have to create a set of features (called feature engineering; basically, think of algorithms to better represent your data). The amazing thing about deep learning is that the computer does this for you!
You just need a few 10s/100s face/nonface images - same for 20,000 other objects - this is called fine-tuning.
For more, Andrew Ng, Geoff Hinton and Yann LeCun have given talks on this at Google, and they are up on YouTube.
Wouldn't that make training it much quicker and make it much more accurate?
Or are we trying to avoid any human interaction at all with the learning loop?
See, for example, this company (one of many) that trains bees to smell certain odours.
Here's a link I googled up: http://bluebrain.epfl.ch/cms/lang/en/pid/56882
Film at 11