If you showed me an image of a green field with a bunch of fur balls on it, I'd go "Oh look! Floofy sheep!" But then maybe, upon closer inspection, I'd go "Heeyyy... that's actually a herd of cats!" But a neural net isn't designed to make decisions, to say "hey, maybe I should investigate further," etc. It's just a black box that spits out classifier probabilities. I think if we want to get more sophisticated judgements, something nearing more realistic intelligence, we would need something like nets of neural nets, and ways to interconnect them. Like, here is a model for sheep, with interconnections to a model of the environment; here is another model for a sheep's facial features; and maybe a net for decision making, or for asking questions when confidence is lacking or the result is ambiguous.
I can see a toddler going "oooh, sheep!" as well, and then a parent going, "No, look closer, those are kittens!" And then the kid learns: oh, maybe I shouldn't be so quick to conclude! Sometimes I may be deceived!
Well, neural nets may be just starting out, but I think one can say their approximation process is not conceptually complex. They are complex only in the sense of having many layers and many pseudo-neurons on each layer.
What's happening is that the networks are mapping images to a high-dimensional "feature space" and then drawing a dividing line in that space between matching and non-matching images. It is a vastly complicated but heuristic process. Essentially, the division between image types is based on both meaningful and meaningless differences between the images. The examples classified as "a boy holding a dog" (when it was a goat) and "a herd of giraffes in trees" (when it was goats that had climbed trees) happened to have more random characteristics in common with the classification than their real qualities.
The thing is, the method can be made relatively better, but for absolute improvement you'd want a way not just to do more approximation but to get rid of garbage approximations, garbage conclusions and so forth. I suspect that would imply both different algorithms and a different training cycle.
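For what it's worth, the feature-space picture above can be sketched in a few lines. Everything here is a toy stand-in: the two "features" (wool texture, green background) are hypothetical, and the classifier is a plain perceptron rather than a deep net, but the mechanism is the same: map inputs to a feature space, learn a dividing line, and inherit whatever correlations the data happens to carry.

```python
import random

random.seed(0)

# Toy stand-in: each "image" is just two hand-picked feature values,
# (wool_texture, green_background), and a classifier learns a dividing
# line between the two classes in that feature space.
sheep = [(random.gauss(0.8, 0.1), random.gauss(0.7, 0.1)) for _ in range(50)]
goats = [(random.gauss(0.3, 0.1), random.gauss(0.2, 0.1)) for _ in range(50)]

data = [(x, +1) for x in sheep] + [(x, -1) for x in goats]

# Perceptron: learn weights w and bias b for the line w.x + b = 0.
w, b = [0.0, 0.0], 0.0
for _ in range(20):                                   # a few passes over the data
    for (f1, f2), label in data:
        if label * (w[0] * f1 + w[1] * f2 + b) <= 0:  # misclassified: nudge the line
            w[0] += label * f1
            w[1] += label * f2
            b += label

def predict(f1, f2):
    return +1 if w[0] * f1 + w[1] * f2 + b > 0 else -1

# The line separates the clusters, but it encodes correlations, not
# "sheepness": any wool-coloured patch on a green background lands on
# the sheep side of the line.
print(predict(0.8, 0.7))   # sheep-like features
print(predict(0.3, 0.2))   # goat-like features
```

Nothing in the learned line "knows" what a sheep is; it only knows which side of a boundary the features fall on, which is exactly why meaningless correlations can decide the answer.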
This is an excellent point but it begs for an answer to the question "what does 'really mean' mean?" What are all the ways a human can determine what a picture "really means" and which of these methods can be used in a given picture?
We know dogs have certain shapes and goats have certain shapes. Other entities have different characteristics. We can explain how we think we reach conclusions. How we actually reach conclusions is likely different, and may or may not involve "pattern matching on steroids" in a given case. What's more definite is that we try to reconcile our conclusions across examples so they form a single consistent picture of the world. Perhaps that reconciliation is what determining "what a picture really means" consists of.
The current generation of image recognition is really missing an understanding of physics and 3d space. There's no understanding of what would happen if a dog moves its head around.
The next generation of algorithms might fix this. Some people are excited about "capsule networks", which are supposed to learn features that are able to be rotated significantly without breaking.
The problem I have once I think about it is that this line of thinking leads me to be much less sure of the nature of my own existence. Do we first let the mind of an AI develop an appreciation of humanity before letting it know that it is an AI? Seems like it would solve a lot of possible problems, since Gandhi wouldn't take the murder pill.
Just to play devil’s advocate, we don’t know yet if the model is bad or if we’re just feeding the wrong kind of data, right? These optimization algorithms are good at interpolating; they do well with new data points that land inside the multidimensional convex hull of the training data. They can fail spectacularly when the new data to inference is outside that boundary. But humans aren’t that great at extrapolation either. Maybe NNs will be good enough when we show them everything there is, maybe what we think of as conceptual understanding is just as much simple interpolation of our experiences as our neural networks...?
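The interpolation/extrapolation distinction here is easy to make concrete in 2D. A toy sketch (the "training" points are hypothetical, and the hull is computed with Andrew's monotone chain algorithm): a query point inside the convex hull of the training data is an interpolation problem, one outside is extrapolation.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(point, hull):
    """True if point is inside (or on) the counter-clockwise hull."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))

# Hypothetical 2D "training set": queries inside its hull are
# interpolation, queries outside are extrapolation.
train = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = convex_hull(train)
print(inside_hull((2, 1), hull))   # interpolation
print(inside_hull((9, 9), hull))   # extrapolation
```

In a real feature space the dimensionality is far higher and almost every query ends up outside the hull, which is one way to state why these models fail on out-of-distribution inputs.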
Possibly - we don't know. Sure, if we could train one of these ImageNet-type classifiers with a million or a billion images, close to all known objects in the universe, it may well be able to "recognize" everything. But that still doesn't solve the problem of abstraction or meaning, much less of intuition and generalization. Humans are able to generalize to new domains based on internal models of the world around us. The models used in RNNs for encoding word embeddings seem to be a bit closer to representing meaning. I agree though that NNs are evolving - we are in very early days, and who knows, the NNs of the future may reveal that what we consider understanding is no more than simple interpolation, as you suggest.
I think some of the filters are pretty clearly developing "higher levels of abstractions".
Because the key phrase in the grandparent post is "really represent or mean." Grass + white strands = sheep, is a hierarchical abstraction, but it's a bogus one.
Feature visualizations do not answer this more relevant question.
There's a lot that's still not understood about the brain. Even at the cellular level, microglia have gained attention recently for their contribution to human learning. The NN algorithm was modeled on human neurons, each having n inputs and 1 binary output. It's hard to say how cells that regulate the cellular environment and communicate with each other via cytokines would affect the abstraction of the brain into an algorithm or mathematical model. On this point, there's a book called "The Other Brain" about how glia, which make up ~85% of the brain, perform a myriad of operations beyond just holding neurons together.
But you could also look at the brain through a genetic lens. A lot of the nervous system is simply hardwired. The knee-jerk response is a reflex arc, which means the signal goes directly from the sensory nerve to the motor nerve (causing you to kick) before it even gets to your brain. That reflex response has been hardcoded into your DNA. How much of the rest of the brain is learned vs. structured in a predetermined way?
Cognition can probably be reduced to just pattern matching with learned responses, but it's a bit like an ancient Roman looking at New York City skyscrapers and saying "these are just built using arches and a lot of nuance".
Firstly I would like to point out that there are no (good) "mechanical pattern matching algorithms". I would love for you to point out some, but as far as I know, outside of AI no such algorithms exist.
As for the entire argument, the problem with this reasoning is that it only works at the lowest level. And even then, it sort of works for fully connected and CNN based image classification. But autoencoders certainly have what I'd consider "concepts". Not in a language we understand, but they do. They have a signaling mechanism "explaining" high-level features to other neural networks. RNNs have concepts. RL policy networks don't just have concepts, they have strategies. They have lies, truths, and even political lies: truths explicitly designed to make it really really easy to believe something that's not actually happening. Usually they even exhibit meta-lying: systematically not deceiving with just one (but important) deception in an unpredictable location.
(And I would like to add that the CNN features, looking at the numbers, look VERY similar to concepts in more abstract neural networks. Perhaps the "difference" is merely one of those philosophical differences we keep hitting.) GAE networks have concepts (of course, they are usually autoencoders).
So in truth it comes down to need: neural networks that have no use for concepts have only an incomplete notion of concepts. Neural networks that have to "teach" or otherwise interact at a high level with either humans or other neural nets very much do have concepts.
And I would like to say, this is yet another "humans are magical because X" argument. None of those arguments has ever stood the test of time. This one won't either. AGI is coming, sorry to disappoint you, and the current theories of neural networks will be shown to be "insect-level" (or whatever level) AGI.
However, many of the leading figures in AI are skeptical. Geoffrey Hinton, the father of deep learning, is very skeptical of the approach to AI that he pioneered; he recently stated: "My view is throw it all away and start again."
Francois Chollet - the author of the deep learning framework Keras, has said: "For all the progress made, it seems like almost all important questions in AI remain unanswered. Many have not even been properly asked yet." 
And of course there's Doug Hofstadter, who thinks it is going to take a lot more to come close to human-level intelligence and understanding, even when you consider the most advanced RNNs of the day, those that run Google Translate.
Take a great horse-drawn carriage and compare it to the first cars. Oh my god, those cars SUCK! They are bumpy (no suspension). They rarely get from one city to the next without needing repairs. Getting them to start at all requires an engineering degree AND more than average muscle. Despite many cities having maybe 20 cars total, one blew up every 10 days or so. And the top speed of horses is actually faster than the cars! And the fuel is WAY more expensive than horse feed.
Compared to the AVERAGE horse-drawn carriage, which also had no suspension and constantly needed repairs... well, they started off about even, perhaps a little worse, and after a few years the cars were so much better it isn't even funny.
Likewise, compared to a PhD-level expert human translator in a specific language pair, Google Translate sucks. Of course... there are a few tens of thousands of expert human translators; they might know 3 languages, and perhaps there are 100 who know 4. But as can be plainly seen on the Indian channel in Australia, Google Translate outperforms the translators used there by a wide, wide margin. Google Translate can translate between any pair of over 100 languages. Let's face facts here: on most metrics Google Translate wipes the floor with even those PhD-level translators, though not quite yet on every last metric. Compared to the average human translator, Google Translate wins on every last metric.
Aside from that, Google Translate is always available (even mostly available offline, with no Google involved beyond a binary upload to your phone), it's cheap, and frankly it translates from Chinese to English better than a Chinese person who has followed an English course for 2 years can express themselves in English.
The truth is Douglas Hofstadter is ... wrong. Okay, you might argue he's just 80% wrong, and the remaining 20% is shrinking, that's fair. And of course, Google translate is not AGI.
You know, there are social scientists, in fact quite a few of them, that claim "AGI" is simply an AI solving 2 problems:
1) any problem, like survival in groups
2) explaining the actions they took to achieve (1) to other (usually new) members of their group
That's AGI. On that view, such systems are a pretty abstract/advanced form of auto-encoder. We don't know that for sure, but... it's not far off the truth.
But yes, you can find a few exercises where humans still outperform Google Translate. They're mostly unfair: humans outperform the AI because they have side-channel information available, e.g. knowledge of events that happened outside the content of the actual translation. A good test would exclude that, and then humans would be 100% left behind. But in the press...
A lot of disciplines are currently like that. Humans are utterly beaten by AI in just about every last thing that was used, just 10 years ago, to "prove" humans have "intelligence" and machines "don't". Even the most human of things: AIs actually outperform humans at chatting up humans. Can you think of anything you can do on a computer that's more human than that?
And now we're at the point where it's becoming more and more obvious that while AIs don't beat the best experts in specific fields, they do beat the average human. It's getting ridiculous. AI robots are better at navigating forest terrain than humans, to take an example I recently saw. AI is not just better, but has error rates solidly and consistently below humans' on expert-level medical analysis. Expert-level mathematics and physics without AIs doing most (or all) of the work has been dead for 2 decades, and in the 2 decades before that, I would argue, forms of AI already made particular researchers far more successful than their peers.
Where exactly is the point when "but they can't yet X" gets the answer "oh yeah? Hold my beer"? Is it really that far off?
So here's Mr. Hofstadter, and with all due respect, he's merely moving the goalposts again. He'd best go home and dive into his books, urgently, because in 2 years we'll have crushed even the most expert human translator using AIs, and he'll need a new place to put those goalposts. I look forward to where he'll put them this time; it'll be fun!
How did we historically "prove" computers "aren't intelligent", "don't have a soul"? Well, they can't analyze a problem, can they! (And then we had expert systems.) But they can't strategize! Take chess (and then we had Deep Blue). But they'll never recognize unstructured data like images, will they? (Oops.) Okay, but never video and reading people's intentions! (Oops.) Okay, but at least humans are better at voice recognition (AIs win consistently now)! And translation (90% there)! Okay, sure, but they'll never control a robot in the real world! (And now pretty much every research robot does... and of course there's self-driving cars.) Okay, but they'll never deal as well with dynamically stable robots as humans will (that one's a TODO at the moment). Okay, but they'll never deal as well with partial-information games and bluffing (Poker - humans beat. Starcraft... TODO)
Hinton might very well be right. There are 5 major chapters in an introduction-to-ML course, and Hinton is a big name in 3 of them. Frankly there ought to be a 6th (Hebbian learning). When it comes to exploring deeply, we have only really done that for one of those chapters, the symbolic reasoning chapter. We're getting deeper into the neural network chapter, but symbolic reasoning got a head start of a millennium or four or five, which I would say we've not quite matched yet. We are very far from out of ideas to get the field to progress further, so I wouldn't worry about that yet.
He also has a good point: the overwhelming majority of current AI research is focusing on too narrow a slice of the field.
I would like to point out that Hinton is a theoretical researcher in AI. That he believes theoretical advances are necessary to advance the field is almost a tautology: he wouldn't have become Geoffrey Hinton if he believed otherwise. I mean, this argument has its place, but it's a statement of faith. A very successful statement of faith, by a very impressive researcher, but ultimately it's about as informative as finding out that Mr. Nadal likes tennis.
That's a very reductionist view of translation. I'm of the opposite opinion: it requires human-level intelligence to translate anything but the simplest and driest texts. Translators of literary texts are no less authors than the actual writers.
> Poker - humans beat.
AIs beat humans only in the simplest variant of poker: heads-up (two players). In the more complex ones, AIs are nowhere near humans.
This suggests that these large ML models do need complementary components to improve.
For example, if they were capable of active exploration of an image, they might first say there are sheep and then, upon searching for the sheep, realize they aren't there.
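A toy sketch of that "claim, then verify" loop. The "image" here is just a hypothetical grid of labelled patches, and both classifiers are deliberately naive; the point is only the control flow of a second, verifying pass:

```python
# Toy sketch of "classify, then verify by looking closer".
# An image is reduced to a grid of labelled patches (a hypothetical
# stand-in for pixel data); the global guess comes from the background.

def global_guess(image):
    """Naive whole-image classifier: mostly-grass scenes get called 'sheep'."""
    flat = [patch for row in image for patch in row]
    if flat.count("grass") > len(flat) / 2:
        return "sheep"
    return "unknown"

def verify(image, label):
    """Second pass: actively search the patches for the claimed object."""
    return any(patch == label for row in image for patch in row)

def classify(image):
    guess = global_guess(image)
    if guess != "unknown" and not verify(image, guess):
        return "field, no " + guess + " found"
    return guess

field_with_sheep = [["grass", "grass", "sheep"],
                    ["grass", "grass", "grass"]]
field_with_dog = [["grass", "grass", "dog"],
                  ["grass", "grass", "grass"]]

print(classify(field_with_sheep))   # sheep
print(classify(field_with_dog))     # field, no sheep found
```

The global classifier confidently mislabels the dog scene because of the grass; only the verification pass catches that no sheep patch actually exists.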
And computers are at best mechanical pattern matching machines. This isn't something that's subject to empirical evidence, and appeals to ignorance or the limits of our current understanding will not do. Computers cannot be any more than that by definition (and arguably the word "matching" is being used in an analogous fashion; rather, a computer configured with an algorithm is such that given an initial state, it will lead to a final state that, when interpreted by a human being, can be interpreted consistently in the desired manner, i.e., a final state of 0 means that the initial configuration encoded two states that match, and a final state of 1 means that they did not). Abstraction is, by definition, NOT reducible to a mechanical process like this. The human interpreter is the one possessing abstract concepts and who interprets by means of meanings conventionally assigned to symbols or machine states. A machine may contain a state that we call an image, but no process can in principle abstract anything from this aggregation of states. To claim otherwise is to completely misunderstand what symbol manipulation is. (Even a human being couldn't abstract anything from an image in a mode analogous to the way in which a computer operates. By analogy, given a matrix of RGB values, could you tell me what's in the image? Or could you at best compute, say, a value that, when looked up in an already given table of values, gives you a label such as "sheep"?)
However, that does not mean that AI cannot perform well, at least within narrow constraints. It may very well be possible to improve AI techniques to such a degree that it can assign the label "sheep" correctly with high accuracy. There simply is an in-principle difference between AI and actual intelligence.
I'm pretty sure people train image recognizers to output these representations, which can include states like 51% certainty dog, 48% certainty sheep, 1% other, and if you aren't sure, take the best choice.
It's such an intuitive idea to combine these things that if it hasn't appeared in the literature yet (I only looked for 1 minute), it's because 1000 people have tried it, failed to improve the state of the art, and didn't publish.
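The "51% dog, 48% sheep, 1% other" kind of output is just a softmax over raw scores, and the "if you aren't sure" rule can be a simple margin test on the top two probabilities. A minimal sketch (all the scores and the margin are hypothetical):

```python
import math

def softmax(scores):
    """Turn raw classifier scores (logits) into probabilities summing to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decide(probs, labels, margin=0.2):
    """Take the best label, but abstain when the top two are too close."""
    ranked = sorted(zip(probs, labels), reverse=True)
    (p1, best), (p2, runner_up) = ranked[0], ranked[1]
    if p1 - p2 < margin:
        return "unsure between {} and {}".format(best, runner_up)
    return best

labels = ["dog", "sheep", "other"]

print(decide(softmax([2.0, 1.94, -2.0]), labels))  # ~51% vs ~48%: too close
print(decide(softmax([3.0, 0.0, -1.0]), labels))   # clearly dog
```

Of course, the hard part isn't emitting the distribution, it's that the probabilities are often confidently wrong; a 99% "sheep" on a field of cats sails straight past any margin test.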
On the other hand, we're generally pretty bad at inferring and generalizing 3d structure out of images, so I tend to blame that.
It's more accurate than you'd think, and as this article notes, the mistakes are indeed funny: https://twitter.com/picdescbot/status/968561437126938625
A friend of mine has a young son (~2-3 years old), and a cat named Mono.
Her son knows the cat is named Mono - he plays with her everyday.
But when they go out for a walk, any 4 legged animal he sees is also a "Mono".
Fortunately for her son, his developing, extremely plastic brain will soon know how to differentiate Mono the cat from a random dog on the street (unlike neural networks, which we would need to entirely redesign, rebuild, and retrain to get similar progress).
If a neural network had a representation of entities in the world apart from language referring to those entities, that would be awesome. I'm guessing we're not there yet though.
My son also does the same thing with "dodo" for dinosaur. At first, he applied it only to big, scary dinosaurs that made large sounds—later on, it applied to any animal that had some large body and jaws, e.g. a shark. Finally, he learned to differentiate between the majority of animals (save for a few big, scary-looking animals that are still "dodo") and can name sharks, dinosaurs, and bears separately!
It's very exciting to watch and hear the progress!
I don't think you would need to redesign or rebuild anything for that. You would need to train the network on additional examples, which I suppose you could call retraining (although it need not be from scratch), but that is the same in the case of the child.
You could argue that the child's neural network will be redesigned and rebuilt as his brain matures.
Humans identify objects by looking at how different parts are geometrically located and connected, possibly in a hierarchical fashion, and what basic shapes, colors and textures those parts have. A sheep is a body, four legs, hoofs, a tail, a head with its characteristic shape, the ears, mouth, nose and all those come with characteristic textures and colors.
And because there are so many features and relations, it is quite hard to fool humans; you can hide or change quite a few of them. We also have a lot of background knowledge: a bright orange sheep might be unusual, but we also have a pretty good idea of how hard it is to change the color of a sheep.
I naively expected neural networks to also learn those features, but there is just no pressure for them to do so. They mostly see common objects in common situations, and there, just looking for a patch of wool-colored, wool-textured fur might be enough to identify the presence of sheep correctly almost all the time. Or, if sheep are mostly depicted in a characteristic environment, it might be good enough to just identify landscape features and ignore the sheep altogether.
I would guess that it is in general not really feasible to come up with enough contrived, adversarial examples to force neural networks to learn the important parts and relations of different objects just by staring at many images. I think one would have to hard-wire some knowledge about space, spatial relations, occlusion, shapes and the like into a system to really get it to learn what a sheep is the way humans do, without heavily increasing the risk of over-fitting.
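As a toy illustration of the difference, here is a "texture shortcut" classifier next to a parts-and-relations check. The scenes are hypothetical lists of labelled parts with coarse positions (nothing like real pixel data); the shortcut is fooled by a wool-textured blob on grass, while the relational check is not:

```python
# Toy contrast: texture shortcut vs. parts-and-relations.
# A "scene" is a list of parts, each with a kind, texture and position.

def texture_classifier(scene):
    """Shortcut learner: any wool-textured patch counts as a sheep."""
    return any(part["texture"] == "wool" for part in scene)

def parts_classifier(scene):
    """Requires sheep parts in a plausible arrangement: a woolly body
    with a head nearby and at least four legs below it."""
    bodies = [p for p in scene if p["kind"] == "body" and p["texture"] == "wool"]
    heads = [p for p in scene if p["kind"] == "head"]
    legs = [p for p in scene if p["kind"] == "leg"]
    for body in bodies:
        near_head = any(abs(h["x"] - body["x"]) <= 1 for h in heads)
        legs_below = sum(1 for leg in legs
                         if abs(leg["x"] - body["x"]) <= 1 and leg["y"] < body["y"])
        if near_head and legs_below >= 4:
            return True
    return False

sheep_scene = ([{"kind": "body", "texture": "wool", "x": 2, "y": 2},
                {"kind": "head", "texture": "wool", "x": 3, "y": 2}] +
               [{"kind": "leg", "texture": "fur", "x": 2, "y": 1}] * 4)

wool_on_grass = [{"kind": "blob", "texture": "wool", "x": 5, "y": 0}]

print(texture_classifier(sheep_scene), parts_classifier(sheep_scene))
print(texture_classifier(wool_on_grass), parts_classifier(wool_on_grass))
```

The catch, as above, is that nothing in ordinary training data pressures a network to learn the relational version when the shortcut already gets the loss down.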
Show me a couple of images of fields full of sheep labelled oveja or Schafe and I might make the same learning error as the ML process and think the word refers to the [general pattern of the] surrounding field or hills. But show me a further image of an oveja outside a field - even a close-up of an oveja that doesn't resemble those in the field photos in any way - and I'll grasp the meaning of the term straight away. Needless to say, I'm also less likely to stumble over conceptual links between the names of animals and tastes of food, types of clothing, etc., which are independent of the living animal's morphology.
FWIW this is what capsule networks are designed to do.
I know the likes of Google and Facebook are already doing precisely this with human faces, but we'd need a more generalised object detection algorithm before the examples of sheep in the article would be reliably identified.
* I used the word "simple", as distinct from "easy": for example, creating a training dataset might be a challenge.
I haven't seen anything in the literature about incredibly effective implementations of 3D understanding and/or depth perception at the level you'd perhaps hope, but there is some progress being made.