Which is a lot better than reading someone tell you about this new idea called "capsules" without going into detail. The only thing is that, when this presentation was given, it seems they hadn't worked on much more than MNIST (so the new thing now would be the toy-recognition net).
Better source, with date: http://techtv.mit.edu/collections/bcs/videos/30698-what-s-wr... (December 2014, for the lazy).
Oddly the capsule approach is how I naively thought image recognition worked until I learned more about it.
The paper, however, details their experiments on CIFAR-10 and other datasets in the discussion section.
The results there aren't as good, but the paper proposes hypotheses for the poorer performance and suggests it could be overcome in the future.
This claim seems dubious. Studies have shown humans can react to visual stimuli in as little as 1-3 ms. If a child observes a cat in the room for only 10 seconds, that's already between 3,000 and 10,000 samples from various perspectives. While our human experience may describe this as a single viewing 'instance', our neurons are actually getting extensive, continuous training. Is this accounted for in the literature?
Also, a child observing a cat continuously for 10 seconds is getting highly correlated samples, not new independent instances. The effective number of samples (which I quite agree would be >1 if the child got to examine the cat from different perspectives, or the cat moves around, etc.) should still be lower than what a putative sampling rate would suggest.
Edit: First said the LGN projects to the SC, but the pathway is even shorter than that.
Edit2: The consensus on the latency of these fast, superior-colliculus-guided "express" saccades seems to be 80-120 ms. See: http://www.scholarpedia.org/article/Human_saccadic_eye_movem...
"The optic nerve is directly connected to the superior colliculus"
But I wonder how this "direct connection" was established. Was it done in humans or only mice/rats? Is it really always a direct connection or only in eg 80% of people, etc.
This is a good text, at an upper-div/grad level, of fundamental neuroscience with all sources cited.
That particular connection is straightforward to establish in humans. A Golgi stain to the rectus muscles/ON and dissection in cadavers would be sufficient to trace the reflex to the SC, and then another Golgi stain to that area to get back to the optic nerve. I'm unfamiliar with the toxicity of Golgi stains, but it might even be possible in a living subject.
Also, the visual systems to the brainstem are remarkably conserved through evolution. I would not be surprised to see this connection in lampreys. That any significant percentage of humans lack it would be a hell of a paper.
Blind individuals usually have these reflexes too (like Stevie Wonder): https://en.wikipedia.org/wiki/Blindsight
I was able to check a bit and see no citations:
"The human brain contains a huge number of these cells, on the order of 10^11 neurons, that can be classified into at least a thousand different types."
That 10^11 number is out of thin air. How was it determined? That is what a citation is for.
>"I'm unfamiliar with the toxicity of Golgi stains, but it may be able to be done alive."
No, the Golgi stain is very toxic. It depends on a precipitate forming in "random" (no one knows why) cells. Also, I see no reason it couldn't spread from cell to cell (via gap junctions, etc.), so that method isn't too convincing.
>"Also, the visual systems to the brain-stem are remarkably conserved through evolution."
You can remove a rat's cerebrum and have it stay alive and keep doing stuff:
"Cage climbing, resistance to gravity, suspension and muscle tone reactions, rhythmic vibrissae movements and examination of objects with snout and mandible were difficult to distinguish from controls."
Rodents are much more reliant on their brainstem than humans are, so I wouldn't be at all surprised if there are large differences. In fact, there's been a long debate about a similar claim regarding the cortico-spinal tract:
"Direct connections between corticospinal (CS) axons and motoneurons (MNs) appear to be present only in higher primates, where they are essential for discrete movement of the digits. Their presence in adult rodents was once claimed but is now questioned."
If you really have a problem with Kandel, use email. Most authors of these types of books NEVER get any email about them and would be thrilled to have some interaction with a reader.
It seems very factual and set in stone, but I bet if you read the primary literature there will be variation and doubt. If you read my last ref, you will see they claim direct connections between the CST and motorneurons in rats of some ages but not others. Perhaps this optic nerve claim was based on animals of a certain age, so it won't generalize. Who knows? That's why there should be a citation.
tl;dr Current textbook practices promote false certainty, and I don't think it is helpful for learning about a topic.
If you see a black cat and a white cat, and someone tells you there are striped colored cats, you can imagine it. And if you were to come across one, you'd instantly recognize it as a cat. Neural nets can't do that. You can also see a lynx and recognize it as "some kind of cat". Again, neural nets are not there yet. Which is why there are people researching new, better algorithms that better mimic what we recognize as intelligence.
Of course these are just based on my intuition of what neural nets are capable of, so if you have examples of cases where these specific tasks were attempted unsuccessfully, I'm interested.
Once again, children are able to see a cat and extract all that relevant information: four legs, head, tail, eyes & nose & ears with a particular shape, different from a dog's, and most cats have fur (except for those alien-looking furless cats, of course).
It's because they're drawing a conceptual, representational model of a hand, not a distillation of visual "hand" characteristics. That's the difference with human learning: it's based on representational model-making, which is not at all the same thing as pattern matching.
If you reverse the output of a CNN "hand" classification, it'll give you images that resemble the geometry and shading of fingers, palms, nails, knuckles, etc. -- these, I submit, are the distilled pattern matches for the actuality of "hands". Under no circumstances will it give you the five widely separated fingers which a child draws. That's because the child-drawn hand is based not on literal visual stimuli, but rather on an abstract logical model of a hand. That logical model is fully integrated with a similarly abstract model of the world, and includes functional relationships between abstractions, like the knowledge that "hands" can open "jars". The value of these being logical models rather than matched patterns is that they can then be extended to include never-before-seen objects. Confronted with a strange but roughly jar-sized object, a child can surmise that maybe it, too, can be opened with hands. That isn't pattern-matching: it's algebra.
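For what it's worth, here is roughly what that "reversing" looks like in practice -- a minimal activation-maximization sketch, assuming PyTorch/torchvision and a stand-in ImageNet class index (there is no "hand" class); gradient ascent on the pixels tends to yield texture-like fragments, not a schematic five-fingered hand:

    # Rough activation-maximization sketch (assumes PyTorch + torchvision;
    # the target class index is a stand-in, ImageNet has no "hand" class).
    import torch
    import torchvision.models as models

    model = models.resnet18(weights="IMAGENET1K_V1").eval()
    target_class = 0  # hypothetical class index, for illustration only

    # Start from noise and ascend the gradient of the class logit w.r.t. the pixels.
    img = torch.randn(1, 3, 224, 224, requires_grad=True)
    optimizer = torch.optim.Adam([img], lr=0.05)

    for step in range(200):
        optimizer.zero_grad()
        logit = model(img)[0, target_class]
        loss = -logit + 1e-4 * img.norm()  # maximize the logit, keep pixels bounded
        loss.backward()
        optimizer.step()

    # `img` now contains the textures/shapes the net associates with the class --
    # pattern fragments, not a child-style schematic drawing.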
Those activations are a representation of the matched patterns at a similar level of abstraction as the compositional model humans might come up with. They are not exactly the same as what a child would draw, but mostly because the neural network isn't trained to draw with hands. With a bit of work, I'm sure a bunch of PhDs could make a neural network model generate child-like drawings from realistic images.
See, that seems to me like a statement of faith which I just don't share. I think that building relational models of the world via abstract inductive reasoning is qualitatively different than pattern matching. I don't think there's some magic tonnage of pattern matching at which abstract inductive reasoning will suddenly emerge. I don't think that they're isomorphic. I think the AI toolkit still has a few missing pieces.
You talk about representations and reasoning but are not addressing the fact that the human brain is literally a decision maker, acting on stored procedures and memory. Any representations and any reasoning will only apply to a select scenario or select objects, regardless of how you wish to define the pattern. The fact that a subset of abstractness/generality out of the whole of existence is specified implies a pattern that is coded for, implicitly or explicitly.
My God, the level of hubris expressed by members of the cult of AI has reached a fever-pitch.
Stored procedures and memory?
Newton, in the age of clocks, managed to present the universe in the image of a clock. Is it any wonder that computer programmers present the universe in the image of the computer?
+ Brown horses on beaches
+ Some way to indicate "white" "horse".
If you have a good idea about how we can train for your problem, I'm not so convinced that it cannot be solved.
But that's not a knock against style transfer per se. It's an interesting technique, but if you want to convert all horses to zebras in an image, that seems to be a bit too general for current-generation GAN architectures. Maybe it can be improved upon, or a different, novel architecture is required; it's not just something you can solve by throwing more data at it.
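For context, horse-to-zebra conversion is usually attempted as unpaired image translation with a cycle-consistency constraint (CycleGAN-style). A toy sketch of the loss, assuming PyTorch and placeholder generator/discriminator networks G, F, D_X, D_Y; this is an illustration of the idea, not any published training code:

    # CycleGAN-style losses with placeholder networks.
    import torch.nn.functional as F_nn

    def cycle_gan_losses(G, F, D_X, D_Y, real_x, real_y, lam=10.0):
        fake_y = G(real_x)   # horse -> zebra
        fake_x = F(real_y)   # zebra -> horse
        # Adversarial terms: each translated image should fool its domain's
        # discriminator (least-squares GAN form).
        adv = ((D_Y(fake_y) - 1) ** 2).mean() + ((D_X(fake_x) - 1) ** 2).mean()
        # Cycle terms: translating there and back should reconstruct the input.
        cyc = F_nn.l1_loss(F(fake_y), real_x) + F_nn.l1_loss(G(fake_x), real_y)
        return adv + lam * cyc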
In Dutch it's effectively called a 'nose monkey' (neusaap), which makes the name easier to remember for kids.
I have three kids and I can tell you that they wouldn't be able to do this reliably until probably age 5, and even then maybe.
My 5 year old son was a monster at animal puzzles when he was 3, but we had thousands of hours with animal picture books, animal toys etc... from 4 months on.
Ironically, the problem that Dr. Hinton is attempting to solve could be characterized as being that ordinary CNNs have trouble learning what's most relevant in an image.
In that time period all of the brain infrastructure to do single shot/transfer learning at speed is developed. So showing a 10 year old a picture of an elephant with relevant label could probably be learned in a single shot. Not so with a 1 year old.
People basically ignore that it takes humans YEARS of 24/7 training on ungodly amounts of data to be able to do anything close to reliable inference on even the most basic of tasks. That's the point.
Said more clearly, I am all but certain that the logical/mathematical process of correlating identifiable and measurable attributes through iterative search is the correct approach to reach general intelligence goals.
There are many improvements in efficiency, in data acquisition, labeling, processing, etc., that will need to happen to make it tractable computationally, but fundamentally I think it's the correct approach.
Where I differ from Hinton is that he seems to think that human level processing requires less data than I believe it does. It's a subtle point actually.
It's true though that we can generalise from descriptions and recognise the real thing from those. If you describe an elephant as a big grey animal with big ears and a trunk it can use to grab stuff, then an adult (not a 2 year old, I suspect) seeing one for the first time will recognise it from that description.
When we see a cartoonish drawing of one, we can still distill the defining characteristics from it and use it to create a description or recognise the real thing. We can recognise a very crude childish drawing of one by looking for these characteristics. We have a lot of additional knowledge that influences our image recognition, and having a big toolbox of general recognition of tons of different objects, we don't really need to train to recognise new objects anymore, because we will distill its identifying characteristics the first time we see it.
Computers clearly don't work like that.
So, while this pattern in particular doesn't apply to humans (it really doesn't?), many animals have ready-to-use pattern recognition when they are just hours or days old.
Why does that count as 3000 - 10,000 samples? Why is it not a single sample? I don't think our brains sample image in that way. And that might be a fundamental difference between how humans process images and how we're expecting computers to do it.
The human brain receives roughly 25-50 images' worth of data per second, so fewer than 50 samples per second (consciously we observe only about 25 images per second).
Short-term synaptic plasticity works on a timescale of 20 ms to a few minutes, so also roughly a 50-times-per-second timescale.
If I try to translate this into a deep learning framework, it would mean at most 500 training steps per neuron in 10 seconds.
Learning to recognize a cat using _unsupervised learning_ in just 10 seconds would be really impressive.
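Spelling out the arithmetic behind that estimate (the rates are the assumptions stated above, not measurements):

    # Back-of-envelope version of the numbers above.
    frames_per_second = 50       # assumed upper bound on "images' worth" per second
    observation_seconds = 10     # the 10-second look at the cat
    max_training_steps = frames_per_second * observation_seconds
    print(max_training_steps)    # 500 -- at most ~500 "training steps" in 10 seconds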
We have some very particular biases like fear of snakes or heights, but learning to recognize spatial objects is something very general.
We need a SpaceX of Deep Learning, where a lot of learning is reused and linked, creating a much larger web of knowledge about the world.
Human children see their pet from a million different viewpoints every day
That's pretty great.
Micro-movements of cat body parts and their more general character traits could be the only hint that separates two supposedly same cats.
Humans see video; it's easier to derive stuff from videos than from static photos. It's awesome that we can teach a car to drive from single photos, but models that incorporate the time factor, 3D convolutions + RNNs, perform better...
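A rough sketch of that "3D convolutions + RNN over time" idea, assuming PyTorch; the layer sizes and shapes are illustrative, not any particular published model:

    # Toy video classifier: 3D conv features integrated over time by a GRU.
    import torch
    import torch.nn as nn

    class VideoNet(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            # 3D conv mixes a short window of frames into spatiotemporal features.
            self.conv3d = nn.Sequential(
                nn.Conv3d(3, 16, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
                nn.ReLU(),
                nn.AdaptiveAvgPool3d((None, 4, 4)),  # keep the time dim, pool space
            )
            # RNN integrates the per-frame features over the whole clip.
            self.rnn = nn.GRU(input_size=16 * 4 * 4, hidden_size=64, batch_first=True)
            self.head = nn.Linear(64, num_classes)

        def forward(self, clip):            # clip: (batch, 3, time, H, W)
            feats = self.conv3d(clip)       # (batch, 16, time, 4, 4)
            feats = feats.permute(0, 2, 1, 3, 4).flatten(2)  # (batch, time, 256)
            out, _ = self.rnn(feats)
            return self.head(out[:, -1])    # classify from the last time step

    logits = VideoNet()(torch.randn(2, 3, 8, 64, 64))  # 2 clips of 8 frames each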
We weren't given anything to refer to except that the picture is a 1000x1000 RGB array, and that the cat had stripes on the body.
I wonder if the brain has no specific function to recognize geometry first and then deduce the object further from secondary characteristics, as in today's computer vision, but instead synthesizes a resulting decision in reverse from many nearly independent "circuits" (people can guess that a cat is a cat even if they look at a partial image of it, or even if the cat is painted purple and the person has never seen a purple cat in real life).
The real world is a pretty damn good "training environment".
The paper was authored by Sara Sabour, Nicholas Frosst, Geoffrey E Hinton in that order.
The credit for the paper should go to all the researchers of course, but Hinton is the main driving force behind the research.
I see society give disproportionately large credit to the so-called "leaders" because they pioneered certain "ideas".
History buries too many talented individuals who go unrecognized because a few pioneers soak up all the attention society can give.
In the professional research community, authors are listed because they are deemed critical contributors to the work. Plain and simple.
I get why the focus is on him but they could have at least mentioned the others.
That guy has a clear passion for what he's doing, and seems immensely knowledgeable on the subject. My favourite speaker of the event by far.
I haven't heard yet from Ms. Sabour, but if Frosst is at all representative of the team (as I'm sure he is, at minimum), then it's a shame they aren't mentioned. From what I understand so far, they work rather closely with Mr. Hinton on a regular basis.
Moreover, the last author is usually the most senior person or someone who funded the project; usually perceived as important or more important than the first author who "did all the work".
To be accurate, that person is considered the one who will claim the credit after the student graduates. Not that they are important to the work.
Most academic papers are mostly the students' work, done under general instruction from the advisor.
Edit: whoops apparently I'm not allowed to question this.
Also, I'd prefer to see the original content rather than a mangled Google version; there are significant issues with AMP (both technically and morally), and it would help if we didn't propagate its usage where possible. Thanks
There are no margins, and zoom is messed up on Chrome/Firefox.
Basically, what you get with this AMP link goes against all known facts about how the human eyes & brain best receive textual information, unless you just happen to have a huge zoom or a tiny screen.
Also, the zero margins are not helpful on a large display.
These alone make the experience grotesque and awful for me; DOM-distiller is so much better than AMP in terms of readability and layout.
(I'll admit that I don't fully understand it yet), but I think the major thing that capsules try to fix is that a CNN only looks at a small window of the image at a time. Since the capsules aggregate more information, they can learn more general features.
Also, he notes that the paper was done on the MNIST data set (small images), and may not generalize to larger images, but the initial results are promising.
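For reference, the core mechanism in the paper is a vector "squash" nonlinearity plus iterative routing-by-agreement between capsule layers. A toy NumPy sketch (the dimensions are made up for illustration, and it omits the learned transformation matrices that produce the votes):

    # Minimal sketch of the CapsNet "squash" nonlinearity and dynamic routing.
    import numpy as np

    def squash(s, axis=-1, eps=1e-8):
        """Shrink vector length into [0, 1) while preserving direction."""
        sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
        return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

    def route(u_hat, num_iters=3):
        """Routing-by-agreement; u_hat: (num_in_caps, num_out_caps, out_dim)."""
        num_in, num_out, _ = u_hat.shape
        b = np.zeros((num_in, num_out))                           # routing logits
        for _ in range(num_iters):
            c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over outputs
            s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum of votes
            v = squash(s)                                         # output capsule vectors
            b = b + (u_hat * v[None]).sum(axis=-1)                # agreement updates logits
        return v

    votes = np.random.randn(6, 2, 4)   # 6 input capsules vote for 2 output capsules (4-D)
    print(route(votes).shape)          # (2, 4)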
How does the human brain handle "invariance"? Not just of the spatial variety, but transformational, temporal, conceptual, and auditory invariance as well?
Some background on "columns" from bio-inspired computational neuroscience startup Numenta:
Why Does the Neocortex Have Layers and Columns,
A Theory of Learning the 3D Structure of the World
This is practically introducing AI to the real world: an object is more than the picture of it.
"this guy" is "the guy" behind deep learning revolution.
Do you mean a Halloween decoration involving LEDs that a human interprets as a representation of cat eyes? That's not really a mistake.
Or a human mistaking flashing LEDs as cat eyes? In which case I can't see how the mistake would be limited to the Halloween period.
> Hinton’s capsule networks matched the accuracy of the best previous techniques on a standard test of how well software can learn to recognize handwritten digits
Is the journalist just saying that capsule networks can perform well on MNIST? Don't most state-of-the-art techniques perform with 99%+ accuracy on MNIST?
"We then tested this network on the affNIST data set, in which each example is an MNIST digit with a random small affine transformation. Our models were never trained with affine transformations other than translation and any natural transformation seen in the standard MNIST. An under-trained CapsNet with early stopping which achieved 99.23% accuracy on the expanded MNIST test set achieved 79% accuracy on the affNIST test set. A traditional convolutional model with a similar number of parameters which achieved similar accuracy (99.22%) on the expanded MNIST test set only achieved 66% on the affNIST test set."
TensorFlow implementation of CapsNet: https://github.com/naturomics/CapsNet-Tensorflow