"OMG, this is amazing, it's just like humans. We're probably close to AGI."
"Ha-ha, humans are stupid, so the algorithm giving unexpected result is just a proof that it's better and less biased."
Here, we have both in response to the same demo.
Still, I honestly don't know why some people are so biased in favor of neural nets and have zero interest in edge cases and flaws (the most interesting parts if you want to gain a deeper understanding of how the algorithm actually operates). Wishful thinking, I guess.
Multistable/bistable perception is not unique to Google Cloud Vision - it afflicts humans too.
I'm looking forward to ReCAPTCHA asking "Click the images of things that are upside down."
What if it were an AI looking at bacteria? Or scanning a roadside for IEDs? Or a guy on a bike who, turned and rotated the right way, looks like the crosswalk paint marking of a guy on a bike?
If our current AI is making different “DEFINITE” determinations based only on image rotation - there is a problem.
The image is both a rabbit and a duck regardless of orientation; capturing the object it depicts as a single class with a confidence measure is a mistake.
The magnitude of this mistake becomes apparent when you connect it to real-world decision making, and it becomes highly unsafe.
As is most of ML because it only uses statistical (rather than causal) modelling of the world -- so really, it is only offering us generalised statistical associations. It cannot cope with statistical discontinuities.
The trouble with the world is that the stupid are cocksure and the intelligent are full of doubt. - Bertrand Russell
Though in this case, stupid/intelligent are probably overly harsh. Intelligent people are certainly not immune to this 'trap'. In some ways they can be even more susceptible since they may themselves know very little, but that very little is still enough to put them ahead of 80% of the rest which can yield unjustified confidence. So let's just say uninformed/informed.
Many things belong in distinct groups depending on their orientation or other physical attributes.
A bucket is, amongst other things, a bucket when standing flat on the ground with the hole facing upwards.
Turn it around and it becomes the cover for a mole trap; place it on someone's head and it becomes a rain cover.
Mount a light bulb inside it and it becomes a lampshade.
It's still a bucket, but it's not _primarily_ a bucket in all cases, and shouldn't necessarily be classified as a bucket in all cases.
Rotate the plus symbol 45 degrees and you get the letter x. And it is definitely, 100% certainly an x in one orientation and a + in another orientation.
In most fonts, the only difference is rotation and/or mirroring.
Yet they represent different sounds, they have different meanings.
Rotation is another data point, not something entirely independent from the data.
If you showed me the rabbit rotation of the picture, I'd tell you with pretty high confidence that it's a rabbit.
If you showed me the duck rotation, I'd tell you with pretty high confidence that it's a duck.
That's the point of this, it's an illusion.
And it did give a bit of an "I don't know" answer for many of the rotations in the middle of the gif/video, which is exactly what I would expect; when I pause the video at those points and glance at it, it doesn't look like much of anything to me either.
I still fail to see how the human brain falls for this illusion only once.
E.g. clever animal camouflage: a caterpillar that looks like a snake should not be classified as ambiguous. It should always be a caterpillar.
Anyway, the early layers of an NN should be performing an encoding that creates scale and rotation invariance, so that later layers can classify. That's what makes this result interesting. Well, that and the fact that the ambiguity matches the human ambiguity.
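If you want to poke at that claim yourself, here's a rough sketch (it assumes torchvision and the illusion saved locally as duckrabbit.png, a filename I'm making up, and uses an off-the-shelf ResNet rather than whatever Cloud Vision runs). It pools early-stage features over the spatial dimensions before comparing, so the comparison isn't dominated by the feature map itself being rotated:

    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    weights = models.ResNet50_Weights.IMAGENET1K_V2
    model = models.resnet50(weights=weights).eval()
    # Treat everything up to the end of the first residual stage as the "early layers".
    early = torch.nn.Sequential(model.conv1, model.bn1, model.relu, model.maxpool, model.layer1)

    preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
    img = Image.open("duckrabbit.png").convert("RGB")

    with torch.no_grad():
        # Global-average-pool the feature maps so spatial layout is discarded.
        f0 = early(preprocess(img).unsqueeze(0)).mean(dim=(2, 3)).flatten()
        f90 = early(preprocess(img.rotate(90)).unsqueeze(0)).mean(dim=(2, 3)).flatten()

    # Cosine similarity near 1.0 would suggest the early features are (roughly)
    # rotation invariant; I wouldn't expect it to be that close in practice.
    print(torch.nn.functional.cosine_similarity(f0, f90, dim=0).item())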
The picture is constructed to be ambiguous, and this property is preserved by the rotation: you can still easily see the duck by slightly shifting where you focus.
One mode might be more prominent at some orientations, but the ambiguity is always there, so confidently assigning a label and then switching what you assign is an error. You should be constantly switching, since the two classes end up with very similar scores and noise decides.
Either that, or you decide one label is the most appropriate and then correctly handle the trivial rotation.
A classifier generally outputs a vector of weights, so it’s likely classifying, say [0.8, 0.75] and then the output is selecting the highest and saying “bunny”. Then you rotate it, and the classifier says [0.75, 0.8] and the output says “duck”.
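To make that concrete, here's a toy sketch (plain NumPy, nothing to do with Cloud Vision's actual internals) of how an argmax over two near-tied scores flips the reported label under tiny perturbations:

    import numpy as np

    labels = ["rabbit", "duck"]
    rng = np.random.default_rng(0)

    # Two near-tied class scores; a tiny perturbation (standing in for a small
    # rotation of the input) is enough to flip which label the argmax reports.
    for step in range(6):
        scores = np.array([0.78, 0.77]) + rng.normal(scale=0.02, size=2)
        print(step, labels[int(np.argmax(scores))], np.round(scores, 3))

Reporting the whole score vector instead of just the winner would make the flipping look a lot less mysterious.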
This is completely reasonable on the part of the network: all things being equal, animals generally appear in certain orientations, and we should slightly prefer the interpretation of the ambiguity which respects this alignment. Example: "bill" down, it looks more like a duck because rabbits rarely have their head in that alignment, while "ears up" it looks more like a rabbit since ducks rarely hold their bills that way.
The problem is actually in how we represent probabilistic information to humans, aka "why the weatherman is always wrong": it seems like the classifier is randomly flip-flopping when it's actually adjusting its distribution of answers perfectly correctly based on the information we give it.
The original post actually included the predicted probabilities, which are around 80% for duck or rabbit and 0% for the other class. So the neural network really is overconfident.
I can somewhat understand how that happens, but I find it an interesting observation (rather than a criticism of the system, though the title is somewhat unclear and had me expecting something else).
AI does make dumb predictions from time to time, but in my opinion, this isn't that strong a case. When it's rotated upside down, it does look like a rabbit even to me.
The more interesting 'failure' here, to me, is that while the rotation is smooth, the prediction is not; instead it flickers, which does raise some interesting questions about what the model's internal distribution surface looks like.
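One way to look at that flickering directly (again a sketch: it substitutes an off-the-shelf ImageNet ResNet for Cloud Vision and assumes the illusion is saved as duckrabbit.png) is to rotate the image in small steps and log the top prediction at each angle:

    import torch
    import torchvision.models as models
    from PIL import Image

    weights = models.ResNet50_Weights.IMAGENET1K_V2
    model = models.resnet50(weights=weights).eval()
    preprocess = weights.transforms()
    labels = weights.meta["categories"]

    img = Image.open("duckrabbit.png").convert("RGB")

    # Smooth rotation in, possibly not-so-smooth predictions out.
    for angle in range(0, 360, 10):
        rotated = img.rotate(angle, fillcolor=(255, 255, 255))
        with torch.no_grad():
            probs = model(preprocess(rotated).unsqueeze(0)).softmax(dim=1)[0]
        score, idx = probs.max(dim=0)
        print(f"{angle:3d} deg  {labels[int(idx)]:<25s}  {score.item():.2f}")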
This would likely give incorrect or worse results when, eg, classifying ducks and rabbits in wildlife photos — animals come in orientations, and you’ll do a better job classifying them in practice if you respect that.
It’s also not the case that a human would classify it 50/50 in all rotations — it certainly looks more or less like one or the other as you rotate it. Humans are even programmed for that compromise: we see faces in one orientation much better than rotated 180 degrees.
I don't think it's meant as an example of a 'dumb prediction'. I think it's an interesting example that inspires (me at least) to think about how this recognition and classification works, and what could cause this effect.
Depending on orientation our first interpretation is also either a duck or a rabbit, because our vision is obviously biased to interpret things in the orientation they are most likely to occur based on our priors that have evolved in the presence of "up" and "down" directions. The AI correctly takes the orientation into account because it also matters in its priors, having been fed training data that captures those human priors.
Now, most computer vision algorithms do strive for some degree of rotation (and translation etc) invariance, because a classifier that gets confused when you rotate the input by 15 degrees or whatever isn't very useful in the real world. But complete rotation invariance would just be a case of Artificial Stupidity in an application that attempts to classify like a human would. Input orientation is meaningful information.
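For what it's worth, that limited invariance is usually something you opt into at training time via augmentation; a minimal torchvision-style sketch (the ±15° figure is just an illustrative choice, not anything Cloud Vision documents):

    import torchvision.transforms as T

    # Tolerate small rotations and crops, but deliberately do NOT teach the model
    # that upside-down inputs are equivalent - orientation stays informative.
    train_transform = T.Compose([
        T.RandomRotation(degrees=15, fill=255),
        T.RandomResizedCrop(224, scale=(0.8, 1.0)),
        T.RandomHorizontalFlip(),
        T.ToTensor(),
    ])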
Not sure what the purpose of it is - is it to show that even computer vision algorithms can get confused by visual illusions?
You don’t just rotate the image in your mind or focus on specific features to bring out the duckiness or rabbit-ness? I can make it more duck or more rabbit at will.
When making the animation I didn't intend for the occlusion, but the fact that the occlusion causes the prediction to drop to zero is itself an interesting data point. Many objects in real life are occluded.
Like if something passes through your field of vision and you recognize it as, let's say, a dog. But a second later you feel something was off, turn around, look at it in detail, and determine it to be a weird-looking cat? Yeah, those "AI"s so far seem incapable of such deep thinking.
Vide "dirty mind" pictures like posting
to Clarifai https://clarifai.com/models/nsfw-image-recognition-model-e95... gives 88% for NSFW.
What do you find misleading about it?
I disagree that this is an example of "adversarial attack". The famous duck/rabbit illusion has been around since ~1892 and therefore was not deliberately constructed to be an "adversary" to image classification neural networks.
To me, it's an interesting example of feeding a well-known optical illusion to an AI algorithm and observing its behavior.
This (well-known) illusion is NOT an adversarial example. Though I can explain why, for people working with AI (e.g. me), the title sounded like it was describing an adversarial example. There are plenty of examples of "just rotate it and a vulture becomes an orangutan" where it does not look like an orangutan to humans.
Vide: "A Rotation and a Translation Suffice: Fooling CNNs with Simple Transformations" https://arxiv.org/pdf/1712.02779.pdf
In addition, if Cloud could taste one, it would really help itself with the answer.
The interesting part (IMO) isn't that the AI classifies it as both a rabbit and a duck, but that the classification is dependent on the rotation of the picture.
That said, I think the response here is likely filtered and normalized to only include "duck" and "rabbit" classifications; after all, the bird looks much more like a seagull than a duck.