
How convolutional neural networks see the world - makmanalp
http://blog.keras.io/how-convolutional-neural-networks-see-the-world.html
======
fizixer
Question to deep learning practitioners:

Is there any research comparing a network's 99.9% confidence in something
wrong to the human weakness for 'optical illusions'?

If you watch a few minutes of this Dan Ariely TED Talk [1], we humans are
fooled into near certainty that the vertical dimension of the table on the
left is longer than the horizontal dimension of the one on the right. Even
after the speaker shows this is not the case, we still "confidently see" the
wrong thing. (Not to mention, we suffer from non-optical illusions as well,
like cognitive biases and fallacious reasoning, as shown later in the talk.)

It seems to me you guys have discovered 'machine optical illusion'. It's just
that the machine has illusions about something completely different. I believe
exploring the space of architectures/weights/what-not might lead towards
making a machine get into the same kind of optical illusions as the human
visual system gets into. At the least, it looks like a very promising research
direction to me.

[1]
[https://youtu.be/9X68dm92HVI?t=2m22s](https://youtu.be/9X68dm92HVI?t=2m22s)

~~~
brianchu
That is not an example of how weak our brain is. That is an example of how
incredibly intelligent our brain is.

If you saw both tables in the real, _physical, natural_ world, because of the
effect of perspective, the table on the left is indeed _physically_ longer
than the width of the table on the right. Our brain is correctly inferring
this from perspective/trigonometry. The visual system is designed to correctly
interpret the natural world, not correctly interpret lengths of line segments
on a slide. This illustrates how hard it is to create computer algorithms that
robustly infer 3D from 2D (this inference requires taking cues from shading,
perspective, etc.)

Almost all optical illusions are examples of the strength, not weakness, of
the human brain.

EDIT: I've emphasized the word "natural". The problem with the below examples
of "illusions" constructed in the physical world is this: what do you expect
the brain to do otherwise? When you see a scene in the world, is it a weakness
that our brain discards the (extremely unlikely) possibility the scene is just
a massive hologram five inches from your eyes, or a meticulously painted and
lit cardboard scene? I don't think so.

~~~
jessriedel
Many (probably most) of these optical illusions can be translated to the real
physical world. Not surprising: it just requires situations that are outside
the normal experiences of the human brain, especially in the ancestral
environment. For instance:

[https://s-media-cache-ak0.pinimg.com/736x/24/5b/59/245b59444cdcff14d139df7c37b7e271.jpg](https://s-media-cache-ak0.pinimg.com/736x/24/5b/59/245b59444cdcff14d139df7c37b7e271.jpg)

~~~
im2w1l
I like this one
[http://0.media.collegehumor.cvcdn.com/70/36/483778211d5177e9b7b74dbc8bbc8290-amazing-floating-paper-cube-illusion.gif](http://0.media.collegehumor.cvcdn.com/70/36/483778211d5177e9b7b74dbc8bbc8290-amazing-floating-paper-cube-illusion.gif)

It seems impossible for a few seconds, until the trick is revealed.

~~~
dr_zoidberg
I was expecting that (or the T-rex variation of it) when he mentioned optical
illusions. The image he posted is more related to perspective than to how the
visual cortex works -- that is, it can even be done inside a 3D game/engine,
with clever texturing.

------
akavel
SPOILER WARNING!

Nice read at the end of the article:

 _" [...] Naturally, this does not qualify as "seeing" in any human sense, and
from a scientific perspective it certainly doesn't mean that we somehow solved
computer vision at this point. Don't believe the hype; we are merely standing
on the first step of a very tall ladder._

 _Some say that the hierarchical-modular decomposition of visual space learned
by a convnet is analogous to what the human visual cortex does. It may or may
not be true, but there is no strong evidence to believe so. Of course, one
would expect the visual cortex to learn something similar, to the extent that
this constitutes a "natural" decomposition of our visual world (in much the
same way that the Fourier decomposition would be a "natural" decomposition of
a periodic audio signal). But the exact nature of the filters and hierarchy,
and the process through which they are learned, has most likely little in
common with our puny convnets. The visual cortex is not convolutional to begin
with, and while it is structured in layers, the layers are themselves
structured into cortical columns whose exact purpose is still not well
understood --a feature not found in our artificial networks (although Geoff
Hinton is working on it). Besides, there is so much more to visual perception
than the classification of static pictures --human perception is fundamentally
sequential and active, not static and passive, and is tightly intricated with
motor control (e.g. eye saccades)._

 _Think about this next time you hear some VC or big-name CEO appear in the
news to warn you against the existential threat posed by our recent advances
in deep learning. Today we have better tools to map complex information spaces
than we ever did before, which is awesome, but at the end of the day they are
tools, not creatures, and none of what they do could reasonably qualify as
"thinking". Drawing a smiley face on a rock doesn't make it "happy", even if
your primate neocortex tells you so._

 _That said, visualizing what convnets learn is quite fascinating --who would
have guessed that simple gradient descent with a reasonable loss function over
a sufficiently large dataset would be enough to learn this beautiful
hierarchical-modular network of patterns that manages to explain a complex
visual space surprisingly well. Deep learning may not be intelligence in any
real sense, but it's still working considerably better than anybody could
have anticipated just a few years ago. Now, if only we understood why... ;-)"_

~~~
seiji
 _Drawing a smiley face on a rock doesn't make it "happy", even if your
primate neocortex tells you so._

That's one of the best analogies we've got with regard to "deep learning"
versus reality. People around here seem to think the AI apocalypse is 3-5
years away and are rushing to fund billion dollar "sentient rock" research.

~~~
Afforess
An AI doesn't have to understand the world in order to destroy it. In fact,
the less it understands, the more dangerous it could be.

~~~
seiji
There are two lines of thought. Either a.) an AI will have the mental capacity
of a 2 year old with the powers of a god — or — b.) an AI will have the mental
capacity of a god (not a dumb Abrahamic god, but more like a universal atman)
also combined with the powers of a god.

If you're in the "2 year old with unlimited power" camp, nothing can save us
and everything is futile and we should all just eat drink and be merry for
tomorrow the AI kills us all.

If you're in the AI-as-enlightened-buddha camp, the godlike AI will either save
us all — or — just leave us alone to solve our own problems (while potentially
locking out future godlike-AI development so we don't do too much runaway
damage (eschaton, etc)).

~~~
fennecfoxen
Those are the only two lines of thought? Don't be INSANE. Where the hell do
people get this shit from, bad sci-fi movies? I'm sorry, I don't really mean
you're actually stupid or anything, it's just.... HOW ON EARTH could anyone
with a passing familiarity with computer science possibly arrive at this sort
of expectation? I don't understand!!

The powers of a GOD? What does that even mean? No AI is going to be able to go
"Let there be light!" and make there be light. Heck, no AI is going to be able
to go "I will hack into this camera and spy on you!" without either spending
the requisite CPU-hours to crack the passwords or encryption protecting it, or
analyzing all its attack surface for weaknesses like a hacker. Computational
complexity is REAL, P does not and never will equal NP (we just don't know how
to prove it yet), and there are real physical limits on the computing power
that you can fit inside a given volume and its energy budget.
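To put a rough number on the "requisite CPU-hours" point, here is a
back-of-the-envelope sketch. The guess rate is deliberately absurd and
illustrative, not a figure from any real benchmark:

```python
# Brute-forcing one 128-bit key at an absurdly generous 10^18 guesses
# per second (well beyond any real machine; the figure is illustrative).
guesses_per_second = 10**18
keyspace = 2**128                       # number of possible 128-bit keys
seconds_per_year = 60 * 60 * 24 * 365
years = keyspace / guesses_per_second / seconds_per_year
print(f"{years:.2e} years")             # about 1.08e13 years
```

That's roughly 800 times the current age of the universe, which is the whole
point: exponential search spaces don't care how smart the searcher is.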

AT ITS BEST, an AI will have the same powers as a civilization of humans
working together using computers the old-fashioned way, only faster.

~~~
zornthewise
Well, P/NP really has almost no bearing on this problem. That is a theoretical
problem and even if P=NP, the algorithm could have a ginormous constant or
degree. Conversely even if P=/=NP, the problem might be very easy to solve at
human timescales with advanced enough algorithms/processing speed.

~~~
fennecfoxen
P/NP has direct bearing on (a) how easy it is for an AI entity to HACK ALL THE
INTERNETS that are accessible to it but protected with (NP) cryptography and
stuff, and (b) how easy it is for an AI entity to design the next generation
of itself in advance of this SINGULARITY APOCALYPSE I keep hearing about (and
designing a better computer is probably a problem in NP as well, to say
nothing of manufacturing concerns).

------
cjenken
What I find interesting is that the auras that I get as a precursor to my
migraines don't look far off from those images.

If you were to take a C-shaped slice, overlay it onto video, and then animate
the color, you would pretty much have it.

It would be interesting to see if they were connected somehow...

~~~
mej10
Sounds like you are describing "scintillating scotoma", if you wanted a name
for that visual distortion.

[https://en.wikipedia.org/wiki/Scintillating_scotoma](https://en.wikipedia.org/wiki/Scintillating_scotoma)

------
mdonahoe
What would happen if you generated a lot of images like this, tagged them as
"conv-net filter", and then added them to the set of images that the neural
nets trained on?

Would the network learn more discriminating filters for everything else?
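For context, the article generates those images by gradient ascent on the
input. A toy numpy sketch of that idea, with a hand-rolled ReLU convolution
standing in for a real VGG16 filter (everything here is illustrative, not the
article's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 3x3 "filter" standing in for a learned convnet filter.
kernel = rng.standard_normal((3, 3))

def conv2d(img, k):
    """Valid cross-correlation, plain loops for clarity."""
    h, w = img.shape[0] - 2, img.shape[1] - 2
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    return out

def mean_activation(img):
    return np.maximum(conv2d(img, kernel), 0.0).mean()  # ReLU then average

def grad(img):
    """Gradient of the mean ReLU activation with respect to the input."""
    act = conv2d(img, kernel)
    mask = (act > 0).astype(float) / act.size
    g = np.zeros_like(img)
    for i in range(act.shape[0]):
        for j in range(act.shape[1]):
            g[i:i + 3, j:j + 3] += mask[i, j] * kernel
    return g

# Gradient ascent from small random noise, as in the blog post.
img = rng.standard_normal((16, 16)) * 0.1
before = mean_activation(img)
for _ in range(50):
    img += grad(img)  # step size 1.0 for simplicity
after = mean_activation(img)
# `img` is now a synthetic input the filter responds to strongly.
```

In the article the same loop runs with the gradient of a VGG16 filter's
activation, computed by backprop through the whole network.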

~~~
dhj
Very interesting question and worth an experiment or two. Personally, I don't
think it would help, because the patterns are complex and they are an internal
representation. It would be like feeding it hash values and labeling them
"hash values": that wouldn't necessarily help you discriminate the pre-hashed
values. That said, I would definitely be interested in the results of a real
test.

------
yeukhon
Just really curious: what is up with ML on HN today? So many posts about ML in
a single day.

After reading this article, I have to say even humans have a hard time
actually understanding images and patterns like the ones shown in the article,
let alone a machine. I wonder what a machine would say about the famous "is
this dress blue or grey" photo from last year.

------
tariqali34
Machines do not have to think or see in the same way as humans. To dismiss
machine capabilities simply because they do not resemble human intelligence
seems hopelessly naive.

------
Synaesthesia

        we are merely standing on the first step of a very tall ladder.
    

...

    
    
        Deep learning may not be intelligence in any real sense, but it's still working considerably better than anybody could have anticipated just a few years ago. Now, if only we understood why... ;-)
    

Yes, I'd agree with this. There needs to be a deeper level of analysis of how
cognitive functions within the brain work, for example how vision in the brain
works. Such understanding will clearly take a great amount of effort to
achieve, and I don't know if enough research is aimed in this deeper
theoretical direction, as opposed to neural-net-style machine learning.

------
ThomPete
A little off-topic.

There is something interesting to be said about the kind of awareness with
which a future AI (or whatever we should call it) will simulate the world.

Imagine what kind of perspectives are possible when thousands or millions of
input sources are your senses.

~~~
sigmar
>Imagine what kind of perspectives are possible when thousands or millions of
input sources are your senses.

Doesn't this exactly describe human sensory input? Though our brain is
efficient in that it throws out most of the data early in the signal chain (as
research on vision and auditory input has revealed). Will future AI also need
to be as efficient?

~~~
seiji
_Doesn 't this exactly describe human sensory input?_

Well, consider having a 360º array of 30 cameras all integrated into a perfect
spherical sensory experience. It's something we can't really imagine
experiencing natively, but it would be trivial for eBrains to coalesce visual
systems that way from eBirth.

Our bodies have lots of low-bitrate sensors, like the billions of individual
sensory nerves distributed throughout our bodies ( _and_ they are each
individually addressable in the brain), but we don't think of "touch" as a
sense to "computationalize" like vision or sound or language.

One amusing thing about AI sensors: nobody ever talks about superhuman smell.
Where are the quantum AI noses?

~~~
tamana
Biochemical sensors, like what that discredited blood-test company Theranos
tried to build, are a sort of superhuman smell.

------
rasz_pl
It's striking how similar CNNs are to good old JPEG compression. Makes me
think you would get better results by chopping up the input into blocks and
slapping a few more layers on top. From the article's description, I gather
that VGG16 tries to guess the whole picture's content from the existence of
particular patterns, totally ignoring their location and arrangement, hence
the nonsensical magpie prediction. Photoshop a magpie with its beak sticking
out of its ass and it will pass as genuine, because all the prerequisite
patterns are there.
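The location-blindness part is easy to demonstrate with a toy "bag of
patterns" descriptor: random filters plus a global max-pool. This is a cartoon
of a convnet's late stages, not actual VGG16 behavior (real features overlap
block boundaries), but it shows how a purely pooled representation cannot tell
a magpie from its scrambled parts:

```python
import numpy as np

rng = np.random.default_rng(1)

# Eight random 4x4 "pattern detectors" standing in for late-stage filters.
filters = rng.standard_normal((8, 4, 4))

def bag_of_patterns(img):
    """Global max-pool over per-patch filter responses: records *which*
    patterns occur somewhere, and throws away *where* they occur."""
    patches = [img[i:i + 4, j:j + 4]
               for i in range(0, img.shape[0], 4)
               for j in range(0, img.shape[1], 4)]
    return np.array([max(np.sum(p * f) for p in patches) for f in filters])

img = rng.standard_normal((16, 16))

# Scramble the image by cyclically shifting its sixteen 4x4 blocks.
patches = [img[i:i + 4, j:j + 4]
           for i in range(0, 16, 4) for j in range(0, 16, 4)]
order = np.roll(np.arange(len(patches)), 1)
scrambled = np.block([[patches[order[r * 4 + c]] for c in range(4)]
                      for r in range(4)])

# The descriptor is identical: arrangement is invisible to it.
assert np.allclose(bag_of_patterns(img), bag_of_patterns(scrambled))
```

Anything downstream of such a pooled descriptor would classify the beak-out-
the-ass magpie exactly like the real one.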

------
arketyp
If you took every image agreeing with your conception of a magpie and
composed them into one image, would it look anything like a magpie? The
Google inceptionism team did a similar experiment but supplied the networks
with natural-image priors. Perhaps hand-picked, but those examples were
actually quite convincing.

