How convolutional neural networks see the world (keras.io)
214 points by makmanalp on Jan 31, 2016 | 49 comments



Question to deep learning practitioners:

Is there any research comparing a network's 99.9% confidence in a wrong answer to the human weakness for optical illusions?

If you watch a few minutes of this Dan Ariely TED Talk [1], we humans are fooled into near certainty that the table on the left has a longer vertical distance than the horizontal distance of the one on the right. Even after the speaker shows this is not the case, we still "confidently see" the wrong thing. (Not to mention that we suffer from non-optical illusions as well, like cognitive biases and fallacious reasoning, as shown later in the talk.)

It seems to me you guys have discovered 'machine optical illusion'. It's just that the machine has illusions about something completely different. I believe exploring the space of architectures/weights/what-not might lead towards making a machine get into the same kind of optical illusions as the human visual system gets into. At the least, it looks like a very promising research direction to me.

[1] https://youtu.be/9X68dm92HVI?t=2m22s


That is not an example of how weak our brain is. That is an example of how incredibly intelligent our brain is.

If you saw both tables in the real, physical, natural world, because of the effect of perspective, the table on the left is indeed physically longer than the width of the table on the right. Our brain is correctly inferring this from perspective/trigonometry. The visual system is designed to correctly interpret the natural world, not correctly interpret lengths of line segments on a slide. This illustrates how hard it is to create computer algorithms that robustly infer 3D from 2D (this inference requires taking cues from shading, perspective, etc.)

Almost all optical illusions are examples of the strength, not weakness, of the human brain.

EDIT: I've emphasized the word "natural". The problem with the below examples of "illusions" constructed in the physical world is this: what do you expect the brain to do otherwise? When you see a scene in the world, is it a weakness that your brain discards the (extremely unlikely) possibility that the scene is just a massive hologram five inches from your eyes, or a meticulously painted and lit cardboard scene? I don't think so.


Many (probably most) of these optical illusions can be translated to the real physical world. Not surprising, it just requires situations that are outside the normal experiences of the human brain, especially in the ancestral environment. For instance:

https://s-media-cache-ak0.pinimg.com/736x/24/5b/59/245b59444...


I like this one http://0.media.collegehumor.cvcdn.com/70/36/483778211d5177e9...

It seems impossible for a few seconds until the trick is revealed.


I was expecting that (or the T-rex variation of it) when he mentioned optical illusions. The image he posted is more related to perspective than to how the visual cortex works -- that is, it can even be done inside a 3D game engine, with clever texturing.


This would be a textbook example of overfitting, yes?


Yeah I suppose you could say that our "training set" doesn't include optical illusions. Though I wouldn't really call it overfitting because our brains haven't learnt specific training inputs, which is what happens with overfitting.


I find it less interesting in terms of applications. I am sure finding the causes of these illusions will help in understanding aspects of how the system, human or machine, subject to the illusion operates; however, so many of them depend on either a static image or a fixed perspective that it just doesn't seem to be all that big of an issue to me.

The visual illusions humans are subject to are frequently defeated by some simple motion to reposition the viewer, and there seem to be few illusions that can sustain being in motion themselves and still fool people. From that it doesn't seem all that revolutionary that a fixed-position view of a three-dimensional reality, with no fourth dimension of time, is not omniscient. It only has one biased sample or view of the scene, so why would we expect it to be omniscient?

The paper "Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images"[0] is interesting in its explorations of the issue, but I still don't get the hype around "fooling" DNNs. Even if someone gets an actual video scene, a time series of frame after frame, that still fools some kind of DNN (perhaps an LSTM => softmax classification), it's still not all that interesting, as that occasionally seems to happen to humans too.

[0] http://arxiv.org/abs/1412.1897 (web, but the pdf linked from there is ~9.5MB)


The length illusion happens higher up the meat stack. You see the picture for what it is, but the visual cortex gets overridden. It happens all the time: the blue dress, motion blur, saccades. A ton of stuff seen by the eye is filtered out or overruled by upper layers.


I'd say that the problem is not well-defined. You are asking for a 3D reconstruction from two 2D images, so no wonder that some information has to be made up.


Search for "Explaining and Harnessing Adversarial Examples" for some contemplation on the overly confident results.
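For a concrete feel of what that paper describes, here is a minimal sketch of its fast gradient sign idea, applied to a toy logistic regression rather than a convnet. The weights, the input, and epsilon are all made up for illustration; the only point is that many tiny per-input nudges, each aligned with the gradient, add up to a confident wrong answer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability that x belongs to class 1 under a logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)

def fgsm(weights, bias, x, label, epsilon):
    """One fast gradient sign step: shift every input by +/- epsilon to raise the loss."""
    p = predict(weights, bias, x)
    # For cross-entropy loss, d(loss)/d(x_i) = (p - label) * w_i.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model: 100 input "pixels", weights alternating in sign.
weights = [0.5 if i % 2 == 0 else -0.5 for i in range(100)]
bias = 0.0
x = [0.6 if i % 2 == 0 else 0.4 for i in range(100)]  # a class-1 example

clean_confidence = predict(weights, bias, x)          # high confidence in class 1
x_adv = fgsm(weights, bias, x, label=1, epsilon=0.2)
adv_confidence = predict(weights, bias, x_adv)        # confidence in class 1 collapses
```

No single input moves by more than 0.2, yet the weighted sum swings from +5 to -5, so the model ends up about as sure of the wrong class as it was of the right one. A real convnet is not linear, so treat this as an illustration of the argument, not a reproduction of the paper's experiments.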


SPOILER WARNING!

Nice read at the end of the article:

"[...] Naturally, this does not qualify as "seeing" in any human sense, and from a scientific perspective it certainly doesn't mean that we somehow solved computer vision at this point. Don't believe the hype; we are merely standing on the first step of a very tall ladder.

Some say that the hierarchical-modular decomposition of visual space learned by a convnet is analogous to what the human visual cortex does. It may or may not be true, but there is no strong evidence to believe so. Of course, one would expect the visual cortex to learn something similar, to the extent that this constitutes a "natural" decomposition of our visual world (in much the same way that the Fourier decomposition would be a "natural" decomposition of a periodic audio signal). But the exact nature of the filters and hierarchy, and the process through which they are learned, has most likely little in common with our puny convnets. The visual cortex is not convolutional to begin with, and while it is structured in layers, the layers are themselves structured into cortical columns whose exact purpose is still not well understood --a feature not found in our artificial networks (although Geoff Hinton is working on it). Besides, there is so much more to visual perception than the classification of static pictures --human perception is fundamentally sequential and active, not static and passive, and is tightly intricated with motor control (e.g. eye saccades).

Think about this next time you hear some VC or big-name CEO appear in the news to warn you against the existential threat posed by our recent advances in deep learning. Today we have better tools to map complex information spaces than we ever did before, which is awesome, but at the end of the day they are tools, not creatures, and none of what they do could reasonably qualify as "thinking". Drawing a smiley face on a rock doesn't make it "happy", even if your primate neocortex tells you so.

That said, visualizing what convnets learn is quite fascinating --who would have guessed that simple gradient descent with a reasonable loss function over a sufficiently large dataset would be enough to learn this beautiful hierarchical-modular network of patterns that manages to explain a complex visual space surprisingly well. Deep learning may not be intelligence in any real sense, but it's still working considerably better than anybody could have anticipated just a few years ago. Now, if only we understood why... ;-)"


Drawing a smiley face on a rock doesn't make it "happy", even if your primate neocortex tells you so.

That's one of the best analogies we've got with regard to "deep learning" versus reality. People around here seem to think the AI apocalypse is 3-5 years away and are rushing to fund billion dollar "sentient rock" research.


An AI doesn't have to understand the world in order to destroy it. In fact, the less it understands, the more dangerous it could be.


An AI doesn't have to understand the world in order to destroy it. It needs to be hooked up to the one technology which could credibly destroy the world: a nuclear weapons system. A very poorly designed nuclear weapons system that doesn't have any meaningful interlocks.

Myself, I'm going to remain much more worried about the natural intelligences which have been hooked up to nuclear weapons systems.


It doesn't need to be something big and scary like a nuclear weapons system.

It could be hooked up to the stock market, and it could make entirely rational decisions based on its objective (profitable trades) and these actions can result in imbalances leading to famine in certain regions, increased pollution, unsustainable depletion of natural resources etc.

We are already hooked up. The AIs are just amplifications of our own narrowly focused objectives.


Algorithmic trading systems have already gone haywire, repeatedly in fact. All major stock markets have "circuit breaker" systems in place to halt trading in the event of extreme volatility for precisely this reason.

The sort of long range damaging activities you mention are unlikely though, as algorithmic trading systems in general take their long term cues from humans.


Then it has already happened[0], but few seem to recognize it.

[0] https://en.wikipedia.org/wiki/2010_Flash_Crash


There's two lines of thought. Either a.) an AI will have the mental capacity of a 2 year old with the powers of a god — or — b.) an AI will have the mental capacity of a god (not a dumb Abrahamic god, but more like a universal atman) also combined with the powers of a god.

If you're in the "2 year old with unlimited power" camp, nothing can save us and everything is futile and we should all just eat drink and be merry for tomorrow the AI kills us all.

If you're in the AI-as-enlightened-buddha camp, the godlike AI will either save us all, or just leave us alone to solve our own problems (while potentially locking out future godlike-AI development so we don't do too much runaway damage (eschaton, etc)).


Those are the only two lines of thought? Don't be INSANE. Where the hell do people get this shit from, bad sci-fi movies? I'm sorry, I don't really mean you're actually stupid or anything, it's just.... HOW ON EARTH could anyone with a passing familiarity with computer science possibly arrive at this sort of expectation? I don't understand!!

The powers of a GOD? What does that even mean? No AI is going to be able to go "Let there be light!" and make there be light. Heck, no AI is going to be able to go "I will hack into this camera and spy on you!" without either spending the requisite CPU-hours to crack the passwords or encryption protecting it, or analyzing all its attack surface for weaknesses like a hacker. Computational complexity is REAL, P does not and never will equal NP (we just don't know how to prove it yet), and there are real physical limits on the computing power that you can fit inside a given volume and its energy budget.

AT ITS BEST, an AI will have the same powers as a civilization of humans working together using computers the old-fashioned way, only faster.


Well, P/NP really has almost no bearing on this problem. That is a theoretical problem and even if P=NP, the algorithm could have a ginormous constant or degree. Conversely, even if P≠NP, the problem might be very easy to solve at human timescales with advanced enough algorithms/processing speed.
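To put numbers on the "ginormous degree" point, here is a small sketch comparing two hypothetical step counts: a polynomial-time algorithm that needs n^100 steps and an exponential-time one that needs 2^n steps. Both cost functions are invented for illustration and describe no real algorithm.

```python
def steps_polynomial(n, degree=100):
    """Step count of a hypothetical polynomial-time algorithm: n**degree."""
    return n ** degree

def steps_exponential(n, base=2):
    """Step count of a hypothetical exponential-time algorithm: base**n."""
    return base ** n

def crossover(degree=100, base=2):
    """Smallest n >= 2 at which the exponential count exceeds the polynomial one."""
    n = 2
    while steps_exponential(n, base) <= steps_polynomial(n, degree):
        n += 1
    return n

n = 100
poly = steps_polynomial(n)    # 100**100, i.e. 10**200 steps
expo = steps_exponential(n)   # 2**100 steps, far fewer at this input size
first_n_where_exponential_is_worse = crossover()
```

At n = 100 the "efficient" polynomial algorithm is the hopeless one, and the exponential one only becomes worse once n reaches the high hundreds. Asymptotic class alone says little about what is feasible at the input sizes that actually matter.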


P/NP has direct bearing on (a) how easy it is for an AI entity to HACK ALL THE INTERNETS that are accessible to it but protected with (NP) cryptography and stuff, and (b) how easy it is for an AI entity to design the next generation of itself in advance of this SINGULARITY APOCALYPSE I keep hearing about (and designing a better computer is probably a problem in NP as well, to say nothing of manufacturing concerns).


And yet this is how people seem to view it (even looking at the "serious" conversations on singularityhub and similar places). Their view of AI has this real idiot/god dichotomy going on, dramatized so that it only takes seconds from "it's just a dumb computer" to "it's become self-aware" to "it's taken over the entire internet and built an army of robots!"

Thinking about it, it probably comes from our general perceptions (based on conventional software development) that computers will either not do something at all, or they'll do it blindingly fast. And even people who work with computers often don't really grok the difference between multiplying two big matrices, and pondering the best way to approach an unsolved problem.


An AI will be able to learn everything civilization knows. It will be able to work on problems, without rest, thousands of times faster than the entire human race.

Quick example: In order to get the passwords this is what an AI can do.

1: build tiny robots. Something resembling a fruit fly.

2: robot fruit fly waits for appropriate person to use password.

3: see what password was used

4: mission accomplished

This is just one way. I'm sure the AI can figure out simpler ways to get the passwords. Tell me this isn't godlike. It is like arguing that we'll never go to the moon.

If evolution was able to invent intelligence so can we. And it will be a god. It is only a matter of time.

And I'm going to build it. (Evil Maniacal Laugh)


Why will an artificial general intelligence automatically be "thousands of times faster than the entire human race"? Are you assuming that this theoretical AI algorithm is trivial in terms of computational complexity?


I'm assuming we can just throw more hardware at the problem. I expect strong AI to be highly parallelizable, which will make it quite easy to scale up.

Yes, I think it is trivial. At least it will be in hindsight. The brain exists, it is proof that AI is possible just as birds were proof that flight was possible. We just need to discover the easiest way to implement it.


Look at his username. He's a trollacter.


Human brains are only so powerful. There are thoughts that literally cannot happen in a human brain due to size and speed limitations, and thus can never be thought by humans. AIs would have a much higher information density, approaching the theoretical limits for our universe.

This alone is enough to reconsider the argument you are making.

While an AI wouldn't have the power to fundamentally change the universe or defy computational complexity -- what they could do would be near enough to godlike in comparison to humans that such a fact barely matters.


The problem isn't that "AI" would have unlimited power. The problem is that foolish people want to put non-interpretable machine-learning models in charge of expensive and valuable stuff, because they see a model that sorta-if-you-squint-your-eyes seems clever-ish and hype themselves into calling it "AI".


I wonder what exactly goes wrong when people think there are only two possibilities. Black and white aren't the only colors.


What I find interesting is that the auras that I get as a precursor to my migraines don't look far off from those images.

If you were to take a C shaped slice and overlay it onto video and then animated the color you would pretty much have it.

It would be interesting to see if they were connected somehow...


Sounds like you are describing "scintillating scotoma", if you wanted a name for that visual distortion.

https://en.wikipedia.org/wiki/Scintillating_scotoma


That's interesting, it also resembles the patterns I see on LSD or 2C-B


Yes, same here!

I get the vivid flickering lights occasionally (in a C shaped slice as you describe), but luckily don't get the headaches.


What would happen if you generated a lot of images like this, tagged them as "conv-net filter", and then added them to the set of images that the neural nets trained on?

Would the network learn more discriminating filters for everything else?


Very interesting question and worth an experiment or two. Personally, I don't think it would help, because the patterns are complex and they are an internal representation. It would be like feeding it hash values and labelling them "hash values". That wouldn't necessarily help you discriminate pre-hashed values. That said, I would definitely be interested in the results of a real test.
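The hash analogy can be made concrete with a small sketch. It uses SHA-256 as a stand-in for "a hash" (an assumption, since the comment names no particular function) and two arbitrary example strings that differ by one character. The analogy to filter visualizations is loose: the sketch only shows why hash outputs carry little usable signal about how similar their inputs were.

```python
import hashlib

def sha256_bits(text):
    """SHA-256 digest of text as a 256-character string of 0s and 1s."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return "".join(format(byte, "08b") for byte in digest)

def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

# Two inputs that differ by a single character (arbitrary example strings).
bits_a = sha256_bits("magpie")
bits_b = sha256_bits("magpif")

# For a well-behaved hash this lands near half of the 256 bits.
differing_bits = hamming(bits_a, bits_b)
```

Near-identical inputs produce digests that look unrelated, so a classifier trained on digests has nothing to latch onto that relates back to the inputs.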


Just really curious: what is up with ML on HN today? So many posts about ML in a single day.

After reading this article I have to say even humans have a hard time actually understanding images and patterns like the ones shown in the article, let alone a machine. I wonder what a machine would say about the famous "is this dress blue or grey" photo from last year.


Machines do not have to think or see in the same way as humans. To dismiss machine capabilities simply because they do not resemble human intelligence seems hopelessly naive.


    we are merely standing on the first step of a very tall ladder.
...

    Deep learning may not be intelligence in any real sense, but it's still working considerably better than anybody could have anticipated just a few years ago. Now, if only we understood why... ;-)

Yes, I'd agree with this. There needs to be a deeper analysis of how cognitive functions work within the brain, for example how vision works. Such understanding will clearly take a great amount of effort to achieve, and I don't know if enough research is aimed in this deeper theoretical direction, as opposed to neural-net-type machine learning.


A little off-topic.

There is something interesting to be said about the kind of awareness with which a future AI (or whatever we should call it) will model the world.

Imagine what kind of perspectives are possible when thousands or millions of input sources are your senses.


A little pedantic: you already have thousands of different input sources, from the classic "five senses" to less conscious awareness of things like your limb placement and internal biochemistry (e.g., hunger). There are even people who've "added" artificial senses by doing things like placing magnets under their skin (to sense magnetic fields) or strapping a device to their leg that buzzes in the direction of North. Not to mention the fact that we can use things like visually inspecting a screen to access senses (via information feeds) that are not available biologically.

However, you're right that a robot/AI developed with the intention of feeding it all sorts of heterogeneous data will probably be able to process everything more effectively. In my own research (AI with a focus on reinforcement learning and robotics) I am sometimes surprised by how effective agents can be at making sense of their input streams. For example, an experiment will not go the way you expect because the robot can trivially solve a maze via sensing the current in the wiring beneath the floor.

Of course, there's a limit in terms of how effective raw information can be. Humans don't need to see ultraviolet wavelengths because in general the spectrum of ~350-700nm provides all the information we need, and the brain is good at finding the salient aspects of what we see. If you just connect a new sensor to a robot, it might improve its ability to understand the world, or do nothing at all, because it can't incorporate this new information into its representation effectively. Or it doesn't add anything new, or at least nothing that it couldn't have figured out from existing input streams.

For example, adding a stock ticker feed to your robot would probably not help it solve a particular task, unless your robot happens to be 50 feet tall and the task in question is "rampaging down Wall Street".


Hmm sure but we are talking about a completely different scale here.

I mean, as an AI you are connected to the whole planet's sensors, you have the whole world's knowledge in your possession, and you will be able to cross-reference it with what you are getting as input. You can prototype, do scenario planning on the fly, calculate, and so on. Furthermore, you are potentially getting inputs from other humans too, and most likely have the ability to control a number of things which again provide new input.


>Imagine what kind of perspectives are possible when thousands or millions of input sources are your senses.

Doesn't this exactly describe human sensory input? Though our brain is efficient by throwing out most of the data early on in the signal chain (as research has revealed in vision and auditory input). Will future AI also need to be as efficient?


Doesn't this exactly describe human sensory input?

Well, consider having a 360° array of 30 cameras all integrated into a perfect spherical sensory experience. It's something we can't really imagine experiencing natively, but it would be trivial for eBrains to coalesce visual systems that way from eBirth.

Our bodies have lots of low-bitrate sensors, like the billions of individual sensory nerves distributed throughout our bodies (and they are each individually addressable in the brain), but we don't think of "touch" as a sense to "computationalize" like vision or sound or language.

One amusing thing about AI sensors: nobody ever talks about superhuman smell. Where are the quantum AI noses?


Biochemical sensors, like what that discredited blood test company Theranos tried to build, are a sort of superhuman smell.


But at a completely different scale. You have to imagine a whole planet's sensors as your sources, combined with the whole planet's knowledge and so on.

I don't think it would make much sense to compare with the limited POV we are experiencing the world from.


Deep neural nets do precisely this. Though the specifics change from implementation to implementation, in general there is a large drop in the number of hidden units at each layer of the architecture.
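As a rough illustration, here is a sketch that counts the activations left after each pooling stage of VGG16 (the network used in the article), assuming the standard 224x224 RGB input. The block layout is the published VGG16 configuration; the function and variable names are made up for this example.

```python
# VGG16 convolutional blocks: (output channels, number of conv layers in the block).
VGG16_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

def activations_after_each_pool(input_size=224, blocks=VGG16_BLOCKS):
    """Number of units in the feature map after each block's 2x2 max-pool."""
    size = input_size
    counts = []
    for channels, _num_convs in blocks:
        size //= 2  # each block ends with a stride-2, 2x2 max-pool
        counts.append(size * size * channels)
    return counts

input_units = 224 * 224 * 3         # raw pixel values fed to the network
first_conv_units = 224 * 224 * 64   # output of the very first conv layer
counts = activations_after_each_pool()
# counts == [802816, 401408, 200704, 100352, 25088]
```

One caveat: the first conv layer actually expands the 150,528 input values to about 3.2 million activations. The drop starts after that, roughly halving per block over the first four pooling stages and quartering at the fifth, before the 4096-unit fully connected layers.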


It's striking how similar CNNs are to good old JPEG compression. Makes me think you would get better results by chopping up the input into blocks and slapping a few more layers on top. From the article's description I get that VGG16 tries to guess the whole picture's content from the existence of particular patterns, totally ignoring their location and arrangement, hence the nonsensical magpie prediction. Photoshop a magpie with a beak sticking out its ass and it will pass as genuine, because all the prerequisite patterns are there.


If you took every image agreeing with your conception of a magpie and composed it into one image, would it look anything like a magpie? The Google Inceptionism team did the same experiment but supplied the networks with natural image priors. Perhaps hand-picked, but those examples were actually quite convincing.



