
Do neural networks dream of electric sheep? - sp332
http://aiweirdness.com/post/171451900302/do-neural-nets-dream-of-electric-sheep
======
pitchups
This example, and others like it, point to the central weakness of neural
networks for image recognition: no matter how much data you feed them, they
never really develop concepts or abstractions of what the objects they are
classifying actually represent or mean. The weights and biases that get
fine-tuned by gradient descent are no more than a highly complex function
mapping the input pixels to discrete classes. While this may well represent
how the visual cortex works at the lowest level, what appears to be missing
are higher levels of abstraction and meaning. Perhaps machine learning needs
to be coupled with some of the older paradigms of AI, which included modeling,
logic, and reasoning, to achieve understanding. As of right now, a well-trained
convolutional neural network is no more than a mechanical pattern-matching
algorithm on steroids.

~~~
joe_the_user
_This example, and others like it, point to the central weakness of neural
networks for image recognition: no matter how much data you feed them, they
never really develop concepts or abstractions of what the objects they are
classifying actually represent or mean._

This is an excellent point, but it begs for an answer to the question "what
does '_really mean_' mean?" What are all the ways a human can determine what
a picture "really means", and which of those methods can be used for a given
picture?

We know dogs have certain shapes and goats have certain shapes. Other entities
have different characteristics. We can explain how we think we reach
conclusions. How we actually reach those conclusions is likely different, and
may or may not involve "pattern matching on steroids" in a given case. What's
more definite is that we try to reconcile our conclusions across examples so
they form a single consistent picture of the world. Is that what determining
"what a picture really means" amounts to?

~~~
fizx
Meaning exists in relationships, which it's clear the current generation
of AI learns. An example is word2vec, which can learn that king - man + woman
= queen, and simultaneously king - man + boy = prince, etc.
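
For the curious, this analogy arithmetic is easy to reproduce. A minimal
sketch using gensim and a small set of pretrained GloVe vectors (which support
the same kind of arithmetic; the model name below is just one of gensim's
downloadable options):

    import gensim.downloader as api

    # Downloads a small pretrained embedding on first use.
    model = api.load("glove-wiki-gigaword-100")

    # "king - man + woman" should rank "queen" near the top...
    print(model.most_similar(positive=["king", "woman"],
                             negative=["man"], topn=3))

    # ...and "king - man + boy" should rank "prince" near the top.
    print(model.most_similar(positive=["king", "boy"],
                             negative=["man"], topn=3))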

The current generation of image recognition is really missing an understanding
of physics and 3d space. There's no understanding of what would happen if a
dog moves its head around.

The next generation of algorithms might fix this. Some people are excited
about "capsule networks", which are supposed to learn features that remain
stable under significant rotation instead of breaking.

~~~
Scene_Cast2
I haven't seen any follow-ups on capsule networks since their big splash half
a year ago. I'm guessing follow-up projects have a research latency of a year.

------
minimaxir
@picdescbot is a Twitter bot which uses the same Microsoft Azure Computer
Vision API endpoint for image captioning:
[https://twitter.com/picdescbot](https://twitter.com/picdescbot)

It's more accurate than you'd think, and as this article notes, the mistakes
are indeed funny:
[https://twitter.com/picdescbot/status/968561437126938625](https://twitter.com/picdescbot/status/968561437126938625)
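
For reference, hitting that captioning endpoint yourself is a few lines. A
rough sketch against the Computer Vision "describe" API (the region, API
version, key, and image URL below are all placeholders; the bot may well use
a different version of the endpoint):

    import requests

    # Placeholders: substitute your own region, key, and image URL.
    endpoint = "https://westus.api.cognitive.microsoft.com/vision/v3.2/describe"
    headers = {
        "Ocp-Apim-Subscription-Key": "YOUR_KEY_HERE",
        "Content-Type": "application/json",
    }
    body = {"url": "https://example.com/sheep.jpg"}

    resp = requests.post(endpoint, headers=headers, json=body,
                         params={"maxCandidates": 1})
    resp.raise_for_status()
    for caption in resp.json()["description"]["captions"]:
        print(caption["text"], caption["confidence"])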

------
GuiA
It's okay, humans make these kinds of mistakes too.

A friend of mine has a young son (~2-3 years old), and a cat named Mono.

Her son knows the cat is named Mono - he plays with her every day.

But when they go out for a walk, any 4 legged animal he sees is also a "Mono".

Fortunately for her son, his developing, extremely plastic brain will soon
know how to differentiate Mono the cat from a random dog on the street (unlike
neural networks which we will need to entirely redesign, rebuild, and retrain
to get similar progress).

~~~
DFHippie
It could be that your friend's son has an overly generous sense of what Mono
is. It could be that he knows precisely which individual entity is Mono but
misunderstands what "Mono" denotes/refers to. It could be somewhere in
between.

If a neural network has a representation of entities in the world apart from
language referring to those entities, that would be awesome. I'm guessing
we're not there yet, though.

~~~
_emacsomancer_
It could also just be limited vocabulary and that his outside use of "Mono"
means something like "a creature which is somewhat similar to Mono".

~~~
DFHippie
It could be he knows who Mono is and he knows "Mono" just refers to Mono, but
there are a lot of other things in the world that he wants to talk about, and
he knows the grownups will be able to piece things together. I do that often
enough. "Fred or whoever, that guy, did that thing ..." If neural networks
were doing _that_, that would be more awesome still.

------
danbruc
That left me a bit disappointed about the current capabilities of neural
networks for object identification - I assume the tested detectors are at
least somewhat state of the art. But thinking about it also makes it
somewhat obvious that they are flawed in this way.

Humans identify objects by looking at how different parts are geometrically
located and connected, possibly in a hierarchical fashion, and what basic
shapes, colors and textures those parts have. A sheep is a body, four legs,
hooves, a tail, a head with its characteristic shape, ears, mouth and nose,
and all of those come with characteristic textures and colors.

And because there are so many features and relations between them, it is
quite hard to fool humans: you can hide or change quite a few of them. We
also have a lot of background knowledge; a bright orange sheep might be
unusual, but we also have a pretty good idea of how hard it is to change the
color of a sheep.

I naively expected neural networks to also learn those features, but there is
just no pressure for them to do so. They mostly see common objects in common
situations, and there, just looking for a patch of wool-colored, wool-textured
fur might be enough to identify the presence of sheep correctly almost all the
time. Or, if sheep are mostly depicted in a characteristic environment, it
might be good enough to just identify landscape features and ignore the sheep
altogether.
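
This is easy to probe yourself. A minimal sketch using torchvision's
pretrained ResNet-50 as a stand-in classifier (the image path is a
placeholder; feed it an empty field, or an orange-dyed sheep, and see what
comes back):

    import torch
    from torchvision import models, transforms
    from PIL import Image

    # Pretrained ImageNet classifier, inference mode.
    model = models.resnet50(pretrained=True).eval()

    # Standard ImageNet preprocessing.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    img = preprocess(Image.open("field_without_sheep.jpg")).unsqueeze(0)
    with torch.no_grad():
        probs = model(img).softmax(dim=1)
    top = probs.topk(5)
    print(top.indices, top.values)  # map indices to ImageNet labels to inspect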

I would guess that it is in general not really feasible to come up with enough
contrived, adversarial examples to force neural networks to learn the
important parts and relations of different objects just by staring at many
images. I think one would have to hard-wire some knowledge about space,
spatial relations, occlusion, shapes and the like into a system to really get
it to learn what a sheep is in a way similar to how humans do, without heavily
increasing the risk of over-fitting.

~~~
notahacker
Also, our grasp of sheep morphology is based on an abstract idea that the
sheep is qualitatively different to, and more significant than, the
surrounding fields and rocks which are roughly the same size, shape and
colour as sheep. Unlike the ML process, this concept of sheep (or of more
unfamiliar mammals) as animate creatures which might be friend, foe or food
exists independently of how many sheep I've seen before and of the language
used to classify them, and is probably instinctual rather than learned.

Show me a couple of images of fields full of sheep classified as _oveja_ or
_Schafe_ and I might make the same learning error as the ML process and think
the word refers to the [general pattern of] surrounding field or hills. But
show me a further image of _oveja_ outside a field - even a close up of an
_oveja_ that doesn't resemble those in field photos in any way - and I'll
grasp the meaning of the term straight away. Needless to say, I'm also less
likely to stumble over conceptual links between the names of animals and
tastes of food, types of clothing, etc., which are independent of the living
animal's morphology.

------
lucideer
Maybe I'm oversimplifying, but this seems like an obviously simple* problem to
solve: as humans, before we can identify what objects are, we start out with
depth perception and object detection. If NNs were trained on a dataset of
imagery where auto-detected object outlines were tagged, rather than simply
tagging or describing an entire image, and then run with built-in object
detection and depth perception, I suspect the results would be pretty good.

I know the likes of Google and Facebook are already doing precisely this with
human faces, but we'd need a more generalised object detection algorithm
before the examples of sheep in the article would be reliably identified.

* I used the word "simple", as distinct from "easy": for example, creating a training dataset might be a challenge.

~~~
fizx
Like most obvious things, scientists have tried it; it's known in the field
as "image segmentation". If you google it, you'll see a bunch of demos and
papers.
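
For a concrete starting point, here's a small sketch of off-the-shelf
instance segmentation with torchvision's pretrained Mask R-CNN (the image
path is a placeholder):

    import torch
    from torchvision import transforms
    from torchvision.models.detection import maskrcnn_resnet50_fpn
    from PIL import Image

    model = maskrcnn_resnet50_fpn(pretrained=True).eval()

    img = transforms.ToTensor()(Image.open("sheep_field.jpg"))
    with torch.no_grad():
        # Dict of boxes, labels, scores, and per-instance masks.
        out = model([img])[0]

    for label, score in zip(out["labels"], out["scores"]):
        if score > 0.5:
            print(int(label), float(score))  # COCO class ids; 20 is "sheep"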

I haven't seen anything in the literature about incredibly effective
implementations of 3D understanding and/or depth perception at the level you'd
perhaps hope, but there is some progress being made.

------
jquip
Categorical divisions and their names are still being refined at that age,
which is why the most familiar name gets applied to the entire group of
similar entities. Later on, both the categories and their names become more
refined.

------
wordpressdev
Reminds me of Philip K. Dick's book _Do Androids Dream of Electric Sheep?_,
which was made into the cult movie Blade Runner.

------
ibdf
This was a great read. Thank you!

