
ImageNet-trained CNNs are biased towards texture (2019) - zdw
https://openreview.net/forum?id=Bygh9j09KX
======
stared
They are, and it is not that new; the interesting part is how to solve that.

For image classification, this bias towards textures may be passable. For image
generation, it would produce atrocious results, especially for faces. So much
so that people have devised methods to overcome it, notably progressive
networks (e.g. ProGAN). See
[https://towardsdatascience.com/progan-how-nvidia-
generated-i...](https://towardsdatascience.com/progan-how-nvidia-generated-
images-of-unprecedented-quality-51c98ec2cbd2) for an explanation of ProGAN: it
starts by training on 4x4 pixel images, to get the overall shape.
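
To make the coarse-to-fine idea concrete, here is a minimal sketch of the
resolution schedule and of the downsampled training targets it implies. It
assumes the resolution doubles at each stage, from 4x4 up to a final size
(1024 is just a default here). The function names are hypothetical, and the
layer fade-in and the GAN training itself are left out entirely.

```python
def resolution_schedule(start=4, final=1024):
    """Resolutions visited when doubling from `start` up to `final`."""
    if start <= 0 or final < start:
        raise ValueError("need 0 < start <= final")
    sizes = []
    size = start
    while size <= final:
        sizes.append(size)
        size *= 2
    return sizes


def downsample(image, factor):
    """Average-pool a square grayscale image (list of rows) by `factor`.

    This stands in for preparing low-resolution targets for the early
    stages; any real pipeline would use an image library instead.
    """
    n = len(image) // factor
    return [
        [
            sum(
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ) / factor ** 2
            for c in range(n)
        ]
        for r in range(n)
    ]


if __name__ == "__main__":
    print(resolution_schedule())
```

At 4x4 almost no texture survives the averaging, so the earliest stages can
only fit the overall layout; finer detail becomes available as the
resolution grows.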

------
kzrdude
Texture is area-filling, while silhouette is "just a line", so it sounds
natural that texture areas would weigh more in an image? If there is both
texture and contour "signal" in the labeled inputs, how do you pick which gets
weighted more? This is supposed to be learned by training.
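
A rough way to put numbers on the "area versus line" intuition, assuming a
filled disc on a pixel grid as a stand-in for an object: count the pixels
inside the shape and the pixels on its edge. The helper below is
hypothetical and only counts pixels. It says nothing about what a network
actually learns from them.

```python
def disc_pixel_counts(radius):
    """Return (area, contour) pixel counts for a filled disc.

    A pixel is on the contour if it is inside the disc but at least one
    of its four direct neighbours is outside.
    """
    def inside(x, y):
        return x * x + y * y <= radius * radius

    area = 0
    contour = 0
    for y in range(-radius, radius + 1):
        for x in range(-radius, radius + 1):
            if not inside(x, y):
                continue
            area += 1
            neighbours = ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
            if not all(inside(nx, ny) for nx, ny in neighbours):
                contour += 1
    return area, contour


if __name__ == "__main__":
    for r in (5, 20, 80):
        area, contour = disc_pixel_counts(r)
        print(r, area, contour, round(contour / area, 3))
```

The contour grows roughly linearly with the radius while the area grows
quadratically, so the share of edge pixels shrinks as the object gets
larger.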

~~~
mbeex
I think this is one of the reasons to use many more video-based training sets,
with the frames in correct temporal order. A moving boundary is quite a
significant cue, but it should possibly not be inspected exclusively through a
hole ('convolutinal').

~~~
mbeex
Oh, forgot an 'o'.

~~~
mkl
Did you know you can edit your comments for two hours? Click "edit" above the
comment.

[https://github.com/minimaxir/hacker-news-
undocumented](https://github.com/minimaxir/hacker-news-undocumented)

~~~
mbeex
This is actually new to me. Thank you!

------
wodenokoto
My favorite ML article title [1], from 2015, seems to argue for the same
conclusion: that it is textures, not shapes, that drive recognition in CNNs.

[1] “Suddenly, a leopard print sofa appears”,
[http://rocknrollnerd.github.io/ml/2015/05/27/leopard-
sofa.ht...](http://rocknrollnerd.github.io/ml/2015/05/27/leopard-sofa.html)

~~~
modeless
> it is textures, not shapes, that drive recognition in CNNs

This is a misinterpretation of the paper. It's not CNNs that are biased toward
textures, it's the ImageNet dataset, as they demonstrate by training a
shape-biased CNN on a modified dataset.

------
angry_octet
I'm curious whether stereo vision and eyepoint movement help humans
distinguish shape from texture, and whether that could be a promising approach
for ML. Are there any datasets for that? There are car-focused ones like
KITTI, but are there any more general ones?

------
mbeex
Quanta article:

[https://www.quantamagazine.org/where-we-see-shapes-ai-
sees-t...](https://www.quantamagazine.org/where-we-see-shapes-ai-sees-
textures-20190701/)

------
jasdfijsflj
Style transfer is done using neural networks, so it should be natural that the
resultant image fools a neural network classifier. On the other hand, for the
filled silhouette, the classifier network was pretty close to human.

~~~
w_t_payne
I don't think that's what's happening here.

------
DeathArrow
What would be a better dataset to train CNNs?

~~~
kleiba
Pro tip: read more than just the headline.

