
Feature Visualization with Convolutional Neural Networks (in TensorFlow) - strikingloo
https://www.datastuff.tech/machine-learning/feature-visualization-convolutional-neural-networks-keras/
======
p1esk
Cool images, and nice explanations, thanks for posting!

Couple of questions:

1\. Why do you have to start with a small initial image? What happens if you
start optimizing a large image?

2\. Intermediate layers typically have lots of feature maps (64-512). Each
individual feature map can be thought of as a color channel. In your
experiments, you always optimize triplets of feature maps (I assume you use
the same code for inputs to every layer). So this is not exactly how it's
happening during normal training - there's no easy way to visualize an image
which has 512 "color" channels, right? Other than looking at each individual
feature map (where there would be no colors).

~~~
strikingloo
Hi, those are really good questions! I'll do my best.

1_ I started with a small one and then kept rescaling because the literature
said convergence was faster and the results were "better" that way, but to be
honest I didn't even try starting from a big image. I just kinda took their
word for it.

For a (not too academic, but very well put) example, see this: [https://towardsdatascience.com/how-to-visualize-convolutional-features-in-40-lines-of-code-70b7d87b0030](https://towardsdatascience.com/how-to-visualize-convolutional-features-in-40-lines-of-code-70b7d87b0030) If you start with a
big image, you get whatever "shape" the filter is generating, many times, but
kind of small. If you start with a small image and keep rescaling, you get
fewer of the shapes, each of them bigger.

That makes for subjectively cooler visuals, and that's pretty much it.
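The "start small, then upscale and re-optimize" schedule being described could be sketched roughly like this (a hedged sketch, not the post's actual code; the starting size, scale factor, and octave count are illustrative assumptions, and `optimize_step` stands in for one round of gradient ascent at the current size):

```python
import tensorflow as tf

def progressive_optimize(img, optimize_step, start=56, scale=1.2, n_octaves=4):
    """img: (1, h, w, 3) tensor. Upscale between optimization rounds.

    Assumed schedule: optimize at `start` px, enlarge by `scale`, repeat.
    """
    size = start
    for _ in range(n_octaves):
        img = tf.image.resize(img, (size, size))
        img = optimize_step(img)          # one round of gradient ascent
        size = int(round(size * scale))   # next octave is `scale`x larger
    return img
```

The intuition from the thread: optimizing at a small size first locks in a few large pattern instances, and each upscale lets later rounds refine them instead of tiling many small copies.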

2_ I'm not entirely sure I get this one. Don't the filters apply to the three
(RGB) channels simultaneously? Or do you mean a filter actually refers to the
transformation for a single channel rather than the one to a whole pixel?
Other than that, with regards to the ease of visualization, it's actually kind
of hard to display all of the filters in a visually interesting way that isn't
just "Here are the 240 filter images I made with this layer", so I had to make a
trade-off between 'here are some of the cooler images' and 'I want each layer
I tried to be represented in this'. I'm still sort of new to presenting visual
information of this kind, so if you have any advice on how to display more of
the images, I'm all ears.

Have a nice weekend!

~~~
p1esk
1\. Ok, makes sense, however this is something to keep in mind when we try to
gain understanding of how NNs work. In this case, the actual features which
excite the filters the most might be even more "texture-like" than what you
produced. This could be important, if, say, we wanted to argue that NNs pay
more attention to textures than to shapes.

2\. Let me try to explain this better. If we want to optimize the inputs to
the first layer, these inputs are actual RGB images. So what you're doing in
line 35 is correct:

      input_img_data = np.random.random((1, 3, intermediate_dim[0], intermediate_dim[1])) 

but in the next layer, the inputs have more channels than 3. So you should be
optimizing images of the shape the layer expects (replacing 3 with the number
of channels of that layer). If you do it like that, then the image you want to
visualize - the image that excites the outputs of an intermediate layer the
most - will have more channels than 3, and therefore it's not clear whether we
should pick arbitrary triplets of channels to use as RGB channels, or if we
should visualize each channel (each input feature map) individually, in which
case it would have no color.
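A minimal sketch of the shape change being suggested here (the channel count 64 and the 56x56 spatial size are illustrative assumptions, matching the channels-first layout of the snippet quoted above):

```python
import numpy as np

# To optimize an intermediate layer's *inputs* (its feature maps) directly,
# the noise tensor needs that layer's channel count instead of 3.
# 64 channels and 56x56 are assumed values for illustration only.
n_channels, h, w = 64, 56, 56
feature_maps = np.random.random((1, n_channels, h, w))  # no longer an RGB image
```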

I guess my question should be: when you show optimized inputs to the second or
third VGG layers, where does the color information come from? Also, you
probably shouldn't call them "filters", because the filters in this context
are the actual convolutional layer weight kernels, and are 3x3 in size (each
filter).

~~~
strikingloo
1_ I hadn't thought of that. To be honest, my main concern was generating the
most aesthetically interesting pictures I could. However, the "textures" you'd
get from optimizing for the last layers would really be just repeating shapes,
albeit maybe a bit smaller than you're seeing in my post. Kinda like zooming
out.

2_ Oh ok, I get it now. But that's not what the program is doing. I don't
generate a vector for the single layer I am maximizing, I generate an image
and run the whole CNN from input to that layer, all through. Then I optimize
for that layer's output.

Edit to add: As stated in the post, when I "maximize for a filter", the exact
function I am maximizing is the average output for that filter over the whole
image.
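The loop described here - generate an RGB image, run the whole CNN up to the target layer, maximize the average output of one filter by gradient ascent on the image - could be sketched like this (a hedged approximation, not the post's exact code; the learning rate, step count, and the tiny model used below are assumptions):

```python
import tensorflow as tf

def maximize_filter(feature_extractor, filter_index, size=64, steps=30, lr=10.0):
    """Gradient ascent on a channels-last RGB image to maximize the mean
    activation of one filter of `feature_extractor`'s output layer."""
    # Start from low-contrast noise (an assumed init, common in this setting).
    img = tf.Variable(tf.random.uniform((1, size, size, 3)) * 0.25 + 0.4)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            activations = feature_extractor(img)      # (1, h, w, n_filters)
            # The objective from the post: mean output of one filter.
            loss = tf.reduce_mean(activations[..., filter_index])
        grads = tape.gradient(loss, img)
        grads = grads / (tf.norm(grads) + 1e-8)       # normalize step size
        img.assign_add(lr * grads)                    # ascent, not descent
    return img.numpy()[0]
```

In the post's setting, `feature_extractor` would be a model truncated at the layer of interest (e.g. a VGG sub-model), so the gradients flow all the way back to the RGB input.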

~~~
p1esk
Oh interesting. Then I have more questions :)

1\. If you try to optimize an image that would maximize all layers' outputs,
what would it look like?

2\. After you've optimized an image to maximize some layer's outputs, what do
those outputs look like?

3\. If you try to do what I thought you were doing - optimizing not the RGB
image, but the individual layer inputs (feature maps) - what would those
feature maps look like? You could either plot each individual feature map, or
pick the top 3 with the strongest signals and combine them as RGB channels.
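The "pick top 3 and combine them as RGB channels" idea in question 3 could be sketched like this (a hypothetical helper, assuming channels-last `(h, w, c)` feature maps; "strongest signal" is taken here to mean highest mean activation):

```python
import numpy as np

def top3_as_rgb(feature_maps):
    """Stack the 3 feature maps with the strongest mean activation as RGB.

    feature_maps: (h, w, c) array of activations. Returns an (h, w, 3)
    image normalized to [0, 1] and the indices of the chosen channels.
    """
    means = feature_maps.mean(axis=(0, 1))       # mean signal per channel
    top3 = np.argsort(means)[-3:][::-1]          # strongest channel first
    rgb = feature_maps[..., top3].astype(np.float64)
    rgb -= rgb.min()
    rgb /= rgb.max() + 1e-8                      # normalize to [0, 1]
    return rgb, top3
```

The alternative the comment mentions - plotting each feature map individually - would just be a grayscale `imshow` per channel, with no color by construction.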

