Cool images, and nice explanations, thanks for posting! Couple of questions: 1. ...

strikingloo · on May 30, 2020

Hi, those are really good questions! I'll do my best.

1_ I started with a small image and kept upscaling it because the literature said convergence was faster and the results were "better" that way, but to be honest I never even tried starting from a big image. I just kinda took their word for it.

For a (not too academic, but very well put) example, see this: https://towardsdatascience.com/how-to-visualize-convolutiona... If you start with a big image, you get many small copies of whatever "shape" the filter responds to. If you start with a small image and keep upscaling, you get fewer copies of the shape, each of them bigger.

That makes for subjectively cooler visuals, and that's pretty much it.
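The "start small and keep upscaling" schedule can be sketched in NumPy as below. This is a toy, not the code from the post: it assumes a single hand-picked 3x3 vertical-edge kernel on a grayscale image in place of a real CNN filter, and the helper names (`visualize`, `upscale`), the 1.4 scale factor and the step sizes are all hypothetical choices. Each round runs a few steps of gradient ascent on the mean ReLU response, then enlarges the image so the patterns found so far become bigger relative to the kernel.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """'Valid' 2-D cross-correlation of a grayscale image with a small kernel."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * img[i:i + oh, j:j + ow]
    return out

def objective_and_grad(img, kernel):
    """Mean ReLU activation of one filter over the whole image, and its gradient."""
    pre = conv2d_valid(img, kernel)
    loss = np.maximum(pre, 0.0).mean()
    d_pre = (pre > 0).astype(float) / pre.size  # d(mean ReLU)/d(pre-activation)
    grad = np.zeros_like(img)
    oh, ow = pre.shape
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            grad[i:i + oh, j:j + ow] += kernel[i, j] * d_pre
    return loss, grad

def upscale(img, factor):
    """Nearest-neighbour resize by a (possibly non-integer) factor."""
    h, w = img.shape
    new_h, new_w = int(round(h * factor)), int(round(w * factor))
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[np.ix_(rows, cols)]

def visualize(kernel, start_size=16, octaves=4, scale=1.4, steps=20, step_size=0.05, seed=0):
    """Gradient ascent on the image, upscaling it between rounds ("octaves")."""
    rng = np.random.default_rng(seed)
    img = 0.4 + 0.2 * rng.random((start_size, start_size))  # grey noise
    history = []
    for octave in range(octaves):
        if octave > 0:
            img = upscale(img, scale)  # patterns found so far get bigger
        for _ in range(steps):
            loss, grad = objective_and_grad(img, kernel)
            history.append(loss)
            grad /= np.sqrt(np.mean(grad ** 2)) + 1e-8  # normalise the step length
            img = np.clip(img + step_size * grad, 0.0, 1.0)
    history.append(objective_and_grad(img, kernel)[0])
    return img, history

edge = np.array([[-1.0, 0.0, 1.0]] * 3)  # hypothetical vertical-edge filter
img, history = visualize(edge)
print(img.shape, round(history[0], 3), round(history[-1], 3))
```

Calling `visualize(edge, start_size=43, octaves=1)` instead gives the "big image from the start" variant at the same final size, for comparison.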

2_ I'm not entirely sure I understand this one. Don't the filters apply to all three (RGB) channels simultaneously? Or do you mean that a filter refers to the transformation applied to a single channel rather than to a whole pixel?

As for ease of visualization, it's actually kind of hard to display all of the filters in a visually interesting way that isn't just "Here are the 240 filter images I made with this layer", so I had to make a trade-off between 'here are some of the cooler images' and 'I want every layer I tried to be represented'. I'm still fairly new to presenting visual information of this kind, so if you have any advice on how to display more of the images, I'm all ears.
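On the channel question: in a standard 2-D convolution, each filter spans every input channel at once, so a first-layer filter applied to an RGB image is 3 (channels) x 3 x 3 and produces one feature map, not three. A minimal NumPy sketch, assuming channels-first arrays, 'valid' padding and random weights in place of trained ones (`conv_single_filter` is a hypothetical helper):

```python
import numpy as np

def conv_single_filter(x, w):
    """Apply ONE filter w of shape (C_in, kh, kw) to x of shape (C_in, H, W).

    Products are summed over all input channels, so the result is a single
    2-D feature map of shape (H - kh + 1, W - kw + 1).
    """
    c_in, kh, kw = w.shape
    assert x.shape[0] == c_in, "the filter must span every input channel"
    oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((oh, ow))
    for c in range(c_in):
        for i in range(kh):
            for j in range(kw):
                out += w[c, i, j] * x[c, i:i + oh, j:j + ow]
    return out

rng = np.random.default_rng(0)
rgb = rng.random((3, 8, 8))          # one 8x8 RGB image, channels first
w = rng.standard_normal((3, 3, 3))   # one first-layer filter: 3 channels x 3x3
fmap = conv_single_filter(rgb, w)
print(fmap.shape)  # (6, 6): one map, not one per colour channel

# Equivalent view: a 3x3 cross-correlation per channel, then a sum.
per_channel = [conv_single_filter(rgb[c:c + 1], w[c:c + 1]) for c in range(3)]
print(np.allclose(fmap, sum(per_channel)))  # True
```

A layer with 64 filters produces 64 such maps, so each filter of the following layer spans 64 input channels.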

Have a nice weekend!

p1esk · on May 30, 2020

1. OK, makes sense. However, this is something to keep in mind when we try to understand how NNs work. In this case, the features that excite the filters the most might be even more "texture-like" than what you produced. This could be important if, say, we wanted to argue that NNs pay more attention to textures than to shapes.

2. Let me try to explain this better. If we want to optimize the inputs to the first layer, these inputs are actual RGB images. So what you're doing in line 35 is correct:

  input_img_data = np.random.random((1, 3, intermediate_dim[0], intermediate_dim[1]))

but in the next layer, the inputs have more than 3 channels. So you should be optimizing inputs of the shape that layer expects (replacing 3 with the number of input channels of that layer). If you do it like that, then the image you want to visualize (the input that excites the outputs of an intermediate layer the most) will have more than 3 channels, and therefore it's not clear whether we should pick arbitrary triplets of channels to use as RGB channels, or visualize each channel (each input feature map) individually, in which case it would have no color.
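The alternative described above (optimizing the layer's input directly, with the 3 replaced by the layer's input channel count) can be sketched as follows. Assumptions: channels-first arrays with the batch dimension dropped, one randomly initialized 3x3 filter standing in for a trained one, 64 input channels, the mean-ReLU-activation objective, and hand-written gradients; `conv_one_filter` and `objective_and_grad` are hypothetical helpers.

```python
import numpy as np

def conv_one_filter(x, wk):
    """Pre-activation of one filter wk (C_in, kh, kw) over x (C_in, H, W), 'valid' padding."""
    _, kh, kw = wk.shape
    oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            # contract over all input channels at this kernel offset
            out += np.einsum("c,chw->hw", wk[:, i, j], x[:, i:i + oh, j:j + ow])
    return out

def objective_and_grad(x, wk):
    """Mean ReLU output of filter wk over all positions, and its gradient w.r.t. x."""
    pre = conv_one_filter(x, wk)
    loss = np.maximum(pre, 0.0).mean()
    d_pre = (pre > 0) / pre.size
    grad = np.zeros_like(x)
    oh, ow = pre.shape
    for i in range(wk.shape[1]):
        for j in range(wk.shape[2]):
            grad[:, i:i + oh, j:j + ow] += wk[:, i, j][:, None, None] * d_pre
    return loss, grad

rng = np.random.default_rng(0)
c_in, size = 64, 16                            # hypothetical: the layer's input has 64 channels
wk = 0.1 * rng.standard_normal((c_in, 3, 3))   # one filter of that layer (random stand-in)
x = rng.random((c_in, size, size))             # was (3, H, W) for RGB; now C_in feature maps
start, _ = objective_and_grad(x, wk)
for _ in range(50):
    _, grad = objective_and_grad(x, wk)
    step = 0.5 * grad / (np.abs(grad).max() + 1e-8)
    x = np.maximum(x + step, 0.0)              # inputs coming out of a ReLU are non-negative
final, _ = objective_and_grad(x, wk)
print(x.shape, start, final)
```

The optimized `x` has 64 channels, which is exactly the rendering problem raised above: nothing in it says which three channels, if any, should become R, G and B.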

I guess my question should be: when you show optimized inputs to the second or third VGG layers, where does the color information come from? Also, you probably shouldn't call them "filters", because the filters in this context are the actual convolutional layer weight kernels, each of which is 3x3 in size.

strikingloo · on May 31, 2020

1_ I hadn't thought of that. To be honest, my main concern was generating the most aesthetically interesting pictures I could. However, the "textures" you'd get from optimizing for the last layers would really just be repeating shapes, albeit maybe a bit smaller than the ones in my post. Kinda like zooming out.

2_ Oh OK, I get it now. But that's not what the program is doing. I don't generate a vector for the single layer I am maximizing; I generate an image and run it through the whole CNN, from the input all the way to that layer. Then I optimize for that layer's output.

Edit to add: As stated in the post, when I "maximize for a filter", the exact function I am maximizing is the average output for that filter over the whole image.
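A minimal sketch of the approach described here: the only variable is a 3-channel image, which is run through every layer up to the target one, and the objective is the mean of one filter's output over all positions. Because the tensor being optimized is the RGB image itself, the result has color even when the target filter sits in a deeper layer. Assumptions: a hypothetical two-layer stack of random 3x3 conv + ReLU layers stands in for VGG, arrays are channels-first without a batch dimension, and gradients are written out by hand instead of using a framework's autodiff.

```python
import numpy as np

def conv(x, w):
    """'Valid' multi-channel conv. x: (C_in, H, W), w: (C_out, C_in, kh, kw)."""
    _, _, kh, kw = w.shape
    oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((w.shape[0], oh, ow))
    for i in range(kh):
        for j in range(kw):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], x[:, i:i + oh, j:j + ow])
    return out

def conv_backward(d_out, w, x_shape):
    """Gradient w.r.t. a conv's input, given d_out = dLoss/d(conv output)."""
    _, _, kh, kw = w.shape
    oh, ow = d_out.shape[1:]
    dx = np.zeros(x_shape)
    for i in range(kh):
        for j in range(kw):
            dx[:, i:i + oh, j:j + ow] += np.einsum("oc,ohw->chw", w[:, :, i, j], d_out)
    return dx

def objective_and_grad(img, weights, k):
    """Run img through every conv+ReLU layer in `weights`; objective is the
    mean output of filter k of the last layer over all positions."""
    inputs, pres, x = [], [], img
    for w in weights:                        # forward pass, keeping what backprop needs
        inputs.append(x)
        pres.append(conv(x, w))
        x = np.maximum(pres[-1], 0.0)
    loss = x[k].mean()
    d = np.zeros_like(x)
    d[k] = 1.0 / x[k].size                   # d(mean of channel k)/d(activations)
    for w, x_in, pre in zip(reversed(weights), reversed(inputs), reversed(pres)):
        d = d * (pre > 0)                    # back through the ReLU
        d = conv_backward(d, w, x_in.shape)  # back through the conv
    return loss, d

rng = np.random.default_rng(0)
weights = [0.3 * rng.standard_normal((8, 3, 3, 3)),    # hypothetical layer 1: RGB -> 8
           0.3 * rng.standard_normal((16, 8, 3, 3))]   # hypothetical layer 2: 8 -> 16
img = rng.random((3, 20, 20))                # the only thing optimized: an RGB image
start, _ = objective_and_grad(img, weights, k=5)
for _ in range(30):
    _, grad = objective_and_grad(img, weights, k=5)
    img = np.clip(img + 0.05 * grad / (np.abs(grad).max() + 1e-8), 0.0, 1.0)
final, _ = objective_and_grad(img, weights, k=5)
print(img.shape, start, final)
```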

p1esk · on May 31, 2020

Oh interesting. Then I have more questions :)

1. If you try to optimize an image to maximize the outputs of all layers, what would it look like?

2. After you optimize an image to maximize some layer's outputs, what do those outputs look like?

3. If you try to do what I thought you were doing (optimizing not the RGB image, but the individual layer inputs, i.e. the feature maps), what would those feature maps look like? You could either plot each individual feature map, or pick the top 3 with the strongest signals and combine them as RGB channels.
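Both display options from item 3 can be sketched in NumPy. Assumptions: the optimized feature maps arrive as a channels-first array of shape (C, H, W); "strongest signal" is measured as mean absolute value (the max or the L2 norm would be equally defensible); each map or color channel is min-max normalized independently. The function names and the random stand-in data are hypothetical.

```python
import numpy as np

def tile_grid(maps, ncols=8, pad=1):
    """Lay out C single-channel maps (C, H, W) as one 2-D mosaic, each map
    min-max normalised to [0, 1] so it can be shown as a grayscale image."""
    c, h, w = maps.shape
    nrows = -(-c // ncols)                      # ceiling division
    grid = np.zeros((nrows * (h + pad) - pad, ncols * (w + pad) - pad))
    for idx in range(c):
        m = maps[idx]
        m = (m - m.min()) / (m.max() - m.min() + 1e-8)
        r, col = divmod(idx, ncols)
        grid[r * (h + pad):r * (h + pad) + h, col * (w + pad):col * (w + pad) + w] = m
    return grid

def top3_as_rgb(maps):
    """Pick the 3 channels with the largest mean absolute value and stack them
    as R, G, B in an (H, W, 3) array, each normalised to [0, 1]."""
    strength = np.abs(maps).mean(axis=(1, 2))
    top = np.argsort(strength)[::-1][:3]        # strongest first
    rgb = np.stack([maps[i] for i in top], axis=-1)
    lo, hi = rgb.min(axis=(0, 1)), rgb.max(axis=(0, 1))
    return (rgb - lo) / (hi - lo + 1e-8), top

rng = np.random.default_rng(0)
maps = rng.random((64, 16, 16))                 # stand-in for 64 optimised feature maps
maps[[7, 20, 41]] *= 5                          # make three channels clearly strongest
grid = tile_grid(maps)
rgb, top = top3_as_rgb(maps)
print(grid.shape, rgb.shape, sorted(top.tolist()))
```

`grid` can be shown with matplotlib's `plt.imshow(grid, cmap="gray")` and `rgb` with `plt.imshow(rgb)`. A mosaic like this is also one way to show all 240 images from a layer at once.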