
You're right: we take the mean of the activations of a given channel `z` over all its `x,y` coordinates. (We could sum instead, but we use the mean so that step sizes are comparable between channel and neuron objectives.) Thanks for the feedback that this notation is not super clear; we will consider rewriting those expressions.
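
To make the mean-versus-sum point concrete, here is a minimal NumPy sketch. The `(height, width, channels)` activation layout, the random placeholder activations, and the function names are all assumptions made for illustration; they are not taken from the article's code.

```python
import numpy as np

def neuron_objective(acts, x, y, z):
    """Activation of a single neuron at position (x, y) in channel z."""
    return acts[x, y, z]

def channel_objective(acts, z):
    """Mean activation of channel z over all of its (x, y) positions."""
    return acts[:, :, z].mean()

def channel_objective_sum(acts, z):
    """Same as above but summed, so the value grows with the spatial size."""
    return acts[:, :, z].sum()

# Hypothetical layer output: 14x14 spatial grid, 8 channels, values in [0, 1).
rng = np.random.default_rng(0)
acts = rng.random((14, 14, 8))
z = 3

neuron_val = neuron_objective(acts, 7, 7, z)
mean_val = channel_objective(acts, z)
sum_val = channel_objective_sum(acts, z)

# The mean stays on the same scale as a single neuron's activation,
# while the sum is larger by a factor of height * width (196 here).
scale = sum_val / mean_val
```

Because the mean and the sum differ only by the constant factor `height * width`, their gradients point in the same direction; the mean just keeps the magnitude in the same range as a single-neuron objective.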

When we do feature visualization, we do start from a random point/noise. For the diagram showing steepest descent directions, however, the gradient is evaluated on an input image from the dataset, shown as the leftmost image. There's no real step size either, since we're only showing the direction. You can think of the scale as arbitrary and chosen for appearance.
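
As a rough illustration of "direction only, arbitrary scale", here is a toy NumPy sketch. The objective `tanh(<w, image>)`, the weights `w`, and the random stand-in for a dataset image are all hypothetical; the article's diagram uses a real network and a real image.

```python
import numpy as np

rng = np.random.default_rng(0)
size = 8 * 8

# Stand-in for a flattened input image from the dataset (not random-noise init).
image = rng.random(size)
# Hypothetical weights of a toy objective f(image) = tanh(<w, image>).
w = rng.normal(size=size) / size

def objective_grad(image, w):
    """Analytic gradient of tanh(<w, image>) with respect to the image."""
    pre = w @ image
    return (1.0 - np.tanh(pre) ** 2) * w

grad = objective_grad(image, w)

# Only the direction matters for display, so rescale to [-1, 1].
# Any other positive scale factor would show the same direction.
display = grad / np.abs(grad).max()

unit_grad = grad / np.linalg.norm(grad)
unit_display = display / np.linalg.norm(display)
```

After normalization, `unit_grad` and `unit_display` are the same vector, which is the sense in which the scale in the diagram can be treated as arbitrary.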

Section numbers are on their way, and figure numbers also sound helpful! I've added a ticket (https://github.com/distillpub/template/issues/63). For now you can already link to figures like this: https://distill.pub/2017/feature-visualization/#steepest-des...



Ah, thanks for your explanation re the gradient images, I got it! I think the text does say that more or less, actually; I was understanding it a bit wrong, my bad. For me, this preconditioning part of the article is the hardest to get an intuition for.



