We Found an Neuron in GPT-2

cottenio · on Feb 18, 2023

This is one of the first articles I’ve read with a decent attack on reverse engineering the black boxes of neural networks. I particularly appreciate the use of corrupted prompts for isolating behaviors.

danjc · on Feb 18, 2023

Came here to say something similar. It seems to me that being able to determine how specific neurons are affecting outputs will be crucial to future optimization.