Fair enough. You would need to use an open model or work at OpenAI. I assume this work could be applied to the Llama models, although I'm not aware of anyone having found these glitchy phrases for those models yet.
> You would need to use an open model or work at OpenAI.
The point of the post we are commenting under is that they made this association public, at least in the neuron->token direction. I was thinking some hacker (like on Hacker News) might be able to build something that reverses it into the token->neuron direction using the public data, so we could see the petertodd-associated neurons. https://openaipublic.blob.core.windows.net/neuron-explainer/...
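The inversion itself is straightforward once the explanations are downloaded: build an inverted index from words in each explanation back to the neuron it describes. Here's a minimal sketch of that idea. The toy data, the `(layer, neuron)` ID format, and treating each explanation as a plain string are my assumptions; the real public files would need their actual schema checked before this works on them.

```python
import re
from collections import defaultdict

def build_token_to_neuron_index(explanations):
    """Invert neuron -> explanation into token -> neuron IDs.

    `explanations` maps a (layer, neuron) ID to its natural-language
    explanation string. (Hypothetical format -- the real neuron-explainer
    files are structured JSON, not bare strings.)
    """
    index = defaultdict(set)
    for neuron_id, text in explanations.items():
        # Lowercase word tokens; good enough for a keyword lookup.
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(neuron_id)
    return index

# Toy stand-in for the downloaded explanation data.
explanations = {
    (0, 17): "fires on mentions of Bitcoin developers such as petertodd",
    (3, 5): "activates on sentence-final punctuation",
}

index = build_token_to_neuron_index(explanations)
print(index["petertodd"])  # -> {(0, 17)}
```

Scaling this to all ~300k GPT-2 neurons in the public dataset is just a loop over the downloaded files, so it really does seem like a weekend project for someone motivated.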