
Interpreting GPT: The Logit Lens - optimalsolver
https://www.greaterwrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens
======
datameta
"Apparently, the model does not “keep the inputs around” for a while and
gradually process them into some intermediate representation, then into a
prediction. Instead, the inputs are immediately converted to a very different
representation, which is smoothly refined into the final prediction."

I wonder how effective it would be to have a baseline to compare against. Perhaps it could curtail some of the more surrealistic-sounding outputs if different branches of the result tree were explored toward the final few layers. Whether this would be an efficient use of compute and memory is another question.
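
One hedged way to make the baseline idea concrete (an illustrative sketch, not something proposed in the post): score each layer's logit-lens distribution by its KL divergence from the model's final output distribution, so that surreal-looking intermediate decodings are read relative to that baseline rather than in isolation. The model, prompt, and choice of KL are all assumptions.

    import torch
    import torch.nn.functional as F
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

    inputs = tokenizer("The inputs are immediately converted to", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    ln_f, unembed = model.transformer.ln_f, model.lm_head
    final_logprobs = F.log_softmax(out.logits[:, -1, :], dim=-1)  # the baseline

    for layer, h in enumerate(out.hidden_states):
        layer_logprobs = F.log_softmax(unembed(ln_f(h[:, -1, :])), dim=-1)
        # KL(final || layer): how far this premature decoding sits from the baseline.
        kl = F.kl_div(layer_logprobs, final_logprobs, log_target=True, reduction="sum")
        print(f"layer {layer:2d}: KL(final || layer) = {kl.item():.3f}")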

