I actually did this, and while the model follows the conversation quite well, I think it's really hard to argue that it decided early in the conversation which animal it had "thought of." That would only be possible if hidden embeddings were carried forward along with the output, but that isn't possible once the output has been converted to text. And I'm not sure how you'd prove that an animal was encoded in the earlier layers, since these neural networks are pretty hard to inspect.
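To make the "only text survives" point concrete, here's a minimal sketch of a stateless chat loop, assuming a hypothetical `generate(prompt) -> str` model call (the real API and replies will differ). The only thing carried between turns is the plain-text transcript; any hidden-layer activations from earlier turns are discarded and recomputed from scratch on the next call.

```python
def generate(prompt: str) -> str:
    # Placeholder for a real model call; returns a canned reply here.
    return "No."

transcript = "Think of an animal. I'll try to guess it.\n"
for question in ["Is it a mammal?", "Does it fly?"]:
    transcript += f"User: {question}\n"
    answer = generate(transcript)       # the model sees text only
    transcript += f"Model: {answer}\n"  # only the text is kept

# The transcript is a plain string: there is no slot where an
# "animal embedding" could persist across turns.
print(type(transcript).__name__)  # → str
```

So any apparent consistency across turns has to be reconstructed from the text each time, not remembered in hidden state.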