
The model is not stateful, but you can emulate state (certainly with GPT-3, but also with other language models) by simply feeding back earlier output.

For example, to simulate a chatbot, you start with a priming prompt. You then feed successively longer chunks of the full chat transcript back to the model, taking each newly generated line as the AI's next reply.
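A minimal sketch of that loop, assuming a hypothetical `complete` function standing in for any text-completion API (a GPT-3-style endpoint, say); here it is stubbed out so the loop runs:

```python
def complete(prompt: str) -> str:
    """Stub: a real implementation would call a language model API
    with the full transcript-so-far as the prompt."""
    return "Hello! How can I help?"

def chat_turn(transcript: str, user_message: str) -> str:
    """Append the user's message, feed the entire transcript back to the
    (stateless) model, and return the transcript extended with its reply."""
    transcript += f"Human: {user_message}\nAI:"
    # Take only the first generated line as the AI's reply.
    reply = complete(transcript).split("\n")[0].strip()
    return transcript + " " + reply + "\n"

# Start from a priming prompt, then grow it with every exchange.
transcript = "The following is a conversation with a helpful AI.\n"
transcript = chat_turn(transcript, "Hi there")
transcript = chat_turn(transcript, "What's the weather?")
```

Note that the model never "remembers" anything: all apparent state lives in the ever-growing transcript string.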

This is essentially how some of the 'use GPT-2 as a chatbot' front ends work. The same idea extends to things like AI Dungeon: you can force the model to keep context within its attention window by providing a good summary in the prompt.
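Since the transcript grows without bound, it eventually exceeds the model's attention window, and the summary trick becomes necessary. A sketch, assuming a hypothetical character budget `MAX_CHARS` in place of the real token limit and a `summary` maintained out of band:

```python
MAX_CHARS = 2000  # stand-in for the model's real context limit (in tokens)

def build_prompt(summary: str, transcript: str) -> str:
    """Prepend a running summary, then as much of the most recent
    transcript as still fits in the budget."""
    budget = MAX_CHARS - len(summary)
    if budget <= 0:
        return summary[:MAX_CHARS]
    return summary + transcript[-budget:]

# The old context survives only as the summary; the rest is truncated.
prompt = build_prompt("Summary: the hero entered the cave.\n", "X" * 5000)
```

A real system would count tokens rather than characters and periodically ask the model itself to produce the summary, but the principle is the same: the summary carries forward what would otherwise fall out of the window.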

To speculate a bit on why this seems to work: these models are massive and have read millions of texts in their training corpus. Rather than 'retraining' on text the model has probably already seen, the prompt nudges the model to locate where in its own weights it has already encoded the relevant knowledge.



