Both GPT-3.5 and GPT-4 versions of ChatGPT are limited to 4k tokens, even though GPT-4 is capable of 32k.
This leads me to believe that part of the reason for some of the mediocre results OP saw was because they hit the token limit and ChatGPT started "forgetting" earlier parts of the conversation.
No, I was explicitly watching for this. In one of the sessions where we asked it to generate Kłeti sentences and the conversation passed the token limit, it started inserting characters like ı (the Turkish dotless i). A week earlier I was playing with interpreting Go positions, and at some point the model switched to talking about chess (a bit less subtle than inserting unusual characters).
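(If you want to watch for this yourself, a rough way is to track the running token count with tiktoken. A sketch, assuming Python, the tiktoken package, and an approximate per-message framing overhead — the exact accounting varies a bit by model:)

    # Rough estimate of how close a conversation is to the context limit.
    import tiktoken

    CONTEXT_LIMIT = 4096  # the ~4k limit discussed above

    def estimate_tokens(messages, model="gpt-3.5-turbo"):
        enc = tiktoken.encoding_for_model(model)
        total = 0
        for msg in messages:
            # a few extra tokens per message for role/framing (approximate)
            total += 4 + len(enc.encode(msg["content"]))
        return total

    conversation = [
        {"role": "user", "content": "Please generate a few more sentences..."},
        {"role": "assistant", "content": "Here are some examples..."},
    ]
    used = estimate_tokens(conversation)
    print(f"{used} / {CONTEXT_LIMIT} tokens ({used / CONTEXT_LIMIT:.0%})")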
GPT-4 allows you to use 8k of context in the current beta, if you're using the chat API directly. It will be interesting (and probably expensive, lol) when they open it up to the full 32k.
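Roughly what that looks like with the openai Python package (a sketch only, using the pre-1.0 ChatCompletion client; "gpt-4" here is the 8k-context model, and whether you can call it at all depends on beta access):

    # Hitting the chat API directly with GPT-4 (pre-1.0 openai client).
    import openai

    openai.api_key = "sk-..."  # your API key

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Summarize the discussion so far."},
        ],
    )
    print(response["choices"][0]["message"]["content"])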
I'm really looking forward to being able to use a personalized LoRA on top of a GPT-4+ class model. I want to be able to train on all of my writing over the past few decades and interrogate the history of my ideas, and I think this would be tremendously valuable for writers of all kinds. Heck, think of the value of training (with their blessing) on something like /r/AskHistorians, or other deep-dive, high-quality fora.
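You can already do a small-scale version of this with open models via the peft library. A sketch only — "gpt2" is just a stand-in base model (nothing GPT-4-class is available for this), and target_modules would differ for other architectures:

    # Attaching a LoRA adapter to an open causal LM with peft.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("gpt2")
    config = LoraConfig(
        r=8,                        # rank of the low-rank update matrices
        lora_alpha=16,              # scaling factor
        target_modules=["c_attn"],  # attention projection in GPT-2
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # only the adapter weights are trainable
    # ...then fine-tune `model` on your own writing with a normal training loop.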