
The LLM is trained by measuring its error compared to the training data. It is literally optimizing to not be recognizable. Any improvement you can make to detect LLM output can immediately be used to train them better.
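
Concretely, a minimal sketch of what "measuring its error compared to the training data" means in pretraining: cross-entropy on the next token. This is a toy PyTorch illustration, not any lab's actual training code:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size = 1000
    # Stand-in for a real transformer: embed each token, predict a distribution over the next one.
    model = nn.Sequential(nn.Embedding(vocab_size, 64), nn.Linear(64, vocab_size))

    tokens = torch.randint(0, vocab_size, (8, 128))    # stand-in for a batch of training text
    logits = model(tokens[:, :-1])                     # predictions for each next token
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
    loss.backward()                                    # lower loss = output distribution closer to the training text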



GANs do that; I don't think LLMs do. I think LLMs are mostly trained on "how do I reckon a human would rate this answer?", or at least the default ChatGPT models are, and that's the topic at the root of this thread. That's allowed to be a different distribution from the source material.
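
For what it's worth, the "how would a human rate this answer" part is typically a separate reward model trained on human preference pairs. Rough sketch of that pairwise loss (toy tensors stand in for response representations; hypothetical setup, not OpenAI's code):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    reward_model = nn.Linear(64, 1)                 # scores a response representation
    opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

    chosen = torch.randn(32, 64)                    # reps of human-preferred responses
    rejected = torch.randn(32, 64)                  # reps of the rejected alternatives

    # Train the preferred response to score higher than the rejected one.
    loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
    opt.zero_grad(); loss.backward(); opt.step()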

Observable: ChatGPT quite often used to just outright say "As a large language model trained by OpenAI…", which is a dead giveaway.


This is the result of RLHF (which is fine-tuning to make the output more palatable), but that is not what the underlying training is about.

The actual training process makes the model produce the likeliest continuation of its input, and the introduction phrase you quoted would not come out of that process if there were no RLHF. See GPT-3 (text-davinci-003 via the API), which didn't have RLHF and would not say this, vs. ChatGPT, which is fine-tuned for human preferences and thus will output such giveaways.


And then you can train a new detector.

I see no reason to believe it wouldn’t be a pendulum situation.

That’s how GANs work, after all.
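
That pendulum is the GAN training loop in miniature: the detector improves, then the generator is trained against it, and round it goes. Toy 1-D sketch below, purely illustrative of the dynamic, not a claim about how text detectors are built:

    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # generator
    D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # detector
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(500):
        real = torch.randn(64, 1) * 0.5 + 3.0        # "human" data: samples from N(3, 0.5)
        fake = G(torch.randn(64, 8))

        # Detector's turn: learn to separate real from generated samples.
        d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator's turn: produce samples the improved detector now scores as real.
        g_loss = bce(D(G(torch.randn(64, 8))), torch.ones(64, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()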



