Scott Aaronson (who is temporarily working for OpenAI) proposed a cryptographic scheme in which generated text can be watermarked without any decrease in quality.
It has most of the same problems you list, except it is much more robust against small changes to the text.
As of a few weeks ago, he mentioned that OpenAI had a working implementation and was discussing whether to start using it. I assume they'd tell people before turning it on in prod; I see no advantage in secrecy.
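For context on how watermarking can avoid any quality loss: the idea Aaronson has described in talks is to replace ordinary sampling with a deterministic rule driven by a keyed pseudorandom function, using the Gumbel/exponential-minimum trick. Marginally the output is still an exact sample from the model's distribution, but someone holding the key can score text for the hidden bias. The sketch below is an illustration of that trick, not OpenAI's implementation; the PRF construction, key handling, context window, and scoring rule are all assumptions for demonstration purposes.

```python
import hashlib
import math

# Hypothetical secret key; a real deployment would keep this server-side.
SECRET_KEY = b"demo-key"

def prf(key: bytes, context: tuple, token: int) -> float:
    """Keyed pseudorandom float in (0, 1), derived from the recent
    context and the candidate token (toy construction via SHA-256)."""
    h = hashlib.sha256(key + repr((context, token)).encode()).digest()
    return (int.from_bytes(h[:8], "big") + 1) / (2**64 + 2)

def watermarked_sample(probs: dict, context: tuple, key: bytes = SECRET_KEY) -> int:
    """Exponential-minimum (Gumbel-trick) sampling: pick the token
    maximizing r_i ** (1 / p_i). Since each r_i ** (1 / p_i) is uniform
    on (0, 1), token i wins with probability exactly p_i, so the output
    distribution is unchanged -- but the choice correlates with the PRF."""
    return max(probs, key=lambda t: prf(key, context, t) ** (1.0 / probs[t]))

def detector_score(tokens, contexts, key: bytes = SECRET_KEY) -> float:
    """Average of -ln(1 - r) over the chosen tokens. For unwatermarked
    text this averages about 1; watermarked text scores noticeably higher
    because the chosen tokens tend to have r near 1."""
    scores = [-math.log(1.0 - prf(key, c, t)) for t, c in zip(tokens, contexts)]
    return sum(scores) / len(scores)
```

Because the bias lives in which of several equally plausible tokens gets picked, paraphrasing that changes many tokens degrades the signal, which is why robustness to small edits (but not full rewrites) is the claimed property.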
This will catch only the laziest and stupidest cheaters. Crafting an argument is the hard part of an essay. A student can rewrite an essay, leaving not a single word in place, but still be guilty of plagiarism because the argument is the same. This is hard to prove even with the two documents side by side. It is harder still when the source could be any of a million documents.
Of course, many cheaters are that stupid and lazy. People still just copy and paste essays they found online.
It was already true before ChatGPT that if you are willing to rewrite an essay you can easily cheat without getting caught. Just find an essay online and rewrite it.
> As of a few weeks ago, he mentioned that OpenAI had a working implementation and was discussing whether to start using it. I assume they'd tell people before turning it on in prod; I see no advantage in secrecy.
Watermarking has zero-to-negative value for the user of the generation service, but it has value for users of the detection service that the common vendor of both will sell. So the only reason to announce that watermarking is active is that you are ready to sell the detection service leveraging it. Otherwise, it's just a disincentive for some users to use the generation service, with no upside.
Sure, but that's true of lots of things OpenAI does. For example, they removed a bunch of functionality from their API for the new models, presumably for safety reasons.