Scott Aaronson (who is temporarily working for OpenAI) proposed a cryptographic scheme in which generated text can be watermarked without any decrease in quality.
It has most of the same problems you list, except it is much more robust against small changes to the text.
As of a few weeks ago, he mentioned that OpenAI had a working implementation and was discussing whether to start using it. I assume they'd tell people before turning it on in prod; I see no advantage in secrecy.
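For context on how watermarking can avoid any quality loss: the idea Aaronson has described in talks is to replace ordinary sampling with a deterministic rule driven by a keyed pseudorandom function, using the Gumbel/exponential-minimum trick. Marginally the output is still an exact sample from the model's distribution, but someone holding the key can score text for the hidden bias. The sketch below is an illustration of that trick, not OpenAI's implementation; the PRF construction, key handling, context window, and scoring rule are all assumptions for demonstration purposes.

```python
import hashlib
import math

# Hypothetical secret key; a real deployment would keep this server-side.
SECRET_KEY = b"demo-key"

def prf(key: bytes, context: tuple, token: int) -> float:
    """Keyed pseudorandom float in (0, 1), derived from the recent
    context and the candidate token (toy construction via SHA-256)."""
    h = hashlib.sha256(key + repr((context, token)).encode()).digest()
    return (int.from_bytes(h[:8], "big") + 1) / (2**64 + 2)

def watermarked_sample(probs: dict, context: tuple, key: bytes = SECRET_KEY) -> int:
    """Exponential-minimum (Gumbel-trick) sampling: pick the token
    maximizing r_i ** (1 / p_i). Since each r_i ** (1 / p_i) is uniform
    on (0, 1), token i wins with probability exactly p_i, so the output
    distribution is unchanged -- but the choice correlates with the PRF."""
    return max(probs, key=lambda t: prf(key, context, t) ** (1.0 / probs[t]))

def detector_score(tokens, contexts, key: bytes = SECRET_KEY) -> float:
    """Average of -ln(1 - r) over the chosen tokens. For unwatermarked
    text this averages about 1; watermarked text scores noticeably higher
    because the chosen tokens tend to have r near 1."""
    scores = [-math.log(1.0 - prf(key, c, t)) for t, c in zip(tokens, contexts)]
    return sum(scores) / len(scores)
```

Because the bias lives in which of several equally plausible tokens gets picked, paraphrasing that changes many tokens degrades the signal, which is why robustness to small edits (but not full rewrites) is the claimed property.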
This will catch only the laziest and stupidest cheaters. Crafting an argument is the hard part of an essay. A student can rewrite an essay, leaving not a single word in place, but still be guilty of plagiarism because the argument is the same. This is hard to prove even with the two documents side by side. It is harder still when the source could be any of a million documents.
Of course, many cheaters are that stupid and lazy. People still just copy and paste essays they found online.
It was already true before ChatGPT that if you are willing to rewrite an essay you can easily cheat without getting caught. Just find an essay online and rewrite it.
> As of a few weeks ago, he mentioned that OpenAI had a working implementation and was discussing whether to start using it. I assume they'd tell people before turning it on in prod; I see no advantage in secrecy.
Watermarking has zero-to-negative value for the user of the generation service, but it has value for users of the detection service that the common vendor of both will sell. So the only reason to announce that watermarking is active is that you are ready to sell the detection service leveraging it. Otherwise, it's just a disincentive for some users to use the generation service, with no upside.
Sure, but that's true of lots of things OpenAI does. For example, they removed a bunch of functionality from their API for the new models, presumably for safety reasons.