
Sure, that's if human moderators see it before the AI does, in which case why have an AI at all? I presume in this solution the AI is running all the time, sees messages the instant they're sent, and is therefore always vulnerable to a prompt injection attack before any human sees them.





To moderate the majority of the community that will not be attempting prompt injections.

What meaningful vulnerabilities are there if the post can only be accepted/rejected/flaggedForHumanReview?
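Concretely, that constraint can be enforced outside the model itself: whatever text the model produces gets coerced into one of those three actions, and anything unrecognized falls back to human review. A minimal sketch, nothing more (the Action enum and parse_decision are just illustrative, not any particular moderation system):

    from enum import Enum

    class Action(Enum):
        ACCEPT = "accept"
        REJECT = "reject"
        FLAG_FOR_HUMAN_REVIEW = "flag_for_human_review"

    def parse_decision(model_output: str) -> Action:
        # Coerce raw model text into the allowed action set; an injected
        # reply that doesn't match exactly just ends up in human review.
        normalized = model_output.strip().lower()
        for action in Action:
            if normalized == action.value:
                return action
        return Action.FLAG_FOR_HUMAN_REVIEW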


That's what you tell the AI to do, but who knows what other systems it has access to? For example, where is it writing the flags for these posts? Can it access the file system and do something programmatically? Et cetera, et cetera.

The same way OpenAI offers its service to hundreds of millions of users without compromising any other systems it’s running on.

OpenAI doesn't allow write access to any file system. If you are recording posts to be reviewed, then you must necessarily store that information somewhere, at which point you will be allowing the AI to access some sort of data storage system, whether it be a file system or a database.
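For illustration, that write path could look something like this, with the application (not the model) validating the decision string and doing the actual write. This is a rough sketch under that assumption; record_decision and the sqlite table are made up:

    import sqlite3

    ALLOWED = {"accept", "reject", "flag_for_human_review"}

    def record_decision(post_id: int, decision: str) -> None:
        # The model only ever produces a decision string; the application
        # checks it against the allowed set and performs the storage write.
        if decision not in ALLOWED:
            decision = "flag_for_human_review"
        conn = sqlite3.connect("moderation.db")
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS decisions (post_id INTEGER, decision TEXT)"
            )
            conn.execute(
                "INSERT INTO decisions (post_id, decision) VALUES (?, ?)",
                (post_id, decision),
            )
        conn.close()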


