I've collated a few prompt hardening techniques here: https://www.reddit.com/r/O...

I've collated a few prompt hardening techniques here: https://www.reddit.com/r/OpenAI/comments/1210402/prompt_hard...

In my testing, the trick of making sure untrusted input is not the last thing the user sees was pretty effective.

I agree with Simon that (a) no technique will stop all attacks as long as input can't be tagged as trusted or untrusted, and (b) that we should take these more seriously.

I hope that OpenAI in particular will extend it's chat completions API (and the underlying model) to make it possible to tell GPT in a secure way what to trust and what to consider less trustworthy.