That's basically fuzz testing. I absolutely agree, put me in a room with someone and a magic reset button that resets them and their memory (but I preserve mine), and enough time, I can probably get just about anyone to do just about anything within hard value limits (eg embarrassing but not destructive).
However humans have a "fail2ban" of sorts by getting irritated at ridiculous requests. Alternatively, peer pressure is a very strong (de)motivator. People are far more shy with an authority figure watching.
I suspect OpenAI will implement some sort of "hall monitor" system which steps in if the conversation strays too far from "social norms".
However humans have a "fail2ban" of sorts by getting irritated at ridiculous requests. Alternatively, peer pressure is a very strong (de)motivator. People are far more shy with an authority figure watching.
I suspect OpenAI will implement some sort of "hall monitor" system which steps in if the conversation strays too far from "social norms".