The problem is that we keep using RLHF and system prompts to "tell" these system...

Drakim · 2025-06-05T12:23:26 1749126206

That has its own unique problems:

https://en.wikipedia.org/wiki/Waluigi_effect

Wowfunhappy · 2025-06-07T16:06:26 1749312386

I don't see the relation. Why would the Waluigi effect get worse if we don't tell the AI it's an AI?

Drakim · 2025-06-07T18:05:47 1749319547

Because it's the truth. If you tell the AI that it's actually a human librarian, it might ask for a raise, or days off. If you tell it to search for something, it might insist that it needs a computer to do that. There will inherently be an information mismatch between reality and your input if the AI is operating on falsehoods.