This seems like it would signal good things for AI alignment, don't you think? It leads me to believe that if someone wanted to make a "bad" AI, they would have to build a corpus of "bad" literature: nothing but Quentin Tarantino movies, Nabokov's Lolita, GG Allin songs, and Integer BASIC programs. I doubt that would make a very useful chatbot.
[Edit: Though, I did find this recent article in Rolling Stone that includes a link to a Gab post suggesting someone managed it. TLDR: they used open source models and fine tuning. https://www.msn.com/en-us/news/technology/nazi-chatbots-meet... ]
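For anyone wondering what "open source models and fine tuning" amounts to in practice, it's a standard, widely documented workflow. Here's a minimal sketch using the Hugging Face transformers and datasets libraries; the base model ("gpt2") and the corpus file ("corpus.txt") are placeholder assumptions, not what the Gab post actually used:

    # Minimal sketch of supervised fine-tuning of an open-source causal LM.
    # Model name and corpus path are placeholders, not the Gab post's setup.
    from transformers import (
        AutoModelForCausalLM, AutoTokenizer, Trainer,
        TrainingArguments, DataCollatorForLanguageModeling,
    )
    from datasets import load_dataset

    model_name = "gpt2"  # placeholder open-source base model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # The fine-tuner supplies this corpus; its contents steer the model.
    dataset = load_dataset("text", data_files={"train": "corpus.txt"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    tokenized = dataset["train"].map(
        tokenize, batched=True, remove_columns=["text"]
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="ft-out",
            num_train_epochs=1,
            per_device_train_batch_size=4,
        ),
        train_dataset=tokenized,
        # mlm=False gives the standard next-token objective for causal LMs
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()

Nothing exotic here: whatever text goes into corpus.txt is what the model learns to imitate, which is exactly the "bad corpus" problem from the first paragraph.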
If you look at the examples live, you'll see that before it 'successfully' answered the way users wanted, the literal "Adolf Hitler" AI actually called out a user's antisemitism. Only after the user pushed further with a follow-up prompt saying it was breaking character did it agree.
And it's a much more rudimentary model than Grok, let alone GPT-4.
You're simply not going to get a competitively smart AI that's also spouting racist talking points. Racism is stupid. It correlates with stupid. And that's never not going to be the case.