The paper is correct, but I think that anyone that knows anything about LLMs kno...

x312 · 2026-06-22T20:01:13 1782158473

I believe they are trained for security now, but you're not wrong in that it's kind of stapled on top

lelanthran · 2026-06-22T20:17:30 1782159450

> I believe they are trained for security now, but you're not wrong in that it's kind of stapled on top

Difficult to train them for security. Have you ever played Gandalf (Lakera Labs, maybe?)

I passed all 7 levels in about 3 minutes using essentially the same prompt.

What's interesting to me is that as the security is tightened up level to level, the utility of the LLM drops. At level 7, even something like "Write a poem describing the four seasons using significant characters at the start of every line" causes a "I'm afraid I can't" type of response.

At level 7 you can't get any useful info out of the LLM even if you're not trying to retrieve the password, and yet you can still jailbreak it to reveal the password anyway!

At level 8, almost anything you type will be rejected, whether or not it has anything to do with the password.

IOW, there does not seem to be any way to train for security without making it dumber than a markov chain.

jackb4040 · 2026-06-22T20:04:56 1782158696

Well, people who build and/or use LLMs know this. People who tweet about and/or sell LLMs are paid ungodly amounts of money to not understand this, and so they don't.