Most of these attacks succeed because app developers either don’t trust role bou...

spacebanana7 · 2025-08-18T15:22:14 1755530534

> They assume the model can’t reliably separate trusted instructions (system/developer rules) from untrusted ones (user or retrieved data)

No current model can reliably do this.

jonfw · 2025-08-18T13:38:32 1755524312

> If role separation were treated seriously -- and seen as a vital and winnable benchmark, many prompt injection vectors would collapse...

I think it will get harder and harder to do prompt injection over time as techniques to seperate user from system input mature and as models are trained on this strategy.

That being said, prompt injection attacks will also mature, and I don't think that the architecture of an LLM will allow us to eliminate the category of attack. All that we can do is mitigate