Claude Fable 5 jailbroken to bypass Anthropic's new safety guardrails (twitter.com/elder_plinius)
8 points by bukati 5 days ago | 1 comment



Just saw Pliny (@elder_plinius) drop this. He managed to jailbreak it pretty effectively using a mix of tricks: breaking down bad requests into harmless pieces and reassembling them, narrative/academic framing, long context shenanigans, weird text transforms, and out-of-distribution tokens. Pretty interesting look at how well (or not) these new output-side guardrails actually hold up against a determined multi-step attack.


