Hacker News

Well, that's N=1. But we have seen that it's sometimes possible to bypass that kind of filter with clever prompt engineering. And because these things are black boxes, it doesn't seem possible to rigorously prove "unjailbreakability".


