
There is quite a lot of excitement about how someone has apparently hacked the language model into outputting what was supposed to be a non-public set of rules. How do people know that this is indeed the secret set of rules, and not a list that the model was scripted to return in response to a (perhaps somewhat elaborate) request for the list of rules?



We don't know for sure - but we have seen this same situation play out many times for many other systems. It's far more likely that this attack worked than that this particular team have solved a problem that has defeated basically everyone else. https://news.ycombinator.com/item?id=35925239



