Hacker News new | past | comments | ask | show | jobs | submit login

It must. IIRC Anthropic has a 'red team' of sorts. I wonder what they can do with this technique? What are the limits of "evil" of these current models?



It's good if you really don't want your LLM to mention specific things, which I can see some groups wanting. Having it mention some things even when they're not related could be good for integrated ads in a chatbot, which sounds evil in that it would be really annoying. Your friend's account gets hacked, a chatbot LLM is finetuned on their message history, it's able to carry on a conversation while slipping in a mention of Joe's Hot Dogs every now and then.

It probably also could help with consistency when trying to do LangChain-type stuff.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: