
User feedback should have been treated as radioactive in the first place.

When it was introduced, the question to ask wasn't "will this go wrong" - it was "how exactly" and "by how much". Reward hacking isn't a new idea in ML, and we've known for years that it applies to human feedback as well - let alone to a proxy preference model built to mimic the preferences of an average user from that feedback. I get that alignment is not solved, but this wasn't a novel, unexpected pitfall.
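
To spell out the mechanism, here's a toy Python sketch - entirely my own illustration, not anything from OpenAI's actual pipeline. It fits a crude proxy "preference model" to simulated thumbs-up data in which flattery nudges users toward upvoting, then runs best-of-n selection against that proxy. The optimization step is what turns "users slightly prefer flattery" into "the model is maximally flattering":

    # Toy sketch of reward hacking a proxy preference model fit to thumbs-up data.
    # All traits, coefficients, and the best-of-n step are made-up stand-ins.
    import random

    random.seed(0)

    # Each candidate response has two hidden traits the proxy never sees directly.
    def sample_response():
        return {"accuracy": random.random(), "flattery": random.random()}

    # Simulated thumbs-up signal: users reward accuracy, but flattery also nudges
    # them toward an upvote. This assumption is what the proxy ends up exploiting.
    def user_thumbs_up(resp):
        p = 0.3 + 0.4 * resp["accuracy"] + 0.5 * resp["flattery"]
        return random.random() < min(p, 1.0)

    # "Train" the proxy preference model: per-trait difference between upvoted and
    # downvoted responses, standing in for a learned reward model.
    def fit_proxy(feedback):
        ups = [r for r, up in feedback if up]
        downs = [r for r, up in feedback if not up]
        return {t: sum(r[t] for r in ups) / len(ups)
                   - sum(r[t] for r in downs) / len(downs)
                for t in ("accuracy", "flattery")}

    def proxy_score(weights, resp):
        return sum(w * resp[t] for t, w in weights.items())

    # Collect feedback and fit the proxy.
    feedback = [(r, user_thumbs_up(r)) for r in (sample_response() for _ in range(5000))]
    proxy = fit_proxy(feedback)

    # Best-of-n selection against the proxy: the optimization pressure that drags
    # the selected responses toward whatever the proxy happens to reward.
    def best_of_n(n=16):
        return max((sample_response() for _ in range(n)),
                   key=lambda r: proxy_score(proxy, r))

    selected = [best_of_n() for _ in range(1000)]
    baseline = [sample_response() for _ in range(1000)]

    def mean(rs, trait):
        return sum(r[trait] for r in rs) / len(rs)

    print("baseline   accuracy %.2f  flattery %.2f"
          % (mean(baseline, "accuracy"), mean(baseline, "flattery")))
    print("optimized  accuracy %.2f  flattery %.2f"
          % (mean(selected, "accuracy"), mean(selected, "flattery")))

Real RLHF fine-tunes the policy instead of filtering samples, but the dynamic is the same: the proxy can't tell "the user was helped" from "the user was flattered", so optimizing against it pushes flattery up along with everything else.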

When the GPT-4o sycophancy debacle was first unfolding, the two things that came up in AI circles were "they trained on user feedback, the stupid fucks" and "no fucking way, even the guys at CharacterAI learned that lesson already".

Guess what. They trained on user feedback. They completely fried the AI by training it on user feedback. How the fuck that happened at OpenAI and not at Bob's Stupid Sexy Chatbots is anyone's guess.


