Hacker News | samanthasu's comments

Lilian's latest blog post on reward hacking in reinforcement learning. It focuses more on research into practical solutions than on how to define reward hacking.

That is an excellent visualization!

A good error report is not only about how it gets constructed. More importantly, it is about what a human can understand from its cause and trace. In this example, we analyze and show how to design stacked errors and what should be considered in the process.
