Lilian's latest blog post covers reward hacking in reinforcement learning. It focuses more on research into practical solutions than on how to define reward hacking.
A good error report is not only about how it is constructed. More importantly, it is about what a human reader can understand from its cause and trace.
In this example, we analyzed how to design stacked errors and showed what should be considered in the process.
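
One way to picture a stacked error is with Python's built-in exception chaining. This is a minimal sketch, not the design from the example discussed here. The names `ConfigError`, `parse_port`, `error_stack`, and `report` are hypothetical and chosen only for illustration. Each layer adds context a human can act on, while the original cause stays reachable through `__cause__`.

```python
class ConfigError(Exception):
    """Hypothetical high-level error that carries human-readable context."""


def parse_port(raw):
    """Parse a port number, wrapping the low-level failure with context."""
    try:
        return int(raw)
    except ValueError as exc:
        # "raise ... from exc" stores the low-level error in __cause__,
        # so the stack of errors is preserved instead of being replaced.
        raise ConfigError(f"invalid port value {raw!r}") from exc


def error_stack(exc):
    """Return one line per error in the __cause__ chain, outermost first."""
    lines = []
    current = exc
    while current is not None:
        lines.append(f"{type(current).__name__}: {current}")
        current = current.__cause__
    return lines


def report(raw):
    """Return the stacked error lines for a bad input, or [] on success."""
    try:
        parse_port(raw)
    except ConfigError as exc:
        return error_stack(exc)
    return []


if __name__ == "__main__":
    for line in report("abc"):
        print(line)
```

The outermost line tells the reader what went wrong in their own terms (a bad port value), and the inner line keeps the technical cause for whoever needs to debug it. Ordering the stack outermost first is a design choice in this sketch. A report aimed at developers might reverse it.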