samanthasu's comments

samanthasu · 2024-12-19T12:51:26 1734612686

Lilian's latest blog post on reward hacking in reinforcement learning. It focuses more on research into practical solutions than on how to define reward hacking.

samanthasu · 2024-12-19T12:43:01 1734612181

That is an excellent visualization!

samanthasu · 2024-12-19T01:47:05 1734572825

A good error report is not only about how it gets constructed. More importantly, it is about what a human can understand from its cause and trace. In this example, we analyze and show how to design stacked errors and what should be considered in the process.
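
To make the idea concrete, here is a minimal sketch in Python. It is not the code from the example, just an illustration under my own assumptions: `StackedError`, `render_stack`, `read_config`, and `start_service` are hypothetical names. Each layer adds its own context message and keeps the lower-level error as its cause, so the report reads from the outermost context down to the root cause.

```python
class StackedError(Exception):
    """Hypothetical error type: a context message plus the error that caused it."""

    def __init__(self, message, cause=None):
        super().__init__(message)
        self.message = message
        self.cause = cause


def render_stack(error):
    """Return one line per layer, outermost context first, root cause last."""
    lines = []
    depth = 0
    current = error
    while current is not None:
        # Built-in exceptions have no `message` attribute, so fall back to str().
        message = getattr(current, "message", None) or str(current)
        lines.append(f"{depth}: {type(current).__name__}: {message}")
        current = getattr(current, "cause", None)
        depth += 1
    return lines


def read_config(path):
    """Low-level step: wrap the OS error with what we were trying to do."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError as exc:
        raise StackedError(f"failed to read config at {path}", cause=exc) from exc


def start_service(path):
    """High-level step: add one more layer of context on top."""
    try:
        return read_config(path)
    except StackedError as exc:
        raise StackedError("failed to start service", cause=exc) from exc
```

Calling `start_service` with a path that does not exist and passing the caught error to `render_stack` gives three lines: the service-level context, the config-level context, and the underlying OS error. A reader can see what failed at each level without reading a full traceback.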