Hacker News

For coding evals, it seems like unless you are super careful, the benchmark problems (or their solutions) can end up in the training data, which inflates the scores.

Are there standard ways to avoid that type of score inflation?
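
To make the concern concrete: one commonly described check is to look for long n-gram overlap between an eval problem and the training corpus. Below is a minimal sketch of that idea, not a definitive implementation. The function names (`ngrams`, `overlap_fraction`), the whitespace tokenization, and the default `n=8` are all assumptions chosen for illustration, and it assumes the training documents are actually available to scan.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of whitespace-token n-grams in text."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(eval_text: str, training_docs: list[str], n: int = 8) -> float:
    """Fraction of the eval item's n-grams that also appear in any training doc.

    Hypothetical helper: n=8 and whitespace tokenization are illustrative
    assumptions, not a standard.
    """
    eval_grams = ngrams(eval_text, n)
    if not eval_grams:
        # Eval item is shorter than n tokens, so there is nothing to compare.
        return 0.0
    train_grams: set[tuple[str, ...]] = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    return len(eval_grams & train_grams) / len(eval_grams)
```

A high fraction suggests the item may have been seen during training. A check like this only catches near-verbatim copies; paraphrased or translated problems would slip through.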



