
We don't actually know whether it is SOTA; the previous SOTA solution scored about the same on the evaluation set.


Yeah, and GPT-4o was potentially trained on this test set. Even if they tried to hold it out, it was still likely trained on discussions of the problems.



