
Exactly. The previous version of o1 actually did worse on the coding benchmarks, so I would expect it to be worse in real-life scenarios. The new version released a few days ago, on the other hand, does better on the benchmarks, so it seems strange that someone who used it would say it's worse than Claude.




