ForzaAaRon's comments

Fascinating read. Interesting how Opus performs worse than Sonnet.


Quite interesting, actually. Not sure why; I assume it just overthinks. What surprised me even more is how badly o4-mini performed, after taking up hours of evaluation time and more credits than all the other LLMs combined. More thinking != better (integration) coding performance.


It's definitely on our backlog. Is that something you'd wanna try?

