
Sonnet is literally lower on the aider benchmark you just linked. It's only at the top with DeepSeek as architect; otherwise it's lower than many others.





Yes, but I use Cursor Composer Agent mode with Sonnet, which is like Aider's architect mode, where one LLM instructs another. Not to mention that the new reasoning models can't use tool calling (except o3-mini, which is not multimodal).

Me too, Cursor+Sonnet is also my go-to. I just didn't really understand what you were getting at by pointing out this benchmark. I guess it is significant that Sonnet is the actual line-by-line coder here. It is the best at that, and it's better than DeepSeek+any other combination and better than any other reasoner+Sonnet.

Yes, I've followed this benchmark for a while, and before DeepSeek + Sonnet architect took the top spot, Sonnet was there alone, followed by o1 and Gemini EXP. This is one of the few benchmarks where Sonnet is actually on top, which matches my experience. Other popular ones have o3-mini and DeepSeek R1 on top, and those fall short in my opinion.

Let's steelman a bit: once you multiply edit accuracy by completion accuracy, Sonnet on its own is within 5% of the very top entry that doesn't use Sonnet.
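
For anyone who wants to redo that arithmetic, here is a rough sketch of one way to read it: treat the combined score as completion rate times edit-format rate, then compare relative gaps. The function names are made up for illustration, and the numbers are placeholders, not the leaderboard's actual figures, so plug in the real columns yourself.

```python
def combined_score(completion_rate: float, edit_format_rate: float) -> float:
    """Product of the two rates: share of tasks solved AND emitted in a valid edit format."""
    return completion_rate * edit_format_rate


def relative_gap(score: float, best: float) -> float:
    """How far `score` trails `best`, as a fraction of `best`."""
    return (best - score) / best


# Hypothetical placeholder values (NOT real benchmark results).
solo = combined_score(0.50, 1.00)       # a model that always formats edits correctly
top_other = combined_score(0.55, 0.95)  # a higher raw scorer that sometimes botches the format

gap = relative_gap(solo, top_other)
print(f"solo={solo:.4f} top_other={top_other:.4f} relative gap={gap:.1%}")
```

Whether "within 5%" means relative gap or percentage points is ambiguous in the comment above; this sketch assumes the relative reading.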


