Sonnet is literally lower on the aider benchmark you just linked. It's only the ...

theturtletalks · 2025-02-21T20:10:19 1740168619

Yes, but I use Cursor Composer Agent mode with Sonnet, which is like Aider's architect mode, where one LLM instructs another. Not to mention that the new reasoning models can't use tool calling (except o3-mini, which is not multi-modal).

KaoruAoiShiho · 2025-02-21T20:20:11 1740169211

Me too, Cursor+Sonnet is also my go-to. I just didn't really understand what you were getting at by pointing out this benchmark. I guess it is significant that Sonnet is the actual line-by-line coder here. It is the best at that, and it's better than DeepSeek+any other combination and better than any other reasoner+Sonnet.

theturtletalks · 2025-02-21T20:33:46 1740170026

Yes, I've followed this benchmark for a while, and before DeepSeek + Sonnet Architect took the top spot, Sonnet was there alone, followed by o1 and Gemini EXP. This is one of the few benchmarks where Sonnet is actually on top, which matches my experience. Other popular ones have o3-mini and DeepSeek R1 at the top, which fall short in my opinion.

refulgentis · 2025-02-21T20:05:48 1740168348

Let's steelman a bit: once you multiply out the edit accuracy versus completion accuracy, Sonnet, on its own, is within 5% of the very top entry not using Sonnet.