I find it struggles to refactor even codebases that aren't that large. If you have a somewhat complicated change that spans the full stack, with some wrinkle that makes it slightly harder than adding a data field, even the most modern LLMs seem to trip over themselves. That's true even when I tell it to create an implementation plan, write it to a markdown file, and then step through those steps in a separate prompt.
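For what it's worth, the two-step split I've been trying looks roughly like this (illustrative prompts and the PLAN.md name are mine, nothing official):

    Prompt 1 (planning): "Read the ticket below and write a step-by-step
    implementation plan to PLAN.md. Don't change any code yet."

    Prompt 2 (fresh session): "Read PLAN.md and implement step 1 only.
    Stop and show me the diff before moving on."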
Not that it makes it useless, just that we seem to not "be there" yet for the standard tasks software engineers do every day.
I haven't used GPT-5 yet, but even on a 1,000-line codebase I found Opus 4, o3, etc. to be very hit or miss. The trouble is I can't predict when these models will hit, so the misses cost time and reduce their overall utility.
I'm exclusively using sonnet via claude-code on their max plan (specifying sonnet so that opus isn't used). I just wasn't pleased with the opus output, though maybe I need to use it differently. I haven't bothered with 4.1 yet. Another thing I noticed: opus would eat through my usage caps super quickly, whereas using sonnet exclusively I never hit a cap.
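In case it's useful, pinning the model looks something like this (exact alias handling may vary by claude-code version):

    # launch pinned to sonnet
    claude --model sonnet

    # or switch mid-session with the slash command
    /model sonnet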
I'd really just love incremental improvements over sonnet. Increasing sonnet's context window would be a game changer for me. After an auto-compact, the quality can fall off a cliff, and I need to spend time bringing it back up to speed.
When I need a bit more punch for reasoning or architecture-type evaluations, I have it talk to gemini pro via zen mcp and OpenRouter. I've been considering setting up a subagent for architecture and system design decisions that would use the latest opus, to see whether it's better than gemini pro (so far I have no complaints, though).
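If I do go the subagent route, my understanding is it's just a markdown file with YAML frontmatter dropped into .claude/agents/, roughly like this sketch (the architect name and prompt are my own, and the model field may depend on your claude-code version):

    ---
    name: architect
    description: Consulted for architecture and system design decisions.
    model: opus
    ---
    You are a software architect. Evaluate proposed designs for
    coupling, failure modes, and migration cost. Don't write code.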