I keep hearing about Claude's impressive coding skills (compared to its benches)...

zurfer · 2025-02-18T07:45:02 1739864702

My pet theory is that Sonnet was trained really cleverly on a lot of code that resembles real-world cases.

In our small and humble internal evals it regularly beats every other frontier model on some tasks. The shape of capability is really not intuitive or one-dimensional.

saberience · 2025-02-18T13:31:01 1739885461

I spend four to five hours coding per day and subscribe to every major LLM, and Claude is still by far the best for me personally and for my coworkers.

phillipcarter · 2025-02-18T13:58:00 1739887080

What are you using it for in general? IME the reason Claude pulls out ahead is that when you use it in a larger existing codebase, it keeps everything "in the style" of that codebase and doesn't veer off into weird territory like all the others.

davidee · 2025-02-18T18:47:48 1739904468

My experience as well. Working in Scala primarily, it tends to be very good at following the constructs of the project.

Using a specific Monad-transformer regularly? It'll use that pattern, and often very well, handling all the wrapping and unwrapping needed to move data types about (at least well enough that the odd case it misses some wrapping/unwrapping is easy to spot and manage).

Give a custom GPT or GEM the same source files, and those models regularly fail to maintain style and context, often suggesting solutions that might be fine in isolation but make little sense in the context of a larger codebase. It's almost like they never reliably refer to the code included in the project/GPT/GEM.

Claude on the other hand is so consistent about referring to existing artifacts that, as you approach the limit of project size (which is admittedly small) you can use up your entire 5-hour block of credits with just a few back-and-forths.

anti-soyboy · 2025-02-18T10:58:57 1739876337

Lol, no company is making money using 4o; however, thanks to Claude Sonnet, programs like Cursor are usable lol. 4o agents suck, just try it instead of talking.

3abiton · 2025-02-18T13:45:51 1739886351

I did try it for more than a week, and 4o is still pretty much better in terms of Python coding and architecture/documentation design.

throwaway314155 · 2025-02-18T20:53:48 1739912028

That doesn't match my experience at all.

Alifatisk · 2025-02-18T11:03:28 1739876608

I can honestly tell you from my experience that last summer Sonnet 3.5's coding skills got things right that no other model did, even though the benchmarks showed it wasn't the best performer on coding tasks.

Mekoloto · 2025-02-19T11:39:48 1739965188

I prototyped on the weekend and started out with 4o because I had a subscription running.

After an hour and a half-assed working result, I put everything into Claude and it made it significantly better on the first try, and I didn't even have an active Claude subscription.

3abiton · 2025-02-20T17:14:33 1740071673

Really interesting. I used it today and still had lots of issues. Maybe my Python notebook approach is too complicated for Sonnet? It couldn't fix a custom complex seaborn plot. 4o failed too. o3-mini-high, on the other hand, managed to do it really well.

bamboozled · 2025-02-18T15:50:19 1739893819

There is honestly no rhyme or reason to all these opinions. Someone was telling me the other day that Claude is for sure the best; multiple people, actually.

I find it concerning that there are no really accurate benchmarks for this stuff that we can all agree on.