
When I gave the same prompt to both, Sonnet 3.5 immediately gave me functional code, while GPT-4o sometimes failed after 4-5 attempts, at which point I usually gave up. Sonnet 3.5 is spectacular at debugging its output, while GPT-4o will keep hallucinating and giving me the same buggy code.

A concrete example: I was doing shader programming with Sonnet 3.5 and ran into a visual bug. Sonnet asked me to add four debugging modes, cycle through them, and describe what I saw in each. With one more prompt, it resolved the issue. In my experience, GPT-4o has never bothered to propose debug modes and just produced more buggy code.
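
To give an idea of what those debug modes looked like in practice, here's a minimal sketch (not the exact code from that session; the uDebugMode uniform and the meaning of each mode are just illustrative). Each mode overrides the fragment output with one intermediate quantity so you can see which stage is wrong:

    // Fragment shader with switchable debug views (illustrative sketch only;
    // the uniform names and the four modes are made up for this example).
    #version 330 core

    uniform int uDebugMode;        // 0 = normal render, 1..4 = debug views
    uniform sampler2D uAlbedo;

    in vec2 vUv;
    in vec3 vNormal;
    in float vLightFactor;         // scalar lighting term from earlier in the pipeline

    out vec4 fragColor;

    void main() {
        vec3 albedo = texture(uAlbedo, vUv).rgb;
        if (uDebugMode == 1)      fragColor = vec4(vUv, 0.0, 1.0);                       // visualize UVs
        else if (uDebugMode == 2) fragColor = vec4(normalize(vNormal) * 0.5 + 0.5, 1.0); // visualize normals
        else if (uDebugMode == 3) fragColor = vec4(vec3(vLightFactor), 1.0);             // lighting term only
        else if (uDebugMode == 4) fragColor = vec4(albedo, 1.0);                         // unlit albedo
        else                      fragColor = vec4(albedo * vLightFactor, 1.0);          // normal shaded output
    }

Cycling the mode from the host app and describing what each view shows is exactly the kind of information the model needed to localize the bug.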

For non-trivial coding, Sonnet 3.5 was miles above anything else, and I didn't even have to try hard.




Why can't you just debug this yourself? I don't think completely relying on LLMs for something like this will do you any good in the long run.


Well... why ask LLMs to do anything for us? :) Sure, I could debug it myself, but the whole point is to have a second brain fix the issue so that I can focus on the next feature.

If you're curious, I knew nothing about shader programming when I first played around. In that specific experiment, I wanted to see how far I could push Claude to implement shaders and how capable it was of correcting itself. In the end, I got a pretty nice dynamic lighting system with features such as cast shadows, culling, and multiple shader passes. Asking questions along the way taught me a lot about computer graphics, which I later verified against other sources; it was like a tailor-made tutorial where I was "working" on exactly the kind of project I wanted.


Why not? It depends on how you use these systems. Let the LLM debug this for me and give me a clear explanation of what's happening and what the possible solution paths are; then it's on me to evaluate and make the right decision. Don't rely blindly on these systems, in the same vein that you shouldn't blindly rely on some solution found through Google.


A reasonable answer is that this is our future one way or another: the complexity of programs is exceeding the ability of humans to properly manage them, and cybernetic augmentation of the process is the way forward.

E.g. there would be a lot of value in an AI that could maintain a detailed understanding of, say, the Linux kernel code base and, when someone is writing a driver, actively flag possible misuses, bugs, or implementation misunderstandings.


That's a different question, though. The person you replied to was asked to explain why they think Sonnet 3.5 works better than GPT-4o, and they gave a good answer: Sonnet actually takes context and new information into account better when following up.

They might be able to debug it themselves, and maybe they should, but I feel like that is a completely different conversation.



