
If you carefully read the agent's output you'll see why. It adds layers upon layers of workarounds and defences that hide serious problems, until the codebase reaches a point where the agent can no longer understand it and work with it. All the tests pass right up until the moment when adding a feature or fixing a bug causes another bug, and then nothing and no one can save the codebase anymore.



Maybe a year ago? Right now the LLMs I mainly use (GPT5.5, Opus 4.7) intuit exactly what I need from my brief specs and consistently go above and beyond, producing code that is not only extremely high quality but also catches, in advance, a ton of the gotchas I would have stumbled into.

Just a minute ago, 5.5 looked at some human-written code of mine from last year, and while it was making the changes I asked for, it determined the existing code was too brittle (it was) and rewrote it better. It didn't mention this in its summary at the end; I only know because I often watch the thinking output as it scrolls past, before it all gets hidden behind a pop-open.


Interesting that we've had such different experiences. I was working with both of those models today, and on several occasions they proposed some pretty poor solutions.

I also find I need to run an LLM code review or two against any code they produce just to get it to the point where it's ready for human review.
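
For concreteness, a minimal sketch of that kind of automated LLM review pass, assuming the OpenAI Python SDK and an API key in the environment (the model name is a placeholder, and the prompt wording is just illustrative):

    # Sketch: feed the uncommitted diff to an LLM and ask for a review.
    # Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
    import subprocess
    from openai import OpenAI

    def review_uncommitted_changes(model: str = "gpt-4o") -> str:
        # Collect the working-tree diff a human reviewer would otherwise read first.
        diff = subprocess.run(
            ["git", "diff"], capture_output=True, text=True, check=True
        ).stdout
        if not diff.strip():
            return "No uncommitted changes to review."

        client = OpenAI()
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system",
                 "content": "You are a strict code reviewer. Point out bugs, "
                            "brittle constructs, and missing edge cases. Be concise."},
                {"role": "user", "content": f"Review this diff:\n\n{diff}"},
            ],
        )
        return response.choices[0].message.content

    if __name__ == "__main__":
        print(review_uncommitted_changes())

Running it once or twice before opening a PR catches a lot of the shallow problems, so the human reviewer can focus on design.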

In any case they served as an extremely valuable tool.


I use GPT 5.5. Sometimes it does what you say. It certainly finds silly mistakes in my code better than I could. But frequently enough it makes catastrophic architectural mistakes in its own code.


