So you have an AI refactor AI-generated code? What am I missing here? If AI is the cause of the tech debt because it doesn't write great code, won't you just end up with more tech debt if you ask AI to refactor it?
If a human produces tech debt, do you think a human can't refactor?
Most of the time a human works over code multiple times, and still produces tech debt.
Give an AI agent enough time, by prompting it multiple times, and explicit instructions to look for and address tech debt of various forms, and it will.
A human has taste. They learn over time from codebase patterns and develop a sense of when an abstraction can be reused, improved or refactored. Agents often generate repeated code because the original file wasn't added to the context, and it's up to a human reviewer to recognise this.
In my experience, an agent will rarely recognise a common pattern and lift it into a new abstraction. It requires a human with taste and experience to do it. For example, an agent will happily add a large number of if branches in different places in the codebase where a strategy pattern or enum would be better (depending on the language).
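To make the if-branches-versus-strategy point concrete, here is a minimal Python sketch (the `export` example and all names are illustrative, not from any real codebase): the same format dispatch first as a repeated if/elif chain, then lifted into a single strategy table keyed by an enum.

```python
import json
from enum import Enum

class Format(Enum):
    JSON = "json"
    CSV = "csv"

# Before: the same if/elif chain an agent tends to copy into every call site.
def export_before(data: dict, fmt: str) -> str:
    if fmt == "json":
        return json.dumps(data)
    elif fmt == "csv":
        return ",".join(str(v) for v in data.values())
    raise ValueError(f"unknown format: {fmt}")

# After: one table of strategies; adding a format touches exactly one place,
# and call sites just look the behaviour up.
EXPORTERS = {
    Format.JSON: lambda data: json.dumps(data),
    Format.CSV: lambda data: ",".join(str(v) for v in data.values()),
}

def export(data: dict, fmt: Format) -> str:
    return EXPORTERS[fmt](data)
```

Spotting that the scattered chains are all instances of one dispatch, and choosing the table/enum shape, is exactly the kind of lift-into-abstraction step being discussed.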
If you have a working prompt or harness that ameliorates this, I'd be glad to see it.
Yeah I must be missing something again. Comparing human to AI here seems to be fundamentally wrong. A human will learn over time and improve their mental model of a problem and ability to code. An AI agent for the most part is fixed by its model. I just don't see how pointing an agent at AI generated code to refactor without direct human guidance results in better code.
Maybe you can describe what the various forms of tech debt are that you are talking about?
> Yeah I must be missing something again. Comparing human to AI here seems to be fundamentally wrong. A human will learn over time and improve their mental model of a problem and ability to code. An AI agent for the most part is fixed by its model. I just don't see how pointing an agent at AI generated code to refactor without direct human guidance results in better code.
There is no need to improve their mental model of a problem or their ability to code to recognise the refactoring opportunities that already exist in the code. It only takes sufficient skill, and effort invested in refactoring. The way to get a model to invest that effort is to ask it. As many times as you're willing to.
> Maybe you can describe what the various forms of tech debt are that you are talking about?
Any. Whether or not you need to prompt much to address it depends on consistency. In general I have a simple agent whose instructions are just to look for opportunities to refactor, and do one targeted refactor per run. All the frontier models know well enough what good looks like that it is unnecessary to give them more than that.
The best way of convincing yourself of this is to try it. Ask Claude Code or Codex to "Explore the code base and create a plan for one concrete refactor that improves the quality of the code. The plan should include specific steps, as well as a test plan." Repeat as many times as you care to, or, if in Claude Code, run /agents and tell Claude Code you want it to create an agent to do that. Then tell it to invoke it however many times you want to try.
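For reference, a subagent created via /agents ends up as a small markdown file with YAML frontmatter under `.claude/agents/`. A minimal sketch of what such a refactoring agent might look like (the file name, and the exact frontmatter fields beyond `name` and `description`, may vary by Claude Code version):

```markdown
---
name: refactorer
description: Looks for refactoring opportunities and performs one targeted refactor per run.
---

Explore the code base and pick one concrete refactor that improves the
quality of the code. Write a plan with specific steps and a test plan,
then carry it out and run the tests.
```

You can then invoke it repeatedly, letting each run pick up the next-most-valuable refactor.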
Well, could you define what reasoning actually means? What would an AI need to do to be considered capable of reasoning? What is the core difference between what we do that is considered reasoning versus what AI currently does that is not considered reasoning?
To be clear, I am not making a statement as to whether AI reasons or not. It's just slippery to say something isn't or can't do X when we can't really define X. Perhaps it would help to pin it down as an outcome rather than as what is, in my opinion, a currently impossible-to-accurately-define characteristic of a thing.
In many examples, LLMs betray the fact that they are not reasoning, because when provided with problems that can be solved with the ability to reason, they fail.
Even in this discussion someone provided an example of coming up with board game rules. LLMs found all the board game rules valid because they looked and sounded like board game rules, even when they were not.
In short, you can learn a subject, you can make a mental model of it, you can play with it, and you can rotate or infer new things about it.
LLMs are more analogous to actors who have learnt a stupendous number of lines and know how those lines work.
They are, by definition, models of language.
If you want a better version: GenAI needs to be able to generate working voxels of hands and 3D objects just from images.
I don’t believe the board game rules example. I think this would be a piece of cake for an LLM. I’m happy to be proven wrong here if you share an example.
This is an interesting idea, but do you have an example of you having done this or is it pure speculation as to what would work? My worry would be that a complex codebase ported over would have a heap of subtle bugs littered throughout it and no one who really understands it.
It's hard to tell what your blog posts are actually about because the titles are so cryptic. May I ask if you find any benefit from naming posts in a way that makes it difficult to know what you are about to write about? I'm guessing it's purely creative, which is totally fine.