It is only generating based on its training data. In mature codebases there is a massive amount of interconnected state that is not present in any GitHub repository, and the new logic you'd want to add is likely something never done before. As other programmers have said, it seems to be improving at generating useful boilerplate and making simple websites and the like, the sort of thing that exists en masse on GitHub. But it can't make any meaningful changes in an extensively matured codebase. Even Claude Sonnet is absolutely hopeless at this. And the bar for a codebase to count as "matured" is not very high.
> The new logic you'd want to add is likely something never done before.
99% of software development jobs are not as groundbreaking as this. It's mostly companies doing exactly what their competitors are doing. Very few places are actually doing things that an LLM has truly never seen while crawling through GitHub. Even new innovative products generally boil down to the same database fetches and CRUD glue and JSON parsing and front end form filling code.
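To make that glue concrete, here's a rough sketch of the kind of endpoint I mean (Python/Flask; the route, table, and database file are made up for illustration, and the customers table is assumed to already exist):

    # Hypothetical example of the usual glue: parse JSON from a form post,
    # insert it into a database, echo it back to the client.
    from flask import Flask, request, jsonify
    import sqlite3

    app = Flask(__name__)

    @app.route("/customers", methods=["POST"])
    def create_customer():
        data = request.get_json()                 # JSON parsing
        conn = sqlite3.connect("app.db")          # database write
        conn.execute("INSERT INTO customers (name, email) VALUES (?, ?)",
                     (data["name"], data["email"]))
        conn.commit()
        conn.close()
        return jsonify(data), 201                 # front end gets its JSON back

Variations of this exact shape exist on GitHub in enormous volume, which is precisely why an LLM handles it well.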
Groundbreakingness is different from the kind of novelty that matters to an LLM. The script I was trying to write yesterday wasn't groundbreaking at all: it just needed to pull some code from a remote repository, edit a specific file to add a hash, then run a command. But it had to do that _within our custom build system_, and there are few examples of that, so our coding assistant couldn't figure out how to do it.
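For what it's worth, the generic version of that task is trivial. Here's a rough sketch (Python, with a placeholder repo URL, file path, and a plain `make` call standing in for the real steps); the part that mattered, doing each of these steps through our custom build system, is exactly what can't be shown here and exactly what the assistant had no examples of:

    # Hypothetical sketch of the *generic* task: clone, append a hash to a
    # file, run a build command. All names below are placeholders.
    import subprocess

    REPO_URL = "https://example.com/some/repo.git"   # placeholder remote
    TARGET_FILE = "config/hashes.txt"                # placeholder file
    NEW_HASH = "abc123"                              # placeholder hash

    subprocess.run(["git", "clone", REPO_URL, "workdir"], check=True)

    with open(f"workdir/{TARGET_FILE}", "a") as f:
        f.write(NEW_HASH + "\n")

    # In the real script this is a custom build-system invocation,
    # not a plain make call.
    subprocess.run(["make", "build"], cwd="workdir", check=True)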
> Even new innovative products generally boil down to the same database fetches and CRUD glue and JSON parsing and front end form filling code.
The simplest version of that is some CGI code or a PHP script, which is what everyone should be writing, according to your description. But then why have so many books been written about this seemingly simple task? So many frameworks, so many patterns, so many methodologies...
This is not the case anymore: current SOTA CoT models are not just parroting stuff from their training data. And as of today they are not even trained exclusively on publicly (and not so publicly) available material; they make massive use of synthetic data the model itself generated, or data distilled from other, smarter models.
I'm using AI in current "mature" codebases, and I know plenty of people doing the same, with great results. This doesn't mean it does the work while you sip a coffee (yet).
*NOTE: my evidence for this is that o3 could not have beaten ARC-AGI by parroting, because that benchmark was designed exactly to prevent it. Not a coding benchmark per se, but still transferable imo.
Try Devin or OpenHands. OpenHands isn't quite ready for production, but it's informative about where things are going, and it's something to watch the LLM go off and "do stuff", kinda on its own, from my prompt (while I drink coffee).