Disclaimer: I work at a company that sells coding AI (among many other things).
We use it internally, and the technical debt it creates is an enormous threat that IMO hasn't been properly gauged.
It's very, very useful for carpet-bombing code with APIs and patterns you're not familiar with, but it also leads to insane amounts of code duplication and unwieldy boilerplate if you're not careful, because:
1. One of the two big biases of the models is the training data itself, which is StackOverflow-type material: standalone examples that don't take your context and constraints into account.
2. The other is the existing codebase: the model tends to copy/repeat what's already there instead of suggesting that you refactor (see the sketch below).
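To make the second bias concrete, here's a contrived Python sketch (all names are made up, not from any real codebase): the completion mirrors the duplication already present in the file rather than proposing the extraction a reviewer would ask for.

    # Existing code in the file: inline validation.
    def create_user(payload: dict) -> dict:
        if "@" not in payload.get("email", ""):
            raise ValueError("invalid email")
        if not payload.get("name"):
            raise ValueError("missing name")
        return {**payload, "role": "user"}

    # Typical completion for the next endpoint: the validation block is
    # duplicated wholesale, because that's what the surrounding code looks like.
    def create_admin(payload: dict) -> dict:
        if "@" not in payload.get("email", ""):
            raise ValueError("invalid email")
        if not payload.get("name"):
            raise ValueError("missing name")
        return {**payload, "role": "admin"}

    # What a reviewer would ask for instead, and what the model rarely volunteers:
    def validate_person(payload: dict) -> None:
        if "@" not in payload.get("email", ""):
            raise ValueError("invalid email")
        if not payload.get("name"):
            raise ValueError("missing name")

Multiply that by every endpoint in a service and you get the boilerplate problem described above.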
The first is mitigated by, well, doing your job and reviewing/editing what the LLM spat out.
The second can only be mitigated once diffs/commit history become part of the training data, and that's a much harder dataset to handle and tag: some changes are good (refactorings), others might not be (bugs that get corrected in subsequent commits), and there's no clear way to tell them apart, because commit messages are effectively lies (nobody ever writes "bug introduced").
Not only that: merges/rebases/squashes alter the history, removing real signals and adding spurious ones, making everything blurrier.
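To give a flavour of why that tagging is hard, here's a rough Python sketch of the naive heuristic you'd be tempted to start with (the repo path and labels are hypothetical, not an actual pipeline); it runs straight into the "commit messages are lies" problem.

    import subprocess

    def label_commits(repo_path: str) -> list[tuple[str, str]]:
        # %H = commit hash, %x09 = tab, %s = subject line.
        out = subprocess.run(
            ["git", "-C", repo_path, "log", "--format=%H%x09%s"],
            capture_output=True, text=True, check=True,
        ).stdout
        labels = []
        for line in out.splitlines():
            sha, subject = line.split("\t", 1)
            lowered = subject.lower()
            if "refactor" in lowered:
                labels.append((sha, "good-change?"))  # or a behaviour change in disguise
            elif "fix" in lowered or "bug" in lowered:
                labels.append((sha, "earlier-commit-was-bad?"))  # but which earlier commit?
            else:
                labels.append((sha, "unknown"))  # the vast majority
        return labels

And that's before you account for merge commits and rewritten history, which scramble even these weak signals.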
I consider myself very fortunate to have lived long enough to be reading a thread where the subject is the quality of the code generated by software. Decades of keeping that lollypop ready to be given, and now look where we are!