> Most bug reports you get in the wild are more along the lines of
Since this fixes 12% of the bugs, the authors of the paper probably agree with you that 100 - 12 = 88%, and hence "most bugs" don't have nicely written bug reports.
Except this is automated, so you could get multiple orders of magnitude more bugs filed, which means you need a very low false positive ratio to avoid being overwhelmed by automatically generated crap (which is basically spam).
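To put rough numbers on that, here is a back-of-the-envelope sketch. Every figure in it is made up for illustration (none comes from the paper), and `false_reports` is just a hypothetical helper name:

```python
def false_reports(reports_filed: int, false_positive_rate: float) -> float:
    """Expected number of bogus reports a maintainer has to triage."""
    return reports_filed * false_positive_rate


# Hypothetical weekly volumes, chosen only to show the scaling.
human_volume = 50            # reports filed by people
automated_volume = 50_000    # three orders of magnitude more

for rate in (0.10, 0.01, 0.001):
    by_hand = false_reports(human_volume, rate)
    by_bot = false_reports(automated_volume, rate)
    print(f"FP rate {rate:.1%}: {by_hand:.2f} bogus (human) vs {by_bot:.0f} bogus (automated)")
```

Under these assumed volumes, the automated pipeline needs a false positive rate about a thousand times lower just to produce the same amount of noise as the human-filed reports.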
You'd want three LLMs: one to create the bugs, one to report them, one to fix them. I joke, of course, but on the other hand this is potentially a worthwhile architecture from a self-training perspective: a bug-creating LLM means your training set is as big as you want it, +/- GAN features.
I suppose I should nail down my point. No one would ever write a bug report like this. A bug generally has an unknown cause. Once you've found the cause of the bug, you'd fix it. Nowadays, you could just cut and paste the problem into ChatGPT and get the answer right then. So why would anyone ever log this bug? All this demo proves is that they automated a process that didn't need automation.
To be fair, sometimes meticulous users investigate the bugs and write down logical chains explaining the causes and even offer a solution at the end (which they can't apply themselves for lack of commit access, for instance).
The proposed solution isn't always right, of course, but it would be incorrect to say that no bug reports come with a diagnosed cause. But that's exactly where a conscientious reviewer is most needed, I believe.
I sometimes write a detailed bug report but not a PR when there are different ways to address the problem (and all look bad to me) or the fix could introduce new problems. But I would expect an LLM to ignore the tradeoffs and choose an option which is not necessarily the best, for the same reason I hesitate: lack of understanding of this specific project.