I think the issue was incomplete context. Even before the original METR study came out, a number of larger-scale studies, starting as far back as 2024, showed a 15-30% boost. I often mention them, though they require some explanation, so this thread and linked comments may be useful: https://news.ycombinator.com/item?id=46559254
However, those studies never got as much airtime as the METR study, and this has created an imbalanced perspective.
My take is that studies like this are extremely useful but a lagging indicator of the true extent of AI-assisted coding, especially since the latest tools are something else entirely.
I am not at the "never look at code again" stage, the old habits are just too ingrained... but I'm starting to look less frequently because I rarely find anything to fix. I can see a path from where I'm at to the outlandish claims people have been making.
I tried the "don't look too closely" thing for the first time last week. I got immediately humiliated when a reviewer asked why my commit was trying to replace the correct, elegant usage of an API the class was named after with a 4-line-long franken-command using a different API with incorrect semantics. It's not like I'm not trying the new stuff; on a subjective level I think AI coding is really neat, but I just can't ever figure out how to map what I get to the stories I hear.
Don't get me wrong, my experiments with true vibe-coding (i.e. don't even look at the code) match yours: the result is somewhat mediocre*.
For some cases, and I try to push beyond the limits of what LLMs can do in order to find those limits, they suck. I'd describe the output as like that of an overenthusiastic junior who reinvents the wheel badly rather than using standard approaches even when told to.
For other cases, I know that mediocre code is actually good enough: well before LLMs happened, I saw mediocre code that still resulted in the app itself being given meaningful public accolades.
But for real... My company started tracking commits per hour as a metric so I just commit as many times as I can. I don't get the luxury of even looking at my work now. They say it's faster but I've never seen so much tech debt delivered so quickly in my life.
Definitely need to stop squashing commits if that is the case! But no, seriously, tracking git commit counts is absolutely ridiculous. Maybe you can have AI autonomously work on useless documentation that no one will read, with 1 commit per 100 lines of markdown?