You can't compare the accuracy of speech recognition to LLM task completion rate...

HarHarVeryFunny · on March 13, 2024

Sure, and no doubt people paying for speech recognition 25 years ago were finding uses for it too. It depends on your use case.

A 13% success rate is both wildly impressive and also WAY below the level where I would personally find something like this useful. I can't even see reaching for a tool that I knew would fail nearly 90% of the time, unless I was desperate and out of ideas.

falcor84 · on March 14, 2024

I disagree. I think about this a bit as having a developer intern, on whom I can't rely to take much of a workload, and definitely nothing on the critical path, but I could say to them "Take a look at these particular well-defined tasks on the backlog and see which ones you could make some progress on" - I feel there's good value in that.

And the nice thing about an AI here is that I think it will actually find a different subset of these tasks to be easy than a human would.

HarHarVeryFunny · on March 14, 2024

Yeah, but a developer intern already has human-level AGI to support the on-the-job developer training you are going to help give them. Any LLM available today, or probably in the next 5-10 years for that matter, has neither AGI nor the ability to learn on the job.

My experience of working with interns, or low-skill developers, is that the benefit normally flows one way. You are taking time out from completing the project to help them learn. Someone/something of low capability isn't going to be relieving you of the large or complex tasks that would actually be useful, and be a time saver - they are going to try to do the small/simple tasks you could have breezed through, and suck up a lot of your time having to find out and explain to them how they messed up. Of course Devin doesn't even have online learning, so he'd be making the same mistakes over and over.

oytis · on March 13, 2024

> A nearly-there yet incomplete solution to a Github issue is still valuable to an engineer who knows how to debug it.

Not sure if I can agree. There would definitely be value in looking at which libraries the solution uses, but otherwise it may be easier to write it oneself, especially when the mistakes are not humanlike.