LLMs aren't wrong by a small percentage; they're wrong by a small number of tokens. They can drop a zero or be off by 100% and it's just a one-token difference. To the LLM that's a minor mistake, since everything else was right, but in practice it's a massive one.
I watch math classes on YouTube and some lecturers make symbolic mistakes all the time: a minus instead of a plus, missing exponents, saying x but writing y, etc. They only notice when something unexpected contradicts it further down the line.
They got it right, as you said; it just took a bit longer. That doesn't contradict what I said: humans can get things right very reliably by looking over their answers, especially if another human helps check them. An AI isn't comparable to a single human, it's comparable to a team of humans, and two ChatGPTs can't get more accurate by correcting each other's answers the way two humans can.
A professor might be able to iterate to a correct answer, but a student might not.
And ChatGPT definitely is able to improve its answer by iterating; it just depends on the difficulty of the problem. If the problem is too hard, no amount of iteration will get it much closer to the correct answer. If it's closer to its reasoning limits, then iterating will help.
But if you stop them just there, an error persists. A professor is “multi-modal” and embedded in a constant stream of events, including their lecture plan and premeditated key results. Are you sure that at some level of LLM “intelligence”, putting it in the same shoes wouldn’t improve the whole setup enough? I mean, sure, they make mistakes. But if you freeze-frame a professor, they make mistakes too. They don’t correct them immediately, only after a contradiction gets presented. Reminds me of how LLMs behave. Am I wrong here?
Edit: was answering to gp, no idea how my post got here
Asking an LLM to correct itself doesn't improve its answers, since it will happily introduce errors into a correct answer when asked to fix it. That makes it different from humans: humans can iterate and get better; our current LLMs can't.
> But if you stop them just there, an error persists
But humans don't stop there when they're building things that need to be reliably correct. When errors aren't a big deal, humans make a lot of errors, but when errors cost lives, humans become very reliable by taking more time and looking things over. They still sometimes make mistakes that kill people, but very rarely.
So many things contribute to human error that it's probably impossible to draw a one-to-one parallel with LLMs. For instance, the fact that you're being recorded causes a significant performance drop in many cases.