> Humans are good at performing reliable calculations with pen and paper.
Speak for yourself. Even though I've always been strong at conceptual understanding and problem solving in math, I always found it difficult to avoid arithmetic mistakes on pen and paper and could never understand why I was assessed on that. I could have done so much better in high-school math if I had been allowed to use a programmable computer for the calculations.
And I think it's the same for LLMs: we shouldn't assess them on doing the arithmetic in a single pass, but rather on writing the code to perform the calculation and responding based on its result.
Maybe a lot of people suffer from some degree of dyscalculia, but in my experience, if you do it a lot, you just stop making mistakes. It's not just me: I've seen many others reliably do calculations pretty quickly without making errors. You just do everything twice as you go, and then arithmetic errors drop to basically zero.
But I do acknowledge that there are probably some, maybe many, humans who can't reach that level of reliability with arithmetic.