Hacker News

Last I checked (and confirmed by repeating it just now) GPT-4 did just fine at adding two numbers, because it knows better now than to do that manually and will express it as Python. It does worse if you try to force it to do it step by step like a child and don't reinforce adherence to the rules at every step, because just like humans it gets "sloppy" when you try to get it to repeat the same steps over and over.

If you want to measure its ability to do mindlessly repetitive tasks without diverging from instructions, you should compare it to humans doing the same, not expect it to act like a calculator.

If you want to measure its ability to solve problems that involve many such steps that are simple to express but tedious to carry out, ask it to write and evaluate code to do it instead.
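To make the distinction concrete, here's a minimal sketch of what that looks like in practice (the numbers are made up for illustration): instead of computing digit by digit in text, the model emits a trivial snippet and lets the interpreter do the arithmetic.

```python
# The kind of snippet a code-using LLM typically emits when asked
# to add two large numbers: the interpreter, not the model,
# produces the exact sum.
a = 123456789012345678
b = 987654321098765432
print(a + b)
```

The point isn't that this code is clever; it's that delegating to a tool is exactly what a human with a calculator would do.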




The claim was that "LLMs can do math". Below they linked a model from Google that might be capable of that, but as a general rule (and with OpenAI's models specifically) LLMs can't "do math" by any reasonable definition.


I've had it do plenty of math. Some it does badly at, some it does fine. Generally it's not "disciplined" enough to do things that require lots of rote repetitive steps, but neither are most humans, and that has improved drastically as they've adjusted it to instead do what most humans do and use tools. Would it be nice if it also got more willing to "stick to it" when given rote tasks? Sure.

But whether or not it can "do maths" to your definition depends very much on what you want it to do, and how you define "do maths". To me it's irrelevant if it's doing the low-level calculations as long as it knows how to express them as code. If I wanted a calculator I'd use a calculator. And I don't consider a calculator able to "do math" just because it can precisely add numbers.

Meanwhile I've had lengthy discussions with GPT about subjects like orbital mechanics and calculating atmospheric effects, where it correctly used maths that I had to double-check not because I didn't trust GPT (though I also wanted to verify for that reason) but because I didn't know the maths (not that it was anything particularly advanced, but I lost interest in maths during my CS degree and picked the minimum amount of maths I could get away with).

By my definition it can "do maths" just fine. I guess you don't consider my view of that "reasonable". I can live with that, as meanwhile, it will keep doing maths for me when I need it.

Of course this was also a case of moving the goalposts to set up a strawman - in the comment of yours I replied to, you claimed it couldn't reliably add two numbers.


It often fails at basic 3-4 digit arithmetic. If you're stretching that definition far enough to claim that GPT4 can "do math" then I should be able to call myself a commercial pilot because I can land a plane in a sim 20% of the time.

I'm not moving goalposts, the original claim was that LLMs can "do math". Primary school arithmetic is math.

GPT-4 can't do math and that's okay, I don't understand why so many of you are so touchy and defensive about this. It's a limitation that exists, nothing more, nothing less.


GPT-4 is a tiny subset of "LLMs".

If you train a model to do math (and optimize representation for that), it'll do math. GPT-4 just isn't, and, generally speaking, they aren't, because it's much more efficient to train them to "use a calculator". Same as with humans.


You do realize that arithmetic is a very simple symbolic manipulation task? All you have to do is keep track of the carry. I haven't seen an LLM that couldn't get digit-by-digit addition done, but they always mess up the carry.
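For reference, the symbolic procedure being described is just schoolbook addition; a minimal sketch (the function name is mine, not from the thread):

```python
def add_digitwise(a: str, b: str) -> str:
    """Schoolbook addition: walk the digits right to left,
    tracking the carry at each position."""
    # Pad to equal length so the columns line up.
    a, b = a.zfill(len(b)), b.zfill(len(a))
    carry, digits = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        carry, d = divmod(int(x) + int(y) + carry, 10)
        digits.append(str(d))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(add_digitwise("999", "1"))  # 1000
```

The algorithm itself is trivial; the claim in the thread is that LLMs reliably produce the per-column digits but drop or misplace the carry state.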


Just like humans. Try to get regular people to e.g. add 15-16 digit numbers (which is typically where I'd see GPT-4 start to get "sloppy" unless you prompt it the way you would a child who's learning and is still prone to get annoyed and wonder why the hell you make them do it manually), and see how many start making mistakes.

I find it really comical that this is what people complain about GPT over - there's zero benefit to getting LLMs good at this over other tasks. To the extent we get it "for free" as a benefit of other learning, sure. When we make kids practice this over and over again to drill doing it without getting sloppy, it has traditionally been out of some belief that it's important, but a computer will always have a "calculator" at its disposal that is far more efficient than the LLM, and it's idiocy to care whether it does that part the tedious and hard way or knows how to describe the problem to a more efficient tool.

I also find it comical that people use tasks where LLMs' behaviour is, if anything, most human-like - the tendency to lose focus and start taking shortcuts when presented with stupidly repetitive tasks - as examples of how they're not good enough. (Before GPT-4 started writing Python instead, it'd for a while try really hard not to give you a step-by-step breakdown and would clearly take shortcuts even when you prompted it heavily to reason through it step by step.)


This goes to the heart of what it means to "know".

All human knowledge is "symbolic". That is, knowledge is a set of abstractions (concepts) along with relations between concepts. As an example, to "know" addition is to understand the "algorithm" or operations involved in adding two numbers. Reasoning is the act of traversing concept chains.

LLMs don't yet operate at the symbolic level, and hence it could be argued that they don't know anything. The LLM is a modern sophist, excelling at language but not at reasoning.


Is this rant really necessary? Most models, especially GPT-4, can perform carry-based addition, and there is zero reason for them to fail at it, but the moment you start using quantized models such as the 5-bit Mixtral 8x7B the quality drops annoyingly. Is it really too much to ask? It's possible and it has been done. Now I'm supposed to whip out a Python interpreter for this stuff, because the LLM is literally pretending to be a stupid human, really?



