40 points by hatcherdogg on May 19, 2023 | 14 comments

 BLOOMChat is a 176B chat model able to hold multilingual conversations after being fine-tuned on English data. It was built by SambaNovaAI and Together by fine-tuning BLOOM for chat.
 It has some weird math problems. I asked it to compare itself to ChatGPT, and it responded that while ChatGPT is trained on 120 billion messages, BLOOMChat is trained on 1.7 billion messages, and thus BLOOMChat is trained on more data. When I asked which is more, 1.7 or 120, it said 1.7 is the greater number and then started spewing complete garbage math: 1.7 - 120 = 60, and since 60 is more than 0, 1.7 is more than 120. Utter garbage.
 LLMs being bad at math is a known issue, but what they are good at is writing programs. For example:

>>> Write a program that calculates if 120 is greater than 0.7.

> Sure, here's a program in Python that calculates if 120 is greater than 0.7:

```python
if 120 > 0.7:
    print("Yes, 120 is greater than 0.7")
else:
    print("No, 120 is not greater than 0.7")
```

For straight input/output like what this model is trained on, questions like this don't work well. However, if LLMs are equipped with tools (like a code interpreter), they get a lot smarter.
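The "code interpreter tool" idea above can be sketched as a small harness: the model's reply is scanned for a fenced code block, which is then executed and its output captured. This is an illustrative sketch, not any particular vendor's implementation — `run_model_code` is a made-up name, and a real system would sandbox the code rather than call `exec()` directly:

```python
import contextlib
import io
import re

FENCE = "`" * 3  # markdown code fence


def run_model_code(reply: str) -> str:
    """Extract the first fenced code block from a model reply,
    execute it, and return whatever it printed to stdout."""
    pattern = re.escape(FENCE) + r"(?:python)?\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, re.DOTALL)
    if not match:
        return ""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(match.group(1), {})  # NOTE: unsandboxed; demo only
    return buf.getvalue()


# The model reply from the example above, as raw text:
reply = (
    "Sure, here's a program in Python:\n"
    + FENCE + "python\n"
    + "if 120 > 0.7:\n"
    + '    print("Yes, 120 is greater than 0.7")\n'
    + "else:\n"
    + '    print("No, 120 is not greater than 0.7")\n'
    + FENCE
)

print(run_model_code(reply))
```

The harness, not the model, does the arithmetic — the model only has to write correct code, which the thread argues is the easier task for it.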
 jug on May 19, 2023 Try not to use it on math. Only GPT-4 has reasonable performance there; GPT-3.5 is also pretty awful. It's apparently extremely hard for any LLM to actually understand math. Maybe because they're language models, not math models, so math is a pretty far-fetched "emergent property".
 Nobody does long-form arithmetic in the texts seen by these LLMs. Everyone uses calculators, so the AIs only see the result, not the step-by-step process to get there. I would expect the models to be bad at, say, division of long numbers, in the same way humans are bad at doing the same calculations in their head!
 Would synthetically generating millions of calculation examples and adding that to the training data help?
 Yes, but it is a "waste of its capacity", in the same way it's a waste of the human brain's potential to learn numeracy to this level. Calculators are cheap, readily available, and don't make errors. I'd be more interested in having the thing "study maths" in the sense of seeing many examples of Wolfram Language being used in matching English context. That way, it would learn the English->Maths mapping in a format that it can then feed into an industrial-strength mathematics engine. Apparently, Stephen Wolfram is working on this now, but the "full" fine-tuning would require significant funding and time. I suspect OpenAI has other priorities right now, but this type of thing will eventually become a routine way of making specialised and/or more capable large language models.
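The point about delegating to an exact engine can be illustrated without Wolfram Language: once the English question is formalized into expressions, exact rational arithmetic answers it trivially. A toy stand-in (`which_is_more` is a hypothetical helper, and `fractions.Fraction` stands in for an industrial-strength engine):

```python
from fractions import Fraction


def which_is_more(a: str, b: str) -> str:
    """Compare two decimal strings exactly as rationals;
    Fraction("1.7") is exactly 17/10, so no float rounding is involved."""
    return a if Fraction(a) > Fraction(b) else b


# The question the model got wrong:
print(which_is_more("1.7", "120"))
```

The LLM's job in this division of labour is only the English->formal mapping; the engine never gets "1.7 - 120 = 60" wrong.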
 jxy on May 19, 2023 BLOOM models are hopelessly under-trained. This one is worse than a 13B Vicuna.
 You are not wrong, but I don't think that is their point. It's just a muscle flex to show that their hardware works. If it can train a 176B model, training a 13B one should be easy peasy for them.
 I've gotten similar garbage out of ChatGPT (though, to be fair, pre-GPT-4). I don't think LLMs can understand math.
 Why do people rush to test these on math problems, the things computers are already really good at?
 chii on May 20, 2023 It's an attempt to see if the output is "actual intelligence", or just statistical bullshit.
 Completing math problems is a very narrow part of the intelligence spectrum.
 You just need to teach it to use a calculator: https://arxiv.org/pdf/2302.04761.pdf
