It's just so dissonant to me: the natural tokens in mathematics are the individual digits, not bundles of digits. Tokenization makes sense for text, where it takes the emphasis off individual letters and provides language agnosticism.
But for maths, it doesn't seem appropriate.
I wonder what the effect of forcing a separate token for each digit would be.
I think this is true as long as the attention mechanism has been trained enough on each possible numerical token. But if a particular token is underrepresented, that could cause inaccuracies.
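
To make the difference concrete, here is a rough sketch in Python of the two splitting schemes. It is a toy pre-tokenizer I made up for illustration (the names `split_digits` and `split_chunks` and the 3-digit chunk limit are assumptions), not how any particular model's tokenizer actually works.

```python
import re

def split_digits(text):
    """Toy scheme: every digit becomes its own token."""
    return re.findall(r"\d|[^\d\s]+", text)

def split_chunks(text, max_len=3):
    """Toy scheme: greedy left-to-right bundles of up to max_len digits."""
    pattern = r"\d{1,%d}|[^\d\s]+" % max_len
    return re.findall(pattern, text)

example = "12345 + 678 = 13023"
per_digit = split_digits(example)
chunked = split_chunks(example)

# Number of distinct numeric tokens each scheme can ever produce.
per_digit_vocab = 10
chunked_vocab = sum(10 ** k for k in range(1, 4))  # 1-, 2- and 3-digit bundles

print(per_digit)
print(chunked)
print(per_digit_vocab, chunked_vocab)
```

With per-digit splitting there are only ten numeric tokens, so each one shows up constantly in training. With bundles of up to three digits there are 1110 possible numeric tokens in this toy setup, and some of them could plausibly be seen far less often than others, which is where the underrepresentation worry comes in.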