Hacker News new | past | comments | ask | show | jobs | submit login

The llm will know 123 and 45 is a contiguious number just like how humans can tell if you say 123 and then a slight pause 45 as a single number





It's just so dissonant to me that the tokens in mathematics are the digits, and not bundles of digits. The idea of tokenization makes sense for taking the power off letters, it provides language agnosticism.

But for maths, it doesn't seem appropriate.

I wonder what the effect of forcing tokenization for each separate digit be.


I think that as long as the attention mechanism has been trained on each possible numerical token enough, this is true. But if a particular token is underrepresented, it could potentially cause inaccuracies.

It won't 'see' [123, 45] though, but [7633, 2548], or rather sparse vectors that are zero at each but the 7634th and 2549th position.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: