Positional description matters for transformers arithmetic (arxiv.org)
64 points by Handy-Man 6 months ago | 3 comments



Also positional encodings. A few months ago I did a bunch of experiments with length generalization in addition and multiplication (https://github.com/thomasahle/arithmetic-transformer).

It turned out that not using positional encodings at all (relying entirely on the causal masking in the attention) did better than all the other methods: learned positional embeddings, sinusoidal encodings, RoPE, and putting an LSTM below the transformer.
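To make the "no positional encodings" (NoPE) setup concrete, here is a minimal sketch, assuming a PyTorch decoder-only model; this is not the code from the linked repo, just an illustration of the idea that token order is conveyed only through the causal attention mask, with nothing positional added to the embeddings:

  import torch
  import torch.nn as nn

  class NoPEDecoder(nn.Module):
      """Decoder-only transformer with no positional encodings (NoPE).

      Order information comes only from the causal attention mask:
      token i can attend to tokens 0..i, which breaks permutation
      symmetry even without an explicit position signal.
      """
      def __init__(self, vocab_size, d_model=128, n_heads=4, n_layers=4):
          super().__init__()
          self.embed = nn.Embedding(vocab_size, d_model)  # no positional table
          layer = nn.TransformerEncoderLayer(
              d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
              batch_first=True, norm_first=True,
          )
          self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
          self.lm_head = nn.Linear(d_model, vocab_size)

      def forward(self, tokens):
          # tokens: (batch, seq_len) integer ids
          seq_len = tokens.size(1)
          causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
          x = self.embed(tokens)               # nothing positional is added here
          x = self.blocks(x, mask=causal_mask) # causal mask is the only order cue
          return self.lm_head(x)               # next-token logits

  # quick smoke test on a toy token stream (e.g. digits plus '+' and '=')
  model = NoPEDecoder(vocab_size=14)
  logits = model(torch.randint(0, 14, (2, 16)))
  print(logits.shape)  # torch.Size([2, 16, 14])

The vocab size and model dimensions are made up for the example; the point is simply that the embedding layer is used on its own, with no learned, sinusoidal, or rotary position term.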


Could that be part of a multimodal language model, so it better understands and performs arithmetic operations, enhancing the LLM's numerical reasoning?


Fascinating, thanks for sharing this!



