It turned out that using no positional encodings at all, relying entirely on the causal attention mask to break permutation symmetry, did better than all the other methods tried: learned positional embeddings, sinusoidal encodings, RoPE, and placing an LSTM below the transformer.
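To make the setup concrete, here is a minimal PyTorch sketch (not from the original experiments) of such a "NoPE" model: token embeddings feed directly into causally masked self-attention layers, with no positional term added anywhere. The class name `NoPEDecoder` and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NoPEDecoder(nn.Module):
    """Decoder-only transformer with no positional encodings (NoPE).

    Position information reaches the model only implicitly, through
    the causal attention mask: token i can attend to positions <= i,
    which is enough to break permutation symmetry over the sequence.
    """
    def __init__(self, vocab_size=1000, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        # Token embeddings only; deliberately no positional embedding table
        # and no sinusoidal/rotary term is ever added.
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.layers = nn.TransformerEncoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        seq_len = tokens.size(1)
        # Causal mask: -inf above the diagonal, so each position
        # only attends to itself and earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        h = self.embed(tokens)  # note: nothing position-dependent added here
        h = self.layers(h, mask=mask)
        return self.out(h)

model = NoPEDecoder()
logits = model(torch.randint(0, 1000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 1000])
```

The only architectural difference from a standard decoder is the absence of the positional term at the embedding step; the causal mask itself is unchanged from any autoregressive transformer.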