This makes it worse IMO. I was starting to think it didn't have a letter-by-letter representation of the tokens. It does. In which case, the fact that it didn't decide to use it speaks even more to its lack of sophistication.
Regardless, I’d love if you would explain a bit more why the transformer internals make this problem so difficult?
The transformer needs to retrieve the letters for each token while its internal representation stays aligned in length with the base tokens (each token is a single fixed-size embedding, even though it spans multiple letters), and then it has to count letters within that misaligned representation.
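Rough sketch of the misalignment (the vocabulary here is made up, purely for illustration):

```python
# Toy illustration: the token-level view hides letter boundaries.
# This vocabulary is invented just for the example.
vocab = {"straw": 0, "berry": 1}

word = "strawberry"
tokens = ["straw", "berry"]               # what the model "sees": 2 token positions
token_ids = [vocab[t] for t in tokens]

# The model gets 2 fixed-size embeddings, but the answer requires reasoning
# over 10 characters -- a sequence that is longer than, and not aligned with,
# the token positions it actually has.
print(len(token_ids), "tokens vs", len(word), "letters")
```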
Autoregressive mode completely alleviates the problem, since the model can align its internal representation with the letters and just keep an explicit sequential count.
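The sequential analogue is trivial once the word is spelled out letter by letter; roughly:

```python
# Autoregressive analogue: spell the word out, then count,
# one explicit step at a time.
word = "strawberry"
count = 0
for letter in word:          # each letter is now its own position/step
    if letter == "r":
        count += 1           # explicit running count, updated per step
print(count)                 # 3
```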
BTW - humans also can't count without resorting to a sequential process.