Maybe go back and read what I said rather than make up nonsense. 'often fail' isn't 'always fail'. And many models fail the strawberry example; that's why it's famous. I even laid out some training samples of the type that enables current models to succeed at spelling 'games', if only in a fragile way.
Being problematic and fragile at spelling games, compared to using character- or byte-level 'tokenization', isn't a giant deal. These are largely "gotchas" that don't reduce the value of the product materially. Everyone in the field is aware. Hyperbole isn't required.
Someone linked you to one of the relevant papers above... and you still contort yourself into a pretzel. If you can't intuitively get the difficulty posed by current tokenization, and how character/byte-level 'tokenization' would make those things trivial (albeit with a tradeoff that doesn't make it worth it), maybe you don't have the horsepower required for the field.
"""
While current LLMs with BPE vocabularies lack direct access to a token's characters, they perform well on some tasks requiring this information, but perform poorly on others. The models seem to understand the composition of their tokens in direct probing, but mostly fail to understand the concept of orthographic similarity. Their performance on text manipulation tasks at the character level lags far behind their performance at the word level. LLM developers currently apply no methods which specifically address these issues (to our knowledge), and so we recommend more research to better master orthography. Character-level models are a promising direction. With instruction tuning, they might provide a solution to many of the shortcomings exposed by our CUTE benchmark.
"""