
GPT-3 and Arithmetic - luu
https://nostalgebraist.tumblr.com/post/620663843893493761/bpe-blues
======
memexy
The article talks about something called BPE (byte-pair encoding) but never
defines it. Chasing links, I see it's about tokenization and how it's "weird".

> BPE tries to be efficient, so it doesn’t waste token slots on spaces if it
> doesn’t have to. A word is almost always preceded by a space, so instead of
> representing “ example text” as four tokens (space, “example,” space,
> “text”), it represents it as two:

> [(' Example', 17934), (' text', 2420)]

Apparently this is a problem because the same visible text can tokenize
differently depending on whether it's preceded by a space, so feeding prompts
to the system will give you different results.

> So far, seems innocuous, right? But what if you’re feeding a prompt into
> GPT-2? Unless you’re hip to this particular issue, you’ll probably type in
> something like

> “Example text”

> which becomes

> [('Example', 16281), (' text', 2420)]

> Compare this to the one above. Yes – instead of token #17934, with the
> preceding space, I’ve unwittingly fed in token #16281, without a preceding
> space.
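The effect is easy to sketch. This is a toy illustration, not the real GPT-2 vocabulary (only the two IDs quoted above are borrowed from the post): a BPE-style tokenizer stores " Example" (with leading space) and "Example" as distinct vocabulary entries, so greedy longest-match encoding maps the same word to different IDs.

```python
# Hypothetical miniature vocabulary; the IDs for " Example", "Example",
# and " text" are the ones quoted in the post above.
toy_vocab = {
    " Example": 17934,
    "Example": 16281,
    " text": 2420,
}

def encode(text, vocab):
    """Greedy longest-match encoding against a fixed vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # try the longest vocabulary entry that matches at position i
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(vocab[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token matches at position {i}")
    return tokens

print(encode(" Example text", toy_vocab))  # [17934, 2420]
print(encode("Example text", toy_vocab))   # [16281, 2420]
```

One character of leading whitespace flips the first token ID, which is exactly the trap described above.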

Maybe someone should figure out how to tokenize text with a neural network in
a more natural way, since the tokenizer is hand-crafted and is essentially a
hyperparameter that cannot be optimized with gradient descent.
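To make the "not optimizable by gradient descent" point concrete, here is a minimal sketch of BPE training (names and corpus are my own, not from the post): it greedily merges the most frequent adjacent symbol pair, a purely count-based procedure with no differentiable objective anywhere.

```python
from collections import Counter

def learn_merges(corpus, n_merges):
    """Greedy BPE training sketch: repeatedly merge the most frequent
    adjacent symbol pair. Count-based and discrete -- nothing here can
    be backpropagated through."""
    words = [list(w) for w in corpus]  # start from single characters
    merges = []
    for _ in range(n_merges):
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        # apply the chosen merge everywhere in the corpus
        for w in words:
            i = 0
            while i < len(w) - 1:
                if w[i] == a and w[i + 1] == b:
                    w[i:i + 2] = [a + b]
                else:
                    i += 1
    return merges

print(learn_merges(["low", "lower", "lowest"], 3))
```

Every decision is an argmax over integer counts, which is why the learned vocabulary behaves like a frozen hyperparameter from the model's point of view.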

