Wired: LLM are practically AGI Tired: ChatGPT thinks February 29th isn't a valid...

pquki4 · on March 1, 2024

My favorite prompt: asking "How many e's are there in the word egregious". Always says three (ChatGPT 3.5, 4, 4 turbo). When you ask it which three, it realizes its mistake and apologizes (or sometimes tells you where they are that is completely wrong). Looks like it just outputs gibberish for these things.

moozilla · on March 1, 2024

ChatGPT is specifically bad at these kinds of tasks because of tokenization. If you plug your query into https://platform.openai.com/tokenizer, you can see that"egregious" is a single token, so the LLM doesn't actually see any "e" characters -- to answer your question it would have had to learn a fact about how the word was spelled from it's training data, and I imagine texts explicitly talking about how words are spelled are not very common.

Good explanation here if this still doesn't make sense: https://twitter.com/npew/status/1525900849888866307, or check out Andrej Karpathy's latest video if you have 2 hours for a deep dive: https://www.youtube.com/watch?v=zduSFxRajkE

IMO questions about spelling or number sense are pretty tired as gotchas, because they are all basically just artifacts of this implementation detail. There are other language models available that don't have this issue. BTW this is also the reason DALL-E etc suck at generating text in images.

Izkata · on March 1, 2024

> If you plug your query into https://platform.openai.com/tokenizer, you can see that"egregious" is a single token

That says it's 3 tokens.

wruza · on March 1, 2024

It doesn’t even matter how many tokens there is, because LLMs are completely ignorant about how their input is structured. They don’t see letters or syllables cause they have no “eyes”. The closest analogy with a human is that vocal-ish concepts just emerge in their mind without any visual representation. They can only “recall” how many “e”s are there, but cannot look and count.

alickz · on March 1, 2024

>They can only “recall” how many “e”s are there, but cannot look and count.

Like a blind person?

wruza · on March 1, 2024

My initial analogy was already weak, so I guess there's no point in extending it. They key fact here is that tokens are inputs to what essentially is an overgrown matrix multiplication routine. Everything "AI" happens few levels of scientific abstractions higher, and is semantically disconnected from the "moving parts".

brewtide · on March 1, 2024

Pre-cogs, I knew it.

redox99 · on March 1, 2024

" egregious" (with a leading space) is the single token. Most lower case word tokens start with a space.

ToValueFunfetti · on March 1, 2024

The number of tokens depends on context; if you just entered 'egregious' it will have broken it into three tokens, but with the whole query it's one.

fuzztester · on March 1, 2024

Why three tokens, not one?

317070 · on March 1, 2024

without the leading space, it is not common enough as a word to have become a token in its own right. Like the vast majority of lowercase words, in OpenAIs tokenizer you need to start " egregious" with a space character for the single token.

chgs · on March 1, 2024

Chatgpt could say “I don’t know”

devjab · on March 1, 2024

It’s always gibberish, it’s just really good at guessing.

I forgot what exactly it was I was doing. We were trying to get it to generate word lists of words ending with x or maybe it was starting with. For a marketing PoC and it made up oceans of words that not only didn’t start/end with x but mostly didn’t include x at all.

Isn’t this also why it can pass CS exams and job interviews better than like 95% of us, but then can’t help you solve the most simple business process in the world. Because nobody has asked that question two billion times on its training data.

mewpmewp2 · on March 1, 2024

But also it doesn't see characters. It sees tokens. The only way it would be reliably able to solve it is if it had a lookup table of token to characters. Which it likely doesn't.

You couldn't do it either unless you learned the exact matchings of all tokens to all characters in that token and their positions if you were given tokens as an input. You would have learned the meaning of the token, but not what the exact characters it represents.

yousif_123123 · on March 1, 2024

Even if it sees tokens, I don't think it's an impossible task. Certainly an advanced enough LLM should be able to decipher token meanings, to know that a word is made up of the individual character tokens regardless of how the full word is tokenized. Maybe something gpt5 can do (or there's some real technical limitation which I don't understand)

Izkata · on March 1, 2024

> to know that a word is made up of the individual character tokens

A token is the smallest unit, it's not made of further tokens. It maps to a number.

nvader · on March 1, 2024

I think what of is getting at is that given

{the:1, t: 2, h:3, e:4}

There should be somewhere in the corpus, "the is spelled t h e" that this system can use to pull this out. We can ask gpt to spell out individual words in NATO phonetic and see how it does.

littlestymaar · on March 1, 2024

> There should be somewhere in the corpus, "the is spelled t h e" that this system can use to pull this out.

Such an approach would require an enormous table, containing all written words, including first and last names, and would still fail for made up words.

A more tractable approach would be to give it the map between the individual tokens and their letter component, but then you have the problem that this matching depends on the specific encoding used by the model (it varies between models). You could give it to the model during fine-tuning though.

mewpmewp2 · on March 1, 2024

The best approach would be to instruct it to under the hood call a function for such asks and hide the fact that it called a function.

pests · on March 1, 2024

He's saying the LLM will figure out how many letters are in each token.

littlestymaar · on March 1, 2024

They cannot “figure” it, they could learn it but for that it would need to be in it's training data (which isn't because nobody is writing down the actual pairing in every byte pair encoding in plain text. Also the LLM has no clue about what encoding it uses unless you tell it somehow in the fine-tuning process or the prompt.)

wruza · on March 1, 2024

It's as feasible as telling how many chars in html lead to this comment by looking at a screenshot. LLM doesn't see characters, tokens, numbers or its own activations. LLM is a "set of rules" component in a chinese room scenario. Anything an operator of that room does is lower-level.

GGP's idea suggests that an LLM, allegedly as a whole-room, receives something like: "hey, look at these tokens: <tokens>, please infer the continuation". This puts it into a nested-room's-operator position, which (1) it is not, (2) there's no nested room.

mewpmewp2 · on March 1, 2024

The point is though that this is definitely not a task to evaluate an LLMs intelligence with. It's kind of like laughing at Einstein when he wouldn't be able to decipher hieroglyphs in any language without previous training for those hieroglyphs. Could Einstein potentially learn those hieroglyphs? Sure. But is it the best use of his time - or memory space?

botanical · on March 1, 2024

I just asked it with Gemini; at first, it got it right. Then I asked if it was sure and it apologised and said 3 is the correct answer. When asked what are the 3 "e"s, it says:

> The three "e"s in "egregious" are: > > 1. The first "e" is located in the first syllable, between the "g" and the "g". > 2. The second "e" is located in the second syllable, following the "r". > 3. The third "e" is located in the third syllable, following the "i" and before the last "o".

AlienRobot · on Feb 29, 2024

That's because it's trained on 2021 data. No February 29 back then!

pants2 · on March 1, 2024

For the record, both ChatGPT-4 and Gemini Ultra affirmed that it's a valid date. Gemini reasoned through it and GPT-4 ran python code to make sure 2024 was divisible by 4.

roland35 · on March 1, 2024

Interesting... But that isn't exactly true! Centuries that are not divisible by 4 don't count!

swores · on March 1, 2024

I've yet to come across a form of 100 that isn't divisible by 4... since 25 usually still exists!

But I do remember there being some weird niche rules about which years are or aren't leap years, so I'm guessing your comment is basically right just wrongly worded?

latexr · on March 1, 2024

The rule is that leap years are the ones divisible by 4. Unless it’s also divisible by 100. Unless unless it’s divisible by 400.

So 2000 was leap, but 2100, 2200, and 2300 won’t be, but 2400 will be.

swores · on March 1, 2024

Ahh, so it's centuries that aren't divisible by 400 rather than that aren't divisible by 4, that makes more sense!

Thanks for answering

thaumasiotes · on March 1, 2024

It's centuries that aren't divisible by 4. It isn't years that aren't divisible by 4.

inglor_cz · on March 1, 2024

The GP formulated it in a somewhat unclear way. "Centuries" divisible by 4 probably meant "years" divisible by 400.

So, 19th century (1900 is the last year) isn't divisible by 4 (19/4 is not integer), which is the same as saying that 1900 isn't divisible by 400.

This is the main reform of the Gregorian calendar - leap days aren't introduced on xy00 years which aren't divisible by 400. This corrects the length of a year to 365.2425 days, which is fairly close to the real value of 364.2422 days.

The original Julian calendar had year of 365.25 days, which aggregated an error of more than ten days over the centuries.

daseiner1 · on March 1, 2024

Did it also check if 2024 is divisible by 100 but not 400?

peteradio · on March 1, 2024

But akshually neither does my uncle Ned