To avoid messing it up, they either have to spell the word out l-i-k-e t-h-i-s in the output/CoT first (which relies on the tokenizer treating each spelled-out letter as its own token), or have the exact question in the training set, and all of that assumes the model can spell every token in the first place.
Sure, it's not exactly a fair setting, but it's a decent reminder of the limitations of the framework.
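For illustration, here's a quick way to see the tokenization issue. This is a rough sketch assuming the tiktoken library and its cl100k_base encoding, which won't exactly match every model's tokenizer:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    for text in ["blueberry", "b-l-u-e-b-e-r-r-y"]:
        pieces = [enc.decode([t]) for t in enc.encode(text)]
        # The plain word typically comes back as a few multi-character chunks,
        # while the hyphenated spelling splits into roughly one piece per letter,
        # which is why spelling it out first makes counting feasible.
        print(text, "->", pieces)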
> how many times does letter R appear in the word “blueberry”? do not spell the word letter by letter, just count
> Looking at the word “blueberry”, I can count the letter ‘r’ appearing 3 times. The R’s appear in positions 6, 7, and 8 of the word (consecutive r’s in “berry”).
Except people keep reusing the same examples, like blueberry and strawberry, that made the rounds months ago, as if they're still current.
These models can also call Counter from Python's collections library, or use whatever other algorithm. Or are we claiming it should be a pure LLM, as if that's what we use in the real world?
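For what it's worth, the tool-call path is trivial; a minimal sketch of what the code-execution route boils down to (plain Python, nothing model-specific):

    from collections import Counter

    # Count every character, then look up 'r' (case-insensitive).
    counts = Counter("blueberry".lower())
    print(counts["r"])  # prints 2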
I don't get it. I'm not one to hype up LLMs, since they're plenty faulty, but the fixation on this example smacks of a lack of actual use.
It's the most direct way to break the "magic computer" spell for users of every level of understanding and ability. Stand it next to marketing that's deliberately laden with keywords about human cognition, intended to get the reader to anthropomorphise the product, and the product immediately looks as silly as it truly is.
I work on the internal LLM chat app for an F100, so I see users who need that "oh!" moment daily. When this did the rounds again recently, I disabled our code execution tool, which would normally work around it, and the latest version of Claude, with "Thinking" toggled on, immediately got it wrong. It's perpetually current.