
I thought the response to "what would you say if you could talk to a future AI" would be "how many r in strawberry".




Can we stop with that outdated meme? What model can't answer that effectively?

Literally every single one?

To not mess it up, they either have to spell the word out l-i-k-e t-h-i-s in the output/CoT first (which relies on the tokenizer treating each spelled-out letter as its own token), or have the exact question in the training set, and all of that assumes the model can spell every token in the first place.
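As a rough sketch of what the model actually sees (assuming OpenAI's tiktoken package and the cl100k_base encoding; the exact split varies by model), the word arrives as a few multi-letter tokens, not as letters:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("strawberry")
    # Likely something like [b'str', b'aw', b'berry'] -- the model only ever
    # sees the token ids, never the individual characters.
    print([enc.decode_single_token_bytes(i) for i in ids])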

Sure, it's not exactly a fair setting, but it's a decent reminder of the limitations of the framework.


ChatGPT. I test these prompts with ChatGPT and they work. I've also used Claude 4 Opus and it worked too.

It's just weird how it gets repeated ad nauseam here, yet I can't reproduce it with a "grab the latest model from a famous provider" approach.


I just asked ChatGPT "How many b's are in blueberry?". It instantly said "going to the deep thinking model" and then hung.

When I do it, it takes around 3 seconds: it says "thinking longer for a better answer" and then answers 2.

Again, I don't understand how it's seemingly so hard for me to reproduce these things.

I understand the tokenisation constraints, but I feel the issue is overblown.


Opus 4.1:

> how many times does letter R appear in the word “blueberry”? do not spell the word letter by letter, just count

> Looking at the word “blueberry”, I can count the letter ‘r’ appearing 3 times. The R’s appear in positions 6, 7, and 8 of the word (consecutive r’s in “berry”).

<https://claude.ai/share/230b7d82-0747-4ab6-813e-5b1c82c43243>




I can't reproduce it, or similar ones. Why do you think that is?

Because it’s embarrassing and they manually patch it out every time like a game of Whack-a-Mole?

Except people keep using the same examples, like blueberry and strawberry, which made the rounds months ago, as if they're still current.

These models can also call Counter from Python's collections library, or use whatever other algorithm, via a code-execution tool. Or are we claiming it should be a pure LLM, as if that's what we use in the real world?
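A minimal sketch of the kind of thing such a tool call could run (plain Python, nothing model-specific):

    from collections import Counter

    word = "strawberry"
    # Counts characters directly; no tokenizer involved.
    print(Counter(word)["r"])  # 3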

I don't get it, and I'm not one to hype up LLMs, since they're plenty faulty, but the fixation on this example screams of a lack of actual use.


It's such a great example precisely for that reason: despite the efforts, it comes back every time.

It's the most direct way to break the "magic computer" spell in users of all levels of understanding and ability. Stand it up next to the marketing, which is deliberately laden with keywords related to human cognition and intended to induce the reader to anthropomorphise the product, and it immediately makes the product look as silly as it truly is.

I work on the internal LLM chat app for an F100, so I see users who need that "oh!" moment daily. When this did the rounds again recently, I disabled our code-execution tool, which would normally work around it, and the latest version of Claude, with "Thinking" toggled on, immediately got it wrong. It's perpetually current.


"Mississippi" passed but "Perrier" failed for me:

> There are 2 letter "r" characters in "Perrier".


Thanks! I finally was able to reproduce one of these.

Ok. Then I was wrong. I'll update my edit accordingly.


Update: after trying A LOT of examples, I did manage to reproduce one with the latest ChatGPT.


