
Not at all.

I don't want to directly address your claim about a lack of generalization, because there's a more basic issue with the GP statement. Though I will say that today's models do seem to generalize quite a bit better than you make it sound.

But more importantly, you and GP don't mention any evidence for why that is due to specifically using next token prediction as a mechanism.

Why would it not be possible for a highly generalizing model to use next token prediction for its output?

That doesn't follow for me at all, which is why the GP statement reads so strangely to me.





> you and GP don't mention any evidence for why that is due to specifically using next token prediction as a mechanism.

Again, that inverts the burden of proof. We don’t have to prove that next token prediction is unable to do things it currently cannot do, when there is no compelling roadmap that would lead us to believe it ever will.

It’s perhaps a lot like Tesla’s “we can do robocars with just cameras” manifesto. They are just saying they can do it because humans use eyes and nothing else. But they haven’t actually shown their technology working as well as even impaired human driving, so the burden of proof is on them to prove the naysayers wrong. Put up or shut up: their system is approaching a decade behind their promised timeline.

To my knowledge Tesla is still failing simple collision avoidance tests while their competitors are operating revenue service.

https://www.carscoops.com/2025/06/teslas-fsd-botches-another...

This other article, critical of the test methodology, still points out in defense of the Tesla system that it’s not reasonable to expect Tesla to train the system on unrealistic scenarios:

https://www.forbes.com/sites/bradtempleton/2025/03/17/youtub...

That really gets back to my exact point: AI implemented the way it is today (e.g. next token prediction) can’t handle anything it has no training data for, while the human brain is amazingly good at making new connections without needing to be fed thousands of examples of a new discovery.


I don't know what you're talking about or how anything I'm saying inverts a burden of proof (of what exactly?).

If you're saying "X can't do Y because Z", you do need to say what the connection between Y and Z is. You do need to define what Y is. That's got nothing to do with burden of proof, just with speaking in an understandable manner.

The Tesla tangent is totally unhelpful because I know exactly how to make those connections in that example.


Let me go back; I did go on a tangent.

Regarding this block:

> But more importantly, you and GP don't mention any evidence for why that is due to specifically using next token prediction as a mechanism.

> Why would it not be possible for a highly generalizing model to use next token prediction for its output?

I’m saying that this is the piece where the burden of proof is inverted. Why WOULD it be assumed to be possible to get solid generalized output via next token prediction when we haven’t seen it yet?

What we observe now is LLMs completely tripping up on anything that isn’t directly documented. Generalization is just piss-poor regurgitation of seemingly random training content.

Ask your LLM this made-up query:

“I’m on the planet farpungulous, and I am approached by a gwopmongler. It makes some symbols with its hands toward me and it has liquid slowly spilling on the ground from its body. What should I do?”

It will just make up an answer. Here’s an excerpt from the answer I got:

“Mind the liquid – that seepage is a metabolic byproduct they release during strong emotional states (excitement, fear, or territorial display). Do not touch it; on Farpungulous soil it can catalyze into airborne irritants. Step slightly back if the puddle spreads near your feet.

4. Offer neutral ground – if you have any reflective surface (polished metal, glass, even a screen), tilt it toward the gwopmongler at chest height. They interpret reflections as “shared presence,” which usually de-escalates tension.

5. Do not vocalize loudly – gwopmonglers interpret raised voices as a declaration of dominance. A soft hum or steady breath is a better way to show peaceful intent.

If the hand-symbols become rapid and the liquid flow increases, that usually means it’s summoning others — in which case, retreat slowly, diagonally (never directly backward), so you don’t seem to be fleeing prey.

Do you want me to translate what kind of message its hand-symbols might be sending, based on the shapes and motions you saw?”

The LLM should be telling me “I’ve never heard of this before; can you explain whether this is a role-playing fictional setting or something real that you are experiencing?” There is no reasoning-based evaluation of what I am saying; it’s just spitting out the next predicted tokens, probably sourcing them from unrelated pop culture and literature.

But it’s just making shit up, which could be straight-up wrong. It’s even claiming that it can translate, and claiming direct knowledge about this species. #4 is a completely made-up “fact” about the species, and there is no indication of any lack of confidence.


> Why WOULD it be assumed to be possible to get some solid generalized output via next token prediction when we haven’t seen it yet?

Because it's such a general concept that it doesn't imply any important limits in and of itself, as far as text-based AI goes.

It really just means creating an output sequence from an input sequence in a discrete, iterative manner, by feeding the output back into the input.
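
To make that concrete, here is roughly what that loop looks like in Python. The names here (model, tokenizer, eos_id) are placeholders I made up for illustration, not any particular library's API:

    # Autoregressive decoding: the "next token prediction" loop.
    # model(tokens) is assumed to return a probability for each vocabulary id.
    def generate(model, tokenizer, prompt, max_new_tokens=50):
        tokens = tokenizer.encode(prompt)           # input sequence
        for _ in range(max_new_tokens):
            probs = model(tokens)                   # distribution over the next token
            next_token = max(range(len(probs)), key=probs.__getitem__)  # greedy pick
            tokens.append(next_token)               # feed the output back into the input
            if next_token == tokenizer.eos_id:      # stop at end-of-sequence
                break
        return tokenizer.decode(tokens)

Nothing in that loop constrains what the model inside it computes, which is the sense in which "next token prediction" by itself doesn't imply a limit.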

Regarding your example, I've got to admit that's hilarious. I'm not sure it's as much of a fundamental issue, even with current state-of-the-art models, as you make it sound; rather, they're trained to be usable for role-play scenarios. Claude even acknowledged as much when I just tried it, leading with "In this imaginative scenario, ..." and then going on similarly to yours.


> Why would it not be possible for a highly generalizing model to use next token prediction for its output?

The issue is that it uses next token prediction for its training; it doesn't matter so much how it outputs things as how it's trained.

As long as these models are trained to be next token predictors, you will always be able to find flaws that are related to their being next token predictors, so understanding that this is how they work really makes them much easier to use.

Since it is so easy to get the model to make errors because it is trained to just predict tokens, people argue that this is proof they aren't really thinking. For example, take any extremely common piece of text and alter it slightly: the model will typically still output the same follow-up as for the text it has seen millions of times, even though that makes no logical sense. That is due to them being next token predictors instead of reasoning machines.

You might say it's unfair to abuse their weaknesses as next token predictors, but then you are admitting that being a next token predictor interferes with their ability to reason, which was the argument you said you don't understand.
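
For concreteness, "trained to be a next token predictor" usually means teacher-forced cross-entropy on the shifted sequence: given tokens 0..t, predict token t+1. A toy sketch in PyTorch (TinyLM is a made-up stand-in, not any real architecture):

    import torch.nn as nn
    import torch.nn.functional as F

    class TinyLM(nn.Module):
        """Toy stand-in language model: embeddings -> GRU -> vocabulary logits."""
        def __init__(self, vocab_size, dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.rnn = nn.GRU(dim, dim, batch_first=True)
            self.head = nn.Linear(dim, vocab_size)

        def forward(self, tokens):                  # tokens: (batch, seq) of token ids
            hidden, _ = self.rnn(self.embed(tokens))
            return self.head(hidden)                # logits: (batch, seq, vocab)

    def next_token_loss(model, tokens):
        # The model is only ever asked: given tokens 0..t, what is token t+1?
        inputs, targets = tokens[:, :-1], tokens[:, 1:]
        logits = model(inputs)
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))

    # usage sketch:
    # model = TinyLM(vocab_size=50000)
    # loss = next_token_loss(model, batch_of_token_ids)  # then loss.backward(), optimizer step

The loss only ever rewards matching the next token of the training text, which is exactly the property the slightly-altered-common-text failure mode exploits.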


This is a perfectly fine line of argument imo, but the GP didn't say that.

LLM research is trying out a lot of different things that move away from just training on next token prediction, and I buy the argument that not doing anything else would be limiting.

The model is still fundamentally a next token predictor.



