People are going to keep saying this about autoregressive models, how small errors accumulate and can't be corrected, while we literally watch reasoning models say things like "oh that's not right, let me try a different approach".
To me, this is like people saying "well NAND gates clearly can't sort things so I don't see how a computer could".
Large transformers can clearly learn very complex behavior, and the limits of that are not obvious from their low level building blocks or training paradigms.
> while we literally watch reasoning models say things like "oh that's not right, let me try a different approach".
I'm not saying I disagree with your premise that errors can be corrected by using more and more tokens, but this particular argument is weird to me.
The model isn’t intentionally generating text. The kinds of “oh let me try a different approach” lines I see are often followed by the model taking the exact same approach it just took. I wouldn’t say most of the time, but often enough that I notice.
Just because a model generates text doesn’t mean that the text actually represents anything at all, let alone a reflection of an internal process.
> Just because a model generates text doesn’t mean that the text actually represents anything at all, let alone a reflection of an internal process.
What does it represent then? What are all these billion weights for? It's not a bag full of NULLs that just pulls next words from a look-up table. Obviously there is some kind of internal process.
Also I don't get why people ignore the temporal aspect. Humans too generate thoughts in sequence, and can't arbitrarily mutate what came before. Time and memory are what force sequential order - we too just keep piling on more thoughts to correct previous thoughts while they are still in working memory (context).
The text represents a prediction of how a human may respond, one word(ish) at a time, that's it.
With "reasoning" models, the reasoning layer is basically another LLM instructed to specifically predict how a human may respond to the underlying LLM's answer, fake prompt engineering if you will.
There of course is some kind of internal process, but we can't prove any kind of reasoning. We ask a question, the main LLM responds, and we see how the reasoning layer LLM itself responds to that.
Please don't confuse people with wrong information: the reasoning part in reasoning models is the exact same LLM that produces the final answer. For example, o1 uses special "thinking" tokens to demarcate between the reasoning and answer sections of its output.
Sure, that's a great clarification, though maybe a bit of an implementation detail in this context.
Functionally my argument stands in this context - just because we can see one stream of LLM responses responding to the primary response stream says nothing of reasoning or what is going on internally in the reasoning layer.
> what is going on internally in the reasoning layer.
We literally know exactly what is going on with every layer.
It’s well defined. There are mathematical proofs for everything.
Moreover it’s all machine instructions which can be observed.
The emergent properties we see in LLMs are surprising and impressive, but not magic. Internally what is happening is a bunch of matrix multiplications.
There’s no internal thought or process or anything like that.
It’s all “just” math.
To assume anything else is personification bias.
To look at LLMs outputting text and a human writing text and think “oh these two things must be working in the same way” is just… not a very critical line of thought.
> We literally know exactly what is going on with every layer.
Unless I missed a huge break in the observability problem, this isn't correct.
We know exactly how every layer is designed and we know how we functionally expect that to work. We don't know what actually happens in the model at time of inference.
I.e. we know what pieces were used to build the thing, but when we actually use it it's a black box - we only know inputs and outputs.
This paper [1] may be an interesting place to start.
We only know how the structures are designed to work, and we have hypotheses about how they likely work. We can't interpret what actually happens when the LLM is actually going through the process of generating a response.
That seems pedantic or unimportant on the surface, but there are some really important implications. At the more benign level, we don't know why a model gave a bad response when a person wasn't happy with the output. On the more important end, any concerns related to the risk of these models becoming self-directed or malicious simply can't be recognized or guarded against. We won't know if a model becomes self-directed until after it acts in ways that don't match how we already expect it to work.
Both alignment and interpretability were important research topics for decades of AI research. We effectively abandoned those topics once we made real technological advancement - once an AI-like tool was no longer entirely theoretical, we couldn't be bothered to focus resources on figuring out how to do it safely. The horse was already out of the barn.
Does this mean they will turn evil, or that things will end up going poorly for us? Absolutely not. It just means that we have to cross our fingers and hope, because we can't detect issues early.
> We can't interpret what actually happens when the LLM is actually going through the process of generating a response.
There are 2 things we’re talking about here.
There are the physical, mechanical operations going on during inference, and there’s potentially a higher order process happening as an emergent property of those mechanical operations.
We know precisely the mechanical operations that take place during inference as they are machine instructions which are both man-made and very well understood. I hope we can agree here.
Then there’s potentially a higher order process. The existence of that process, and what that process is, are still a mystery.
We do not know how the human brain works, physically. We can’t inspect discrete units of brain operations as we can with machine instructions.
For that reason, it is uncritical to assume that there is any kind of “thought” process occurring at inference which is similar to our thought processes.
Comparing the two is like apples and oranges anyway and is pedantic in a non-useful way, especially with our limited understanding of the human brain.
OpenAI may be able to do more in the long term because they don't show the <think> tokens and can spend more of that scratch space on improving answers vs appeasing users, but time will tell.
Remember that probabilistically checkable proofs show how randomness can improve computation.
The AI field has always had a problem with wishful mnemonics.
But it is probably not a binary choice; if we could get the scratch space to reliably simulate Dijkstra's shunting-yard algorithm and convert to postfix, as an example, that would be great.
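For reference, a minimal sketch of that conversion (Python, single-token operands and the four basic binary operators only, no functions or error handling) - the kind of deterministic, step-by-step procedure the scratch space would need to reproduce reliably:

```python
# Minimal shunting-yard sketch: infix -> postfix for single-token operands and
# the four basic left-associative binary operators.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(tokens):
    output, ops = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # pop operators of greater or equal precedence first
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()  # discard the "("
        else:  # operand
            output.append(tok)
    while ops:
        output.append(ops.pop())
    return output

print(to_postfix("3 + 4 * 2 / ( 1 - 5 )".split()))
# ['3', '4', '2', '*', '1', '5', '-', '/', '+']
```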
You don’t know this. I don’t feel like I generate thoughts in sequence; for me it feels hierarchical.
> can't arbitrarily mutate what came before
Uhh… what?
Do you remember your memories as a child?
Or what you ate for breakfast 3 weeks ago?
Have you ever misremembered an event or half remembered a solution to a problem?
The information in human minds is entirely mutable. Human minds are not like computers…
> It's not a bag full of NULLs that just pulls next words from a look-up table.
Funny enough, the attention mechanism that’s popular right now is effectively lots and lots of stacked look up tables. That’s how it’s taught as well (what with the Q K and V)
Tho I don’t think that’s a requirement for LLMs in general.
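To make the stacked look-up table picture concrete, here's a toy single-head attention step in plain numpy (illustrative shapes only, not any particular model): each query is softly matched against the keys and gets back a weighted mix of the value rows, i.e. a differentiable "soft" table lookup.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how well each query matches each key
    weights = softmax(scores)                # rows sum to 1: a "soft" index into the table
    return weights @ V                       # weighted mix of value rows

rng = np.random.default_rng(0)
d = 8                              # toy embedding size
Q = rng.normal(size=(4, d))        # 4 queries
K = rng.normal(size=(6, d))        # 6 keys...
V = rng.normal(size=(6, d))        # ...each paired with a value row
print(attention(Q, K, V).shape)    # (4, 8): one "looked-up" row per query
```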
I find a lot of people who half understand cognition and understand computing look at LLMs and work backwards to convince themselves that it’s “thinking” or doing more cognitive functions like we humans do. It’s personification bias.
> Do you remember your memories as a child? Or what you ate for breakfast 3 weeks ago?
For me, conjuring up and thinking about a childhood event is like putting what came out of my nebulous 'memory' fresh into context at the point in time I'm thinking about it, along with whatever thoughts I had about it (how embarrassed I was, how I felt proud because of X, etc). As that context fades into the past, some of those thoughts may get mixed back into that region of my 'memory' associated with that event.
As the number of self-corrections increases, it also increases the likelihood that it will say "oh that's not right, let me try a different approach" after finding the correct solution. Then you can get into a second-guessing loop that never arrives at the correct answer.
If the self-check is more reliable than the solution-generating process, that's still an improvement, but as long as the model makes small errors when correcting itself, those errors will still accumulate. On the other hand, if you can have a reliable external system do the checking, you can actually guarantee correctness.
Error correction is possible even if the error correction is itself noisy. The error does not need to accumulate, it can be made as small as you like at the cost of some efficiency. This is not a new problem, the relevant theorems are incredibly robust and have been known for decades.
Can you link me to a proof demonstrating that the error can be made arbitrarily small? (Or at least a precise statement of the theorem you have in mind.) I would think that if the last step of error correction turns a correct intermediate result into an incorrect final result with probability p, that puts a lower bound of p on the overall error rate.
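For intuition, a tiny Monte Carlo sketch with made-up error rates (not a proof of anything): repeating a noisy step and majority-voting drives the error down quickly, while the unreliability of the final aggregation step sets a floor, which is roughly the lower bound described above.

```python
import random

def noisy_step(p_correct):
    """One unreliable computation/check: True iff it happened to get the right answer."""
    return random.random() < p_correct

def corrected_answer(n_repeats, step_p, vote_p):
    """Repeat the unreliable step, then take a majority vote that is itself
    unreliable: with probability (1 - vote_p) the vote's outcome gets flipped."""
    votes = sum(noisy_step(step_p) for _ in range(n_repeats))
    majority_correct = votes > n_repeats / 2
    vote_flipped = random.random() > vote_p
    return majority_correct != vote_flipped   # XOR: flip if the final voter errs

def error_rate(n_repeats, step_p=0.9, vote_p=0.99, trials=50_000):
    wrong = sum(not corrected_answer(n_repeats, step_p, vote_p) for _ in range(trials))
    return wrong / trials

for n in (1, 3, 9, 27):
    print(n, error_rate(n))
# The error drops fast as n grows, then plateaus near 1 - vote_p: the final
# noisy step is the floor, which is the lower bound asked about above.
```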
LeCun is for sure a source of inspiration, and I think he has a fair critique that still holds true despite what people think when they see reasoning models in action. But unlike him, I don't think autoregressive models are a doomed path or whatever. I just like to question things (and don't have absolute answers).
I-JEPA and V-JEPA have recently shown promising results as well.
I'd argue that humans are by definition autoregressive "models", and we can change our minds mid-thought as we process logical arguments. The issue around small errors accumulating makes sense if there is no sense of evaluation and recovery, but clearly, both evaluation and recovery are done.
Of course, this usually requires the human to have some sense of humility and admit their mistakes.
I wonder, what if we trained more models with data that self-heals or recovers mid sentence?
I think recurrent training approaches like those discussed in COCONUT and similar papers show promising potential. As these techniques mature, models could eventually leverage their recurrent architecture to perform tasks requiring precise sequential reasoning, like odd/even bit counting that current architectures struggle with.
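As a concrete instance of the kind of task meant here (a plain-Python sketch, not the COCONUT method itself): parity of a bit string needs only one bit of state carried across steps, which is trivial for a recurrent loop but awkward for a fixed-depth forward pass over long inputs.

```python
def parity(bits):
    # The entire "memory" is one bit of recurrent state, updated once per token.
    state = 0
    for b in bits:
        state ^= b
    return state  # 1 if the input contains an odd number of 1s, else 0

print(parity([1, 0, 1, 1, 0, 1]))  # 0: four ones, so even
```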
> By design, AR models lack planning and reasoning capabilities. If you generate one word at a time, you don’t really have a general idea of where you’re heading.
I have one minor quibble here, which is that the limitation described isn't a criticism of AR models in general (whose outputs are only "backward-looking" for their inputs), but only of a subset of AR models in popular use. An AR model is fully capable of generating a large state space and doing many computations (even doing many fully-connected diffusion steps) before generating the first output token.
That quibble wouldn't be worth mentioning unless AR models had some sort of advantage, but they do, and it's incredibly important. AR factorization of the conditional probabilities allows you to additively consider the loss contribution from each output token -- you can blindly shove whatever data you want into the thing, add up all the errors, and backpropagate, all while guaranteeing that the distribution you're learning is the same distribution from your training data.
If you're not careful to enforce that match via some mechanism (like AR), the distribution you learn can have almost nothing to do with the distribution you're training on -- a common failure mode being a tendency to predict "average-looking" sub-tiles in a composite image and to only predict images which can be composed of those smaller, average-looking sub-tiles. Imagine (as an example, with low enough model capacity) you had a model generating people and everyone was vaguely 5'10", ambiguously gendered, and a bit tan; contrast that with the same model trained using AR, where you'd expect the outputs to be bad in other ways if you had insufficient capacity, but to at least have a mix of colors, heights, and genders. Increasing capacity can help, but why bother when something like AR solves it by definition?
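A toy sketch of what that additive decomposition buys you (numpy, made-up "model" and vocabulary, not any real LM): the log-likelihood of a whole sequence is just the sum of per-token conditional log-probs, so the loss you add up token by token is exactly the negative log-likelihood under the AR factorization.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 50, 16

# Toy stand-in for an AR "model": embed the last token, apply a nonlinearity,
# project to next-token logits. Purely illustrative, not a real language model.
E = rng.normal(size=(vocab, d))
W = rng.normal(size=(d, vocab))

def log_softmax(z):
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def next_token_logprobs(prefix):
    h = np.tanh(E[prefix[-1]])       # "context" here is only the last token, for brevity
    return log_softmax(h @ W)

def sequence_logprob(tokens):
    # log p(x) = sum_t log p(x_t | x_<t): the AR factorization
    return sum(next_token_logprobs(tokens[:t])[tokens[t]] for t in range(1, len(tokens)))

x = [3, 17, 42, 8, 5]
print(-sequence_logprob(x))  # the training loss is exactly this per-token sum
```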
I think the author is projecting significantly when he says the goal of AI researchers is to understand and replicate how humans think. If you start from that wrong assumption of course it looks silly for them to be doing anything other than neuroscience research, the author's field.
It's like saying the stockfish developers should stop researching mixed NN and search methods because they don't understand how humans play chess yet.
This is mainly a misunderstanding due to the way I phrased it. This is what I think. I know for a fact that this is the case for other AI researchers too, having watched many conferences - "all of them" is not what I meant (I wrote "many other"), and we certainly need people to approach problems from different perspectives and backgrounds, since they will benefit from each other in the end. Not going to lie, I'm a bit disappointed to see these kinds of comments.
Fair enough about your motivation, however you also go further in saying that the best way to achieve and exceed human intelligence is to first understand it. That didn't pan out for chess, it hasn't contributed much to our current SOTA approaches to many other problems where LLMs are king, and I'm not sure why neuroscientists are so confident in some future where their field is the key to intelligence when their track record of breakthroughs is so poor.
I admit my phrasing was poor there, and I got too excited. I will clarify since I don't really disagree with you or what others said (claiming it's the best way is an overstatement).
Well, one could say that neural network pioneers modeled their ideas on simplified representations of brain structures. Modern neural networks have little in common with an actual biological brain; however, the inspiration remains there (even for modern NNs like CNNs). I recall the intent was there too, originally: providing a framework to study biological cognition in the 50s. Then it evolved to become a new paradigm in computer science, so that we have programs able to learn and adapt for problems that are formally too complicated for deterministic solutions.
Is anyone aware of a formalization of the idea that to get “symbols” out of fuzzy probability distributions one needs distributions whose value goes exactly to zero over some regions of the domain? I.e. Gaussian mixtures won’t cut it. And they will need very high Fourier frequencies.
I have the gut feeling that until a model allows for a small probability that 2x3 is 7, there will always be hallucinations. Probabilities need to be clamped to zero to emulate symbolic behaviour.
Symbolic behavior is artificial and not how humans think either. 0 is not a probability (neither is 1) - a value of 0 or 1 basically breaks calculations by dragging everything along to the limit, the same way infinity does, or 0 in the denominator (in fact, that's what 1 and 0 translate to if you switch to logprobs or other equivalent ways to calculate probabilities).
Consider: if you clamp the probability distribution of answers to 2x3, so that it's 0 everywhere else and 1 at 6, you're basically saying that it is fundamentally impossible for you to misunderstand the question, or make mistake in the answer, or that you're dreaming, or hallucinating, or that you've momentarily forgotten that the question was preceded by "In base 4, what is ", or any number of other things that absolutely are possible, even if highly unlikely, in the real world.
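A minimal illustration of that point with plain Bayes updates and made-up numbers: a tiny-but-nonzero prior can recover under strong evidence, but a hard 0 stays 0 forever, and in log space it is literally negative infinity.

```python
def bayes_update(prior, lik_if_true, lik_if_false):
    # P(H | E) = P(E | H) P(H) / P(E)
    numer = prior * lik_if_true
    return numer / (numer + (1 - prior) * lik_if_false)

# A tiny-but-nonzero prior can still recover under strong evidence...
p = 1e-9
for _ in range(5):
    p = bayes_update(p, lik_if_true=0.99, lik_if_false=0.01)
print(p)   # climbs toward 1

# ...but a hard 0 stays 0 forever, no matter how strong the evidence is.
p = 0.0
for _ in range(5):
    p = bayes_update(p, lik_if_true=0.99, lik_if_false=0.01)
print(p)   # 0.0

# In log space a hard 0 is -infinity (math.log(0.0) raises, np.log(0.0) gives -inf):
# the "dragging everything to the limit" behaviour described above.
```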
> But what is the original purpose of AI research? I will speak for myself here, but I know many other AI researchers will say the same: the ultimate goal is to understand how humans think. And we think the best (or the funniest) way to understand how humans think is to try to recreate it.
Eh. To riff on Dijkstra, this is like submarine engineers saying their ultimate goal is to understand how fish swim.
I come from a medical science background, where I studied the brain from a "traditional" neuroscience perspective (biology, pathology, anatomy, psychology and whatnot). That the best way to understand it is actually to try to recreate it is honestly how I feel whenever I read about AI advancements where the clear goal is to achieve/surpass human intelligence, something we don't fully understand yet.
“What I cannot create, I do not understand,” someone clever once said.
It doesn't really follow that we (humans) have to replicate how we (humans) gained intelligence, there very well could be a shortcut that doesn't involve millions of years of getting eaten by tigers.
The author (and Chomsky) fail to understand that LLMs (as well as human brains) are not just autoregressive models, but nonlinear autoregressive models. Put a slightly different way, you can describe LLMs as autoregressive, but only by taking liberties with the classical definition of 'autoregressive.'
The human mind is not, like ChatGPT and its ilk, a lumbering statistical engine for pattern matching, gorging on hundreds of terabytes of data and extrapolating the most likely conversational response or most probable answer to a scientific question. On the contrary, the human mind is a surprisingly efficient and even elegant system that operates with small amounts of information; it seeks not to infer brute correlations among data points but to create explanations. – Noam Chomsky
It's as if Chomsky has either never heard of transformers, or doesn't understand what they do.
> Before speaking a sentence, we have a general idea of what we’re going to say; we don’t really choose what to say next based on the last word. That kind of planning isn’t something that can be represented sequentially.
It's as if the author (and Chomsky) has never seen a CoT model in action.
Author here and I welcome the feedback, but I don't really understand your point. My post is clearly not dismissive of efforts to make LLMs reason using CoT prompting techniques and post-training, and I think such efforts are even mentioned. The model remains autoregressive either way, and this reasoning is not some kind of magic that makes them behave differently - these improvements only make them perform (much) better on given tasks.
Additionally, I'm not dismissive of the non-linear nature of transformers, which I'm familiar with. The attention mechanism is a lot more complex than a linear relationship between the prediction and the past inputs, yes. But the end result remains sequential prediction. Ironically, diffusion models are kind of the opposite: sequential internally, parallel prediction at each step.
(Note: I added a note on terminology since the confusion arose from my use of "linearity", which was not referring to the attention mechanism itself. I've read so many papers that are perfectly fine with the use of "autoregressive" for this paradigm that I forgot some people coming from traditional statistics may be confused. Also, "based on the last word" was wrong and should have been "last words" or "previous words", obviously.)
All that being said, I don't think it's fair to say one doesn't understand how transformers work solely because of semantic interpretation. I appreciate the feedback though!
Not saying that our current approaches will lead to intelligence. No one can know.
It could very well be that the internal mechanism of our thought has an auto-regressive reasoning component.
With the full system effectively "combining" short-term memory (what just happened) and "pruned" long-term memory (what relevant things I know from the past) and pushing that into a RAW autoregressive reasoning component.
It is also possible that another specialized auto-regressive reasoning component is driving the "prune" and "combine" operations. This whole system could be solely represented in the larger network.
The argument that "intelligence cannot be auto-regressive" seems to be without basis to me.
> there is strong evidence that not all thinking is linguistic or sequential.
It is possible that a system wrapping a core auto-regressive reasoner can produce non-sequential thinking - even if you don't allow for weight updates.
I completely agree. I never said that "intelligence cannot be auto-regressive", I just questioned whether this can be achieved or not this way. And I don't actually have answers, I just wrote down some thoughts so they would spark some interesting discussions about that, and I'm glad it did work (a little) in the end.
I also mentioned that I'm supportive of architectures that will integrate autoregressive components. Totally agree with that.
I guess semantics matter. Language is primarily hierarchical, but its presentation is what's linear. And LLMs mainly learn and work from this presentation; the question is, and one of the main points, whether emergent patterns are enough evidence to show that there's hierarchical thinking.
> The context window can be compared to working memory in humans: it’s fast, efficient but gets rapidly overloaded. Humans manage this limitation by offloading previously learned information into other memory forms, whereas LLMs can only mimic this process superficially at best.
This is just silly. Humans forget things all the time! If I want to remember something I write it down.
> The nature of hallucination is very different between AR models and humans, as one has a world model and the other doesn’t.
I stopped reading at this point. There's not much signal here, just basic facts about LLMs and then leaps to very bold statements.
Here is an interesting experiment I use to help people understand next token prediction. Think of a simple math problem in your head, maybe 3 digit by 2 digit multiplication. Then speak out every single thought you have while solving it.
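For instance, one way that narration might go, written out as explicit steps (a toy decomposition, numbers chosen arbitrarily):

```python
def narrate(a, b):
    # Narrate 3-digit x 2-digit multiplication the way you might say it out loud,
    # one small step at a time.
    tens, ones = divmod(b, 10)
    p1 = a * ones
    p2 = a * tens * 10
    print(f"{a} times {ones} is {p1}")
    print(f"{a} times {tens}0 is {p2}")
    print(f"{p1} plus {p2} is {p1 + p2}")
    return p1 + p2

narrate(347, 28)
# 347 times 8 is 2776
# 347 times 20 is 6940
# 2776 plus 6940 is 9716
```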
> There's not much signal here, just basic facts about LLMs and then leaps to very bold statements.
The article wasn't supposed to be informative for people who already know how LLMs work. Like the title said, just wanted to write down some thoughts.
> This is just silly. Humans forget things all the time! If I want to remember something I write it down.
The opposite was never stated. Human memory is of course selective.
> Here is an interesting experiment I use to help people understand next token prediction. Think of a simple math problem in your head, maybe 3 digit by 2 digit multiplication. Then speak out every single thought you have while solving it.
Now a point I'm happy to discuss! The process of solving it is actually quite autoregressive-like, but this is also an example of a common pitfall with LLMs: they purely rely on pattern matching because they don't have the internal representation of what they really deal with (algebra). But we all know that.
The main question is whether LLMs taught to reason actually show that they have this kind of representation. They still work very differently I'd say; even for tasks that seem trivial to humans, reasoning LLMs will make a lot of mistakes before arriving at a plausible-sounding result. Because they were trained to reason, there's a higher chance now that the plausible-sounding result is actually correct. But this property is actually quite interesting once applied to complex tasks that would take too much time and be overwhelming for humans, and that's where they shine as powerful tools.
> even for tasks that seem trivial to humans, reasoning LLMs will make a lot of mistakes before arriving at a plausible-sounding result.
Like a lot of my coworkers analyzing a production bug? I would agree if the statement were that LLMs were underpowered compared to a human brain today but I'm not seeing evidence that humans do reasoning in a way that can't be correctly modeled.
From your article and comments, it sounds like the take is something like "humans don't actually reason autoregressively" which could be true, I don't know enough to know, but sort of like saying physics models aren't really how nature works: ultimately LLMs are executable models of the world, it's even in the name.
> From your article and comments, it sounds like the take is something like "humans don't actually reason autoregressively" which could be true, I don't know enough to know, but sort of like saying physics models aren't really how nature works: ultimately LLMs are executable models of the world, it's even in the name.
The conclusion states "Language and thought are not purely autoregressive in humans".
Which doesn't mean humans don't have autoregressive components in their thinking. At least that's my opinion. I don't make this bold statement, and I don't know enough to know either.
> Like a lot of my coworkers analyzing a production bug? I would agree if the statement were that LLMs were underpowered compared to a human brain today
Clearly not in the same way, and that was what I was trying to explain with regards to the hallucination issue too. Humans also learn from proofs, can apply frameworks, etc.; there's no denying that. But the internal process of an LLM remains pattern matching and sequential prediction, whereas there's more to the human thinking process.
LLMs are underpowered in some aspects that can't be replicated with autoregressive modeling, but are already stronger in other aspects. That is what I think.
> but I'm not seeing evidence that humans do reasoning in a way that can't be correctly modeled.
Me neither, this is not what my stance is, and I'm actually optimistic about it. I just don't think we should be satisfied with only autoregressive modeling if the ambition is to reach or comprehend human-level intelligence.
You seem to continue to make a lot of claims without basis.
> they purely rely on pattern matching
Yes.
> because they don't have the internal representation of what they really deal with (algebra)
Wait what? No. You can't claim this yet as it's an open question. It may well be the case they do have an internal representation of algebra, and even a world model for that matter, if flawed.
I think you need to be more aware of the current research in LLM interpretability. The answers to these questions are hardly as definitive as you make it seem.
I do actually read a lot about LLM interpretability, and this is my own conclusion (I should've phrased it like "they don't seem to have"). I do consider this an open question, so I'm a bit confused as to why you think this way - perhaps due to my phrasing (I just had a very long flight), but know that is not the case and I always doubt things. In fact, I said right after the text you quoted that the question is actually quite open (mentioning reasoning models, but to be honest, it's not exclusive to them, just more apparent in some ways).
I might also clarify (here and probably in my article when I have the time to do so). LLMs "do" build internal models in the sense that, at the same time:
- They organize knowledge by domain in a unified network
- They're capable of generalization (already mentioned and acknowledged at the very beginning of the article)
However these models, while they share parallels with human cognition, lack substance and can't replicate (yet) the deep integrated cognitive model of humans. That is where current interpretability research is at, and probably SOTA LLMs too. My own opinion and speculation is that autoregressive models will never get to a satisfying approximation of human-level cognition, since the human thinking process seems to be more than autoregressive components, aligning with current psychology. But that doesn't mean architectures won't evolve.
Do not misunderstand: just because I said they're pattern-matching machines doesn't mean they will be unable to properly "think". In fact, the line between pattern matching and thinking is actually quite blurry.
> You can say LLMs are fundamentally dumb because of their inherent linearity. Are they? Isn’t language by itself linear (more precisely, the presentation of it)?
Any linearity (or at least partial ordering) of intelligence comes from time and causality, not language - in fact the linearity of language is a limitation human cognition struggles to fight against.
I think this is where "chimpanzees are intelligent" comes to the rescue - AI has a nasty habit of focusing too much on humans. It is vacuous to think that chimpanzee intelligence can be reduced to a linear sequence of oohs-and-aahs, although I suspect a transformer trained on thousands of hours of chimp vocalizations could keep a real chimp busy for a long time. Ape cognition is much deeper and more mysterious: imperfect "axioms" and "algorithms" about space, time, numbers, object-ness, identifying other intelligences, etc, seem to be somehow built-in, and all apes seem to share deep cognitive tools like self-reflection, estimating the cognitive complexity of a task, robust quantitative reasoning, and so on. Nor does it really make sense to hand-wave about "evolutionary training data" - there are stark micro- and macro-architectural differences between primate brains and squirrel brains. Not to mention that all species have the exact same amount of data - if it were just about millions of years, why are bees and octopi uniquely intelligent among invertebrates? Why aren't there any chimpanzee-level squirrels? Rather than twisting into knots about "high quality evolutionary data," it makes a lot more sense to point towards evolution pressuring the development of different brain architectures with stronger cognitive abilities. (Especially considering how rapidly modern human intelligence seems to have evolved - much more easily explained by sudden favorable mutations vs stumbling into an East African data treasure trove.)
Human intelligence uses these "algorithms" + the more modern tool of language to reason about the world. I believe any AI system which starts with language and sensory input[1], then hopes to get causality/etc via Big Data is doomed to failure: it might be an exceptionally useful text generator/processor but there will be infinite families of text-based problems that toddlers can solve but the AI cannot.
[1] I also think sight-without-touch is doomed to failure, especially with video generation, but that's a different discussion. And AIs can somewhat cheat "touch" if they train extensively on a good video game engine (I see RDR2 is used a lot).