> while we literally watch reasoning models say things like "oh that's not right, let me try a different approach".
Not saying I disagree with your premise that errors can’t be corrected by using more and more tokens, but this argument is weird to me.
The model isn’t intentionally generating text. The kinds of “oh let me try a different approach” lines I see are often followed by the same approach just taken. I wouldn’t say most of the time, but often enough that I notice.
Just because a model generates text doesn’t mean that the text actually represents anything at all, let alone a reflection of an internal process.
> Just because a model generates text doesn’t mean that the text actually represents anything at all, let alone a reflection of an internal process.
What does it represent then? What are all these billion weights for? It's not a bag full of NULLs that just pulls next words from a look-up table. Obviously there is some kind of internal process.
Also I don't get why people ignore the temporal aspect. Humans too generate thoughts in sequence, and can't arbitrarily mutate what came before. Time and memory is what forces sequential order - we too just keep piling on more thoughts to correct previous thoughts while they are still in working memory (context).
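To make that temporal, append-only point concrete, here's a minimal sketch of the generation loop. The `model_step` function is hypothetical, standing in for a full forward pass that returns a next-token distribution; the point is that earlier tokens are never rewritten, only added to:

```python
import random

def generate(model_step, prompt_tokens, max_new_tokens=50, eos=None):
    """Append-only autoregressive loop: each new token is sampled from a
    distribution conditioned on everything generated so far; earlier tokens
    are never mutated, so corrections can only be piled on afterwards."""
    context = list(prompt_tokens)              # the working "memory"
    for _ in range(max_new_tokens):
        probs = model_step(context)            # hypothetical: {token: probability}
        tokens, weights = zip(*probs.items())
        token = random.choices(tokens, weights=weights, k=1)[0]
        context.append(token)                  # append, never rewrite
        if token == eos:
            break
    return context
```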
The text represents a prediction of how a human may respond, one word(ish) at a time, that's it.
With "reasoning" models, the reasoning layer is basically another LLM instructed to specifically predict how a human may respond to the underlying LLM's answer, fake prompt engineering if you will.
There of course is some kind of internal process, but we can't prove any kind of reasoning. We ask a question, the main LLM responds, and we see how the reasoning layer LLM itself responds to that.
Please don't confuse people with wrong information: the reasoning part in reasoning models is the exact same LLM that produces the final answer. For example, o1 uses special "thinking" tokens to demarcate the reasoning and answer sections of its output.
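As a rough illustration of what that demarcation looks like, a minimal sketch. The `<think>`/`</think>` strings are stand-ins (o1's actual special tokens aren't public); the point is that a single output stream carries both sections:

```python
def split_reasoning(raw_output, open_tag="<think>", close_tag="</think>"):
    """Split one model output stream into its reasoning and answer parts
    using delimiter tokens. The tag strings are placeholders; each model
    family uses its own special tokens."""
    if open_tag in raw_output and close_tag in raw_output:
        before, rest = raw_output.split(open_tag, 1)
        reasoning, answer = rest.split(close_tag, 1)
        return reasoning.strip(), (before + answer).strip()
    return "", raw_output.strip()

reasoning, answer = split_reasoning(
    "<think>Factoring fails here, so use the quadratic formula.</think> x = 2 or x = 3"
)
```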
Sure, that's a great clarification, though maybe a bit of an implementation detail in this context.
Functionally my argument stands in this context - just because we can see one stream of LLM responses responding to the primary response stream says nothing of reasoning or what is going on internally in the reasoning layer.
> what is going on internally in the reasoning layer.
We literally know exactly what is going on with every layer.
It’s well defined. There are mathematical proofs for everything.
Moreover it’s all machine instructions which can be observed.
The emergent properties we see in LLMs are surprising and impressive, but not magic. Internally what is happening is a bunch of matrix multiplications.
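For concreteness, roughly what one transformer block reduces to, written out as the matrix products involved (a toy NumPy sketch: single attention head, with layer norms, biases, and causal masking omitted):

```python
import numpy as np

def transformer_block(x, Wq, Wk, Wv, Wo, W1, W2):
    """One simplified transformer block expressed as matrix multiplications
    (single head; layer norms, biases, and masking omitted)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv                    # three matmuls
    scores = q @ k.T / np.sqrt(q.shape[-1])             # another matmul
    attn = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)  # softmax
    x = x + (attn @ v) @ Wo                             # weighted sum + projection
    return x + np.maximum(x @ W1, 0) @ W2               # feed-forward: two more matmuls
```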
There’s no internal thought or process or anything like that.
It’s all “just” math.
To assume anything else is personification bias.
To look at LLMs outputting text and a human writing text and think “oh these two things must be working in the same way” is just… not a very critical line of thought.
> We literally know exactly what is going on with every layer.
Unless I missed a huge break in the observability problem, this isn't correct.
We know exactly how every layer is designed and we know how we functionally expect that to work. We don't know what actually happens in the model at time of inference.
I.e. we know what pieces were used to build the thing, but when we actually use it it's a black box - we only know inputs and outputs.
This paper [1] may be an interesting place to start.
We only know how the structures are designed to work, and we have hypotheses about how they likely work. We can't interpret what actually happens when the LLM is actually going through the process of generating a response.
That seems pedantic or unimportant on the surface, but there are some really important implications. At the more benign level, we don't know why a model gave a bad response when a person wasn't happy with the output. On the more important end, any concerns related to the risk of these models becoming self-directed or malicious simply can't be recognized or guarded against. We won't know if a model becomes self-directed until after it acts in ways that don't match how we already expect it to work.
Both alignment and interpretability were important research topics for decades of AI research. We effectively abandoned those topics once we made real technological advancement - once an AI-like tool was no longer entirely theoretical, we couldn't be bothered to focus resources on figuring out how to do it safely. The horse was already out of the barn.
Does this mean they will turn evil or end up going poorly for us? Absolutely not. It just means that we have to cross our fingers and hope because we can't detect issues early.
> We can't interpret what actually happens when the LLM is actually going through the process of generating a response.
There are 2 things we’re talking about here.
There’s the physical, mechanical operations going on during inference and there’s potentially a higher order process happening as an emergent property of those mechanical operations.
We know precisely the mechanical operations that take place during inference as they are machine instructions which are both man-made and very well understood. I hope we can agree here.
Then there’s potentially a higher order process. The existence of that process, and what that process is, are still a mystery.
We do not know how the human brain works, physically. We can’t inspect discrete units of brain operations as we can with machine instructions.
For that reason, it is uncritical to assume that there is any kind of “thought” process occurring at inference which is similar to our thought processes.
Comparing the two is like apples and oranges anyway and is pedantic in a non-useful way, especially with our limited understanding of the human brain.
I was never actually talking about the physical mechanisms. Sure we can agree that GPUs, logical gates, etc physically work in a certain way. That just isn't important here at all.
> For that reason, it is uncritical to assume that there is any kind of “thought” process occurring at inference which is similar to our thought processes.
I wasn't intending to raise concerns over emergent consciousness or similar. Whether thought goes on is a bit less clear depending on how you define thought, but that still wasn't the point I was making.
We have effectively abandoned the alignment problem and the interpretability problem. Sure we know how GPUs work, and we don't need to assume that consciousness emerged, but we don't know why the model gives a certain answer. We're empowering these models with more and more authority: not only are they given access to the public internet, but now we're making agents that are starting to interact with the world on our behalf. Models are given plenty of resources and access to do very dangerous things if they tried to, and my point is that we don't have any idea what goes on beyond input/output pairs. There's a lot of risk there.
> Comparing the two is like apples and oranges anyway and is pedantic in a non-useful way, especially with our limited understanding of the human brain.
Comparing the two is precisely what we're meant to do. If the comparison wasn't intended they wouldn't be called "artificial intelligence". That isn't pedantic; if the term isn't meant to imply the comparison, then they were named horribly, either accidentally or intentionally.
> I wasn't intending to raise concerns over emergent consciousness or similar
Oh jeez, then we may have just been talking past each other. I thought that’s what you were arguing for.
> That just isn't important here at all.
It is, though. The fact that the underlying processes are well understood means that, if we so wished, we could work backwards and understand what the model is doing.
I recall some papers on this, but can’t seem to find them right now. One suggested that groups of weights relate to specific kinds of high level info (like people) which I thought was neat.
> the comparison wasn't intended they wouldn't be called "artificial intelligence"
Remember “smart” appliances? Were we meant to compare an internet connected washing machine to smart people? Names are all made up.
I do actually think AI is a horrible name as it invites these kinds of comparisons and obfuscates more useful questions.
Machine Learning is a better name, imo, but I’m not a fan of personifying machines in science.
Haha, well it's funny sometimes when you realize too late there were two different conversations happening.
I definitely agree on the term machine learning - it seems a much better fit but still doesn't feel quite right. Naming things is hard, but AI seems particularly egregious here.
> The fact that the underlying processes are well understood means that, if we so wished, we could work backwards and understand what the model is doing.
I'm not sure we can take that leap. We understand pretty well how a neuron functions but we understand very little about how the brain works or how it relates to what we experience. We understand how light is initially recognized in the eye with cones and rods, but we don't really know exactly how it goes from there to what we experience as vision.
In complex systems it's often easy to understand the function of a small, more fundamental bit of the system. It's much harder to understand the full system, and if you do, you should be able to predict it. For LLMs, that would mean being able to predict a model's output for a given input (even if that prediction has to account for the randomness added into the inference algorithm).
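To make that "predict it" bar concrete: if we really understood the full system, then with the weights, the input, and the sampler's seed fixed, the output would be exactly reproducible, since the injected randomness is confined to the sampling step. A toy sketch of that idea (the probabilities here are made up):

```python
import random

def sample_next(probs, temperature=0.8, seed=None):
    """Temperature sampling: the only nondeterminism in the pipeline is this
    random draw, so fixing the seed makes the prediction exact."""
    rng = random.Random(seed)
    scaled = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    tokens, weights = zip(*((tok, p / total) for tok, p in scaled.items()))
    return rng.choices(tokens, weights=weights, k=1)[0]

probs = {"cat": 0.6, "dog": 0.3, "eel": 0.1}
assert sample_next(probs, seed=42) == sample_next(probs, seed=42)  # reproducible
```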
OpenAI may be able to do more in the long term because they don't show the <think> tokens and can spend more of that scratch space on improving answers vs appeasing users, but time will tell.
Remember that probabilistically checkable proofs show how random data can improve computation.
The AI field has always had a problem with wishful mnemonics.
But it is probably not a binary choice; if we could get the scratch space to reliably simulate Dijkstra's shunting-yard algorithm and convert to postfix, as an example, that would be great.
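For reference, the kind of deterministic procedure being pointed at; a minimal shunting-yard sketch (left-associative binary operators and parentheses only):

```python
def shunting_yard(tokens):
    """Dijkstra's shunting-yard algorithm: convert an infix token list to
    postfix (RPN). Handles left-associative binary operators and parentheses."""
    prec = {"+": 1, "-": 1, "*": 2, "/": 2}
    output, ops = [], []
    for tok in tokens:
        if tok in prec:
            while ops and ops[-1] != "(" and prec[ops[-1]] >= prec[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()                      # discard the "("
        else:
            output.append(tok)             # operand
    while ops:
        output.append(ops.pop())
    return output

shunting_yard("3 + 4 * ( 2 - 1 )".split())   # ['3', '4', '2', '1', '-', '*', '+']
```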
You don’t know this. I don’t feel like I generate thoughts in sequence, for me it feels hierarchical.
> can't arbitrarily mutate what came before
Uhh… what?
Do you remember your memories as a child?
Or what you ate for breakfast 3 weeks ago?
Have you ever misremembered an event or half remembered a solution to a problem?
The information in human minds is entirely mutable. They are not like computers…
> It's not a bag full of NULLs that just pulls next words from a look-up table.
Funny enough, the attention mechanism that’s popular right now is effectively lots and lots of stacked look-up tables. That’s how it’s taught as well (what with the Q, K, and V).
Tho I don’t think that’s a requirement for LLMs in general.
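If the look-up table framing helps, here's the standard intuition as a sketch: a dict is a hard lookup (one key matches, you get exactly its value), while scaled dot-product attention is a soft lookup (every stored value is returned, weighted by how well its key matches the query). This is the generic textbook form, not any particular model's implementation:

```python
import numpy as np

# Hard lookup: exactly one key matches, you get exactly one value.
table = {"paris": "france", "tokyo": "japan"}
value = table["paris"]

def soft_lookup(query, keys, values):
    """Attention as a 'soft' lookup table: a blend of all values,
    weighted by how closely each key matches the query."""
    scores = keys @ query / np.sqrt(len(query))      # similarity of query to each key
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over keys
    return weights @ values                          # weighted blend, not a single hit

keys = np.array([[1.0, 0.0], [0.0, 1.0]])
values = np.array([[10.0, 0.0], [0.0, 10.0]])
blended = soft_lookup(np.array([0.9, 0.1]), keys, values)
```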
I find a lot of people who half understand cognition and understand computing look at LLMs and work backwards to convince themselves that it’s “thinking” or doing more cognitive functions like we humans do. It’s personification bias.
> Do you remember your memories as a child? Or what you ate for breakfast 3 weeks ago?
For me, conjuring up and thinking about a childhood event seems like putting what came out of my nebulous 'memory' fresh into context at the point in time I'm thinking about it, along with whatever thoughts I had about it (how embarrassed I was, how I felt proud because of X, etc). As that context fades into the past, some of those thoughts may get mixed back into that region of my 'memory' associated with that event.
> What's the mechanistic model of "intention" that you're using to claim that there is no intention in the model's operation?
You can’t prove intention, but I can show examples of LLMs lacking intent (as when repeating the same solution even after being told it was incorrect)
> Generating text is the trace of an internal process in an LLM.
Not really sure precisely what you mean by trace, but the output from an LLM (as with any statistical model) is the result of the calculations, not a representation of some emergent internal state.