I think an entity might be unable to mean things (or *some* things) but still ab...

mjburgess · 2025-01-12T14:49:15 1736693355

Well we need to get into what exactly gives rise for the ability to mean something -- it isnt as simple as being able to state something truthfully -- which is why I mentioned the imagination. It is more common that people have the ability to mean X in virtue of being able to imagine what it would be to say X and mean it.

ie., the semantics of natural language are grounded in possibilities, and apprehending possibilities is the function of the imagination. I was trying to simplify matters enough to make it clear that if an LLM says, "I have a pen in my hand" it isn't even lying.

I agree with you that the right test for proper language acquisition is modal: how would the system respond in situation S1..Sn. However the present mania for computational statistics has reduced this question to 'what is a y for a given x' as-if the relevant counterfactual was a permutation to the input to a pregiven function. The relevant counterfactuals are changes to the non-lignustic environments that language serves to describe.

How is it that the parents continue to obtain this 'richness' and 'robustness' (ie., performance across changing environments) ? It is by themselves having the capacity to acquire and use meanings in relevant environments. This is something the children lack, and so do LLMs.

For the children to imitate the parents, and the LLM to function as the community of speakers -- those speakers must narrate at length in a manner which can be imitated. If a parent looks at the sky and sees a roketship they can be asked "did you see that spaceship?!" -- but the very young chidlren cannot. They do not know what those words mean, and werent looking at the sky, their whole attention is on trying to immitate the sounds they hear.

Likewise an LLM is limited in modelling non-lingustic shifts via waiting on enough new text being written on these shifts to be retrained on -- there is much reason to expect that no where near enough is written on almost all changes to our environment to enable this. The parents arent going to repeat, "there is a rocket ship in the sky" over-and-over just so the children can hear it. The parents dont need to: they can see. They do not need langauge to be responsive to lingusitic interrogation.

The route LLMs use to obtain their performance is constructing a distribution over historical linguistic records of non-linguistic change, and sampling from this distribution. The mechanism we call 'intelligence' that employs meaning acquires such shifts by being-in-the-world to notice, engage, imagine, interrogate, create, etc. them.

This is where I am making the strong empirical claim: sampling from a distribution over historical language use is 'not enough'. It's fragile and shallow -- though its shallowness is masked by the false (Turing-esq masquerade) that we have to interact with the system thru a single bamboozling I/O boundary: a prompt.

Via such an extreme narrowing of how this supposed linguistic agent is free to employ meaning, its engineers can rig the situation so that its fragility isn't as apparent. But in the end, it will be so.

The test for whether a system is using the meanings of words is indeed, modal: change the non-ligusitic environment (ie., the meanings) and the lanuage ought change. For LLMs this does not happen: they are very very very indirectly responsive to such shifts... because their mechanism of recording them is imitation.

gjm11 · 2025-01-12T15:41:13 1736696473

Ah, we're getting into more concrete considerations now. I think that's better.

(I mean, if you choose to use "mean" in a way that implies certain details in the causal history that I think aren't needed, and choose not to care about what AI systems do but only about that causal history, then all that's something you have the right to choose, and there's limited value in arguing about it. But if you have concrete predictions for what AI systems will be able to do, and what would need to change for them to be able to do more, then to me that's more interesting and more worth arguing about.)

So, I think we're agreed now that the kind of question that really matters, for determining how much understanding some entity has of the words it's taking in and spitting out, is: how does its linguistic behaviour depend on the actual world, and how would it vary if the world were different, and how will it vary as the world changes?

And I think we're agreed that today's LLMs learn about the world largely through their training process, which means that they have rather limited capacity once trained to adapt to the world, which puts real limits on what they can do.

(I think you would also say that it means they don't really understand anything, because they can't adapt in real time as those things change, but again I think that goes too far, firstly because there are plenty of things we consider ourselves to understand even though they lie far in the past and aren't going to change, and secondly because LLMs _do_ have some ability to learn in the short term: if you say "All snorfles are porfles and my dog is a snorfle" and ask it a few sentences later whether you have a porfle, it will probably be able to say yes and explain why.)

I am curious whether you think that, say, Helen Keller was dramatically less able to mean and understand things than most people, on account of being both deaf and blind and therefore dramatically less able to get new information about the world in real time other than "textually" (via Braille and the like). I think the available evidence strongly suggests that Keller was in fact able to understand the world and to mean things in pretty much the exact same way as the rest of us, which in turn strongly suggests that being connected to the physical world only through language isn't necessarily an obstacle to meaning and understanding things.

(Keller did have other links to the physical world; for instance, her tactile sense was perfectly OK. This is actually how she first managed to grasp the idea that the words Anne Sullivan was tracing out on her hands meant something. However, it doesn't seem credible to me that this rather "narrow" channel of information-flow was responsible for Keller's understanding of most of the things she understood.)

Suppose someone builds something like today's multimodal LLMs, but with constant real-time video input, and suppose it's trained on video as well as text. (It's not obvious to me how all that would work technically, but it seems to me that there are things worth trying, and I bet there are people trying them at OpenAI, Google, etc.) Would your objections then disappear?