
I think an entity might be unable to mean things (or some things) but still able to understand what they mean. Kinda-trivial example: I cannot say and mean "I have a pain in the sixth finger on my right hand" because like most people I have only five fingers on my right hand, but I can understand reasonably well what it means. Using indexicals like this is a bit of a cheat, but you did the same in your very first example so I don't feel too bad about it. It's not so clear whether this can happen without them, but arguably it could; e.g., you might take some view along the lines of "statements are only meaningful when they occur within some sort of community" and then an entity isolated from such communities couldn't say-and-mean things, but might still understand what things mean.

Anyway: you say (if I understand you right) that if those concepts diverge the only one you actually care about is agents meaning things. I think that's a mistake, because a lot of the questions we have reason to care about with today's AIs are not about that. Will AIs be able to do all our jobs better and cheaper than we can? That's about their external behaviour and how it relates to the world. Will AIs gain vast power and use it in ways that are very bad for us? Ditto. Will AIs enable new technological innovations that make us all much better off? Ditto. No one will be saying as the killbots destroy their cities "well, this isn't so bad; at least the machines don't really know what it is they're doing". No one will be saying as they enjoy the fruits of Fully Automated Luxury Gay Space Communism "you know, this whole thing feels empty because the machines that make all this possible don't really understand, they just behave as if they do".

If a melon+string+ball arrangement is a faithful enough model of the solar system and it somehow enables me to send spaceships to Uranus, or to discover that orbits are elliptical when I hadn't known it before, or something, then that's a thing of great value.

Your comment about children imitating adults sounds as if you haven't actually taken in the conditions I proposed, because children imitating what their parents say cannot in fact have the property I called "richness". If I talk to a child and they have learned to say some things about stars by listening to their parents, it will not help them when I ask them about something they haven't heard their parents say.

(One can imagine a situation where the child "passes" this test by just relaying everything I say to the parent and then imitating what they say back to me. But the point there isn't that the child is imitating, it's that the child is not really part of the conversation at all, I'm just talking to the parent. And it is clear that nothing like that is happening with AI systems.)

You may imagine that you make your argument more convincing by finishing it up with "anyone to whom this is not obvious is obviously religiously obsessed with some converse belief", but to me at least the opposite is the case.




Well, we need to get into what exactly gives rise to the ability to mean something -- it isn't as simple as being able to state something truthfully -- which is why I mentioned the imagination. More commonly, people have the ability to mean X in virtue of being able to imagine what it would be to say X and mean it.

I.e., the semantics of natural language are grounded in possibilities, and apprehending possibilities is the function of the imagination. I was trying to simplify matters enough to make it clear that if an LLM says, "I have a pen in my hand", it isn't even lying.

I agree with you that the right test for proper language acquisition is modal: how would the system respond in situations S1..Sn? However, the present mania for computational statistics has reduced this question to 'what is a y for a given x', as if the relevant counterfactual were a permutation of the input to a pre-given function. The relevant counterfactuals are changes to the non-linguistic environments that language serves to describe.
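
To make the contrast concrete, here is a deliberately toy Python sketch (all names made up; a sketch of the distinction, not a claim about how any real system works) of the difference between 'permute the input to a pre-given function' and 'change the non-linguistic environment and see whether the description changes':

    # A "pre-given function": a frozen mapping fit once on historical text,
    # never consulting the world again.
    FROZEN_ANSWERS = {"what is in the sky?": "a bird"}

    def frozen_model(prompt: str) -> str:
        return FROZEN_ANSWERS.get(prompt, "I don't know")

    # A grounded speaker: the answer is recomputed from the current
    # non-linguistic environment, so changing the world changes the language.
    def grounded_speaker(prompt: str, world: dict) -> str:
        if prompt == "what is in the sky?":
            return "a " + world["sky"]
        return "I don't know"

    world = {"sky": "bird"}
    print(frozen_model("what is in the sky?"))             # a bird
    print(grounded_speaker("what is in the sky?", world))  # a bird

    world["sky"] = "rocket ship"   # the relevant counterfactual: the world changes
    print(frozen_model("what is in the sky?"))             # still "a bird"
    print(grounded_speaker("what is in the sky?", world))  # a rocket ship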

How is it that the parents continue to obtain this 'richness' and 'robustness' (i.e., performance across changing environments)? By themselves having the capacity to acquire and use meanings in relevant environments. This is something the children lack, and so do LLMs.

For the children to imitate the parents, and for the LLM to function as the community of speakers, those speakers must narrate at length in a manner which can be imitated. If a parent looks at the sky and sees a rocket ship, they can be asked "did you see that spaceship?!" -- but the very young children cannot. They do not know what those words mean, and weren't looking at the sky; their whole attention is on trying to imitate the sounds they hear.

Likewise, an LLM is limited to modelling non-linguistic shifts by waiting for enough new text to be written about those shifts for it to be retrained on -- and there is much reason to expect that nowhere near enough is written about almost all changes to our environment to enable this. The parents aren't going to repeat "there is a rocket ship in the sky" over and over just so the children can hear it. The parents don't need to: they can see. They do not need language in order to be responsive to linguistic interrogation.

The route LLMs use to obtain their performance is constructing a distribution over historical linguistic records of non-linguistic change, and sampling from this distribution. The mechanism we call 'intelligence', which employs meaning, acquires such shifts by being in the world: noticing, engaging with, imagining, interrogating, and creating them.
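
For concreteness, here is a minimal sketch of what I mean by 'constructing a distribution over historical linguistic records and sampling from it' -- a toy bigram model. Real LLMs are vastly more sophisticated, but the basic shape (fit once on past text, then sample) is the same:

    import random
    from collections import defaultdict

    # Historical linguistic record (toy corpus).
    corpus = ("the parents saw a bird in the sky . "
              "the parents saw a plane in the sky .").split()

    # Estimate a distribution over next words from the historical record.
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    def sample_next(prev: str) -> str:
        words, weights = zip(*counts[prev].items())
        return random.choices(words, weights=weights)[0]

    # Once fit, the model only ever reflects what the corpus said: if a rocket
    # ship appears in the sky today, nothing here changes until enough new text
    # about rocket ships is written and the counts are refit.
    word = "the"
    for _ in range(8):
        word = sample_next(word)
        print(word, end=" ")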

This is where I am making the strong empirical claim: sampling from a distribution over historical language use is 'not enough'. It's fragile and shallow -- though its shallowness is masked by the false pretence (a Turing-esque masquerade) that we have to interact with the system through a single bamboozling I/O boundary: a prompt.

Via such an extreme narrowing of how this supposed linguistic agent is free to employ meaning, its engineers can rig the situation so that its fragility isn't as apparent. But in the end, it will be.

The test for whether a system is using the meanings of words is indeed modal: change the non-linguistic environment (i.e., the meanings) and the language ought to change. For LLMs this does not happen: they are only very, very indirectly responsive to such shifts... because their mechanism of recording them is imitation.


Ah, we're getting into more concrete considerations now. I think that's better.

(I mean, if you choose to use "mean" in a way that implies certain details in the causal history that I think aren't needed, and choose not to care about what AI systems do but only about that causal history, then all that's something you have the right to choose, and there's limited value in arguing about it. But if you have concrete predictions for what AI systems will be able to do, and what would need to change for them to be able to do more, then to me that's more interesting and more worth arguing about.)

So, I think we're agreed now that the kind of question that really matters, for determining how much understanding some entity has of the words it's taking in and spitting out, is: how does its linguistic behaviour depend on the actual world, and how would it vary if the world were different, and how will it vary as the world changes?

And I think we're agreed that today's LLMs learn about the world largely through their training process, which means that they have rather limited capacity once trained to adapt to the world, which puts real limits on what they can do.

(I think you would also say that it means they don't really understand anything, because they can't adapt in real time as those things change, but again I think that goes too far, firstly because there are plenty of things we consider ourselves to understand even though they lie far in the past and aren't going to change, and secondly because LLMs _do_ have some ability to learn in the short term: if you say "All snorfles are porfles and my dog is a snorfle" and ask it a few sentences later whether you have a porfle, it will probably be able to say yes and explain why.)
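
For what it's worth, that last claim is easy to check. A quick sketch using the OpenAI Python client -- the model name and the exact reply are assumptions on my part, and any recent chat model should behave similarly:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice; substitute whatever is current
        messages=[{
            "role": "user",
            "content": (
                "All snorfles are porfles and my dog is a snorfle. "
                "I also went shopping yesterday and bought a melon. "
                "Do I have a porfle? Explain briefly."
            ),
        }],
    )
    print(resp.choices[0].message.content)
    # Typically something like: "Yes -- your dog is a snorfle, and all snorfles
    # are porfles, so your dog is a porfle."  (exact wording will vary)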

I am curious whether you think that, say, Helen Keller was dramatically less able to mean and understand things than most people, on account of being both deaf and blind and therefore dramatically less able to get new information about the world in real time other than "textually" (via Braille and the like). I think the available evidence strongly suggests that Keller was in fact able to understand the world and to mean things in pretty much the exact same way as the rest of us, which in turn strongly suggests that being connected to the physical world only through language isn't necessarily an obstacle to meaning and understanding things.

(Keller did have other links to the physical world; for instance, her tactile sense was perfectly OK. This is actually how she first managed to grasp the idea that the words Anne Sullivan was tracing out on her hands meant something. However, it doesn't seem credible to me that this rather "narrow" channel of information-flow was responsible for Keller's understanding of most of the things she understood.)

Suppose someone builds something like today's multimodal LLMs, but with constant real-time video input, and suppose it's trained on video as well as text. (It's not obvious to me how all that would work technically, but it seems to me that there are things worth trying, and I bet there are people trying them at OpenAI, Google, etc.) Would your objections then disappear?



