Well it's like birds and airplanes. Do airplanes "fly" in the same sense that bi...

simonh · on April 7, 2023

I think the issue is there are good reasons to think LLMs architected and trained the way they are now can never approach human reasoning capability. That’s because the corpus of human written material is simply grossly inadequate to communicate or encode the knowledge necessary for that.

Our written material assumes huge swathes of contextual knowledge, real world experience, and human lived experience that LLMs don’t and can’t have. At least architected and trained as they are now.

That’s on top of the crippling inability of LLMs to generalise from being able to describe how to do a task to being able to perform it. Plus many other similar limitations that would be inexplicable if displayed by a human.

Of course LLMs aren’t the final word in AI development. I think they’re a vitally important step towards general AI, and we’ll get there eventually as we develop ever more capable architectures.

iliane5 · on April 7, 2023

> LLMs architected and trained the way they are now can never approach human reasoning capability

Not sure if you’ve played with GPT-4 but honestly it’s getting there. Take the bar exam: ChatGPT scored in the bottom 10% of test takers, while GPT-4 scores in the top 10%.

It obviously isn’t the ultimate test of reasoning/intelligence but I think we would agree that a human who’s in the top 10% is likely to be pretty smart.

> Of course LLMs aren’t the final word in AI development

Couldn’t agree more. AGI will come from plugging a few of these systems together.

simonh · on April 7, 2023

GPT-4 still suffers from the same limitations I outlined earlier though. For example, being able to explain how to do things is independent of being able to actually do them. That’s a crippling cognitive limitation. It’s just not as obvious because for some tasks it has been trained to do them through different methods.

Let’s imagine a map of cognitive capabilities. Humans are a big area on that map. Previous AI systems were small dots or lines on that map, some of them like AlphaZero extending outside the human zone. ChatGPT is an archipelago of several decent-sized blobs disconnected from each other, and some of those edge slightly outside the human zone. It’s better at some specific tasks than humans.

The problem is the sometimes large gaps between the blobs. Its capacity at some tasks tells you nothing about its ability at what we would think of as closely related tasks for a human. Even for GPT-4, these are utterly different tasks, and if it can do them both, it can often do them for completely different reasons than a human does.

If you test it at, say, 10 tasks that all happen to fall within its capabilities, those widely separated blobs of ability, you’d be unaware of the gaps and think it was incredibly intelligent at a huge range of tasks. With a human you’d know those areas would be connected. But with GPT they are not. It’s by probing the gaps where it fails that we begin to understand how much and in what ways it fundamentally differs from us.

This map is getting harder for outsiders to probe though, because OpenAI is papering over some gaps with tuned training. This is like adding some new blobs in a different colour. These appear to close some gaps and add new capabilities, but the systems in the model that implement those aren’t related to the features of the model that give it its other abilities.