My wife sometimes views me as long-form autocomplete, and sometimes as a spell and grammar checker. Hell, my reply to your comment here is indistinguishable from a "long-form autocomplete".
Point being, that autocomplete has to work somehow. Our LLM autocompletes have been getting better and better at zero-shot completion of arbitrary long-form text, including arbitrary simulated conversations with a simulated human, without a commensurate increase in model complexity or resource utilization. This means they're getting better and better at compressing their training data - but in the limit, what is the difference between compression and understanding? I can't prove it formally, but I rather strongly believe they are, fundamentally, the same thing.
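A rough illustration of the prediction-equals-compression link, as a sketch in Python (the `bits_needed` helper and the toy models are mine, made up for this example - not anything an actual LLM does):

```python
import math
from collections import Counter

def bits_needed(text, predict):
    """Shannon bound: a symbol the model assigns probability p costs
    -log2(p) bits; an arithmetic coder gets arbitrarily close to this."""
    return sum(-math.log2(predict(text[:i], ch)) for i, ch in enumerate(text))

def uniform_model(context, ch):
    # Knows nothing: every byte is equally likely.
    return 1 / 256

def frequency_model(context, ch):
    # Knows a little: character frequencies seen so far, add-one smoothed.
    counts = Counter(context)
    return (counts[ch] + 1) / (len(context) + 256)

text = "the quick brown fox jumps over the lazy dog " * 20
print(f"uniform model:   {bits_needed(text, uniform_model):.0f} bits")
print(f"frequency model: {bits_needed(text, frequency_model):.0f} bits")
# The better the next-character predictions, the fewer bits the same text
# costs - prediction quality and compression ratio are the same number
# seen from two sides.
```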
Also: if it walks like a duck, quacks like a duck, swims like a duck, ducks like a duck, and is indistinguishable from a duck on any possible test you can think of or apply to it, then maybe your artificial faux-duck effectively turned into a real duck?
> what is the difference between compression and understanding? I can't prove it formally, but I rather strongly believe they are, fundamentally, the same thing.
I'm not sure this is true in general. I feel as if I understand something when I grasp it in its entirety, not when I've been able to summarize it concisely. And conceptually I can compress something without understanding it by manually implementing compression algorithms and following their instructions by rote.
I think understanding and compression are plausibly related; one test of whether I understand something is whether I can explain it to a layperson. But I don't see how they're equivalent even asymptotically.
> then maybe your artificial faux-duck effectively turned into a real duck?
I can't really get behind this sentiment. If a language model behaves like a duck in every readily observable particular then we can substitute language models for ducks, sure. But that does not imply that a language model is a duck, and whether it even could be a duck remains an interesting and important question. I'm sympathetic to the argument that it doesn't really matter in day-to-day practice, but that shouldn't stop us from raising the question.
> But I don't see how they're equivalent even asymptotically.
You wrote:
> I feel as if I understand something when I grasp it in its entirety, not when I've been able to summarize it concisely.
But what does it mean to "grasp it in its entirety"? To me, it means you've learned the patterns that predict the thing and its behavior. That understanding lets you say, "it is ${so-and-so}, because ${reason}", and also "it will do ${specific thing} when ${specific condition} happens, because ${reason}", and have such predictions reliably come true.
To me, replacing a lot of memorized observations with more general principles - more general understanding - is compression.
A simplified model: you observe pairs of numbers in some specific context. You see (1, 2) and (3, 6), then (9, 18), then (27, 54), and then some more pairs, which you quickly notice all follow a pattern:
Pair_n = (x, y), where:
- y = 2*x
- x = 3^n
A thousand such pairs pass you by before they finally stop. Do you remember them all? It's not a big deal once you've figured out the pattern - you don't need to remember all the number pairs, you only need to remember the formula above, and that n started at 0 and ended at 999.
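To make that concrete, here is the whole "compressed" dataset as a few lines of Python (a throwaway sketch; the `pairs` name is mine, it just restates the formula above):

```python
# The "compressed" form of the whole dataset: a short generator plus the
# range of n, instead of a thousand memorized pairs.
def pairs(n_start=0, n_end=999):
    for n in range(n_start, n_end + 1):
        x = 3 ** n
        yield (x, 2 * x)

observed = list(pairs())  # re-derive all 1000 pairs on demand
assert observed[:4] == [(1, 2), (3, 6), (9, 18), (27, 54)]
# Storing the raw pairs means a thousand entries, the later ones hundreds
# of digits long; storing the "understanding" means these few lines.
```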
This is what I mean by understanding being fundamentally equivalent to compression: each pattern or concept you learn lets you replace memorizing some facts with a smaller formula (program) you can use to re-derive those facts. It's exactly how compression algorithms work.