My wife sometimes views me as long-form autocomplete, and sometimes as a spell and grammar checker. Hell, my reply to your comment here is indistinguishable from a "long-form autocomplete".
Point being, that autocomplete has to work somehow. Our LLM autocompletes have been getting better and better at zero-shot completion of arbitrary long-form text, including arbitrary simulated conversations with a simulated human, without a commensurate increase in model complexity or resource utilization. This means they're getting better and better at compressing their training data - but in the limit, what is the difference between compression and understanding? I can't prove it formally, but I rather strongly believe they are, fundamentally, the same thing.
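A rough illustration of the prediction-equals-compression link, as a sketch in Python (the `bits_needed` helper and the toy models are mine, made up for this example - not anything an actual LLM does):

```python
import math
from collections import Counter

def bits_needed(text, predict):
    """Shannon bound: a symbol the model assigns probability p costs
    -log2(p) bits; an arithmetic coder gets arbitrarily close to this."""
    return sum(-math.log2(predict(text[:i], ch)) for i, ch in enumerate(text))

def uniform_model(context, ch):
    # Knows nothing: every byte is equally likely.
    return 1 / 256

def frequency_model(context, ch):
    # Knows a little: character frequencies seen so far, add-one smoothed.
    counts = Counter(context)
    return (counts[ch] + 1) / (len(context) + 256)

text = "the quick brown fox jumps over the lazy dog " * 20
print(f"uniform model:   {bits_needed(text, uniform_model):.0f} bits")
print(f"frequency model: {bits_needed(text, frequency_model):.0f} bits")
# The better the next-character predictions, the fewer bits the same text
# costs - prediction quality and compression ratio are the same number
# seen from two sides.
```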
Also: if it walks like a duck, quacks like a duck, swims like a duck, ducks like a duck, and is indistinguishable from a duck on any possible test you can think of or apply to it, then maybe your artificial faux-duck effectively turned into a real duck?
> what is the difference between compression and understanding? I can't prove it formally, but I rather strongly believe they are, fundamentally, the same thing.
I'm not sure this is true in general. I feel as if I understand something when I grasp it in its entirety, not when I've been able to summarize it concisely. And conceptually I can compress something without understanding it by manually implementing compression algorithms and following their instructions by rote.
I think understanding and compression are plausibly related; one test of whether I understand something is whether I can explain it to a layperson. But I don't see how they're equivalent even asymptotically.
> then maybe your artificial faux-duck effectively turned into a real duck?
I can't really get behind this sentiment. If a language model behaves like a duck in every readily observable particular then we can substitute language models for ducks, sure. But that does not imply that a language model is a duck, and whether it even could be a duck remains an interesting and important question. I'm sympathetic to the argument that it doesn't really matter in day-to-day practice, but that shouldn't stop us from raising the question.
> But I don't see how they're equivalent even asymptotically.
You wrote:
> I feel as if I understand something when I grasp it in its entirety, not when I've been able to summarize it concisely.
But what does it mean to "grasp it in its entirety"? To me, it means you've learned the patterns that predict the thing and its behavior. That understanding lets you say, "it is ${so-and-so}, because ${reason}", and also "it will do ${specific thing} when ${specific condition} happens, because ${reason}", and have such predictions reliably come true.
To me, replacing a lot of memorized observations with more general principles - more general understanding - is compression.
A simplified model: you observe pairs of numbers in some specific context. You see (1, 2) and (3, 6), then (9, 18), then (27, 54), and then some more pairs, which you quickly notice all follow a pattern:
Pair_n = (x, y), where:
- y = 2*x
- x = 3^n
A thousand such pairs pass you by before they finally stop. Do you remember them all? It's not a big deal once you've figured out the pattern - you don't need to remember all the number pairs, you only need to remember the formula above, and that n started at 0 and ended at 999.
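To make that concrete, here is the whole "compressed" dataset as a few lines of Python (a throwaway sketch; the `pairs` name is mine, it just restates the formula above):

```python
# The "compressed" form of the whole dataset: a short generator plus the
# range of n, instead of a thousand memorized pairs.
def pairs(n_start=0, n_end=999):
    for n in range(n_start, n_end + 1):
        x = 3 ** n
        yield (x, 2 * x)

observed = list(pairs())  # re-derive all 1000 pairs on demand
assert observed[:4] == [(1, 2), (3, 6), (9, 18), (27, 54)]
# Storing the raw pairs means a thousand entries, the later ones hundreds
# of digits long; storing the "understanding" means these few lines.
```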
This is what I mean by understanding being fundamentally equivalent to compression: each pattern or concept you learn lets you replace memorizing some facts with a smaller formula (program) you can use to re-derive those facts. It's exactly how compression algorithms work.