
The problem is that there is no way to infer the right answer to 0! from the training alone; you need more context to learn it. Humans need that context too. If you put 0! at the end of every grade 1 math test, no student would get it right unless they'd been given some context.
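If it helps, here's a minimal sketch (plain Python, purely as illustration) of why 0! is a matter of definition rather than something you can extrapolate from 1!, 2!, 3!: the recursion n! = n * (n-1)! only bottoms out if you *define* 0! = 1 (the empty product).

    # Why 0! = 1 is a convention you have to be told: the recursive
    # definition n! = n * (n-1)! needs a base case, and choosing 0! = 1
    # (the empty product) is what makes 1! = 1 * 0! come out right.
    import math

    def factorial(n: int) -> int:
        if n == 0:
            return 1  # the base case is a definition, not an inference
        return n * factorial(n - 1)

    assert factorial(0) == 1 == math.factorial(0)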

Do grade 1 kids have AGI? (Haha)

But seriously, all professions need to train in context to solve complex problems. You can train in adjacent realms and reason your way through problems, but to truly perform, you need specific training.

A general surgeon might make a better vet than an electrician would, but I'd still rather have a veterinary surgeon operate on my dog.

So some things are "AGI"-able, and other things need specific training.




I think there's variance in people's degree of compositionality, as well as in how quickly they pick up on novel relationships. Testing "intelligence" in humans has always been kind of fraught in the first place, but any capability we care to measure is going to permit degrees, and there will be some variance in humans on it. We should expect this. There's variance in goddamn everything.

We should also expect machine learning systems to have somewhat different properties from human minds: computers are more likely to achieve perfect recall, and we can scale their memory size and processing speed. All these confounding variables make it hard to build binary tests of a capability, which is really what ARC seems to be trying to do. One such capability AI researchers often talk about is conceptual compositionality. People care about compositionality because it's a good way to demonstrate that an abstract model is being used to reason about a situation, and that it can be reused in unseen but conceptually similar situations. This "generalization" or "abstraction" capability is really the goal, but it's hard to test directly, and "composition" (that is, taking a situation that's novel but a straightforward application of two or more abstractions the agent should already "know") is one more testable way to try to tease it out.
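To make "composition" concrete, here's a toy sketch (hypothetical, not an actual ARC task): suppose training showed a horizontal-flip transformation and a color-swap transformation separately, and the held-out test requires applying both.

    # Hypothetical compositional hold-out: two transformations seen
    # separately in training, composed only at test time.
    from typing import List

    Grid = List[List[int]]

    def flip_horizontal(grid: Grid) -> Grid:
        # abstraction A, seen on its own during training
        return [list(reversed(row)) for row in grid]

    def swap_colors(grid: Grid, a: int = 1, b: int = 2) -> Grid:
        # abstraction B, also seen on its own during training
        return [[b if c == a else a if c == b else c for c in row] for row in grid]

    # Held-out test: the target is the composition flip-then-swap. A system
    # that learned A and B as reusable abstractions should handle this; one
    # that memorized input/output pairs probably won't.
    test_input: Grid = [[1, 0, 2], [0, 1, 0]]
    expected = swap_colors(flip_horizontal(test_input))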

As you point out, humans often fail this kind of test, and we can rightly claim that in those cases they didn't really grasp the insight we were hoping they had. Testing for distilled abstractions versus memorization or superficial pattern recognition isn't just important to AI research; it's also a key problem in lots of places in human education.



