Most benchmarks contain many examples of a single task - e.g., each example in an image classification benchmark is an image paired with its class label. The recipe for doing well on these types of benchmarks has historically been to (1) train a large model on (2) lots of data.
However, each item in the ARC benchmark is a totally unique task. The model is presented with a handful of worked examples (inputs and their answers) of that task and is then asked to complete one fresh instance of it.
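To make the setup concrete, here is a minimal sketch of what a single ARC task looks like, following the JSON layout of the public dataset; the grid contents and the `solve` stub are illustrative, not drawn from a real task:

```python
import json

# A single ARC task in the layout used by the public dataset
# (https://github.com/fchollet/ARC). Each grid is a 2-D list of
# integers 0-9, one integer per color. Contents here are made up.
task = {
    "train": [  # the handful of worked examples
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
    ],
    "test": [
        # The instance to complete. In the public files the output is
        # included; on the hidden evaluation set it must be predicted.
        {"input": [[3, 0], [0, 3]]}
    ],
}

def solve(task: dict) -> list:
    """Hypothetical solver: infer the transformation from the train
    pairs, then apply it to each test input grid."""
    raise NotImplementedError

print(json.dumps(task, indent=2))
```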
Importantly, the evaluation tasks are kept secret. The only way models can “prepare” for ARC is by getting familiar with the public priors shared across ARC tasks - e.g., the colored grid world.
As a result, ARC evaluates a model's ability to learn new tasks from limited data at test time. This is something humans do very well and models, at least until now, have not.