Reflecting on o3 "beating ARC": are we reliving the ImageNet 2012 moment again?
7 points by artificialprint 7 days ago | 3 comments
AlexNet came and blew everything out of the water. Now you can reflect on how much progress (a lot) there has been since 2012 on just this one little dataset.

ARC is such a harder dataset that I don't even want to compare them. So how much progress will there be from o3 beating it?

The next 10 years are gonna be bonkers.






For others who don't know what the "ImageNet 2012 moment" refers to:

"The aforementioned major breakthrough, the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), was a defining moment for the use of deep neural nets for image recognition. A convolutional neural network (CNN) designed by Alex Krizhevsky, published with Ilya Sutskever and Krizhevsky’s PhD advisor Hinton, halved the existing error rate on Imagenet visual recognition to 15.3 percent. The CNN was dubbed “AlexNet.”"

From: https://medium.com/neuralmagic/2012-a-breakthrough-year-for-...


Can anyone explain what ARC actually is? All I see are some colored rows. Does the model have to find the pattern in the colors?

Most benchmarks contain a bunch of examples of a particular task - e.g., each example in an image classification benchmark is an image and its associated class. The approach for doing well on these types of benchmarks has historically been to (1) train a large model on (2) lots of data. However, each item in the ARC benchmark is a totally unique task. The model is presented with a handful of examples (questions and answers) of the unique task and is asked to complete one instance of the task. Importantly, the tasks are kept secret. The only way that models can “prepare” for ARC is by getting familiar with the public priors of the ARC tasks - e.g., the colored grid world. As a result, ARC evaluates the ability of models to learn new tasks from limited data at test time. This is something humans do very well and models have not (at least up until now).
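
To make the "handful of examples, then complete one instance" setup concrete, here is a rough Python sketch. The task layout (a "train" list of input/output grid pairs plus a "test" input, with grids as lists of integer color codes) follows the public ARC JSON files as far as I know. Everything else is an assumption for illustration: the task itself, its hidden rule (mirror each row), and the tiny brute-force solver are made up, and this is not how o3 or any serious ARC entry works.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]  # each cell is a color code

# Hypothetical toy task, shaped like a public ARC task file.
# Hidden rule (invented for this sketch): mirror every row left to right.
toy_task = {
    "train": [
        {"input": [[1, 0, 0], [0, 2, 0]], "output": [[0, 0, 1], [0, 2, 0]]},
        {"input": [[3, 3, 0], [0, 0, 4]], "output": [[0, 3, 3], [4, 0, 0]]},
    ],
    "test": [{"input": [[5, 0, 0], [0, 0, 6]]}],
}


def identity(grid: Grid) -> Grid:
    return [row[:] for row in grid]


def flip_horizontal(grid: Grid) -> Grid:
    return [row[::-1] for row in grid]


def flip_vertical(grid: Grid) -> Grid:
    return [row[:] for row in grid[::-1]]


def transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]


# A deliberately tiny hypothesis space; real ARC tasks need far richer ones.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": identity,
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def infer_rule(task: dict) -> Optional[str]:
    """Return the first candidate that reproduces every demonstration pair."""
    for name, fn in CANDIDATES.items():
        if all(fn(pair["input"]) == pair["output"] for pair in task["train"]):
            return name
    return None


def solve(task: dict) -> List[Grid]:
    """Apply the inferred rule to each test input."""
    name = infer_rule(task)
    if name is None:
        raise ValueError("no candidate rule explains the demonstrations")
    return [CANDIDATES[name](item["input"]) for item in task["test"]]


if __name__ == "__main__":
    print(infer_rule(toy_task))
    print(solve(toy_task))
```

The point of the sketch is the shape of the problem: the rule has to be worked out from two or three demonstrations at test time, and a fixed list of candidate transforms like this one breaks as soon as a task uses a rule that is not on the list.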


