> AI already blows most people out of an IQ test at a fraction of the computational power of a brain
AFAIK, IQ tests used in psychological evaluations do not contain any randomness, so the exact answers are almost always in distribution. I haven't seen anyone compare AI to an IQ test that is not in distribution.
On ARC-AGI, which is mildly similar to a randomly generated IQ test, humans are still much better than LLMs. https://arcprize.org/ (scroll down for the chart)
Sorry, you're right that the chart on the home page does not show human performance. The leaderboard chart does: https://arcprize.org/leaderboard. By default the leaderboard shows scores for both ARC-AGI 1 and 2. The models are much worse at 2 than at 1; the best-performing model scores around 15% (Grok 4, thinking), while humans are at ~100%.
Thanks, and do we know if the humans are average people off the street, or unusually-intelligent people?
EDIT: OK, I see there are 3 types of humans:
"Avg. Mturker" does worst. "Stem Grad" and "Human Panel" are basically equivalent in terms of quality but differ in cost.
It's not obvious to me whether an average Mturker would be more or less clever than the average person. Mturk doesn't pay very well, so you'd think you'd have to be below average to want to do it. But it could also attract people of above-average intelligence who happen to live in lower-income countries, where the pay goes further.