"couldnt even" is a high bar. it is an unsolved problem to make small models (truly, no bullshit) perform at human intelligence level than to make large models do the same. the bar JG had to pass was a bit lower than that, but Apple's marketing team unfortunately overpromised.