This is so impressive that it brings out the pessimist in me.
Hopefully my skepticism will end up being unwarranted, but how confident are we that the queries are not routed to human workers behind the API? This sounds crazy but is plausible for the fake-it-till-you-make-it crowd.
Also, given the prohibitive compute cost per task, typical users won't be using this model, so the scheme could go on for quite some time before the public learns the truth.
They could also come out in a month and say o3 was so smart it'd endanger civilization, so we deleted the code and saved humanity!
That would be a ton of problems for a small team of PhD/grad-level experts to solve (for GPQA Diamond, etc.) in a short time. Remember, on Epoch AI's FrontierMath, these problems require hours to days' worth of reasoning by humans.
The author also suggested this is a new architecture built from existing methods, like the Monte Carlo tree search that DeepMind has been investigating (it's the method they use in AlphaZero).
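For anyone unfamiliar with the method being referenced, here's a minimal sketch of the core MCTS loop that AlphaZero-style systems build on. Everything in it (the generic node structure, the random rollout, the UCT constant) is an illustrative assumption for a single-agent setting, not a description of o3's or DeepMind's actual implementation.

```python
# Minimal, illustrative Monte Carlo Tree Search sketch (single-agent setting).
# The expand/rollout callbacks are placeholders the caller supplies.
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state        # e.g. a game position or partial solution
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0          # accumulated reward from rollouts

    def uct_score(self, c=1.4):
        # Upper Confidence bound for Trees: trade off exploitation vs. exploration.
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits) + c * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )

def mcts(root, expand, rollout, iterations=1000):
    """expand(state) -> list of child states; rollout(state) -> reward in [0, 1]."""
    for _ in range(iterations):
        # 1. Selection: walk down the tree, picking the highest-UCT child.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct_score)
        # 2. Expansion: once a node has been visited, add its children.
        if node.visits > 0:
            node.children = [Node(s, parent=node) for s in expand(node.state)]
            if node.children:
                node = random.choice(node.children)
        # 3. Simulation: estimate the node's value with a cheap rollout.
        reward = rollout(node.state)
        # 4. Backpropagation: push the result back up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the most-visited child of the root as the chosen move.
    return max(root.children, key=lambda n: n.visits)
```

The point is just that the search-and-prune machinery is well known; the hard (and expensive) part is the learned policy/value model guiding it.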
I don't see the point of colluding in this sort of fraud, since methods like tree search and pruning already exist, and other labs could genuinely reproduce these results.
In the medium term, the plan could be to achieve AGI, and then the AGI would figure out how to actually write o3. (Probably after it figures out the business model, though:
https://www.reddit.com/r/MachineLearning/s/OV4S2hGgW8)