
This is so impressive that it brings out the pessimist in me.

Hopefully my skepticism will end up being unwarranted, but how confident are we that the queries are not routed to human workers behind the API? This sounds crazy but is plausible for the fake-it-till-you-make-it crowd.

Also, given the prohibitive compute cost per task, typical users won't be using this model, so the scheme could go on for quite some time before the public learns the truth.

They could also come out in a month and say o3 was so smart it would endanger civilization, so they deleted the code and saved humanity!






That would be a ton of problems for a small team of PhD/grad-level experts to solve (for GPQA Diamond, etc.) in a short time. Remember, on EpochAI's FrontierMath, these problems require hours to days' worth of reasoning by humans.

The author also suggested this is a new architecture that uses existing methods, like the Monte Carlo tree search DeepMind has been investigating (they use this method in AlphaZero).
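For anyone unfamiliar, here's a minimal sketch of plain UCT-style Monte Carlo tree search on a toy "reach 10" counting game. This is purely illustrative, not anything from DeepMind's or OpenAI's actual systems; every name in it is made up:

    import math, random

    class Node:
        def __init__(self, state, parent=None):
            self.state = state       # (total, player_to_move)
            self.parent = parent
            self.children = {}       # move -> Node
            self.visits = 0
            self.wins = 0.0          # credited to the player who moved into this node

    def moves(state):
        total, _ = state
        return [m for m in (1, 2) if total + m <= 10]

    def play(state, move):
        total, player = state
        return (total + move, 1 - player)

    def winner(state):
        total, player = state
        # the player who just moved is the one who brought the total to 10
        return None if total < 10 else 1 - player

    def rollout(state):
        # random playout until someone wins
        while winner(state) is None:
            state = play(state, random.choice(moves(state)))
        return winner(state)

    def uct_select(node):
        # pick the child maximizing the UCT score (exploit + explore)
        return max(node.children.items(),
                   key=lambda kv: kv[1].wins / kv[1].visits +
                                  math.sqrt(2 * math.log(node.visits) / kv[1].visits))

    def mcts(root_state, iterations=2000):
        root = Node(root_state)
        for _ in range(iterations):
            node = root
            # 1. selection: descend through fully expanded nodes
            while node.children and len(node.children) == len(moves(node.state)):
                _, node = uct_select(node)
            # 2. expansion: add one untried move, unless the node is terminal
            if winner(node.state) is None:
                untried = [m for m in moves(node.state) if m not in node.children]
                m = random.choice(untried)
                child = Node(play(node.state, m), parent=node)
                node.children[m] = child
                node = child
            # 3. simulation: random playout from the new node
            result = rollout(node.state)
            # 4. backpropagation: credit the player who made the move into each node
            while node is not None:
                node.visits += 1
                mover = 1 - node.state[1]
                node.wins += 1.0 if result == mover else 0.0
                node = node.parent
        # recommend the most-visited move at the root
        return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

    print(mcts((0, 0)))  # best first move for player 0 in the toy game

AlphaZero replaces the random rollout with a learned value/policy network, but the select/expand/simulate/backpropagate loop is the same idea.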

I don't see the point of colluding in this sort of fraud, since methods like tree search and pruning already exist and other labs could genuinely produce these results.


I had ARC-AGI in mind when I suggested human workers. I agree the other benchmark results make the use of human workers unlikely.

I'm very confident that queries were not routed to human workers behind the API.

Possibly some other form of "make it seem more impressive than it is," but not that one.


This is an impressive tinfoil take. But what would their plan be in the medium term? Once they release this, people can check their data.

How can people check their data?

In the medium term, the plan could be to achieve AGI, and then the AGI would figure out how to actually write o3. (Probably after it figures out the business model, though: https://www.reddit.com/r/MachineLearning/s/OV4S2hGgW8)



