This is so impressive that it brings out the pessimist in me.
Hopefully my skepticism will end up being unwarranted, but how confident are we that the queries are not routed to human workers behind the API? This sounds crazy but is plausible for the fake-it-till-you-make-it crowd.
Also, given the prohibitive compute cost per task, typical users won't be using this model, so the scheme could go on for quite some time before the public learns the truth.
They could also come out in a month and say o3 was so smart it'd endanger civilization, so we deleted the code and saved humanity!
That would be a ton of problems for a small team of PhD/grad-level experts to solve (for GPQA Diamond, etc.) in a short time. Remember, on Epoch AI's FrontierMath, these problems require hours to days' worth of reasoning by humans.
The author also suggested this is a new architecture built from existing methods, like the Monte Carlo tree search that DeepMind has been investigating (it's the method they use in AlphaZero).
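For anyone unfamiliar with the method being referenced, here's a minimal sketch of the core MCTS loop that AlphaZero-style systems build on. Everything in it (the generic node structure, the random rollout, the UCT constant) is an illustrative assumption for a single-agent setting, not a description of o3's or DeepMind's actual implementation.

```python
# Minimal, illustrative Monte Carlo Tree Search sketch (single-agent setting).
# The expand/rollout callbacks are placeholders the caller supplies.
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state        # e.g. a game position or partial solution
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0          # accumulated reward from rollouts

    def uct_score(self, c=1.4):
        # Upper Confidence bound for Trees: trade off exploitation vs. exploration.
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits) + c * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )

def mcts(root, expand, rollout, iterations=1000):
    """expand(state) -> list of child states; rollout(state) -> reward in [0, 1]."""
    for _ in range(iterations):
        # 1. Selection: walk down the tree, picking the highest-UCT child.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct_score)
        # 2. Expansion: once a node has been visited, add its children.
        if node.visits > 0:
            node.children = [Node(s, parent=node) for s in expand(node.state)]
            if node.children:
                node = random.choice(node.children)
        # 3. Simulation: estimate the node's value with a cheap rollout.
        reward = rollout(node.state)
        # 4. Backpropagation: push the result back up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the most-visited child of the root as the chosen move.
    return max(root.children, key=lambda n: n.visits)
```

The point is just that the search-and-prune machinery is well known; the hard (and expensive) part is the learned policy/value model guiding it.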
I don't see the point of colluding in this sort of fraud, since methods like tree search and pruning already exist, and other labs could genuinely reproduce these results.
In the medium term, the plan could be to achieve AGI, and then the AGI would figure out how to actually write o3. (Probably after it figures out the business model, though:
https://www.reddit.com/r/MachineLearning/s/OV4S2hGgW8)