Why do people keep taking OpenAI's marketing spin at face value? This keeps happening, like when they neglected to mention that their most impressive Sora demo involved extensive manual editing and cleanup work because the studio couldn't get Sora to generate what they wanted.
It might be because (very few!) mathematicians like Terence Tao make positive remarks. I think these mathematicians should be very careful to use reproducible, controlled setups, which by their nature cannot take place on GPUs in the Azure cloud.
I have nothing against scientists promoting the Coq Proof Assistant. But that's open source, can be run at home and is fully reproducible.
Keep in mind that those mathematicians were kept in the dark about the funding: it is incredibly unethical to invite a coauthor onto your paper and not tell them where the money came from.
It's just incredibly scummy behavior: I imagine some of those mathematicians would have declined the collaboration if the funding had been transparent. More so than data contamination, this makes me deeply mistrustful of Epoch AI.
I can't parse any of this; can you explain it to a noob? I get lost immediately: funding, coauthor, etc. The only interpretation I've come up with is that I've missed a scandal involving payola, Terence Tao, and keeping coauthors off papers.
Wait, I think I somehow knew Epoch AI was getting money from OpenAI. I'm not sure how, and I didn't connect the facts together to see this problem coming in advance.
Because they are completely gullible and believe almost everything OpenAI claims without questioning the results.
With each product they release, more of their top researchers leave.
Everyone now knows what happens when you go against or question OpenAI after working for them, which is why you don't see any criticism, only cult-like worship.
Because the models have continually matched the quality they claim.
E.g., look at how much work "very few" has to do in the sibling comment. It's like saying "very few physicists [Einstein/Feynman/Witten]".
It's conveniently impossible to falsify the implication that the inverse of "very few" have nothing positive to say, i.e. that the vast majority say negative things.
You have to go through an incredible level of mental gymnastics, involving many months of gated decisions where the route chosen involved "gee, I know this is susceptible to confirmation bias, but...", to end up wondering why people think the models are real just because OpenAI has access to data that includes some set of questions.
> Because the models have continually matched the quality they claim.
That's very far from true.
"Yes, I know that the HuggingFace arena and coding assistant leaderboards both say that OpenAI's new model is really good, but in practice you should use Claude Sonnet instead" was a meme for good reason, as was "I know the benchmarks show that 4o is just as capable as ChatGPT4 but based on our internal evals it seems much worse". The latter to the extent that they had to use dark UI patterns to hide ChatGPT-4 from their users, because they kept using it, and it cost OpenAI much more than 4o.
OpenAI regularly messes with benchmarks to keep the investor money flowing. Slightly varying the wording of benchmark problems causes a 30% drop in o1 accuracy. That doesn't mean "LLMs don't work" but it does mean that you have to be very sceptical of OpenAI benchmark results when comparing them to other AI labs, and this has been the case for a long time.
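To make that perturbation claim concrete, here is a minimal sketch (in Python, with a hypothetical `ask_model` stub standing in for whatever API you evaluate; the names and structure are illustrative, not anyone's actual eval harness) of how you would measure sensitivity to reworded benchmark items:

    # Hypothetical sketch: measure how much accuracy drops when benchmark
    # questions are paraphrased but their expected answers stay the same.
    def ask_model(question: str) -> str:
        # Stand-in for a real model/API call; plug in your own client here.
        raise NotImplementedError("wire this up to the model under test")

    def accuracy(items: list[dict], field: str) -> float:
        # items: [{"original": ..., "paraphrase": ..., "answer": ...}, ...]
        correct = sum(ask_model(it[field]).strip() == it["answer"] for it in items)
        return correct / len(items)

    def perturbation_gap(items: list[dict]) -> float:
        # Accuracy on the original wording minus accuracy on the paraphrase.
        # A large positive gap suggests memorization of exact phrasing rather
        # than genuine capability.
        return accuracy(items, "original") - accuracy(items, "paraphrase")

A benchmark whose scores survive this kind of rewording is much harder to game than one that only ever gets run with the canonical phrasing.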
The FrontierMath case just shows that they are willing to go much farther with their dishonesty than most people thought.
https://news.ycombinator.com/item?id=40359425