Are there any specifics about how this was trained, especially when 5.1 is only a month old? I'm a little skeptical of benchmarks these days and wish they'd put this up on LMArena.
edit: noticed 5.2 is ranked in the webdev arena (#2, tied with gemini-3.0-pro), but not yet in the text arena (last update 22 hrs ago)
I'm extremely skeptical, given all those articles claiming OpenAI was freaking out about Gemini. Now it turns out they just casually had a better model ready to go? I don't buy it.
Yeah, I noticed something similar with Claude: around the time of the Opus 4.5 release, Sonnet 4.5 was just dumb for a few days, though it seemed temporary. I suspect they redirected resources to Opus.
How do you know this is a better model? I wouldn't take any of the numbers at face value, especially when all they've done is more/better post-training, meaning the base pre-trained model's capabilities are unchanged. The new model may just elicit some of the benchmark capabilities better. You really need to spend time using the model to reach any reliable conclusions.