Different languages maybe? I find Sonnet v2 to be lacking in Rust knowledge compared to 4o 11-20, but excelling at Python and JS/TS. O1's strong side seems to be complex or quirky puzzle-like coding problems that can be answered in a short manner, it's meh at everything else, especially considering the price. Which is understandable given its purpose and training, but I have no use for it as that's exactly the sort of problem I wouldn't trust an LLM to solve.
Sonnet v2 in particular seems to be a bit broken with its reasoning (?) feature. The one where it detects it might be hallucinating (what's even the condition?) and reviews the reply, reflecting on it. It can make it stop halfway into the reply and decide it wrote enough, or invent some ridiculous excuse to output a worse answer. Annoying, although it doesn't trigger too often.
Sonnet v2 in particular seems to be a bit broken with its reasoning (?) feature. The one where it detects it might be hallucinating (what's even the condition?) and reviews the reply, reflecting on it. It can make it stop halfway into the reply and decide it wrote enough, or invent some ridiculous excuse to output a worse answer. Annoying, although it doesn't trigger too often.