It seems that OpenAI have got the PR machine working amazingly well. The Cursor CEO said it's the best, as did Simon Willison (https://simonwillison.net/2025/Aug/7/gpt-5/).
But I've found it terrible. For coding (in Cursor), it's slow, often fails at tool calls (no MCP, just stock Cursor tools), and it stored some new application state in globalThis, something no model has ever attempted in over a year of very heavy Cursor / Claude Code use.
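For anyone wondering what I mean by the globalThis thing, here's a minimal sketch of the kind of pattern it produced, next to what you'd normally expect. All names are hypothetical, not the actual diff:

    // Roughly what GPT-5 wrote: hanging mutable app state off globalThis.
    // (Hypothetical names; the real code was similar in spirit.)
    (globalThis as any).activeFilters = new Set<string>();

    export function addFilter(name: string): void {
      (globalThis as any).activeFilters.add(name);
    }

    // What you'd expect instead: state scoped to the module (or a proper store).
    const activeFilters = new Set<string>();

    export function addFilterScoped(name: string): void {
      activeFilters.add(name);
    }

State hung off globalThis like that leaks across modules and test runs, which is exactly why it jumped out in review.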
For a summarization/insights API that I work on, it was way worse than gpt-4.1-mini. I tried both gpt-5-mini and full gpt-5, with different reasoning settings. It didn't follow instructions, and output was worse across all my evals, even after heavy prompt adjustment. I did a lot of sampling and the results were consistently bad.
Am I the only one? Has anyone seen actual real-world benefits of GPT-5 vs other models?
Claude Code is certainly not as easy to engineer with, though it is less expensive. For instance, the @feature isn't as robust as Cursor's, IME. Also, no Shift+Enter is quite a pain, and linting doesn't "just work". Cursor with Claude 4.0 Max is really thorough, I think even better than GPT-5. It's not that Sonnet itself is better, but whatever "ensemble" of models Cursor uses with Sonnet seems to both adhere to instructions and make tool calls better than it does with GPT-5. GPT-5 often says what it will do and then says "say go and I'll go", or says "you should run command x", but doesn't just DO it. Also, for bug fixes in difficult codebases, nothing beats Gemini 2.5 Pro.