throwaway0123_5, 10 days ago, on: GPT-5

It seems like you might need fewer output tokens for the same quality of response, though. One of their plots shows o3 needing ~14k tokens to reach 69% on SWE-bench Verified, while GPT-5 needs only ~4k, roughly 3.5x fewer.