“GPT‑4.1 scores 54.6% on SWE-bench Verified, improving by 21.4%abs over GPT‑4o a...

pdabbadabba · 2025-04-14T21:03:15 1744664595

What's wrong with the em-dash? That's just... the typographically correct dash, AFAIK.

clbrmbr · 2025-04-15T11:16:41 1744715801

Maybe a reference to the OpenAI models loving to output em-dashes?

drexlspivey · 2025-04-14T20:53:46 1744664026

Should have named it 4.10

clbrmbr · 2025-04-15T11:17:41 1744715861

But it’s so much weaker than 4.5 on broader tasks… Maybe it’s more optimized for benchmarks, but it’s just no replacement for a huge model.