I think GPT writes code the best. How well will it write in version 5.6? It give...

Topfi · 2026-06-26T19:46:24 1782503184

Purely subjective, but I tend to prefer reading Opus 4.8 output over GPT 5.5 code, even when the latter can have a higher overall ceiling. The former is just a bit more convenient to review.

seviu · 2026-06-26T17:30:54 1782495054

I am in the opposite camp. Open models are starting to perform better. GPT 5.5 keeps on messing things up.

By contrast, pi + GLM + DeepSeek… bliss.

Fable was a different kind of beast though. Rip.

square_usual · 2026-06-26T19:41:18 1782502878

Every time I use Opus these days I go "shut up... you are not Fable." Hard to imagine how just three days with it changed how I saw LLM use.

ftkftk · 2026-06-26T21:02:56 1782507776

Same.

baq · 2026-06-26T18:17:23 1782497843

Yeah, Opus/GPT need multiple rounds of reviews from each other to get to a clean auto review. Fable was like, "it is done," and indeed… crickets in bot comments. ‘No issues’ galore.

aaroninsf · 2026-06-26T20:49:36 1782506976

I wonder if this will hold as other models with different biases achieve parity.

arizen · 2026-06-26T17:57:52 1782496672

Ditto on GLM 5.2 + DeepSeek V4 Flash combo.

For the most important work (complex, cross-domain inquiries etc.), I still rely on Codex GPT 5.5 though.

whalesalad · 2026-06-26T18:58:29 1782500309

GPT-5.5 has been really hard to beat imho. I've spent $$$ on Opus, DeepSeek V4 Pro and recently started to dogfood GLM-5.2 (which is not bad) but I cannot really trust any of them (almost blindly) like I can trust GPT-5.5. It gives me tremendous confidence. I cannot say the same for any of the others I mentioned.

baddash · 2026-06-26T20:44:31 1782506671

how much does your setup cost you? just curious

enraged_camel · 2026-06-26T18:08:05 1782497285

>> I am in the opposite camp. Open models are starting to perform better. GPT 5.5 keeps on messing things up.

I'm working in a 600k+ LoC codebase that has complex domain-specific logic and lots of moving parts. I find that Codex 5.5 is pretty good at surgical fixes, but does not go out of its way to explore and figure out what those surgical fixes might break. So I only use it to work on parts of the system that are pretty isolated from everything else so that risk of regression is small.

HarHarVeryFunny · 2026-06-26T17:57:21 1782496641

I'm skeptical about how much of a coding advance it will be.

Seems odd that their announcement has zero coding benchmarks, with the closest related thing being terminal bench.

hereme888 · 2026-06-26T18:31:41 1782498701

Tracking model performance on Artificial Analysis makes me think these models are constantly optimized/tuned in some way or another. GPT 5.5 was scoring in the mid-60s when it was first released; now it's almost 10 points higher.

jdw64 · 2026-06-26T18:05:13 1782497113

Maybe I'll know once I try it? Honestly, for small functions or methods, I don't think there's a huge difference between models. But the larger the code gets, the more noticeable the difference seems to be.

Personally, I think this kind of coding experience varies from person to person.

vanuatu · 2026-06-26T18:08:54 1782497334

sadly with all the labs benchmaxxing I feel like you just have to try the model for a while to really evaluate how good it is, especially for each individual use case

MangoCoffee · 2026-06-26T19:46:16 1782503176

>zero coding benchmarks

"What gets measured gets managed"

artursapek · 2026-06-26T18:02:47 1782496967

They claim extreme performance on ExploitBench, which Mythos was touted as being incredible at. https://x.com/OpenAI/status/2070555278576439306

HarHarVeryFunny · 2026-06-26T20:10:38 1782504638

My guess is that it's the same base model as 5.5, but with additional post-training to improve and benchmaxx on a few things like that.

If they really thought it was competitive with Mythos/Fable across the board, then why wouldn't they release a broader set of benchmarks, and why price it day 1 at 1/2 the cost of Fable?

andriy_koval · 2026-06-26T18:40:55 1782499255

On the graph, they are still slightly below Mythos. Maybe enough to not be prohibited by the US government?

8bitsout · 2026-06-26T19:06:03 1782500763

Is it possible for you to provide examples? What were you trying to solve? What was your solution and why was GPT's solution superior and faster?

ignoramous · 2026-06-26T19:50:05 1782503405

> ... why was GPT's solution superior and faster?

Not saying that's the case with OP, but I've found folks sometimes just rationalize it that way [0], as they're paying top dollar for it (especially when compared to maybe less capable but affordable models).

[0] https://en.wikipedia.org/wiki/Choice-supportive_bias

stagger87 · 2026-06-26T18:06:31 1782497191

> I even referenced multiple code bases on GitHub

Well, GPT referenced every GitHub code base, no wonder it won! :)

pawelduda · 2026-06-26T17:33:27 1782495207

How do you judge what is a good or bad thing to learn from an LLM? So you don't have to unlearn the bad bits later.

jdw64 · 2026-06-26T17:35:33 1782495333

When I searched for papers on using LLMs, I found that typically, you can have an LLM generate code and then ask it to find GitHub projects similar to that code. Then you can learn by looking at the pull requests and seeing how they structure things. In the old days, if I wanted to understand why memory offsets, padding techniques, or data layout structures were written a certain way, I had to stare at a senior programmer's code all day or wait for them to reply. But LLMs, while they do flatter me, explain things at a level I can actually understand. And LLMs don't get annoyed.

jdw64 · 2026-06-26T17:53:02 1782496382

There's a lot of tacit knowledge in programming.

- Why do you cut API boundaries this way?
- Why do you change the order of struct fields?
- Why do you deliberately insert padding?
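On the struct-field and padding questions specifically, here is a minimal sketch of why field order can matter. It uses Python's `ctypes` to mimic C struct layout; the struct and field names (`Interleaved`, `Grouped`, `flag_a`, `count`, `flag_b`) are made up for illustration, and the exact sizes depend on the platform's alignment rules.

```python
import ctypes


class Interleaved(ctypes.Structure):
    # Small, large, small: the 4-byte field usually has to start on a
    # 4-byte boundary, so padding is inserted before it and at the end.
    _fields_ = [
        ("flag_a", ctypes.c_uint8),
        ("count", ctypes.c_uint32),
        ("flag_b", ctypes.c_uint8),
    ]


class Grouped(ctypes.Structure):
    # Same three fields, largest first: the two 1-byte fields can share
    # the space after the 4-byte field, so less padding is needed.
    _fields_ = [
        ("count", ctypes.c_uint32),
        ("flag_a", ctypes.c_uint8),
        ("flag_b", ctypes.c_uint8),
    ]


for struct in (Interleaved, Grouped):
    print(
        struct.__name__,
        "size:", ctypes.sizeof(struct),
        "offset of count:", struct.count.offset,
    )
```

On platforms where a 32-bit integer is 4-byte aligned, the interleaved layout should come out larger than the grouped one even though both hold the same data. Deliberately inserted padding is the opposite trade: spending bytes on purpose, for example to keep fields at fixed offsets.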

Most of it depends on the background and context. Sometimes you add it, sometimes you don't. To understand this tacit knowledge, you need access to senior developers. But their attitude often depends on how promising the student is and what background they come from. With an LLM, you don't have to rely on the respondent's mood, authority, or availability.

Programming is fundamentally a field that requires seniors. In my case, I had no such seniors at all. I learned to code by buying codebases from failed companies and studying them. My first job didn't hire me as an employee—they hired me as the CEO of a subcontracting company (because that was structurally more advantageous for the contract). So I wasn't given the patience to learn programming fundamentals gradually. I had to pay penalties if I failed. Most of the projects I worked on were the kind where failure meant bankruptcy for me. Naturally, there was no one to teach me.

Most of my knowledge comes from reverse-engineering the code I purchased.

People say LLM code contains falsehoods, but commercially sold code has always had falsehoods too. Honestly, if we're just talking ratios, LLM code has fewer falsehoods.

In that sense, I still think it's a matter of context. If LLM code is false, was human code ever really true? LLMs do lie. They generate plenty of incorrect code. But humans do the same thing. If a problem comes up, you just look it up then and there. For me, LLMs and humans aren't all that different.

hereme888 · 2026-06-26T18:34:27 1782498867

What do you think of modern open-source codebases presently available to the public? Is closed-source/proprietary code that much better?

aenis · 2026-06-26T22:09:57 1782511797

Closed, proprietary code is way, way worse.

Good programmers are ashamed to push anything less than good (at least in their own opinion) to popular public repos. Some of those same pedantic programmers have no problem pushing crap to enterprise repos, and feel absolved because they are pushed to focus on deadlines and new features, while refactoring is very rarely planned for. I did and managed a lot of corporate software development in companies big and small, did my fair share of M&As, and looked at the codebases of successful companies. I don't ever recall feeling impressed. And I am regularly impressed by the aesthetic qualities of popular open source packages. I think commercial code is mostly shit, with the exception of regulated, serious industries (power, space, flight, etc.).

Razengan · 2026-06-26T20:13:19 1782504799

Codex 5.4/5.5 has been great for me as well compared to Claude Opus.

I've been mostly using it for Godot/GDScript code reviews, rubber ducking, and asking it for better ideas for naming stuff (one of the hardest problems in programming).

I still can't trust it for generating code for entire files/classes/projects, because it's still icky, creating unnecessary variables and functions, using multiple `if`s instead of `and` or `or`, but it's good enough for generating Mac/iOS apps for my personal use in SwiftUI because fuck trying to keep up with Apple's documentation, or even migrating ancient Visual Basic stuff I made as a kid up to SwiftUI :)

> So using GPT brings both fear and excitement.

Only excitement for me. I've never been more productive, not because I ask AI to make something for me, but it helps me make what I was already going to, but better and quicker.

AI, like any other tool, could help smart people be smarter and dumb people be dumber, kinda like Tolkien's Ring: You could be Sauron or you could be Bilbo or Frodo, or you could be Gollum :)

fatata123 · 2026-06-26T18:58:59 1782500339

No offense but have you considered the strong possibility that you’re just not good at what you do? I am occasionally pleased but mostly annoyed or disappointed… but never getting anything close to chills. That sounds downright weird.

adamtaylor_13 · 2026-06-26T20:33:32 1782506012

No offense but have you considered the strong possibility that you're just holding it wrong? You're entitled to your opinion, but OP is hardly the first person to say something like this and is surrounded by tons of folks saying the exact same thing. Just because it sounds weird to you, doesn't mean it's not true.

_se · 2026-06-26T21:56:01 1782510961

The very obvious explanation is that everyone saying it is in the "not as good as they think they are" camp.

applfanboysbgon · 2026-06-26T20:38:47 1782506327

By definition, 50% of developers are below average, so there are indeed "tons of folks" who are not very good at what they do.

salutis · 2026-06-26T21:00:34 1782507634

That is not how averages work. By definition of mean, perhaps.

Xenoamorphous · 2026-06-26T21:13:32 1782508412

Indeed. Most people have more arms than average, which must be 1.9 something.

cl3misch · 2026-06-26T21:27:06 1782509226

That is how a median is defined, not the mean.
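To make the mean-vs-median point concrete, here is a minimal Python sketch along the lines of the arms example. The population of 1,000 people and the arm counts are made-up numbers for illustration, not real data.

```python
import statistics

# Hypothetical population: 998 people with two arms, 2 people with one arm.
arms = [2] * 998 + [1] * 2

mean_arms = statistics.mean(arms)      # pulled below 2 by the few one-armed people
median_arms = statistics.median(arms)  # middle value of the sorted list

# Share of the population strictly above the mean and strictly below the median.
above_mean = sum(a > mean_arms for a in arms) / len(arms)
below_median = sum(a < median_arms for a in arms) / len(arms)

print(f"mean={mean_arms}, median={median_arms}")
print(f"above mean: {above_mean:.1%}, below median: {below_median:.1%}")
```

The median splits the sorted list by position, so at most half of the values can sit strictly below it. The mean carries no such guarantee: in a skewed population like this one, nearly everyone lands above it.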

cmrdporcupine · 2026-06-26T22:11:58 1782511918

"no offense..."

... then says offensive thing.