I think GPT writes code the best. How well will it write in version 5.6? It gives me chills.
Recently, I went head-to-head with GPT on nearly 2,000 lines of code, and GPT's solution was superior and faster. I even referenced multiple codebases on GitHub while trying, but none of them came close to what GPT produced.
So using GPT brings both fear and excitement.
The fear comes from realizing that this level of code is now the average for most people. The excitement comes from knowing that I can now study and learn at this level too.
I'm really looking forward to seeing how much more advanced the code will be with the upgrade to 5.6.
Purely subjective, but I tend to prefer reading Opus 4.8 output over GPT 5.5 code, even when the latter can have a higher overall ceiling. The former is just a bit more convenient to review.
Yeah, Opus/GPT need multiple rounds of reviews from each other to get to clean auto review. Fable was like, it is done and indeed… crickets in bot comments. ‘No issues’ galore.
GPT-5.5 has been really hard to beat imho. I've spent $$$ on Opus, Deepseek v4 Pro and recently started to dogfood GLM-5.2 (which is not bad) but I cannot really trust any of them (almost blind) like I can trust GPT-5.5. It gives me tremendous confidence. I cannot say the same for any of the others I mentioned.
>> I am on the opposite camp. Open models are starting to perform better. GPT 5.5 keeps on messing things up.
I'm working in a 600k+ LoC codebase that has complex domain-specific logic and lots of moving parts. I find that Codex 5.5 is pretty good at surgical fixes, but does not go out of its way to explore and figure out what those surgical fixes might break. So I only use it to work on parts of the system that are pretty isolated from everything else so that risk of regression is small.
Tracking model performance on Artificial Analysis makes me think these models are constantly optimized/tuned in some way or another. GPT 5.5 was scoring in the mid-60s when it was first released; now it's almost 10 points higher.
Maybe I'll know once I try it? Honestly, for small functions or methods, I don't think there's a huge difference between models. But the larger the code gets, the more noticeable the difference seems to be.
Personally, I think this kind of coding experience varies from person to person
sadly with all the labs benchmaxxing I feel like you just have to try the model for a while to really evaluate how good it is, especially for each individual use case
My guess is that it's the same base model as 5.5, but with additional post-training to improve and benchmaxx on a few things like that.
If they really thought it was competitive with Mythos/Fable across the board, then why wouldn't they release a broader set of benchmarks, and why price it day 1 at 1/2 the cost of Fable?
Not saying that's the case with OP, but I've found folks sometimes just rationalize it so [0] because they're paying top dollar for it (especially when compared to maybe less capable but more affordable models).
When I searched for papers on using LLMs, I found that typically, you can have an LLM generate code and then ask it to find GitHub projects similar to that code. Then you can learn by looking at the pull requests and seeing how they structure things
In the old days, if I wanted to understand why memory offsets, padding techniques, or data layout structures were written a certain way, I had to stare at a senior programmer's code all day or wait for them to reply. But LLMs, while they do flatter me, explain things at a level I can actually understand. And LLMs don't get annoyed.
- Why do you cut API boundaries this way?
- Why do you change the order of struct fields?
- Why do you deliberately insert padding?
Most of it depends on the background and context. Sometimes you add it, sometimes you don't. To understand this tacit knowledge, you need access to senior developers. But their attitude often depends on how promising the student is and what background they come from. With an LLM, you don't have to rely on the respondent's mood, authority, or availability.
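To make the field-order and padding questions concrete, here is a minimal sketch using Python's `ctypes`, which lays out structures the way the platform's C compiler would. The struct and field names (`Naive`, `Reordered`, `flag`, `value`, `kind`) are made up for illustration, and the exact sizes depend on the platform ABI, so treat the numbers you get as specific to your machine.

```python
import ctypes


class Naive(ctypes.Structure):
    # Hypothetical layout: an 8-byte field sandwiched between two 1-byte fields.
    _fields_ = [
        ("flag", ctypes.c_uint8),
        ("value", ctypes.c_uint64),
        ("kind", ctypes.c_uint8),
    ]


class Reordered(ctypes.Structure):
    # Same fields, widest first, so the small fields can share one aligned slot.
    _fields_ = [
        ("value", ctypes.c_uint64),
        ("flag", ctypes.c_uint8),
        ("kind", ctypes.c_uint8),
    ]


def payload_bytes(struct_type):
    """Sum of the field sizes, ignoring any padding."""
    return sum(ctypes.sizeof(field_type) for _, field_type in struct_type._fields_)


def describe(struct_type):
    """Return (total size in bytes, {field name: byte offset})."""
    offsets = {
        name: getattr(struct_type, name).offset
        for name, _ in struct_type._fields_
    }
    return ctypes.sizeof(struct_type), offsets


if __name__ == "__main__":
    for struct_type in (Naive, Reordered):
        size, offsets = describe(struct_type)
        padding = size - payload_bytes(struct_type)
        print(struct_type.__name__, "size:", size, "offsets:", offsets, "padding:", padding)
```

On common 64-bit platforms the first layout tends to come out larger, because the compiler pads around the 8-byte field to keep it aligned. Whether reordering is worth it still depends on context, for example when the field order is fixed by a file format or wire protocol.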
Programming is fundamentally a field that requires seniors. In my case, I had no such seniors at all. I learned to code by buying codebases from failed companies and studying them. My first job didn't hire me as an employee—they hired me as the CEO of a subcontracting company (because that was structurally more advantageous for the contract). So I wasn't given the patience to learn programming fundamentals gradually. I had to pay penalties if I failed. Most of the projects I worked on were the kind where failure meant bankruptcy for me. Naturally, there was no one to teach me.
Most of my knowledge comes from reverse-engineering the code I purchased.
People say LLM code contains falsehoods, but commercially sold code has always had falsehoods too. Honestly, if we're just talking ratios, LLM code has fewer falsehoods.
In that sense, I still think it's a matter of context. If LLM code is false, was human code ever really true? LLMs do lie. They generate plenty of incorrect code. But humans do the same thing. If a problem comes up, you just look it up then and there. For me, LLMs and humans aren't all that different.
Good programmers are ashamed to push anything less than good (at least in their own opinion) to popular public repos. Some of those same pedantic programmers have no problem pushing crap to enterprise repos, and feel absolved because they are pushed to focus on deadlines and new features, and refactoring is very rarely planned for. I did and managed a lot of corporate software development in companies big and small, did my fair share of M&As, and looked at the codebases of successful companies. I don't ever recall feeling impressed. And I am regularly impressed by the aesthetic qualities of popular open source packages. I think commercial code is mostly shit, with the exception of regulated, serious industries (power, space, flight, etc.).
Codex 5.4/5.5 has been great for me as well compared to Claude Opus.
I've been mostly using it for Godot/GDScript code reviews, rubber ducking, and asking it for better ideas for naming stuff (one of the hardest problems in programming).
I still can't trust it for generating code for entire files/classes/projects, because it's still icky, creating unnecessary variables and functions, using multiple `if`s instead of `and` or `or`, but it's good enough for generating Mac/iOS apps for my personal use in SwiftUI because fuck trying to keep up with Apple's documentation, or even migrating ancient Visual Basic stuff I made as a kid up to SwiftUI :)
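For anyone unsure what the "multiple `if`s instead of `and`" complaint looks like, here is a small sketch. It is written in Python rather than GDScript, and the function and parameter names are hypothetical, not taken from any real project.

```python
def can_attack_nested(is_alive: bool, has_ammo: bool, target_in_range: bool) -> bool:
    # The style being complained about: one nested `if` per condition.
    if is_alive:
        if has_ammo:
            if target_in_range:
                return True
    return False


def can_attack_combined(is_alive: bool, has_ammo: bool, target_in_range: bool) -> bool:
    # Same logic as a single boolean expression.
    return is_alive and has_ammo and target_in_range


if __name__ == "__main__":
    print(can_attack_nested(True, True, False))
    print(can_attack_combined(True, True, False))
```

Both versions agree for every combination of boolean inputs; the second is simply less to read in review.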
> So using GPT brings both fear and excitement.
Only excitement for me. I've never been more productive, not because I ask AI to make something for me, but it helps me make what I was already going to, but better and quicker.
AI, like any other tool, could help smart people be smarter and dumb people be dumber, rather kinda like Tolkien's Ring: You could be Sauron or you could be Bilbo or Frodo, or you could be Gollum :)
No offense but have you considered the strong possibility that you’re just not good at what you do? I am occasionally pleased but mostly annoyed or disappointed… but never getting anything close to chills. That sounds downright weird.
No offense but have you considered the strong possibility that you're just holding it wrong? You're entitled to your opinion, but OP is hardly the first person to say something like this and is surrounded by tons of folks saying the exact same thing. Just because it sounds weird to you, doesn't mean it's not true.