Yeah, I recognized the PR author from Twitter (same avatar) and man he really does come across as incredibly juvenile. Shamelessly talking up OpenAI while shitting on Claude models and the motivation is just so transparent.
You made the argument and provided zero supporting evidence. As it stands, it's merely an opinion, and appears to be an uninformed one until you prove otherwise. That's what people are asking you to do.
Sigh, your supporting evidence is a record of someone saying something, which is itself merely an opinion... people in glass houses and all that. The interesting thing about my opinion is that while it may not be AS informed as yours, it is notably above the average level of knowledge when it comes to BLS.
> That's what people are asking you to do.
No. What I am being asked to do is: "Show me a better way, but I only accept a better way that is already utilized by someone else". Not a recipe for a thoughtful exchange of ideas.
I think this is a false dichotomy because which approach is acceptable depends heavily on context, and good engineers recognize this and are capable of adapting.
Sometimes you need something to be extremely robust and fool-proof, and iterating for hours/days/weeks and even months might make sense. Things that are related to security or money are good examples.
Other times, it's much more preferable to put something in front of users that works so that they start getting value from it quickly and provide feedback that can inform the iterative improvements.
And sometimes you don't need to iterate at all. Good enough is good enough. Ship it and forget about it.
I don't buy that AI users favor any particular approach. You can use AI to ship fast, or you can use it to test, critique, refactor and optimize your code to hell and back until it meets the required quality and standards.
Yes, it is a false dichotomy but describes a useful spectrum. People fall on different parts of the spectrum and it varies between situations and over time as well. It can remind one that it is normal to feel different from other people and different from what one felt yesterday.
Too risky, and doesn't make sense from a cost-benefit perspective. Iran uses cheap and disposable weapons that are also effective. If you think about how much a single US ship costs, and the political price of US service members dying, I think the picture becomes clear.
The US Navy's decision not to provide escort services makes perfect sense and is no surprise.
The only newsworthy thing about it is that it exposes yet another of Trump's lies: at some point he promised that traffic would not be affected because the US would provide exactly such escorts.
Yeah, listen... I'm glad these types of studies are being conducted. I'll say this though: the difference between pre- and post-Opus 4.5 has been night and day for me.
From August 2025 through November 2025 I led a complex project at work where I used Sonnet 4.5 heavily. It was very helpful, but my total productivity gains were around 10-15%, which is pretty much what the study found. Once Opus came out in November though, it was like someone flipped a switch. It was much more capable at autonomous work and required way less hand-holding, intervention or course-correction. 4.6 has been even better.
So I'm much more interested in reading studies like this over the next two years where the start period coincides with Opus 4.5's release.
> It was very helpful, but my total productivity gains were around 10-15%, which is pretty much what the study found. Once Opus came out in November though, it was like someone flipped a switch. It was much more capable at autonomous work and required way less hand-holding, intervention or course-correction. 4.6 has been even better.
Very much agree. I gave a presentation on AI to a group earlier this week and spent a third of the time talking about the Opus 4.5 inflection point in AI history. The first time I used that model, the day it was released, it was clear it knew what it was doing at a different level. People still jump around between different models, tools, and time frames when talking about AI and usefulness, but those comparisons mean little if they're not using the Opus 4.5 and 4.6 models with Anthropic harnesses like Claude Code or Cowork.
I'm interested in these studies, along with the history of AI, and whether they'll recognize that as the point when things changed, because for us devs, that was the moment.
I gave a similar presentation in January covering the AI features that emerged in 2025 and culminated in the Nov '25 step-function in capability, and where I went from there... (certainly my GitHub activity has been bright green since)
The presentation was created with Claude Code, to prove the point; I'm never going back to Keynote/PowerPoint. Press the 'X' key to disable "safe mode". Prompts are in the repo.