This is very interesting, especially the last part showing gpt-5.2 and gpt-oss landing on the same distinctive outcome of being 90%+ Serious.
I tested this locally with gpt-oss 120b and got the same result, but only at the default 'medium' reasoning effort. When I used 'low' I kept getting more playful responses with emojis, and when I used 'high' I kept getting more guessing responses. (A rough sketch of my setup is below.)
I had a lot of fun with this, and it gave me more insight than I expected.
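For anyone who wants to reproduce the effort sweep, here is a minimal sketch, assuming a local OpenAI-compatible server (e.g. Ollama or vLLM) exposing the model as 'gpt-oss-120b' on localhost:8000 and accepting a 'reasoning_effort' field. The base URL, model id, and that field are assumptions; some servers instead expect a "Reasoning: high" line in the system prompt.

    # Sweep the three reasoning-effort levels against a local
    # OpenAI-compatible endpoint and compare the answers by hand.
    # Assumed: base_url, model id, and reasoning_effort support.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
    PROMPT = "..."  # the prompt from the article goes here

    for effort in ("low", "medium", "high"):
        resp = client.chat.completions.create(
            model="gpt-oss-120b",
            messages=[{"role": "user", "content": PROMPT}],
            extra_body={"reasoning_effort": effort},  # assumption: server honors this
        )
        print(f"--- effort={effort} ---")
        print(resp.choices[0].message.content)

Nothing in it is specific to the article's prompt; the interesting part is just eyeballing how the tone shifts across the three outputs.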
LLMs are good at making stuff from scratch, and perfect when you don't have to worry about the code's future. 'Research' can be a great tool. But LLMs are horrible in big codebases and across multiple microservices, and also at making decisions; never let one make a decision for you. You need to know what's happening, and you can't ship straight AI code. It can save time, but not a lot, and it won't replace anyone.
We have a large monorepo at my company. You're right that for adding entirely new core concepts to an existing codebase we wouldn't give an AI some vague requirements and ask it to build something – but we wouldn't do that for a human engineer either. Typically we discuss as a team, and once we've agreed on technologies and an approach, someone implements it, relying heavily on AI to write the actual code (because it's faster and generally won't add dumb bugs like typos or conditional logic errors).
Almost everything else at this point can be done by AI. Some stuff requires a little support from human engineers, but honestly our main bottlenecks now are just QA and getting the infra to a place where we can rapidly ship stuff into production.
> You need to know what's happening and you can't ship straight AI code.
I think there is some truth to this. We are struggling to maintain a high-level understanding of the code as a team right now, not because there is no human who understands it, but because five years ago our team would probably have been 10-20x larger for the amount we're shipping. So when one engineer leaves the company or goes on holiday, we lose significantly more context about our systems than you historically would with a larger team of engineers. Previously you might have had 2-3 engineers with a deep understanding of a single system. Now we have maybe 1-2 engineers who need to maintain an understanding of 5-6 systems.
That said, AI helps a lot with this. Asking AI to explain code and help me learn how it works means I can pick up new systems significantly quicker.
Yes. I mostly work on Quarkus microservices and use Cursor with auto agent mode.
> we wouldn't give an AI some vague requirements and ask it to build something
> we would discuss as a team
That seems like a reasonable workflow, and it's the polar opposite of what was described in the blog post, which is the usual, easy way people use agents and, I think, the wrong path. May I also ask what language and/or framework you work with where so much context works well enough?
> Asking AI to explain code and help me learn how it works means I can pick up new systems significantly quicker.
I don't want to sound rude, but what was your reason for starting from scratch instead of joining an already established open source effort? The likes of Cline, Roo, Continue, ...
The three options you offered are all VC-funded "open source".
This is where the real AI bubble is: VC-funded startups whose main plan is likely an acquisition. I'm not interested in those kinds of "open source" anymore; they want to lock you into their product. Also, https://ssotax.org/
Overuse of bold markup, particularly to begin each bullet point.
Overuse of "Here's..." to introduce or further every concept or idea.
A few parts of this article particularly jump out, such as the two lists following the "The SMS Flooding Attack" section (which, incidentally, begins "Here's where..."). A human wouldn't write them as lists (the first one in particular); they'd be normal paragraphs. Short bulleted lists are a good way to get simple, bite-sized pieces of information across quickly, but that's in cases where people aren't going to read a large block of text, e.g. in ads. Overusing them in the wrong medium, breaking up a piece of prose like this, just hurts its flow and readability.
There wouldn't be a problem if there were transparency and clear boundaries. The future is simply enjoying what you want, but we have to get past these first steps to get there.