Came here to say this -- humans are not always rational actors. I get asked questions all the time, which I have no special knowledge of, and which the asker could have easily Googled or ChatGPTed. And yet...
I feel like caching should be mentioned in tradeoffs, right? If you change the tool list frequently, that's a cache bust. In long sessions that seems like it could significantly affect costs.
Great question... and there are two answers depending on what you were originally referring to:
re: Claude Code... we actually don't filter or modify the tool list, so all tools stay visible -- disallowed calls get blocked at execution time with an error message. No cache busts on transitions; the model sees the full tool set. The cost there is prompt caching dollars, not latency, I suppose.
re: The research (Rust agent + Ollama)... the model only receives tool schemas for the current state's allowed tools. Ollama does have a KV cache reuse facility, so changing the tool list busts that cache. Depending on your workflow, this can happen once per state transition until completion. For simple workflows that is 3-5 times. Within each state the tool list is stable and the cache operates normally. Presenting a few tools instead of dozens on every agent processing step reduces input tokens and decision complexity, which is where the measurable gains come from.
Both enforce the same constraints; which one applies depends on the execution interface. The schema-level filtering in the research is the S-tier approach. Adding tools/list filtering to the MCP gateway would be beneficial if possible (it looks like we could only filter MCP tools, not core ones), which could still provide tangible benefit. I've added this evaluation to the roadmap.
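To make the difference between the two approaches concrete, here's a rough Python sketch. It is not the actual Rust agent or the Claude Code implementation; the tool names, states, and function names are all made up for illustration.

```python
# Hypothetical tool registry and per-state allow-lists (illustrative only).
ALL_TOOLS = {
    "read_file": "Read a file",
    "write_file": "Write a file",
    "run_tests": "Run the test suite",
}

ALLOWED_BY_STATE = {
    "plan": {"read_file"},
    "implement": {"read_file", "write_file"},
    "verify": {"read_file", "run_tests"},
}


def schemas_for_state(state: str) -> list[str]:
    """Schema-level filtering: the model only sees the current state's tools.

    The tool list (and so the prompt prefix) changes when the state changes.
    """
    return sorted(ALLOWED_BY_STATE[state])


def execute(state: str, tool: str) -> str:
    """Execution-time blocking: the full tool list stays in the prompt and
    a disallowed call is answered with an error message instead."""
    if tool not in ALL_TOOLS:
        return f"error: unknown tool '{tool}'"
    if tool not in ALLOWED_BY_STATE[state]:
        return f"error: '{tool}' is not allowed in state '{state}'"
    return f"ok: ran {tool}"


def count_tool_list_changes(states: list[str]) -> int:
    """Count transitions where the visible tool list changes under schema
    filtering. Each one is a potential prefix-cache bust."""
    changes = 0
    for prev, cur in zip(states, states[1:]):
        if schemas_for_state(prev) != schemas_for_state(cur):
            changes += 1
    return changes
```

With execution-time blocking the prompt prefix never changes, so the count of cache busts from the tool list is zero by construction. With schema filtering it is bounded by the number of state transitions, which matches the "stable within a state" point above.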
I have read and heard takes similar to OP's probably 50+ times from different people in the last few months (and years, now), and I agree mostly.
But I can't get over the myopic nature of this perspective. Technological advances often change the nature of work, and therefore change the nature (or location) of the "struggle".
I can imagine some hunter-gatherers probably admonishing early farmers at the dawn of agriculture for losing the "struggle" of hunting and foraging for their own food. It's much easier to drive a car than tame, train, and ride a horse. And so on throughout time.
So now with AI, some things that were hard before are now easy. So we move on to the next hard thing that maybe before was impossible or unimaginable. There is still hard work to be done.
This (from the September 2025 post) now evokes the Curb theme:
> Like you, we have seen numerous reports that more and more firms are capping their total headcount in favor of leaning on more AI tools, leading to downsizing their intern and new-graduate hiring. This is resulting in increased sidelining of new college graduates. But we think this misreads the moment completely, so we’re heading in the opposite direction.
> While we are excited about what AI tools can help do, we have a different philosophy about their role. AI tools make great team members even better, and allow firms to set more ambitious goals. They are not replacements for new hires — but ways to multiply how new hires can contribute to a team.
This is the predominant (public) talking point. And it’s true.
But along with that: when you have effective people becoming even more effective with AI, it becomes glaringly obvious who the INeffective people are. At which point it becomes hard to justify keeping those people around.
(That often includes people who are otherwise effective but aren’t utilizing agents and are therefore losing their edge.)
Before AI, it was impossible to measure productivity. Some tried with misguided metrics like lines of code added but that just incentivized writing obtuse code.
Stuff just gets done, I guess? Projects move faster, people onboard faster with less intervention, etc. The speedup seems noticeable enough that it doesn’t need precise measuring.
If the speed up is noticeable enough then coming up with a metric should be easy?
I haven’t noticed a speed up in my own org though the feeling of engineers rushing to implementation has become more pronounced. Team members no longer understand what others are doing and siloing has become intense even within my team.
Quality matters as well as speed though: reworking comes at a cost, so you really need to be tracking more than one metric. A lot of problems are caused by optimising for one metric above all else.
Impossible to measure in absolute terms but I think it's clear productivity increases relatively when LLMs are used. At least that's my strong experience.
It's important to say a large layoff isn't performance related, because it helps those who got laid off find new work. Even if it was all performance related, you want someone else to hire your former employees.
And, in a large layoff, it's likely to be at least partially true. Large layoffs work better when they're done quickly. When there are signs of layoffs but no information, many people will head for the exits themselves... which helps your headcount numbers, but ideally you want to keep people who are good at figuring stuff out and taking appropriate action, and instead they've left. So... lay off people who are 'known performance issues', but also lay off some whole teams that have a mix of performance, and then do some random assignment and catch a mix of performance, because getting direct managers involved to pick who goes means having too many people know about the layoffs.
> This is NOT a cost cutting exercise.
Yeah, this one isn't credible. If it was about something other than costs, like pivoting to a new market, you would offer first choice of jobs for the new market. Even if it's "look at our productivity, 20% of our employees have nothing to do", it takes a lot of spin to say not paying them to twiddle their thumbs is something other than cost cutting.
Didn't a few large tech companies fail even that low bar of decency? I seem to recall news of layoffs in the not too distant past where the employer openly let it be known those chosen were chosen for performance reasons, e.g. https://www.cnbc.com/2025/01/14/meta-targeting-lowest-perfor...
That to me is a pretty clear reason to question the accuracy of those two claims. Insiders are saying that even people who were performing well in very profitable groups are being cut, which is hard to square with the stated motivations.
Training can be socialized by asking people to take govt loans for further education and then letting them default on those loans. Why should a company spend its profits on it? /s
I want to agree, I do. But this point is plainly wrong in my observations:
> The enterprise version of that is I don’t want a CRM unless at least two other giant enterprises have successfully used that CRM for six months. [...] You want solutions that are proven to work before you take a risk on them.
Perhaps not for every category of software and every company. But in practice, any SaaS app that is just CRUD with some business logic + workflows is, imo, absolutely vulnerable to losing customers because people within their customers' orgs vibe coded a replacement.
They are perhaps even more at risk because would-be new customers don't ever even bother searching to find them as an option because they just vibe code a competitor in-house.
The vulnerability lies primarily in the fact that most of these SaaS apps we're talking about are _wrong_ to some meaningful degree. They don't fully fit how your company works, and they never did. There is something about them that you are forced to work around in some way. This is true because it is impossible to build a universally perfect product that perfectly fits every business requirement of every user in every company.
But now it is relatively cheap to build the perfect version for your company in-house. Or maybe even just for YOU.
I think medium/long-term this will mean a redistribution of technical talent from SaaS companies to industry companies. Instead of paying millions for SaaS subscriptions, industry companies will spend fewer millions building precisely what they need in-house with the help of AI. Not every SaaS and not every company, but I already see this happening at my company right now.
But it is working primarily because of the Max subscription model. If I could use my Max subscription to get $5000 worth of tokens for only $200 via OpenCode or Pi, I would drop Claude Code today. I think a lot of people (and enterprises) are of a similar opinion. Not saying Claude Code would have no users, but its dominance would be greatly diminished.
But you can’t, and the reason that you care also has to do with the same production process.
It’s not like a separate company made the terminal app versus the model. If we think that the desktop app is bad, but the model is good then that’s still an endorsement of the software process.
If we think the model doesn’t matter at all, then that’s an even bigger endorsement. If the model has nothing worth talking about over the nearest competitors or an open source alternative, then the remainder is marketing and polish.
I just don’t understand how people can look at a company that is capacity constrained in this market and think that they’re doing things poorly.
I think the argument here ignores a critical fact: a huge factor in Claude Code's popularity is the Claude Max plans. These plans give you potentially thousands of dollars of tokens for a capped $200.
Speaking for myself, I long for the day I can dump the comparatively garbage experience of Claude Code for something more enjoyable and OSS like OpenCode. But the fact is that it is simply not economically viable to do so.
So the PMF is not really for Claude Code alone -- it is for Claude Code + Claude Max.
I don't really see why evals are assumed to be exclusively in the domain of data scientists. In my experience SWEs-turned-AI Engineers are much better suited to building agents. Some struggle more than others, but "evals as automated tests" is, imo, so obvious a mental model, and can be so well adapted to by good SWEs, that data scientists have no real role on many "agent" projects.
I'm not saying this is good or bad, just that it's what I'm observing in practice.
For context, I'm a SWE-turned-AI Engineer, so I may be biased :)
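For anyone who hasn't seen the "evals as automated tests" framing, here's roughly what I mean, as a minimal Python sketch. The agent, the cases, and the 0.9 pass threshold are all hypothetical stand-ins; a real agent would call a model and a real suite would use a less brittle scorer than exact match.

```python
from typing import Callable


def toy_agent(question: str) -> str:
    """Hypothetical agent under test; a real one would call a model."""
    canned = {"2+2": "4", "capital of France": "Paris"}
    return canned.get(question, "I don't know")


# Each case is (input, expected output), like a table-driven unit test.
EVAL_CASES = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("capital of Atlantis", "I don't know"),
]


def run_evals(
    agent: Callable[[str], str],
    cases: list[tuple[str, str]],
    threshold: float = 0.9,  # assumed pass bar, pick your own
) -> tuple[float, bool]:
    """Return (score, passed), where score is the fraction of exact matches."""
    hits = sum(1 for question, expected in cases if agent(question) == expected)
    score = hits / len(cases)
    return score, score >= threshold
```

The main difference from a normal test suite is that you gate on an aggregate score instead of requiring every case to pass, since model output is not deterministic.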
I think there's a lot of methodological expertise that goes into collecting good eval data. For example, in many cases you need human labelers with the right expertise, well designed tasks, well defined constructs, and you need to hit interrater agreement targets and troubleshoot when you don't. Good label data is a prerequisite to the stuff that can probably be automated by the AI agent (improving the system to optimize a metric measured against ground truth labels). Data scientists and research scientists are more likely to have this skillset. And it takes time to pick up and learn the nuances.
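As one concrete example of an interrater agreement check, here's a small Python sketch of Cohen's kappa for two labelers. Kappa is just one common choice (my pick for illustration, nothing above depends on it), and the labels are made-up data.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two raters labeling the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability both raters pick the same class
    # if each labeled independently at their own observed rates.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical class; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels: the raters agree on 3 of 4 items.
rater_1 = ["good", "good", "bad", "bad"]
rater_2 = ["good", "bad", "bad", "bad"]
kappa = cohens_kappa(rater_1, rater_2)  # 0.5
```

Here 75% raw agreement drops to a kappa of 0.5 once chance agreement is accounted for, which is the kind of gap that sends you back to fix the labeling guidelines.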
As someone who works with real licensed engineers (electrical, civil), I wish we would use the term "agentic software engineering" to describe this. Omitting "software" here betrays a very SWE-centric mindset.
Agents are coming for the other engineering disciplines as well.