Hacker Newsnew | past | comments | ask | show | jobs | submit | red_hare's commentslogin

OpenAI API also supports defer_loading https://developers.openai.com/api/docs/guides/tools-tool-sea...

And it's not actually necessary for it to exist at the API level. It's a pattern. Making it API-side is just an optimization.

To do it client-side: 1. Define a single tool, tool_search 2. List the names of your deferred tools in context (or tool_search's description) 3. When tool_search is called, match the query against the tool names (or names + descriptions) 4. Append the matched tool def to the context in a new <system>-esque tag

Claude Code (as of the leak) does this client side. You can even see the custom matching function and A/B tests about whether to include the descriptions.

Whether or not that tool definition comes from MCP or a local definition is kind of beside the point.


+1

Its crazy that people are still discussing this. It's ancient history. Deferred tool loading, large contexts, and prompt caching have made 2026 completely different from 2025.

Also, the "CLI saves token" debate really falls apart when step one of using the CLI is running "--help". The problem remains: if knowing how to call the thing isn't in parametric memory, it has to be in context.


Build a more specific skill the for the exact workflow you want?

Skill still needs to be loaded in context, what would it change?

I think that what they mean is that instead of ten perfectly orthogonal "unix philosophy" tools (skills) for the agent to compose when solving a problem, each with an API surface (description text) the size of Texas, you'd want to can each composition in a shell script (or a bespoke rust binary, if you enjoy watching your bot perform some heavy lifting) that only solves one problem but solves it so focused that the accompanying skill description barely consumes more context than the tool's self descriptive name.

I still didn't follow, you mean to pipe things between tool calls? Like if you want to query something and then update another without the intermediate getting brought in context?

Instead of requiring each session to understand the n tools used to solve a particular problem, you bundle up the solution in a conventional script (that's what I meant by "can", as in canning) that the agent can use with very little documentation in the context. When the model is smart enough to figure out the composition of underlying tools during regular execution, it will also be able to do the canning up as a script and write the lightweight documentation that turns the script into a skill. Subsequent use will only require that lightweight documentation in context.

There was a time, in the early to mid 2010s, when the phrase "Fake News" was almost exclusively used by people in publishing to talk about a very real rise in editorial disruption as news readers shifted from being desktop and homepage-driven to mobile and facebook-driven.

And then, one day, the politicians started saying it...


I work for an "AI-native" company now and have found this to be the case.

EVERYONE (engineers, pms, managers, sales) uses Claude Code to read and write Google Docs (google workspace mcp). Ideas, designs, reports. It's too much for one person to read and, with a distributed async team, there's an endless demand for more.

So for every project there's always one super Google Doc with 50 tabs and everyone just points their claude code at it to answer questions. It's not to be read by a human, it's just context for the agent.


Everyone cranks out endless pages of slop, that everyone else then has to ingest. Anthropic collects a fee from all of you and is the only winner here.

I'm looking forward to the impending crash when the AI providers actually start charging what it costs to run these models. It's going to be a bloodbath, and it's going to be cathartic as fuck.


This is literally losing the whole process to a stochastic parrot.


They are so far removed from the process they can claim they are any % more productive and no one is able to contradict them. Call it a ‘productivity theatre’

The economic reality check is going to be devastating. It won’t be a crash of AI as a tech, it will be a crash of every ‘AI native’ company that does not even know what is their product any more.


I really hope that more people become aware of how much of our society is turning into kayfabe. Just think of the rise of all the new types of ____ theatre like this that have been coined over the last decade or more. It's not an accident or fad, it reflects something true that's happening to society at large. Everything authentic and valuable is being turned into something inauthentic, based only on conjured up perceived value and competition to fulfill the perception, and not real or useful purposes. It's all in the service of propping up systems that no longer function for the majority of people, or even for basic needs. And until a lot more people are willing to point out that the emperor is quite naked, even at their own social or financial risk, this will continue to rot everything down to the foundation.


The US reinventing the worst parts of Soviet but putting a glossy and chipper veneer on it.


To be fair, a lot of those people were stochastically parroting by themselves for years already. They are just capable to stochastically parrot more.

These companies have enough market power that they can afford to be ineffective. So they were. And they are ineffective in novel way.


If that's true, it's very unsustainable.

Gemma-4 26B-A4B + M5 MacBook Pro + OpenCode isn't Claude Code _yet_, but it's good enough that if I were forced to use it I would be fine.


Yes, it's amazing how quickly so many tech companies have hitched their tooling to these big AI vendors seemingly without any thought towards whether they'll still exist a year or three or five from now. Insane behavior. To the (debatable!) extent that AI coding tools are useful at all wouldn't it be a hell of a lot smarter to self-host? At least that way you have some control over QoS, and a stable, predictable result... Or maybe nobody cares about that kind of thing anymore? What happened to basic business math in this industry?


The basic business math is (to start) software companies realizing that spending $10k, $20k, $50k (more ?) per year, per developer for current models at current token rates might not be particularly insane, given the value return.

Models are likely going to keep getting better, and as costs go down, demand is likely to rise faster.


> as costs go down

Huh? Why would that happen? Indications are that costs will likely go up, especially if currently vendors are selling tokens at a loss.


The main operational expense of a million LLM tokens is pennies of electricity.

Even if you generously depreciate the GPU and other hardware, it’s hard to believe inference at scale in April 2026 isn’t highly profitable.


> The main operational expense of a million LLM tokens is pennies of electricity.

I think you meant dollars of electricity.


I don’t think so.

https://www.theregister.com/2024/03/18/nvidia_turns_up_the_a...

A Blackwell 8X node consumes about 15kw, let’s up that to 50kw to generously account for cooling and everything else.

A US kWh is something like $0.20, so running that node for an hour costs ~$10.

Nvidia got 30,000 parallel TPS out of DeepSeek-R1 on that node:

https://developer.nvidia.com/blog/nvidia-blackwell-delivers-...

So that $10 buys you over 100M tokens or … pennies per million.

I’m sure these numbers are off, but not by an aggregate two orders of magnitude.


It’s getting better on both the hardware and the software fronts the barbarians are banging at the gates.


I love the batteries included in Helix. Just the right amount that I don't need much else.

At this point I just want a decent Helix-Evil-Mode.


I had a few years of writing clojure for work ten years ago and it's still my mental model of how I think about programming.


TheVerge launched a full RSS Feed for paid subscribers about a year ago and I've never so happily subscribed to something.


I feel the same about Claude Code. It's a fast but average developer at just about everything and there are some things that average developers are just consistently bad at and therefore Claude is consistently bad at.


I'm not sure, I think you overestimate the average developer. But then, the average code doesn't end up in public repositories, it spends decades in enterprise codebases rotting.

At this point I'd rather review LLM generated code than a poor developer's.


That person's actions were only possible because the administration explicitly decided to put that much unchecked power into poorly vetted individuals.


> poorly vetted individuals.

Interesting choice of words and application when discussing gripes against entire administrations.


Why is it interesting?

Why does this admin get a pass from you for their employees actions?


You wouldn't hold a Democrat admin responsible for the broad competence of their appointees and direct hires?


I would. I’m saying, that you didn’t.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: