Show HN: AgentBudget – Real-time dollar budgets for AI agents (github.com/sahiljagtap08)
7 points by sahiljagtapyc 3 months ago | 8 comments
Hey HN,

I built AgentBudget after an AI agent loop cost me $187 in 10 minutes — GPT-4o retrying a failed analysis over and over. Existing tools (LangSmith, Langfuse) track costs after execution but don't prevent overspend.

AgentBudget is a Python SDK that gives each agent session a hard dollar budget with real-time enforcement. Integration is two lines:

    import agentbudget
    agentbudget.init("$5.00")

It monkey-patches the OpenAI and Anthropic SDKs (same pattern as Sentry/Datadog), so existing code works without changes. When the budget is hit, it raises BudgetExhausted before the next API call goes out.
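
For anyone unfamiliar with the pattern, here is a minimal sketch of monkey-patching a client method with a pre-call budget check. It is not AgentBudget's actual code: `FakeClient`, `install_budget`, and the flat per-call cost in cents are hypothetical stand-ins for illustration, and only the `BudgetExhausted` name comes from the post above.

```python
import functools


class BudgetExhausted(Exception):
    """Raised before a call goes out once the budget cannot cover it."""


class FakeClient:
    """Hypothetical stand-in for a provider SDK client (not a real SDK)."""

    def complete(self, prompt):
        return f"echo: {prompt}"


def install_budget(cls, method_name, limit_cents, cost_cents):
    """Swap cls.method_name for a wrapper that enforces a spending limit.

    Assumes a flat cost per call, in integer cents, to keep the sketch small.
    Returns the mutable state dict so callers can inspect the spend.
    """
    original = getattr(cls, method_name)
    state = {"spent_cents": 0}

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        # Check first, so an over-budget request is never sent.
        if state["spent_cents"] + cost_cents > limit_cents:
            raise BudgetExhausted(
                f"spent {state['spent_cents']} of {limit_cents} cents"
            )
        result = original(self, *args, **kwargs)
        state["spent_cents"] += cost_cents
        return result

    # Existing call sites keep calling cls.method_name unchanged.
    setattr(cls, method_name, wrapper)
    return state
```

Because the patch is applied to the class, code that already calls `FakeClient().complete(...)` picks up the check without modification, which is the property the post describes.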

How it works:

- Two-phase enforcement: estimates cost pre-call (input tokens + average completion), reconciles post-call with actual usage. Worst-case overshoot is bounded to one call.
- Loop detection: sliding window over (tool_name, argument_hash, timestamp) tuples. Catches infinite retries even if budget remains.
- Cost engine: pricing table for 50+ models across OpenAI, Anthropic, Google, Mistral, Cohere. Fuzzy matching for dated model variants.
- Unified ledger: tracks both LLM calls and external tool costs (via track() or @track_tool decorator) in a single session.
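
To make the two-phase idea concrete, here is a rough sketch under stated assumptions. The `BudgetSession` class, its method names, and the per-token prices are hypothetical and chosen only for illustration; they are not AgentBudget's API or real provider pricing.

```python
from decimal import Decimal


class BudgetExhausted(Exception):
    """Raised when the estimated cost of the next call would exceed the limit."""


class BudgetSession:
    """Hypothetical two-phase budget: estimate before, reconcile after."""

    def __init__(self, limit_usd, in_price, out_price, avg_completion_tokens):
        self.limit = Decimal(limit_usd)
        self.in_price = Decimal(in_price)    # assumed USD per input token
        self.out_price = Decimal(out_price)  # assumed USD per output token
        self.avg_completion = avg_completion_tokens
        self.spent = Decimal("0")

    def estimate(self, input_tokens):
        # Phase 1 input: known prompt size plus an assumed average completion.
        return (input_tokens * self.in_price
                + self.avg_completion * self.out_price)

    def pre_call(self, input_tokens):
        # Phase 1: refuse the call if the estimate does not fit.
        est = self.estimate(input_tokens)
        if self.spent + est > self.limit:
            raise BudgetExhausted(f"spent {self.spent}, limit {self.limit}")
        return est

    def post_call(self, input_tokens, output_tokens):
        # Phase 2: charge the actual usage reported by the provider.
        actual = input_tokens * self.in_price + output_tokens * self.out_price
        self.spent += actual
        return actual
```

Since the real completion length is only known after the call, a single call can cost more than its estimate. The next `pre_call` sees the reconciled total, which is why any overshoot is limited to one call.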

Benchmarks: 3.5μs median overhead per enforcement check. Zero budget overshoot across all tested scenarios. Loop detection: 0 false positives on diverse workloads, catches pathological loops at exactly N+1 calls.
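
As an illustration of the sliding-window loop check and the "N+1 calls" behavior, here is a small sketch. The `LoopDetector` class, its parameters, and the choice of SHA-256 over JSON-serialized arguments are assumptions made for this example, not the library's implementation. Timestamps are passed in explicitly so the behavior is easy to test; real code would likely use `time.monotonic()`.

```python
import hashlib
import json
from collections import deque


class LoopDetector:
    """Hypothetical detector: flags call N+1 of an identical tool call."""

    def __init__(self, max_repeats, window_seconds):
        self.max_repeats = max_repeats        # N identical calls are allowed
        self.window_seconds = window_seconds  # only recent calls are counted
        self.calls = deque()                  # (tool_name, argument_hash, timestamp)

    def record(self, tool_name, arguments, now):
        """Record a call and return True if it looks like part of a loop."""
        payload = json.dumps(arguments, sort_keys=True).encode()
        arg_hash = hashlib.sha256(payload).hexdigest()
        # Slide the window: drop calls older than window_seconds.
        while self.calls and now - self.calls[0][2] > self.window_seconds:
            self.calls.popleft()
        self.calls.append((tool_name, arg_hash, now))
        repeats = sum(
            1 for name, digest, _ in self.calls
            if name == tool_name and digest == arg_hash
        )
        return repeats > self.max_repeats
```

With `max_repeats=3`, three identical calls inside the window pass and the fourth is flagged, independent of how much budget remains.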

No infrastructure needed — it's a library, not a platform. No Redis, no cloud services, no accounts.

I also wrote a whitepaper covering the architecture and integration with Coinbase's x402 payment protocol (where agents make autonomous stablecoin payments): https://doi.org/10.5281/zenodo.18720464

1,300+ PyPI installs in the first 4 days, all organic. Apache 2.0.

Happy to answer questions about the design.



Interesting to see budget enforcement paired with x402. We've been building in the same space — Apiosk (https://apiosk.com) approaches it from the server side: a gateway that enforces per-request x402 payments so API providers can monetize without accounts or keys.

Your budget SDK + Apiosk would be a natural combo — the agent has a spending ceiling (AgentBudget) and the APIs it calls use x402 for micropayments (Apiosk handles gateway/verification). Have you thought about hooks for x402-aware budget tracking where the ledger automatically records on-chain settlements?


Real-time budget enforcement is a smart approach, especially for agentic loops where costs can spiral from retries. We've tackled the cost side by building an AI gateway at https://simplio.dev that automatically routes requests to the most affordable provider that meets your quality threshold, which has cut our own API bills substantially.


This is exactly the pain point with agents: spend isn’t linear because fanout + retries compound. One thing that helped us debug/contain spikes is tracking cost per “user-action/outcome” (not just per call) plus a retry ratio trend (429/timeouts). Do you support budgets per step/tool in the chain, or only per overall run?


I found this from your Twitter post, crazy that I found your post here too hahaha. I am trying to integrate it into my side project to keep the agents from eating up the whole project budget.

Looks really promising so far!


Thanks! That's awesome - love that the Twitter thread connected here. Let me know how the integration goes, happy to help if you hit any issues. What are your agents running on?


Curious about the granularity: does it support per-token budgeting or integration with providers like OpenAI's API for predictive alerts?


The multi-agent budget problem you're describing gets even harder when the services are heterogeneous. In a RAG pipeline, a single user query might hit: query analysis (LLM call), embedding generation (different model/pricing), reranking (yet another model), and response generation (LLM call) — each potentially in a different process.

Per-call monkey-patching sees each call in isolation. What I ended up doing was a trace-based approach: every request gets a trace ID, each service appends cost spans asynchronously, and a separate enrichment step aggregates the total. The hard part was deduplication — when service A reports an aggregate cost and service B reports the individual calls that compose it, you need to reconcile or you double-count.
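
A rough sketch of one possible reconciliation rule, to make the double-counting problem concrete: count only leaf spans, on the assumption that any span with reported children is purely an aggregate of them. The `CostSpan` fields and `total_cost` function are hypothetical names for illustration, and this rule is a simplification, not necessarily what a production pipeline needs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostSpan:
    """Hypothetical cost span appended by a service for one trace."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    cost_micros: int  # cost in millionths of a dollar


def total_cost(spans, trace_id):
    """Sum the cost of a trace without double-counting aggregates.

    Assumption: if a span has children in the trace, its own cost is the
    sum of those children, so only leaf spans are counted.
    """
    in_trace = [s for s in spans if s.trace_id == trace_id]
    parents = {s.parent_id for s in in_trace if s.parent_id is not None}
    return sum(s.cost_micros for s in in_trace if s.span_id not in parents)
```

If service A reports 300 for the whole step and service B reports the 100 and 200 calls inside it as children of A's span, this rule counts 300 rather than 600. It breaks down when a parent also incurs cost of its own, which is where explicit reconciliation is needed.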

Using atomic disk writes for halt state is a nice pattern. I went with fire-and-forget (never block the request path, accept eventual consistency on cost data), but that means you can't do hard enforcement mid-request like AgentBudget does.


The deduplication problem is the part I haven't worked out cleanly. The hierarchy in veronica-core sidesteps it as long as you declare parent-child relationships upfront — B's spend rolls directly into A's ceiling without a separate aggregation step. But in a dynamic pipeline where you don't know the call graph until runtime, that assumption breaks. The fire-and-forget tradeoff makes sense. I went with blocking enforcement because the original use case was preventing runaway agents, not auditing after the fact. For RAG you're probably right that eventual consistency is the better fit — you care more about the trace than cutting off a half-finished response.
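
For readers following along, a minimal sketch of what a declared parent-child budget hierarchy could look like. `BudgetNode` and `BudgetExceeded` are hypothetical names for this example and are not veronica-core's API. Amounts are integer cents to keep the arithmetic exact.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would push a node or an ancestor past its limit."""


class BudgetNode:
    """Hypothetical budget with an optional parent declared upfront."""

    def __init__(self, name, limit_cents, parent=None):
        self.name = name
        self.limit_cents = limit_cents
        self.parent = parent
        self.spent_cents = 0

    def charge(self, cents):
        # Collect this node and every ancestor up to the root.
        chain = []
        node = self
        while node is not None:
            chain.append(node)
            node = node.parent
        # Validate the whole chain before mutating anything, so a rejected
        # charge leaves every ledger untouched.
        for n in chain:
            if n.spent_cents + cents > n.limit_cents:
                raise BudgetExceeded(n.name)
        # The child's spend rolls directly into each ancestor's total.
        for n in chain:
            n.spent_cents += cents
```

Each charge is recorded once and propagated upward, so there is no separate aggregation step to deduplicate. The cost is that the parent link must exist before the first charge, which is the assumption that fails for call graphs only known at runtime.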



