
Hey HN. I've been building SkyClaw for the past few months — it's a runtime for running LLM-powered agents over chat channels (Telegram, Discord, Slack) with a focus on reliability and not wasting tokens.

The problem I kept hitting with other agent setups: they're fragile. Process dies mid-task, you lose everything. The model burns through context repeating the same failed approach. Token budgets aren't managed, so you hit limits on complex tasks and the conversation just breaks.

Here's how SkyClaw handles these:

Crash recovery: After every tool round, the full session state gets checkpointed to SQLite. If the process dies, it resumes from the last checkpoint. Tasks that can't complete during graceful shutdown get saved automatically.
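The checkpoint-and-resume loop can be sketched like this (a toy version that writes a flat file instead of SQLite; `Session` and its fields are made-up placeholders, not SkyClaw's actual schema):

```rust
use std::fs;
use std::path::Path;

// Hypothetical session state: tool-round counter plus accumulated transcript.
#[derive(Debug, PartialEq)]
struct Session {
    round: u32,
    transcript: String,
}

// Persist the full session after each tool round.
fn checkpoint(path: &Path, s: &Session) -> std::io::Result<()> {
    fs::write(path, format!("{}\n{}", s.round, s.transcript))
}

// On startup, resume from the last checkpoint if one exists.
fn resume(path: &Path) -> Option<Session> {
    let data = fs::read_to_string(path).ok()?;
    // First line is the round; everything after the first newline is the transcript.
    let (round, transcript) = data.split_once('\n')?;
    Some(Session {
        round: round.parse().ok()?,
        transcript: transcript.to_string(),
    })
}
```

The real thing checkpoints transactionally per tool round; the point is just that resume reconstructs the exact state the last round left behind.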

Token budgeting: A context manager allocates tokens across 7 priority tiers — system prompt and tool definitions always stay, recent messages get a guaranteed window, memory search is capped at 15%, cross-task learnings at 5%, and older history fills whatever's left. When messages get dropped, a summary gets injected so the model doesn't lose the thread.
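The allocation logic is roughly this (only the 15% and 5% caps come from the post; the fixed reserves for the system prompt and recent window are made-up placeholder numbers):

```rust
// Hypothetical tier allocation over a total context budget, in tokens.
struct Budget {
    system: u32,    // system prompt + tool definitions: always kept
    recent: u32,    // guaranteed window for recent messages
    memory: u32,    // memory search results, capped at 15%
    learnings: u32, // cross-task lessons, capped at 5%
    history: u32,   // older history fills whatever is left
}

fn allocate(total: u32) -> Budget {
    let system = 2_000;            // placeholder reserve
    let recent = 4_000;            // placeholder reserve
    let memory = total * 15 / 100;
    let learnings = total * 5 / 100;
    let fixed = system + recent + memory + learnings;
    let history = total.saturating_sub(fixed);
    Budget { system, recent, memory, learnings, history }
}
```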

Self-correction: A failure tracker counts consecutive failures per tool. After 2 failures on the same tool, it injects a prompt telling the model to try a fundamentally different approach instead of retrying the same thing.

Cross-task learning: After a task completes, the runtime analyzes which tools were used, what failed, and which strategy rotations happened. It stores these as lessons and injects them into future tasks at 5% of the context budget, so the agent actually gets better over time within a workspace.
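The consecutive-failure counter behind the self-correction step can be sketched as (names are made up; the real injection text is richer):

```rust
use std::collections::HashMap;

// Tracks consecutive failures per tool. A success resets the streak;
// the second failure in a row produces a strategy-rotation prompt.
#[derive(Default)]
struct FailureTracker {
    streaks: HashMap<String, u32>,
}

impl FailureTracker {
    fn record(&mut self, tool: &str, ok: bool) -> Option<String> {
        if ok {
            self.streaks.remove(tool);
            return None;
        }
        let n = self.streaks.entry(tool.to_string()).or_insert(0);
        *n += 1;
        let streak = *n;
        if streak >= 2 {
            Some(format!(
                "`{tool}` has failed {streak} times in a row; \
                 try a fundamentally different approach instead of retrying."
            ))
        } else {
            None
        }
    }
}
```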

Model routing: A complexity analyzer looks at the task and routes it — simple questions go to fast/cheap models, multi-step tasks get the expensive ones. This alone cut my API costs significantly.
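As a toy version of the routing heuristic (the length threshold, keyword list, and model names here are all invented for illustration, not SkyClaw's actual analyzer):

```rust
// Route long prompts or multi-step language to the expensive model;
// everything else goes to the fast/cheap one.
fn route(task: &str) -> &'static str {
    let t = task.to_lowercase();
    let multi_step = ["then", "after that", "step", "plan"]
        .iter()
        .any(|k| t.contains(k));
    if task.len() > 400 || multi_step {
        "big-model"
    } else {
        "small-model"
    }
}
```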

Parallel tools: Up to 5 tools run concurrently. Dependency detection uses union-find grouping: read-read pairs are independent, write-write or write-read pairs get serialized, shell commands always run alone.
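The union-find grouping works roughly like this (a minimal sketch, not SkyClaw's code; here a shell command conservatively merges with everything, which serializes the batch it appears in):

```rust
// Minimal disjoint-set (union-find) with path compression.
struct Dsu {
    parent: Vec<usize>,
}

impl Dsu {
    fn new(n: usize) -> Self {
        Dsu { parent: (0..n).collect() }
    }
    fn find(&mut self, x: usize) -> usize {
        let p = self.parent[x];
        if p == x {
            return x;
        }
        let r = self.find(p);
        self.parent[x] = r; // path compression
        r
    }
    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        self.parent[ra] = rb;
    }
}

// Hypothetical tool call: one resource, a read/write flag, a shell flag.
struct Call<'a> {
    resource: &'a str,
    writes: bool,
    shell: bool,
}

// Merge calls into groups: read-read pairs stay independent,
// write-write and write-read pairs on the same resource are serialized,
// and shell commands merge with everything.
fn group(calls: &[Call]) -> Vec<usize> {
    let mut dsu = Dsu::new(calls.len());
    for i in 0..calls.len() {
        for j in i + 1..calls.len() {
            let (a, b) = (&calls[i], &calls[j]);
            let conflict = a.resource == b.resource && (a.writes || b.writes);
            if conflict || a.shell || b.shell {
                dsu.union(i, j);
            }
        }
    }
    (0..calls.len()).map(|i| dsu.find(i)).collect()
}
```

Calls in different groups can run concurrently (up to the limit of 5); calls in the same group run in order.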

Agent delegation: The runtime can spawn scoped sub-agents for subtasks — each gets its own model config, tool set, and timeout. Max 10 per task, max 3 concurrent. Sub-agents can't spawn further sub-agents, so no runaway recursion.
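The spawn limits reduce to a small guard like this (a sketch with made-up names; the real runtime also tracks per-sub-agent timeouts and tool scopes):

```rust
// Enforces the delegation limits from the post: max 10 sub-agents per task,
// max 3 concurrent, and sub-agents may not spawn further sub-agents.
struct SpawnGuard {
    spawned: u32,
    active: u32,
}

impl SpawnGuard {
    fn new() -> Self {
        SpawnGuard { spawned: 0, active: 0 }
    }
    fn try_spawn(&mut self, caller_is_subagent: bool) -> bool {
        if caller_is_subagent || self.spawned >= 10 || self.active >= 3 {
            return false;
        }
        self.spawned += 1;
        self.active += 1;
        true
    }
    fn finished(&mut self) {
        self.active = self.active.saturating_sub(1);
    }
}
```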

On the infrastructure side: circuit breaker with exponential backoff and jitter for provider errors, automatic memory backend failover, a watchdog that monitors and auto-restarts subsystems, S3/R2 file storage, OpenTelemetry metrics, multi-tenant workspace isolation, and OAuth with PKCE.
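For the backoff piece specifically, the delay calculation is the standard capped-exponential-with-jitter shape (std has no RNG, so this sketch takes the jitter sample as a parameter; base and cap values are illustrative):

```rust
// delay = jitter * min(base * 2^attempt, cap), with jitter in [0, 1].
fn backoff_ms(attempt: u32, base_ms: u64, cap_ms: u64, jitter: f64) -> u64 {
    let exp = base_ms
        .saturating_mul(1u64 << attempt.min(16)) // clamp the shift to avoid overflow
        .min(cap_ms);
    (exp as f64 * jitter.clamp(0.0, 1.0)) as u64
}
```

In production you'd draw `jitter` from a uniform distribution per attempt so concurrent clients don't retry in lockstep.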

The proactive trigger system (file watches, cron, webhooks, thresholds) is off by default and rate-limited. I was paranoid about an agent deciding to do things on its own, so it requires global opt-in and destructive operations need confirmation.
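The rate limiting on triggers is an ordinary token bucket; a sketch (capacity and refill rate are made-up numbers, and time is passed in explicitly so the logic stays testable):

```rust
// Token bucket: each trigger firing costs one token; tokens refill
// continuously up to a fixed capacity.
struct Bucket {
    tokens: f64,
    capacity: f64,
    refill_per_sec: f64,
    last: f64, // timestamp of the previous call, in seconds
}

impl Bucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Bucket { tokens: capacity, capacity, refill_per_sec, last: 0.0 }
    }
    fn allow(&mut self, now: f64) -> bool {
        self.tokens =
            (self.tokens + (now - self.last) * self.refill_per_sec).min(self.capacity);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```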

Everything is feature-flagged — you only compile the channels and backends you actually use. Written in Rust with tokio, 905 tests, zero clippy warnings. Happy to answer questions about the architecture or any of the tradeoffs.

