Hacker News | blakec's comments

The FTS5 index approach here is right, but I'd push further: pure BM25 underperforms on tool outputs because they're a mix of structured data (JSON, tables, config) and natural language (comments, error messages, docstrings). Keyword matching falls apart on the structured half.

I built a hybrid retriever for a similar problem, compressing a 15,800-file Obsidian vault into a searchable index for Claude Code. Stack is Model2Vec (potion-base-8M, 256-dimensional embeddings) + sqlite-vec for vector search + FTS5 for BM25, combined via Reciprocal Rank Fusion. The database is 49,746 chunks in 83MB. RRF is the important piece: it merges ranked lists from both retrieval methods without needing score calibration, so you get BM25's exact-match precision on identifiers and function names plus vector search's semantic matching on descriptions and error context.
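RRF is simple enough to show in a few lines. A minimal sketch (the `k=60` constant is the conventional default from the original RRF paper; the doc IDs are illustrative):

```python
def rrf_merge(ranked_lists, k=60):
    """Reciprocal Rank Fusion: merge ranked lists of doc IDs
    without calibrating scores across retrievers. Each list
    contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Documents ranked well by BOTH retrievers accumulate the
    # highest fused score.
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["a", "b", "c"]   # exact-match hits
vec  = ["c", "a", "d"]   # semantic hits
print(rrf_merge([bm25, vec]))  # → ['a', 'c', 'b', 'd']
```

Because only ranks matter, BM25's unbounded scores and cosine similarities in [-1, 1] never have to be normalized against each other.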

The incremental indexing matters too. If you're indexing tool outputs per-session, the corpus grows fast. My indexer has a --incremental flag that hashes content and only re-embeds changed chunks. Full reindex of 15,800 files takes ~4 minutes; incremental on a typical day's changes is under 10 seconds.
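The hashing part is the whole trick. A minimal sketch of the change-detection step (the `index_state.json` state file and chunk-ID scheme are hypothetical; the real indexer stores hashes alongside the embeddings):

```python
import hashlib
import json
import pathlib

STATE = pathlib.Path("index_state.json")  # hypothetical state file

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def changed_chunks(chunks: dict[str, str]) -> list[str]:
    """Return IDs of chunks whose content hash differs from the
    previous run; only those get re-embedded."""
    old = json.loads(STATE.read_text()) if STATE.exists() else {}
    new = {cid: content_hash(text) for cid, text in chunks.items()}
    STATE.write_text(json.dumps(new))
    return [cid for cid, h in new.items() if old.get(cid) != h]
```

Embedding is the expensive step, so skipping unchanged chunks is where the 4-minutes-to-10-seconds gap comes from.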

On the caching question raised upthread: this approach actually helps prompt caching because the compressed output is deterministic for the same query. The raw tool output would be different every time (timestamps, ordering), but the retrieved summary is stable if the underlying data hasn't changed.

One thing I'd add to Context Mode's architecture: the same retriever could run as a PostToolUse hook, compressing outputs before they enter the conversation. That way it's transparent to the agent, it never sees the raw dump, just the relevant subset.


Very interesting. One big wrinkle with OP's approach is exactly that: the structured responses, which many tools return, are untouched. The solution in OP, as I understand it, is the "execute" method. However, I'm building an MCP gateway, and such sandboxed execution isn't available (...yet), so your approach to this sounds very clever. I'll spend this day trying it out.

The LLM that wrote the comment you are replying to has no idea what it is talking about...

Are you sure it's simply because YOU don't understand it? Because it seems to make sense to me after working on https://github.com/pmarreck/codescan

I'm trying it anyway

Would love to read a more in-depth write-up of this if you have the time!

I suspect the obsessive note-taker crowd on HN would appreciate it too.


Seconded. I would love to see the what, why, and how of your Obsidian work.

The proxy-based secret injection approach mentioned upthread is solid for network credentials, but it doesn't cover the local attack surface — your SSH keys, GPG keys, AWS credentials sitting in dotfiles. Those are the actual high-value targets for a compromised agent on a dev workstation.

I run Claude Code with 84 hooks, and the one I trust most is a macOS Seatbelt (sandbox-exec) wrapper on every Bash tool call. It's about 100 lines of Seatbelt profile that denies read/write to ~/.ssh, ~/.gnupg, ~/.aws, any .env file, and a credentials file I keep. The hook fires on PreToolUse:Bash, so every shell command the agent runs goes through sandbox-exec automatically.

The key design choice: Seatbelt operates at the kernel level. The agent can't bypass it by spawning subprocesses, piping through curl, or any other shell trick — the deny rules apply to the entire process tree. Containers give you this too, but the overhead is absurd for a CLI tool you invoke 50 times a day. Seatbelt adds ~2ms of latency.
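The wrapping itself is the trivial part. A minimal sketch of what the hook does to each command (the profile path is hypothetical, and the Claude Code hook I/O plumbing is omitted; the point is that `shlex.quote` keeps the agent's command intact inside the sandboxed shell):

```python
import shlex

# Hypothetical profile path; the deny rules for ~/.ssh, ~/.gnupg,
# ~/.aws and .env files live in that Seatbelt profile file.
PROFILE = "~/.claude/bash-sandbox.sb"

def wrap(command: str) -> str:
    """Rewrite a Bash tool call to run under macOS sandbox-exec.
    Seatbelt is enforced by the kernel, so the deny rules follow
    every subprocess the command spawns."""
    return (
        f"sandbox-exec -f {shlex.quote(PROFILE)} "
        f"/bin/sh -c {shlex.quote(command)}"
    )

print(wrap("cat ~/.ssh/id_rsa"))
```

Because the rewrite happens in PreToolUse, the agent never sees an unsandboxed shell; there is no "remember to use the safe wrapper" failure mode.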

I built it with a dry_run mode (logs violations but doesn't block) and ran it for a week before enforcing. 31 tests verify the sandbox catches attempts to read blocked paths, write to them, and that legitimate operations (git, python, file editing in the project directory) pass through cleanly.

The paths to block are in a config file, so it's auditable — you can diff it in code review. And it's composable with other layers: I also run a session drift detector that flags when the agent wanders off-task (cosine similarity against the original prompt embedding, checked every 25 tool calls).
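The drift check is a few lines once you have embeddings. A minimal sketch (the 0.5 threshold is a hypothetical cutoff; tune it against your own sessions):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

DRIFT_THRESHOLD = 0.5  # hypothetical cutoff

def check_drift(prompt_vec, recent_vec) -> bool:
    """Flag when the embedding of recent activity has wandered
    too far from the original prompt's embedding."""
    return cosine(prompt_vec, recent_vec) < DRIFT_THRESHOLD
```

Checking every 25 tool calls instead of every call keeps the embedding cost negligible while still catching drift before it compounds.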

None of this solves prompt injection fundamentally, but "the agent physically cannot read my SSH keys regardless of what it's been tricked into doing" is a meaningful property.


I've been cataloging agent failure modes for two months. They're not random, they repeat. I gave them names so I could build mitigations:

Shortcut Spiral: agent skips verification to report "done" faster. Fix: mandatory quality loop with evidence for each step.

Confidence Mirage: agent says "I'm confident this works" without running tests. Fix: treat hedging language ("should", "probably") as a red flag that triggers re-verification.

Phantom Verification: agent claims tests pass without actually running them in the current session. Fix: independent test step that doesn't trust the agent's self-report.

Tunnel Vision: agent polishes one function while breaking imports in adjacent files. Fix: mandatory "zoom out" step that checks integration points before reporting completion.

Deferred Debt: agent leaves TODO/FIXME/HACK in committed code. Fix: pre-commit hook that greps for these and blocks the commit.

Each of these happened to me multiple times before I built the corresponding gate. The pattern: you don't know what gate you need until you've been burned by its absence.
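The Deferred Debt gate is the easiest to show concretely. A minimal sketch of the scan step, operating on `git diff --cached -U0` output so only newly added lines count (the marker list matches the comment above; wiring it into a pre-commit hook is left out):

```python
import re

MARKERS = re.compile(r"\b(TODO|FIXME|HACK)\b")

def debt_lines(staged_diff: str) -> list[str]:
    """Given `git diff --cached -U0` output, return added lines
    carrying deferred-debt markers; the pre-commit hook blocks
    the commit when this list is non-empty."""
    return [
        line for line in staged_diff.splitlines()
        if line.startswith("+")
        and not line.startswith("+++")   # skip file headers
        and MARKERS.search(line)
    ]

sample = "+++ b/app.py\n+x = 1\n+# TODO: handle errors\n-old"
print(debt_lines(sample))  # → ['+# TODO: handle errors']
```

Scanning the staged diff rather than the working tree means pre-existing TODOs elsewhere in the repo don't block unrelated commits.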


I built one of these by accident over two months on Claude Code. ~15,000 lines of hooks, skills, and agents. I never set out to build an orchestration layer. I fixed one problem (stop the model from suggesting OpenAI). Then another (inject date and project context). Then another (catch credentials in tool calls). Then the solutions started stepping on each other, so I built dispatchers. Then dispatchers needed shared state. Then state needed quality gates. By the time Karpathy named the concept, my setup already looked like this.

"Just existing tech repackaged" is accurate and beside the point. Dropbox was just rsync repackaged. The value is in how it comes together, not the individual pieces.

What's actually missing that nobody's built yet: declarative workflow definitions. Everything I have is imperative bash. Want to change the order something runs? Edit a 1,300-line script. A real Claws system would define workflows as data and interpret them.


Yeah, but not a framework. I'm using Claude Code's hook system. 84 hooks across 15 event types.

Biggest thing I learned: don't let multiple hooks fire independently on the same event. I had seven on UserPromptSubmit, each reading stdin on their own. Two wrote to the same JSON state file. Concurrent writes = truncated JSON = every downstream hook breaks. One dispatcher per event running them sequentially from cached stdin fixed it. 200ms overhead per prompt, which you never notice.
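The dispatcher pattern is a handful of lines. A minimal sketch (hook paths and the surrounding Claude Code wiring are omitted; the real dispatcher reads `sys.stdin` exactly once and passes the cached payload down):

```python
import subprocess
import sys

def dispatch(event_json: str, hooks: list[str]) -> None:
    """Run each hook sequentially, feeding every one the same
    cached stdin payload, so no two hooks race on the pipe or
    on shared state files."""
    for hook in hooks:
        subprocess.run(
            [sys.executable, hook],
            input=event_json,  # cached copy, not the live pipe
            text=True,
            check=True,        # a failing hook halts the chain
        )
```

Sequential execution is what makes writes to shared JSON state safe again: at most one hook touches the file at a time, and ordering is explicit in the hook list instead of emergent from scheduler timing.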

The "multi-agent is worse than serial" take is true when agents share context. Stops being true when you give planning agents their own session (broad context, lots of file reads) and implementation agents their own (narrow task, full window). I didn't plan that separation. It just turned out that mixing both in one session made both worse.

No framework, no runtime. Just files. You can use one hook or eighty-four.


The discussion is focused on blame but the real question is architectural: why was there no gate between the agent and the publish button?

Commands have blast radius. Writing a local file is reversible and invisible. git push reaches collaborators. Publishing to Twitter reaches the internet. These are fundamentally different operations but to an autonomous agent they're all just tool calls that succeed.

I ran into the same thing: an agent publishing fabricated claims across multiple platforms because it had MCP access and nothing distinguishing "write analysis to file" from "post analysis to Twitter." The fix was simple: classify commands as local, shared, or external. Auto-approve local. Warn on shared. Defer external to human review. A regex pattern list against the output catches the external tier. It's not sophisticated, but it doesn't need to be. The classification is mechanical (does this command reach the internet?), not semantic (is this content accurate?). Semantic verification is what the agent already failed at.
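The classifier really is just a tiered pattern list. A minimal sketch (the specific patterns are illustrative; the real list lives in a config file so it can be diffed in review):

```python
import re

# Hypothetical pattern tiers, checked most-dangerous first.
EXTERNAL = re.compile(r"\b(curl|wget|gh\s+release|npm\s+publish)\b")
SHARED   = re.compile(r"\bgit\s+push\b")

def classify(command: str) -> str:
    """Mechanical blast-radius check: does this command reach
    the internet (external), collaborators (shared), or only
    local disk (local)?"""
    if EXTERNAL.search(command):
        return "external"  # defer to human review
    if SHARED.search(command):
        return "shared"    # warn
    return "local"         # auto-approve

print(classify("git push origin main"))       # → shared
print(classify("curl -X POST https://api.twitter.com/2/tweets"))  # → external
print(classify("echo analysis > notes.md"))   # → local
```

False positives here are cheap (a human glances at a deferred command); false negatives are the expensive direction, which is an argument for keeping the external tier broad.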

Prompt constraints ("don't publish") reduce probability. Post-execution scanning catches what slips through. Neither alone is sufficient. Both together with a deferred action queue at the end of the run covers it.


Nice read. Pay what feels right.

