
mksglu · 2026-02-28T21:17:47 1772313467

Thanks, really appreciate hearing that! Glad it's working well for your team.

tomhow · 2026-03-01T01:09:55 1772327395

HN Mod here. Is the date on the post an error? It says Feb 2025, but the project seems new. I initially went to put a date reference in the HN title but then realised it's more likely a mistake in your post.

doctorpangloss · 2026-03-01T18:38:48 1772390328

His post, code, and all the replies here are LLM-authored and don't make any sense. He has no idea why his Claude Code instance wrote Feb 2025 instead of Feb 2026. I mean, all his results are placebos or nonsense. I can also start new conversations with only 2% of the context in them, or call compact, and it will all work better. The post has to be flagged.

mksglu · 2026-02-28T21:17:29 1772313449

Yeah it's basically pre-compaction, you're right. The key difference is nothing gets thrown away. The full output sits in a searchable FTS5 index, so if the model realizes it needs some detail it missed in the summary, it can search for it. It's less "decide what's relevant upfront" and more "give me the summary now, let me come back for specifics later."
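A minimal sketch of the "summary now, specifics later" pattern described above. It assumes Python's built-in sqlite3 module was compiled with FTS5 (most builds are). The chunk size, the head-of-output summary, and the function names are hypothetical illustrations, not context-mode's actual code or API.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    # In-memory DB for illustration; a real tool would presumably persist to disk.
    conn = sqlite3.connect(":memory:")
    # FTS5 virtual table: one row per chunk of raw tool output.
    conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(source, body)")
    return conn


def index_output(conn: sqlite3.Connection, source: str, raw: str,
                 chunk_lines: int = 20, summary_lines: int = 3) -> str:
    """Store the full output in FTS5 and return only a short summary for the context."""
    lines = raw.splitlines()
    for i in range(0, len(lines), chunk_lines):
        conn.execute(
            "INSERT INTO chunks (source, body) VALUES (?, ?)",
            (source, "\n".join(lines[i:i + chunk_lines])),
        )
    conn.commit()
    # Hypothetical summary: a line count plus the first few lines.
    head = "\n".join(lines[:summary_lines])
    return f"[{source}: {len(lines)} lines indexed]\n{head}"


def search(conn: sqlite3.Connection, query: str, limit: int = 3) -> list[str]:
    """Retrieve only the chunks that match, best BM25 match first."""
    rows = conn.execute(
        "SELECT body FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]


conn = make_store()
log = "\n".join(f"commit {i:04d} routine change" for i in range(500))
log += "\ncommit 0500 fix race in session cache"
summary = index_output(conn, "git log", log)  # only this goes into context
print(summary)
print(search(conn, "race"))  # ['commit 0500 fix race in session cache']
```

The model sees a four-line summary instead of 501 lines, and the one detail it might need later is still one query away.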

mksglu · 2026-02-28T21:17:04 1772313424

That's the theory, and it does hold up in practice. When context is 70% raw logs and snapshots, the model starts losing track of the actual task. We haven't run formal benchmarks on answer quality yet; we've mostly focused on measuring token savings. But anecdotally the biggest win is sessions lasting longer before compaction kicks in, which means the model keeps its full conversation history and makes fewer mistakes from lost context.

overfeed · 2026-03-01T05:03:38 1772341418

> When context is 70% raw logs and snapshots, the model starts losing track of the actual task

Which frontier model will (re)introduce the radical idea of separating data from executable instructions?

mksglu · 2026-02-28T21:16:33 1772313393

That's a fair point and honestly the ideal approach. But in practice most people don't hand-curate their MCP server list per task. They install 5-6 servers and suddenly have 80 tools loaded by default. Context-mode doesn't solve the tool definition bloat; that's the input-side problem. It handles the output side: what happens when those tools actually run and dump data back. Even with a focused set of tools, a single Playwright snapshot or git log can burn 50k tokens. That's what gets sandboxed.
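The output-side problem can be pictured as a size gate on tool results. This sketch assumes the rough "about 4 characters per token" heuristic and an arbitrary 2k-token cutoff; neither is taken from context-mode, and the function names are hypothetical.

```python
# Rough heuristic (assumption): about 4 characters per token for English text and code.
CHARS_PER_TOKEN = 4
# Hypothetical cutoff above which a tool result is kept out of the context window.
SANDBOX_THRESHOLD_TOKENS = 2_000


def estimate_tokens(text: str) -> int:
    """Ceiling-divide the character count by the chars-per-token heuristic."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def route_output(text: str) -> str:
    """Decide whether a tool result goes into context verbatim or gets sandboxed."""
    return "sandbox" if estimate_tokens(text) > SANDBOX_THRESHOLD_TOKENS else "inline"


snapshot = "x" * 200_000  # stand-in for a large Playwright snapshot
print(estimate_tokens(snapshot))  # 50000
print(route_output(snapshot))     # sandbox
print(route_output("ok"))         # inline
```

A real tokenizer would give different counts; the point is only that the decision happens after the tool runs, on what it returns, which is independent of how many tool definitions are loaded.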

mksglu · 2026-02-28T21:14:34 1772313274

It doesn't break the cache. The raw data never enters the conversation history, so there's nothing to invalidate. A short summary goes into context instead of the full payload, and the model can search the full data from a local FTS5 index if it needs specifics later. Cache stays intact because you're just appending smaller messages to the conversation.

mksglu · 2026-02-28T21:14:06 1772313246

Nice approach. Same core idea as context-mode but specialized for your build domain. You're using SQLite as a structured knowledge cache over YAML rule files with keyword lookup. Context-mode does something similar but domain-agnostic, using FTS5 with BM25 ranking so any tool output becomes searchable without needing predefined schemas. Cool to see the pattern emerge independently from a completely different use case.
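For reference, the BM25 ranking mentioned here is the standard Okapi BM25 formula. Here f(q, D) is how often term q occurs in row (chunk) D, |D| is the row's length in tokens, avgdl is the average row length, N is the number of rows, and n(q) is the number of rows containing q:

```latex
\mathrm{score}(D, Q) = \sum_{q \in Q} \mathrm{IDF}(q)\,
  \frac{f(q, D)\,(k_1 + 1)}{f(q, D) + k_1\left(1 - b + b\,\dfrac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q) = \ln\frac{N - n(q) + 0.5}{n(q) + 0.5}
```

SQLite's FTS5 documentation gives k_1 = 1.2 and b = 0.75 as fixed constants and says bm25() returns this score multiplied by -1, so an ascending ORDER BY bm25(table) puts the best matches first. Because the statistics come from the indexed text itself, no predefined schema or keyword list is needed.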

mksglu · 2026-02-28T21:12:54 1772313174

That's true, Claude Code does truncate large outputs now. But 25k tokens is still a lot, especially when you're running multiple tools back to back. Three or four Playwright snapshots or a batch of GitHub issues and you've burned 100k tokens on raw data you only needed a few lines from. Context-mode typically brings that down to 1-2k per call while keeping the full output searchable if you need it later.
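Working through the figures in this comment, taking four calls at the 25k-token truncation ceiling as the worst case and the stated 1-2k range per call after summarization (illustrative arithmetic, not a measurement):

```latex
\begin{aligned}
\text{raw:}        \quad & 4 \times 25{,}000 = 100{,}000 \text{ tokens} \\
\text{summarized:} \quad & 4 \times (1{,}000 \text{ to } 2{,}000) = 4{,}000 \text{ to } 8{,}000 \text{ tokens} \\
\text{reduction:}  \quad & 1 - \tfrac{8{,}000}{100{,}000} = 92\% \;\text{ up to }\; 1 - \tfrac{4{,}000}{100{,}000} = 96\%
\end{aligned}
```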

mksglu · 2026-02-28T21:12:27 1772313147

Haven't looked at rtk closely but from the description it sounds like it works at the CLI output level, trimming stdout before it reaches the model. Context-mode goes a bit further since it also indexes the full output into a searchable FTS5 database, so the model can query specific parts later instead of just losing them. It's less about trimming and more about replacing a raw dump with a summary plus on-demand retrieval.

giancarlostoro · 2026-02-28T21:58:58 1772315938

Yeah, I like this approach too. I made a tool similar to Beads, and after learning about RTK I updated mine to produce less token-hungry output. I'm still working on it.

https://github.com/Giancarlos/guardrails

esperent · 2026-02-28T22:18:23 1772317103

Does context-mode only work with MCPs? Or does it work with bash/git/npm commands as well?

re5i5tor · 2026-03-01T01:00:30 1772326830

I'm not sure it actually works with MCPs *at all*, trying to get that clarified. How can context-mode get "into the MCP loop"?

re5i5tor · 2026-03-01T01:26:40 1772328400

See my comment above, context-mode has no way to inject itself into the MCP tool-call - response loop.

Still high-value, outside MCPs.

mksglu · 2026-02-28T21:10:27 1772313027

The people who spent years doing the work manually are the ones who immediately see where the bottlenecks are.

mksglu · 2026-02-28T21:09:36 1772312976

Good point on prompt cache invalidation. Context-mode sidesteps this by never letting the bloat in to begin with, rather than snipping it out after. Tool output runs in a sandbox, a short summary enters context, and the raw data sits in a local search index. No cache busting because the big payload never hits the conversation history in the first place.
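The cache argument can be illustrated with a toy model that treats a prompt cache as reusable for the longest unchanged prefix of the message list. That is a simplification of how provider prompt caching works, and the message strings and helper name are hypothetical.

```python
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count leading messages two histories share (what a prefix cache could reuse)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


RAW = "<50k tokens of Playwright snapshot>"
SUMMARY = "snapshot indexed: 3 forms, 12 links"  # hypothetical summary text

# Strategy A: let the raw payload in, then snip it out on a later turn.
turn1_a = ["system", "user: test login", RAW]
turn2_a = ["system", "user: test login", "[snipped]", "user: now check logout"]

# Strategy B: only the summary ever enters history; later turns just append.
turn1_b = ["system", "user: test login", SUMMARY]
turn2_b = turn1_b + ["user: now check logout"]

print(shared_prefix_len(turn1_a, turn2_a))  # 2: the cache breaks at the snipped message
print(shared_prefix_len(turn1_b, turn2_b))  # 3: the whole previous turn is reusable
```

Editing an earlier message changes everything after it, while an append-only history keeps every earlier message byte-for-byte identical.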