We’ve been working on a tool called CodeLedger to solve a problem we kept seeing with AI coding agents (Claude Code, Cursor, Codex):
They’re powerful, but on real codebases they:
- read too much irrelevant code
- edit outside the intended scope
- get stuck in loops (fix → test → fail)
- drift away from the task
- introduce architectural issues that linters don’t catch
The root issue isn’t the model — it’s:
- poor context selection
- lack of execution guardrails
- no visibility at team/org level
---
What CodeLedger does:
It sits between the developer and the agent and does three things:
1) Gives the agent the right files first
2) Keeps the agent inside the task scope
3) Validates output against architecture + constraints
It works deterministically (no embeddings, no cloud, fully local).
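To make "deterministic" concrete, here's a rough, simplified sketch of the kind of ranking involved (illustrative only, not the actual CodeLedger code; the file paths, weights, and helper names are made up): score candidate files against the task description with plain lexical overlap and keep the top N.

```ts
// Illustrative sketch of deterministic context selection: no embeddings,
// no network calls. Rank candidate files by overlap between task keywords
// and the identifiers/paths in each file, then keep the top N.
import { readFileSync } from "node:fs";

function tokenize(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((t) => t.length > 2)
  );
}

export function rankFiles(task: string, candidates: string[], keep = 20): string[] {
  const taskTokens = tokenize(task);
  const scored = candidates.map((path) => {
    const fileTokens = tokenize(path + " " + readFileSync(path, "utf8"));
    let overlap = 0;
    for (const t of taskTokens) if (fileTokens.has(t)) overlap++;
    return { path, score: overlap };
  });
  // Deterministic: sort by score, break ties by path so the same input
  // always yields the same file set.
  scored.sort((a, b) => b.score - a.score || a.path.localeCompare(b.path));
  return scored.slice(0, keep).map((s) => s.path);
}

// e.g. rankFiles("Fix null handling in user service", allSourceFiles, 20)
```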
---
Example:
Instead of an agent scanning 100–500 files, CodeLedger narrows it down to ~10–25 relevant files before the first edit.
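And to make the guardrail piece concrete, a similarly simplified sketch (again illustrative, not the real implementation; the `ProposedEdit` shape is an assumption): treat the selected files as an allow-list and flag any proposed edit that falls outside it.

```ts
// Illustrative sketch of a scope guardrail: given the files selected for the
// task, separate proposed edits into in-scope and out-of-scope.
import * as path from "node:path";

interface ProposedEdit {
  file: string;        // path the agent wants to modify
  description: string; // what the edit claims to do
}

export function checkScope(
  allowedFiles: string[],
  edits: ProposedEdit[]
): { ok: ProposedEdit[]; outOfScope: ProposedEdit[] } {
  const allowed = new Set(allowedFiles.map((f) => path.normalize(f)));
  const ok: ProposedEdit[] = [];
  const outOfScope: ProposedEdit[] = [];
  for (const edit of edits) {
    (allowed.has(path.normalize(edit.file)) ? ok : outOfScope).push(edit);
  }
  return { ok, outOfScope };
}

// Out-of-scope edits can then be rejected or surfaced to the developer
// before they land, instead of after the fix → test → fail loop.
```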
---
What we’re seeing so far:
- ~40% faster task completion
- ~50% fewer iterations
- significant reduction in token usage
---
Works with:
Claude Code, Cursor, Codex, Gemini CLI
---
Repo + setup:
https://github.com/codeledgerECF/codeledger
Quick start:
npm install -g @codeledger/cli
cd your-project
codeledger init
codeledger activate --task "Fix null handling in user service"
---
Would love feedback from folks using AI coding tools on larger codebases.
Especially curious:
- where agents break down for you today
- whether context selection or guardrails are the bigger issue
- what other issues you're seeing