CLI is great when you know what command to run. MCP is great when the agent decides what to run - it discovers tools without you scripting the interaction.
The real problem isn't MCP vs CLI, it's that MCP originally loaded every tool definition into context upfront. A typical multi-server setup (GitHub, Slack, Sentry, Grafana, Splunk) consumes ~55K tokens in definitions before Claude does any work. Tool selection accuracy also degrades past 30-50 tools.
Anthropic's Tool Search fixes this with per-tool lazy loading - tools are defined with defer_loading: true, Claude only sees a search index, and full schemas load on demand for the 3-5 tools actually needed. 85% token reduction. The original "everything upfront" design was wrong, but the protocol is catching up.
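As a concrete sketch of the deferred-loading shape described above: the field names below follow Anthropic's Tool Search announcement as I understand it (the `tool_search_tool_regex` type string and `defer_loading` flag are assumptions to verify against the current API docs), and the GitHub tool is purely illustrative.

```python
# Sketch of a request where most tool definitions stay out of context.
# Only the search tool and the name/description index load upfront;
# full schemas for deferred tools load on demand when Claude selects them.
request = {
    "model": "claude-sonnet-4-5",
    "tools": [
        # The search tool itself is always in context.
        {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
        # A deferred tool: its full input_schema is withheld until needed.
        {
            "name": "github_create_issue",
            "description": "Create an issue in a GitHub repository",
            "input_schema": {
                "type": "object",
                "properties": {"title": {"type": "string"}},
            },
            "defer_loading": True,
        },
    ],
    "messages": [{"role": "user", "content": "File a bug about the flaky login test"}],
}

deferred = [t for t in request["tools"] if t.get("defer_loading")]
print(len(deferred))  # count of definitions kept out of the upfront context
```

With dozens of tools across several servers, everything except the search tool would carry `defer_loading: True`, which is where the bulk of the token savings comes from.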
Most replies here are about writing code faster. But there's a gap nobody's talking about: AI agents are completely blind to running systems.
When you hit a runtime bug, the agent's only tool is "let me add a print statement and restart". That works for simple cases but it's the exact same log-and-restart loop we fall back to in cloud and containerized environments, just with faster typing.
Where it breaks down: timing-sensitive code, Docker services, anything where restarting changes the conditions you need to reproduce.
I've had debugging sessions where the agent burned through 10+ restart cycles on a bug that would've been obvious if it could just watch the live values.
We've given agents the ability to read and write code. We haven't given them the ability to observe running code. That's a pretty big gap.
I've used agents to look at traces, stack dumps, and have used them to control things like debuggers. I've had them exec into running containers and poke around. I've had them examine metrics, look into existing logs, look at pcaps, and more. Any kind of command I could type into a console they can do, and they can reason about the outputs of such a command.
In fact, last night I had it hacking away at a WordPress template. It was making changes and then automatically checking screenshots from a browser window to confirm its changes worked as planned.
That's close to what I'm thinking about. Curious what debugger setup you're using with agents - are you giving them access via MCP or just having them run CLI commands?
I just have it run CLI commands directly, usually with its own limited credentials and with me reviewing what it's going to call outside of a small list of whitelisted commands. It'll then often do a good job composing things, filtering with jq and other tools.
For things I have them do often, I've written out some basic skills with example usages of those tools. I've also told it in its AGENTS.md to review man pages and run --help if it isn't confident in how to use a tool.
In a way, imagine you need to teach a halfway technically competent person how to use your desired tool. Write a short, concise document about how to use the command. Include the common flags and options you might want it to use, and give it some example output. If you see it making the same mistakes over and over, update the skill. Once you've got that skill ironed out, it can be very good at using the tool and understanding its outputs. You can even ask the agent for assistance in writing the skill, and suggest it update the skill when it has trouble doing things you've asked it to do.
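To make that concrete, here's a sketch of what such a skill file could look like. The tool choice (jq) comes from earlier in the thread; the exact commands and layout are illustrative, not a prescribed format:

```markdown
# Skill: jq

Use jq to filter and reshape JSON from CLI output before reasoning about it.

## Common invocations
- `jq -r '.items[].metadata.name' pods.json` — extract a list of names
- `jq '.[] | select(.status == "error")' events.json` — keep only failed events

## Notes
- Prefer `-r` for raw string output when piping into other tools.
- If unsure of a flag, check `jq --help` or the man page before guessing.
```

The "Notes" section is where the mistakes-you've-seen-it-make corrections accumulate over time.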
One other thing I do: for agents I'm using to debug things, I'll tell them in their AGENTS.md that they're only around for fact finding and investigation, and that they should not modify environments or do things that change state. They can make recommendations and ask me to choose to make changes, but should never attempt any calls which may mutate the environment. Obviously, just asking doesn't guarantee it will never do it, but so far I haven't had it actually attempt things outside of what I've asked. But I'm also very picky about letting it reach out to things I don't completely control, as context poisoning is a good way to get burned.
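A sketch of what that read-only instruction might look like as an AGENTS.md section — the wording here is mine, not a quote from the parent comment:

```markdown
## Debugging agent: read-only mode

You are here for fact finding and investigation only.

- Do NOT modify environments or run commands that change state
  (writes, deletes, restarts, config changes, deployments).
- You MAY read logs, metrics, and traces, and run read-only queries.
- When a fix requires a change, describe it and ask me to make it.
```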
And when it's hopping in to try and diagnose an issue, give it context about what you know about the environment. Give it some documentation. If you've got a coworker telling you about what they're seeing, feed that in as well. Imagine if you had someone just telling you "the system is down, fix it!" versus "when I go to this page on this site, it takes too long to load and often ends up giving me a 503 error". Which would you be more successful at rapidly finding a solution for?
Timestamps aren't the issue. The problem is the cycle itself: stop the process, add the log line, restart, wait for the right conditions to hit that code path again. For anything timing-sensitive or dependent on external state, each restart changes what you're trying to observe.