Ajay4030's comments

Ajay4030 · 2026-05-19T21:05:00 1779224700

Most observability stacks instrument infrastructure execution well. They miss reasoning behavior — when an agent re-plans, switches tools, and eventually escalates, every OTel span looks healthy even as the agent silently degrades. I've been building the agentsre library to address this. This week's addition: reasoning_trace.py — a structured decision log emitter and CloudWatch metric publisher that adds Layer 3 (reasoning) observability on top of your existing OTel Layer 1 stack. The core metric is RTD (Reasoning Trace Depth) — how many re-planning cycles an agent goes through before completing or escalating. Baseline for healthy agents is 0–1. Above 3 is a warning. Above 5 is critical. RTD rises before HER (Human Escalation Rate), before user-visible latency, and before cost anomalies appear in billing. It's your earliest signal for silent agent degradation. GitHub: https://github.com/Ajay150313/agentsre MIT licensed. Zero external dependencies for core logic. boto3 optional for CloudWatch integration. One structured JSON entry per agent task — not per tool call. Curious whether others have built reasoning level telemetry, or whether everyone is still relying on infrastructure spans to catch agent issues.