"Making execution paths inspectable reduced failures far more than swapping models."
100% this. We found the same thing. The issue isn't that the model is dumb; it's that the state is opaque. Once we stopped treating agents like "chatbots" and started treating them like "state machines" (where you can see the exact input/output of every step), our debugging time dropped by 90% even with cheaper models.
That's actually the core thesis of Lár [0]: every step is forced to produce a structured diff, so when something crashes you have a complete "Flight Recorder" of what happened.
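To make "structured diff per step" concrete, here is a minimal sketch in plain Python. It is not Lár's actual API: `diff_state` and the example state keys are hypothetical names, and it assumes state is a flat dict of comparable values.

```python
from typing import Any


def diff_state(before: dict[str, Any], after: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Return what a step added, removed, and changed between two state snapshots.

    Hypothetical helper for illustration; assumes flat dicts with comparable values.
    """
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: {"old": before[k], "new": after[k]}
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Snapshots taken immediately before and after one (made-up) agent step.
before = {"query": "refund status", "attempts": 0}
after = {"query": "refund status", "attempts": 1, "intent": "billing"}
step_diff = diff_state(before, after)
```

Storing one of these per step means a failed run can be read as a sequence of small changes instead of one opaque transcript.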
I've spent the last few months building Lár (Irish for "core"). It's a Python framework for building AI agents, built around the philosophy of "Glass Box" engineering rather than magical "Black Boxes".
The Problem:
Most agent frameworks today (LangChain, AutoGen) feel like magic. They hide the prompt chains, the state transitions, and the retry logic. When they break in production, debugging is a nightmare.
The Solution:
Lár is designed to be the "PyTorch for Agents". It uses a "Define-by-Run" architecture where:
1. Agents are just directed graphs (Nodes + Edges).
2. Every state transition is immutable and logged.
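3. The engine produces a JSON "Flight Log" that records every step, so a run can be audited end to end (useful for audit-trail requirements such as 21 CFR Part 11 in FDA-regulated healthcare work, and for comparable record-keeping rules in finance).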
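The three points above can be sketched in a few dozen lines of plain Python. This is an illustration of the idea only, not Lár's real engine or API: `run_graph`, the node functions, and the log field names are all assumptions. Each node returns its updates plus the name of the next node, so the edges are resolved while the graph runs.

```python
import json
from types import MappingProxyType
from typing import Callable, Mapping, Optional

# Hypothetical node signature: read-only state in, (updates, next node name or None) out.
Node = Callable[[Mapping[str, object]], tuple[dict[str, object], Optional[str]]]


def run_graph(
    nodes: dict[str, Node],
    start: str,
    state: dict[str, object],
    max_steps: int = 20,
) -> tuple[dict[str, object], list[dict]]:
    """Walk the graph from `start`, logging every transition."""
    flight_log: list[dict] = []
    current: Optional[str] = start
    step = 0
    while current is not None and step < max_steps:
        before = dict(state)
        # Nodes get a read-only view, so they cannot mutate state in place.
        updates, next_node = nodes[current](MappingProxyType(before))
        state = {**before, **updates}  # build a new snapshot for every step
        flight_log.append(
            {"step": step, "node": current, "state_before": before,
             "updates": updates, "next": next_node}
        )
        current = next_node
        step += 1
    return state, flight_log


def classify(state: Mapping[str, object]):
    # Made-up example node: decide an intent, then hand off to "respond".
    intent = "billing" if "refund" in str(state["query"]) else "general"
    return {"intent": intent}, "respond"


def respond(state: Mapping[str, object]):
    # Made-up terminal node: returning None as the next node ends the run.
    return {"reply": f"Routing to the {state['intent']} queue"}, None


final_state, log = run_graph(
    {"classify": classify, "respond": respond}, "classify", {"query": "refund status"}
)
flight_log_json = json.dumps(log, indent=2)
```

Because the log holds only plain dicts, it serializes straight to JSON and can be diffed or replayed later.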
Features:
- IDE Friendly: Clone, `pip install`, and run. Build an agent in minutes.
- Zero Friction Models: Switch from Cloud to Local in 1 line. Just change `"gpt-4"` to `"ollama/phi4"`. No code refactoring.
- Hybrid Architecture: We proved that using code for logic (instead of LLMs) makes Lár 60x cheaper and significantly faster than standard "Chain" frameworks.
- Enterprise Patterns: Includes 18 Core Patterns out of the box (e.g., A/B Testing, Resumable Graphs, Security Firewalls).
- Just-in-Time Integrations: Don't wait for API wrappers. Drag our "Integration Builder" prompt into your IDE and get a type-safe tool in 30 seconds.
- Air-Gap Capable: No telemetry, no hidden clouds. Run entirely offline.
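As a rough illustration of the "code for logic" point in the Hybrid Architecture bullet, here is a sketch where ordinary Python handles the clear-cut cases and a model is called only as a fallback. Everything in it is hypothetical (`route_ticket`, the keyword rules, and the stubbed `fake_llm`); it is not Lár's API and it does not measure or reproduce the cost figure above.

```python
from typing import Callable


def route_ticket(text: str, llm: Callable[[str], str]) -> str:
    """Pick a queue using plain code first; call the model only as a fallback."""
    lowered = text.lower()
    if "refund" in lowered or "invoice" in lowered:
        return "billing"
    if "password" in lowered or "login" in lowered:
        return "account"
    # Only tickets the rules cannot place cost a model call.
    return llm(f"Classify this support ticket as billing, account, or general: {text}")


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model client, so the sketch runs offline.
    calls.append(prompt)
    return "general"


tickets = ["Where is my refund?", "I forgot my password", "Your app is great"]
routes = [route_ticket(t, fake_llm) for t in tickets]
```

In this toy run only the third ticket reaches the model; the other two are decided by code that is free to run and trivial to debug.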
It’s open source (Apache 2.0). I’d love to hear what you think about the "Audit-first" approach vs the current "Chat-first" trend.
[0] https://github.com/snath-ai/lar