I've been building AI agents for a while. After trying every framework out there and talking to many founders building with AI, I've noticed something interesting: most "AI Agents" that make it to production aren't actually that agentic. The best ones are mostly just well-engineered software with LLMs sprinkled in at key points.
So I set out to document what I've learned about building production-grade AI systems: https://github.com/humanlayer/12-factor-agents. It's a set of principles for building LLM-powered software that's reliable enough to put in the hands of production customers.
In the spirit of Heroku's 12 Factor Apps (https://12factor.net/), these principles focus on the engineering practices that make LLM applications more reliable, scalable, and maintainable. Even as models get exponentially more powerful, these core techniques will remain valuable.
I've seen many SaaS builders try to pivot towards AI by building greenfield new projects on agent frameworks, only to find that they couldn't get things past the 70-80% reliability bar with out-of-the-box tools. The ones that did succeed tended to take small, modular concepts from agent building, and incorporate them into their existing product, rather than starting from scratch.
The full guide goes into detail on each principle with examples and patterns to follow. I've seen these practices work well in production systems handling real user traffic.
I'm sharing this as a starting point—the field is moving quickly so these principles will evolve. I welcome your feedback and contributions to help figure out what "production grade" means for AI systems!
Like you, biggest one I didn't include but would now is to own the lowest level planning loop. It's fine to have some dynamic planning, but you should own an OODA loop (observe, orient, decide, act) and have heuristics for determining if you're converging on a solution (e.g. scoring), or else breaking out (e.g. max loops).
I would also potentially bake in a workflow engine. Then, have your model build a workflow specification that runs on that engine (where workflow steps may call back to the model) instead of trying to keep an implicit workflow valid/progressing through multiple turns in the model.
[0]: https://mg.dev/lessons-learned-building-ai-agents/