Why did early multi-agent autonomous systems like AutoGen initially fail to live up to the hype?
Multi-agent systems mapped well to our human understanding of how to organize work amongst ourselves (e.g., delegation, role responsibility, worker-to-worker communication).
But despite amassing a ton of developer interest and producing some eye-opening demos, enterprises struggled to deploy these implementations to production and opted for simpler single-agent approaches instead.
So why did they fail, and what can we learn from the experience?
Watch Eric Zhu and me do a retrospective on AutoGen from our time at Microsoft and Microsoft Research in this clip from the latest episode of the Humans of AI Podcast.
https://youtu.be/2cnxea3xkzM
In this episode of Humans of AI, I sit down with my friend Eric Zhu to unpack the frameworks and architectures powering modern AI products today.
From our time building Semantic Kernel, AutoGen, and GraphRAG at Microsoft, we offer a retrospective on the evolution of this space and where we think it'll go in 2026 and beyond.
We dive into how AI systems are actually being built today, why early agent frameworks were brittle, and what’s changing as models, tooling, and abstractions improve. Eric shares insights from hands-on experience building AI systems, including lessons around memory, orchestration, open source, and why building has become dramatically faster than even a few years ago.
We also talk about the human side of all this—how builders should think about careers in AI, avoiding burnout, and navigating a fast-moving field without getting overwhelmed.
Whether you’re an AI engineer, founder, researcher, or just curious about where things are headed, this conversation goes deep into the why behind modern AI systems.
Like a lot of people over the past week, I asked myself whether the new Bilt card is still worth it. After an embarrassingly long time searching, reading X/Reddit, and watching TikToks on the subject, I walked away more confused than informed.
What used to be a no-brainer credit card with Bilt 1.0 suddenly became a big-brained endeavor with Bilt 2.0.
Before: “Pay rent, get points.”
Now: “Calculate your spend-to-rent ratio, use Bilt Cash to offset your rent/mortgage fees (huh?), choose multiplier tiers, etc...”
After using this calculator, I'm even more confused. For what I spend in a month, there is zero upside to using it for rent: if the calculator is correct, every $100 of the first $800 I spend on rent reduces the net value by $10. I thought this card was supposed to reward paying rent, not penalize it!
In fact, with the $495/year Palladium card, all you need to break even is $50/mo on travel. If this calculator is correct, I find it hard to believe that would pencil out for Column N.A. considering I'd be getting $600/year in annual credits. Assuming there's no monthly spend requirement past the initial $4k in 3 months and I can just get the rewards with one month of purchasing thereafter, all I need to do is buy 6 month-long transit passes and I'm already up $155 for the year (ignoring the $1400 welcome bonus since I'd need to spend $4k to get it).
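Sanity-checking that with back-of-the-envelope math: the $495 fee and the $600/year in credits are the figures above, and I'm reading those credits as $50/mo that redeem at face value. That's my simplification, not Bilt's published terms, and the $155 figure likely folds in details (pass prices, how credits post) this sketch doesn't model.

```python
# Back-of-the-envelope break-even math for the Palladium card.
# The $495 fee and $600/yr in credits come from the post; treating the
# credits as $50/mo redeemed at face value is my own simplification.
ANNUAL_FEE = 495       # $/year
MONTHLY_CREDIT = 50    # $600/yr in credits read as $50/mo

def net_annual_value(months_redeemed: int) -> int:
    """Credits actually redeemed minus the annual fee (points ignored)."""
    return MONTHLY_CREDIT * months_redeemed - ANNUAL_FEE

print(net_annual_value(12))  # 105: credits alone already beat the fee
print(net_annual_value(9))   # -45: miss ~3 months of credits and the fee wins
```

By this crude math, you need to redeem the credit in at least 10 of 12 months just to clear the fee; everything past that is upside.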
It seems like a great deal (half a year of free transit passes every year) but surely there's a gotcha somewhere in the fine print.
What can we learn from microservices when applied to AI agents?
Introducing MicroAgents!
Check out this Semantic Kernel blog post that my colleague Chris and I wrote exploring patterns for getting agents to reliably use a large number of tools.
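If the microservices analogy sounds abstract, here's a minimal sketch of the pattern as I read it; the names and toy router below are illustrative, not the API from the blog post. Rather than handing one agent dozens of tool schemas, each micro-agent owns a small, cohesive toolset and a router delegates to it by description, the way a gateway routes requests to services.

```python
from dataclasses import dataclass, field
from typing import Callable

# Illustrative sketch of the micro-agent idea: each agent owns a small,
# cohesive set of tools (like a microservice owning one domain), and a
# router picks an agent instead of exposing every tool to one model.
# All names here are hypothetical, not the blog post's actual API.

@dataclass
class MicroAgent:
    name: str
    description: str  # short summary the router uses to delegate
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)

    def handle(self, tool: str, **kwargs) -> str:
        return self.tools[tool](**kwargs)

billing = MicroAgent(
    name="billing",
    description="invoice, refund, payment status",
    tools={"refund": lambda order_id: f"refunded {order_id}"},
)
shipping = MicroAgent(
    name="shipping",
    description="track, delivery estimate, shipping label",
    tools={"track": lambda order_id: f"tracking {order_id}"},
)

def route(query: str, agents: list[MicroAgent]) -> MicroAgent:
    # Toy router: keyword overlap stands in for an LLM choosing an agent
    # from a short list of descriptions instead of dozens of tool schemas.
    return max(agents, key=lambda a: sum(kw in query for kw in a.description.split(", ")))

agent = route("where is the refund for my invoice?", [billing, shipping])
print(agent.name, "->", agent.handle("refund", order_id="A123"))
```

The payoff mirrors microservices: each agent sees a short, focused tool list it can use reliably, and you scale the total tool count by adding agents rather than bloating one prompt.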
Charles Packer is a 5th-year PhD student at UC Berkeley and the author of MemGPT, an open-source project teaching large language models to manage their own memory for unbounded context! Charles is also a core member of the Berkeley Artificial Intelligence Research (BAIR) Lab and the Real-Time Intelligent Secure Explainable Systems (RISE) Lab, where his work has spanned reinforcement learning and autonomous driving.
Aayush Mudgal is a Senior Machine Learning Engineer at Pinterest, currently leading the efforts around Privacy-Aware Conversion Modelling. Aayush has expertise in large-scale recommendation systems, personalization, and ads marketplaces, and in the past has conducted research on intelligent tutoring systems, developing data-driven feedback to aid students in learning computer programming.
A great episode with Adam Steinle, founder and CEO of hona.ai, a customizable, HIPAA-compliant AI that searches and converts thousands of health records into one simple patient overview!
Adam started out as a biomedical engineer, and after stints at Goldman Sachs and Meta, he returned to his roots to help medical systems communicate so that providers can learn about their patients more effectively.
Definitely give the episode a listen to hear Adam's story!
And if you want to share your story on the Humans of AI Podcast or want to partner in any way, please reach out!
An amazing conversation with Shishir Patil, the creator of Gorilla LLM, a large language model specifically trained to use APIs!
Shishir is currently a 5th-year PhD student at the University of California, Berkeley, whose work broadly covers ML systems, LLMs, edge ML, and Sky Computing.
Definitely give the episode a listen to hear Shishir's story.
And to read more about #GorillaLLM, check out the project page!