I am not sure what's the stereotype, but I tried using langchain and realised most of the functionality actually adds more code to use than simply writing my own direct API LLM calls.
Overall I felt like it solves a problem doesn't exist, and I've been happily sending direct API calls for years to LLMs without issues.
JSON Structured Output from OpenAI was released a year after the first LangChain release.
I think structured output with schema validation mostly replaces the need for complex prompt frameworks. I do look at the LC source from time to time because they do have good prompts backed into the framework.
IME you could get reliable JSON or other easily-parsable output formats out of OpenAI's going back at least to GPT3.5 or 4 in early 2023. I think that was a bit after LangChain's release but I don't recall hitting problems that I needed to add a layer around in order to do "agent"-y things ("dispatch this to this specialized other prompt-plus-chatgpt-api-call, get back structured data, dispatch it to a different specialized prompt-plus-chatgpt-api-call") before it was a buzzword.
It's still not true for any complicated extraction. I don't think I've ever shipped a successful solution to anything serious that relied on freeform schema say-and-pray with retries.
> so it's not a panacea you can count on in production.
OpenAI and Gemini models can handle ridiculously complicated and convoluted schemas, if I needed complicated JSON output I wouldn’t use anything that didn’t guarantee it.
I have pushed Gemini 2.5 Pro further than I thought possible when it comes to ridiculously over complicated (by necessity) structured output.
When my company organized an LLM hackathon last year, they pushed for LangChain.. but then instead of building on top of it I ended up creating a more lightweight abstraction for our use-cases.
I am coaching table-tennis, and sometimes I tell people that we only actually "learn" while we sleep. So, without sleeping, the brain doesn't have time to "save" the new information for future use.
Not sure if it's factually correct, but it seems about right, sleeping seems to be the magic sauce, and the time when all memories are written from RAM to disk.
Also, funny how they included GPT-5.0 and 5.1 but not 5.2... I'm pretty sure they ran the benchmarks for 5.0, then 5.1 came out, so they ran the benchmarks for 5.1... and then 5.2 came out and they threw their hands up in the air and said "fuck it".
Good point. It also reminded me of when I was trying to optimize my app for some scenarios, then I realized it's better to optimize it for ALL scenarios, so it works fast and the servers can handle no matter what. To be more specific, I decided NOT to cache any common queries, but instead make sure that all queries are fast as possible.
I've recently added error tracking to my self-hosted analytics app (UXWizz), and the way I did it is simply add extra events to each user/session. Once you have the concept of a session or user, you can simply attach errors or logs as Events stored for that user. This solves the main problem mentioned in the article, where you don't know what happened, plus being an Event stored in a MySQL database, you can still query it.
Why not simply use Events for logging, instead of plain strings?
reply