I started setting up my workflows using Temporal. It deploys as a relatively lightweight local app, and for an isolated local installation it uses SQLite. It makes dealing with API retries and organizing workflows and tasks really simple. I recommend giving it a try. It is, philosophically, exactly what this article is suggesting, but it adds an incredibly rich and flexible interface for agents to work with. Additionally, the web UI makes it very easy to inspect workflows, review agent execution, etc. Temporal also encodes much higher reliability into your system, almost for free. Distributed and reliable systems are hard; don't reinvent the wheel, IMO.
If you find yourself wanting things like an easy way to then introspect your SQLite database, figure out what is happening in the workflow, compose individual tasks, make workflows trivially callable, etc, give Temporal a look.
Alongside this, I have mostly moved away from files for agents. Markdown and JSON are great, but also feel like traps when building out smaller local apps. LLMs are great at SQLite and you can render anything you want out of it (Markdown, JSON, etc). It saves a lot of tokens when an agent can just query a specific row instead of having to fire up jq or grep through Markdown. You get a nice, portable, self-contained data management system that encourages agents to be more disciplined about how they structure their data than a bunch of files does. It also continues to scale into MySQL/Postgres: if your little local projects start to outgrow SQLite or become more formal, you already have a schema and discipline around data.
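To make the "query a specific row, render whatever you want" idea concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `tasks` table, its columns, and the helper names are hypothetical and only for illustration; they are not from any particular agent framework.

```python
import json
import sqlite3
from typing import Optional

# Hypothetical schema for illustration: one row per agent task.
conn = sqlite3.connect(":memory:")  # pass a file path for a persistent store
conn.row_factory = sqlite3.Row
conn.execute(
    """
    CREATE TABLE tasks (
        id     INTEGER PRIMARY KEY,
        title  TEXT NOT NULL,
        status TEXT NOT NULL CHECK (status IN ('todo', 'doing', 'done')),
        notes  TEXT NOT NULL DEFAULT ''
    )
    """
)
conn.executemany(
    "INSERT INTO tasks (id, title, status, notes) VALUES (?, ?, ?, ?)",
    [
        (1, "Survey orchestration tools", "done", "Picked one to trial"),
        (2, "Port retry logic", "doing", ""),
        (3, "Write behavioral tests", "todo", ""),
    ],
)


def get_task(task_id: int) -> Optional[dict]:
    """Fetch exactly one row instead of scanning a whole file."""
    row = conn.execute("SELECT * FROM tasks WHERE id = ?", (task_id,)).fetchone()
    return dict(row) if row is not None else None


def render_markdown() -> str:
    """Render the table as a Markdown checklist on demand."""
    lines = []
    for r in conn.execute("SELECT title, status FROM tasks ORDER BY id"):
        box = "x" if r["status"] == "done" else " "
        lines.append(f"- [{box}] {r['title']} ({r['status']})")
    return "\n".join(lines)


def render_json() -> str:
    """Render the same rows as JSON."""
    rows = conn.execute("SELECT * FROM tasks ORDER BY id").fetchall()
    return json.dumps([dict(r) for r in rows], indent=2)
```

The `CHECK` constraint is the "discipline" part: an agent that tries to write an unknown status gets an error instead of silently producing a malformed file.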
It sounds like you’re running this mostly on a single machine?
Temporal gets much more complex with scale. Cassandra isn’t fun to manage. Ringpop and TChannel are hard to debug when things go wrong. The SQL backend doesn’t support horizontally scaled replicas (just a single instance) due to consistency requirements. Depending on how your code is written, modifying code baked into workflows becomes complex, as anything that changes the ordering of history events breaks determinism in already-deployed workers.
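The determinism point can be illustrated with a toy model. This is not Temporal's implementation or API; `Recorder`, `Replayer`, and the workflow functions are hypothetical names for a sketch of the general idea: on replay, the steps the code asks for must match the recorded history in order.

```python
class NonDeterminismError(Exception):
    """Raised when replayed code diverges from the recorded history."""


class Recorder:
    """First execution: append each step to the history as it runs."""

    def __init__(self):
        self.history = []

    def run(self, step):
        self.history.append(step)


class Replayer:
    """Replay: every requested step must match the recorded one at that position."""

    def __init__(self, history):
        self.history = list(history)
        self.pos = 0

    def run(self, step):
        if self.pos < len(self.history):
            expected = self.history[self.pos]
            if step != expected:
                raise NonDeterminismError(
                    f"position {self.pos}: history has {expected!r}, code asked for {step!r}"
                )
        # Steps past the end of the history are new work and are allowed.
        self.pos += 1


def workflow_v1(ctx):
    ctx.run("fetch")
    ctx.run("summarize")


def workflow_v2_reordered(ctx):
    ctx.run("validate")  # inserted before existing steps: changes event ordering
    ctx.run("fetch")
    ctx.run("summarize")


def workflow_v2_appended(ctx):
    ctx.run("fetch")
    ctx.run("summarize")
    ctx.run("publish")  # added after existing steps: history still matches


def replay_ok(workflow, history):
    try:
        workflow(Replayer(history))
        return True
    except NonDeterminismError:
        return False


recorder = Recorder()
workflow_v1(recorder)  # recorder.history is now ["fetch", "summarize"]
```

In this model, replaying `workflow_v2_reordered` against the recorded history fails at position 0, while `workflow_v2_appended` replays cleanly. Real SDKs offer versioning or patching facilities for this kind of change; the sketch only shows why the ordering matters.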
We use it heavily. Everyone who started on it doing simple scripting/automation loves it; everyone who built real production systems on top of it hates it. Possibly operator error, but my experience hasn’t matched the rosy picture painted in these comments.
Word on HN is that you're either paying more money than you expected for Temporal's managed solution or taking on a substantial ops burden running their very heavy system yourself.
I wouldn't know, I've not done either, but I'd like to learn more from your or others' experience.
I told an agent to set it up for me for some local stuff. It is written in Go. It has a painless path to run on a local SQLite DB. My agents use it to organize and coordinate workflows. It handles retries and long horizon tasks fine. As far as I can tell for the core workflows and tasks pieces it’s great. MIT license. Like anything it isn’t free to manage but it offers a lot in return. High reliability systems are hard. Temporal only solves some of it. It is far better than rolling it yourself.
I think a genuine problem right now is that people are building agentic workflows and learning the hard way that highly reliable agentic workflows are hard. Agents are unreliable. They are not deterministic, and the backing APIs have pretty high error rates. Temporal has solved that pain for me and made it easy to diagnose problems.
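For a sense of what "handling retries" means when you roll it yourself, here is a minimal in-process sketch of retry with exponential backoff and jitter. It is not Temporal's API; `TransientError` and `call_with_retries` are hypothetical names, and unlike a workflow engine this does not persist attempt state, so a crashed process loses its progress.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits, 5xx)."""


def call_with_retries(fn, *, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call fn(), retrying on TransientError with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out retries so many callers do not retry in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

Usage would look like `call_with_retries(lambda: client.complete(prompt))`, where `client.complete` stands in for whatever flaky API call you are wrapping. The `sleep` parameter is injectable only so the function can be tested without waiting.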
I don’t have anything really large scale running. But big enough that it takes billions of tokens and high reliability to finish.
Could you expand on the "substantial ops burden"? Let's say you're using a managed Postgres instance as the underlying data store, how substantial is the ops burden in that case? I understand that temporal is actually a set of 4 or so microservices on top of a data store, but if you're already running a distributed system backed by k8s or something like that, it doesn't seem like it adds significant incremental ops on top of that. But I could be wrong.
My devops coworker just shrugs, pumps out some yaml and helm and away it goes.
It really depends on your experience and tolerance for a lot of things.
Usually the maintenance burden doesn't start to make itself known until you get off the happy path or something breaks. Sometimes it can be a long while before that happens, sometimes it happens right away.
I think it depends a lot on the operational maturity of the company. Some places are running the LGTM observability stack, Sentry for error reporting, 24/7 on-call rotations, playbooks for all alerts, etc. Those organizations will have fewer issues running systems like Temporal because the operational framework is already there.
Other orgs have never heard of alerts or error reporting and naturally will not catch issues until they are catastrophic (for example services that crash frequently in the background go unnoticed until the crash frequency causes a catastrophic failure). In my experience a lot of issues are pretty simple such as running out of memory, CPU throttling, crashes caused by simple bugs (nil panics). If you have good observability you can catch those issues early.
For example: people rag on Ceph because their cluster somehow got into a broken state, but that really only occurs when abuse of the Ceph cluster has gone on long enough that the cluster finally reaches the tipping point where it is unrecoverable. If you set Ceph up, follow the correct replication rules so components are spread across failure domains, and use the metrics and alerts that are distributed with Ceph, it is actually quite hard to break the cluster.
Very heavy indeed. People will confuse the durability that Temporal provides with all the other properties a distributed system needs. They will then think that Temporal will solve all their problems.
My favorite lens on SQLite is that it is actually two things:
1. A robust durability implementation
2. A library of high-performance data structures and algorithms
The fact that it's SQL is nice, but those two attributes are what make it great.
For example, I'm implementing an in-process event log that I want to be durable. I started simple, but soon saw some edge cases, and instead of playing whack-a-mole I just swapped to using SQLite as an ordered KV store that gives me ACID.
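A minimal sketch of that pattern, assuming a single `events` table keyed by a monotonically increasing sequence number (the table and function names are made up for illustration, not the commenter's code):

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) the log. Pass a file path to get on-disk durability."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload BLOB NOT NULL)"
    )
    conn.commit()
    return conn


def append_batch(conn, payloads):
    """Append all payloads atomically and return the last sequence number."""
    last = None
    with conn:  # commits on success, rolls back the whole batch on any exception
        for payload in payloads:
            cur = conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
            last = cur.lastrowid
    return last


def read_from(conn, after_seq=0, limit=100):
    """Ordered range read: everything after a given sequence number."""
    return conn.execute(
        "SELECT seq, payload FROM events WHERE seq > ? ORDER BY seq LIMIT ?",
        (after_seq, limit),
    ).fetchall()
```

The edge cases that tend to bite a hand-rolled log (partial writes, torn batches, ordering) are covered here by the transaction and the primary key. With a file-backed database, how strongly a commit is flushed to disk depends on settings such as `PRAGMA synchronous`.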
Another example: ingesting multiple interrelated datasets. Instead of a dozen hash maps in memory, I load them up into SQLite (no persistence) and then slice and dice as I need to.
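As a rough sketch of the "no persistence" variant: load the datasets into an in-memory database and let a join replace the hand-maintained hash maps. The `users` and `orders` tables and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # nothing is written to disk
conn.executescript(
    """
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    CREATE INDEX orders_by_user ON orders(user_id);
    """
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "ada"), (2, "grace"), (3, "linus")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5)],
)


def totals_by_user(conn):
    """One query instead of a dict of users plus a dict of orders-by-user."""
    return conn.execute(
        """
        SELECT u.name, COUNT(o.id), COALESCE(SUM(o.amount), 0)
        FROM users u
        LEFT JOIN orders o ON o.user_id = u.id
        GROUP BY u.id
        ORDER BY u.name
        """
    ).fetchall()
```

Each new question about the data becomes another query rather than another purpose-built map.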
> an example of a case where you'd use SQLite instead of jq or grep through Markdown?
Usually we end up writing a script to incrementally refresh a data-set I'm analyzing (or have someone send me a copy after they pull it).
I've been using sqlite for anything which needs an UPDATE - modifying a row deep inside the data-set with jsonl is a pain.
My github is full of java programs which update sqlite3 files with threadpools and a single big lock around the UPDATE (& then I write or have an agent write code to analyze it).
DuckDB is slowly replacing it in the context of python, simply because of the ease of pushing a UDF into the SQL.
Also because I really like expressing things as LEAD/LAG with a UDF on top.
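To show the "LEAD/LAG with a UDF on top" shape without extra dependencies, here is a sketch using Python's standard `sqlite3` module as a stand-in for DuckDB. It assumes the bundled SQLite is 3.25 or newer (required for window functions); the `latency` table and `pct_change` function are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def pct_change(prev, cur):
    """Python UDF: percentage change from prev to cur, or None for the first row."""
    if prev is None or prev == 0:
        return None
    return round((cur - prev) / prev * 100.0, 1)


# Register the Python function so SQL can call it as pct_change(a, b).
conn.create_function("pct_change", 2, pct_change)

conn.execute("CREATE TABLE latency (day TEXT PRIMARY KEY, p50_ms REAL)")
conn.executemany(
    "INSERT INTO latency VALUES (?, ?)",
    [("2024-01-01", 100.0), ("2024-01-02", 110.0), ("2024-01-03", 99.0)],
)

# LAG pulls the previous day's value into the current row; the UDF compares them.
rows = conn.execute(
    """
    SELECT day,
           p50_ms,
           pct_change(LAG(p50_ms) OVER (ORDER BY day), p50_ms) AS change_pct
    FROM latency
    ORDER BY day
    """
).fetchall()
```

The row-to-row comparison lives in SQL and the arithmetic lives in ordinary Python, which is the part that is awkward to express with jq or grep.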
The moment my JSON has any sort of depth and I need to write a parser for it and potentially account for unspecified behavior. JSON's nice when it's nice, but it's terrible when it's terrible. It's 100x easier to write SQL than writing jq and... dear god if I have to use grep -A or -B, I'm doing something wrong. Constraints are actually a good thing!
The underlying database isn't the most important thing. Just use SQL. Its namespacing (eg, through CTEs) is good and you're more likely to have colleagues who know SQL compared to jq.
SQLite is more efficient for large data sets. A single Markdown or JSON file needs to be streamed to locate a piece of data, which is O(n). Updating an existing entry in a sequential file is even worse because you have to rewrite the file. SQLite has the data structures to find data in O(log n) time when the lookup can use an index.
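One way to see the difference is to ask SQLite for its query plan before and after adding an index. This is a small sketch with an invented `notes` table; the exact wording of the plan text varies between SQLite versions, so the code only looks at it as a string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, slug TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO notes (slug, body) VALUES (?, ?)",
    ((f"note-{i}", "x" * 20) for i in range(10_000)),
)


def plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT body FROM notes WHERE slug = ?"

# Without an index on slug, SQLite has to scan every row: O(n).
before = plan(query, ("note-9000",))

# With an index, it can search a B-tree instead: O(log n).
conn.execute("CREATE INDEX notes_by_slug ON notes(slug)")
after = plan(query, ("note-9000",))
```

Printing `before` and `after` should show a table scan turning into an index search. Without the index, SQLite is doing the same linear scan a file reader would.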
I'm someone else who has inherited a bunch of ad-hoc orchestration systems and also used Temporal quite heavily. The latter does certainly come with some overhead (not so bad in the age of LLMs), but it also guides you along a well-trodden path of good practices. That guidance is very important: it means that when you want to take on more advanced capabilities, you probably haven't painted yourself into a corner too badly and can take them on fairly easily. Think: retries, multi-tenancy, multi-lang, observability, etc.
It does, my experience has been that it adds code complexity, deployment complexity, and performance problems. There are some observability benefits, but other ways to solve that. It's possible there are workloads that fit it but not anything I've personally worked on.
Well, just my experience. I installed it, had my agents configure it and it immediately solved problems I had with very little friction. Dealing with long running, long horizon agentic tasks that need very high reliability so I don’t have to babysit. I vibed the first version, realized I was reinventing reliable distributed systems. Stopped vibing and started surveying for something that fit :)
I can vouch for them too, being a super early adopter. One of the best early bets I've ever made. Awesome OSS product, glad the team decided to leave Uber to commercialize it.
AI has empowered people to build things much more quickly. It's not slop if you are even a little conscientious about how you use it. What it does not do is fix the human structural problems. If you are solving the wrong problems you aren’t doing anything useful. Just because you can now take an idea to near completion doesn’t mean it was worth doing, but now you have spent tokens and a lot of your mental bandwidth to finish it. Or worse, you let it become slop and it will fall apart if you even look at it funny.
Previously if I needed to automate something I thought really carefully about it. Now, I still think really carefully about it. I had fun AI coding some tools I always wanted but they were just pet projects for me. I had fun AI slop coding a couple of things, but it was not good software. But if you have a clear and valuable target? AI can absolutely get you there.
Multiply that across all your colleagues and a lot /seems/ to be happening, but what is actually moving the needle?
Well. Start now. Treat it like an algorithm. Schedule reminders to text/email/call/follow up with people. My ADHD was hard. I would just forget about people and not because I don’t care about them. Then I would feel bad and delay even further because of that. Just do the thing. It may never feel natural except with very close and trusted people. That’s okay. Having friends for the sake of it isn’t the point. Being genuinely interested and sharing experiences and common interests and learning from each other are good reasons though.
> Being genuinely interested and sharing experiences and common interests and learning from each other are good reasons though.
(Not OP, but interested to hear more)
In terms of motivation, do you know of a way to begin a sincere and genuine interest in others that doesn't have some ulterior motivations? That may sound kind of mechanical, but what I mean is roughly something like: "I don't know people, so I do not have any 'genuine interest' in them. As a result, any interest that I do have is insincere."
I chose not to have friends for several decades, which has been extremely convenient for the most part, except for tasks that require more than one person, or work-related situations. Having to worry about offending people, remembering birthdays, messing up my own plans for the needs of others, etc. was very burdensome. However, being able to use people as a job reference, or getting leads on future opportunities from people I used to work with, would also be beneficial, so I can understand why people would expend the effort. Still, retaining a friendship solely for those job-related purposes seems grossly manipulative because there is no sincerity in what I want from them. I do not want them; I only want to extract what they can give to me.
Is it simply understood that, if you make friends with someone as an adult, it is inherently with ulterior motivations in mind, whether it be to avoid loneliness, get work-related benefits, or extract knowledge from them? As a child, I think people tended to make friends simply because they were bored and the person seemed neat. Is that why people still try to make friends with people?
A genuine relationship is not transactional. I never expect anything out of a friend, or anyone really. I will simply give them my time, advice, or help because I choose to with zero motivation beyond it making me happy to know I could help someone in some way. I have limits, of course, but I never expect anything in return. It is as simple as that. Some of the time I ask friends for help or they offer it. I don't expect it or do anything in particular to encourage things out of them. A good friendship revolves around the common ground in that space. You like working on cars. You like talking about it and spending time on it. They like the same things. So you spend time on that thing together for no particular reason other than it being more interesting to do with another person.
We live in an industrialized society; it is highly dependent on a vast ecosystem of other humans doing specialized jobs. To have a genuine interest you just decide to have one. Why do people choose the hobbies they do? Why this software project over that one? Why do some people like this car or that other car? What motivates people? If you ask, people will almost universally be happy to tell you about things they care about. You don't need any particular reason. The fact that you are on HN indicates you are at least nominally interested in others.
I think some rare people genuinely are just happier off in a cabin in the woods, mostly independent of other humans, but we are generally not evolved that way. We simply have a vast amount of chemical and mental machinery dedicated to experiencing life as a social construct and system. Also, having friends to avoid loneliness is that exact machinery we evolved. It isn't required in any logical sense, but in a very real physical sense our bodies and minds reward us for socializing.
That is kind of the point. Doing something for no extrinsic reward. It is a part of practicing gratitude and expecting nothing in return (from the other person). If you experience genuine joy or happiness from helping others I think you are doing alright :)
I helped because I could and I wanted to. It makes me happy to help other people. Happy is a loaded word anyway. What is happiness? Some bits of chemicals in our brain? I have trained my reward function to be happy for something it gets no real material or survival benefit from. Maybe it thinks it is getting some benefit in my default mode network. I help people and they will help me? I am sure the DMN sets forth that narrative at some level. But there is a deeper trick to all of it in which I know there is often no survival or conscious narrative benefit. I just did a prosocial human thing. Maybe the hormones that generates are enough to convince my DMN to keep doing it? Maybe if you wire your empathy centers deeply enough to experience things through other people you can convince your DMN it is valuable. IDK how it works, I do meditate on it. But mostly it is just about connection and helping people on their journey through life as countless others have helped me. I figure if I end it with having helped others more than I was helped that's a pretty good score. Sort of like the Seinfeld quote about driving a Porsche: Having the lowest mileage Porsche when reaching heaven signifies a failure to enjoy life, which is considered one of life's greatest sins?
something like that maybe? Who knows. I am going to keep putting help others mileage out there until my time is up, I am very fortunate in this life.
Most people want to spend time with others in the same kind of way they want to eat food or sleep or watch a movie. It just seems to be built in. People who appear to have ulterior motives are treated suspiciously. Some people seem to need a lot more social time than others, but most people desire at least a little bit of human contact.
You need to go to therapy and speak to a professional about this. Choosing to not have social connections is a deeply anti-human behavior; we evolved to be social creatures, after all.
Maybe you are truly asocial, but you come across as someone severely stunted emotionally if you think companionship means always extracting value out of someone.
I don't think that "fixes" the problem, but it does seem to help. I also have found adding "please feel free to ask questions" seems to help it stop from making an assumption and spinning merrily onward for tens of thousands of tokens based on a bad idea rather than asking you something. I theorize this is because the training and refinement data overprioritize one-shot solutions, both because that's easier to evaluate at training time and improves their benchmarks. But I emphasize the italicized words because that's all gut feel and I can't prove any of it.
They do still attenuate their latent space on prior conversation turns as authority. That is why I like pure design/review sessions and pure coding sessions, often at the same time. I can often keep the design and review agent in the critic role without it becoming a sycophant. The coding agent just picks up dispatches and works with very little opinion at all.
There is a great thing. Because the agents can do so much toil you can add things like formal verification, fuzzing, and other feedback mechanisms and quality gates to your projects cheaply. In a human written project you still needed those things, but it cost a lot. Agents require these quality gates and they can implement them for you. The problem with AI documentation is it will just write a lot of useless bullshit unless you guide it on what is important. You can also get agents to identify transitive dependencies via testing and other things.
I adopt the mindset of docs are for humans, tests are for agents. They document formal dependencies and leave a measurable artifact behind. If you identify some behavior or transitive dep in your system, agents document it first with a test codifying the expected behavior. Tests are the source of truth about expected system behavior and you can convince agents to write decent behavioral tests if you ask them to with the right structure. Docs are now cheap and a render, not a long term thing. There is some token efficiency to consider, but still, they are quick and cheap if you don't understand some module or its purpose.
Yeah "plus one" to this. Static analysis, fuzzing, linting, integration tests -- there are all sorts of very useful artifacts which have been around for a long time, but which are very time consuming to implement and then maintain. LLMs shift the economics around producing and maintaining these tremendously, so we can now afford these robust validation mechanisms.
These serve as living documentation which cries out in pain when they get out of sync with the system in question, generating specific error messages -- as opposed to natural language docs which rapidly drift into an ambiguous "kinda useful" state. And the validation is performed mechanically (as opposed to neurally) so no hallucinations are possible.
The one thing I would add is that you do want these artifacts to be human-friendly from a reading perspective -- you want engineers to be able to scan over these and check that they are validating the right things.
> Because the agents can do so much toil you can add things like formal verification, fuzzing, and other feedback mechanisms and quality gates to your projects cheaply
Works great until they sweep a test under the rug that always passes because the condition is something like if (true).
That was my point. Validating actual behavioral tests. Not letting them cheat. They still will at times, but like, read their code, fix it or send a reviewer agent to find problems and make a todo list. If you give them a behavioral test skill they will do a much better job. Sometimes I have to hint to them. I rarely ship anything I have not reviewed at least once.
> Not letting them cheat. They still will at times, but like, read their code,
Well then, if they "still will", your effort kind of misses the point. Sure, maybe you'll catch it every time, and maybe that one time you did not catch it, it was no critical mistake... But it only needs to make that critical mistake once, and all of this effort was in vain.
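One cheap way to check that a test is actually behavioral is a crude mutation check: a useful test passes on the real implementation and fails on a deliberately broken one. The sketch below is hypothetical throughout (`slugify`, the two tests, and `catches_breakage` are invented for illustration), but it shows why an `if (true)` style test is detectable mechanically.

```python
def slugify(title: str) -> str:
    """Reference implementation under test."""
    return "-".join(title.lower().split())


def broken_slugify(title: str) -> str:
    """Deliberately wrong variant (a hand-written 'mutant')."""
    return title


def vacuous_test(fn) -> bool:
    """The kind of test that gets swept under the rug: it never exercises fn."""
    try:
        if True:
            assert True
        return True
    except AssertionError:
        return False


def behavioral_test(fn) -> bool:
    """Asserts on observable behavior, so a wrong implementation fails it."""
    try:
        assert fn("Hello World") == "hello-world"
        assert fn("  Many   Spaces ") == "many-spaces"
        return True
    except AssertionError:
        return False


def catches_breakage(test) -> bool:
    """A test earns its keep only if it passes on good code and fails on broken code."""
    return test(slugify) and not test(broken_slugify)
```

Here `catches_breakage(vacuous_test)` is false and `catches_breakage(behavioral_test)` is true. A reviewer agent, or a proper mutation-testing tool, can apply the same idea at scale, though it does not replace reading the tests.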
(as an outsider) what this sounds a lot like to me is trying to manage a very large team of human personnel that have a high turnover rate which is not directly in your control.
Some of them will make mistakes, some of them will cheat, some of them will do things you don't like, and "punishing" them will be less helpful to you due to the high turnover than building a system which instead disincentivizes things from a high level. Which catches bad actions and starts them over.
Classically I think we are more accustomed to "building a team of humans, and being able to chastise or fire a bad employee helps the team grow more cohesive and build accountability".
But it is possible to get the same (less than ideal) situation with teams of humans where accountability cannot be easily instilled into the team as we have with teams of agents.
And then obviously the reason one might consider using such an unusual and difficult to manage team as a tool is when the cost is low and the supply is high, which is purportedly the case with AI at least for the moment.
What particularly gets me is that if you use AI with a bit of engineering rigor, especially around design and testing, my experience is that the latest models are great to work with. They can structure performance and stability tests and implement 90%. Humans have to do the hardest and most critical 10% of the design. The current tools are good enough to do virtually all of the implementation now if your quality gates are right and your design is good enough, but you absolutely have to design the right things for your scale and reliability needs or very bad things are in store.
I was in high school in the 90s in the deep south and was taught about the horrors of slavery. Has that changed? Not disagreeing about reconstruction. It established a pattern of letting conservatives get away with their malfeasance and “let’s just get along” politically instead of extracting a real price for crime and destructive behavior.
I replaced common grep with a semantic search wrapper for some projects. It was amusing. It has a response header that lets Claude know it is not using standard grep. Works fine. Have to outsmart them ;)
Write in C, snapshot to memory-safe production Rust that has been verified and differential tested to within an inch of its life. This is a hard reality. Ports will become cheap. Disposable. The vuln-finding capabilities are changing everything and not a lot of solutions are out there yet. We can automate the ports and get them efficient.
OpenSSH is a legitimately high bar, one of the hardest targets in all memory-unsafe software.
Curl is a high bar for a different reason (the same one as sudo): it doesn't do enough to be all that interesting. Stenberg is having trouble keeping up with all the inbounds, but look at the 2026 CVEs: they all seem kind of boring? Exploit developers aren't hunting for "wrong reuse of HTTP Negotiate connection". Like, yes, these are legitimate bugs, important that they get fixed, but none of them are prizes.
By rights, OpenSSH should be a smoking crater. It's not, I believe because of sheer engineering excellence.