
Agents depend heavily on the quality of their individual components, so it's pretty obvious that demo agents are going to be incredibly unstable. You need a success rate for each individual component to be near 100% or build in a mechanism for corrective action (one of the things that Claude Code does particularly well).
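To see why per-component success rates need to be near 100%, a quick back-of-the-envelope calculation (illustrative numbers, assuming independent steps and no corrective action):

```python
# Without corrective action, per-step failure rates compound multiplicatively
# across a multi-step agent run.
def run_success_rate(step_success: float, steps: int) -> float:
    return step_success ** steps

# A "pretty good" 95% component over a 20-step run is barely a coin flip:
print(round(run_success_rate(0.95, 20), 2))   # ~0.36
print(round(run_success_rate(0.99, 20), 2))   # ~0.82
print(round(run_success_rate(0.999, 20), 2))  # ~0.98
```

This is the math behind demo agents feeling unstable: each individual tool call looks fine in isolation, but the whole run rarely survives.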


(we haven't looked too deeply into agent-kit, so this is based on my impression from reading the docs)

At a high level, agents in Pickaxe are just functions that execute durably, and you write the function for their control loop yourself. With agent-kit, agents execute in a fully "autonomous" mode where they automatically pick the next tool. In our experience that isn't how agents should be architected: you generally want them to be more constrained than that, even for somewhat autonomous agents.
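A rough sketch of the difference between the two styles (hypothetical helpers, not the Pickaxe or agent-kit APIs):

```python
# "Autonomous" style: the model picks the next tool in a loop until it
# decides to stop. Control flow belongs to the LLM.
def autonomous_agent(llm, tools, task):
    state = {"task": task, "history": []}
    while True:
        choice = llm(state)  # model returns a tool name + args
        if choice["tool"] == "finish":
            return choice["args"]
        result = tools[choice["tool"]](**choice["args"])
        state["history"].append(result)

# Constrained style: you own the control flow; the LLM fills in the blanks
# at fixed points. Fixed tools, fixed order, no open-ended loop.
def constrained_agent(llm, tools, task):
    plan = llm({"step": "plan", "task": task})
    research = tools["search"](query=plan["query"])
    draft = llm({"step": "draft", "context": research})
    return tools["save"](content=draft["text"])
```

The constrained version is easier to test, retry, and make durable, because each step is an ordinary function call with a known shape.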

Also to compare Inngest vs Hatchet (the underlying execution engines) more directly:

- Hatchet is built for stateful container-based runtimes like Kubernetes, Fly.io, Railway, etc. Inngest is a better choice if you're deploying your agent into a serverless environment like Vercel.

- We've invested quite a bit more in self-hosting (https://docs.hatchet.run/self-hosting), open source (MIT license) and benchmarking (https://docs.hatchet.run/self-hosting/benchmarking).

Can also compare specific features if there's something you're curious about, though the feature sets are very overlapping.


Definitely understand the frustration. The difficulty of Hatchet being general-purpose is that staying performant for every use case can be tricky, particularly when combining many features (concurrency, rate limiting, priority queueing, retries with backoff, etc.). We should be more transparent about which combinations of use cases we're focused on optimizing.

We spent a long time optimizing the single-task FIFO use case, which is what we typically benchmark against. Performance for that pattern is I/O-bound above 10k tasks/s, which is a good sign (we just need better disks). So a pure durable-execution workload should perform very well.

We're focused on improving multi-task and concurrency use-cases now. Our benchmarking setup recently added support for those patterns. More on this soon!


Hatchet is not stable.


Thanks! Would love to hear more about what type of agent you're building.

We've heard pretty often that durable execution is difficult to wrap your head around, and we've also seen more of our users (including experienced engineers) relying on Cursor and Claude Code while building. So one of the experiments we've been running is ensuring that agent code is durable even when written by LLMs: our MCP server lets coding agents follow best practices while generating code: https://pickaxe.hatchet.run/development/developing-agents#pi...

Our MCP server is super lightweight and basically just tells the LLM to read the docs here: https://pickaxe.hatchet.run/mcp/mcp-instructions.md (along with some tool calls for scaffolding)

I have no idea if this is useful or not, but we were able to get Claude to generate complex agents which were written with durable execution best practices (no side effects or non-determinism between retries), which we viewed as a good sign.
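Concretely, "no side effects or non-determinism between retries" means things like timestamps, random IDs, and network calls happen inside checkpointed steps, so a replay reuses the recorded result instead of recomputing it. A minimal sketch of that memoization, assuming a hypothetical `step` helper (not the Pickaxe API):

```python
import uuid

# Minimal durable-step memoization: a step's result is recorded the first
# time it runs; a retry/replay returns the recorded value instead of
# re-executing the step.
class Workflow:
    def __init__(self):
        self.history = {}  # step name -> recorded result

    def step(self, name, fn):
        if name in self.history:      # replay: reuse the recorded result
            return self.history[name]
        result = fn()                 # first run: execute and record
        self.history[name] = result
        return result

wf = Workflow()
# BAD (outside a step): a retry would generate a different ID.
# order_id = str(uuid.uuid4())
# GOOD: the nondeterminism is captured inside a step, so replays are stable.
order_id = wf.step("generate_order_id", lambda: str(uuid.uuid4()))
replayed = wf.step("generate_order_id", lambda: str(uuid.uuid4()))
assert order_id == replayed
```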


Thanks! Our favorite resources on this (both have been posted on HN a few times):

- https://www.anthropic.com/engineering/building-effective-age...

- https://github.com/humanlayer/12-factor-agents

That's also why we implemented pretty much all relevant patterns in the docs (i.e. https://pickaxe.hatchet.run/patterns/prompt-chaining).
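The prompt-chaining pattern from those resources is roughly: break a task into a fixed sequence of LLM calls where each output feeds the next, with a programmatic "gate" check between steps. A sketch with a hypothetical `llm` callable (not our SDK):

```python
# Prompt chaining (sketch): a fixed sequence of calls, each consuming the
# previous output, with a programmatic gate between steps.
def chain(llm, topic):
    outline = llm(f"Write an outline about {topic}")
    if len(outline) == 0:  # gate: fail fast instead of compounding errors
        raise ValueError("empty outline, stopping chain")
    draft = llm(f"Expand this outline into a draft: {outline}")
    return llm(f"Polish this draft: {draft}")
```

Because the control flow is fixed, each call can be wrapped as a durable step and retried independently.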

If there's an example or pattern that you'd like to see, let me know and we can get it released.


For an agent that executes locally, or an agent that doesn't execute very often, I'd agree it's arbitrary.

But programming languages make tradeoffs on those very paths (particularly spawning child processes and communicating with them, how underlying memory is accessed and modified, garbage collection).

Agents often involve a specific architecture that's useful for a language with powerful concurrency features. These features differentiate the language as you hit scale.

Not every language is equally suited to every task.


OP here - this type of "checkpoint-based state machine" is exactly what platforms which offer durable execution primitives like Hatchet (https://hatchet.run/) and Temporal (https://temporal.io/) are offering. Disclaimer: am a founder of Hatchet.

These platforms store an event history of the functions which have run as part of the same workflow, and automatically replay those when your function gets interrupted.
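A toy version of that replay mechanism, assuming steps are deterministic and the event log is persisted (this is the concept, not either platform's implementation):

```python
# Toy event-history replay: completed step results are appended to a log;
# re-running an interrupted workflow consumes the log in order and only
# executes steps that haven't been recorded yet.
def run_workflow(steps, history):
    results = []
    for i, step in enumerate(steps):
        if i < len(history):
            results.append(history[i])  # replayed from the event log
        else:
            out = step()
            history.append(out)         # persisted before moving on
            results.append(out)
    return results

calls = []
steps = [lambda: calls.append(1) or "a", lambda: calls.append(2) or "b"]
history = ["a"]                     # step 0 already completed before a crash
print(run_workflow(steps, history)) # ['a', 'b']; only step 1 executes
print(calls)                        # [2]
```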

I imagine synchronizing memory contents at the language level would be much more overhead than synchronizing at the output level.


This is also how our orchestrator (written in Go) is structured. JP describes it pretty well here (it's a durable log implemented with BoltDB).

https://fly.io/blog/the-exit-interview-jp/


Nice! It makes a lot of sense for orchestrating infra deployments -- we also started exploring Temporal at my previous startup for many of the same reasons, though at one level higher to orchestrate deployment into cloud providers.


Yep, though I haven’t used them, I’m vaguely aware that such things exist. I think they have a long way to go to become mainstream, though? Typical Go code isn’t written to be replayable like that.


I think there's a gap between people familiar with durable execution and those who use it in practice; it comes with a lot of overhead.

Adding a durable boundary (via a task queue) in between steps is typically the first step, because you at least get persistence and retries, and for a lot of apps that's enough. It's usually where we recommend people start with Hatchet, since it's just a matter of adding a simple wrapper or declaration on top of the existing code.
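That "simple wrapper" boundary is often just a decorator in front of existing code, so failures get retried instead of lost (a generic sketch, not Hatchet's actual decorator; a real task queue would also persist the task before running it):

```python
import functools

def durable_task(retries=3):
    """Sketch of a task-queue boundary: the wrapped function is retried on
    failure (in a real queue, after being persisted, so a crash can't lose it)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last_err = None
            for attempt in range(retries):
                try:
                    return fn(*args, **kwargs)
                except Exception as e:
                    last_err = e
            raise last_err
        return wrapper
    return decorator

attempts = {"n": 0}

@durable_task(retries=3)
def flaky_step():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")
    return "done"

print(flaky_step())  # "done" on the third attempt
```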

Durable execution is often the third evolution of your system (after the first pass with no durability, then adding a durable boundary).


What are the main differences between temporal and hatchet?


The primary difference is that Hatchet is an all-purpose platform for async jobs, so while durable execution is a pattern that we support, we have a lot of other features like concurrency and fairness control, event ingestion, custom queues, dynamic rate limiting, streaming from a background job, monitoring, alerting, DAG-based executions, etc. There's a bit more on this/our architecture here: https://news.ycombinator.com/item?id=43572733.

The reason I started working on Hatchet was because I'm a huge advocate of durable execution, but didn't enjoy using Temporal. So we try to make the development experience as good as possible.

On the underlying durable execution layer, it's the exact same core feature set.


This reads more like a pitch for open-source than anything else.

> Switching out something, even if it's open source and self-hosted, means that you're rewriting a lot of code.

The point of something open-source and self-hosted is that it resolves nearly all of the "taxes" mentioned in the article. What the article refers to as the discovery, sign-up, integration, and local development tax are all easily solved by a good open-source local development story.

The "production tax" (is tax the right word?) can be resolved by contributions or a good plugin/module ecosystem.


Open source is free if your time is worth nothing.


some people just don't understand business

people are going to find out why companies pay top dollar for closed-source alternatives vs open-source products


I'm a big fan of https://github.com/humanlayer/12-factor-agents because I think it gets at the heart of engineering these systems for usage in your app rather than a completely unconstrained demo or MCP-based solution.

In particular you can reduce most concerns around security and reliability when you treat your LLM call as a library method with structured output (Factor 4) and own your own control flow (Factor 8). There should never be a case where your agent is calling a tool with unconstrained input.
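In practice that looks like validating the model's output against a schema before any tool sees it. A stdlib-only sketch with a hypothetical action schema (not the 12-factor repo's code):

```python
import json

# Factor 4 in miniature: the LLM returns JSON, we validate it against a
# schema, and only validated, typed values ever reach a tool.
ALLOWED_ACTIONS = {"lookup_order", "escalate"}

def parse_action(raw: str) -> dict:
    data = json.loads(raw)  # reject non-JSON outright
    if data.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {data.get('action')}")
    if not isinstance(data.get("order_id"), int):
        raise ValueError("order_id must be an integer")
    return data

def lookup_order(order_id: int) -> str:  # the "tool" only sees typed args
    return f"order {order_id}: shipped"

action = parse_action('{"action": "lookup_order", "order_id": 42}')
print(lookup_order(action["order_id"]))  # order 42: shipped
```

Because you own the control flow (Factor 8), the `ValueError` branch is yours to handle: re-prompt, fall back, or surface to a human, rather than letting malformed output reach a tool.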


I guess I’ve got some reading and research ahead of me. I definitely would rather support the idea of treating LLM calls more like structured library functions, rather than letting them run wild.

Definitely bookmarking this for reference. Appreciate you sharing it.


> program building is an entropy-decreasing process...program maintenance is an entropy-increasing process, and even its most skillful execution only delays the subsidence of the system into unfixable obsolescence

> Only humans can decrease or resist complexity.

For a simple program, maintenance is naturally entropy-increasing: you add an `if` statement for an edge case, and the total number of paths/states of your program increases, which increases entropy.

But in very large codebases, it's more fluid, and I think LLMs have the potential to massively _reduce_ complexity by recommending places where state or logic should be decoupled into a separate package (for example, when a similar method is called in multiple places in the codebase). This is something that can be difficult to do "as a human" unless you happen to have worked in those packages recently and are cognizant of the pattern.

