Launch HN: Human Layer (YC F24) – Human-in-the-Loop API for AI Systems
353 points by dhorthy 8 days ago | 196 comments
Hey HN! I'm Dex, building HumanLayer (https://humanlayer.dev), an API that lets AI agents contact humans for feedback, input, and approvals. We enable safe deployment of autonomous/headless AI systems in production. You can try it with our Python or TypeScript SDKs and start using it immediately with a free trial. We have a free tier and transparent usage-based pricing. Here’s a demo: https://youtu.be/5sbN8rh_S5Q?t=51

What's really exciting is that we're enabling teams to deploy AI systems that would otherwise be too risky. We let you focus on building powerful agents while knowing that critical steps will always get a human-in-the-loop. It's been dope seeing people start to think bigger when they consider dynamic human oversight as a key ingredient in production AI systems.

This started when we were building AI agents for data teams. We wanted to automate tedious tasks like dropping unused tables, but customers were (rightfully!) opposed to giving AI agents direct access to production systems.

Getting AI to "production grade" reliability is a function of "how risky is this task the AI is performing". We didn't have the 3+ months it would have taken to sink into evals, fine tuning, and prompt engineering to get to a point where the agent had 99.9+% reliability—and even then, getting decision makers comfortable with flipping the switch on was a challenge. So instead we built some basic approval flows, like "ask in Slack before dropping tables".

But this communication itself needed guardrails—what if the agent contacted the wrong person? How would the head of data look if a tool he bought sent a nagging Slack message to the CEO? Our buyers wanted the agent to ask stakeholders for approval, but first they wanted to approve the "ask for approval" action itself. And then I started thinking about it... as a product builder + owner, I wanted to approve the "ask for approval to ask for approval" action!

I hacked together a human-AI interaction that would handle each of these cases across both my and my customers' Slack instances. By this time, I was convinced that any team building AI agents would need this kind of infrastructure and decided to build it as a standalone product. I presented the MVP at an AI meetup in SF and had a ton of incredible conversations, and went all in on building HumanLayer.

When you integrate the HumanLayer SDK, your AI agent can request human approval at any point in its execution. We handle all the complexity of routing these requests to the right people through their preferred channels (Slack or email, SMS and Teams coming soon), managing state while waiting for responses, and providing a complete audit trail. In addition to "ask for approval", we also support a more generic "human as tool" function that can be exposed to an LLM or agent framework, and will handle collecting a human response to a generic question like "I'm stuck on $PROBLEM, I've tried $THINGS, please advise" (I get messages like this sometimes from in-house agents we rolled out for back-office automations).
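As a rough illustration of the "ask for approval" pattern at the tool-calling layer, here is a minimal sketch. It is not the HumanLayer SDK: `require_approval`, `ask_human`, and `cautious_reviewer` are hypothetical names, and the approver is a synchronous stand-in for what would really be a Slack or email round trip.

```python
from functools import wraps
from typing import Callable


def require_approval(ask_human: Callable[[str], bool]):
    """Wrap a tool so it only runs after a human says yes.

    `ask_human` is a hypothetical callback (Slack, email, CLI prompt, ...)
    that receives a description of the call and returns True to approve.
    """
    def decorator(tool):
        @wraps(tool)
        def gated(*args, **kwargs):
            description = f"{tool.__name__}(args={args}, kwargs={kwargs})"
            if not ask_human(description):
                # Return the denial as text so the LLM can read it and adapt.
                return f"DENIED by human: {description}"
            return tool(*args, **kwargs)
        return gated
    return decorator


# Stand-in approver for this example: rejects anything mentioning "prod".
def cautious_reviewer(description: str) -> bool:
    return "prod" not in description


@require_approval(cautious_reviewer)
def drop_table(name: str) -> str:
    return f"dropped {name}"
```

Because the wrapper keeps the tool's name and signature, the gated function can be handed to an agent framework the same way as the unwrapped one.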

Because it's at the tool-calling layer, HumanLayer's SDK works with any AI framework like CrewAI, LangChain, etc, and any language model that supports tool calling. If you're rolling your own agentic/tools loop, you can use lower level SDK primitives to manage approvals however you want. We're even exploring use cases where HumanLayer is used for human-to-human approval, not just AI-to-human.

We're already seeing HumanLayer used in some cool ways. One customer built an AI SDR that drafts personalized sales emails but asks for human approval in Slack before sending anything to prospects. Another uses it to power an AI newsletter where subscribers can have email conversations with the content. HumanLayer handles receiving inbound emails and routing them to agents that can respond, and giving those agents tools to do so. One team uses HumanLayer to build a customer-facing DevOps agent—their AI agent reviews PRs, plans and executes db migrations, all while getting human sign-off at critical steps and reaching out to the team for steering if it encounters any issues.

We have a free tier and flexible credits-based pricing. For teams building customer-facing agents, you get whitelabeling and additional features and priority support.

If you want to integrate HumanLayer into your systems, check out our docs at https://humanlayer.dev/docs or book a demo at https://humanlayer.dev.

Thank you for reading! We’re admittedly early and I welcome your ideas and experiences as it relates to agents, reliability, and balancing human+AI workloads.






Startup owner using AI with this need - needless to say, a real problem. I've considered DIYing an internal service for this - even if we went with you we'd probably have an intern do a quick and dirty copy, which I rarely advocate for if I can offload to SAAS. I'm sure you've put a fair bit of work into this that goes well beyond the human interaction loop, but that's really all we need. Your entry price is steep (I'm afraid to ask what an enterprise use-case looks like) and this isn't complicated to make. We don't need to productize or have all the bells and whistles - just human interaction occasionally. Any amount of competition would wipe out your pricing, so no I would not want to pay for this.

thanks for the validation of the problem! totally open to feedback about the solution, and totally get that you only need something simple for now. I want to point out that we do have a pay-as-you-go tier which is $20 for 200 operations, and have a handful of indie devs finding this useful for back-office style automations.

ALSO - something I think about a lot - if all/most of the HumanLayer SaaS backend was open source, would that change your thinking?


My gut feeling is with where we're headed we'll clear that 200 pretty quickly in production cases, so we'd be interested in a bit higher volume. Our dev efforts would probably clear that 200/mo. If the flow/backend was open-source that'd be a total game changer for us as I see it as an integral part of our product.

edit: I want to add here that while ycomb companies like yourself may have VC backing, a lot of us don't and do consider a 500+/mo. base price on a service that is operations-limited to be a lot. You need to decide who your target audience is, I may not be in that audience for your SAAS pricing. This seems like a service that a lot of people need, but it also stands out to me as a service that will be copied at an extravagantly lower price. We have truly entered software as a commodity when I, a non-AI engineer, can whip up something like this in a week using serverless infra and $0.0001/1k tokens with GPT-4o mini.


that makes sense - and have wondered a lot even more generally about the price of software and what makes a hard problem hard. Like Amjad from Replit said on a podcast recently "can anyone build 'the next salesforce' in a world where anyone can build their own salesforce with coding agents and serverless infra"

I think in building this some of the things that folks decided they don't want to deal with is like, the state machine for escalations/routing/timeouts, and infrastructure to catch inbound emails and turn them into webhooks, or stitch a single agent's context window with multiple open slack threads, but you're right, that can all be solved by a software engineer with enough time and interest in solving the problem.

I will need to clear up the pricing page as it sounds like I didn't do a good job (great feedback thank you!) - it's basically $20/200 credits, and you can pay-as-you-go, and re-up for more whenever you want. We are early and delivering value is more important to me than extracting every dollar, especially out of a fellow founder who's early. If you genuinely find this useful, I would definitely chat and collaborate/partner to figure out something you think is fair, where you're getting value and you get to focus on your core competency. feel free to email me dexter at humanlayer dot dev


I’m just armchair quarterbacking here but I feel like you should just do all features for every user with a single $/action rate, then give discounts for volume and/or prepayment. Even saying $20/200 is a clunky statement. You could just say $0.10 per action (the fact that you’re actually requiring me to make a $20 payment with a $20 charge once it gets to $10 or something like that isn’t even important to me on a pricing page, although when you mention it later in the billing page make sure you also tell people it’s a risk-free money back guarantee if that’s the case)

If there’s something that truly has an incremental cost to you, like providing priority support, that goes into the “enterprise pricing” section and you need to figure out how to quote that separately from the service. My guess is most people don’t want to pay extra for that, or perhaps they’d pay for some upfront integration support but ongoing support is not too important to them. Idk, that’s just my guess here.


thanks - definitely worth saying - I've thought a bit about the 10c/operation rather than 200/$20 - might give that a shot or A/B test a little

Big systems like Salesforce started as small things that more deeply learned about and more deeply understood unmet demand and customer needs, and then got to packaging it in a way to create something that grows.

Coding agents can help more with tasks and not quite big entire massive platforms on their own. Humans may be able to scale much further and bigger with their skills.


i like that angle...I also hear a lot that 'coding agents are great for prototypes, but we usually need a team to bring it to production'

First congrats on the launch - I like it.

My feedback: what’s there looks inviting. Email interaction is handy, other ways would be too.

If there was a low code way to arrange the humanlayer primitives for folks at the edge of using it, I think human tasks could meet something like this even broader. Happy to chat offline.

Onto your comment: The coding for coding agents is still kinda prototype. It feels like some folks have quietly set up a very productive workflow themselves for quite some time.

Still, there's no doubt you could ship production code in some cases - except ai needs to handle all the things development explicitly and implicitly checks before doing so.

Getting to build some things that became more than few orders of magnitude larger than planned, one can learn a lot from the deep experiences of others… and I’m not sure where that is in AI. Speaking to someone with experience and insight can provide some profound insight, clarification and simplification.

Still, an axiom for me remains: clever architecture still tends to beat clever coding.

The best code often is the code that’s not written and not maintained and hopefully the functionality can be achieved through interacting with the architecture.

This approach is only one way, but it can take both domain knowledge and data knowledge, to put in enough a domain and data driven design relative to how well the developer may know the required and anticipated needs.

The high end of software development is many leagues beyond even what I just described. There’s a lot of talk about 10x engineers, I’d say there can be developers who definitely can be 10x as effective or reach 10x more of the solution, than average.

If a lot of the code AI is modelled on is based on the body of code in repos, most on a wide scale may be average to above average at most, perfectly serviceable and iteratively updated.

Sometimes we see those super elegant designs of fewer tables and code that does near everything, because it’s developments 5th or 6th version creating major overhauls. It could be refactored, or if the schema is not brittle, maybe a rewrite in full or part of the exact same team is present to do it.

Today’s AI could help shed a light in some of those directions, relative to the human using it. This again says in the hands of an expert developer AI can do a ton more for them, but the line to automation might be something else.

There is agentic ai and human in the loop to still figure itself out, as well as how to improve the existing processes. 2025 looks to be interesting.


I think a lot about the low code side and how we can make that work...at the end of the day that looks like a feature/integration into other platforms and that means a lot of matching opinions/interfaces.

I think the K8s ecosystem did this well but it required big cross-enterprise working groups that produced things like CSI, SMI, OCI, and before that could happen, there was like 5+ years of storming and forming that had to happen before the dust settled enough for enterprises to step forward and embrace k8s/containers as the new compute paradigm

maybe i'm overthinking things.

onto the coding things -

> clever architecture still tends to beat clever coding

love this

> best code often is the code that’s not written and not maintained

it's too bad not-writing-code isn't as satisfying as deleting code

> it’s developments 5th or 6th version creating major overhauls

yeah the best agent orchestration architecture I'm aware of is on iteration 4 going on 5. I told him to open source it but he said "its .NET so if I open source it, nobody will care" XD

> There is agentic ai and human in the loop to still figure itself out, as well as how to improve the existing processes. 2025 looks to be interesting.

i'm stoked for it


Maybe piggybacking on existing platforms is a way to go. You already have an API, maybe tying in a few major orchestration platforms like Zapier, n8n, make.com, and maybe even IFTTT, etc in one fell swoop would be a nice start to make it easier to tie in. Since you're into python, something like node-red might not be bad either.

There are lots of other process managers that could fit well with agents as well, python and not that could be reasonable bolt-ons or extensions.

I'd be very happy to check out the best agent orchestration architecture you're referring to, email is in my profile if easier :).

In terms of open sourcing, I don't think anyone would have an issue with .NET, I wouldn't. Many businesses have exposure to Microsoft 365 already and using Azure, kind of a massive built in audience.


If your use would be 500/mo, you’d just pay them $40 or $60 per month.

Honestly: don't spend your time on HN comments like this. Focus on customers who want to pay you for it.

I have built a dumb internal service for this that i have been using for more than a year now. And just today i've thought about turning it into a simple product, as i am in need to finance the work on my actual product. Then i saw this Show HN & then your comment. Which makes me wonder: how would you use such a service "occasionally"? And what would be your pricing expectation?

Hey, would you be open to checking out something I hacked together on GitHub? https://github.com/adrian-kong/hitl-middleware

I wasn’t sure if this would be relevant to you or useful at all, but it’s a quick solution I built for HITL workflows. Happy to hear your thoughts or if you think it’s applicable!


This might deserve to be the new to-do list everyone learns to build if only that there's so much to learn from trying on how to get it best.. this month or quarter.

What's an example of the use cases you're seeing with agents in your day-to-day?

Classic HN comment, almost a carbon copy of the Dropbox one: https://news.ycombinator.com/item?id=9224

I assume your reasoning is something like: if people are already paying out the nose for open AI calls, an extra ten cents to make a human in the loop check probably isn't bad, and realistically speaking ten cents isn't much when compared to a valuable person's time, and I guess the number of calls to your service is likely expected to be fairly low (since they by definition require human intervention) so you need a high per operation cost to make anything.

Even understanding that, the per operation cost seems astronomical and I imagine you'll have a hard time getting people past that knee jerk reaction. Maybe you could do something like offer a large initial number of credits (like a couple hundred), offer some small numbers of free credits per month (like.... ten?) and then have some tier in between free and premium with lower per operation pricing?

It also seems painful that the per operation average of the premium plan is greater than the free offering (when using 2000 ops). Imo you'd probably be better off making it lower than the free offering from 200 ops and up, to give people an incentive to switch. I imagine people on your premium plan using premium features would be more likely to continue to do so, for one. The simplest way to do this would be to bump up the included ops up to 5k I guess. Someone using less than 5k would still have a higher average price, but it seems like it would come off better.


thanks for the feedback, I spend a lot of time thinking about it. right now the premium tier includes features that are much harder to build/maintain and take more to integrate, so we want a bit of a commitment up front, but it does stick out to me that the price/op goes up in that case

we do have 100/mo for free at the free tier (automatic top up).

I think the comparison to how openAI calls are volume based (and rather $$) is a super valid one though and I lean on that a lot


Interesting tool, congrats on the launch!

I was wondering: have you thought about automation bias or automation complacency [0]? Sticking with the drop-tables example: if you have an agent that works quite well, the human in the loop will nearly always approve the task. The human will then learn over time that the agent "can be trusted", and will stop reviewing the pings carefully. Hitting the "approve" button will become somewhat automated by the human, and the risky tasks won't be caught by the human anymore.

[0]: https://en.wikipedia.org/wiki/Automation_bias


Premature optimization, and premature automation cause a lot of issues, and overlooking a lot of insight.

By just doing something manually 10-100 times, and collecting feedback, both understanding of the problem, possible solutions/specifications can evolve orders of magnitude better.


yeah the people who reach for tools/automation before doing it themself at least 3-10 times drive me crazy.

I think uncle bob or martin fowler said "don't buy a JIRA until you've done it with post-its for 3 months and you know exactly what workflow is best for your team"


I am starting to call that Harry Potter AI prompting.

Coding with English (prompting) is often most useful where existing ways of coding (an excel formula) can’t touch.

Using llms to evaluate things like excel formulas instead of using excel doesn’t feel in the spirit of using this ai’s power.


this is fascinating and resonates with me on a deep level. I'm surprised I haven't stumbled across this yet.

I think we have this problem with all AI systems, e.g. I have let cursor write wrong code from time to time and don't review it at the level I should...we need to solve that for every area of AI. Not a new problem but definitely about to get way more serious


This is something we frequently saw at Uber. I would say there's already an established pattern for this for any sort of destructive action.

Intriguingly, it's rather similar to what we see with LLMs - you want to really activate the person's attention rather than have them go off on autopilot; in this case, probably have them type something quite distinct in order to confirm it, to turn their brain on. Of course, you likely want to figure out some mechanism/heuristics, perhaps by determining the cost of a mistake, and using that to set the proper level of approval scrutiny: light (just click), heavy (have to double confirm via some attention-activating user action).

Finally, a third approach would be to make the action undoable - like in many applications (Uber Eats, Gmail, etc.), you can do something but it defers doing it, giving you a chance to undo it. However, I think that causes people more stress, so it’s rather better to just not do that than to confirm and then have the option to undo. It’s better to be very deliberate about what’s a soft confirm and what’s a hard confirm, optimizing for the human in this case by providing them the right balance of high certainty and low stress.


i never thought about undoable actions but I love that workflow in tools like superhuman. I will chat w/ some customers about this idea.

I also like that idea of:

not just a button but like 'I'm $PERSON and I approve this action' or type out 'Signed-off by' style semantics


I think the canonical sort of approach here is to make them confirm what they're doing. When you delete a GitHub repo for example, you have to type the name of the repo (even though the UI knows what repo you're trying to delete).

If the table name is SuperImportantTable, you might gloss over that, but if you have to type that out to confirm you're more likely to think about it.

I think the "meat space" equivalent of this is pointing and calling: https://en.m.wikipedia.org/wiki/Pointing_and_calling (famously used by Japanese train operators)
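The "soft confirm vs. hard confirm" idea from this subthread can be sketched in a few lines. This is only an illustration: the function names, the dollar threshold, and the idea of passing in a numeric `cost_of_mistake` are all assumptions, not something any commenter specified.

```python
def needs_typed_confirmation(cost_of_mistake: float, threshold: float = 1000.0) -> bool:
    """Hypothetical heuristic: expensive mistakes get the 'hard' confirm."""
    return cost_of_mistake >= threshold


def is_confirmed(resource_name: str, typed: str, cost_of_mistake: float) -> bool:
    """Decide whether the reviewer's input counts as approval."""
    if needs_typed_confirmation(cost_of_mistake):
        # Hard confirm: the reviewer must type the exact resource name,
        # like GitHub's repo-deletion dialog.
        return typed.strip() == resource_name
    # Soft confirm: a plain yes (or a button click) is enough.
    return typed.strip().lower() in {"y", "yes", "approve"}
```

With this shape, a reflexive "yes" approves a scratch table but not `SuperImportantTable`; the reviewer has to type the name out, which is the attention-activating step described above.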


this is cool. I have been an andon cord guy forever

You could continually learn a distribution over AI responses and search for outliers to surface with urgency for approval.

i like this idea - runtime inference based on past responses that gets smarter dynamically is a really interesting space

P.S. nobody asked but since you made it this far - the next big problem in this space is fast becoming, what else do we need to be able to build these "headless" or "outer loop" AI agents? Most frameworks do a bad job of handling any tool call that would be asynchronous or long running (imagine an agent calling a tool and having to hang for hours or days while waiting for a response from a human). Rewiring existing frameworks to support this is either hard or impossible, because you have to

1. fire the async request,
2. store the current context window somewhere,
3. catch a webhook,
4. map it back to the original agent/context,
5. append the webhook response to the context window,
6. resume execution with the updated context window.
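The six steps above can be sketched roughly as follows. This is a toy under stated assumptions: `PENDING` is an in-memory dict standing in for a real database, `send_to_human` is a hypothetical stub for a Slack or email call, and the webhook payload shape (`request_id`, `response`) is made up for the example.

```python
import uuid

# In-memory stand-in for a database; a real system would persist this.
PENDING: dict[str, list[dict]] = {}


def send_to_human(request_id: str, question: str) -> None:
    """Hypothetical stub: post to Slack/email and include request_id."""
    pass


def pause_for_human(context: list[dict], question: str) -> str:
    request_id = str(uuid.uuid4())
    send_to_human(request_id, question)   # 1. fire the async request
    PENDING[request_id] = context         # 2. store the context window
    return request_id


def handle_webhook(payload: dict) -> list[dict]:
    # 3. catch the webhook; 4. map it back to the original context
    context = PENDING.pop(payload["request_id"])
    # 5. append the human's response to the context window
    context.append({"role": "tool", "content": payload["response"]})
    # 6. the caller resumes the agent loop with the updated context
    return context
```

The agent process can exit entirely between `pause_for_human` and `handle_webhook`, which is the property most synchronous tool-calling loops lack.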

I have some ideas but I'll save that one for another post :) Thanks again for reading!


Temporal makes this easy and works great for such use cases. It's what I'm using for my own AI agents.

ah very cool! are there any things you wish it did or any friction points? What are the things that "just work"?

Essentially, you don't need to think about time and space. You just write more or less normal looking code, using the Temporal SDK. Except it actually can resume from arbitrarily long pauses, waiting as long as it needs to for some signal, without any special effort beyond using the SDK. You also automatically get great observability into all running workflows, seeing inputs and outputs at each step, etc.

The cost of this is that you have to be careful in creating new versions of the workflow that are backwards compatible, and it's hard to understand backcompat requirements and easy to mess up. And, there's also additional infra you need, to run the Temporal server. Temporal Cloud isn't cheap at scale but does reduce that burden.


helpful - thanks! I have played with temporal a bit but have this thought that since most AI tools represent state as just a rolling context window, maybe you don't have to serialize an entire call stack and you can cut a bunch of corners.

but we're all probably better off not reinventing that wheel


IMO just a rolling message history works for only the simplest of AI tools. Useful agents will tend towards much more complex state that extends into specific verticals/domains.

is that because of more deterministic AI flows like llm-as-judge, rag reranker, post-eval, etc?

do you think something like langgraph state is sufficient?


I've been using the DBOS (dbos.dev) framework to do this in a non-AI app that sends out render jobs, but imagine the process is no different for any other long-running request/response scenario:

1. Run the whole process in a workflow function. Give the run a unique ID, which can be used to automatically deduplicate runs, and will be used to look up the workflow later. This fires off the request in a "step" function, and calls `recv` to wait for a response. The request should include a key that can be used to calculate the workflow ID.
2. DBOS automatically "stores the context somewhere" because of the `recv`
3. A separate HTTP handler in DBOS catches the webhook, and uses the key in the response to calculate the ID of the workflow from #1.
4. The HTTP handler calls `send` with that ID, so that the workflow can pick up whatever response is sent
5. The original workflow resumes with the response from `send`
6. The original workflow can do whatever it wants with the response

DBOS provides reliability around the whole thing (restarts any workflows in the case of any server restarts), and provides some tracing for the process out of the box, so it was quite simple to get started, and have it hosted in DBOS Cloud, which also provides a public IP so that the external service can send a webhook response.


The MCP[1] that was announced by Anthropic has a solution to this problem, and it's pretty good at handling this use case.

I've also been working on a solution to this problem via long-polling tools.

[1] https://github.com/modelcontextprotocol


thanks for bringing this up. I just spent 2 hours last night digging into MCP - I'd love to learn more about how you think this solves the HitL problem. From my perspective MCP is more of a protocol for tool calling over the stdio wire, and the only situation it provides HitL is when human is sitting in the desktop app observing the agent synchronously?

Again, genuinely looking to learn - where does MCP fit in for async/headless/ambient agents, beyond a solid protocol for remote tool calls?


You could implement some blocking HitL service/tool as an MCP server.

ah okay - I guess in that case, I would like chain a HitL step as an MCP server that wraps/chains to another tool that depends on approval?

or is there a cleaner way to do that?


Yeah, exactly. You would define a HitL server and the actions it implements would be API calls to your system.

this is interesting. I will have to think more about how humanlayer can support an MCP integration/wrapper, it's not immediately obvious to me

i do think that AI-calling-tools is insufficient to provide bidirectional communication rails for user input/review though...not disagreeing just maybe thinking out loud a little here


Just to frame the problem slightly differently, if you had an unlimited number of humans who could perform tasks as quickly as a computer this wouldn't be a problem that needs solving. So since we know that's the end state for any human-in-the-loop system then maybe it's worth solving that problem instead.

A few things come to mind, divide the problem into chunks that can be solved in parallel by many people. Crowd source your platform so there are always people available with a very high SLA, just like cloud servers are today.


We must do whatever we can to stay above the API:

https://www.johnmacgaffey.com/blog/below-the-api/


Great article, agreed. I don't want to work for a company where algorithms are weaponized against me.

the dystopian startups that use bounding boxes to observe workers in a warehouse and give the boss a report on how many breaks they took...they're here

I wonder if we can stay above the API if we manage to stay in control of the prompt.

Prompt == "incentive" for the AI, we are still the boss but the AI is just an underling coming to us with TPS reports.

That was a very interesting read, thanks!


Oh man, the API call for hl.human_as_tool() is a little ominous. Obviously approving a slack interaction is no big deal, but it does have a certain attitude towards humans that doesn't bode well for us...

so what I'm hearing is, if the approval is transparent and the agent doesn't see it, thats cool, but tell the agent "hey use the human as needed" and now we're getting into sci fi territory ?! either way i don't totally disagree

get more empathetic names, something better than "human_as_a_tool".

get_valued_employee_validation

new in 0.6.3 - manipulate_human_to_potentially_unsavory_ends()

so what you're saying is you don't mind being used as long as we use a name that sounds empathetic to you? :)

Oh, I surely do mind. I am just helping the AI to manipulate the rest of humanity with less friction.

I, for one, welcome our agentic AI human-exploiting overlords.


10c per slack API call. I could make a mobile phone call for less than that in 1995. It is expensive...

IFTTT, Zapier, NodeRed, etc. are your competitors.

E.g.

https://ifttt.com/applets/J75VtBA9-get-an-email-when-a-webho... -> https://ifttt.com/applets/KWqQedih-make-a-web-request-when-i.... They have lots of AI things too.

The problem is you are saying "API" call so you are already dealing with devs. They can save $10k by writing their own Slack integration (even easier if they pay IFTTT $150/y), and the enterprises will want you to be all FedRAMP, ISO, SOC, Data Resident etc.


I think these are fair comparisons to make. I think the value is less in the slack api integration - anyone can plug in a slack client in an afternoon.

When you get to timeouts, escalations, and routing 100+ conversations between 4+ users across multiple slack instances, that's when things got hectic for us.
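To show why timeouts and escalations add up quickly, here is a small sketch of the state a single approval request has to carry. It is not how HumanLayer implements this; `ApprovalRequest`, `tick`, and `respond` are hypothetical names, and time is passed in as a plain number so the logic is easy to test.

```python
from dataclasses import dataclass


@dataclass
class ApprovalRequest:
    chain: list[str]          # who to ask, in escalation order
    timeout_s: float          # how long each person gets before escalating
    sent_at: float            # when the current assignee was contacted
    level: int = 0            # index into `chain`
    status: str = "pending"   # pending | approved | rejected | expired

    @property
    def assignee(self) -> str:
        return self.chain[self.level]


def tick(req: ApprovalRequest, now: float) -> ApprovalRequest:
    """Run on a timer: escalate or expire when the assignee is too slow."""
    if req.status != "pending" or now - req.sent_at < req.timeout_s:
        return req
    if req.level + 1 < len(req.chain):
        req.level += 1        # hand off to the next person in the chain
        req.sent_at = now
    else:
        req.status = "expired"
    return req


def respond(req: ApprovalRequest, who: str, approved: bool) -> ApprovalRequest:
    """Only the current assignee's answer counts."""
    if req.status == "pending" and who == req.assignee:
        req.status = "approved" if approved else "rejected"
    return req
```

Multiply this by 100+ open conversations, several Slack workspaces, and late replies from people who were already escalated past, and the bookkeeping becomes the hard part rather than the Slack call itself.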


I'm considering this for a workflow agent and would be keen to hear thoughts on this process.

We're a medical device company, so we need to do ISO13485 quality assurance processes on changes to software and hardware.

I had already been thinking of using an LLM to help ensure we are surfacing all potential concerns and ensure they are addressed. Partly relying on the LLM, but really as a method to manage the workflow and confirm that our processes are being followed.

Any thoughts on if this might be a good solution? Or other suggestions by other HN users.


> manage the workflow

Hey, if you're specifically looking for providing deterministic guardrails around agent calls, I'm solving that particular problem.

We're sort of an "RPC layer for tools with reasoning built in", and we integrate with human layer at the tool level as well.

We're operating a bit under the radar until we open-source our offering, but I'm happy to chat.


sounds cool, ping me when this is out i'd love to check it out

meant to reply sooner. It's an interesting problem. I'll have to think on this one.

Isn't this precisely how AI started? It was a bunch of humans under the hood doing the logic when the companies said it was AI. Then we removed the humans and the quality took a hit. To fix that hit, 3rd party companies are putting humans back in the loop? Isn't that kind of like putting a band-aid on the spot where your arm was just blown off?

No, not really.

If you have an AI that can answer 90% of queries correctly AND now this is the key, it knows which 90% it can answer correctly, human in the loop can be incredibly valuable to answer that other 10%.


hah yeah I don't know how soon we will be on great accuracy for the latter, for things like "send an email", people tend to just block everything for approval, because clicking approve 90 times and hand editing 10 times is a lot better than copying 90 things from one app to another and then 10 things copy, hand edit, send

although I do have some ideas on how you could use vector similarity against past executions to get a 1-100 score on how likely a given action is to be approved or rejected. You could set a dial to "anything below 60 just auto-reject it and provide the past feedback to the model preemptively". This would need a lot of experimentation, might even be a research angle (if it hasn't been tried already)

(thinking like cosine * {1 if approved, -1 if rejected} and normalize the score 1-100. You could maybe even weight rejection in 0 to -1 based on sentiment)
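A minimal sketch of the scoring idea above, assuming past executions are already stored as (embedding, approved) pairs. The names `PastExecution` and `approval_score`, the toy two-dimensional vectors, and the linear mapping onto 1-100 are all illustrative assumptions, not anything HumanLayer ships:

```python
import math
from dataclasses import dataclass


@dataclass
class PastExecution:
    embedding: list[float]  # vector for a past tool call (hypothetical source)
    approved: bool          # True if a human approved it, False if rejected


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def approval_score(candidate: list[float], history: list[PastExecution]) -> float:
    """Mean of cosine * (+1 approved / -1 rejected), mapped from [-1, 1] to [1, 100]."""
    if not history:
        return 50.5  # no evidence either way: midpoint of the range
    signed = [
        cosine(candidate, past.embedding) * (1.0 if past.approved else -1.0)
        for past in history
    ]
    mean = sum(signed) / len(signed)
    return 1.0 + (mean + 1.0) / 2.0 * 99.0


# Toy history: one approved action and one rejected, orthogonal to each other.
history = [
    PastExecution([1.0, 0.0], approved=True),
    PastExecution([0.0, 1.0], approved=False),
]
like_approved = approval_score([1.0, 0.0], history)
like_rejected = approval_score([0.0, 1.0], history)
```

With a dial set at 60, the first candidate would go to a human and the second would be auto-rejected. Whether a plain mean is the right aggregation (versus nearest neighbours only, or sentiment-weighted rejections) is exactly the part that would need experimentation.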


> Then we removed the humans and the quality took a hit.

What are you referring to exactly?


yeah it's an interesting point. I can only guess that we didn't do a good enough job of learning from the humans while they were doing their jobs...seems like traditional ML or even LLM tech might be good enough that we can take another pass? Overall the thesis of humanlayer is that you should do all this super gradually, move the needle from 1% AI to 99%+, and have strong SLOs/SLAs around when you pause that needle moving because quality took a hit.

My favorite part of all this is that it’s inevitable. Someone has to solve agent adoption in whatever-the-environment-already-is. And nobody is doing this well at scale. Europe is mandating this. And even though Article 14 of the AI Act won’t be enforced until 2026, I’m glad projects like this are working ahead. Get after it, Dex!

this guy gets it

Required reading for everyone considering a human-in-the-loop system:

https://pluralistic.net/2024/10/30/a-neck-in-a-noose/#is-als...

I wish OP the best of luck with their product. I still think that the points Doctorow has made are important to know and consider beforehand.


This is exciting. I am an architect in a startup that has long valued bringing humans in the loop for the moments when only humans can do the work. The key thing missing between the potential seen in the last couple years of LLM-based fervor and realizing actual value for us has been the notion of control and oversight. So instead, we have built workflows and manual processes in a custom way throughout the business. Happy to discuss privately sometime! (email in profile)

Congrats on the launch! I'll be thinking about this for a while to be sure.

P.S., there is a minor typo on the URL in your BIO.


thanks - fixed! emailed you separately

Definitely a problem that everyone needs to solve.

I wonder if you can achieve this workflow by just using prompt and the new Model Context Protocol connected to email / slack.

https://www.anthropic.com/news/model-context-protocol


so I played with MCP for a while last night and I think MCP is great as a layer to pull custom tools into the existing claude/desktop/chat experience. But at the end of the day it's just a basic agentic loop over tool calls.

If you want to tell a model to send a message to slack, sure, give it a slack tool and let it go wild. Do you see a way MCP applies for outer-loop or "headless" agents that's any different from another tool-calling agent like langchain or crewai? It seems like just another protocol for tool calling over the stdio wire (WHICH, TO BE CLEAR, I FIND SUPER DOPE)


I think the fact that MCP is two ways, and tool call is asynchronous (since it's over network), means that it should be possible to code an approval flow with a prompt and a slack approval tool?

Yeah it still takes a bit of work, but it's more extendable (can swap slack to email) and more versatile (human can message agent to interrupt or ask for clarification).
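One way to picture the flow described above, as a sketch only: the tool call suspends on a future until a human decision arrives over some channel. Everything here (`ApprovalBroker`, `request_approval`, `resolve`) is a hypothetical stand-in, and the Slack or email delivery is replaced by an in-process call. It is not MCP's or HumanLayer's actual API.

```python
import asyncio


class ApprovalBroker:
    """Hypothetical in-memory stand-in for a Slack/email approval channel."""

    def __init__(self) -> None:
        self._pending: dict[str, asyncio.Future] = {}

    async def request_approval(self, call_id: str, summary: str) -> bool:
        # A real channel would post `summary` to a human here.
        future = asyncio.get_running_loop().create_future()
        self._pending[call_id] = future
        return await future  # suspends until resolve() is called

    def resolve(self, call_id: str, approved: bool) -> None:
        # Called when the human answers (e.g. from a webhook handler).
        self._pending.pop(call_id).set_result(approved)


async def guarded_tool_call(broker: ApprovalBroker, call_id: str, message: str) -> str:
    approved = await broker.request_approval(call_id, f"send: {message}")
    return f"sent: {message}" if approved else "rejected by human"


async def demo() -> list[str]:
    broker = ApprovalBroker()
    first = asyncio.create_task(guarded_tool_call(broker, "call-1", "hello"))
    second = asyncio.create_task(guarded_tool_call(broker, "call-2", "drop table"))
    await asyncio.sleep(0)  # let both calls register and block on approval
    broker.resolve("call-1", approved=True)
    broker.resolve("call-2", approved=False)
    return [await first, await second]


results = asyncio.run(demo())
```

Swapping Slack for email would only change what happens inside `request_approval` and who calls `resolve`; the agent loop itself stays the same.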

I haven't used langchain (beyond RAG) or crewai, not sure what was capable before.


You’re close. It’s not the humans in the loop in standard tasks you need though, it’s human surrogates for AI agents to do jobs they can’t for a variety of reasons (like missing a body or requiring an internet connection).

I have a request for startups for this: “GraggNet: Task Rabbit for AIs

Surrogate humans for AIs to use before robotics are human level”

https://ageof.diamonds/rfs


yeah this is cool. I saw a couple other people posting about this idea. I know some other folks working on "sourcing the humans" or doing a marketplace style thing. Thoughts on things like Payman or Protegee?

This is the first new YC launch I've seen involving AI that I am extremely positive about. I have worked with systems implementing similar functionality ad-hoc already, but seeing it as a buy-in service - and one so easy to integrate - is really cool.

From what I've seen, this will bring the implementation needs for this kind of functionality down from "engineering team" to a single programmer.


glad it resonates - and yes exactly - love the framing of "engineering team" -> single programmer.

Congratulations on launch! We’ve faced this problem with our autonomous web browsing agent https://www.donobu.com and ended up implementing a css overlay to wait for user input in certain cases. Slack would be so much better. Excited to try humanlayer out.

very cool - come ping us in discord happy to help out - we did do a demo w/ dendrite/stagehand a few weeks back where the AI can pull you into a browserbase session OR just ping you in plaintext to get things like MFA codes etc

Nice. I guess the issue is that this is such a basic i/o feature that any system with some modicum of customization can already do it.

It's like offering a service that provides storage by api for agents. Yeah, you can call the api, or call the s3 api directly or store to disk.

That said, I would try it before rolling my own.


Anecdotally, I've worked with and on a few enterprise AI apps and haven't seen this functionality in them. The closest thing i can think of is AI coding agents submitting PRs to repos.

tl;dr i agree

yeah in fact coding / PR-based workflows is one of the few areas where I don't really go super deep. GitHub PRs may have their shortcomings, but IMO it is the undisputed best review/approval queue system in existence by a mile.

i would never encourage someone to make an agent that asks human permission before submitting a PR. the PR is the HitL step


disagree with both, unless your AI agents have full root access to all your systems and access to your bank accounts and whatnot, they are at some point interfacing with other systems that have humans involved in them.

i think the slack side is easy. I think an AI-optimized email communication channel is a long ways off. I spent weeks throwing things at my monitor figuring out reliable ways to wire DNS+SES+SNS+Lambda+Webhooks+API+Datastore+Async Workers so that everything would "just work" with a few lines of humanlayer sdk.

And what we built still only serves a small subset of use cases (e.g. to support attachments there's a whole other layer of MIME type management and routing to put things in S3 and permission them properly)


>DNS+SES+SNS+Lambda+Webhooks+API+Datastore+Async Workers so that everything would "just work" with a few lines of humanlayer sdk.

What are you smoking my man?

Write a python script that begins with the 2 following lines "import openai import email "

Simple is better than complex


hmm, like, i love simplicity, and I'm open to other approaches, but I specifically wanted to solve the "send an email to an AI Agent" sort of concept, and give that agent rails to talk back to the human. Doesn't `import email` require SMTP, DNS, signing infra, etc? can it set up MX infra and receive payloads from a mail exchange?

Import email IS smtp.

MX and SPF is as trivial as with providers. Dkim might be harder. But emails will still go through.

You can also rent an smtp server and have a similarly simple infra where you log in to the server to read and send emails.

If this is too complex, I'd recommend starting with a private protocol like discord, which is designed for kids, or telegram as an intermediate step.
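For reference on the `import email` point in this subthread: in Python's standard library the `email` package only builds and parses messages, while sending goes through `smtplib`, and receiving still needs something accepting inbound mail (an MX host, an IMAP mailbox, or a provider webhook). A small sketch with placeholder addresses and no network access:

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Build an outbound message; this touches no network.
outbound = EmailMessage()
outbound["From"] = "agent@example.com"   # placeholder address
outbound["To"] = "human@example.com"     # placeholder address
outbound["Subject"] = "Approve: drop table tmp_unused?"
outbound.set_content("Reply APPROVE or REJECT.")

# Wire format. Actually sending would be a separate step,
# e.g. smtplib.SMTP(host).send_message(outbound) against a real server.
raw = outbound.as_bytes()

# Parse raw bytes, as an inbound handler would after a mail server delivers them.
inbound = message_from_bytes(raw, policy=policy.default)
subject = inbound["Subject"]
body = inbound.get_content().strip()
```

The parsing half is the part an agent-facing inbox needs; how the raw bytes reach that code is where the DNS and mail-exchange setup discussed above comes in.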


I think Human Layer is a great idea. Recently, my baby turned one year old, which made me reflect on many issues. We train AI with a lot of data but overlook the impact that decades of seemingly useless data from human growth experiences have on our brain development. As a result, humans still have an incomparable advantage over LLMs in terms of the so-called "big picture." For example, a recent experience I had was when I asked Claude 3.5-sonnet to write a bash script; it inadvertently modified the PATH variable, costing me a lot of time to fix it. Such attention to detail in work is difficult to avoid through vector db recall or manual context completion. But I believe that a true bash expert would not make such mistakes.

interesting take - thanks for reading our story- here's to the future of human-aided-bash-novices not munging our PATHs up

Congrats on the launch, this is an interesting concept. It's somewhat akin to developers approving LLM generated code changes and pull requests. I feel much more comfortable with senior developers approving AI changes to our codebase than letting loose an autonomous agent with no human oversight.

super relevant - yeah I think it was someone at anthropic who framed this as "cursor tab autocomplete, but for arbitrary API calls" - basically for everything else other than code

Congrats on the launch! Human in the loop is an underserved market for AI toolchains. I've usually had to build custom tools for this which is a PITA.

Make.com has a human in the loop feature in closed beta. https://www.make.com/en/help/app/human-in-the-loop

There's also https://www.gotohuman.com/ that uses review forms.

Looking forward to playing with HumanLayer. The slack integration looks a lot more useful for my workflows than other tools I've tried.

In the demo video and example, you show a faked LinkedIn messages integration. Do you have any recommendations for tools that can actually integrate with live LinkedIn messages?


thanks for sharing your experience so far! Like I said, we built this ourselves for another idea and it was painful.

I have played with Make and I actually chatted w/ the gotohuman guy on zoom a while back, I like his approach as well, he went straight to webhooks which makes sense for big production use cases

re: LinkedIn, no I don't know how to get agents to integrate with linkedin. I have tried a bunch of things, I know of some YC companies that tried this but I don't know how it went for them. Best I have gotten is using stagehand/dendrite with browserbase to do it with a browser, and then using humanlayer human_as_tool to ping me if it needs an MFA token or other inputs


Thanks for the reply! I've used a bunch of grey market 3rd party tools for LinkedIn automation. Most of them have some sort of API. I'll try integrating with HumanLayer.

i am gonna talk with the guy who made trykondo.com this week, I think he has a lot of experience in that area too

This is a great idea- I hope that you are wildly successful.

I’m an AI skeptic mostly because I see people rushing to connect unreasoning LLMs to real things and as a result cause lots of problems for humans.

I love the idea of human-in-the-loop-as-a-service because at least it provides some sort of safety net for many cases.

Good luck!


glad it resonates. I came at this as a skeptic but also a pragmatist. I wanted deeply to build agents that did big things, but I had very little trust in them, and you see everywhere the internet is littered with terrible gpt-generated comments and bots these days...how do you build AI that does a really good job without needing direct constant supervision (which at the end of the day just feels like a waste of time)

There is definitely a need for this.

What I don't understand from quickly skimming your description and homepage: Do you source/provide the humans in the loop? That's a good value add, but how do I automatically / manually vet how you do the routing?


great comment - today we don't provide the humans, i think there's two angles here

- providing the humans can be super valuable, especially for low-context tasks like basic labeling

- depending on the task, using internal SMEs might yield better results (e.g. tuning/phrasing a drafted sales email)


Congrats Dex! Excited to see what people build with this + tools like Stripe's new agent payments SDK (issuing a payment seems like a great place to ask permission).

wow I'm so glad you asked cuz i just shipped this https://github.com/dexhorthy/mailcrew

congrats on the launch dex! this is a problem that i've already seen come up a dozen times and many companies are building it internally in a variety of different ways. easier to buy vs. build for something like this imo, glad its being built!

awesome - glad to hear it resonates

Sounds great. Perhaps an interesting aspect: I haven't discovered the right words for it but: If it is your job, answer the fucking question. This layered approach might prove more of a gain than imagined. They might not show it but some people are terrified to ask even the first question. Others think it perfectly fine to ask 100 questions they already know the answer to.

I knew this was coming, so kudos to you all for getting out of the gate!

I've implemented this in our workflows, albeit a bit more naive: when we kick off new processes the user is given the option to "put a human in the loop" -- at which point processing halts and a user/group is paged to review the content in flight, along with all the chains/calls.

The human can tweak the text if needed and the process continues.


makes sense - glad to hear the problem resonates - if you had an extra engineer, how would you evolve what you have today?

So this is an automated foreman for the customer's own employees, like a call center controller? Or does HumanLayer provide the human labor, like Mechanical Turk?

The API contains a "human_as_tool" function. That's so like Marshall Brain's "Manna".

"Machines should think. People should work." Less of a joke every day.


I'm not sure "automated foreman for employees" is right - I always thought about it more like "a human can now manage 10-15 AI 'interns'" and review their work without having to do everything by hand - the AI still serves the human, and "human_as_tool" is a way for AI to ask for help/guidance.

> "Machines should think. People should work." Less of a joke every day.

yes. I agree. a little weird. I forget where I heard this but the other version is "we should get ai/robots to cook and do laundry so we can spend more time writing and making art...feels like we ended up the other way around"


It's generally recommended to add a meat-gap interface between AI systems to reduce unexpected results.

Meat-gap. We have your back.


BRB rebranding to meatgap.dev

How does it compare with the built-in human-in-the-loop feature from langgraph? Or CrewAI allows humaninput as well right?

great question - yeah i was actually heavily inspired by people trying to figure that stuff out on reddit back in july, and realizing that mapping that human input across slack, email, sms was never going to be a core focus for those agent frameworks

The idea is great and necessary. It doesn't seem super hard to replicate but why would anyone build their own solution if something already exists and works fine.

The thing that got me thinking... how do you make sure an LLM won't eventually hallucinate approval -- or outright lie about it, to get going?

Anyway, congrats, this sounds really cool.


At some point the real tool has to be called, at that point, you can do actual checks that do not rely on the AI output (e.g., store the text that the AI generated and check in code that there was an approval for that text).
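The check described above could look roughly like this: approvals are recorded against a hash of the exact tool arguments, and the tool wrapper refuses to run unless that hash was approved. `ApprovalLedger` and `send_email` are hypothetical names for illustration, and the in-memory set stands in for a real datastore.

```python
import hashlib
import json


class ApprovalLedger:
    """Hypothetical store of human approvals, keyed by a hash of the exact payload."""

    def __init__(self) -> None:
        self._approved: set[str] = set()

    @staticmethod
    def fingerprint(tool: str, args: dict) -> str:
        canonical = json.dumps({"tool": tool, "args": args}, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def record_approval(self, tool: str, args: dict) -> None:
        # Called by the approval channel, never by the model.
        self._approved.add(self.fingerprint(tool, args))

    def is_approved(self, tool: str, args: dict) -> bool:
        return self.fingerprint(tool, args) in self._approved


def send_email(ledger: ApprovalLedger, args: dict) -> str:
    # Deterministic gate: the model claiming "approved" changes nothing here.
    if not ledger.is_approved("send_email", args):
        raise PermissionError("no human approval recorded for this exact payload")
    return f"sent to {args['to']}"


ledger = ApprovalLedger()
approved_args = {"to": "customer@example.com", "body": "Your refund is on its way."}
ledger.record_approval("send_email", approved_args)
```

Because the lookup is on the exact payload, an agent that edits the text after approval, or simply asserts that approval happened, hits the `PermissionError` instead of the real side effect.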

yeah I think that's right - we put humanlayer in between the non-deterministic (LLM Decision) and the deterministic code (tool execution logic)

The biggest problem is that a lot of times "approval" requires domain knowledge / specific training so handing it off to some random dude will result in lots of errors, no better than just having some AI model decide.

Not to mention anything that requires a quicker response rate.


I get the impression the "random dude" is the expert you are already paying $100k/yr for. It sends them a slack message, email, etc. The sales pitch is you don't have to code that microservice so you can ship your AI a bit faster. Which is fair enough, since most SaaS is 99% other PaaS/SaaS + some core unique business factor thrown on top. However I worry about how much this company is charging. It is pricey.

Is it possible to connect it to existing website chat widget apps like tawk?

Also, caught a few typos on the website: https://triplechecker.com/s/992809/humanlayer.dev


thanks - great catch. cool service :)

Congrats! Looking forward to getting HumanLayer integrated into our stuff

nice man welcome aboard

Congrats on the launch. Seems quite interesting!

My only question/feedback is: What policies do you have in place to prevent bad outsourcing or exploitative behavior like what has been done by big-tech companies that turn a blind eye to what happens in, for example, Kenya [0] for the training of AI models?

[0]: https://youtu.be/qZS50KXjAX0


Just an idea: having a little widget in the MacOS menu bar that pops up or sends you a notification to solve a human task wouldn't be so terrible either.

ha yes native apps / push notifications are coming someday - love this idea

So many uses for this. Excited to see how it develops.

thanks! What's your favorite potential use case.

I work in operations/finance. I've experimented with integrating LLMs into my workflow. I would not feel comfortable 'handing the wheel' to an LLM to make actions autonomously. Something like this to be able to approve actions in batches, or approve anything external facing would be useful.

Loving that you guys have TypeScript support from day one!

hah thanks dude! I am very bullish on TS as the long term thing, Not to turn this into a language vs language thread but I spend a lot of time thinking about why ppl struggle so much with python...so far I came up with

- concurrency abstractions keep changing (still transitioning / straddling sync+threads vs. asyncio) - this makes performance eng really hard

- package management somehow less mature than JS - pip been around way longer than npm but JS got yarn/lockfiles before python got poetry

- the types are fake (also true of typescript, I think this one is a wash)

- the types are fake and newer. typing+pydantic is kinda bulky vs. TS having really strong native language support (even if only at compile time)

- virtual environments!?! cmon how have we not solved this yet

- wtf is a miniconda

- VSCode has incredible TS support out of the box, python is via a community plugin, and not as many language server features


I am okay with a counterfactual alternate future where some disproportionately powerful entity squeezes Python out of the market: Big TypeScript - funded by a PAC. Offshore accounts. Culprit: random rich Googler who lost an argument to Guido Van Rossum 10 years ago.

lol this is why i come to this site

100%. I build edgechains (https://github.com/arakoodev/EdgeChains/) and am a super JS/TS maxi for genai applications.

I feel like HumanLayer is a great idea, but decision fatigue and bystander effects could pose challenges. If people are overloaded with approvals or don't feel ownership over what they're verifying, the quality of oversight might drop. Also, even if an action is approved, you still have to make sure the agent doesn't hallucinate at the execution phase.

Don’t worry, that’s why we’re launching AI4HumanLayer.ai.io.

Tired of those pesky review requests? Can’t be bothered to read an email let alone a complicated AI approval context? Want to improve your response time by 500% while displaying that Real Human Intervention badge? Now you can with AI4HumanLayer!


This seems generic enough that it could almost be applied to any use case. Have you considered captcha as a use case?

If you're talking about CAPTCHA solving as a service, that already exists, and the cost is measured in mere dollars per thousand CAPTCHAs solved.

Why the "if"? Of course, I was talking about captcha, is the regex parser in your brain case sensitive?

No need to resort to ad hominems; I just wasn't sure what you meant by "CAPTCHA as a use case" and wanted to point out that it's possible to do much better than $0.1/CAPTCHA solved.

that's a great idea - I put together one example for getting an MFA code for a website, but the captcha thing "pull a human into a web session" is something I've wanted to play with for a while

I think at some point, the term API should be replaced with another acronym to emphasize humans as the focal point.

SWE Agent coined "agent-computer-interface" based on HCI. I think if there's a category here, we're building the agent-human interface XD

ACI doesn't have the same ring to it, if only there were a way to replace that I with an E.

Have you built a reverse centaur system? https://pluralistic.net/2024/04/23/maximal-plausibility/

Neat, this could be a step forward from using something like n8n to manage processes, input and reviews.

We've built our own solution for that, which we will soon be open-sourcing. It uses postgres in the backend and has a fully configurable draft box and scheduling, costing a fraction of what you're charging.

For more info and early testing, contact us at a@kundi.com


Poor form.

This is great. I can let you pay for that flight once the agent finds a match.

you get it

Hey Dex! Congrats on the launch - excited to see the response here :)

Congrats on the launch! Just commenting to wish you guys good luck

thanks!


So this is flipping the Human-AI working model and basically using the human as the tool?

this is the AI-induced offshoring in the making ;)

The limits of LLM capabilities will cause AI agents to displace people from warehouses/offices to their home doing conceptually the same job. And at a much lower salary, since they'll compete against anyone in the world with internet access.


Awesome! Congrats on the launch

Are you a solo founder Dexter?

i am for now. Been casually on the lookout for some other super dope builders but it's not a process you can control outside passive looking, and definitely not something to rush

glad yc is still funding solo devs.

we are certainly out here

Congrats on the launch Dex! A long way from the Metalytics days.

Can’t wait to try this out.


Congrats on the launch Dex!

Proud to have helped edit an earlier draft of this — go Dexter go!

Very cool! I'm trying this out for our agents system!

Congrats on the launch!!

thanks!

Looks super interesting

thank you for checking it out! what sorts of experiences have you had with agents so far?

Congrats on the launch! Big fan of what you guys are doing.

Great work there!!

Let's go Dex, congrats on the launch!

thanks dude!

Looks amazing! (Also, I've known Dexter since before Human Layer and he's a force of nature. If you think this is interesting now, you're going to be amazed at where it goes)

thank you! stoked for what's coming

Super useful

This is just so good. Congrats!

This is so sick

> $20/200

reduce your fractions ffs


ha fair enough - i think there's another comment thread on just being open w/ 10c / call and i wanna try that out

nice!

Congrats on the launch! Definitely a needed product. BTW, your docs link is broken, but working docs link is here: https://www.humanlayer.dev/docs/introduction

thank you! I updated the post and it should be fixed now!

Docs link is broken; https://www.humanlayer.dev/docs

oh wow! thank you! fixing!

Hold up, is that illustrious Sprout Social alumni Dex Horthy? If you and Ravi are in SF we should catch up after the holidays.

shoot me an email or find me on linkedin and lets catch up

Oh look! Corrupt Dang made another launch HN a top post.

So much corruption on this website.


Launch HNs for YC startups get placed on the front page. This is in the FAQ: https://news.ycombinator.com/newsfaq.html.

The instructions for YC founders are here: https://news.ycombinator.com/yli.html, if anyone wants to take a look.

I think most people here consider it fair that HN gives certain things back to YC in exchange for funding it.


Ideally these posts should have some sort of "promoted" tag next to them.

Hiring humans to do a consistent job is gonna be a nightmare and a limit on the scalability of the service. How are you defining your service level agreements?

This really makes you take a step back and just consider the world we're in now: someone critiques a company's approach as unscalable because...

"hiring humans is a nightmare"

Good LLord


lol i like LLord I might steal that

They aren't providing the humans. Just the tools for integrating human input/oversight.

this is correct - I think helping you BYO humans will help you get much better training/labeling than outsourcing anyways, and that's the end vision of all of this - use humans to train agents so someday you might not need human in the loop, and those humans can move on to training/overseeing the next agent/application you're building


