Show HN: HumanLayer – Human-in-the-Loop for AI Agents (github.com/humanlayer)
5 points by dhorthy 80 days ago | 2 comments
I found myself building a bunch of LLM-backed features that needed to use tool calling, and some of those tools involved doing things that were somewhat high stakes - communicating on my behalf or modifying shared / production data.

one example - I wanted to replace a marketing website with a chatbot + vector DB loaded with the previous content, docs, and blog posts. Between hallucinations, missing knowledge base info, and the LLM generally writing like a pseudo-intellectual high schooler, I realized I couldn't trust it to communicate unsupervised with my website visitors. I needed a way to improve the percentage of high-quality responses, and to do it at scale.

I wired up a prototype that would

1. consult me in slack before sending any response down to a website visitor

2. incorporate my feedback into the knowledge base

3. reformulate answers until I approved the message

4. send it to the visitor

That prototype evolved into what is now HumanLayer https://github.com/humanlayer/humanlayer#why-humanlayer




nice problem to solve!

i've sorta made ad-hoc systems that do this myself, but didn't think to open source them

one thing that caught my eye was the part about incorporating feedback into the knowledge base - can you elaborate how you handle that?


great question and thanks for checking it out! I've talked to a number of folks who have built small/simple versions of this for various workflows.

The idea for incorporating feedback into the knowledge base is still coming together. In the prototype, the LLM classifies the response as an approval or not, and if it's a rejection, the LLM tries to distill facts/ideas out of the response, e.g. "BigCorp and Acme.com are also using XYZ product" or "to learn more about pricing, you can book a meeting at LINK".

In the prototype, it then did a function call to add those as small chunks to the vector store, but you could also orchestrate that transparently if you didn't want to rely on the LLM reliably calling an `add_to_knowledge_base` function.
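Roughly, the "orchestrate it transparently" variant looks like this. Hypothetical sketch: `classify` and `extract_facts` stand in for LLM calls, and the "vector store" is just a list of text chunks.

```python
# Classify reviewer feedback as approval vs. rejection; on rejection,
# distill short facts and add them to the store directly in code, rather
# than trusting the LLM to call an add_to_knowledge_base tool.

def classify(feedback: str) -> bool:
    # Stand-in for an LLM judging approval; a real version would prompt a model.
    return feedback.strip().lower() in {"lgtm", "approved", "ship it"}

def extract_facts(feedback: str) -> list[str]:
    # Stand-in for an LLM distilling standalone facts from free-form feedback.
    return [s.strip() for s in feedback.split(".") if s.strip()]

def incorporate_feedback(feedback: str, store: list[str]) -> bool:
    """Return True on approval; otherwise add distilled facts as chunks."""
    if classify(feedback):
        return True
    for fact in extract_facts(feedback):
        store.append(fact)  # orchestrated here, not via an LLM tool call
    return False
```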

Longer term, I like the idea that I first heard of in BabyAGI, which is to store the messages leading up to an approval + the approval result in a vector DB, and use those historical approvals to derive a confidence score for whether a particular action will be approved.

That stuff's more whiteboard stage than in code yet but I think it could be built.
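The whiteboard version of that confidence score might look something like this. Entirely hypothetical: `embed` is a toy letter-frequency embedding standing in for a real embedding model, and `history` stands in for the vector DB of past (action, approved) pairs.

```python
import math

def embed(text: str) -> list[float]:
    # Toy embedding: normalized letter-frequency vector. A real system
    # would call an embedding model here instead.
    counts = [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def approval_confidence(action: str, history: list[tuple[str, bool]],
                        k: int = 3) -> float:
    """Fraction of the k most similar past actions that were approved."""
    query = embed(action)
    ranked = sorted(history, key=lambda h: cosine(query, embed(h[0])),
                    reverse=True)
    top = ranked[:k]
    return sum(approved for _, approved in top) / len(top) if top else 0.0
```

With a score like this you could auto-approve above some threshold and only escalate borderline actions to a human, which is one way the approval volume could shrink over time.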




