I found myself building a bunch of LLM-backed features that needed to use tool calling, and some of those tools involved doing things that were somewhat high stakes - communicating on my behalf or modifying shared / production data.
One example: I wanted to replace a marketing website with a chatbot + vector DB loaded with the previous content, docs, and blog posts. Between hallucinations, missing knowledge base info, and the LLM generally writing like a pseudo-intellectual high schooler, I realized I couldn't trust it to communicate unsupervised with my website visitors. I needed a way to improve the percentage of high-quality responses, and to do this at scale.
I wired up a prototype that would:
1. consult me in Slack before sending any response down to a website visitor,
2. incorporate my feedback into the knowledge base,
3. reformulate answers until I approved the message, and
4. send it to the visitor
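For anyone who wants to see the shape of that loop, here is a minimal sketch of the four steps. It is not the HumanLayer API or the original prototype: `Review`, `KnowledgeBase`, `answer_with_approval`, and the `draft` / `review` / `send` callables are all hypothetical names. In a real setup `draft` would call the LLM plus vector DB, and `review` would post to Slack and wait for a reply.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Review:
    """A human reviewer's verdict on one candidate answer (hypothetical type)."""
    approved: bool
    feedback: str = ""


@dataclass
class KnowledgeBase:
    """Stand-in for a vector DB: just a list of text notes (an assumption)."""
    notes: List[str] = field(default_factory=list)

    def add(self, note: str) -> None:
        self.notes.append(note)


def answer_with_approval(
    question: str,
    kb: KnowledgeBase,
    draft: Callable[[str, KnowledgeBase], str],
    review: Callable[[str, str], Review],
    send: Callable[[str], None],
    max_rounds: int = 5,
) -> Optional[str]:
    """Draft, ask a human, fold feedback into the KB, retry; send only on approval."""
    for _ in range(max_rounds):
        candidate = draft(question, kb)        # (re)formulate an answer
        verdict = review(question, candidate)  # step 1: consult the human
        if verdict.approved:
            send(candidate)                    # step 4: deliver to the visitor
            return candidate
        if verdict.feedback:
            kb.add(verdict.feedback)           # step 2: feedback goes into the KB
        # step 3: loop around and draft again with the updated KB
    return None  # never approved within max_rounds, so nothing was sent


def demo():
    """Toy run: the reviewer rejects once with feedback, then approves."""
    kb = KnowledgeBase()
    sent: List[str] = []

    def draft(q: str, kb: KnowledgeBase) -> str:
        return f"{q} -> draft using {len(kb.notes)} note(s)"

    def review(q: str, candidate: str) -> Review:
        if "1 note" in candidate:
            return Review(approved=True)
        return Review(approved=False, feedback="Mention the docs page.")

    result = answer_with_approval("What does it cost?", kb, draft, review, sent.append)
    return result, sent, kb.notes
```

The `max_rounds` cap is my own addition so a stubborn draft can't ping the reviewer forever. If it runs out, nothing gets sent.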
That prototype evolved into what is now HumanLayer: https://github.com/humanlayer/humanlayer#why-humanlayer
I've sorta made ad-hoc systems that do this myself, but didn't think to open source them.
One thing that caught my eye was the part about incorporating feedback into the knowledge base. Can you elaborate on how you handle that?