This is honestly quite a detailed and thoughtfully put together post. I do have some questions and would love to hear your thoughts on them. First off, can I use just the model itself? Do you have models hosted somewhere, or do they run locally? If they run locally, what are the system requirements? Can I build RAG-based applications on Arch? And how do you do intent detection in multi-turn dialogue? How does parameter gathering work? Is the model capable of conversing with the user to gather parameters?
Arch-Function, our fast, open-source LLM, does most of the heavy lifting: extracting parameter values from a user prompt, gathering more information from the user, and determining the right set of functions to call downstream. It's designed for smarter RAG scenarios and agentic workflows (like filing an insurance claim through prompts). While you can use the model by itself, archgw offers a framework around it to detect hallucinations and re-prompt the LLM when token logprobs are low.
The same model is currently being updated to handle complex multi-turn intent and parameter-extraction scenarios, so that the dreaded follow-up/clarifying RAG use case can be handled effortlessly by developers without resorting to complex LLM pre-processing. In essence, if the user's follow-up question is "remove X", their RAG endpoint gets structured information about the prompt and refined parameters, against which developers simply have to retrieve the right chunks for summarization.
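As a rough sketch of what that could look like (the payload shape and field names below are a hypothetical illustration, not the gateway's actual schema):

```python
# Hypothetical illustration of the structured information a RAG endpoint might
# receive across two turns; field names are assumptions for illustration only.
first_turn = {
    "prompt": "what do my policies say about X and Y?",
    "intent": "search_policies",
    "parameters": {"topics": ["X", "Y"]},
}

# Follow-up turn: "remove X" -> the gateway refines the previous parameters,
# so the developer doesn't have to re-derive them with their own LLM pre-processing.
follow_up = {
    "prompt": "remove X",
    "intent": "search_policies",
    "parameters": {"topics": ["Y"]},
}
```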
Thanks :) those are great questions. Let me respond to them one by one.
> Can I use just the model itself?
Yes - our models are on Hugging Face. You can use them directly.
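A minimal sketch of using one of them with transformers (the model id below is an assumption; check the Hugging Face org for the exact name, and see the model card for the tool-calling prompt format):

```python
# Sketch: load the function-calling model from Hugging Face and run one turn.
# NOTE: the model id is an assumption; check the katanemo org for the exact name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "katanemo/Arch-Function-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Function-calling models are usually prompted with the available tool schemas
# plus the user message; the exact format is defined on the model card.
messages = [{"role": "user", "content": "what's the weather in Seattle today?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```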
> Do you have models hosted somewhere or they run locally? If they run locally what are the system requirements?
Arch gateway does a bunch of processing locally; for example, for intent detection and hallucination detection we use an NLI model. For function calling we use a hosted version of our 1.5B function-calling model [1]. We use vLLM to host our model, but vLLM is not supported on Mac. There are other issues with running the model locally on Mac too; for example, Docker doesn't support giving containers GPU access on Mac. We tried using Ollama in the past to host the model, but Ollama doesn't support exposing logprobs. We do have an open issue on this [2] and will improve it soon.
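To make the logprobs point concrete, here is a rough sketch of that setup (model id and port are assumptions): serve the model with vLLM's OpenAI-compatible server and request per-token logprobs on each completion, which is the signal the gateway needs for its hallucination checks.

```python
# Sketch: query a vLLM OpenAI-compatible server and request token logprobs.
# Assumes the server was started with something like:
#   vllm serve katanemo/Arch-Function-1.5B --port 8000
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="katanemo/Arch-Function-1.5B",  # assumed model id
    messages=[{"role": "user", "content": "how is the weather?"}],
    logprobs=True,       # per-token logprobs (not exposed by Ollama today)
    top_logprobs=1,
    max_tokens=64,
)

# Inspect the confidence of each generated token.
for tok in resp.choices[0].logprobs.content:
    print(tok.token, tok.logprob)
```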
> Can I build RAG based applications on arch?
Yes, you can. You would need to host a vector DB yourself; Arch doesn't host one because we wanted to keep our infra simple and clean. We do have a default target that you can use to build a RAG application. For an example, see the insurance agent demo [3]. We also have an open issue on building a full RAG demo [4]; +1 it to show your support.
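For a feel of what the downstream side could look like, here is a minimal sketch of a RAG endpoint that the gateway forwards prompts to; the request shape, field names, and the vector-DB helper are assumptions for illustration, not the gateway's actual contract.

```python
# Sketch of a downstream RAG endpoint behind the gateway.
# Request shape and field names are hypothetical; plug in your own vector DB
# (Qdrant, Milvus, pgvector, ...) in search_vector_db.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class RagRequest(BaseModel):
    query: str          # the (possibly refined) user prompt
    filters: dict = {}  # structured parameters extracted upstream

def search_vector_db(query: str, filters: dict, top_k: int = 5) -> list[str]:
    # Placeholder for your own vector DB lookup.
    return [f"chunk matching '{query}' with filters {filters}"]

@app.post("/rag/search")
def rag_search(req: RagRequest):
    chunks = search_vector_db(req.query, req.filters)
    return {"chunks": chunks}
```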
> How does parameter gathering work, is the model capable of conversing with the user to gather parameters?
Our model is trained to engage in dialogue if a parameter is missing, because it has seen examples of missing parameters during training. During our evals and tests we found that the model could still hallucinate; e.g. for the question "how is the weather", the model could hallucinate the city as "LA" even though LA was not specified in the query. We handle hallucination detection in Arch using an NLI model to establish entailment of parameter values from the input query. BTW, we are currently working on improving that part by quite a lot. More on that in the next release.
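A rough sketch of that kind of entailment check, using an off-the-shelf MNLI model (not necessarily the exact NLI model Arch ships with):

```python
# Sketch: check whether an extracted parameter is entailed by the user query,
# using an off-the-shelf NLI model. Model choice and threshold are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

nli_id = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(nli_id)
nli = AutoModelForSequenceClassification.from_pretrained(nli_id)

premise = "how is the weather"                               # the user query
hypothesis = "The user is asking about the weather in LA."   # claim implied by the extracted parameter

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = nli(**inputs).logits
probs = logits.softmax(dim=-1)[0]

# For bart-large-mnli the label order is [contradiction, neutral, entailment].
entailment_prob = probs[2].item()
if entailment_prob < 0.5:
    print("Parameter 'city=LA' looks hallucinated; ask the user to clarify.")
```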