Thanks for those comments :) Those are great questions - let me respond to them one by one.
> Can I use just the model itself?
Yes - our models are on Hugging Face, so you can use them directly.
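For example, here's a minimal sketch of pulling the gguf weights straight from Hugging Face with llama-cpp-python (the exact filename pattern is an assumption - check the repo's file list for the quantization you want):

```python
# Minimal sketch: download and load the gguf model via llama-cpp-python.
# The filename glob below is an assumption; pick the actual quant file
# from the repo [1].
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="katanemo/Arch-Function-1.5B.gguf",
    filename="*Q4_K_M.gguf",  # assumed quantization
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What's the weather in Seattle?"}],
)
print(out["choices"][0]["message"]["content"])
```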
> Do you have models hosted somewhere or they run locally? If they run locally what are the system requirements?
Arch gateway does a bunch of processing locally - for example, for intent detection and hallucination detection we use an NLI model. For function calling we use a hosted version of our 1.5B function-calling model [1]. We use vLLM to host our model, but vLLM is not supported on Mac, and there are other issues with running the model locally on Mac too: for example, Docker doesn't support giving containers GPU access on Mac. We tried using Ollama in the past to host the model, but Ollama doesn't support exposing logprobs. We do have an open issue tracking this [2] and will improve it soon.
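For reference, here's a hedged sketch of what the hosted setup looks like from the client side, assuming vLLM's OpenAI-compatible server (the model name and port are assumptions). The logprobs field is exactly the piece Ollama doesn't expose:

```python
# Sketch, not our exact setup: vLLM serves an OpenAI-compatible endpoint
# (started with something like `vllm serve katanemo/Arch-Function-1.5B`),
# and the caller asks for per-token logprobs, which we use for
# confidence/hallucination scoring. Port and model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="katanemo/Arch-Function-1.5B",
    messages=[{"role": "user", "content": "How is the weather?"}],
    logprobs=True,
    top_logprobs=5,  # per-token alternatives
)

for tok in resp.choices[0].logprobs.content:
    print(tok.token, tok.logprob)
```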
> Can I build RAG based applications on arch?
Yes, you can. You would need to host the vector DB yourself - in Arch we don't host a vector DB, because we wanted to keep our infra simple and clean. We do have a default target that you can use to build a RAG application; for an example, see the insurance agent demo [3]. We also have an open issue on building a full RAG demo [4] - +1 it to show your support.
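As a rough sketch of the shape of it (the vector DB lookup is a hypothetical placeholder and the gateway address/port is an assumption - the insurance demo [3] shows the real wiring):

```python
# Hedged sketch of a RAG flow in front of the gateway: you run your own
# vector DB, retrieve context for the query, and send the augmented
# prompt through the gateway's OpenAI-compatible endpoint.
from openai import OpenAI

def retrieve(query: str, k: int = 3) -> list[str]:
    # Placeholder for your own vector DB lookup (Qdrant, pgvector, ...).
    return ["policy doc snippet 1", "policy doc snippet 2"][:k]

# Gateway address is an assumption; see the demo configs for the real one.
client = OpenAI(base_url="http://localhost:12000/v1", api_key="EMPTY")

query = "What does my policy cover for water damage?"
context = "\n".join(retrieve(query))

resp = client.chat.completions.create(
    model="gpt-4o",  # whatever upstream model the gateway routes to
    messages=[
        {"role": "system", "content": f"Answer using this context:\n{context}"},
        {"role": "user", "content": query},
    ],
)
print(resp.choices[0].message.content)
```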
> How does parameter gathering work, is the model capable of conversing with the user to gather parameters?
Our model is trained to engage in dialogue when a parameter is missing, because it saw examples of missing parameters during training. During our evals and tests we found that the model could still hallucinate: for the question "how is the weather", it might fill in the city as "LA" even though LA was never specified in the query. We handle hallucination detection in Arch with an NLI model that checks whether each extracted parameter is entailed by the input query. BTW, we are currently working on improving that part quite a lot - more on that in the next release.
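Conceptually the entailment check looks something like this (the specific NLI model here is an assumption, not necessarily what Arch ships with):

```python
# Sketch of parameter entailment via an off-the-shelf NLI model:
# premise = the user query, hypothesis = a statement asserting the
# extracted parameter. A non-ENTAILMENT label flags the parameter as
# likely hallucinated, so the gateway asks the user instead.
from transformers import pipeline

nli = pipeline("text-classification", model="microsoft/deberta-large-mnli")

query = "how is the weather"
extracted = {"city": "LA"}  # what the function-calling model produced

for name, value in extracted.items():
    result = nli({"text": query, "text_pair": f"The {name} is {value}."})
    # e.g. label NEUTRAL -> "LA" is not entailed by the query -> ask the user
    print(name, value, result)
```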
[1] https://huggingface.co/katanemo/Arch-Function-1.5B.gguf
[2] https://github.com/katanemo/archgw/issues/286
[3] https://github.com/katanemo/archgw/blob/main/demos/insurance...
[4] https://github.com/katanemo/archgw/issues/287