It's really useful to be able to specify the search space for a specific query (example: Canary allows search for the query "sagemaker" on our docs or on our github issues )
What local/in-K8-cluster models servers would you recommend adding ?
Should we add support for llama.cpp and vllm.ai in the proxy server ? Or should we assume you can host them on your own infra and the proxy server requests your hosted model ?
IMO don’t try to be the one stop shop to host models. There are too many players with all sorts of advancements (eg: stopping grammar, continuous batching, novel quantization etc.) and you won’t be able to keep up.
There is a ton of boilerplate around the actual model server that’s just busy work , but if done wrong can be a huge performance suck. Solve that.
Build the proxy that works with the most model servers out there. Do it in a way that once you have mindshare, the model server makers will be find it easy to put up a PR so that they can claim your proxy supports their server.
Don’t take a hard dependency on non-OSS stuff - being able to build an “on-prem” solution (read “deployed into customer’s VPC”) is table stakes for anyone to use your offering for a lot of enterprise use cases.
Edit: another unsolved problem - different models need slightly different prompts to solve the same problem well…
If it makes sense to expand scope to provide a particular model server and the group can easily be the best st it, I say go for it. But do it as a separate (but perhaps connected) project to this.
But in general I’m in agreement that this sounds like a separate concept than any given model server.
That said, where is a list of model servers for the most commonly wanted LLMs at this point?
Perhaps maintaining a list of those that do and don’t work with the proxy would be helpful.
We're adding new integrations every day, so if there's any specific one you'd like to add feel free to let us know (discord/ticket/email/etc.) - here's my email: krrish@berri.ai
OpenSwarm uses LiteLLM to add support for any LLM AnthropicAI, MistralAI, Ollama, Huggingface, GroqInc, Replicate