
If you wanted to compare OpenAI models against Anthropic's or Google's, wouldn't the framework help a lot? Breaking APIs are more a matter of bad framework development than of frameworks in general.

I think frameworks tend to provide an escape hatch. LlamaIndex comes to mind. It seems to me that by not learning and using an existing framework, you're building your own, which is a calculated tradeoff.
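To make the escape-hatch point concrete: the usual pattern is a wrapper that still exposes the raw provider client when the abstraction gets in the way. A minimal sketch of that pattern (the wrapper class is hypothetical, not LlamaIndex's actual API):

    from openai import OpenAI

    class FrameworkLLM:
        """Hypothetical framework-style wrapper around one provider."""

        def __init__(self, model: str):
            self.model = model
            self._client = OpenAI()  # the raw provider client

        def complete(self, prompt: str) -> str:
            # The framework abstraction: one call in, one string out.
            resp = self._client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content

        @property
        def raw_client(self) -> OpenAI:
            # The escape hatch: hand back the underlying client so callers
            # can reach provider-specific features the wrapper doesn't cover.
            return self._client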




That's a good use case and a good problem to have, certainly the kind of answer I wanted to hear, but it's not a problem I've had yet.

Moreover, I absolutely expect to have to update my prompts if I have to support a different model, even if it's a different model from the same provider. For example, there is a difference in behavior between GPT-4 Turbo and GPT-4o, even though both are by OpenAI.

Specific LLMs have specific tendencies and preferences which one has to work with. What I'm saying is that the framework will help, but it's not as simple as switching the model class.
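Concretely, that means keying prompts by model rather than sharing one template. A rough sketch of what I mean (the prompt text and registry are made up):

    # Hypothetical per-model prompt registry: the task is the same, but the
    # phrasing is tuned to each model's tendencies.
    SUMMARIZE_PROMPTS = {
        "gpt-4-turbo": "Summarize the following text in three bullet points:\n\n{text}",
        "gpt-4o": "You are a concise editor. Reduce this text to three bullets:\n\n{text}",
    }

    def build_prompt(model: str, text: str) -> str:
        # Fall back to a generic template for models we haven't tuned yet.
        template = SUMMARIZE_PROMPTS.get(
            model, "Summarize in three bullet points:\n\n{text}"
        )
        return template.format(text=text)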


I'm not quite understanding how needing different prompts for different models reduces the attractiveness of a framework. A framework could theoretically include an LLM evals package to run continuous experiments with all prompts across all models.
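Concretely, such an evals package could boil down to a prompt × model matrix run on a schedule. A toy sketch, where the model names are illustrative and `complete` and `score` are stand-in hooks the harness would supply:

    from itertools import product

    MODELS = ["gpt-4o", "claude-3-5-sonnet", "gemini-1.5-pro"]
    PROMPTS = {
        "summarize_v1": "Summarize the following text in three bullets: {text}",
        "summarize_v2": "As a concise editor, reduce this to three bullets: {text}",
    }

    def run_eval_matrix(complete, score):
        # complete(model, prompt) -> str and score(output) -> float are
        # stand-ins for whatever the eval harness provides.
        results = {}
        for (name, prompt), model in product(PROMPTS.items(), MODELS):
            output = complete(model, prompt)
            results[(name, model)] = score(output)
        return results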

Also theoretically, an LLM framework could estimate costs, count tokens, offer a variety of chunking strategies, and unify the more sophisticated APIs, like tools or agents, all of which can vary from provider to provider.
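Token counting is the easiest of those to show: with OpenAI's tiktoken library you can count tokens and estimate cost before sending a request. A sketch (the per-1k-token price is a placeholder, not a real rate):

    import tiktoken

    def estimate_cost(text: str, usd_per_1k_tokens: float = 0.01) -> tuple[int, float]:
        # cl100k_base is the tokenizer used by the GPT-4 family; other
        # providers tokenize differently, which is exactly the kind of
        # detail a framework could paper over.
        enc = tiktoken.get_encoding("cl100k_base")
        n_tokens = len(enc.encode(text))
        return n_tokens, n_tokens / 1000 * usd_per_1k_tokens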

Admittedly, this view comes just from early product explorations, but a framework was helpful for most of the reasons above (I didn't find an evals framework I liked).

You mentioned not having this problem yet. What kinds of problems have you been running into? I'm wondering if I'm missing some other context.



