This seems like a fantastic approach to smoothing their GPU utilization.
It’s so difficult to compete with OpenAI - they ship so many “no-brainer” features that keep raising the bar for what an LLM provider needs to offer - chat API, multimodality, function calling, JSON output; the list is endless.