Way back before ChatGPT, I finetuned a model (curie, I believe) on OpenAI. However, the model was always so slow to warm up that I couldn't use it at scale, and I had to go back to the generic API in the end.
Have things changed? If you finetune a model now, can you make a thousand requests to it and reasonably expect a response?
And relatedly, finetuning remains incredibly good value from what I can see (I was shocked when I first did it with curie that it only cost about $200!). For great clients, is anybody out there finetuning a separate model for each client? I manage Product for an app with power users, and we have quite a lot of history/intent information. There's one generative AI use case they already like as it stands, but it's a bit too "generic". In this case RAG is not as useful, since it's "always new" work.
If you fine tune an OpenAI model, you can fine tune another model with little effort. Just use the model you prefer: today it might be something from OpenAI, but it could also be a small Mistral. It's worth doing a few benchmarks.
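For reference, the whole workflow these days is basically a JSONL file of chat examples plus one API call. Here's a rough sketch with the OpenAI Python SDK (the file path, base model name, and example content are placeholders, not recommendations):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Training data is a JSONL file, one chat example per line, e.g.:
# {"messages": [{"role": "system", "content": "..."},
#               {"role": "user", "content": "..."},
#               {"role": "assistant", "content": "..."}]}
training_file = client.files.create(
    file=open("client_acme_train.jsonl", "rb"),  # placeholder path
    purpose="fine-tune",
)

# Kick off the fine-tuning job against whatever base model is current
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # placeholder base model
)

# Check on the job later; once it succeeds, the fine-tuned model
# is called like any other chat model
job = client.fine_tuning.jobs.retrieve(job.id)
if job.status == "succeeded":
    response = client.chat.completions.create(
        model=job.fine_tuned_model,
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)
```

Since the real work is preparing that JSONL dataset, reusing it (reformatted as needed) against another provider's fine-tuning endpoint is usually a small lift, which is why benchmarking a couple of base models is cheap to try.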