You build it on top of the API to bootstrap and prototype with a very high-quality model that takes minutes to get started with. Then, once you have operated for a while, you have a ton of logged data to train/retrieve/analyze, and you can swap in a competitor, downgrade the OA model, finetune instead, or something.
In OP's case, this worked perfectly: he built a working product quickly and easily, and... discovered that it was a product he didn't want to work on regardless of token cost, cut his losses, did the post-mortem, and moved on with his life.
Imagine if he had wasted a year & a ton of money building his own LLM "because I'll need to economize on tokens and have to build a custom LLM" before he could even launch!