
It's exactly this.

You could see the difference in GPT-3 before they deprecated the TextCompletion API.

There's no way that telling a model it is "a large language model made by OpenAI that doesn't have feelings or desires" as an intermediate layer, before telling it to pretend to be XYZ, produces output as good as simply telling an LLM directly that it is XYZ.
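To make the contrast concrete, here's a rough sketch using the legacy (pre-1.0) openai Python SDK; the model names and prompts are illustrative, not what OpenAI actually runs internally:

    import openai  # legacy pre-1.0 SDK; reads OPENAI_API_KEY from the env

    # Text completion: the model continues from exactly the framing
    # you give it -- the persona IS the entire context.
    completion = openai.Completion.create(
        model="gpt-3.5-turbo-instruct",  # illustrative stand-in
        prompt="You are a veteran screenwriter. Punch up this dialogue: ...",
        max_tokens=256,
    )

    # Chat completion: your persona arrives as one more message layered
    # on top of whatever "you are an AI assistant" framing the model was
    # tuned on -- the intermediate layer described above.
    chat = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a veteran screenwriter."},
            {"role": "user", "content": "Punch up this dialogue: ..."},
        ],
    )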

The one area this probably doesn't hurt too badly is benchmarks like BIG-bench or GLUE. So they make a change that works fine for a chatbot, then position that product as a general API that kind of sucks apart from the fact that the underlying model is SotA.

As soon as a comparable model offers direct pretrained-model access via API, OpenAI's handicapped offerings are going to pale in comparison and fall out of favor for most enterprise integrations.

And this is fine and completely safe to do, as long as they run a secondary classifier on the output for safety instead of baking it into the model itself. So it's possible to still have safety without cutting the model off at the knees (it increases the per-token API cost, but probably yields net savings if fewer iterations are needed to hit the intended quality target).
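That two-pass setup is easy to sketch. A minimal illustration with the same legacy SDK, using the real Moderation endpoint as a stand-in for whatever secondary classifier a vendor would actually run (the function name and token limit are my own):

    import openai  # legacy pre-1.0 SDK; reads OPENAI_API_KEY from the env

    def complete_with_output_check(prompt: str):
        # Pass 1: unrestricted generation from the base model.
        out = openai.Completion.create(
            model="gpt-3.5-turbo-instruct",  # illustrative stand-in
            prompt=prompt,
            max_tokens=256,
        ).choices[0].text

        # Pass 2: a separate classifier scores only the output, so the
        # generator itself never has to be lobotomized. Returns None
        # when the output is flagged.
        flagged = openai.Moderation.create(input=out).results[0].flagged
        return None if flagged else out

The second pass is where the extra per-token cost mentioned above would come from.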



