Yeah there are people ITT claiming that even the API model marked as 3/14 release version is different than it used to be. I guess that's not entirely outside the realm of possibility (if OpenAI is just lying), but I think it's way more likely this thread is mostly evidence of the honeymoon effect wearing off.
The specific complaints have been well-established weaknesses of GPT for a while now too: hallucinating APIs, giving vague/"both sides" non-answers to half the questions you ask, etc. Obviously it's a great technical achievement, but people seemed to really overreact initially. Now that they're coming back to Earth, cue the conspiracy theories about OpenAI.
Could be. But it could also be that those people (myself included) are right.
It's not without precedent either - there's a paper and a YouTube video in which a Microsoft researcher says on record that GPT-4 got less capable with every release, ever since OpenAI switched focus to "safety" fine-tuning. MS actually benchmarked this by applying the same test each time (drawing a unicorn in TikZ), and that was even before the public release.
As for me - sure, it may be the novelty effect, or the Baader-Meinhof phenomenon - but in the days before this thread, I observed that:
- Bing Chat (which I hadn't used until about a week ago; before that, I used GPT-4 API access) has been giving surface-level, lazy answers -- I blamed, and still mostly blame, this on its search capability, since I noticed that GPT-4 (API) through TypingMind also gets dumber if you enable web search (which, in the background, adds a substantial amount of instructions to the system prompt) -- however,
- GPT-4 via Azure (at work) and via the OpenAI API (personal) both started to get lazy on me. Until about 2-3 weeks ago, they would happily print and reprint large blocks of code for me; in the last week or two, both models started inserting placeholder comments instead. I noticed this because I use the same system prompt for coding tasks, and the first time the model ignored my instructions to provide a complete solution, opting to add placeholder comments instead, was quite... startling.
- In those same 2-3 weeks, I've noticed GPT-4 via Azure being more prone to giving high-level overview answers and telling me to ask for more help if I need it (I don't know if this affected the GPT-4 API via OpenAI; it's harder to notice with the type of queries I do for personal use).
All in all, over the past 2-3 weeks I've had to do much more hand-holding and back-and-forth with GPT-4 than before. Yes, it's another anecdote, and it might be novelty or Baader-Meinhof, but with so many similar reports and known precedents, maybe there is something to it.
Fair enough. I think it's realistic that an actual change is part of the effect with the ChatGPT interface, because it has gotten so much attention from the general public; Azure probably fits that somewhat as well. I just don't really see why they would nerf the API, and especially why they would lie about the 3/14 model being available for query when secretly it's changing behind the scenes.
FWIW, I was pretty convinced this happened with DALL-E 2 for a little while, and again, maybe it did to some extent (they at least decreased the number of images returned per prompt, so the odds of a good one appearing decreased). But when I looked back at some of the earlier images I linked for people on request threads, I found there were more duds than I remembered. The good ones were just so mind-blowing at first that it was easy to ignore the bad responses (plus it was free then).