It really bugs me when people post these threads without saying whether they used GPT-3 or GPT-4. And if it's academic, I sort of suspect it's not GPT-4, unless they are paying for subscriptions for all of their students.
Both 3.5 and 4 hallucinated according to the professor:
> Most used 3.5. A few used 4 and those essays also had false info. I don't think they used any browsing plug-ins but it's possible--it was a take-home assignment and not one they did in class.
The public at large can be forgiven for not assuming a half-version bump is the difference between factually incorrect output and correct output. If OpenAI actually had confidence in that distinction, they'd label one model "sorta true sometimes" and the other "mostly true, but only for certain things, depending on how convincing you need it to be".
Users won't care what's true or not; blind trust in LLM output is the real issue.
It makes a difference... a big difference.