All the time, but only when prompted. You have to have a conversation with it and provide more detail that exposes the flaws in its previous answers; then it will happily apologize for its mistakes. (For me, this usually looks like me pasting an error message that its code caused.)
I really hope they find a way to have it apply context from future conversations such that when it learns the error of its ways it emails you a retraction, but that's probably a ways out because humans can't be trusted to not weaponize such a feature into sending spam.
But it doesn't learn from its error; that's the whole problem.
It only responds to 'accusations' from the user in the most statistically common way, which looks like an apology.
The weight of phrases like "you are wrong" is in fact so strong that it fools ChatGPT into apologizing for its 'mistakes' even in scenarios where its text was obviously correct, like telling it that 2+2 doesn't equal 4.
Well yeah, it's an imperfect tool, and you have to treat it as such. Probably there's a lot to be discovered about how to use it most effectively. I just don't find that it's more problematic than the other tools in my box.
Sure, grep has never flat-out lied to me the way ChatGPT does, but it's a statistical model, not a co-worker, so I don't feel betrayed; I just feel... cautioned. It keeps you on your toes, which isn't such a bad state to be in.
It totally would, if Bing doesn't return relevant results.
I've asked BingGPT about myself and it gave me three answers. One was more or less on point (it found my LinkedIn profile), and the other two were hallucinations. What happened was that Bing found two unrelated pages and GPT tried and failed to make sense of them.
Either that, or I am a prince whose name means "goose" in Polish.
Problematically, they're much better bullshitters than ChatGPT. And if you used Google to find them, they're probably either selling you something, or you had to navigate a minefield of people who are, just to find them.
We can downvote human comments and proposed solutions (on Stack Overflow, HN, etc.), and I also don't expect colleagues to lie to me when I ask them about a feature or how to do xyz in a language, library, or framework.
Bing, IIRC, has a way to provide feedback; I'm not sure how useful it is for today's users, or whether it will be able to solve hallucinations one day.
I try to always give Bing+ChatGPT chat or search results a thumbs up or a thumbs down. I am using the service for free, so it seems fair for me to take a moment to provide feedback.
When Google sends me to a website, I can at least judge the credibility of that website.
When ChatGPT tells me something, I have no idea if it's paraphrasing information gathered from Encyclopedia Britannica, or from a hollow-earther forum.
> When ChatGPT tells me something, I have no idea if it's paraphrasing information gathered from Encyclopedia Britannica, or from a hollow-earther forum.
Or it's something it just hallucinated out of thin air.
This is a real question, so I apologize if it comes off as sophistry:
Is the work of judging the accuracy of a summary not just the work of comprehending the non-summarized field?
For example, a summary could be completely correct and cite its facts exhaustively. Say you're asking about available operating systems: it tells you a bunch of true info about Windows and OSX, but doesn't mention the existence of Linux. Without familiarity with the territory, wouldn't verifying the factuality of each reference still leave you with an incomplete picture?
At a slightly more practical level, do you actually save any time if you've gotta fully verify the sources? I assume you're doing more than just making sure the link doesn't 404, as citing a link that doesn't say what it is made out to be isn't exactly a new problem, but at that point we're mighty close to the traditional experience of running through a SERP.
Finally, even if you're reading all the links in detail, isn't that still a situation prone to automation bias? There are a lot of examples of cases where humans are supposed to check machine output, but if it's usually good enough, the checkers fall into a pattern of trusting it too much and skipping the work. Maybe I'm just lazy, but I think I'd eventually get less gung-ho about verifying sources and do myself a mischief.
I'm asking because I've been underwhelmed by my own attempts at using LMs for search tasks, so maybe I'm doing it wrong.
The average human is going to give me the wrong answer to a question I ask him.
But I'm generally not interested in asking an average human. I'm interested in asking someone who knows their butt from a hole in the ground in whichever topic I'm asking them about.
Humans are actually quite reliable; Wikipedia is that trust made manifest. Also, a human liar knows they are lying, while an AI doesn't know it's saying something wrong.
What I've found is that until you see it really hallucinate like mad on a subject you know well, you don't realize how crazy it can be.
Especially when I talk to it about fiction and ask questions about, for example, a specific story, and see it invent whole quotes and characters and so on... it is a masterful bullshitter.
Citations! I never trust Bing Chat's answer. The links usually quickly tell you if the answer is hallucinated. Basically: treat it as a search engine, not an answer engine. Follow the links like you would on any other search engine. Those links will still be more relevant.
It happily made up citations for me. In a follow-up, I asked it not to, and to please use only real papers. It apologized, said it would not do it again, and then in the same reply made up another non-existent but plausible citation.
Checking the links is a good practice.
I feel like we just created an interesting novel problem in the world. Looking forward to seeing how this plays out.
Are you talking about Bing Chat, which cites actual web pages it used to make the summary, or ChatGPT, which is a very different beast and relies on built-in knowledge rather than searches?
That was a problem for ChatGPT3, not so much for ChatGPT4. I've also switched to ChatGPT4 for most of my searches; I only use Google now as a shortcut for navigating to a specific website.
The hallucination problem is easily solved by using it as a code/config template or starter, and actually vetting its output. It's still a huge time-saver, even with the vetting time involved.
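As a minimal sketch of that "generate, then vet" workflow (the config keys and values here are hypothetical, not from any real project), the idea is just to parse whatever the model drafted and sanity-check it before anything depends on it:

    import json

    # Hypothetical LLM-drafted config; keys and values are made up for illustration.
    llm_draft = '{"retries": 3, "timeout_seconds": 30, "endpoint": "https://api.example.com/v1"}'

    REQUIRED_KEYS = {"retries", "timeout_seconds", "endpoint"}

    config = json.loads(llm_draft)  # fails loudly if the draft isn't valid JSON
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"LLM-drafted config is missing keys: {missing}")
    if not isinstance(config["retries"], int) or config["retries"] < 0:
        raise ValueError("retries must be a non-negative integer")

    print("Config passed basic vetting:", config)

The point is just that the model's draft never goes straight into use; every field still gets checked against what you actually expect.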
I’m blown away by the competence of the language model, but its willingness to make up facts makes me leery.