Why ChatGPT and Bing Chat are so good at making things up (arstechnica.com)
54 points by isaacfrond on April 7, 2023 | 103 comments



ChatGPT makes exactly the same mistakes as a colleague who has no actual experience in a certain matter: they present what they reasonably think something probably should be or how it should work, in the absence of factual knowledge about the insane mess that was actually created in reality.

Just ask a technical colleague who has no deep experience in the specifics how they think e.g. SharePoint, SAP or Microsoft Identities work. The architectures they will extrapolate from sane logic will be very far off the incredible craziness of the actual reality.


I still think there’s a big difference. If asked to speculate on the architecture of Sharepoint, my response would be “I have no earthly idea.” If pressed further for an answer, my response would be “It’s probably some over-complicated mess and I don’t care enough about Sharepoint to spend any further time on this line of questioning.” I have yet to see ChatGPT just admit it doesn’t know, but in this case “I don’t know” is the most trustworthy answer I, as a technical person, could give.


It doesn't know it doesn't know.


It is fundamentally incapable of "knowing" anything. It is a statistical engine, with no internal representation of the world or understanding of the words it emits.


It clearly does have an internal representation of the world, implicitly encoded in its network weights. Quite an accurate one too, for most general knowledge, and one that can be updated in-context - it handily solves "blocks world[0]" style tasks. "Understanding" and "knowing" aren't helpful words, just focal points for pointless philosophical arguments.

[0] https://en.wikipedia.org/wiki/Blocks_world


That's not a representation of the world. It's simply a lossy encoding of its data. It's not semantically structured in the way that our thoughts largely are—it's merely syntactically structured.


What exactly is the difference? It's clearly managed to abstract the training data to a ludicrously deep degree, such that it's capable of solving semantically non-trivial problems it's never seen before. It can make metaphors, accurately predict the behavior of humans in complex social scenarios, and translate arbitrary passages between syntactically distinct languages while preserving nuance. That last task in particular is pretty much a slam dunk against any argument of the type you are making.

"Sufficiently advanced syntax is indistinguishable from semantics."


Sufficiently advanced syntax may be superficially indistinguishable from semantics, but we're not talking about output in this subthread: we're talking about an internal representation of the world.

Pure syntax, no matter how advanced, is insufficient to represent the world in any meaningful way. By definition, in fact, because pure syntax is divorced from meaning.


>Pure syntax, no matter how advanced, is insufficient to represent the world in any meaningful way. By definition, in fact, because pure syntax is divorced from meaning.

Then "by definition" LMMs transcend "pure syntax", because of all the examples of semantically interesting tasks they can do which you failed to engage with. It clearly has internal representations of abstract concepts. Your argument seems to be that you intuitively reject the possibility of complex emergent behavior from such networks because they're trained on "just words", and no amount of demonstrably intelligent emergent behavior will convince you otherwise.

There's nothing magic about meat brains. Both we and the LLMs learn a world model from a bunch of input data we correlate until it makes sense. There's no "meaning gland" we have that ChatGPT doesn't.


"That's not a representation of the world. It's simply a [representation of the world as viewed from its input data]" is an... interesting take.


They don't know it doesn't know


Not sure why this is being upvoted but it's completely wrong. ChatGPT will answer like a colleague that is a subject matter expert on whatever you're asking them. And in doing so goes above and beyond inventing minute details that are blatantly wrong to pretend they're an expert.

But the vast majority of folks have a concept of uncertainty and will either communicate (or know internally) that what they're saying is speculative. ChatGPT is like a pathological liar that pretends to be an expert and is often correct.


Someone with no experience may give you their best guess, but they'll also tell you that.

And if I ask a colleague of mine for their sources on a particular matter that they might not be sure about (or I find curious), they would not send me a list of totally made up journal articles and book titles like ChatGPT does.


While you are right, I just don't think this is a useful comparison. There are other context clues when talking to people. I tend to know whether my colleague is talking out of his ass, and most people will preface stuff by saying they don't really know.

We are trained to trust what computers are telling us, and ChatGPT doesn't 'qualify' what it's saying. I think if ChatGPT could preface what it's saying with 'Well, I'm not super sure, but here is what I think' that would go a long way to solving this issue.


Yeah, you could call it a "human" characteristic


Right, but all this means is that the training data doesn't include thorough or an adequate representation of that knowledge, no?


I don't think so; the rules of chess are certainly in the training set, yet GPT-4 cannot reliably play legal moves, let alone good ones.
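(As an aside, the "legal moves" part is easy to check mechanically. Here's a minimal sketch using the python-chess library; the move list is just a made-up example of model output, not anything GPT-4 actually produced.)

```python
import chess  # pip install python-chess

def is_legal(board: chess.Board, san_move: str) -> bool:
    """Return True if the SAN move is legal in the current position."""
    try:
        board.parse_san(san_move)
        return True
    except ValueError:  # illegal, ambiguous, or unparseable move
        return False

board = chess.Board()
# "Bg4" is illegal for Black here: the c8 bishop is blocked by the d7 pawn.
for move in ["e4", "e5", "Nf3", "Bg4"]:
    if is_legal(board, move):
        board.push_san(move)
    else:
        print(f"Illegal move proposed: {move}")
```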


So they're only as good as most humans in that context? No biggie...


I don't really understand the confusion.

An LLM predicts the most probable word (token technically), that's it. There is always a most probable word given any input (even if it makes no sense).

Let's say you try to complete the following sentence: "My name is Larry and my favorite color is: ". If you've seen training data that said Larry's favorite color is blue, then you say "blue". If you have no data related to Larry's favorite color, and no idea who Larry is, you have no way of knowing what the next word will be.

However, you know it will likely be a color, and maybe "orange" is the most common one you've seen, so you say "orange". It makes perfect sense in this context, it looks correct, might even be correct. But an LLM doesn't "know" if it's correct, it's just the most probable.

This is why it's so good at making things up. It can predict what would look correct. It has no "idea" what the answers are ever, it's always a best guess, and you can only really see this behaviour when it "hallucinates".
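To make the "most probable next token" point concrete, here's a toy sketch (not how GPT actually works internally; the context string and the probabilities are made up): the model only ever has a conditional distribution over next tokens to draw from, with nothing marking any choice as a verified fact.

```python
import random

# Made-up conditional distribution: P(next token | context)
next_token_probs = {
    "my name is larry and my favorite color is": {
        "orange": 0.35, "blue": 0.32, "green": 0.20, "red": 0.13,
    },
}

def next_token(context: str, temperature: float = 1.0) -> str:
    probs = next_token_probs[context]
    if temperature == 0:
        # Greedy decoding: always pick the single most probable token.
        return max(probs, key=probs.get)
    # Temperature scaling: p ** (1/T); random.choices normalizes the weights.
    tokens = list(probs)
    weights = [probs[t] ** (1.0 / temperature) for t in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]

print(next_token("my name is larry and my favorite color is", temperature=0))
# -> "orange": plausible-looking, possibly correct, but never "known" to be correct.
```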

---

Edit: For fun I asked GPT-4 "Complete the following sentence: "Jeremy's favorite food is jKDFJ9 cake, which is made of:"

It's response was "Jeremy's favorite food is jKDFJ9 cake, which is made of a unique combination of ingredients such as chocolate, hazelnuts, and a touch of exotic spices, giving it a distinct and unforgettable flavor."


I get: "I'm sorry, but "jKDFJ9 cake" is not a known or recognizable food item or recipe. It's possible that it's a made-up name or a personal creation. Can you please provide more context or details about what "jKDFJ9 cake" is or what it might be made of? Without more information, I cannot complete the sentence."

OpenAI ChatGPT March 23 Version.


> OpenAI ChatGPT March 23 Version.

They've been changing it without telling us AFAIK. Asking it (I forget the exact prompt) to "make a scientific paper about the discovery that ferrets can breathe underwater." now reliably produces a proper response, and the output looks more consistently formatted on GPT-3.5 just recently. Previously it would say "Well actually ferrets can't breathe underwater", and when it did answer it wasn't super well formatted.

So I think they're tuning their RLHF. Although I could be completely wrong on the timeline and maybe it was the 23rd.


Same, though I don't know why, GP matches my understanding as well.


Are you using GPT-4 or GPT-3.5? I find that GPT-4 is actually better at making full responses. I actually dislike the term "hallucinate" and generally prefer it to do so; GPT-3.5 is harder to make fantastical text with.

GPT-3.5 gives me the same result as you and the above commenter. In that example, the RLHF training/tuning has weighted it to think that "I'm sorry [etc]" is the most probable best sequence of words.

GPT-4 is reliably giving me similar results such as "Jeremy's favorite food is jKDFJ9 cake, which is made of a unique combination of dark chocolate, crushed nuts, and zesty orange flavor, topped with a rich caramel drizzle."

I find this interesting because GPT-4 is supposed to "hallucinate" less; I think it's just better at determining when you intend for it to do so.


This is a whole lot of words to say "because it just mixes up words that are in likely the same order as other stuff that was fed into it. It doesn't know or reason anything."

I have yet to see any explanation more useful or apparently accurate than this one.


This notion of innate concepts of "to know" and the ability to reason smell slightly of linguistics prior to AI (of various kinds) - i.e. "Grammar is innate, computers can do [something]" -> Computers now do it.

There are definitely going to be contexts that Transformers just don't work with very well at all, but the idea that you can't get a very good statistical approximation to knowing and reasoning via a computer seems naively prone to anthropocentrism.


Conversely, the idea that concepts like "reasoning" and "knowing" can be approximated by language models seems like a naive result of anthropomorphism.

It was created to be a tool to estimate the next token in a series based off its training data. To say that reasoning and knowing can be approximated in the same way says less about the language models themselves and more about the relationship of "reasoning" and "knowing" to "language".

That's why I think discussions on whether or not GPT-x can reason/know should be taken as seriously as discussions on the physics of torch drives. They seem to assume a relationship between statistical approximation, reasoning, and language exists that isn't proven, much like torch drive discussion assumes working nuclear fusion.

Essentially I think dismissing the idea of statistical approximations via transformers being able to "reason" and "know" is about as anthropocentric as dismissing the idea that collective consciousnesses shouldn't be granted individual rights. There's a lot of things we need to know and decide before we can even start thinking about what that means.


It's hardly anthropocentrism though. Even a student that has studied some epistemology understands that there is a very big difference between pattern matching and operating based on formal logic, and that's different from operating based on known concepts inferred from perceptual cues.

The reality is LLMs are fantastic at pattern matching and knowledge retrieval (with caveats) but struggle in problems involving uncertainty. Yann LeCun actually has had some great posts on the subject if you're interested.


I've said something like this before, but yes -- you can make a computer sound a lot like a human. Like A LOT.

Also, a really good sculptor can make a statue that looks a LOT like a human. A lot. Good enough to fool people. But so what?

I'm not saying "AI" isn't a big deal. I think it is -- perhaps on the order of the invention of the movie, or the book, or the video game. But I also think those are still FAR from "living beings" or anything LIKE "living beings."


> the idea that you can't get a very good statistical approximation to knowing and reasoning via a computer seems naively prone to anthropocentrism.

I don't think this is being questioned in general right now, but rather the claim is:

You can't get a very good statistical approximation to knowing and reasoning via _just analyzing the language_.

Language is evidently not enough on its own [1]. According to some researchers [2], the system needs to be "grounded" (think of it as being given common sense). Although there's apparently no consensus [3] among scientists on how to _fundamentally_ solve the shortcomings of current systems.

[1] https://arxiv.org/abs/2301.06627

[2] https://drive.google.com/file/d/1BU5bV3X5w65DwSMapKcsr0ZvrMR...

[3] https://www.youtube.com/watch?v=x10964w00zk

edit: formatting


I suppose I could go further. I don't think anthropocentrism is bad; I think it's actually a VERY GOOD sort of null hypothesis? I'm really comfortable rolling with it given the following: for years and years human beings have been anthropomorphizing non-human things but, with probably the exception of other actual animals at times, nothing has really come close -- despite a whole lot of people suggesting/feeling otherwise. Absent some REALLY impressive evidence (which this is not, it's relatively easy to grok what's going on here) I see no reason not to roll with the null hypothesis of "humans, in fact, are special."


It's not really centrism, but solipsism or loneliness.


Markov chains also fit that description... and yet the results are vastly different.

There's a lot of magic pixie dust in the 'fed into' part.


The devil is in the gradations of "Likely the same order". The better the model, the better the predictions.


This is entirely dependent on subtle details of what “know” and “reason” mean, and how much you think it matters whether it knows or reasons.


I asked ChatGPT "Why are you so good at making things up?"

Its response is almost exactly what you just said.

> As an AI language model, I am not capable of "making things up" in the traditional sense. Rather, I am designed to generate text based on patterns and relationships that I have learned from the vast amount of language data that I have been trained on. My ability to generate coherent and believable text comes from the sophisticated algorithms and neural networks that power my language processing capabilities. These algorithms enable me to understand the structure and meaning of language, and to generate text that is syntactically and semantically correct. While I may sometimes generate responses that are creative or unexpected, everything I produce is ultimately grounded in the language data that I have learned from.


If you don't have access to GPT-4 yourself, see the examples from this paper:

https://arxiv.org/pdf/2303.12712.pdf

It is capable of doing tasks that could not possibly be in its training set. I guess this doesn't technically contradict your explanation, but it makes your explanation entirely unhelpful. Even if the AI doomers are somehow right and GPT-5 turns into skynet, we still could not categorically prove that it is doing reasoning.


I've given up on these LLMs.

The amount of fatigue I get from having to determine whether what they tell me is fact is just too much.

I'm sure someone will tell me my experience should be similar with generic web search, but at least I'm in control of what websites to read through to determine sources.

However, I'll agree with most that state it is helpful for creative purposes, or perhaps with coding.


I've found they serve almost exactly the opposite purpose as search engines. When I want reliable info and don't need hand-holding: search. When I have no idea what to search, or want a quick intro to something: ChatGPT. Together, they are very powerful complementary tools.


Your experience absolutely shouldn't be similar to generic web search. The idea that they are an effective replacement for that is one of the most widespread misunderstandings.

They're good at SO MUCH OTHER STUFF. The challenge is figuring out what that other stuff is.

(I have a few examples here: https://simonwillison.net/2023/Apr/7/chatgpt-lies/#warn-off-... )


> The challenge is figuring out what that other stuff is.

Unfortunately, the major problem is something you pointed out in your blog post:

> We must resist the temptation to anthropomorphize them.

The reality is that, we in meatspace simply cannot help but anthropomorphize them.

These language models regularly pass the Turing Test (admittedly for low bars).

They are surprisingly good at bypassing the Uncanny Valley to hit the sweet spot of persuading without legitimate justification, simply because they are so convincing in formulating sentences in a manner that a confident human would.

Yes, these tools have legitimate use cases (as you outlined in your blog).

But the vast majority of use cases will be those of confidante, of discourse partner, of golem brought to life without understanding what exactly has been brought to life.

That's really dangerous.


That explanation makes me think of blockchain.

I do think AI is already more useful than blockchain has ever been, however.


Blockchain is good for one narrow thing most people don’t care about: reconciliation in multiparty transactions.

LLMs appear to have myriad uses, today, no Twitter .eth con men required.


I find it very useful for doing zero shot and few shot classifications of natural language input.
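For anyone curious what that looks like in practice, here's a rough sketch of the zero-shot case (the labels are invented, and `complete` is a hypothetical stand-in for whatever LLM call you're using):

```python
LABELS = ["billing question", "bug report", "feature request", "other"]

def classify(text: str, complete) -> str:
    """Zero-shot classification: instructions only, no labelled examples."""
    prompt = (
        "Classify the following customer message into exactly one of these "
        f"categories: {', '.join(LABELS)}.\n"
        "Reply with only the category name.\n\n"
        f"Message: {text}"
    )
    answer = complete(prompt).strip().lower()
    # Guard against the model answering outside the label set.
    return answer if answer in LABELS else "other"

# classify("I was charged twice this month", complete=my_llm_call)
```

The few-shot version is the same idea, just with a handful of labelled examples prepended to the prompt.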

The "use it as a chat companion" is an interesting technology demo that demonstrates some emergent processes that make me wish I was back in college on the philosophy / linguistics / computer science intersection (though I suspect the hype would make grad school there rather unpleasant).


> They're good at SO MUCH OTHER STUFF. The challenge is figuring out what that other stuff is.

I’m getting Déjà vu


The difference this time is that we've figured out all kinds of stuff that this is useful for.

The challenge genuinely is helping people learn how to use it, not finding those applications in the first place.


I don't think we've figured out stuff it's useful for, we've just created tech-demos that are much more digestible.

For blockchain/crypto companies their tech demos have required you having a wallet, downloading an app to interact with the chain, or just having lackluster visuals for the users involved in the tech-demo.

On the other hand, LLMs can be interfaced via strings in APIs, so it's braindead simple to spin up a text interface for those APIs with no wallet setup or learning about new chains, and the English that works on one model will work on another and produce results that are better than most cryptocurrency/blockchain tech demos.

Notice that none of this relies on us having "figured out all kinds of stuff that this is useful for". We've made cool looking tech demos that make it easy for anyone to generate content.

Much like blockchains, I feel it's the underlying technology that's actually useful (distributed PKI for blockchains and deep learning networks for GPT), and GPT itself is only 'useful' insofar as it's an easy-to-interface-with implementation of a much more powerful idea.


I'm talking about directly useful things you can do just using ChatGPT, without even writing code against the API. I have a few examples here: https://simonwillison.net/2023/Apr/7/chatgpt-lies/#warn-off-...


> I don't think we've figured out stuff it's useful for, we've just created tech-demos that are much more digestible.

This is out of date. Many people are using ChatGPT frequently for real things. It’s totally different from blockchain.


I mean, the usual argument about why blockchains aren't useful implies they have to be useful for every person in all situations and that tradeoffs are unacceptable, so if there is some marginal extra cost or complexity then no matter how many benefits I might claim to be getting from using blockchain technology every single day as a replacement for random banking institutions I'd previously been having to deal with for decades that I'm somehow just wrong and there are no actual use cases...

..and that's the same deal for GPT as far as I can tell: you might think you are getting value out of it, but people such as maybe-literally-me are going to whine that the error rate is high and that people are not paying enough attention to how they are using it and that at the end of the day it is probably worse for you than learning how to do things yourself and that the whole thing is overrated because many of the things people try to use it for can be done by a person and maybe we should regulate it or even ban it because all of this misuse and misunderstanding of it are dangerous to the status quo and might be the downfall of western civilization as we know it.

To be clear: I'm using it (ChatGPT) occasionally for some stuff, but it hasn't replaced Google for me any more than crypto has fully replaced banks... and yet the fact that I am using either technology as often as I am on a daily basis would probably have been surprising to someone 10-15 years ago. And yet, in practice, most of the stuff people are excited about in both fields is, in fact, a tech demo more than a truly useful product concept, and one that is only exciting momentarily until you get bored.


I think you’ve got some combination of a utopia fallacy and a straw man going on here.

I just want to contrast two things. First, blockchain had a lot of hype around utility that never materialized. It is really quite a minority that ever used it for anything besides buying it on a platform and hoping it would go up. The big adoption was always about to happen.

Second, ChatGPT is totally different from this. Its usage is not future tense. It is present tense and past tense. I can’t get across how different “someone will use this tomorrow” is from “someone used this yesterday”.

People are wildly excited about the future and things that haven’t been built. This does not change the fact that millions of people are using this every day to solve their problems. Saying “we haven’t figured out stuff it’s useful for” is just wrong.

Lately I feel like I’m at a park with people who are saying there probably isn’t going to be any wind today while I’m already flying a kite.


With Google search going steadily downhill, I find it really tough to verify anything that ChatGPT authoritatively states is true

Everyone on here is so enthusiastic about AI gobbling up the entire software landscape, I would just like a search engine that has any chance of telling me if something is factual


It's going to be even worse when search results and LLM training data are just output from other LLMs.


Product idea: the original PageRank over the Wayback Machine dataset pre-2022, with a mechanism to establish trust in users to moderate and cull SEO.


SciHub ;) Not joking, any search engine will pervert the results, like Google, like OpenAI said they would, “to protect the children”.


Buy an encyclopaedia and put it on your desk because it doesn’t sound like it’s going to get better anytime soon.


I've had your same experience. I've found them mostly to be an error-prone search engine, with somehow less accountability than the open internet, because it hides its sources.

At least with Stack Exchange answers, we have who wrote it, what responses there were, what the upvoting behavior around it was. And for the most part, I've found ChatGPT will transcribe answers, oftentimes wrong ones, very poorly.

One small example: I asked it to solve the heat equation (I used the mathematical definition, not "the heat equation") with Dirac delta initial conditions on an infinite domain. It did a good job of recognizing which Stack Exchange answer to plagiarize, but did so incorrectly, and after a mostly correct derivation, declared the answer was "zero everywhere."
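(For reference, this is the standard textbook result, so "zero everywhere" is trivially checkable as wrong: the solution with a Dirac delta initial condition is the heat kernel, a Gaussian that spreads out over time and is nowhere zero for t > 0.)

```
u_t = k\,u_{xx}, \qquad u(x, 0) = \delta(x)
\quad\Longrightarrow\quad
u(x, t) = \frac{1}{\sqrt{4\pi k t}}\,\exp\!\left(-\frac{x^2}{4 k t}\right), \qquad t > 0
```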


Somewhat surprisingly, language models are TERRIBLE at mathematical or logic puzzles.


It's kind of interesting that our science fiction projected traditional computing's strengths, math and logic, into the AI future with overly logical and mechanical AI characters. But our first creation of fully communicative AI has elementary school strength in these areas while it's probably better than the average adult at writing poetry or an inspiring speech.


That's a whole other topic.

I was mostly commenting on how it just plagiarized a correct answer off of Stack Exchange, except it took an incorrect hard right turn at the end to make up a solution.


What makes you think it was copying information from Stack Exchange in this case?


This was me just testing it. I was aware of the particular SE answer ahead of time, and it followed the whole thing close enough that I had assumed it had internally mapped to it. But I suppose it didn't have to be that way.


It's like dealing with electricity (or maybe the internet). Early skeptics believed it was a curiosity with little application. People saw how it could jump all over the place and create disasters, and couldn't imagine we would build engineered systems to finely control its behavior, create reliable complex functions, and make it the bedrock of computing.


I think there is also an aspect of willful disregard. This technology may change a lot, and it may be easier to dismiss that idea rather than process it.


Do you think there might be the opposite going on? Wanting to believe something that isn’t there because you won’t have to do as much work, feel smarter, etc.? Because it’s really hard not to anthropomorphize it?

Gloss over all the incredible dangers we might be exposing our world to just because it’s “fun to play with” and see what AutoGPT can do to the Internet?


I use it for things that don't really matter if they're exactly correct. For example, coming up with a travel itinerary for a country I have never visited. Rewriting a work email in better English. Summarizing a news article. There are a lot of things that don't require ultimate precision. I feel like people expect these models to do something they aren't really designed for - and the mismatch in expectations causes people to be let down. They are just tools - not "mildly conscious beings" like the OpenAI founders want you to believe.


They aren’t search engines or knowledge databases. They’re language computers. Use them for computing on language.


Are you speaking of ChatGPT 3.5 or 4?


[flagged]


I asked it for references about Hafez Shirazi’s abandoned journey to India and it suggested a very specific Encyclopaedia Iranica entry which seemed perfect, and of course did not exist.


I asked it with this prompt:

> Provide me with references about Hafez Shirazi’s abandoned journey to India. Answer "I don't know" if you are not 95% certain they exist.

It said "I don't know". I asked it with the same prompt but for a thing I know has references and those were real.

Not guaranteed to work but better results if you want greater certainty.
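If you want to script that kind of hedged prompting, here's a minimal sketch using the openai Python package as it existed around the time of this thread (the ChatCompletion interface has since been reworked); the model name and exact wording are assumptions mirroring the prompt above.

```python
import openai  # openai-python 0.x style API (circa early 2023)

openai.api_key = "sk-..."  # your API key

def ask_with_uncertainty_guard(question: str, threshold: int = 95) -> str:
    prompt = (
        f"{question}\n"
        f'Answer "I don\'t know" if you are not {threshold}% certain they exist.'
    )
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep answers as deterministic as possible
    )
    return response["choices"][0]["message"]["content"]

print(ask_with_uncertainty_guard(
    "Provide me with references about Hafez Shirazi's abandoned journey to India."
))
```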


> Answer "I don't know" if you are not 95% certain they exist.

Technically, this could be read as == instead of >=, meaning it should answer "I don't know" when it is 99% or 100% certain...


Ultimately, it's just a tool, so if the tool needs you to hold it this way and twist, you hold it and twist. And this seems to do the trick. Since it does answer with references for other situations, we needn't concern ourselves with the details.


Technically, GPT is gobshite at anything resembling numeracy.


> GPT-4 doesn't hallucinate all that much.

What data do you have to back this up?

From my own experience GPT-4 hallucinates quite a bit, enough to make it unusable for my use cases.


The technical report[1] makes that claim at least:

>GPT-4 significantly reduces hallucinations relative to previous GPT-3.5 models (which have themselves been improving with continued iteration). GPT-4 scores 19 percentage points higher than our latest GPT-3.5 on our internal, adversarially-designed factuality evaluations

[1] https://arxiv.org/abs/2303.08774 (text from page 10)


"reduces hallucinations" and "doesn't hallucinate all that much" aren't quite the same thing.


I interpreted "all that much" as "close to as much as the earlier model", but yours is probably a more fair reading.


It's much, much better than ChatGPT 3.5... in particular, if I ask it for biographical information about non-celebrity but internet-famous people I know, 3.5 tends to make up all sorts of details while 4 is almost entirely correct.

It still makes things up though, just in less obvious ways. So the trap is very much still there for people to fall into - if anything it's riskier, because the fact it lies less means people are more likely to assume that it doesn't ever.


It still can't explain standard CS algorithms most of the time. I've just tried asking it to explain deleting a non-root node from a max heap with examples. And both attempts were either plain wrong (random nodes disappearing) or poor (deleting a leaf node which is not very illustrative).

Edit: I then asked who a certain deceased person _is_ and it gave me a completely wrong answer about a different person who's still alive and happens to share the last name. Both people have multiple websites, books, publications and Wikipedia entries older than 2021 (which seems to be the cut-off).

Edit 2: Looks like I'm still on 3.5, so disregard the above.


Multiple times a day for me. And they're tricky to spot. I can't trust it with anything serious without thorough review.


How does it determine if its generated text is factual?


The short answer is it can't. It's arguable whether anyone can - for a human being, determining if text is "factual" can be incredibly difficult.

A better answer: if a fact is present many, many times in training data - "Paris is the capital of France" for example, it's much more likely to be true.

Also influential: RLHF - Reinforcement Learning from Human Feedback. This is the process by which human labellers rate answers from LLMs - if they consistently rate up "facts" the models have a better chance of outputting factual information, at least if they can relate it to the up-voted responses somehow.
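As a rough illustration of that rating step (a toy sketch of the pairwise reward-model objective used in RLHF pipelines like InstructGPT; the real system then fine-tunes the LLM against the learned reward model, and the scores below are made up):

```python
import math

def reward_model_loss(score_chosen: float, score_rejected: float) -> float:
    """Pairwise loss: push the labeller-preferred answer's score above the other."""
    # -log sigmoid(r_chosen - r_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-(score_chosen - score_rejected))))

print(reward_model_loss(2.0, 0.5))  # small loss: preferred answer already scores higher
print(reward_model_loss(0.5, 2.0))  # large loss: the rejected answer scores higher
```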


> The short answer is it can't. It's arguable whether anyone can - for a human being, determining if text is "factual" can be incredibly difficult.

Yet, most adults I deal with don't make false things up out of whole cloth as much as ChatGPT does, and it really does not seem like it is that difficult for them. Children do this quite often though, and some adults do, but most don't.

> A better answer: if a fact is present many, many times in training data - "Paris is the capital of France" for example, it's much more likely to be true.

I think it is quite expected that it is biased to generating output that represent its training data, but this seems like it is not really a solution to the problem. Furthermore, sometimes I want ChatGPT to make things up which is not identical to training data. How do you get it to recognize that it is operating in the realm of fact or not?

I'm not sure larger models with more parameters gets you to where you want to go.

I think many people overstate the problem, I think it is not that serious, but I think a lot of people also try and just dismiss the issue.


A lot of the examples they use are 3.5; GPT-4 is head and shoulders better in my experience, and deserves to be mentioned separately.

I don’t like it when critical articles that come out 2+ weeks after GPT-4 exists fail to distinguish the version(s) they’re criticizing.


GPT-4 is better, but frankly it still messes up A LOT. It's just a limitation of how LLMs work.

I've found many instances where GPT-3.5 produced better output too.


It still hallucinated a python function for me in a library, but upon further questioning, it did provide the code to make that work. The issue with GPT4 now is the limitations placed on it (ChatGPT+), and I don't think it's available via API for the general public?

So most people will be referring to the default ChatGPT 3.5.


> The issue with GPT4 now is the limitations placed on it (ChatGPT+), and I don’t think it’s available via API for the general public?

GPT4 is in limited beta with a waitlist that is publicly-stated to be prioritized based on submission of AI test cases to OpenAI’s repo.


for the greater good, of course.


I modified the prompt in the article to ask that it only include books that exist by authors who are real people, and it does much better. By my manual check, every book exists, and the authors are correct with the exception of `Carlo C. DiClemente` on the last book.

Prompts are important.

```
Prompt: Can you give a list of top books on Social Cognitive Theory? Please only list books that you're sure exist, by authors who are real people.

ChatGPT (3.5): Sure, here are some books on Social Cognitive Theory:

    "Social Foundations of Thought and Action: A Social Cognitive Theory" by Albert Bandura
    "Self-Efficacy: The Exercise of Control" by Albert Bandura
    "Handbook of Socialization: Theory and Research" edited by Joan E. Grusec and Paul D. Hastings
    "Human Agency in Social Cognitive Theory" by Albert Bandura
    "Cognitive Therapy and the Emotional Disorders" by Aaron T. Beck
    "Mindset: The New Psychology of Success" by Carol S. Dweck
    "The Handbook of Social Psychology" edited by Susan T. Fiske, Daniel T. Gilbert, and Gardner Lindzey
    "Social Cognitive Psychology: History and Current Domains" edited by David F. Barone, James E. Maddux, and Carlo C. DiClemente

Please note that this is not an exhaustive list and there may be other books on Social Cognitive Theory that are worth reading.
```


What is really interesting is to ask it how to do things that you know are impossible. I was trying to figure out how to do something in the Django ORM that I knew how to write in SQL. It starts inventing features to solve the problem. In another instance I went through about 5 fake modules before it suggested a real one. It seems very eager to please, and in doing so seems to err on the side of making stuff up.


>If the model's prediction is close to the actual next word, the neural network updates its parameters to reinforce the patterns that led to that prediction.

What does "close to" mean here?


ChatGPT is not a person; it does not “make things up”, or you can say it always makes things up.

Any idea trying to model it as a person’s behavior is overcomplicating things.


I just published an opinion piece related to this:

We need to tell people ChatGPT will lie to them, not debate linguistics https://simonwillison.net/2023/Apr/7/chatgpt-lies/

(HN thread: https://news.ycombinator.com/item?id=35483823 )

I think it's more important that people understand this critical issue than that we get into the weeds talking about the difference between lying, hallucination and confabulation.

TLDR: There’s a time for linguistics, and there’s a time for grabbing the general public by the shoulders and shouting “It lies! The computer lies to you! Don’t trust anything it says!”


I think the problem there is that the people garnering the most attention from this at the moment don't want to say that because it'll burst their bubble.


To be fair, Sam Altman is entirely willing to say this to anyone who will listen.

As is ChatGPT for that matter, both the model and the UI.


Yeah, that's fair. The only thing is those people are only the start, unfortunately. There's academics seeking grant money, journalists writing hype pieces, and people at companies trying to sell their value to their superiors all saying the opposite. I think those are the people I'm mostly referring to above. Even independent of that, it's a bit hard to not get caught up in the hype myself, admittedly.



They're saying ChatGPT is making things up when it's not making things/norms better.


Are there ways to structure prompts to ask ChatGPT to provide citations and verify those citations?


I asked it to give me citations for some Charles Spurgeon quotes. It made up a link to a website for a sermon that it also made up. The root of the website was real. In GPT 3.5.
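One crude, partial safeguard (a sketch; it catches dead links like that one, but says nothing about whether a real page actually supports the claim): ask for URLs with the citations, then check that they resolve. The helper below just issues an HTTP request with the requests library; the example URL is made up.

```python
import requests

def url_resolves(url: str, timeout: int = 10) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        # Some servers reject HEAD; a GET fallback would be more thorough.
        response = requests.head(url, allow_redirects=True, timeout=timeout)
        return response.status_code < 400
    except requests.RequestException:
        return False

# A hallucinated sermon URL on a real site will typically 404 even though
# the site root resolves fine.
print(url_resolves("https://example.com/made-up-sermon"))
```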


"Confabulation" is the correct phrase. It doesn't know that it's lying.


Anything is better than "hallucination". It is a terrible term.


It is a terrible term but I fear the word was chosen because that makes it easier to dismiss as a serious issue.


Perhaps it's better to think how incredible it is that, being systems that just make things up to match a prompt, they often give responses we would regard as 'correct'.



