That's a different error context I think. It's a mistake if the model produces nonsense, because it's designed to produce realistic text. It's not a mistake if it produces non-factual information that looks realistic.
And it fundamentally cannot always produce factual information; it simply doesn't have that capacity. (But then, neither do humans, and with the ability to source information this statement may well become obsolete soon enough.)
Though I wouldn't go so far as to say that the model cannot make mistakes - it clearly is susceptible to producing nonsense. I just think expecting it to always produce factual information is like using a hammer to cut wood and then complaining that the wood comes out all jagged.
Indeed, I intended to imply that a model cannot err in the same way a computer cannot. This parallels the idea that no tool is capable of making mistakes: the notion of a mistake is contingent on human folly, or, more broadly, belongs to the conceptual realm of humans, not machines.
LLMs may generate false statements, but this stems from their primary function - to conjure plausible language, not factual statements. Therefore it should not be regarded as a mistake when the model accomplishes exactly what it was designed to do.
In other words, the tool functions as intended. The user, despite being forewarned of the tool's limitations, expects it to perform tasks it was not designed to do. This leaves the user dissatisfied. The fault lies with the user, yet their refusal to accept this leads them to cast blame on the tool.
In the words of a well-known adage - a poor craftsman blames his tools.
I conceptually agree with you that a fool blames his tools.
However! If LLMs produced only lies, no one would use them! Clearly truthiness is a desired property of an LLM the way sufficient hardness is of a bolt. Therefore, I maintain that an LLM can be wrong, because truthiness is its primary function.
A craftsman really can just own a shitty hammer. He shouldn't use it. But the hammer can inherently suck at being a hammer.
I agree for the most part, but I wish to underscore the primary function inherent in each tool. For an LLM, it is to generate plausible language. For a bolt, it is to offer structural integrity. For a car, it is to provide mobility. Should these tools fail to do what they were designed for, we can rightfully deem them defective.
GPT was not primarily made to produce factual statements. While factual accuracy certainly constitutes a desirable design aspiration, and undeniably makes the LLM more useful, it should not be expected. Automobile designers, for example, strive to ensure safety during high-speed collisions, a feature that almost invariably benefits the user. However, if someone uses their car to demolish their house, this is probably not going to leave them satisfied. And I don't think we can say the car is a lemon for this.
> Mar 1st, 2023 is where things get interesting. This document was filed—“Affirmation in Opposition to Motion”—and it cites entirely fictional cases! One example quoted from that document (emphasis mine):
The very first limitation listed on the ChatGPT introduction post is about incorrect answers - https://openai.com/blog/chatgpt. This has not changed since ChatGPT was announced. OpenAI is not advertising that it will generate more than plausible language.
I think you are barking up the wrong tree here. As much as I understand your scepticism, OpenAI have been very transparent about the limitations of GPT and it is not truthful to say otherwise.
A definition of "plausible" is "apparently reasonable and credible, and therefore convincing".
In what limit does "apparently reasonable and credible" diverge from "true"?
We'd make the LLM not lie if we could. All this "plausible" language constitutes practitioner weasel words. We'd collectively love it if the LLMs were more truthful than a 5-year-old.
I've noticed that there's a lot of shallow fulmination on HN recently. People say things like "I call bullshit", or "I don't believe this for a second", or even call others demeaning things.
My brother (and I say this with empathy), no one is here to hear your vehement judgement. If you have anything of substance to contribute, there are a million different ways to express it kindly and constructively.
As for RLHF, it is used to align the LLM, not to make it more factual. You cannot make a language model know more facts than it comes out of training with. You can only align it to give friendlier and more helpful output to its users. And to an extent, the LLM can be steered away from outputting false information. But RLHF will never be comprehensive enough to eliminate all hallucination, and that's not its purpose.
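To make that concrete, here is a toy sketch of the objective RLHF optimizes (my own simplification, with made-up stand-ins for the policy, the frozen reference model, and the preference reward - nothing resembling OpenAI's actual pipeline): the tuned model chases a learned preference score while a KL penalty keeps it close to the pretrained model. Nothing in the loop injects new knowledge; it only re-weights which of the model's existing outputs get favored.

```python
# Toy sketch of an RLHF-style update: reward - beta * KL(policy || reference),
# optimised with a REINFORCE-style estimator. All models here are tiny stand-ins.
import torch
import torch.nn.functional as F

vocab, seq_len, beta = 100, 8, 0.1

policy = torch.nn.Linear(seq_len, vocab)   # stand-in for the model being tuned
ref = torch.nn.Linear(seq_len, vocab)      # frozen copy of the "pretrained" model
ref.load_state_dict(policy.state_dict())   # start from the same weights

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(200):
    x = torch.randn(32, seq_len)           # pretend prompts
    logits = policy(x)                     # policy's next-token distribution
    ref_logits = ref(x).detach()           # reference distribution, no gradient

    dist = torch.distributions.Categorical(logits=logits)
    tokens = dist.sample()                 # sampled "responses" (one token each, for brevity)

    # Hypothetical preference score: pretend raters happen to prefer tokens below id 10.
    reward = (tokens < 10).float()

    # KL(policy || reference): the penalty that keeps the tuned model close to
    # what pretraining produced -- it reshapes preferences, it adds no facts.
    kl = F.kl_div(F.log_softmax(ref_logits, -1),
                  F.log_softmax(logits, -1),
                  log_target=True, reduction="none").sum(-1)

    # Fold the KL term into the reward, then do a plain policy-gradient step.
    advantage = (reward - beta * kl).detach()
    loss = -(advantage * dist.log_prob(tokens)).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of the KL term is exactly the one above: alignment can push the model away from outputs raters dislike (including some falsehoods), but it cannot give the model facts it never learned in the first place.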
LLMs are made to produce plausible text, not facts. They are fantastic (to varying degrees) at speaking about the facts they know, but that is not their primary function.
I call B.S. If LLMs never made mistakes, we wouldn't need to train them - any random initialization would work.