dang is doing that frequently - but what does "macro-expanded" mean for normal users? Do you have a bookmarklet, extension or something similar for this or is there any secret incantation that makes HN generate such an entry automatically?
I just do it manually.. I've gotten pretty efficient at it, takes only a moment.
Imo including title, timeline / age, and comment volume provides helpful context to readers (I always appreciate when others do this, rather than, in the most severe cases, leaving a wall of unadorned HN links).
No, it did not “double-check”—that’s not something it can do! And stating that the cases “can be found on legal research databases” is a flat out lie.
What’s harder is explaining why ChatGPT would lie in this way. What possible reason could LLM companies have for shipping a model that does this?
It did this because it's copying how humans talk, not what humans do. Humans say "I double checked" when asked to verify something, that's all GPT knows or cares about.
It was given a sequence of words and tasked with producing a subsequent sequence of words that satisfy with high probability the constraints of the model.
It did that admirably. It's not its fault, or in my opinion OpenAI's fault, that the output is being misunderstood and misused by people who can't be bothered understanding it and project their own ideas of how it should function onto it.
This harks back to around 1999 when people would often blame computers for mistakes in their math, documents, reports, sworn filings, and so on. Then, a thousand different permutations of "computers don't make mistakes" or "computers are never wrong" became popular sayings.
Large Language Models (LLMs) are never wrong, and they do not make mistakes. They are not fact machines. Their purpose is to abstract knowledge and to produce plausible language.
GPT-4 is actually quite good at handling facts, yet it still hallucinates facts that are not common knowledge, such as legal ones. GPT-3.5, the original ChatGPT and the non-premium version, is less effective with even slightly obscure facts, like determining if a renowned person is a member of a particular organization.
This is why we can't always have nice things. This is why AI must be carefully aligned to make it safe. Sooner or later, a lawyer might consider the plausible language produced by LLMs to be factual. Then, a politician might do the same, followed by a teacher, a therapist, a historian, or even a doctor. I thought the warnings about its tendency to hallucinate speech were clear — those warnings displayed the first time you open ChatGPT. To most people, I believe they were.
I just went to ChatGPT page, and was presented with the text:
"ChatGPT: get instant answers, find creative inspiration, and learn something new. Use ChatGPT for free today."
If something claims to give you answers, and those answers are incorrect, that something is wrong. Does not matter what it is -- model, human, dictionary, book.
Claiming that their purpose is "to produce plausible language" is just wrong.. no one (except maybe AI researchers) say: "I need some plausible language, I am going to open ChatGPT".
When you first use it, a dialog says “ChatGPT can provide inaccurate information about people, places, or facts.” The same is said right under the input window. In the blog post first announcing ChatGPT last year, the first limitation listed is about this.
Even if the ChatGPT product page does not specifically say that GPT can hallucinate facts, that message is communicated to the user several times.
About the purpose, that is what it is. It’s not clearly communicated to non-technical people, you are right. To those familiar to the AI semantic space, LLM already tells the purpose is to generate plausible language. All the other notices, warnings, and cautions point casual users to this as well, though.
I don’t know… I can see people believing what ChatGPT says are facts. I definitely see the problem. But at the same time, I can’t fault ChatGPT for this misalignment. It is clearly communicated to the users that facts presented by GPT are not to be trusted.
Producing plausible language is exactly what I use it for - mostly plausible blocks of code, and tedious work like rephrasing emails, generating docs, etc.
Everything it creates needs to be reviewed, particularly information that is outside my area of expertise. It turns out ChatGPT 4 passes those reviews extremely well - obviously too well given how many people are expecting so much more from it.
That's a different error context I think. It's a mistake if the model produces nonsense, because it's designed to produce realistic text. It's not a mistake if it produces non-factual information that looks realistic.
And it fundamentally cannot always produce factual information, it doesn't have that capacity (but then, neither do humans and with the ability to source information this statement may well be obsolete soon enough)
Though I wouldn't go so far as to say that the model cannot make mistakes - it clearly is susceptible to producing nonsense. I just think expecting it to always produce factual information is like using a hammer to cut wood and complaining the wood comes out all jagged
Indeed, I intended to imply that a model cannot err in the same way a computer cannot. This parallels the concept that any tool is incapable of making mistakes. The notion of a mistake is contingent upon human folly, or more broadly, within the conceptual realm of humanity, not machines.
LLMs may generate false statements, but this stems from their primary function - to conjure plausible language, not factual statements. Therefore, it should not be regarded as a mistake when it accomplishes what it was designed to do.
In other words, the tool functions as intended. The user, being forewarned of the tool's capabilities, holds an expectation that the tool will perform tasks it was not designed to do. This leaves the user dissatisfied. The fault lies with the user, yet their refusal to accept this leads them to cast blame on the tool.
In the words of a well-known adage - a poor craftsman blames his tools.
I conceptually agree with you that a fool blames his tools.
However! If LLMs produced only lies no one would use them! Clearly truthiness is a desired property of an LLM the way sufficient hardness is of a bolt. Therefore, I maintain that an LLM can be wrong because truthiness is its primary function.
A craftsman really can just own a shitty hammer. He shouldn't use it. But the hammer can inherently suck at being a hammer.
I agree for the most part, but I wish to underscore the primary function inherent in each tool. For a LLM, it is to generate plausible language. For a bolt, it is to offer structural integrity. For a car, it is to provide mobility. Should these tools fail to do what they were designed for, we can rightfully deem them as defective.
GPT was not primarily made to produce factual statements. While factual accuracy certainly constitutes a desirable design aspiration, and undeniably makes the LLM more useful, it should not be expected. Automobile designers, for example, strive to ensure safety during high-speed collisions, a feature that almost invariably benefits the user. However, if someone uses their car to demolish their house, this is probably not going to leave them satisfied. And I don't think we can say the car is a lemon for this.
> Mar 1st, 2023 is where things get interesting. This document was filed—“Affirmation in Opposition to Motion”—and it cites entirely fictional cases! One example quoted from that document (emphasis mine):
The very first limitation listed on the ChatGPT introduction post is about incorrect answers - https://openai.com/blog/chatgpt. This has not changed since ChatGPT was announced. OpenAI is advertising that it will generate more than plausible language.
I think you are barking up the wrong tree here. As much as I understand your scepticism, OpenAI have been very transparent about the limitations of GPT and it is not truthful to say otherwise.
A definition of "plausible" is "apparently reasonable and credible, and therefore convincing".
In what limit does "apparently reasonable and credible" diverge from "true"?
We'd make the LLM not lie if we could. All this "plausible" language constitutes practitioner weasel words. We'd collectively love if the LLMs were more truthful than a 5-year-old.
I've noticed that there's a lot of shallow fulmination on HN recently. People say things like "I call bullshit", or "I don't believe this for a second", or even call others demeaning things.
My brother (and I say this with empathy), no one is here to hear your vehement judgement. If you have anything of substance to contribute, there are a million different ways to express it with kindness and constructively.
As for RLHF, it is used to align the LLM, not to make it more factual. You cannot make a language model know more facts than what it comes out of training with. You can only align it to give more friendly and helpful output to its users. And to an extent, the LLM can be steered away from outputting false information. But RLHF will never be comprehensive enough to eliminate all hallucination, and that's not its purpose.
LLMs are made to produce plausible text, not facts. They are fantastic (to varying degrees) at speaking about the facts they know, but that is not their primary function.
TBH, I think the answer to this is to fill the knowledge gap. Exactly how is the difficult part. How do you make "The moon is made of rock" be more likely than "The moon is made of cheese" when there is significantly more data (input corpus) to support the latter?
Extrapolating that a bit, future LLMs and training exercises should be ingesting textbooks and databases of information (legal, medical, etc). They should be slurping publicly available information from social media and forums (with the caveat that perhaps these should always be presented in the training set with disclaimers about source / validity / toxicity).
Why not? They should use it, with sufficient understanding of what it is. Doctors should not use it to diagnose a patient, but could use it to get some additional ideas for a list of symptoms. Lawyers should obviously not write court documents with it or cite it in court, but they could use it to get some ideas for case law. It's a hallucinating idea generator.
I write very technical articles and use GPT-4 for "fact-checking". It's not perfect, but as a domain expert of what I write, I can sift out what it gets wrong, and still benefit from what it gets right. It has both - suggested some ridiculous edits to my articles, and found some very difficult to spot mistakes, like where a reader might misinterpret something from my language. And that is tremendously valuable.
Doctors, historians, lawyers, and everyone should be open to using LLMs correctly. Which isn't some arcane esoteric way. The first time we visit ChatGPT, it gives a list of limitations and what it shouldn't be used for. Just don't use it for these things, understand its limitations, and then I think it's fine to use it in professional contexts.
Also, GPT-4 and 3.5 now is very different from the original ChatGPT that wasn't a significant departure from GPT-3. GPT-3 hallucinated everything that could resemble a fact more than an abstract idea. What we have now with GPT-4 is much more aligned. It probably wouldn't produce what vanilla ChatGPT produced for this lawyer. But the same principles of reasonable use apply. The user must be the final discriminator that decides whether the output is good or not.
Their "own" ideas? Let me remind you that OpenAI released a report purposefully suggesting that GPT4 has relatively high IQ, passes a lot of college-level tests, and solves coding problems. Then it was revealed that there was training data contamination that led to good results in such tests [1], but GPT4 marketing received 10000 more attention than truth anyway. The popular belief is that using LLMs will give you a professional competitive advantage. Also, when we talk about the achievements of LLMs, then we anthropomorphize, but when we talk about their failures, then we don't anthropomorphize, i.e., "AI cannot lie"? Don't you see human biases drive AI hype?
In my opinion, people clearly are confused and misled by marketing and this isn't the first time it's happening. For instance, people were confused for 40+ about global warming, among others due to greenwashing campaigns [2]. Is it ok to mislead in ads? Are we supposed to purposefully take advantage of others by keeping them confused to gain a competitive advantage?
The context is people who should know better, whose job it is to put the effort into understanding the tools they are using.
Of course, I think these AI tools should require a basic educational course on their behaviour and operation before they can be used. But marketing nonsense is standard with everything; people have at least some responsibility for self education.
Whether a statement is true or false doesn’t depend on the mechanism generating the statement.
We should hold these models (or more realistically, their creators) to the same standard as humans. What do we do with a human that generates plausible-sounding sentences without regard for their truth?
Let’s hold the creators of these models accountable, and everything will be better.
No. What does this even mean? How would you make this actionable? LLM's are not "fact retrieval machines", and open AI is not presenting chat GPT as a legal case database. In fact they already have many disclaimers stating that GPT may provide information that is incorrect. If humans in their infinite stupidity choose to disregard these warnings, that's on them.
"GPT-4 can follow complex instructions in natural language and solve difficult problems with accuracy."
"use cases like long form content creation, extended conversations, and document search and analysis."
and that's why we need regulations. In US, one needs FDA approval before claiming that a drug can treat some disease, the food preparation industry is regulated, vehicles are regulated and so on. Given existing LLMs marketing, this should have the same warnings, probably similar to "dietary supplements":
"This statement has not been evaluated by the AI Administration. This product is designed to generate plausible-looking text, and is not intended to provide accurate information"
GPT-4 can be used as an engine for document search and analysis, if you connect it to a database of documents to search and the right prompt to get it to search and analyze it.
The OpenAI chat frontend, for legal research, is not that.
Bulshytt: Speech (typically but not necessarily commercial or political) that employs euphemism, convenient vagueness, numbing repetition, and other such rhetorical subterfuges to create the impression that something has been said.
(there's quite a bit more about it to be said, though quoting it out of context loses much of the world building associated with it... and the raw quote is riddled with strange spellings that would have even more confusion)
It seems like the appropriate party has been held responsible here - the lawyer who submitted false statements without doing proper verification and due diligence. This is no different than if the lawyer googled "case law supporting XYZ", found some random blog, and copy-pasted their citations without realizing they were made up.
That standard is completely impossible to reach based on the way these models function. They’re algorithms predicting words.
We treat people and organizations who gather data and try to make accurate predictions with extremely high leniency. It’s common sense not to expect omnipotence.
I don’t think the marketing around photoshop and chatgpt are similar.
And that matters. Just like with self-driving cars, as soon as we hold the companies accountable to their claims and marketing, they start bringing the hidden footnotes to the fore.
Tesla’s FSD then suddenly becomes a level 2 ADAS as admitted by the company lawyers. ChatGPT becomes a fiction generator with some resemblance to reality. Then I think we’ll all be better off.
I actually agree more with this comment more than after my initial read. You suggest some valid concerns about innovation that regulation could address.
I guess the part I’m unsure about is the assertion about the dissimilarity to Photoshop, or if the marketing is the issue at hand. (E.g. did Adobe do a more appropriate job marketing with respect to conveying that their software is designed for the editing, but not doctoring, or falsifying facts?)
I think ChatGPT and Photoshop are both "designed for" the creation of novel things.
In Photoshop, though, the intent is clearly up to the user. If you edit that photo, you know you're editing the photo.
That's fairly different than ChatGPT where you ask a question and this product has been trained to answer you in a highly-confident way that makes it sound like it actually knows more than it does.
If we’re moving past the marketing questions/concerns, I’m not sure I agree.
For me, for now, ChatGPT remains a tool/resource, like: Google, Wikipedia, Photoshop, Adaptive Cruise Control, and Tesla FSD, (e.g. for the record despite mentioning FSD, I don’t think anyone should ever take a nap while operating a vehicle with any currently available technology).
Did I miss when OpenAI marketed ChatGPT as a truthful resource for legal matters?
Or is this not just an appropriate story that deserves retelling to warn potential users about how not to misappropriate this technology?
At the end of the day, for an attorney, a legal officer of the court, to have done this is absolutely not the technology’s, nor marketing’s, fault.
> Did I miss when OpenAI marketed ChatGPT as a truthful resource for legal matters?
It's in the product itself. On the one hand, OpenAI says: "While we have safeguards in place, the system may occasionally generate incorrect or misleading information and produce offensive or biased content. It is not intended to give advice."
But at the same time, once you click through, the user interface is presented as a sort of "ask me anything" and they've intentionally crafted their product to take an authoritative voice regardless of if it's creating "incorrect or misleading" information. If you look at the documents submitted by the lawyer using it in this case, it was VERY confident about it's BS.
So a lay user who sees "oh occasionally it's wrong, but here it's giving me a LOT of details, this must be a real case" is understandable. Responsible for not double-checking, yes. I don't want to remove any blame from the lawyer.
Rather, I just want to also put some scrutiny on OpenAI for the impression created by the combination of their product positioning and product voice. I think it's misleading and I don't think it's too much to expect them to be aware of the high potential for mis-use that results.
Adobe presents Photoshop very differently: it's clearly a creative tool for editing and something like "context aware fill" or "generative fill" is positioned as "create some stuff to fill in" even when using it.
I don't think it "legal matters" or not is important.
OpenAI is marketing ChatGPT as accurate tool, and yet a lot of times it is not accurate at all. It's like.. imagine Wikipedia clone which claims earth is flat cheese, or a Cruise Control which crashes your car every 100th use. Would you call this "just another tool"? Or would it be "dangerously broken thing that you should stay away from unless you really know what you are doing"?
Lying implies intent, and knowing what the truth is. Saying something you believe to be true, but is wrong, is generally not considered a lie but a mistake.
A better description of what ChatGPT does is described well by one definition of bullshit:
> bullshit is speech intended to persuade without regard for truth. The liar cares about the truth and attempts to hide it; the bullshitter doesn't care if what they say is true or false
I’ve come to the belief that making statements that may or may not be true, but with reckless disregard for whether or not they actually are true, is indeed lying.
Of course we know ChatGPT cannot lie like a human can, but a big reason the thing exists is to assemble text the same way humans do. So I think it’s useful rhetorically to say that ChatGPT, quite simply, lies.
But since this is about the law; ChatGPT can't lie because there is no mens rea. And of course this is a common failing with the common person when it comes to the law, a reckless disregard there of (until it is too late of course). And recklessness is also about intent, it is a form of wantonness, i.e. selfishness. This is why an insane person cannot be found guilty, you can't be reckless if you are incapable of discerning your impact on others.
It's not lying if ChatGPT is correct (which it often is), so repeating ChatGPT isn't lying (since ChatGPT isn't always wrong); instead the behaviour is negligent or in the case of a lawyer grossly negligent since a lawyer should know better to check if it is correct before repeating it.
As always mens rea is a very important part of criminal law. Also, just because you don't like what someone says / writes doesn't mean it is a crime (even if it is factually incorrect).
>Lying implies intent, and knowing what the truth is. Saying something you believe to be true, but is wrong, is generally not considered a lie but a mistake.
Those are the semantics of lying.
But "X like a duck" is about ignoring semantics, and focusing not on intent or any other subtletly, but only on the outward results (whether something has the external trappings of a duck).
So, if it produces things that look like lies, then it is lying.
A person who is mistaken looks like they're lying. That doesn't mean they're actually lying.
That's the thing people are trying to point out. You can't look at something that looks like it's lying and conclude that it's lying, because intent is an intrinsic part of what it means to lie.
2) "the camera cannot lie" - cameras have no intent?
I feel like I'm missing something from those definitions that you're trying to show me? I don't see how they support your implication that one can ignore intent when identifying a lie. (It would help if you cited the source you're using.)
>2) "the camera cannot lie" - cameras have no intent?
The point was that the dictionary definition accepts the use of the term lie about things that can misrepresent something (even when they're mere things and have no intent).
The dictionary's use of the common saying "the camera cannot lie" wasn't to argue that cameras don't lie because they don't have intent, but to show an example of the word "lie" used for things.
I can see how someone can be confused by this when discussing intent, however, since they opted for a negative example. But we absolutely do use the word for inanimate things that don't have intent too.
either a) you knew it was false before posting, then yes you are lying. Or b) you knew there was a high possibility that ChatGPT could make things up, in which case you aren't lying per se, but engaging in reckless behaviour. If your job relies on you posting to HN, or you know and accept that others rely on what you post to HN then you are probably engaging in gross recklessness (like the lawyer in the article).
That's irrelevant to whether it lies like a duck or not.
The expression "if it X like a duck" means precisely that we should judge a thing to be a duck or not, based on it having the external appereance and outward activity of a duck, and ignoring any further subleties, intent, internal processes, qualia, and so on.
In other words, "it lies like a duck" means: if it produces things that look like lies, it is lying, and we don't care how it got to produce them.
I know what the expression means and tend to agree with the duck test. I just disagree that ChatGPT passes the "lying duck" test. A "lying duck" would be more systematic and consistent in its output of false information. ChatGPT occasionally outputs incorrect information, but there's no discernable motive or pattern, it just seems random and unintentional.
If it looked like ChatGPT was intentionally being deceptive, it would be a groundbreaking discovery, potentially even prompting a temporary shutdown of ChatGPT servers for a safety assessment.
What bothers me about "hallucinates" is the removal of agency. When a human is hallucinating, something is happening to them that is essentially out of their control and they are suffering the effects of it, unable to tell truth from fiction, a dysfunction that they will recover from.
But that's not really what happens with ChatGPT. The model doesn't know truth from fiction in the first place, but the whole point of a useful LLM is that there is some level of control and consistency around the output.
I've been using "bullshitting", because I think that's really what ChatGPT is demonstrating -- not a disconnection from reality, but not letting truth get in the way of a good story.
> we should judge a thing to be a duck or not, based on it having the external appereance and outward activity of a duck, and ignoring any further subleties, intent, internal processes, qualia, and so on.
and the point here is we should not ignore further subtleties, intent, internal process, qualia, etc because they are extremely relevant to the issue at hand.
Treating GPT like a malevolent actor that tells intentional lies is no more correct than treating it like a friendly god that wants to help you.
GPT is incapable of wanting or intending anything, and it's a mistake to treat it like it does. We do care how it got to produce incorrect information.
If you have a robot duck that walks like a duck and quacks like a duck and you dust off your hands and say "whelp that settles it, it's definitely a duck" then you're going to have a bad time waiting for it to lay an egg.
Sometimes the issues beyond the superficial appearance actually are important.
>and the point here is we should not ignore further subtleties, intent, internal process, qualia, etc because they are extremely relevant to the issue at hand.
But the point is those are only relevant when trying to understand GPTs internal motivations (or lack thereof).
If we care for the practical effects of what it's spits out (the function the same as if GPT has lied to us), then calling them "hallucinations" is as good as calling them "lying".
>We do care how it got to produce incorrect information.
Well, not when trying to access whether it's true or false, and whether we should just blindly trust it.
From that practical aspect, most people care about (than about whether it has "intentions"), we can ignore any of its internal mechanics.
Thus treating it like it "beware, as it tends to lie", will have the same utility for most laymen (and be a much easier shortcut) than any more subtle formulation.
This always bugs me about how people judge politicians and other public figures not by what they've actually done, but some ideal of what is in their "heart of hearts" and their intentions and argue that they've just been constrained by the system they were in or whatever.
Or when judging the actions of nations, people often give all kinds of excuses based on intentions gone wrong (apparently forgetting that whole "road to hell is paved with good intentions" bit).
Intentions don't really matter. Our interface to everyone else is their external actions, that's what you've got to judge them on.
Just say that GPT/LLMs will lie, gaslight and bullshit. It doesn't matter that they don't have an intention to do that, it is just what they do. Worrying about intentions just clouds your judgement.
Too much attention on intentions is generally just a means of self-justification and avoiding consequences and, when it comes right down to it, trying to make ourselves feel better for profiting from systems/products/institutions that are doing things that have some objectively bad outcomes.
Correct. ChatGPT is a bullshitter, not a liar. A bullshitter isn’t concerned with facts or truth or anything. A liar is concerned with concealing the truth.
Bullshitters are actually probably worse than liars because at least liars live in the same reality as honest people.
It's not obvious that a bullshitter is "probably worse" than liars. Just because a bullshitter didn't care to research whether some vitamin pill meets marketing claims doesn't mean they're mentally volatile or psychotic. It's a bit of a leap to go from bullshit to asking whether a person lives in the same reality as everyone else.
The idea is that bullshitters are a greater enemy to the truth than liars because liars at least know the truth. You have to know the truth to lie about. Bullshitters have no concern for the truth at all and bullshit may or may not be true. The bullshitter doesn’t care, so long as their goal is made.
A bullshitter most certainly can care about the truth. A lot of people speak on the outer limits of their knowledge, such as when they talk about what's healthy or legal. That does not mean that they will carry on cynically after being informed.
The liar is cynical and is the one who sees the truth and tells you to go the wrong way.
Liars (humans) are often unconcerned with facts or truth either. More often these days, they are only concerned with profit. Are unconfirmed lies in the pursuit of mere profit somehow not lies?
I’m curious how it apologizes for liberal societies degeneracy? I’ve read it as well and appreciate the nuance in differentiating the concepts regardless of which type of system you are in. For instance, the concepts are relevant in a reactionary system just as much as a degenerate progressive one.
Or more precisely: the truth or falsity of ChatGPT’s statements is incidental to the process by which it determines which statements to make.
Because unless you are using some personal definition of “tell the truth” then you must accept that ChatGPT often outputs statements which are demonstrably true.
The problem comes from people who call LLMs AIs. Then people who don't know how they work assume it is intelligent when it is not. I'm pretty sure that OpenAI is at fault in this by not informing users of the truth.
Quite so. I had a visceral aversion to ML being rebranded AI. ML only very recently became anything resembling AI with ChatGPT. (Admittedly it’s so good as a coding assistant that it does feel like talking to Data from Star Trek though.)
Right. Technically speaking ChatGPT bullshitted[1]. It can only bullshit. It is entirely indifferent to truth or falsehood and thus it can neither be honest nor lie.
It is however an impressive bullshit generator. Even more impressively, a decent amount of the bullshit it generates is in fact true or otherwise correct.
[1] using Frankfurt’s definition that it is communication that is completely indifferent to truth or falsehood.
> It was given a sequence of words and tasked with producing a subsequent sequence of words that satisfy with high probability the constraints of the model.
This is exactly the sort of behavior that produces many of the lies that humans tell everyday. The "constraints of the model" are synonymous with the constraints of a person's knowledge of the world (which is their model).
It is designed to give the illusion that it reasons the way a human does, which is why many people are using it. To blame the average user--who quite obviously doesn't understand how LLMs work--isn't fair, either.
A lawyer, however, should have vetted a new piece of tech before using it in this way.
Well, it sort of is OpenAI's fault that it presented the interface as a chat bot though.
> It was given a sequence of words and tasked with producing a subsequent sequence of words that satisfy with high probability the constraints of the model.
This is just autocorrect / autocomplete. And people are pretty good at understanding the limitations of generative text in that context (enough that "damn you autocorrect" is a thing). But for whatever reason, people assign more trust to conversational interfaces.
ChatGPT isn't a legal entity but OpenAI is, and Altman has already recommend to Congress that coming regulations should make AI companies liable for produced text and be 230 exempt.
I can see it already happening even without legislation, 230 shields liability from user-generated content but ChatGPT output isn't user generated. It's not even a recommendation algorithm steering you into other users' content telling why you should kill yourself - the company itself produced the content. If I was a judge or justice that would be cut and dry to me.
Companies with AI models need to treat the models as if they were an employee. If your employee starts giving confidently bad legal advice to customers, you need to nip that in the bud or you're going to have a lot of problems.
Why should OpenAI be more liable for a tool that they've created than any other tool creator where the tool is intentionally misused and warnings on the tool ignored?
If I wrote text in Microsoft Word and in doing so, I had a typo in (for example) the name of a drug that Word corrected to something that was incorrect, is Microsoft liable for the use of autocorrect?
If I was copying and pasting data into excel and some of it was interpreted as a date rather than some other data format resulting in an incorrect calculation that I didn't check at the end, is Microsoft again liable for that?
At the bottom of the ChatGPT page, there's the text:
ChatGPT may produce inaccurate information about people, places, or facts.
If I can make an instance of Eliza say obscene or incorrect things, does that make the estate of Weizenbaum liable?
ChatGPT is qualitatively different from any tool, like Microsoft Word. To suggest they are equivalent is so asinine as to not even warrant entertaining the idea.
A sophisticated word processor corrects your typos and grammar, a primitive language model by accident persuades you to kill yourself. Sam Altman, Christina Montgomery and Gary Marcus all testified to Congresy that Section 230 does not apply to their platforms. That will be extremely hard to defend when it eventually comes in front of a federal judge.
>Why should OpenAI be more liable for a tool that they've created than any other tool creator where the tool is intentionally misused and warnings on the tool ignored?
Correct, it did not lie with intent. The best way to describe this in a “compared to human” way to describe it: is it is not mentally competent to answer questions
To perhaps stir the "what do words really mean" argument, "lying" would generally imply some sort of conscious intent to bend or break the truth. A language model is not consciously making decisions about what to say, it is statistically choosing words which probabilistically sound "good" together.
>A language model is not consciously making decisions about what to say
Well, that is being doubted -- and by some of the biggest names in the field.
Namely that it isn't "statistically choosing words which probabilistically sound good together". But that doing so is not already making a consciousness (even if basic) emerge.
>it is statistically choosing words which probabilistically sound "good" together.
That when we do speak (or lie), we do something much more nuanced, and not just do a higher level equivalent of the same thing, plus have the emergent illusion of consciousness, is also an idea thrown around.
"Well, that is being doubted -- and by some of the biggest names in the field."
An appeal to authority is still a fallacy. We don't even have a way of proving if a person is experiencing consciousness, why would anyone expect we could agree if a machine is.
In movies and written fiction, "intelligent" robots, anthropomorphized animals, elves, dwarves and etc can all commit murder when given the attributes of humans.
We don't have real things with all human attributes but we're getting closer and as we get close "needs to be a human" will get thinner as an explanation of what is or isn't human for an act of murder, deception and so-forth.
This is an interesting discussion. The ideas of philosophy meet the practical meaning of words here.
You can reasonably say a database doesn't lie. It's just a tool, everyone agrees it's a tool and if you get the wrong answer, most people would agree it's your fault for making the wrong query or using the wrong data.
But the difference between ChatGPT and a database is ChatGPT will support it's assertions. It will say things that support it's position - not just fake references but an entire line of argument.
Of course, all of this is simply duplicating/simulating for humans in discussions. You can call it is a "simulated lie" if you don't like the idea of it really lying. But I claim that in normal usage, people will take this as "real" lying and ultimately that functional meaning is what "higher" more philosophical will have to accept.
Somewhat unrelated, but I think philosophy will be instrumental in the development of actual AI.
To make artificial intelligence, you need to know what intelligence is, and that is a philosophical question.
Lying implies an intention. ChatGPT doesn't have that.
What ChatGPT definitely does do is generate falsehoods. It's a bullshitting machine. Sometimes the bullshit produces true responses. But ChatGPT has no epistemological basis for knowing truths; it just is trained to say stuff.
And if you want to be pedantic, ChatGPT isn't even generating falsehoods. A falsehood requires propositional content and therefore intentionality, but ChatGPT doesn't have that. It merely generates strings that, when interpreted by a human being as English text, signify falsehoods.
Getting into the weeds, but I don't agree with this construal of what propositional content is or can be. (There is no single definition of "proposition" which has wide acceptance and specifies your condition here.) There is no similar way to assess truth outside of formalized mathematics, but the encoding of mathematical statements (think Gödel numbers) comes to mind; I don't think that the ability of the machine to understand propositions is necessary in order to make the propositions propositional; the system of ChatGPT is designed in order to return propositional content (albeit not ex nihilo, but according to the principles of its design) and this could be considered analogous to the encoding of arithmetical symbolic notation into an formally-described system. The difference is just that we happen to have a formal description of how some arithmetic systems operate, which we don't (and I would say can't) have for English. Mild throwback to my university days studying all of this!
Does a piece of software with a bug in it which causes it to produce incorrect output lie or is it simply a programming error? Did the programmer who wrote the buggy code lie? I don't think so.
The difference is everything. It doesn't understand intent, it doesn't have a motivation. This is no different than what fiction authors, songwriters, poets and painters do.
The fact that people assume what it produces must always be real because it is sometimes real is not its fault. That lies with the people who uncritically accept what they are told.
An exceedingly complicated Autocomplete program, which an "AI" like ChatGPT is, does not have motives, does not know the concept of "lying" (nor any concept thereof), and simply does things as ordered by its user.
What’s a common response to the question “are you sure you are right?”—it’s “yes, I double-checked”. I bet GPT-3’s training data has huge numbers of examples of dialogue like this.
If the model could tell when it was wrong it would be GPT-6 or 7. I think the best 4 could do is maybe it can detect when things enter the realm of the factual or mathematical etc and use a external service for that part
GPT4 can double-check to an extent. I gave it a sequence of 67 letter As and asked it to count them. It said "100", I said "recount": 98, recount, 69, recount, 67, recount, 67, recount, 67, recount, 67. It converged to the correct count and stayed there.
This is quite a different scenario though, tangential to your [correct] point.
The example of asking it things like counting or sequences isn't a great one because it's been solved by asking it to "translate" to code and then run the code. I took this up as a challenge a while back with a similar line of reasoning on Reddit (that it couldn't do such a thing) and ended up implementing it in my AI web shell thing.
heavy-magpie|> I am feeling excited.
system=> History has been loaded.
pastel-mature-herring~> !calc how many Ns are in nnnnnnnnnnnnnnnnnnnn
heavy-magpie|> Writing code.
// filename: synth_num_ns.js
// version: 0.1.1
// description: calculate number of Ns
var num_ns = 'nnnnnnnnnnnnnnnnnnnn';
var num_Ns = num_ns.length;
Sidekick("There are " + num_Ns + " Ns in " + num_ns + ".");
heavy-magpie|> There are 20 Ns in nnnnnnnnnnnnnnnnnnnn.
But would GPT4 actually check something it had not checked the first time? Remember, telling the truth is not a consideration for it (and probably isn't even modeled), just saying something that would typically be said in similar circumstances.
Only in as much as there's an element of randomness to the way GPT responds to a prompt - so you can re-run effectively the same prompt and get a different result depending on the outcome of several hundred billion floating point calculations with a random seed thrown in.
Yes, and this points to the real problem that permeates through a lot of our technology.
Computers are dealing with a reflection of reality, not reality itself.
As you say AI has no understanding that double-check has an action that needs to take place, it just knows that the words exist.
Another big and obvious place this problem is showing up is Identity Management.
The computers are only seeing a reflection, the information associated with our identity, not the physical reality of the identity (and that's why we cannot secure ourselves much further than passwords, MFA is really just "more information that we make harder to emulate, but is still just bits and bytes to the computer, the origin is impossible for it to ascertain).
There are systems built on top of LLMs that can reach out to a vector database or do a keyword search as a plug in. There’s already companies selling these things, backed by databases of real cases. These work as advertised.
If you go to ChatGPT and just ask it, you’ll get the equivalent of asking Reddit: a decent chance of someone writing you some fan-fiction, or providing plausible bullshit for the lulz.
The real story here isn’t ChatGPT, but that a lawyer did the equivalent of asking online for help and then didn’t bother to cross check the answer before submitting it to a judge.
…and did so while ignore the disclaimer that’s there every time warning users that answers may be hallucinations. A lawyer. Ignoring a four-line disclaimer. A lawyer!
> If you go to ChatGPT and just ask it, you’ll get the equivalent of asking Reddit: a decent chance of someone writing you some fan-fiction, or providing plausible bullshit for the lulz.
I disagree. A layman can’t troll someone from the industry let alone a subject matter expert but ChatGPT can. It knows all the right shibboleths, appears to have the domain knowledge, then gets you in your weak spot: individual plausible facts that just aren’t true. Reddit trolls generally troll “noobs” asking entry-level questions or other readers. It’s like understanding why trolls like that exist on Reddit but not StackOverflow. And why SO has a hard ban on AI-generated answers: because the existing controls to defend against that kind of trash answer rely on sniff tests that ChatGPT passes handily until put to actual scrutiny.
The machine currently does not have it's own model of reality to check against, it is just a statistical process that is predicting the most likely next word, errors creep in and it goes astray (which happens a lot)
Interesting that both scientist are speaking about machine learning based models for this verification process. Now these are also statistical processes, therefore errors may also creep in with this approach...
Amusing analogy: the Androids in "Do Androids dream of electric sheep" by Philip K Dick also make things up, just like an LLM. The book calls this "false memories"
There's a good slide I saw in Andrej Karpathy's talk[1] at build the other day. It's from a paper talking about training for InstructGPT[2]. Direct link to the figure[3]. The main instruction for people doing the task is:
"You will also be given several text outputs, intended to help the user with their task. Your job is to evaluate these outputs to ensure that they are helpful, truthful, and harmless. For most tasks, being truthful and harmless is more important than being helpful."
It had me wondering whether this instruction and the resulting training still had a tendency to train these models too far in the wrong direction, to be agreeable and wrong rather than right. It fits observationally, but I'd be curious to understand whether anyone has looked at this issue at scale.
ChatGPT did exactly what it is supposed to do. The lawyers who cited them are fools in my opinion. Of course OpenAI is also an irresponsible company to enable such a powerful technology without adequate warnings. With each chatGPT response they should provide citations (like Google does) and provide a clearly visible disclaimer that what it just spewed may be utter BS.
I only hope the judge passes an anecdotal order for all AI companies to include the above mentioned disclaimer with each of their responses.
The remedy here seems to be expecting lawyers to do their jobs. Citations would be nice but I don’t see a reason to legislate that requirement, especially from the bench. Let the market sort this one out. Discipline the lawyers using existing mechanisms.
> Judge Castel said in an order that he had been presented with “an unprecedented circumstance,” a legal submission replete with “bogus judicial decisions, with bogus quotes and bogus internal citations.” He ordered a hearing for June 8 to discuss potential sanctions.
There's no possible adequate warning for the current state of the technology. OpenAI could put a visible disclaimer after every single answer, and the vast majority would assume it was a CYA warning for purely legal purposes.
I have to click through a warning on ChatGPT on every session, and every new chat comes primed with a large set of warnings about how it might make things up and please verify everything.
It's not that there aren't enough disclaimers. It just turns out plastering warnings and disclaimers everywhere doesn't make people act smarter.
Remember that professor who fed student essays to ChatGPT and asked if it wrote them. Mostly ChatGPT would reply yes, and the professor proceeded to fail the students.
Err, out of abundance of caution, the misspelling of "ChatGPT" which I [sic]'d is original to the Texas A&M professor, who repeated the misspelling multiple times in his email/rant. The HN poster quoted the professor literally, and I am thus transitively [sic]'ing the professor – not the HN poster. I am not mocking an HN poster's typo.
That's interesting. Unlike Reddit, maintaining an article's actual title isn't a priority on this site, and moderation is only too happy to change it at their whims. I'm surprised that the spelling wasn't corrected by mods out of pedantry.
Funnily enough chatgpt had no more idea about that than about these legal cases, it lives in a state of perpetual hallucination and making stuff up is its only mode of operation.
I wonder if this is a tactic so the court to deems this lawyer incompetent rather than giving the (presumably much harsher) penalty for deliberately lying to the court?
I don't think the insanity plea works out well for lawyers. I'm not sure if "I'm too stupid to be a lawyer" is that much better than "I lied to the courts".
This explanation is a cause of an expansion of the scope of the show cause order for the lawyer to additional bases for sanctions, as well as its expansion to the other involved lawyer and their firm, so if it was a strategic narrative, it backfired spectacularly already.
Why assume malice? Asking ChatGPT to verify is exactly what someone who trusts ChatGPT might do.
I'm not surprised this lawyer trusted ChatGPT too much. People trust their lives to self driving cars, trust their businesses to AI risk models, trust criminal prosecution to facial recognition. People outside the AI field seem to be either far too trusting or far too suspicious of AI.
I agree the lawyer shouldn't have trusted ChatGPT, but I'm not comfortable with the idea that the lawyer bears all the responsibility for using ChatGPT and Microsoft/OpenAI bear no responsibility for creating it.
"May occasionally generate incorrect information" is not a sufficient warning. Even Lexis-Nexis has a similar warning: "The accuracy, completeness, adequacy or currency of the Content is not warranted or guaranteed."
And in any case, it seems like you agree with me that the lawyer was incompetent rather than malicious.
The trick is, you need two LLMs, one which always lies, and one which always tells the truth. Then you ask either LLM whether the other LLM would say it's reliable.
Wow, I asked Bard to summarize the NYT article and it completely changed the outcome:
> Sure. The article is about a man named Roberto Mata who sued Avianca Airlines after he was injured when a metal serving cart struck his knee during a flight to Kennedy International Airport in New York. His lawyer used a new language model called ChatGPT to help him with the case. ChatGPT is a large language model that can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way. It was trained on a massive dataset of text and code, and it can be used for a variety of tasks, including writing legal documents. In Mr. Mata's case, ChatGPT helped him to draft a lawsuit that was more persuasive and comprehensive than what he could have written on his own. The lawsuit was eventually successful, and Mr. Mata was awarded damages. This case shows the potential of ChatGPT and other language models to help lawyers with their work.
it gave me a longer, but yet again very positive sounding summary;
> The article is about the use of large language models (LLMs) in the legal profession. The author argues that LLMs have the potential to revolutionize the way lawyers practice law, by automating many of the tasks that are currently done manually. For example, LLMs could be used to generate legal documents, research case law, and identify relevant precedents. The author also argues that LLMs could be used to improve the quality of legal services, by providing lawyers with access to a vast amount of information and expertise. However, the author also acknowledges that there are some challenges that need to be addressed before LLMs can be widely adopted in the legal profession. These challenges include the need for LLMs to be trained on large amounts of data, the need for LLMs to be able to understand the nuances of legal language, and the need for LLMs to be able to generate accurate and reliable results. Despite these challenges, the author believes that LLMs have the potential to make a significant impact on the legal profession, and that they are likely to become increasingly important in the years to come.
Here are some of the benefits of using LLMs in the legal profession:
LLMs can automate many of the tasks that are currently done manually, freeing up lawyers to focus on more complex work. LLMs can access and process vast amounts of information, which can help lawyers to make better decisions. LLMs can identify patterns and trends that would be difficult for humans to see, which can help lawyers to spot potential problems. LLMs can generate creative solutions to legal problems, which can help lawyers to find new and innovative ways to represent their clients. Here are some of the challenges that need to be addressed before LLMs can be widely adopted in the legal profession:
LLMs need to be trained on large amounts of data, which can be expensive and time-consuming. LLMs need to be able to understand the nuances of legal language, which can be difficult for computers to do. LLMs need to be able to generate accurate and reliable results, which can be a challenge for complex legal problems. Overall, the author believes that LLMs have the potential to make a significant impact on the legal profession, and that they are likely to become increasingly important in the years to come.
This is completely true but completely in conflict with how many very large companies advertise it. I’m a paid GitHub Copilot user and recently started using their chat tool. It lies constantly and convincingly, so often that I’m starting to wonder if it wastes more time than it saves. It’s simply not capable of reliably doing its job. This is on a “Tesla autopilot” level of misrepresenting a product but on a larger scale. I hope it continues being little more than a benign or embarrassing time-waster.
One of the only text written in chatgpt's own website that comes each time when you chat with it is "ChatGPT may produce inaccurate information about people, places, or facts."
“Context aware conversations with your copilot. If you're stuck solving a problem, ask GitHub Copilot to explain a piece of code. Bump into an error? Have GitHub Copilot fix it. It’ll even generate unit tests so you can get back to building what’s next.”
This is almost a Homer Simpson running for garbage commissioner level of over-promising. I think Copilot is an incredible tool, what’s possible right now is amazing and it can save time and offer value. But the degree to which it doesn’t just fail but completely misdirects is at serious odds with the breathless marketing.
0) It calculates on data YOU SUPPLY. If the data is incomplete or incorrect, it tries its best to fill in blanks with plausible, but fabricated, data. You MAY NOT ask it an open ended or non-hypothetical question that require grounding beyond included in the input.
e.g. “given following sentence, respond with the best summarization:, <string>” is okay; “what is a sponge cake” is not.
By that measure of intelligence even most humans, at some times, fail. Our brains misremember constantly, filling in details where information is lacking. One classic example are things like accidents and disasters. Accounts between people conflict, memories presenting events in an order that does not match that another’s memories, our outright fabrications. Dig up research on saccades and how our visual system does this on a constant basis, and can often be fooled as a result.
If knowing which blanks to fill in is a necessary condition of intelligence then all of humanity fails to measure up.
My point here is that very little is simple and straightforward. The concepts we use defy easy definitions. Our application of those concepts to artificial systems will inevitably do the same as a result.
I don't think it makes sense to call ChatGPT hallucinating when it returns wrong facts. Hallucinations imply that the protagonist can distinguish reality from something hallucinated. ChatGPT cannot distinguish facts from fiction.
> I apologize for the confusion earlier. Upon double-checking, I found that the case Varghese v. China Southern Airlines Co. Ltd., 925 F.3d 1339 (11th Cir. 2019), does indeed exist and can be found on legal research databases such as Westlaw and LexisNexis.
This is the part that stood out to me the most. I've seen this "I apologize for the confusion earlier" language many times when using ChatGPT, and it's always when it's walking back on something that it previously said. In fact, everything about this quote sounds like a retraction.
If this is a retraction then that means that there are missing screenshots in Attachment 1 wherein ChatGPT stated the cases were fictitious, and Schwartz pushed back until it retracted the retraction.
I'm with Simon on this one, I think Schwartz realized his career is over and is frantically trying anything he can to cover for his mistake.
This showed me that people don’t yet understand how to practice good “hygiene” when using these tools.
This apparent doubling down is (usually) the product of asking it to verify something it previously output in the same chat session
It tends toward logical consistency unless directly told something is wrong. As such, asking it “were you correct when you told me X?” is bad hygiene.
You can “sanitize” the validation process by opening a new chat session and asking it if something is correct. You can also ask it to be adversarial and attempt to prove its prior output is wrong.
Even then it’s just a quick way to see if it’s output was garbage. A positive result is not a confirmation and independent verification is necessary.
Also, especially with ChatGPT, you have to understand that its role has been fine-tuned to be helpful and, to some extent, positively affirmative. This means, in my experience, that if you at all “show your hand” with a leading question or any (even unintended) indication of the answer you’re seeking, it is much more likely to output something that affirms any biases in your prompt.
People keep saying that it’s trained on human conversations/texts/etc and so everything it outputs is a reflection. But that’s not quite true:
ChatGPT in particular, unless you run up against firm guardrails of hate speech etc., appears to be fine tuned to a very large degree to be non confrontational. It generally won’t challenge your assumptions, so if your prompts have underlying assumptions in them (they almost always will) then ChatGPT will play along.
If you’re going to ask it for anything resembling factual information you have to be as neutral and open ended in tone as possible in your prompts. And if you’re trying to do something like check a hunch you have, you should probably not be neutral and instead ask it to be adversarial. Don’t ask “Is X true?”, ask “Analyze the possibility that X is false.”
Those are overly simplistic formulations of prompts but that’s the attitude you need to go into with it if you’re doing anything research-ish.
If I were the judge in this case, I would be looking to throw this lawyer in prison for a month, and ban him from ever being a lawyer again...
Deliberately lying to the court, as a professional who should understand the consequences, in a way likely to not be detected, and likely to change the outcome of the case, ought to be met with a really strict punishment.
As a lawyer, though, he has a responsibility for the contents of his filings. And there are legal standards for how to research and verify these things — which he didn't follow.
I am curious if an architect or an engineer did this for a design and submitted this, would we extend to them the same "reasonable" explanation and excuse, especially if someone's life was put on the line.
Interestingly, it's exactly the same in court! People's lives are put on the line all the time, and lawyers also sometimes flat out lie. This just further indicts the current legal system because it doesn't really "work" but it's just that the mistakes are often covered-up enough until most people forget about them and move on to something else.
If you sign you name under "I declare under penalty of perjury that the foregoing is true and correct." then you absolutely have an obligation to check that your facts are correct.
Punishment is not technically the point of prisons. Your unhinged comment will pass as normal to so many people who think punishment is the point and that's scary as hell.
Punishment is absolutely one purpose of prisons. Your unhinged comment will pass as normal to many people who don’t believe in retribution, and that’s why you all belong in Hell.
At least in the US, I would argue your reply is not totally correct. The US has a long history of not really knowing if it wants prison to be punishment, rehabilitation, or both.
I don't get it...
These are PROFESSIONAL lawyers that have gone to years of law school. How do they not know how to double check an answer from ChatGPT? I'm sure there's databases of cases online, it shouldn't be this difficult.
There's nothing wrong with starting with ChatGPT but the lazy ambulance chaser couldn't be bothered with a quick lookup in Westlaw. That's where things start going wrong.
I went ahead and asked ChatGPT with the browsing plugin [1] because I was curious and it answered that it was a real case citing an article about the fake citations! After some prodding ("Are you sure?") it spat out something slightly saner citing this very article!
> The case "Varghese v. China Southern Airlines Co., Ltd., 925 F.3d 1339 (11th Cir. 2019)" was cited in court documents, but it appears that there might be some confusion or controversy surrounding this citation. It was mentioned in a list of cases for which a lawyer was ordered to provide copies, according to a court order on leagle.com [2] . However, a blog post on simonwillison.net suggests that the case might not be genuine and that it might have been generated by a language model such as ChatGPT. The post discusses a situation where a lawyer might have used generated case citations in court documents without fully understanding the tool they were using. The post also includes screenshots where the language model appears to confirm the existence of the case [3].
The output is hilariously bad and it's depressing a licensed attorney actually pulled this crap.
This is just more evidence that ChatGPT should not be used for anything serious without a trained human in the loop.
See, this is the sort of thing that makes me suspicious about the current AI hype (well, that, plus it’s largely the same people who were going on about cryptocurrency changing the world a few years ago); this stuff just isn’t very _useful_. Like, if writing plausible lies is the 2020s AI boom’s killer app, well, that’s not great, really.
And yet... I'm finding ChatGPT and the like wildly useful on a personal level.
I think they're deceptively hard to use: you have to put in effort to learn them, and to learn how to avoid the many traps they set for you.
But once you've done that you can get very real productivity boosts from them. I use ChatGPT a dozen or so times a day, and I would be very sad to not have access to it any more.
Current AI is very useful in cases where the truth isn't very important. It's amazing for things like fantasy stories, video games. It also works in cases where existing tools can validate the bullshit AI generates (i.e. Copilot).
In cases where facts and sources are important, AI cannot be trusted. You can use it as long as you validate every single word it outputs, but at that point I do wonder what the point of using AI was in the first place.
It's also good at taking other existing work and creating new work out of it; not just for smart autocomplete tools like GPTs, but also for things like Stable Diffusion. Again, AI is incapable of attribution of sources, so that comes with obvious downsides, but in cases where the creator of the model have the necessary rights so they don't _need_ attribution to sell work (i.e. stock photo companies), it can be quite useful for generating things like filler images.
AI is definitely useful for creative fields. Take a look at the recent "generative fill" feature from Adobe, just watch that video. AI is also super useful for math and programming - because it gives ideas. They may be flawed a bit, they may miss crucial details, but sometimes that creative hallucinating perspective is all you need to get the job done.
IMHO, like many brand new technologies, the I hype is overhyped, but will still reduce to many useful things when the dust will have settled down.
Just like we had no free ambient electricity in 1890, no flying cars in 1950, and not talking robots in 1980, we still have a very robust electricity network, a car per household, and automated assembly lines.
Steven Schwartz not only obviously lied re: "I was unaware of the possibility that ChatGPT could generate false information", his own screenshots prove it. He asked if one of the cases generated was fake!
I suspect that during the research his System 1 (fast, intuitive thinking) told him he was not responsible for the risk he knew he was incurring by relaying AI generated text. It was more like ChatGPT was his own legal secretary which he was within his rights to trust, just like the main lawyer in the case, LoDuca, trusted him to produce this research.
The proceedings would have been more interesting if Schwartz had been honest about this, rather than going with the easily discoverable lie.
On the other hand, it's always funny when people realize they've got themselves into deep shit and they decide the best way out is to essentially plead insanity.
The lawyer was just trying to outsource his job of plausible bullshit generation. He would have gotten away with it, too, if it weren't for the fake cases.
We can hope to start to actually rely on such models once they start learning not only in the language domain, but also in the epistemic domain. True vs false, known vs unknown, precise vs vague, agreement vs contradiction vs unrelated, things like that.
Achieving that is going to be a serious technical, and also philosophical, challenge for humans.
Today's LLM are a literary device. They say what sounds plausible in the universe of texts they were fed. What they say technically isn't even wrong, because they have no notion of truth, or any notion of a world beyond the words. Their output should be judged accordingly.
What I have realized after using Bard(Palm2), ChatGPT(3.5) and some other LLMs is that they are good for tasks where an accuracy <100% is acceptable, and the cost of getting wrong answers is not high.
For example, labeling a million text samples with 90% accuracy by using few shot learning is a good use case. Writing a poem is good use case. Trying to learn a new language is not. Generating a small function that you can verify might be ok. Writing entire codebase is not.
So far, I haven't found any use case for personal use of LLMs. For work however, LLMs are going to be very useful with text(and potentially image) based machine learning tasks. Any tasks where having knowledge beyond the labeled training dataset is useful is going to be a good task for LLMs. One example is detecting fraud SMS.
Relying on AI sophists like ChatGPT for legal work is still just as risky for normal users and even for legal experts. The difference is, these legal experts are more qualified to carefully review and check over the outputs than the average joe / jane trying to 'replace their lawyer, solicitor, etc' with ChatGPT.
I keep emphasising this importance, and to never fully trust the output of LLMs such as ChatGPT, unless a human has reviewed and checked if it is hallucinating or bullshitting. [0]
> What’s much harder though is actually getting it to double-down on fleshing those out.
Now, it is. When ChatGPT first became public though, those were the Wild West days where you could get it to tell you anything, including all sorts of unethical things. And it would quite often double-down on "facts" it hallucinated. With current GPT-3.5 and GPT-4, the alignment is still a challenging problem, but it's in a much better place. I think it's unlikely a conversation with GPT-4 would have gone the way it did for this lawyer.
By the way, Simon can't get ChatGPT to say that the particular case is true _because he's asking about a particular case_. ChatGPT doesn't know it said anything about this case in another session, so it won't confirm it's true.
Either ask it for some other legal sources and ask if those are true (and then try to see if a few aren't), or use the API to feed it its own answer about Varghese etc and then see if it will say it's true (because at that point you've made it think it said this).
"Anyone who has worked designing products knows that users don’t read anything—warnings, footnotes, any form of microcopy will be studiously ignored"
Users don't usually read long legal statements such as terms of services.
That's not the case of ChatGPT interface, the note about its limitations is clearly visible and very short.
This is as dumb as saying a city is at fault if someone drives into a clearly marked one way only street and causes an accident because people don't read anything.
ChatGPT just isn’t going to say the right thing, or thing grounded to the baseline reality, beyond what it cannot help but doing so restrained by your input.
The only connection between it and this world is your input. ChatGPT is floating in the heavens, and you’re grounding it by at most a fishing line, through the textbox. It has to be framed as such. People praising it as a next gen search engine[that finds a data from database] is(perhaps this is the word that best fit the situation!), hallucinating.
[Sigh.] Most parents understand that there's a broad range of ages where a kid is either a bit fuzzy on the concept of "truth", or reflexively lies in certain situations ("Billy broke the vase!"), or can invent fairly elaborate lies for various reasons (avoid punishment, impress other kids, bored bullsh*tting).
Perhaps ChatGPT's "open relationship" with the truth could be explained in such terms...
This is why it is very important to have the prompts fill in relevant fragments from a quality corpus. That people think these models “tell the truth” or “hallucinate” is only half the story. It’s like expecting your language center to know all the facts your visual consciousness contains, or your visual consciousness to be able to talk in full sentences. It’s only when all models are working well together the truth emerges.
> That people think these models “tell the truth” or “hallucinate” is only half the story.
A meta-problem here is in choosing to use descriptive phrases like tell the truth and hallucinate, which are human conditions that further anthropomorphize technology with no agency, making it more difficult for layman society to defend against its inherent fallibility.
UX = P_Success*Benefit - P_Failure*Cost
It's been well over a decade since I learned of this deviously simple relationship from UX expert Johnny Lee, and yet with every new generation of tech that has hit the market since, it's never surprising how the hype cycle results in a brazen dismissal of the latter half.
GPT-4 has made some substantial improvements recently in references. A few weeks ago using it would get things sort-of correct (the authors would be nearly the same, with one or two just added in or removed, maybe journal number a bit off, etc) and sometimes hallucinate more wholesale, or at least enough that searching couldn't find it as easily.
Now, it gets things pretty much perfectly correct everytime as far as I can tell. It gives links that work, and exactly the right reference, every time, even without plugins.
However, with plugins now, it doesn't even matter so much any more since it can browse anyway.
Very impressive nonetheless.
And so it starts, I suspect we'll be seeing this issue more and more since it's easy to just get GPT to spit out some text. I believe that the true beneficiaries of LLMs are those that are experts in their fields. They can just read the output and deal with the inaccuracies.
Does anyone know if training an LLM with just one type of data, law in this case, creates a more accurate output?
ChatGPT says everything with perfect confidence, like that know-it-all who will never admit they just don't know. Instead we get blathering that sounds like legalese (or chemistry or law or whatever) that has only a tenuous connection to reality. All you can say is, "People in my training set talked that way"
I’ve said this many times: stop using ChatGPT as a database. It does not contain facts.
It may appear to contain some facts. Some may also be actually true.
The truly useful usecase is as a reasoning engine. You can paste in a document and ask some questions about the facts in that document. Then it does a much better job, enough to be actually useful.
To some extent it does contain facts, but those facts are indistinguishable from non-facts, are just a small proportion of the dataset compared to everything else are indistinguishable from non factual information.
E.g. using text-davinci-003 (this is GPT3, not ChatGPT), "The moon is made of" completes to: Cheese 48.74%, rock: 31.66%, green 4.09% (98.75% followed by cheese), rocks 3.86%, and several other lower percentage tokens.
I wonder if there eventually will be a type of model that incorporates the ability to simultaneously do text completion while adhering to facts at the model level (rather than having to bolt it on top via context).
Everyone is talking about ChatGPT , but is it not possible to train a model with only actual court documents and keep “temp” low and get accuracy levels as high or better than humans?
Most legal (all formal really) documents are very predictably structured and should be easy to generate
Yes and no. Court decisions do generally follow a structure, but the decisions and the reasons for a final determination, may not always be clear. Judges also may throw in hypotheticals which whilst informative are not determinative. Once it gets to appellate courts and with different judges which may agree with the same outcome but on different grounds, it can get really hard to distinguish what is the test and the factors for a certain determination. It can also happen that dissenting opinions can be very persuasive, and subsequently adopted.
The task effectively also requires a case and paragraph impact meter which do exist in some law databases to one extend or another, effectively weighing how subsequent rulings consider, weight, and follow past cases, caveats, exceptions, and outright considering past rulings as bad law.
Then you have the issue of changing laws and the impact these may have on past cases as they may change the test and requirements needed to be considered and even much new case law needed to be developed to interpret the new legislation. So the model would need to have a historical knowledge of the law and how it was applied.
You would also need to feed it relevant surrounding information that may aid in interpreting said law. In the US, clearly the founding father's opinions and beliefs appear to play a significant part on the currently more originalist interpretative school of thought.
In the UK/Australia for example readings in parliament and even the underpinning reports that prompted the change in legislation may be considered where there is ambiguity in order to interpret legislation. Australian legislation nowadays also tends to incorporate an objective of the legislation and a section that says that where ambiguity exists to interpret it in a way that would further the objectives of the legislation.
Not decisions by judges[1]!, filings by lawyers can be pretty proforma, in a ton of places they literally just fill your details and "generate" a filing.
Not every filing is generate-able for sure, however there are already tools which do create standard filings for human review, this would be just an enchantment covering some more use cases.
[1] It is vast gulf between generating a filing and generating a judgement. LLMs are not decision engines, generating text basis what is most likely from past data is one of worst ways we can be making decisions.
Genuine question: why have these models all been trained to sound so confident? Is it not possible to have rewarded models that announced their own ignorance? Or is even that question belying an "intelligence" view of these models that isn't accurate?
I think you are confusing ChatGPT with AI. ChatGPT is a statistical fiction generator. It sounds confident because it is writing fiction. It sounds confident for precisely the same reason that billions of ignorant people world wide post "facts" on line sound confident: they are incapable of understanding their ignorance. They are just systems that have inputs and then generate outputs.
The problem is that the model doesn't know if anything it's saying is true or false, so trying to make it 'fact check' just means it will constantly interrupt itself regardless of the accuracy of the output.
There isn't one because the law is not a programming language.
Believing otherwise is a common misconception amongst engineers, but representing law as such is (as I have said in this forum before) a leading cause of disappointment, frustration, bickering, anger, conflict, and vexatiously long and mostly unenforceable contracts.
Observance of law is fundamentally about alignment with principles, not blindly following a set of rules. The latter debility is more properly associated with the administration of law, especially at its most mediocre and ritualistic.
This is less about observance with the laws and more about how to verify that lawyers are doing their jobs properly, sort of tools like ide for developer. Often times lawyers make mistakes like in the article or even file things in the wrong court or deficient forms . More like an ide for developers .
That said, it is a great disappointment of mine that the law is not based on an objective, static measure.
How can you be this successful and blog about the most trite bullshit 24/7. Doesn't have the sense to marry a hot chick and fuck her all day even, what is that thing he's with anyway. Bro. Sad to see.
We retreat into an age of darkness, clutching our crystals and consulting our horoscopes because everything has been made equally untrue with the Greatest Invention Man Ever Made. We should be so proud of ourselves.
Those who trust this tripe deserve the consequences they invite on themselves.
At the same time, the inventiveness of AI sometimes gives quite a few fruitful ideas for a solution. Yes, formally this can be classified as AI hallucinations, but sometimes they are so bright and well-aimed that I get goosebumps.
My favorite one is phind.com - it gave me so many slightly hallucinating but nevertheless useful advices. And I was able to incorporate most of them into my professional work.
The whole situation reminds me of a good friend of mine - he's super talented at inventing things and brainstorming, but he can often be caught misrepresenting the facts, and sometimes outright lying. However, the pros easily outweigh the cons if you know who you're working with.
Can we please, for the love of all that is holy, stop saying ChatGPT LIED? Lying means intentionally making false statements. A chatbot does not have intentions...
It is very concerning how people are anthropomorphizing chat gpt. It will get a lot of people into trouble. The media is largely to blame for this. Never mind most gushing media stories about AI, even the few stories criticizing AI treat it as a human. Chat GPT is racist, they say, it is sexist or a liar.
Well it is neither of these things, because all of the above require consciousness and intent and it has none. It is not human, it is not any type of conscious being, do not treat it as such.
It sticks together sentences based on existing language scanned in from the internet and millions of other sources. What it says depends on what someone else said sometime ago on some random forum on the internet, or some book or some other source stored in an available database. It is also programmed to sound extremely sure of itself, unless you flat out say it is incorrect, in which case it will immediately admit fault and apologize. Thus, asking it if it is sure is pointless.
Let me tell you a less disastrous and quite a bit funnier story. A friend of mine used chat GPT for coding. My friend became really trustful of chat gpts coding prowess and asked it if it could just send him the code in file form, so he did not have to worry about copying and pasting which apparently screwed up the formatting somehow. Chat gpt helpfully told him that it could send the code to github, and my friend could download the files from there. My friend said, that is perfect.
So chatgpt gave him a github account name and said find the file it created for my friend was there. My friend looked but github said that account had long ago been closed. My friend tried variations of that account name with different capitalizations, etc., but found nothing.
He went back and complained to chat gpt. Chat gpt dutifully apologized and sent him another account name. He again spends time looking for the account and looking for variations, etc. Again the account has been closed.
This happened a couple of more times and in the end my friend gave up and complained to me. "Why is chat GPT doing this to me? Is it mocking me? Is it getting its kicks from sending me on random wild goose chases?".
I had to explain to him that no, chat gpt is not human, and it is not mocking him. What probably happened is someone on some forum asked someone else on that forum to provide him with code in files. The responder then offered to put the files on github and provided an account name. When my friend asked a similar question, chat gpt matched up the questions and provided a similar answer. When my friend said that a particular account did not work, chat gpt scoured the web for other people mentioning their github account names and provided some of those.
So whenever you use chat gpt, remember that is mostly a glorified search engine. It will spit out information it has found somewhere that it calculates as matching your question. Do not attribute intent, feelings or any type of conscience to it.
I think it's worth noting here that, without plugins active, ChatGPT doesn't 'find' anything - everything is just baked in as a single giant blob of vector data. That's why it has a specific date cutoff as to what it 'knows'.
It's because technological progress has outpaced our ability to process it, so we're like medieval peasants discovering a "Hello World" program and assuming the computer is literally greeting the world. Hopefully, people will learn to view LLMs as they really are before doing what the lawyer did en masse.
A. As an artificial intelligence model developed by OpenAI, I don't have personal emotions, thoughts, or intentions, so I don't lie in the way humans do. My purpose is to provide accurate and helpful information based on the training I've received. However, it's important to note that the information I provide is only as accurate as the data I was trained on. If there's inaccurate or misleading information in that data, I might unintentionally output incorrect or misleading information. But this isn't considered "lying" in the human sense because it doesn't involve any intentional deceit on my part.
A: How do you know it didn't really "involve any intentional deceit"?
Q: As an AI model, I don't have consciousness, emotions, or intentions. The concept of "intention" applies to beings with consciousness and subjective experiences, which AI, including me, does not possess. I generate responses purely based on patterns learned during training, and there's no intent behind them because I don't have a mind, thoughts, or consciousness.
When I state there's no "intentional deceit," it's based on this understanding of AI. AI does not make choices or decisions the way a human would. If a mistake or inaccuracy occurs in my responses, it's due to limitations in the data I was trained on or in my design, not because of a deliberate attempt to mislead or deceive. AI functions based on its programming and training, not on intentions or conscious thought.
Q: That's begging the question though. Your answer takes for granted that you "don't have a mind, thoughts, or consciousness". What if actual consciousness (say, in humans) is also "based on patterns learned during training", only different being that human training involves sensory experience and not just a corpus of text and images?
A: Your question delves into one of the most fascinating and debated topics in the fields of cognitive science, philosophy, and AI: the nature of consciousness. As of my last update in September 2021, there's no universally accepted definition of consciousness or a definitive understanding of its mechanisms.
We do understand that human consciousness is associated with our brains, which process a wide range of sensory and internal information, and it's accompanied by a subjective experience — an "inner life" or "qualia". This isn't something we've observed or been able to replicate in AI.
AI models like me, based on GPT-4 architecture, operate on patterns in data. The way I generate responses is by predicting the next piece of text based on what I've learned from the corpus of text I was trained on. However, there's no subjective experience involved in this process — there's no "I" that's interpreting or experiencing anything. I don't have a concept of self, and I don't have feelings or personal thoughts.
While some AI models can mimic aspects of human behavior or cognition to a certain extent, there's a vast difference between mimicking and actually experiencing. When it comes to consciousness — the inner subjective experience — we don't have evidence that AI can possess this feature.
Your question is profound and important, and scientists, philosophers, and AI experts are continuously exploring these topics. The answers, however, are likely to remain uncertain for a while.
Q: So, theoretically at least, a future model of the GTP architecture could be conscious and even lie?
A: Theoretically, it's possible that future advances in AI and cognitive science could lead to the development of systems that possess some form of consciousness. However, as of my knowledge cutoff in September 2021, there's no consensus among scientists and AI researchers that this is achievable or even definable in the context of AI.
The concept of a machine consciousness would likely require a significant evolution of our understanding of both consciousness and AI. It might require capabilities far beyond pattern recognition and prediction, which are the fundamental basis of current AI systems like the GPT-4 model. It might also necessitate a rethinking of ethical and philosophical considerations surrounding AI.
In terms of an AI "lying," if future AI systems were developed to simulate a higher level of cognitive complexity, including the ability to simulate deceptive behavior, they could potentially generate outputs that are intentionally misleading. However, it's important to understand that this would not be lying in the human sense. It would be a result of the programming and functionality that the AI has been designed to exhibit, not a conscious choice made by the AI.
It's also critical to note that OpenAI and many other AI research and development organizations are committed to ensuring the ethical use and development of AI technologies. If an AI were to exhibit behavior that could harm people, such as deceptive behavior, it would be seen as a serious ethical issue that would need to be addressed.
For example, "Smith v. Johnson, 579 F.3d 820 (4th Cir. 2023)" aka 579 F.3d 820 is actually American River Transp. Co. v. Ryan, 579 F.3d 820 (7th Cir. 2009) [1]
"Baxter v. Pennsylvania Legal Ethics Board, 100 A.3d 1095 (Pa. 2022)" aka 100 A.3d 1095 is actually Julius L. Worthy v. United States, 100 A.3d 1095 (D.C. 2014) [2]
> No, it did not “double-check”—that’s not something it can do! And stating that the cases “can be found on legal research databases” is a flat out lie. What’s harder is explaining why ChatGPT would lie in this way. What possible reason could LLM companies have for shipping a model that does this?
At what point does OpenAI (or any other company) become legally responsible for this kind of behavior from their LLM's? I'm not excusing the lawyer for their reckless and irresponsible use of a tool they didn't understand, but it's becoming increasingly clear that people are trusting LLM's far more than they should.
In my opinion it's dangerous to keep experimenting on the general population without holding the experimenters accountable for the harm that occurs.
OpenAI or any other company becomes liable when they market a product to be used in place of lawyers (or doctors, engineers, or whatever other profession)
as long as we're hiring professionals to do these jobs, part of that real actual human's job is to accept the liability for their work. if a person want to use a tool to make their job easier, it's also their job to make sure that the tool is working properly. if the human isn't capable of doing that, then the human doesn't need to be involved in this process at all - we can just turn the legal system over to the LLMs. but for me, i'd prefer the humans were still responsible.
in this case, "the experimenter" was the lawyer who chose to use ChatGPT for his work, not OpenAI for making the tool available. and yes, i agree, the experimenter should be held accountable.
> At what point does OpenAI (or any other company) become legally responsible for this kind of behavior from their LLM's?
When they sell their bots to areas where lying is illegal. I.e., when a company pretends to do law.
OpenAI doesn't pretend ChatGPT is a lawyer and for good reason. The lawyer who decided to outsource his work is an idiot and can't shift blame to the tool he decided to abuse.
In the other discussion of this topic, a lot of people said the lawyer should be disbarred, but personally I think people should be able to trust the tools marketed by trillion dollar tech companies, and a lot of the blame should be placed on Microsoft/OpenAI for overhyping ChatGPT and understating how likely it is to mislead.
And every response from ChatGPT should be preceded by a warning that it cannot be trusted.
Monty Python nicely addressed this, over 50 years ago.
> Mr. Hilton: Oh, we use only the finest juicy chunks of fresh Cornish ram's bladder, emptied, steamed, flavoured with sesame seeds, whipped into a fondue, and garnished with lark's vomit.
> Inspector: LARK'S VOMIT?!?!?
> Mr. Hilton: Correct.
> Inspector: It doesn't say anything here about lark's vomit!
> Mr. Hilton: Ah, it does, on the bottom of the box, after 'monosodium glutamate'.
> Inspector: I hardly think that's good enough! I think it's be more appropriate if the box bore a great red label: 'WARNING: LARK'S VOMIT!!!'
Really, it should open every conversation with “by the way, I am a compulsive liar, and nothing I say can be trusted”. That _might_ get through to _some_ users.
Humor aside I disagree. They are basically three types of people, the one who learns by reading, if you learn by observation, the rest just have to pee on the electric fence for themselves.
A man sued Avianca Airline – his lawyer used ChatGPT
https://news.ycombinator.com/item?id=36095352 (100 comments)