ChatGPT-Authored Legal Filing “Replete with Citations to Non-Existent Cases” (reason.com)
62 points by dpifke 6 months ago | 71 comments

If an engineering firm drew up plans for a building that didn't meet even the most basic structural standards, and then PEs signed/stamped those drawings... that would be grounds for people to lose licenses, for companies to be heavily fined, and, in the extreme case where injury was involved, for criminal charges.

A lawyer just straight up making up citations of case law, even if hallucinated by an AI assist tool, should face similar repercussions re: license to practice law.

There needs to be no leniency for people misusing 'well, the AI told me...' as a crutch for actual knowledge and expertise in a professional field.

I think that should apply to any tool, but sadly it seems even most developers aren't immune to blindly trusting their tools, judging by what I've seen over the years.

As a memorable warning sign says: "The machine has no brain. Use your own."

> As a memorable warning sign says: "The machine has no brain. Use your own."

First lesson of the first day of the first year in my CS degree -- the professor starts saying something to the effect of: "I want you to think of computers as meaningless machines doing meaningless operations on meaningless symbols".

To this day, that sentence is a massive influence in how I think about computers and technology in general.

Even if it has a brain I can still recommend using your own :P

Word! GPT is a tool; you don't blame a hammer for ruining someone's window after you put the hammer through the glass.

The lawyer claims that they have "never utilized Chat GPT as a source for conducting legal research prior to this occurrence and therefore was unaware of the possibility that its content could be false."

Every time I open the thing I get a popup with "While we have safeguards in place, the system may occasionally generate incorrect or misleading information and produce offensive or biased content. It is not intended to give advice."

There is another phenomenon at play here imo.

If it gives you what you want, or something that looks like what you want or need, it gets harder and harder to say no and not use it.

A few times I was tempted to just go with it blindly, and I'm glad I didn't.

TL;DR: verifying LLM output gets tiring.

In parts of popular media, there seems to be an assumption of a very high certainty for an impending "economic productivity revolution thanks to LLM AI". Case studies like this one show the future isn't so clear.

I think this future seems very clear, and just because ChatGPT won’t put thousands of attorneys or paralegals out of work tomorrow doesn’t mean it’s not on the horizon. OpenAI’s model was created for general purpose use so it’s not surprising that it fails in very specific and nuanced areas.

If we use the same principles but fine-tune the model (or create a new one) with the appropriate legal information, I think we’ll see far better results. If not now, then in the very near future. Companies like LexisNexis[1] have a strong incentive to make this happen.

1. https://www.lexisnexis.com/en-us/home.page

ChatGPT WILL put thousands of attorneys or paralegals out of work, who don't bother verifying its output, because they deserve to be fired and disbarred for that. Good riddance. Thanks, ChatGPT.

"Using ChatGPT is like talking to a journalist who has interviewed an expert. Knoll's law of media accuracy: 'everything you read in the newspapers is absolutely true, except for the rare story of which you happen to have firsthand knowledge'."


Doesn't seem like they've asked the lawyers "what other legal filings did you use ChatGPT for?", which seems kind of important. ;)

As part of their defense, they said the lawyer was using ChatGPT for the first time. That isn't necessarily true, but it means he would have given the same answer if asked directly whether he'd used it in other cases.

Can someone explain the potential repercussion from this kind of foolishness? Isn't non-accredited legal counseling illegal? Do these things apply also to Europe?

> Can someone explain the potential repercussion from this kind of foolishness?

Lawyer sues OpenAI for megabucks because they didn’t have a warning about using ChatGPT as a paralegal and they got sanctioned by the court.

Thirty years of practicing law and they (plural, two lawyers involved) “accidentally” submitted the output from a multi-billion dollar company’s toy without doing any checks?

It's right there by the input box:

> ChatGPT may produce inaccurate information about people, places, or facts.

That really should be sufficient as "not fit for legal work without full review".

But wasn't it OpenAI bragging that GPT-4 can pass the bar exam?

Passing a test and doing the job are different things. Do you think a 5yo who passed MS certification could do the job? https://www.electronicproducts.com/five-year-old-becomes-the...

Could probably count as false advertising. AI companies are commonly guilty of that. But I haven't seen an AI company go after legal professionals yet...

Obviously not, but there are a lot of people who don't get that.

Look how many people thought Tesla had full self driving when it says in the user agreement that they don't.

This isn't hidden in a user agreement, it's on the front page.

Yes on the front page /s

" Full Self-Driving Capability

Build upon Enhanced Autopilot and order Full Self-Driving Capability on your Tesla. This doubles the number of active cameras from four to eight, enabling full self-driving in almost all circumstances, at what we believe will be a probability of safety at least twice as good as the average human driver. The system is designed to be able to conduct short and long distance trips with no action required by the person in the driver’s seat. For Superchargers that have automatic charge connection enabled, you will not even need to plug in your vehicle.

All you will need to do is get in and tell your car where to go. If you don’t say anything, the car will look at your calendar and take you there as the assumed destination or just home if nothing is on the calendar. Your Tesla will figure out the optimal route, navigate urban streets (even without lane markings), manage complex intersections with traffic lights, stop signs and roundabouts, and handle densely packed freeways with cars moving at high speed. When you arrive at your destination, simply step out at the entrance and your car will enter park seek mode, automatically search for a spot and park itself. A tap on your phone summons it back to you.

Please note that Self-Driving functionality is dependent upon extensive software validation and regulatory approval, which may vary widely by jurisdiction. It is not possible to know exactly when each element of the functionality described above will be available, as this is highly dependent on local regulatory approval. Please note also that using a self-driving Tesla for car sharing and ride hailing for friends and family is fine, but doing so for revenue purposes will only be permissible on the Tesla Network, details of which will be released next year. "


I mean the ChatGPT disclaimer

Not sure ... do bar exams include physically looking up citations?

I sincerely doubt it’s ChatGPT that could pass the bar exam

Non-accredited legal counsel varies from place to place. In the UK there are "Mackenzie friends" who can't speak but can advise in court. As for filings, paralegals don't tend to need qualifications either, though most solicitors they work under will require it.

In short, for the UK at least, no, nobody needs qualifications to do that kind of thing. Though it is clearly a good idea.

Thanks for the clarification. But regarding this case more specifically: it seems that the judge in question is interpreting this as a deliberate attempt to deceive a court; hence, the question about repercussions.

Good question. In the UK (and probably most common law systems) I guess it'd be treated as any vexatious application would be. As far as the judge is concerned there's no legal ground for distinguishing between a regular bit of code and an LLM. It'll be the human who is prosecuted, if that's what you're wondering.

Depends on the judge and his mood - given that it was filed in opposition to a motion to dismiss, he could just dismiss the case.

In America the attorney can be disbarred or have their license suspended for filing false information.

In any important domain where accountability and factuality matter, LLMs will not get the human out of the loop.

They (hopefully) will simply make their job easier but there are hidden risks: identifying well written "hallucinations" is not trivial.

Tip for the AI brigade: Assign a confidence level to segments of generated text.

> Assign a confidence level to segments of generated text.

Text generators are not reasoning machines. To assign a confidence level, you need a tree of facts and deductions. This is what you can get from an expert system.

With an expert system, you can in principle ask the system to explain its reasoning, and assign confidence to each step. ML systems are black boxes.

Expert systems are hard to train; you need a human expert, and probably a specialist in the expert system sitting beside him. And it takes a lot of work - so it's expensive.

I can imagine a hybrid expert system/machine-learning system, where the ML system is used to distil the expertise that is then used by the expert system to construct decision trees that are not opaque.

I wish a fraction of the money that gets thrown at LLM "toys" were instead invested in expert systems. I think expert systems research became unfashionable around the end of the 1980s, which is a crying shame.

I am not at all pessimistic about expert systems. These things go in cycles and piggyback on developments. NNs were niche for a long time. Speculative hypes act as a sort of Zitterbewegung[1]: moving fast yet going nowhere.

But my suggestion is actually less ambitious: not to create confidence with reference to outside knowledge but purely on the basis of the training sample and its intrinsic statistics.

[1] https://en.wikipedia.org/wiki/Zitterbewegung
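One crude way to get "intrinsic" confidence today: surface the per-token probabilities the sampler already computes, and flag spans where the model was effectively guessing. A toy sketch in Python (the logits below are hand-picked stand-ins, not real model output):

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def flag_low_confidence(tokens, steps, threshold=0.5):
    """Flag tokens whose chosen-token probability falls below threshold.

    `steps` is a list of (chosen_index, logits) pairs -- the scores a model
    would assign at each generation step (toy values here).
    """
    flagged = []
    for token, (idx, logits) in zip(tokens, steps):
        p = softmax(logits)[idx]
        if p < threshold:
            flagged.append((token, round(p, 3)))
    return flagged

# Toy case citation: the party name and reporter number were sampled from
# near-flat distributions, i.e. the model was guessing those parts.
tokens = ["Smith", "v.", "Jones,", "582", "F.3d"]
steps = [
    (0, [5.0, 1.0, 0.5]),   # confident
    (0, [4.0, 0.2, 0.1]),   # confident
    (1, [1.1, 1.0, 0.9]),   # near-uniform: guessing
    (2, [0.9, 1.0, 1.1]),   # near-uniform: guessing
    (0, [3.0, 0.5, 0.2]),   # confident
]
print(flag_low_confidence(tokens, steps))
```

That wouldn't make the output true, but it would at least tell the reviewer which segments to double-check first.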

Basically, an LLM has no way of knowing that it can't arrange tokens as freely when making case references as it does for other text. Maybe it can be fixed by more parameters or something; it hasn't been yet.

As things are now, anything that needs to be factual or reasoning seems like a bad match for LLMs. Where I find them very useful is at rewording things, or at continuing something repetitive enough along a given pattern (eg in programming).

>Tip for the AI brigade: Assign a confidence level to segments of generated text.

you say this as if it’s a trivial feature that AI developers have just not bothered to leave in

I know it's not trivial. You don't change the world by doing trivial things. It's a placeholder for features that will enable more controlled use of these tools.

The important aspect is being able to guide the "reviewer" on where to focus. It could still be a flawed suggestion, but it's better than nothing.

you also don’t change the world with off-handed comments at the end of HN posts

tip for the crypto brigade: make your coins solid investments

From the GPT-4 paper (p. 20):

2.13 Overreliance

...despite GPT-4’s capabilities, it maintains a tendency to make up facts, to double-down on incorrect information, and to perform tasks incorrectly. Further, it often exhibits these tendencies in ways that are more convincing and believable than earlier GPT models (e.g., due to authoritative tone or to being presented in the context of highly detailed information that is accurate), increasing the risk of overreliance.

Overreliance occurs when users excessively trust and depend on the model, potentially leading to unnoticed mistakes and inadequate oversight. This can happen in various ways: users may not be vigilant for errors due to trust in the model; they may fail to provide appropriate oversight based on the use case and context; or they may utilize the model in domains where they lack expertise, making it difficult to identify mistakes. As users become more comfortable with the system, dependency on the model may hinder the development of new skills or even lead to the loss of important skills. Overreliance is a failure mode that likely increases with model capability and reach. As mistakes become harder for the average human user to detect and general trust in the model grows, users are less likely to challenge or verify the model’s responses.

Source: https://arxiv.org/abs/2303.08774

My annotated version: https://lifearchitect.ai/report-card/

Common law is guided by precedent. Apparently, a US attorney is expected to “Shepardize” the cases he cites, which means checking how often they have been used successfully and whether they have been affirmed or overturned (the name comes from the author of an early database). This should put citing a case that never existed beyond the pale, no matter whether you realize ChatGPT is not a search engine.

This tallies with my experience marking student coursework. ChatGPT tends to fabricate references, it's a good signal for cheating.
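A crude automated screen along those lines: check every cited case against an index of verified citations. The data below is a toy stand-in (a real check would query a citator service like Shepard's or KeyCite), and the fabricated entry mirrors the kind of invented citation at issue in the article:

```python
# Hypothetical local index of verified citations (toy data).
known_citations = {
    "Brown v. Board of Education, 347 U.S. 483 (1954)",
    "Miranda v. Arizona, 384 U.S. 436 (1966)",
}

def unverified(cited):
    """Return the citations that do not appear in the verified index."""
    return [c for c in cited if c not in known_citations]

brief = [
    "Miranda v. Arizona, 384 U.S. 436 (1966)",
    "Varghese v. China Southern Airlines, 925 F.3d 1339",  # fabricated
]
print(unverified(brief))
```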

That's a good point. Should apply also to publishing: a fabricated reference should be enough for a retraction.

My partner and I have also explored ChatGPT for recommending cases and books. It produces more garbage than not. The dream of it being a handy librarian died pretty quickly.

Uncanny and compelling as it is conversationally, it really isn't that good.

How you use it matters a LOT, and so does which version you used. There's a reason GPT-4 is up to 45 times more expensive than GPT-3.5 Turbo (the default). If you're serious about such a tool (it's not a bad idea), then you'd use embeddings[0] to ground the model in your specific use cases, or prompt-engineer GPT-4 properly.

[0] https://platform.openai.com/docs/guides/embeddings/what-are-...
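For what it's worth, the embeddings approach usually means retrieval rather than retraining: embed a corpus of verified material, then pull in the nearest entries for the model to work from. A minimal sketch, with hand-written toy vectors standing in for a real embeddings API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy corpus of verified snippets. In practice each vector would come from
# an embeddings endpoint, not be written by hand.
corpus = {
    "Roe v. Wade, 410 U.S. 113 (1973)":      [0.9, 0.1, 0.2],
    "Marbury v. Madison, 5 U.S. 137 (1803)": [0.1, 0.9, 0.3],
}

def retrieve(query_vec, k=1):
    """Return the k verified snippets closest to the query embedding."""
    ranked = sorted(corpus, key=lambda s: cosine(query_vec, corpus[s]),
                    reverse=True)
    return ranked[:k]

print(retrieve([0.85, 0.15, 0.25]))
```

The point is that the model can then only cite from material you've verified, instead of free-associating citations from its weights.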

I see we're reaching the 'you're holding it wrong' stage with LLMs.

It is an issue with the tech, and it's probably not an easy one to fix. Performance depends a lot on how it's used, but even when performance is poor, the output is often convincing.

The first few times I tried ChatGPT I asked it goofy questions to try to break it. Then I asked it questions across various fields I know something about. It answered each question incorrectly in a way that bothered me. Then I tried asking it about a field I didn't know, and googled for the answer. Wrong again. I literally never had it give me an answer that mattered.

I have a colleague who is gung-ho about chatbot coding. The guy is smart, but ever since he started using it, he's adopted some weird ideas about what code should look like which are based on easily googleable ChatGPT lies, and he refuses to hear anyone else out. It's kinda sad how confidently wrong chatbots can infect people and stunt their growth...

Exactly my point. If the system gives you results that are worse than not using it, and the solution is to bolt on more stuff to the system, maybe it's time to take a step back and think long and hard if having it in production is a good idea.

I don't see anything wrong with pointing it out. If I find a previous legal case with a heading that vaguely sounds like something that could be applicable to me and try to use it as an argument, you'd likely say the same: previous cases could be useful, but you need to know how to use that record properly.

Are you comparing legal proceedings to a chatbot?

I'm comparing the usage of two resources which can be helpful or completely misleading, depending on how familiar you are with their purpose/limitations and your experience with them.

It's more like "the limitations you point out are fixed by paying more money", which is the OpenAI way.

By analogy:

"We can't fly into space, the propellers will have nothing to push against"

"We have rockets for that, they're more expensive but don't have that limitation."

"I see we're reaching the 'you're holding it wrong' stage with propulsion."

I think their point still stands. We have people and bots shilling ChatGPT on a minutely cadence all over the internet right now, talking about how "it is the same thing as a person" and "look at how it is always correct in these complicated cases", then whenever something doesn't work there's mass gaslighting: "you didn't use the latest version!" "You probably engineered a fake prompt so it would fail!" "You just don't know how to use it."

To people who understand the technology your reply is fair. But it doesn't account for the millions of people who don't and are spewing manic sentiment about its capabilities on the regular.

A lot of companies adopt this strategy for broken technology because it fosters a cult like group of zealots for a short period of time while they go off and try to fix everything. In my experience it usually doesn't last long.

Thanks! The way you steelmanned their point was much more convincing than what I had replied to, and made it clear to me what I had overlooked.

Failing to understand that you can't reduce everything to a single vector and claim "this is the worse it's ever gonna be... it's only gonna get better... when it this reaches 10x or 100x what it is now you'll see what it's really capable of" is a perfect analogy in fact for the hype around LLMs.

EDIT: found another way in which you're illustrating my point - yeah propellers and LLMs are great for flying under a hard ceiling. We need something entirely different to reach orbit. You can't just tell people to bolt on rockets to propeller driven aircraft, you need to invent something entirely different, like spacecraft.

Making bigger models and training them to specific tasks != Inventing a class of sequence predictors with a new paradigm.

The problem can be mitigated by using bigger models or training embeddings, but it cannot be eliminated.

At the end of the day, it remains a sequence completion engine. An LLM can only care about the statistical likelihood of sequences; that's the entire universe it exists in.

And if a sequence making a wrong statement happens to be more likely than a sequence denoting a correct statement, or close enough given the sampling temperature ("heat") setting, then there is a chance the model will hallucinate.

If it were otherwise, we would already have a good solution for prompt injection attacks, and we don't.
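The "heat" here is the sampling temperature: logits are divided by it before the softmax, so a higher temperature flattens the distribution and raises the odds of sampling the less-likely continuation. A toy illustration with made-up logits for a correct vs. a fabricated token:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: higher T flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    s = sum(exps)
    return [e / s for e in exps]

# Two candidate continuations: index 0 is the correct token, index 1 a
# plausible fabrication the model only slightly disprefers (toy values).
logits = [2.0, 1.5]

for t in (0.5, 1.0, 2.0):
    p_fake = softmax(logits, temperature=t)[1]
    print(f"T={t}: P(fabricated) = {p_fake:.2f}")
```

Even at low temperature the fabricated token keeps a nonzero probability, which is why sampling alone can't rule hallucinations out.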

Large language models do a great job of emulating meaning. That's their biggest strength and greatest weakness. We finally got a good chatbot, at the cost of making people project so much more than a chatbot onto it. LLMs can't mean anything. They can only produce surrogate meaning.

When you compress vast amounts of text into a model and query the model, you can only get back information that is already in the training data set. It's inherently backwards looking; it cannot produce lowered entropy forward in time. You can't get more knowledge out of it, only less than you put in in the first place.

Also, it's important to state that the BEST case scenario is that you get the knowledge out that went in. The common scenario is that you have loss. It's also important to remember this thing was trained on Reddit, etc. We all know how much complete trash, bot posts, hot takes, and conflicting views there are on there.

Good point. It wasn't even trained on high-grade knowledge, just on very noisy bodies of language.

> then you'd use ...

Sounds like a bunch of effort. :/

Which is why we're seeing VC-funded startups starting to produce specialised tooling for various use cases, so people don't have to do it themselves.

ChatGPT is fun, and for some of what I do it's usable as-is, but for most people it'll be a tech demo for something that will be built in and hidden behind layers of stuff to make it suitable for their specific use-case. At least until the base models get a lot more capable.

this is something I don’t understand. if you want GPT3/4 for your specific use case, then use the API?

half the complaints people have about chatGPT are because they’ve never bothered to try the—much more powerful—API

Oh. It's probably because that's not an obviously better progression.

Something along the lines of "why bother looking at their API, when the existing thing doesn't work properly?".

You're saying it's better via the API, but the standard (non-api) approach would have to be good enough first to draw people in more I guess.

A lot of people want a polished product, and go in with expectations it will be something it isn't.

I think part of the problem is people are used to equating eloquence with intelligence and knowledge, and when that assumption doesn't hold, they assume it's all smoke and mirrors rather than explore its limits.

Looks like this lawyer didn’t read ChatGPT’s disclaimer.

More seriously though, it works for general “filling” like “thank the team for reaching a milestone”, but you really need to insert your real examples (which the lawyer didn’t). I wouldn’t trust it to make a full argument, like in this case.

> it works for general “filling” like “thank the team for reaching milestone”,

IMHO, it doesn't work for that, because the manager will come across as insincere for not even saying a few honest words of thanks.

That's hardly necessary in corporate America. They only need to look like a good team leader to the higher ups.

I'd like to think that morale problems, communication ineffectiveness, etc. will tend to surface within a few quarters in a way visible to upper management.

That doesn't prevent installing a poor leader, but eventually that particular one can be corrected, through coaching or a freak elevator accident.

And maybe someday the business schools will teach people that being genuinely trustworthy, in all org chart directions, is a best practice in modern, effective organizations. More of the right people for that will be drawn to leadership. And more of the people who are predisposed to make organizations dysfunctional and miserable will either begin the process of improving, or be herded to roles in which they can do less damage.

OpenAI is collecting billions in investments and valuation based on hype that they are doing little to combat.

The lawyer should lose his license for six months. But OpenAI should also face penalties for creating an Attractive Nuisance.
