> Why, for example, can ChatGPT simply not recognize when it’s being asked to generate factual citations and caution the user to “check my sources”? Why can’t it respond to someone asking “is this text AI-generated?” with a clear “I’m sorry, I’m not capable of making that judgment”?
This is another big frustration for me, as someone who has been in the field for a long time (over a decade at this point). These are complex systems that are just reaching the point of usability, and yet people are going "well why don't you just simply do XYZ?"
If it were simple, it wouldn't have taken so long for these systems to emerge. We can already barely get them to do what they're doing, and they still suck at that, so "just make it not do that" is so frustrating. The only thing worse is when people ascribe malicious motives, like OpenAI nefariously deciding to inject those mistakes.
Maybe this is the "I could build that in a weekend" of ML. Sorry for the rant, it just makes me want to vent.
It’s a fundamental misunderstanding of what these systems are doing and how dumb they are.
People see a computer doing what they and other people around them do: natural language going in and out. With other people, we have a theory of mind and broad understanding of how minds work. So it’s understandable that people see ChatGPT and think, “it’s cool that a computer is doing what I’m doing.”
The marketing of these things as Artificial Intelligence is part of the confusion. LLMs are not thinking. They are not saying, “okay now the next thing I’ll do is add a citation. Since I’m doing that, remember to add a warning at the top before showing to the user.” That’s what this writer seems to expect.
The two paths to fixing this: make the tech better or educate people so they have a correct(er) understanding of LLMs.
I don’t think we’re going to get a meaningful subset of humans to take neural networks 101.
While I imagine I have a grip on what LLMs are, there's something ... emergent? ... in some behaviors, which come across like responses from someone or something with a "mental model". That can be confusing to encounter.
For instance, I asked it to review code interacting with an IoT device API.
It unexpectedly warned me that a person relying on an app incorporating this library should be careful of a certain health problem, since the quantity the IoT device periodically senses can fluctuate between readings, and spikes in it could cause that health problem.
I'm still pondering the chain of things that would have to be "thought through" to give that feedback about the code. And this is the part that runs against what I thought I knew: this connection, and the implications of only periodic sensing, may be evident to people who stop to think, but most wouldn't think about it, and it hasn't been written about in any way Google can find.
This isn't theory of mind, but it's indistinguishable from the generator relying on something like a mental model.
It's not that it's too easy, but rather AI models are not people. What do I mean by this? There is (for the purposes of this discussion) just one ChatGPT-4 model. This single model has to be all things to all people. It's not Ted. Or Jessica or Dante. You don't expect Jessica to have a reasonable answer to every question. You accept the fact that if you ask Jessica about public schools, she will go off the rails. It's not an indictment of all humans -- it's just Jessica.
At some point we might have access to a variety of AI models where some will simply say, "I really only know physics pretty well. I prefer not to give my opinion on other topics". And others that will lie about most everything. But we will learn which are useful for what.
I would put it that most people who think they are applying the Turing test, actually aren't. Part of correctly applying the Turing test is independently checking the statements the test subject makes against your own knowledge. Part of the "thinking" that the test is testing for is checking beliefs against reality, and the Turing test needs to test for that capability, otherwise it's not being done right.
Also the test is adversarial, it’s not about throwing softballs to a child. The interrogator should have a good understanding of what constitutes intelligence, the kinds of mental faculties humans have, and ask questions that thoroughly test those faculties.
Yes, people making suggestions for changes in behavior saying it should be "simple" have a poor understanding of what they're talking about. But given the weaknesses of the models, perhaps OpenAI should not have released them for such flexible use for a broad audience.
For AI/ML applications being used for anything important, I think it's entirely appropriate to try to both limit use to the areas in which the model can be expected to succeed (including using it only on data from distributions similar to those used for training), and to educate users about what issues cannot be mitigated.
If OpenAI had released a "fiction writing assistant" product, a "fictional character fun chat" product, etc it would likely be misused less.
> But given the weaknesses of the models, perhaps OpenAI should not have released them for such flexible use for a broad audience.
I think the advantages greatly outweigh the negatives - not only is ChatGPT an extremely useful tool, it's also really helped to bring the seriousness of AI and its effects on society into the current discourse.
> If OpenAI had released a "fiction writing assistant" product, a "fictional character fun chat" product, etc it would likely be misused less.
I really feel that the misuses of this tend to be cases where I would say the issue lies with the user. At some point we have to take some level of personal responsibility for the tools and services we use. Consider the case of the lawyer who included a bunch of stuff from ChatGPT in a federal court filing - I would place the blame on OpenAI squarely at 0% on that one. This is a trained professional in a highly regulated industry who decided to outsource his job to a tool about which he clearly hadn't done even a little bit of research. The issue isn't that ChatGPT hallucinated, it's that he exercised judgement that is obviously incredibly poor. I suspect if you went back through his work, you'd find other places where he cut corners and had lapses of ethics and/or responsibility. He's the problem, not ChatGPT.
Same thing with the professor who accused all of his students of plagiarism because he put their text into ChatGPT and asked if it had written it. On the one hand, he's trying to keep up with technology that's moving quickly and I have empathy for that. On the other hand, he's a college professor, so he should absolutely be able to do the level of research to find the many, many news articles on this topic that would have made clear that what he's doing is stupid. Making incredibly serious accusations towards students based solely on a tool you clearly haven't even tried to understand is a tremendous breach of ethics, which is the real problem.
Absolutely the user has some responsibility. However, chatbots seem to have a history of being attributed greater competence than they have, including by people who know at least partially how they work. If a Google engineer whose actual job is to work with and identify issues in the outputs of a similar model can be fooled into believing that it's _sentient_, I think it's appropriate for OpenAI to expect that when they open their product to a large audience, it's going to be trusted to a risky degree.
In addition to humans' predisposition to see a knowing/thinking agent when they interact with language, there are now plenty of widely reported outcomes where e.g. ChatGPT passes the bar exam, ChatGPT's answers to medical questions are generally judged better than those of doctors (on a web forum), etc. Any person googling around about ChatGPT will find plenty of content about its amazing abilities, including the "sparks of AGI" Microsoft paper.
OpenAI's copy says that you can "get instant answers" or "learn something new". It does not say "get responses which may or may not be answers and which may or may not be correct" and "learn something which may or may not be true".
Does OpenAI bear 100% responsibility for the model saying it likely generated student-provided text? Probably not. Does it bear more than 0% responsibility? I think so.
>At some point we have to take some level of personal responsibility for the tools and services we use.
This is a very American statement. In most places in the world it's just not going to go over at all. Since OAI is at least trying to play on the global field at this time, it matters.
It's an American statement, but it's also a professional statement. A lawyer is responsible for what they submit to the court in every legal system I'm aware of. An English judge isn't going to be forgiving of a lawyer who references cases that don't exist because that person didn't understand how ChatGPT works. The examples I cited aren't of average people believing dumb stuff that ChatGPT tells them (because I think we've seen very little harm caused by that) - they're cases in which someone used a tool they didn't understand in their work. It's fairly universal that when you're hired to do a job, you're expected to do so competently, and using tools you don't understand for your job is not doing it competently.
It just exposes the fact that these LLM/generative AI models don't actually know context.
ChatGPT doesn't "know" that it's "generat[ing] factual citations" so it doesn't "know" to caution the user. It just knows that when the user prompts like "X", it should respond like "Y".
The more information that comes out about how these things respond, the more it seems like ChatGPT shares a lot of traits with Clever Hans.
To save you the click: the user asks for what's trending on pinterest, and ChatGPT (in my mockup) shows a little inline warning note with a warning icon that says: "My data is only accurate up to September 2021 and I am unable to access or search the internet for more recent information."
I need to re-make that mockup because I don't like the way it says "My data ..." - I think it should say "ChatGPT's data ..." instead.
> the user asks for what's trending on pinterest, and ChatGPT (in my mockup) shows a little inline warning note with a warning icon that says: "My data is only accurate up to September 2021 and I am unable to access or search the internet for more recent information."
But that doesn't address the actual root cause of why ChatGPT is not reliable: that ChatGPT does not actually check anything. If you ask ChatGPT for what's trending on pinterest, not only does it not go check pinterest for what's trending now, it doesn't even go check its training data for pinterest posts to see what was trending as of the latest date in its training data. It just confabulates a response based on the word frequencies in its training data. It literally has no concept of "checking what it says", not even against its own data.
But what you're suggesting doesn't look like a "band-aid" to me. It looks like more of the same misleading information that ChatGPT's creators have been giving. Your UI message implies that ChatGPT is checking actual pinterest posts for the time period that's included in the training data; but it isn't.
My suggested warning text would be the same for every interaction with ChatGPT: "The text this program returns has no reliable relationship to reality."
Addressing the model limitations a bit: in the demonstration data that is provided to the base model, we should prevent computed or "looked up" answers.
I've seen some of the demonstration data that people are using to train instruction-tuned models, and the models are being taught to respond by making up answers to problems they shouldn't try to compute. Btw, the output below is wrong: 6 * 5 is 30, and 30 / 3.14 is roughly 9.55, not 1.91 (and the snippet doesn't actually output anything).
{ "instruction": "What would be the output of the following JavaScript snippet?", "input": "let area = 6 * 5;\nlet radius = area / 3.14;", "output": "The output of the JavaScript snippet is the radius, which is 1.91." }, [1]
The UI note would get us very far for now, but by filtering out demonstrations that retrieve or compute information we can go even farther.
Symbol tuning [2] addresses the quality of demonstrations, but we can take it further by removing retrievals and computations altogether.
Bonus: we can write demonstrations that show the model how to respond so that the user/agent is informed of how to compute or retrieve the answer themselves.
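A rough sketch of the filtering step, assuming the demonstrations are stored as records like the one above; the keyword list here is invented purely for illustration, and a real pipeline would want a proper classifier rather than a regex:

    import json
    import re

    # Phrases suggesting a demonstration asks the model to compute or retrieve
    # an answer it has no way to verify. Invented for illustration only.
    COMPUTE_OR_RETRIEVE = re.compile(
        r"\b(output of|result of|calculate|compute|evaluate|look up|latest|current)\b",
        re.IGNORECASE,
    )

    def keep_demonstration(record: dict) -> bool:
        """Return False for demonstrations that teach the model to assert
        computed or looked-up answers."""
        text = " ".join((record.get("instruction", ""), record.get("input", "")))
        return not COMPUTE_OR_RETRIEVE.search(text)

    def filter_demonstrations(path: str) -> list[dict]:
        """Load a JSON list of {instruction, input, output} records and drop
        the compute/retrieve ones."""
        with open(path) as f:
            records = json.load(f)
        return [r for r in records if keep_demonstration(r)]

Records dropped here could instead be rewritten so the output walks the user through running the snippet or looking the answer up, which is the "bonus" above.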
It is hard and clearly the blanket disclaimer isn't working well enough!
I think what would work best is specific annotations tagged onto assertions of facts that might not be correct. When ChatGPT spits out what looks like a journal article citation, put a note on it that says "you should verify this article exists".
Maybe that's the hard part though, detecting in the software when ChatGPT might be making something up.
> detecting in the software when ChatGPT might be making something up
Grammatically speaking, is this statement an assertion of fact? If so, annotate it. It's probably not that hard to implement this (we've had grammar checkers for ages).
Of course then if you ask ChatGPT about anything, basically every single sentence that says something useful is annotated with your "Hallucination warning" and either (1) people dismiss your AI as useless and your business is in peril because you have trouble finding customers / investors, or (2) they start ignoring it, because the alternative is doing your own research, and the reason they're using your AI in the first place is as a shortcut to avoid that.
OpenAI's entire business model depends on users thinking there's nothing that ChatGPT can't do, not the opposite. If anything they'd like to elicit fear about AGI rather than show uncertainty. This would be like asking Meta to make clear when an image has been doctored, or asking Google to make their ads look less like real search results.
Yes, they benefit from people not knowing what ChatGPT can do and being curious enough to try it out. No, they don’t benefit from giving users bad answers, and it’s already the case that ChatGPT will sometimes refuse with an error message or warn when it can’t do something.
There’s no reason in principle they can’t do more RLHF training to report errors more often.
> There’s no reason in principle they can’t do more RLHF training to report errors more often.
I think it's just time and cost -- as I think they're doing a lot of this already. And the fact that the input/output schema is pretty much any text that fits within the context window. So you can train on this set of prompts, but there can be a whole new class of prompts that you didn't think of before that are now problematic.
It's like the self-driving car on a closed track, versus driving in real life when there's a huge pothole blocking your freeway exit while in a construction zone and a school bus for some reason pulls over to drop off kids to help clean up litter.
Perhaps the problem is broader and more fundamental, in that too many people have been conditioned to uncritically believe whatever they read in books, newspapers, and internet articles from 'reputable sources', and so they have a tendency to ignore the OpenAI warning that outputs may be inaccurate, untruthful, and otherwise misleading at times.
Imagine if every cable news show and newspaper story led with "the information presented here may be biased, inaccurate and/or incomplete, because we need to align with the interests of our advertisers, owners and government regulators".
The solution would be teaching broadly applicable critical analysis skills at a fairly young age, but this would tend to undermine 'faith in institutions and leaders', which is doubleplus ungood.
This is very true, and it kind of reminds me of how in school they teach you how to properly cite your sources and how fundamental citations are to our entire scientific advancement, but then you read or watch the news and realize no journalist ever cites their sources.
We were always just supposed to believe what the evening news or the New York Times is saying, so the establishment's only answer to fake news on social media is to say "don't trust those specific social media accounts, but you can trust ours", instead of more helpful education on media literacy in general.
>Briefly stated, the Gell-Mann Amnesia effect is as follows. You open the newspaper to an article on some subject you know well. In Murray's case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the "wet streets cause rain" stories. Paper's full of them.
>In any case, you read with exasperation or amusement the multiple errors in a story, and then turn the page to national or international affairs, and read as if the rest of the newspaper was somehow more accurate about Palestine than the baloney you just read. You turn the page, and forget what you know.
>That is the Gell-Mann Amnesia effect. I'd point out it does not operate in other arenas of life. In ordinary life, if somebody consistently exaggerates or lies to you, you soon discount everything they say. In court, there is the legal doctrine of falsus in uno, falsus in omnibus, which means untruthful in one part, untruthful in all. But when it comes to the media, we believe against evidence that it is probably worth our time to read other parts of the paper. When, in fact, it almost certainly isn't. The only possible explanation for our behavior is amnesia.
No, this is just bad UI. Most software is better at reporting errors or warnings when it can’t do something. You should be able to figure out what the software can do by trying it out.
> Imagine if every cable news show and newspaper story led with "the information presented here may be biased, inaccurate and/or incomplete, because we need to align with the interests of our advertisers, owners and government regulators".
Don’t threaten me with a good time. Just the thought of this makes my pants tight.
I would find it really annoying to have it constantly calling out its own limitations. That's like having a conversation with someone lacking so much self esteem that they can't get a sentence across without qualifiers and apologies. What's wrong with just taking it as it is and entrusting the user with the role of determining what is real? It feels like it is hip to complain about this stuff, but I hope that all this whining doesn't ruin it for those of us who just want a useful tool to interact with. I'd rather get rid of all the guard rails and just have the raw power of the model to go on with my day and be more effective.
The UI issue is that the chat itself has no error messages or even warnings for many things it can’t do. Imagine a compiler that worked that way. (Well, they are sometimes like that, such as undefined behavior in C, but we know why that’s a problem, right?) A blanket disclaimer isn’t a substitute for more specific error messages.
The reason there often aren’t good warnings or errors is that the LLM itself doesn’t know its own capabilities. Warnings need to be trained in, and there are lots of gaps.
Whenever you ask it why it wrote something, it should warn that it doesn’t know [1], but it can make a guess. Whenever you ask it for citations, it should warn that it’s guessing and they might not exist. And so on.
I think some of it is mitigated relatively straightforwardly in UX by having a relatively dumb, discrete model that appends hardcoded warnings.
That way the ultra clever, complex natural language model can go wild responding to your request for a scientific paper about chicken chicken chicken written in pirate-speak, whilst the dumb old model that simply scans for indications the interaction has a citation in it appends a hardcoded "it looks like this response contains citations. Please be aware these citations may not refer to real documents" disclaimer, without being perturbed by the lack of intersection between citation warnings and pirate speak.
And if that's subject to the "Scunthorpe problem" where it flags up loads of false positives, who cares? It's a separate model that doesn't impact the quality of the actual GPT, it's not as annoying a failure as GPT's bias towards refusing answers, and even when its labels are wrong it's aligned with the goal of training the human not to trust computer-generated responses so much.
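A minimal sketch of that dumb second model, assuming it only ever sees the finished response text; the regex and the disclaimer wording are placeholders, and false positives are considered fine:

    import re

    # Things that vaguely look like citations: "et al.", DOIs, "(1999)"-style
    # year references. Deliberately loose; flagging too much is acceptable.
    CITATION_PATTERN = re.compile(
        r"\bet al\.|\bdoi\s*:\s*\S+|\(\d{4}\)",
        re.IGNORECASE,
    )

    CITATION_DISCLAIMER = (
        "It looks like this response contains citations. "
        "Please be aware these citations may not refer to real documents."
    )

    def append_citation_warning(response: str) -> str:
        """Append a hardcoded disclaimer whenever the response looks like it
        contains a citation."""
        if CITATION_PATTERN.search(response):
            return response + "\n\n" + CITATION_DISCLAIMER
        return response

Because it runs after generation and never touches the prompt, it can't degrade the answer itself; the worst failure mode is an unnecessary reminder, which is still pointed in the right direction.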
> The UI issue is that the chat itself has no error messages or even warnings for many things it can’t do. Imagine a compiler that worked that way.
Compilers do work that way. Bad code generation is a real thing and can have really bad real world consequences. Likewise, most compilers can't tell you why it generated the code that it did -- but you can put the compiler under a debugger and figure it out usually fairly easily (although not always so easily).
Not on topic, exactly, but my current hill to die on is awareness that despite its current promotion and popularity, ChatGPT is just one product in a vast landscape of AGI-peripheral technologies, does not represent all LLMs, and doesn't make the OpenAI company the arbiter of all things LLM or AGI.
The Senate subcommittee said they wanted to "avoid the mistakes of social media", and they can start by not prematurely appointing someone the official face and spokesperson of regulation of an incomplete product and vision.
I was pretty surprised (ok, annoyed) by a conversation with my partner where she referred to what I've been doing for the past year as "ChatGPT" and that she didn't realize that it wasn't synonymous with "DL/ML/AI/AGI".
I usually refrain from posting ChatGPT answers on HackerNews or other social media just because it's so low effort and boring, but I found the below humorous and on-topic, particularly how strongly caveated the response is:
> Q: if James Joyce was a Power Ranger which one would he be?
> A: Assigning a Power Ranger identity to a historical figure like James Joyce is purely speculative and subjective. However, considering the inventive and imaginative nature of Joyce's writing, his penchant for pushing artistic boundaries, and his ability to delve into the depths of human consciousness, one could argue that he might align with the Blue Ranger.
> The Blue Ranger is often characterized as intelligent, analytical, and introspective. They possess a keen intellect and a thirst for knowledge, traits that resonate with Joyce's intellectual pursuits and his exploration of complex themes in his works. Additionally, the Blue Ranger is known for their creative problem-solving and ability to think outside the box, qualities that parallel Joyce's innovative approach to writing.
> It's important to note that this analogy is purely for fun and shouldn't be taken as a definitive representation of James Joyce or his literary contributions.
It is not a fluke that ChatGPT became popular at a time when alternative facts win elections. A time when prominent business people look down on education and talk about smart people being dangerous. A time when you have to sit down with your dad for an hour to explain that the post he read on Facebook is not really a news article.
I am not sure it is up to OpenAI to solve all these problems. I don't think OpenAI are doing worse than anybody else.
What I personally dislike about ChatGPT right now is that it seems to have more and more difficulty actually staying on the context of the chat. It has become a QnA.
Secondly. Hidden ads. It even advocates for Azure bs that has even been removed from Azure. Even on data ChatGPT was trained on. Is that just by chance?
>It is not a fluke that ChatGPT became popular at a time when alternative facts win elections. A time when prominent business people look down on education and talk about smart people being dangerous. A time when you have to sit down with your dad for an hour to explain that the post he read on Facebook is not really a news article.
I think you may be reversing cause and effect by your tone. The following statement is not new at all.
“There is a cult of ignorance in the United States, and there has always been. The strain of anti-intellectualism has been a constant thread winding its way through our political and cultural life, nurtured by the false notion that democracy means that 'my ignorance is just as good as your knowledge.'”
― Isaac Asimov
At least in the past mass information was protected by gatekeepers. That printing presses and paper were expensive was a gate. That radio towers were expensive and channels few was a gate. That television towers were even more expensive and broadcasting was an expensive endeavor was an even larger gate. These gatekeepers were generally significantly invested in the continued distribution of their message and had some limits of restraint.
In those same times there were conmen with no restraint whatsoever, but it was much less likely they would have a medium in which to spread widely so it limited their audience.
With cable TV and the internet we have developed an unlimited medium reaching a 24/7 audience anywhere in seconds. Truly monumental amounts of garbage and trash can be generated and fed to a hungry audience. They get their own little sheltered realities of junk food information which they never have to leave.
Question from someone with a surface-level knowledge of how LLMs work:
As I understand it, LLMs work, at a very high level, by repeatedly choosing the token with the highest probability. Could potential confabulations be flagged by marking low-confidence sequences of text in the output? This seems like it'd be particularly good at detecting when the model is making up a name or number, for example.
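Something like the sketch below is what I'm picturing, assuming the per-token log-probabilities are available from the model; the thresholds are pure guesses:

    import math

    def flag_low_confidence(tokens: list[tuple[str, float]],
                            min_prob: float = 0.2,
                            min_run: int = 2) -> str:
        """Wrap runs of consecutive low-probability tokens in [?? ... ??] markers."""
        out, run = [], []

        def flush():
            if len(run) >= min_run:
                out.append("[??" + "".join(run) + "??]")
            else:
                out.extend(run)
            run.clear()

        for token, logprob in tokens:
            if math.exp(logprob) < min_prob:
                run.append(token)
            else:
                flush()
                out.append(token)
        flush()
        return "".join(out)

    # e.g. a made-up citation where "Smith" and "(2019)" were low-probability picks:
    tokens = [("The ", -0.1), ("study ", -0.2), ("by ", -0.1),
              ("Smith ", -3.2), ("(2019) ", -2.9), ("found ", -0.3), ("this.", -0.2)]
    print(flag_low_confidence(tokens))
    # -> The study by [??Smith (2019) ??]found this.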
As far as I understand there's a parameter to make the output more "novel" (different/interesting/less predictable or generic) and that works by specifically not using the highest probability.
I suspect the probability of a statement being false (a confabulation) is unrelated to the probabilities of the words and grammar making up the statement.
That's a pretty good idea, actually. It'll be a bitch and a half to tune the threshold so it's reliable, and doesn't mark normally generated text though.
The verge is a GPT 3.0 level bot responding to the prompt "please generate a plausible sounding but fake problem with technology <x> that is alarmist and will wind people up who don't understand the technology well."
This is reporting on an instance where a lawyer used ChatGPT to generate a legal brief (complete with made-up citations) that was actually submitted to the court, and now the opposing counsel, the judge, and the lawyer's client all have to deal with the repercussions.
It is not a "fake problem"; it is a problem that has cost at least thousands of dollars in damage in one instance that we know about.
Would you feel the same way if the lawyer used a Wikipedia page that had made-up citations in his legal brief? What regulations do we need Wikipedia to stop this from happening?
Average people understand the limitations of Wikipedia (which has been around for 20+ years).
LLMs are new, and people don't understand their limitations yet.
Plus, we're trained from birth to interpret plausible-sounding text as likely correct and true. LLMs generate plausible-sounding text without regard for truth. So now we can either fix the LLM models or become much more critical readers/listeners. Unfortunately I don't see either happening anytime soon.
This has happened with Wikipedia, and then people talked about it on the news, in social media, and in classrooms for the last decade to the point most people typically say "remember to verify your wikipedia information".
The only problem there is that an unqualified lawyer was permitted to practice. However, it is understood that the selection of qualified people is imprecise, and as such there is also a clear path to revoking the right to practice. For all practical purposes, this is an already solved problem; one solved long ago.
Lawyers can and do make up bullshit briefs, but they're not allowed to do so, there is a requirement for them to ensure that they are valid - there is a clearly defined party who is responsible for any problems caused to the court, and that is the particular lawyer.
There are two examples I like to give for things like this.
First, a friend of mine with a literal PhD literally in quantum mechanics literally from Cambridge, who thought a form on a game website offering the possibility to sign up for a free user account to record high-scores and achievements was asking for money.
Second, a time I visited a samurai sword shop in Brighton and didn't notice the signs saying "do not touch" until almost the last second when one of the people I was with stopped my hand; and only then did I see the signs that were literally every meter both vertically and horizontally across the entire wall.
My wife and I are performing magicians and there are several psychological factors at play here, that magicians exploit routinely.
The first has to do with your brain's talent for recognizing patterns. The world around us is full of information overload, and the brain is exceptionally good at pattern recognition. If it weren't, we would have to scrutinize and analyze every piece of information presented to us as if it were novel. We couldn't function under such situations.
The second is called "motion blindness" and it is a physical phenomenon. The idea is that when your eyes are moving you can't perceive that they are moving. You can test this out by looking at yourself in the mirror and focusing on one eye. Then shift your focus to the other. You can feel your eye moving but you can't see it move. It is thought that the brain filters out this motion because otherwise we would feel motion sickness all the time. Add to that the fact that objects in motion tend to grab our attention, and magicians exploit this by doing secret moves under the cover of something else moving. During that short period of time that your eye is travelling to observe the motion you are LITERALLY BLIND.
So when you're in the Samurai Sword shop in Brighton, what are you focusing on? Obviously the swords. And where are the warning signs located? I'm assuming IN BETWEEN THE SWORDS. They are both background information AND in the exact location where you will be experiencing motion blindness.
In the case of the forms that "fooled" your friend into thinking they were asking for money: they look like every other example of something asking for money. Pattern recognition.
In the case of ChatGPT's "warning" ... it's fine print that a) looks like a standard copyright notice that we all train ourselves to turn into background information and b) sits right under the text input that contains a flashing cursor and the text that we are typing out ourselves. It is superfluous and in the most psychologically invisible location possible.
> During that short period of time that your eye is travelling to observe the motion you are LITERALLY BLIND.
Not relevant but a fun addition - you then make up what you saw as though there wasn't just a big blank gap in your vision. When you look at an analogue clock, sometimes it seems like the second hand is still for far too long; that's because your brain (or eyes, or somewhere on the journey between, I'm not sure) is filling in the gap incorrectly.
For magic, not only will you be blind for part of this, but you will then experience it as though you were aware of everything.
> During that short period of time that your eye is travelling to observe the motion you are LITERALLY BLIND.
There was a video I saw recently that exploited this, and it was WILD. I can't find it right now, but basically, they showed how blind you were while your eyes darted around. They had people look at a picture on a computer screen while some device monitored their eye movement. Whenever their eyes moved, an object in the picture would vanish or change. Nobody was ever able to detect an object vanishing or changing. It wasn't until they would be like "Wait a minute, wasn't there a person there?" or they were shown the original picture with all the objects in place that they would notice the picture was changing at all.
Pretty much our brains are liars shoveling a bullshit reality at us every moment of the day. It's amazing we've got as far as we have, and it's amazing that many of these things are pretty recent discoveries in human history.
> news broke that US lawyer Steven A. Schwartz had turned to the chatbot to find supporting cases in a lawsuit he was pursuing against Colombian airline Avianca. The problem, of course, was that none of the cases ChatGPT suggested exist.
Yeah, a lawyer going to court with "A chatbot told me..." instead of using actual legal tools at their disposal to cite those cases doesn't make as good of an argument as the author seems to think it does.
Realizing AIs hallucinate and are not authoritative is firmly in the territory of caveat emptor, we've had an adage about this for decades now:
I think this mischaracterizes the whole issue. There are already open source LLMs that can be run locally, and you can use OpenAI's API, which gives you access to multiple different types of models that can be trained for your use case. People keep creating jailbreaks to get around restrictions put in place by the developers of various models, etc. The only thing that is holding back AI is the resources required. Most people can't afford to train an LLM by themselves from scratch, so it's not a viable avenue for widespread AI use. Also, you can use GPT-4 if you subscribe to ChatGPT Plus or pay to use its API.
"I don’t think cases like these invalidate the potential of ChatGPT and other chatbots. In the right scenario and with the right safeguards, it’s clear these tools can be fantastically useful."
It's not calling for people to lose access to these tools.