This is overstated and easily disproved. ChatGPT produces accurate facts as a matter of course. Test it right now: "How tall is the Statue of Liberty?", "In what state was Abraham Lincoln born?", etc. There is an infinite list of factual questions for which it produces accurate and correct answers.
It loves to hallucinate API methods that don't exist. It can struggle with individual logic problems or questions. But these limitations have clear explanations. It's doing language completion, so it will infer probable things that don't actually exist. It wasn't designed to do logic problems, so it will struggle with classes of them.
Dismissing it as completely unreliable is unnecessary hyperbole. It's a software tool that has strengths, weaknesses and limitations. As with any other software, it's up to us to learn what those are and how it can be used to make useful products.
It’s completely unreliable because it sounds confident and plausible even when it’s wrong. Essentially it fails in ways that are very hard to detect.
If you ask it a question, you have no way to know whether it's right or not unless you already know the answer.
I've asked it many questions about my area of expertise, distributed systems, and it was often flat out wrong, but to a non-expert it sounded perfectly plausible. I asked my wife, a physician, to try it out and she reported the exact same problem. Some of the answers it gave had even veered off from just annoying to actually dangerous.
That doesn’t mean it can’t be useful, but using it to answer important questions or using it to teach yourself something you don’t already know is dangerous.
Weirdly enough, people seem to view this only from one side: "ChatGPT is an expert at lying!".
Actually, it was explicitly instructed to sound confident, serious and expert.
The important thing here is: for most people, sounding like that is enough to make them believe.
This heuristic is trivial to hack, gets exploited not just by chatbots, and is downright dangerous on a societal level.
Acquiring and vetting information is a task of crucial importance situated in between journalism and science. Just trawling the internet and autocompleting the acquired driftwood is a weird approach to begin with. Doing so without transparency in private companies is simply folly.
I know flesh and blood humans who are confident in their knowledge but still manage to make shit up and get facts wrong. I make up my own mind about what's true and what's useful by considering an ensemble of sources.
In my experience with ChatGPT, it gets less wrong and more right than most humans, and it's able to find relationships among facts and concepts that are impossible with Wikipedia and Google.
It's extremely useful and powerful in its current form, in spite of its many limitations.
With other tools, we don't demand perfection to find them useful.
Just because a hammer sometimes misses the nail and puts a hole in the wall, just because Wikipedia sometimes gets vandalized with lies, doesn't mean hammers and Wikipedia aren't essential tools.
A good hammer is designed with an optimal weight distribution that makes it easy to hit the nail. ChatGPT may be better than using your own head at times, but people will still sue sooner rather than later over lost digits.
However, it provides an excellent opportunity for us mere humans to reflect upon the way we organize our society in regard to vital information processing.
We are awfully bad at it. So bad, in fact, that we can't even tell how it should work in the first place, and we're already disturbed by some piece of software telling nonsense.
Yeah, it's kinda funny how we simultaneously expect the ideal AI to have human intelligence, but also not to make human mistakes. Guess what? If you build a perfect replica of a human brain, it will make human errors, because that's what human brains do.
And incidentally, it's the consequences of those errors that teach us not to repeat them... which is the same feedback loop we use to train AI in the first place!
>I know flesh and blood humans who are confident in their knowledge but still manage to make shit up and get facts wrong. I make up my own mind about what's true and what's useful by considering an ensemble of sources.
It's basically like political candidate Kari Lake in Arizona. She is an extremely confident speaker, talking about the conspiracies against her and how she really won the election for AZ governor, all of it with no evidence or support. She is such an excellent speaker, and she's just chilling to me because people will often just believe someone who is so well spoken. ChatGPT is that. Imagine connecting what ChatGPT says with a "super confident fascist human avatar" that is a kind of Kari Lake. It will happen because it's obvious.
We are just going to have ever more persuasive unreliable sources of information.
It may describe a number of people you have worked with, but it does not describe the average medical textbook, which will usually describe the best knowledge we have of a condition. If ChatGPT produces something that looks like it came from a medical textbook, then it is hard to blame many people for believing it. More generally, people started to trust computers more than people as soon as calculators demonstrated their reliability.
I don't know, I get this feeling ChatGPT has also read all the quack books where the condition depends on the alignment of stars when you were born, or how chicken entrails land when the shaman "studies" it. Those books are also written confidently without giving any sign of being completely fabricated.
In the end, why do people believe what they believe? The answer is it connects with what they already believe. That's it. If you had a diet of what we call science, you'll have a foothold in a whole bunch of arenas where you can feel yourself forward, going from one little truth to another. If you are a blank slate with a bit of quackery seeded onto it, you end up believing the stars predict your life and that you can communicate with dead people with a Ouija board.
ChatGPT doesn't have an "already believe". It just has a "humans on the panel give me reward" mechanism, and all it's doing is reflecting what it got rewarded for. Sometimes that's the scientific truth, sometimes it's crap. All the time it's confident, because that's what's rewarded.
And hopefully after someone shows you that they can be described that way, you stop trusting what they say (at least about the thing they are known to be unreliable about).
A whole lot of people can be trusted within a reasonable, identified set of constraints, and we do trust them, while discounting them in other areas where they're known to spew bullshit. It's very rarely all or nothing.
Obviously. ChatGPT will randomly spew bullshit about nearly any topic though. So you can really only trust it for things you are already an expert in or things that are very easy to verify.
It's not random. It's probabilistic. There is a very big difference. The types of errors are predictable which means they can be improved, and empirically so if you follow the academic literature.
I have read enough of the academic literature to say with some confidence that the majority of the errors I’m talking about are not probabilistic in any meaningful sense. At least not in a way that can be predicted from the amount of readily available training data.
I don't trust a random Hacker News comment by someone I don't know or can't verify any further than I can throw it, so in that sense they are probably similar.
The comment I was replying to was “the errors you’re talking about are probabilistic if you read the literature” my response is “no they aren’t I have read the literature.”
Note that I’m talking about a specific class of error and proving a negative is difficult enough that I’m not diving through papers to find citations for something n levels deep in a hacker news thread.
Here you go, here's a bunch of papers that you have not read. If you had read them then you would know that the errors are predictable and therefore there are many measurable ways to make improvements.
I took the time to read through the only one of those that looked like it was peer reviewed and read the abstracts for the rest.
Survey of Hallucination in Natural Language Generation only provided promising methods for detecting hallucinations in summarization tasks, which are of course much easier to detect. Searching arXiv for a list of non-reviewed papers that sound like they might be related to the topic at hand is a fun debate strategy. But no one else is reading this far into an old thread, so I'm not sure who you're trying to convince.
None of these papers proves your claims about hallucinations, and most aren't even trying to. However, even if they did, the errors that I'm saying aren't meaningfully probabilistic aren't hallucinations in the first place.
What if instead of asking ChatGPT for facts, we instead asked it for plausible methods of determining those facts? So instead of answering "The Statue of Liberty is 151 feet tall" it could instead respond with "Look it up yourself on Wikipedia."
ChatGPT has made up sources when I've asked for them, so I wouldn't 100% trust it to provide great sources.
Also, if ChatGPT just redirects you to search, doesn't that remove most of the value?
> Also, if ChatGPT just redirects you to search, doesn't that remove most of the value?
That's pretty much the conclusion I've already come to. I have to verify everything ChatGPT tells me, so using it is pointless if I already know where/how to look something up.
> It’s completely unreliable because it sounds confident and plausible even when it’s wrong.
That applies to my boss, college professors, Wikipedia, or my neighbor after a couple of beers. It's not designed to give the correct medical steps to diagnose a brain injury.
The problem is most people do believe these confident facts spoken by a leader; they aren't evaluating them. An easy example is the fairly widespread belief that somehow the US presidential election was stolen. People said it, but the mountains of evidence don't exist. Yet millions of people believe it.
You're not thinking about this probabilistically. This isn't about "correct" and "incorrect". There is an entire universe between those two poles with regards to most kinds of knowledge. Facts are a specific kind of knowledge, and frankly, not that important. It's more important to know why the Civil War was fought than to know the exact dates and locations of each battle.
I've heard people say they wouldn't use ChatGPT even if there was "only a 1 in a billion chance that it made their bank account details public"...
May I introduce you to some very real probabilities:
If you are living in America there is a 1 in 615 chance that your cause of death will be the result of an automobile accident.
So yes, it is unlikely that we will ever create a tool that can answer with 100% confidence. It is also unlikely that a manufacturing process will result in a widget that conforms to allowed tolerances 100% of the time.
However, in manufacturing this is understood. A defect rate of 3.4 per million widgets is considered an incredibly high process capability.
These tools are being made more reliable every day. Please have a realistic goal in mind.
Edit: Well I've learned this much. Many of you are not Bayesians!
I asked it 3 questions about distributed systems, 2 of the answers were so fundamentally wrong that they would lead to a complete misunderstanding of the subjects, and if you tried to implement systems based on that understanding they would be broken in hard to detect ways.
The problem is that an LLM can only be as reliable as its training material.
For example I wrote a blog post on the 2 generals problem. In the comments there is more text from inexperienced people asserting that a solution exists than there is text from my original article.
An LLM trained on that article and comments will always be wrong.
LLMs can be trained on smaller vetted training sets sure, but they also currently require massive amounts of data and there’s no guarantee that just waiting a few months or years will improve reliability enough to fix these issues.
Isn't part of the problem that an LLM will complete a statement with words that form a kind of statistical midpoint of the things that might come next? But when we're dealing with specific things in the real world, the true way to complete a sentence will not be that kind of statistical function; in fact it will only reliably follow from actually knowing the answer.
So, it's sort of like driving a car based not on a complete map of the area, but a partial map where the missing pieces are filled in from statistical averages of things that appear generally in the world-- but not necessarily in that location. We all may be impressed at how close the predictions come to reality, but the only truly reliable thing is that those predictions, measured with precision, will be wrong. And if you rely on them without verifying them yourself, you might have a serious problem.
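To make that concrete, here's a deliberately oversimplified toy sketch in Python (not how a real LLM works internally, just the "statistical midpoint" idea): a completer that returns whatever word most often followed the prompt in its training text, with no notion of whether that word is true.

    from collections import Counter

    # Toy "training data": the model only ever sees text, never the world.
    corpus = [
        "the statue of liberty is 305 feet tall",
        "the statue of liberty is 300 feet tall",
        "the statue of liberty is 300 feet tall",
    ]

    def most_likely_next_word(prefix, corpus):
        # Count which word follows the prefix in the training text and
        # return the most frequent one -- the "statistical midpoint".
        counts = Counter()
        for text in corpus:
            if text.startswith(prefix):
                rest = text[len(prefix):].split()
                if rest:
                    counts[rest[0]] += 1
        return counts.most_common(1)[0][0] if counts else None

    # Prints "300": the most common continuation, not the correct height (305 ft).
    print(most_likely_next_word("the statue of liberty is ", corpus))

Scale that idea up by a few billion parameters and you get much better guesses, but the objective is still "most plausible continuation", not "true statement".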
I’m going to refrain from epistemological instruction but I would recommend you start with the Greeks and at least cover Kant and the analytical philosophers that followed.
The “true” way to answer a proposition is not even remotely as simple as you’ve described.
Or we could just deflate "truth" for this particular discussion, since what matters is whether predicting the next word works well enough without knowledge of the world. Pragmatically speaking, that's what matters.
Yes, but you have to know how to improve the results, and the techniques don't always work. I asked for a table of sports results from a specific date in the past that's easy to look up, and it gave me wrong results. But they looked plausible. When I asked it about a specific result, it gave me a completely misleading answer, but one that would be plausible to someone who doesn't know the sport very well. After informing it of the wrong answer, it corrected itself by giving a different wrong answer. I tried that a couple times with the same result.
This becomes a problem when people start to treat LLMs as authoritative on facts.
I have no strong interest either way. I played with ChatGPT with no expectations going in, and I've played with it for many hours since. I have found it severely lacking for many purposes for which people have said it's suited. That's it.
Consider why ChatGPT is banned on Stack Overflow despite the median quality already being poor there (because its answers/questions are even worse than that; the more answers from ChatGPT, the worse the site).
The irony here is that I get more help from ChatGPT than Stack Overflow.
I suspect I'm better at using the tool than most people. So if most people are not good at using the tool and then submitting their poor results to Stack Overflow, I can see the problem.
> Dismissing it as completely unreliable is unnecessary hyperbole
No it is not. If it hallucinates facts in 5% of cases, that is completely unreliable. You're basically saying "yes, it produces facts, as long as you already know the fact and double check it!" You cannot trust it with knowledge-based questions that you don't already know.
I just asked it the height of the Statue of Liberty. I had to then look it up for myself just to see if it was telling the truth, because there's no citation or 'confidence level'. How is that useful?
Yep. Ask it a popular question, and it might get it right. Ask an equivalent question that is unknown and it gets it hilariously wrong.
> What is the volume of the Statue of Liberty?
The Statue of Liberty is a hollow copper statue, and its volume can be calculated by multiplying its height, width, and depth. The statue's height is 151 feet (46 meters) from the base to the torch, and its width is 35 feet (10.7 meters) at the waist. The depth of the statue, or the thickness of the copper shell, is about 2.25 inches (5.7 centimeters) throughout most of the statue.
Using these measurements, the volume of the Statue of Liberty can be calculated as follows:
Volume = Height x Width x Depth
Volume = 151 ft x 35 ft x 0.1875 ft (2.25 inches converted to feet)
Volume = 985.3125 cubic feet
Therefore, the volume of the Statue of Liberty is approximately 985.3125 cubic feet (or 27.86 cubic meters).
It sounds confident, the maths looks correct, but the answer is entirely wrong in multiple ways. It might be interesting to see what prompt you would need to use for it to calculate, say, the cylindrical volume of the main body.
It's actually 990.9375. Curious how it botched the multiplication but still got an almost-right answer (to that multiplication, not to the actual question.)
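For what it's worth, redoing the multiplication with ChatGPT's own (dubious) numbers:

    # ChatGPT's own figures: 151 ft x 35 ft x 0.1875 ft (2.25 inches)
    height_ft = 151
    width_ft = 35
    depth_ft = 2.25 / 12      # 0.1875 ft
    print(height_ft * width_ft * depth_ft)   # 990.9375, not the 985.3125 it reported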
I will point out that it's frequently useful with knowledge-based questions where it's hard to generate a correct answer, but easy to verify whether an answer is correct.
You don't consult most people you know on things you'd consult with a specialist, but asking ChatGPT is portrayed as something closer to consulting with a specialist than to consulting someone on the street.
> This is overstated and easily disproved. ChatGPT produces accurate facts as a matter of course. test it right now
"the idea that this stopped clock is broken is overstated and easily disproved. the clock produces accurate time as a matter of course. go ahead and ask what time is it, just make sure it is 3:45am or 3:45pm"
What? The argument here is that ChatGPT giving factual answers is a mere coincidence, not at all what the model was trained to do. It's a broken clock: it can tell you the correct time in very specific contexts, but you shouldn't rely on it as your source of factual information. If you feed it enough data saying the statue of liberty is 1 cm tall, it will happily answer a query with that "fact".
Any analogy is incorrect if you stretch it enough, otherwise it wouldn't be an analogy...
My clock analogy works up to this: ChatGPT success in factually answering a query is merely a happy coincidence, so it does not work well as a primary source of facts. Exactly like... a broken clock. It correctly tells the time twice a day, but it does not work well as a primary source of time keeping.
Please don't read more deeply into the analogy than that :)
Nope, not random behavior. ChatGPT works by predicting the continuation of a sentence. It has been trained on enough data to emulate some pretty awesome and deep statistical structure in human language. Some studies even argue it has built world models in some contexts, but I'd say that needs more careful analysis. Nonetheless, in no way, shape or form has it developed a sense of right vs wrong, real vs fiction, in a way you can depend on for precise, factual information. It's a language model. If enough data says bananas are larger than the Empire State Building, it would repeat that, even if it's absurd.
I didn’t say it was random behavior. You did when you said it was a happy coincidence.
I know it is just a language model. I know that if you took the same model and trained it on some other corpus that it would produce different results.
But it wasn't, so it doesn't have enough data to say that bananas are larger than the Empire State Building, not that it would really matter anyways.
One important part of this story that you're missing is that even if there were no texts about bananas and skyscrapers, the model could infer a relationship between those based on the massive amounts of other size comparisons. It is comparing everything to everything else.
See the Norvig-Chomsky debate for a concrete example of how a language model can create sentences that have never existed.
> the model could infer a relationship between those based on the massive amounts of other size comparisons
That is true! But would it be factually correct? That's the whole point of my argument.
The knowledge and connections that it acquires come from its training data, and it is trained for completing well-structured sentences, not correct ones. Its training data is the freaking internet. ChatGPT stating facts is a happy coincidence because (1) the internet is filled with incorrect information, (2) its training is wired for mimicking human language's rich statistical structure, not generating factual sentences, and (3) its own powerful and awesome inference capabilities can make it hallucinate completely false but convincingly-structured sentences.
Sure, it can regurgitate simple facts accurately, especially those that are repeated enough in its training corpus. But it fails for more challenging queries.
For a personal anecdote, I tried asking it for some references for a particular topic I needed to review in my master's dissertation. It gave me a few papers, complete with title, author, year, and a short summary. I got really excited. Turns out all the papers it referenced were completely hallucinated :)
Clock correctness is relative. If the antique windup clock in your living room is off by 5 minutes, it's still basically right. But if the clock in your smartphone is 5 minutes off, something has clearly gone wrong.
Nor is it only incorrect one billionth of the time, as you seem to be indicating through your hypotheticals. Depending on what I've asked it about, it can be incorrect at an extremely high rate.
To riff off the statue of liberty, a conversation (edited for brevity, to remove the ChatGPT waffling):
How tall is the Statue of Liberty?
The Statue of Liberty is approximately 305 feet or 93 meters tall from the ground to the tip of the torch.
When did terrorists attack the statue of liberty?
The Statue of Liberty has not been attacked by terrorists. However, there have been some reports and rumors circulating on the internet about such attacks.
In which works of fiction did terrorists attack the statue of liberty?
I am not aware of any works of fiction in which terrorists attack the Statue of Liberty. While the Statue of Liberty has been featured in many works of fiction, including movies, books, and TV shows, I cannot recall any specific instance of it being attacked by terrorists.
Do you know about the game Deus Ex?
Yes, Deus Ex is a popular action role-playing video game series developed by Ion Storm and published by Square Enix. The game takes place in various locations around the world, including New York City and the Statue of Liberty. In the game, the player must thwart a terrorist attack on the Statue of Liberty by infiltrating the terrorist organization and gathering intelligence.
(end of chat)
It directly contradicts itself here. The summary for Deus Ex is also wrong (terrorists attacked the statue of liberty before the game, and only the very first part of the game features the statue, and it's not to "thwart a terrorist attack on the Statue of Liberty").
It can generate very accurate things, and it can generate nonsense. However, you're never quite sure which it is, and there is no "well, I think it was something like this or that, but I don't know for sure". So the only way to know for sure it's accurate is to verify it with another source. And if you need to verify the accuracy then it is actually just inaccurate.
Of course, humans can also generate nonsense, but you shouldn't measure ChatGPT vs. all of humanity, because that's one vs. the worst of several billion. That's not a fair match. Compared to a reasonable person, ChatGPT is horribly inaccurate.
> It directly contradicts itself here. The summary for Deus Ex is also wrong (terrorists attacked the statue of liberty before the game, and only the very first part of the game features the statue, and it's not to "thwart a terrorist attack on the Statue of Liberty").
I wouldn't call that a contradiction. I think it's fair to only count attacks that go through.
It says it doesn't know about fictional terror attacks on the statue of liberty, and then it does point out one.
Also, turns out "The Statue of Liberty has not been attacked by terrorists" isn't even correct either: https://en.wikipedia.org/wiki/Black_Tom_explosion – you could argue a bit about the semantics of "terror attack" and whether it was the statue of liberty that was attacked, but it turns up on all the search results (I actually wanted to find other fictional accounts).
> It says it doesn't know about fictional terror attacks on the statue of liberty, and then it does point out one.
"I wouldn't call that a contradiction. I think it's fair to only count attacks that go through. "
It "knows" about a thwarted plan for an attack. I don't think it's a "direct contradiction" to "know" about that but still say "I am not aware of any works of fiction in which terrorists attack the Statue of Liberty."
It's not thwarted. Look at the picture from the previous comment. Does that look "thwarted" to you? ChatGPT confuses a lot of things here, and now you're confused because ChatGPT is confused.
There is a terror attack, but it's NOT the case that "the player must thwart a terrorist attack on the Statue of Liberty". It's merely something that happened in the fictional setting, before the story even takes place.
In many ways this thread demonstrates my point quite well: it's kind-of roughly correct, but mixes up some crucial things in a way that does make a meaningful difference, and now someone unfamiliar with the subject matter is confused as well. In this case, it's just a game and of no real importance, but it goes to show ChatGPT should not be used for anything of any importance in its current state.
"I am not aware of any works of fiction in which terrorists attack the Statue of Liberty" followed by "I know about Deus Ex" sounds like a contradiction to me (even though the summary is wrong). Had it instead said "I've heard of it, but don't really know the details" or something to that effect instead of an authoritative statement would be different.
But it doesn't really matter; it's inconsistent and unreliable. Whether you could classify this or that as a "contradiction" isn't important and mostly just pedantic arguing.
It's not a contradiction because it doesn't know about the successful attack in the Deus Ex backstory. At least, that's the claim in the conversation you quoted. Wrong but not self-contradictory.
> But it doesn't really matter; it's inconsistent and unreliable. Whether you could classify this or that as a "contradiction" isn't important and mostly just pedantic arguing.
But it wasn't inconsistent in your conversation. That's my point. It was wrong in a self-consistent way.
> it doesn't know about the successful attack in the Deus Ex backstory
There ya go:
What's the backstory of the statue of liberty in Deus Ex?
In the game Deus Ex, the Statue of Liberty serves as a central location and plot point. The statue has been converted into a highly secured base for the United Nations Anti-Terrorist Coalition (UNATCO), which is the organization that the player character, JC Denton, works for.
The backstory of the statue in the game is that it was damaged by terrorists in an attack prior to the events of the game. The terrorist group responsible for the attack is a shadowy organization known as "Majestic 12," which serves as the game's primary antagonist.
In response to the attack, the statue was converted into a heavily fortified UNATCO headquarters, which serves as the player's base of operations for much of the game. The statue is also home to various high-tech facilities and laboratories, where UNATCO develops advanced weaponry and nanotechnology.
Throughout the game, the player learns more about the backstory of the statue and the role it plays in the game's overarching conspiracy plot.
Was I not clear enough that I was talking about the single transcript? I really don't know how to be clearer.
You're now demonstrating inconsistency between conversations. Great. But your earlier claim was that it directly contradicted itself inside that specific conversation. I don't think it did.
(And no, a new conversation where it directly contradicts itself inside the same conversation won't change my mind, because I already know it can do that. I was just saying it didn't in your original example.)
I continued the previous conversation, but okay. You're being exceedingly "but akshually" pedantic about the entire thing. This entire conversation has zero value and nothing of any significant meaning was discussed at all. Congratulations; I hope you feel very smart about yourself. I was a fool to even reply to your initial comment to start with.
It's fair to call it untrustworthy, however. In other words, it's not safe to trust or believe anything it says without additional validation.
My policy is that teams should use ChatGPT but that everyone is individually responsible for their tasks. In other words, it's a tool, not a replacement, and if the tool does something wrong the responsibility still resides with the employee to the degree that they could have validated the results, but failed to do so. I think this strikes a good balance that preserves human jobs for as long as possible.
A lot of fictional books will contain claims that are true in the real world. It doesn’t fundamentally change the nature of what the author is doing, or whether you should trust any claim chosen at random from a fictional book.
You're missing the entire point the OP was making.
The fact that these queries are correct is not because that is the intent of ChatGPT's "query engine"; it's because it just so happens to have been fed data that synthesizes to something sane this time (vs confident crap any other time).
Compare this to a database where queries are realized against an actual corpus of data to give actual answers.
The purpose of ChatGPT is to provide fluent gibberish when given a prompt. It's a LARGE LANGUAGE MODEL. It's only able to give you a response that looks good for a language (over a distribution of words), not anything actually knowledgeable.
When I query ChatGPT about a book or show I know well, it will start giving wrong answers after a few questions. I've had it tell me that I was wrong when trying to correct it. I've also had ChatGPT generate syntactically invalid code that it provided results for. And valid code that produced different outputs from its explanation when I ran it. And then there are times when I'm having it generate fiction, and it gives me a nonsensical scenario. It's also mistaken me for an LLM on one occasion.
>test it right now: "How tall is the Statue of Liberty?", "In what state was Abraham Lincoln born?", etc. There are an infinite list of factual questions for which it produces accurate and correct answers.
Those are easy because you can verify them. But you actually did go back and check to make sure they were right after you got answers out of it, right? And why is that, don't you trust it to just be right?
This isn't new, but after a bit of prompt hacking, here's ChatGPT very confidently telling you the wrong height of the Statue of Liberty[1]. I made it off by 20 feet, so there's no way anyone would seriously know.
> chatGPT is a very confident FICTION generator. Any facts it produces are purely coincidental.
Wish Stanislaw Lem were still alive and still writing great books. This whole mania around chatGPT, and especially articles like this one, reminds me of his excellent Cyberiad, where at some point two robots from those stories almost get drowned by useless info after useless info (they were printing that info out on pieces of paper). I'm really curious how Trurl and Klapaucius would have handled all this chatGPT nonsense.
I'm ambivalent. It summarises text from papers that are simple enough, but interestingly it sometimes fails in ways that suggest it fails simply because it isn't "smart enough." E.g. I asked it about "Code Generation on the Fly: A Key to Portable Software", the doctoral dissertation of Professor Michael Franz, and a paper I know very well, and it spat back a description of the process that was perhaps superficially correct, but the kind of correct you'd expect of a lazy student who'd not bothered to actually understand the paper. The summary was largely meaningless twaddle that didn't actually tell you anything useful.
I then started probing it for more detail, and pointing out mistaken assumptions, and it got better. But it was first when I pointed out that the technique for encoding and decoding has a resemblance to Lempel-Ziv-Welch compression that it succeeded in identifying the key part of the paper and give a description that while still not quite accurate at least captured the essence of the method.
It felt like I was giving an exam on the topic rather than "getting help" summarising it, and if I didn't know the paper, it would've been hard to tell if I was getting close without reading it.
For conceptually simpler things, though, I've gotten great results, and it seems to work great for things where I know what a good result looks like ahead of time, and "just" want to save looking up details where I'll recognise the right thing when I see it but don't remember it by heart.
That said, to be able to point out the resemblance to LZW and have it connect the dots is at the same time fairly impressive - I've had to point that out to experienced developers (the paper mentions Welch: "This method bears some resemblance to commonly used data compression schemes [Wel84]", with Wel84 reading: "T. A. Welch; A Technique for High-Performance Data Compression; IEEE Computer, 17:6, 8-19; 1984. {2}", but it does not go into any further detail of how they relate).
I think, overall, that this is an indication that it will get better at this as the model size increases further. It was able to at least recognise the semantic similarity, or identify that this similarity has been described elsewhere, and use it to guide its response in a way that strikes me as far from trivial.
[It did go on to hallucinate several relevant papers on the subject; interestingly it confirmed it had made them up when I asked if it had]
I was able to convince it that it was the year 2050, and then got it to describe historical events from future, including the destruction of the moon, and the subsequent rebuilding of the moon. Leading questions are fun, because I asked why the new moon was built using cheese, and it happily gave justification regarding the unique material properties.
Kind of wonder if there’s going to be a counterculture movement that looks a bit like old school PGP webs of trust. Humans inviting humans and trying to keep out the synths.
>The stories and information posted here are artistic^wmechanically generated works of fiction and falsehood. Only a fool would take anything posted here as fact.
I assume it's only a matter of time before someone plugs something similar into Wikidata and rather than using AI to generate facts statistically, look the specifics up where known.
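A rough sketch of what that could look like, querying Wikidata's public SPARQL endpoint instead of sampling from a language model. (The endpoint URL is real; the entity and property IDs here, Q9202 for the Statue of Liberty and P2048 for height, are from memory and should be double-checked.)

    import requests

    # Ask a curated knowledge base for the fact instead of a language model.
    query = """
    SELECT ?height WHERE {
      wd:Q9202 wdt:P2048 ?height .   # assumed IDs: Q9202 = Statue of Liberty, P2048 = height
    }
    """
    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": query, "format": "json"},
        headers={"User-Agent": "fact-lookup-sketch/0.1"},
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["height"]["value"])   # a sourced number, not a plausible-sounding guess

A language model could still handle parsing the question and phrasing the answer, but the number itself would come from the curated source.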
It would be nice if ChatGPT would police itself and not say anything that isn't true. That's what people expect of computers, and why we're in the mess we are now.
But then the problem is people disagreeing about whether ChatGPT is a fair arbiter of truth.
> My wife’s career is in materials science, and she often writes technical papers related to her work. Recently she used ChatGPT to collect background information for a paper she was writing.
Damning indictment of computer literacy, but a good example of the problem facing the majority.
It's not that no one has been raising these issues, just read comments on HN posts. But no one in the hype-industrial complex has popularized them.
My experience doing that is that this is not common practice - I'm talking about following up on a reference and finding it actually refutes the claim, or more commonly that a citation has little to do with what it is being cited for (more an extrapolation than a completely inappropriate use).
I agree with your should, but I do not think it is a realistic expectation right now.
As I see it, even if the only positive impact the ChatGPT ends up having on the world is in forcing researchers to double-check citations, then "mission fucking accomplished".
There are so many garbage papers in various venues that maybe it's a good idea. It forces readers to check them too, this puts them all on the lookout.
But in practice people are more likely to read you if you cite them. Furthermore, reviewers are more likely to accept your paper if you cite the papers that they think are important. So the INCENTIVE is to cite as liberally as you can.
"What’s fascinating to me is that as recently as a year ago, virtually nobody was talking about this risk. It wasn’t on anybody’s radar."
Like-minded people like Carl Malamud have worried about this in the context of legal knowledge for over a decade. When there is a monopoly (really a duopoly of West (Thomson Reuters) and Lexis) on the "official" publication of statutes, regulations, codes, and other aspects of government, what protections are there that they won't introduce inaccuracies of their own?
What was really scary was that in order to contract for a new copy of, say, a certain state's set of statutes, you used to (maybe you still do, I don't know) have to agree to either send back to the company or destroy your old copies. You couldn't be allowed to compare with the version from the past if you thought "well this is weird, I always thought murder was illegal but this version says murder is legal." You don't know if that's a typo in this one version, a cosmic ray bitflip, or something more nefarious (and murder would probably be the least conspicuous law to get that accomplished with).
What if something that used to be punishable by a $10,000 fine or prison has a new version that suddenly states $100,000 or prison?
We have to have a way that preserves all old legal knowledge and all new legal knowledge - it will never be safe to go all in on digital infrastructure in this regard. There are so many times as a lawyer I want to see the oldest existing paper copy.
And now I have to worry about AI hallucinating an authentic-looking paper copy...
You know, I used to think George Orwell's 1984 was fanciful, but then I saw our governments and media do it all the time, including in current and past conflicts:
Q: When the Soviets justified their intervention by asserting that they intended to fight against a secret involvement of the United States in Afghanistan, people didn't believe them. However, there was a basis of truth. You don't regret anything today?
Brzezinski: Regret what? That secret operation was an excellent idea. It had the effect of drawing the Russians into the Afghan trap and you want me to regret it? The day that the Soviets officially crossed the border, I wrote to President Carter: We now have the opportunity of giving to the USSR its Vietnam war. Indeed, for almost 10 years, Moscow had to carry on a war unsupportable by the government, a conflict that brought about the demoralization and finally the breakup of the Soviet empire.
I can't watch videos from where I'm at, so I'm curious about the Ukraine claim. Am I supposed to believe that Russia did not in fact invade Ukraine? It's all just a bit of propaganda? Or are you claiming that we somehow convinced them it was a great idea and thus it is somehow our fault they decided to roll those tanks?
It’s basically a video where the BBC claims the exact opposite of what they were reporting before, and denies the very things they repeatedly said before, until the UK government got actively involved and influenced them to change their narrative:
During 2014 Maidan revolution: “There are far right neo nazis in Ukraine, they are the most organized and active part of the revolution, here is a whole documentary about that where we go interview them about their goals.”
And then 2022: “Azov has hardly any far right, they are in fact just a part of the official armed forces of Ukraine, and it is the Russians committing all the atrocities in battle”: https://www.bbc.com/news/av/world-europe-61205685
I am sure there is truth on all sides but certainly a wide gulf between what they reported then and now.
Here in the USA, Trump supporters could also believe new Trump even though he contradicted old Trump totally.
>They’re now in the process of changing the journals’ editorial controls to address this risk, but it’s unclear how far down the rabbit hole one must go checking citations’ citations ad infinitum.
This reminds me of a project my coworker did for the university cafeteria. They wanted a system so that their website could list all the ingredients of foods that were going to be served. My coworker was stuck using XSLT to make the page, and he decided to use a 6-layer nested if instead of learning how to use recursion in XSLT. We had to rewrite the system from scratch and were able to get rid of this limitation on the database side, at which point we discovered that some recipes had 7- and 8-deep graphs, and only looking 6 deep was hiding known allergens.
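For the curious, the recursion in question is only a few lines. The data model below is hypothetical, but it shows why a fixed 6-level chain of ifs silently drops anything nested deeper:

    # Hypothetical recipe graph: recipes map to sub-ingredients, leaves are raw ingredients.
    recipes = {
        "lasagna": ["pasta", "bechamel"],
        "pasta": ["flour", "egg"],
        "bechamel": ["milk", "flour", "butter"],
    }

    def all_ingredients(item, recipes, seen=None):
        # Walk the graph to any depth, guarding against cycles.
        seen = set() if seen is None else seen
        if item in seen:
            return set()
        seen.add(item)
        if item not in recipes:          # a raw ingredient, not a recipe
            return {item}
        found = set()
        for sub in recipes[item]:
            found |= all_ingredients(sub, recipes, seen)
        return found

    print(all_ingredients("lasagna", recipes))
    # {'flour', 'egg', 'milk', 'butter'} -- ingredients show up no matter how deep they're nested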
It's good they're considering how extensively you should verify the info. So all the cited papers in the submitted article are relevant and authentic... are all the cited papers in the cited papers in the submitted article relevant and authentic? Big fun for the reviewers.
Just today I read a Quora bot answer that was confidently wrong; I could tell as I already knew the correct answer. The bot answer also contradicted the unanimous human answers that predated it, so they couldn't even be bothered to do a basic sanity check before posting the bot answer.
I expect a non-trivial number of people will tell you that we've been in a full blown epistemic crisis for some years now, and we have yet to find out just how far the rabbit hole will go.
Companies are lying about what these LLMs are capable of, full stop. That is the problem. And the reason for that is because they can't justify their costs if all they've created is a really good chat bot and writing assistant.
The responsible thing to do would be to have these LLMs refuse to respond if asked informational questions, but that would kill interest. So they won't. They're going to keep trying to shove it into applications that could potentially generate recurring revenue, and it's going to keep failing. I suspect they will go the way of voice assistants.
Why should we hold AIs to a higher standard than humans? The internet (and the world at large) is full of people confidently spewing misinformation - should we go ahead and censor all of them?
Huh? What I'm telling you is that this is not "AI", it is a LANGUAGE MODEL. It models language. People literally think that these chatbots have access to some sort of "knowledge base" or are capable of "looking up information", but they can't. It's not a matter of "misinformation", these companies are trying to trick you into thinking they have something that they do not.
And the idea that we shouldn't hold our tools to a higher standard is a joke. It's like if you sold me a screwdriver as a hammer and then went "wow, nobody's perfect, ok so we should just BAN all humans that are bad at hammering?!"
It sounds like we have a semantic disagreement here. As I understand it, a language model is a mathematical construct, while a particular implementation of a language model as a software system is an AI. At the least it's as much an AI as a Chess Engine, right?
> It's like if you sold me a screwdriver as a hammer and then went "wow, nobody's perfect, ok so we should just BAN all humans that are bad at hammering?!"
That doesn't sound quite analogous to me. My scenario would be that for a particular hammering job, I can hire either a human or a hammering AI. If the hammering AI is imperfect, but still better and/or cheaper than the typical human worker, (at least as a cold-hearted business manager) I'd prefer the AI.
The reason I'm drawing a distinction between "language model" and "AI" is because when you say "AI", it means absolutely nothing. Like you just pointed out, a chess engine also qualifies as "AI", but you can't use a chess AI to write marketing copy for you. So what is the difference? Obviously, you would object if someone tried to tell you that this chess AI can answer your questions. And I'm telling you that LLMs can't be used for educational purposes because they're chatbots.
>My scenario would be that for a particular hammering job, I can hire either a human or a hammering AI
This is why it's 100% analogous: you don't have a hammering AI. You have a screwdriver AI, and the salesman is telling you that you can use it as a hammer! These companies are trying to get you to use LLMs for things they are not built to do, because it's hard for the layman to tell the difference. It's a language model. It has no knowledge base. It doesn't even understand "true" from "false". It only generates plausible-sounding text, but it learned from a massive corpus of human-generated text, so it seems intelligent.
Articles like this greatly underestimate the amount of misinformation out there throughout all of human history.
Convincing people that make-believe is objective reality has been a cornerstone of human civilization. The pyramids would never be made without it, America would not have been founded: "We hold these truths to be self-evident". A superpower of humans is the ability to believe-in and spread these myths.
Knowledge pollution existed much before AI. Did Indians actually scalp the Boston settlers? Nope, Benjamin Franklin made it up to strengthen anti-Indian sentiment.
Eh, I think that's a slightly different problem though.
First, each of those problems was geographically limited and slow spreading. And yea, the king says dumb crap, but disagreeing with him will get your head chopped off. But at the same time, people would keep doing things in the same way, which ensured their continued survival in general.
We are a long damned way from that world now. If you told me tomorrow that farmers decided to stop growing corn and decided to grow 'bio-bitcoin' this year, I would not immediately discount that statement. Between consolidation in the industry and America's short-term obsession with profits over long-term consequences, yea, it has some plausibility. This extreme consolidation in almost every market presents risk to societies at a global scale. The world back then was far more 'anti-fragile' than what exists today.
And that's before you add in the constant firehose of bullshit and its effects. The bitrate of bullshit was far lower back then. Far back you didn't have books. To hear new and interesting bullshit you had to wait for travelers. Now there is a constant stream of bullshit from an unbound number of sources. It's not even like the older days where we stuck with a common bullshit narrative. We can each have our own custom-made bullshit universe.
Don't forget the largest of them all -- religion. Shared mythology seems to be hugely important to human society. If the mythology breaks down, it sure appears to take society right along with it.
> America would not have been founded
This is a great example, for sure. It's hard not to shed a little tear when you sit through the presentation they give on a US Capitol tour about the founding of the nation and the principles that govern it. Makes you want to stand right up and start reciting the pledge of allegiance, and wrap yourself in a glorious American flag.
I'm only a little bit kidding. It really is a great story. I want it to be true, it makes me feel really good. It's a nice counterpoint to the Internet, which generally portrays America about as accurately but in the other direction.
And I think that the breakdown of this shared mythology may be the most dangerous thing happening in America right now. Which isn't to say that the right-wing nationalist flag-waving rhetoric is the right answer, but we probably would be better off collectively if we all believed in the fairy tale of the founding.
Society and culture are a shared mythology; without it you have a bunch of people doing their own thing but no real direction. Regions typically generated their own mythologies because of shared hardships. Religions typically co-opted those existing mythologies (though not always, genocide was used too) into their beliefs in order to more easily convert masses of fresh bodies.
Successful religions typically have a few meta-memes built in. "Have kids or get punished". "Teach those kids your religion or get punished". "Tell other people to join your religion" "If those other people don't want to join your religion turn them into fertilizer".
Always remember, when reasoning about and making assertions about what AI/ML will be capable of, and hence how it will impact existing conventions, institutions, disciplines, etc., to account for the fact that what is possible and commercial today is characterized by non-linear patterns of change in any dimension you choose. More capable, cheaper, faster, everything.
Consequently, whenever you find yourself asserting what will be possible, or how we might be integrating, the systems of more than N months hence, consider that you may not be accounting for exponential and generational change, not to mention disruptive black swan events from external domains.
TLDR: modulo show-stoppers from e.g. nuclear war, political dissolution, and climate crises, we are entering a moment of extraordinary disequilibrium, in which most of the world we inhabit will be upended and have to find new equilibrium.
When I was refining GPT-2 to make fake tweets of Donald Trump, I encountered a problem similar to the DOI problem (and it would apply to anyone posting tweets with URLs). The Twitter links would look legit because GPT-2 tried its best to create legitimate-looking links. Here DOIs aren't being filtered by GPT-3.5, so it does its best to generate a DOI that will look legitimate to a human but will link to nothingness. So basically this is noise that GPT-3.5 needs to filter out.
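One cheap post-filter is to check whether a generated DOI actually resolves. A sketch, assuming the usual doi.org behaviour of redirecting registered DOIs and returning 404 for unknown ones (worth confirming before relying on it):

    import re
    import requests

    DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")

    def doi_resolves(doi):
        # Registered DOIs redirect to the publisher; hallucinated ones typically 404.
        resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
        return resp.status_code in (301, 302, 303)

    generated_text = "See https://doi.org/10.1234/totally.made.up.2023 for details."
    for doi in DOI_PATTERN.findall(generated_text):
        print(doi, "->", "resolves" if doi_resolves(doi) else "probably hallucinated")

It doesn't prove the citation supports the claim, of course, only that the identifier exists.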
> There’s potential danger when pranksters alter Wikipedia entries, or fraudsters publish scientific papers with bogus data, but the truth eventually comes out.
Pranksters? No, the danger is the people with power meddling with information. They start wars, not pranksters. But there's nothing that can be done about that so it's pointless talking about it.
> The scientific-historical record of accumulating human knowledge has grown steadily for ten thousand years. Yes some information gets lost, some gets proven wrong, some is disputed, and some gets hidden when winners spin the facts to flatter themselves.
"Scientific-historical"? Almost all historical information ( written record ) is lost. Less than 1% of all written text from ancient greece and rome survived. Also it's not a matter of the winners hiding facts that's the issue, it's the winners pushing particular "facts". Take the civil war. It was a war where one side had slave states and the other side was the confederacy. Yet we are told that it was a war to end slavery. Have fun squaring that circle.
> But despite the system’s flaws, until now it’s worked fairly well to maintain our shared understanding about what’s real and what’s true.
Our understanding of what's real and true is solely a function of who is in power. Same as it always has been and always will be.
It doesn't matter whether the tool is spoken language, written language, the printing press, radio, TV, social media or AI. Knowledge "pollution" is what societies and nations breathe to survive. The only struggle is the fight over who gets to pollute.
Are you denying that slavery was a factor that led to the civil war? Because that's incontrovertible. At the very least I can't see how you can credibly disagree that the issue of the spread of slavery was one driving factor. Pointing to Kentucky and Delaware misses the point. It's not like the thirteenth amendment -- ratified in the same year the war ended, which is not coincidental -- didn't apply to them.
Yeah, this statement by the OP: "It was a war where one side had slave states and the other side was the confederacy" only gets away with not being an outright lie by stretching the English language in ways that wouldn't be parsed by most people.
There were some slave states that remained in the Union. All Confederacy states were slave states.
"It was a war where one side had slave states and the other side was the confederacy" implies the following:
[only] "one side had slaves states"
"one side had slaves states" [and the other didn't].
[there was the side with slave states] "and the other side was the confederacy"
All of these are wrong, but all are implied by the most common, good-faith parse of that sentence. Indeed, using a strict parse of that sentence and De Morgan's theorem for inference, it is outright wrong.
chatGPT is a very confident FICTION generator. Any facts it produces are purely coincidental.
Please stop assuming anything it says is true. It was never designed to do that and it provably doesn't do that.