It's plausible, but is it true? (atomic14.com)
56 points by iamflimflam1 on Jan 8, 2023 | 56 comments



Recent and related:

Playing games with AIs: The limits of GPT-3 and similar large language models - https://news.ycombinator.com/item?id=34285717 - Jan 2023 (49 comments)


Someone asked ChatGPT to produce working code in three languages. The results seemed impressive:

https://www.reddit.com/user/francisco/comments/1044amy/chatg...

Until someone pointed out simple things, like the fact that the function torch.integrate_odeint doesn't exist (and there were other issues with the Julia code as well), and that it's not clear what the generated code will do even if you fix this. For now, the generated code is great at fooling non-technical VCs into thinking some deep intelligence is at work, so they can pivot from the Crypto/Web3 grift into generative AI.
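
For reference, a minimal sketch of what a real ODE solve looks like in Python, using scipy.integrate.odeint (which does exist), in contrast to the hallucinated torch.integrate_odeint:

    # Solve dy/dt = -y with a real API (scipy.integrate.odeint),
    # rather than the non-existent torch.integrate_odeint.
    import numpy as np
    from scipy.integrate import odeint

    def dydt(y, t):
        return -y

    t = np.linspace(0, 5, 100)
    y = odeint(dydt, 1.0, t)  # y0 = 1.0
    print(y[-1])              # about exp(-5), i.e. ~0.0067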


For obvious errors (i.e. an error actually gets thrown), if you feed the error back to the session, it's surprisingly good at keeping enough context to fix it. E.g. maybe in that specific example it needed to import some library, or got confused about which libraries are available where and mangled things so the code won't run; but if you tell it what happened when you ran it, it will fix just that issue in a rewritten output. I was pretty impressed with what it could "fix" of this type in some JS code I asked it to write. Overall this isn't much worse than a person trying to one-shot a code example, so I can live with this part fine.

The danger is in the non-obvious errors, like semantic ones you have no intuition or test cases for because you were handed a block of code. A good example of something that breaks down quickly is asking it to generate long or complex regular expressions. The output is almost always valid regex, but there comes a point where it goes from "this is the perfect regex for the question!" to "this regex works 99% of the time, and unless you spend about as much time reviewing it as you would have spent writing it yourself, you won't know until it bites you" to "this regex is obviously not going to work" very quickly. This part is much harder to live with; it forces you to stick to smaller snippets and integrate them yourself, which can be helpful sometimes but obviously isn't the end goal.
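
One cheap partial defence (a sketch, with a hypothetical generated pattern) is to run whatever regex it gives you against a handful of positive and negative cases before trusting it:

    import re

    # Hypothetical example: a generated pattern that is supposed to match ISO dates.
    generated = r"\d{4}-\d{2}-\d{2}"

    should_match = ["2023-01-08", "1999-12-31"]
    should_not_match = ["2023-1-8", "not a date", "2023-01-08x"]

    pattern = re.compile(generated)
    for s in should_match:
        assert pattern.fullmatch(s), f"expected match: {s}"
    for s in should_not_match:
        assert not pattern.fullmatch(s), f"unexpected match: {s}"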


I ran into the same thing when I tried to have ChatGPT boilerplate an Apache Airflow operator that sends a message to an MS Teams channel on failure.

It hallucinated a plausible but non-existent built-in Teams integration module.

I asked for a GitHub link to the source, and it gave me one that 404'd but was colocated with other MS-related modules.
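
For what it's worth, the usual non-hallucinated route is an incoming webhook plus Airflow's on_failure_callback. A rough sketch, assuming you've created an incoming-webhook URL for the channel (the URL below is a placeholder):

    import requests

    TEAMS_WEBHOOK_URL = "https://example.webhook.office.com/..."  # placeholder

    def notify_teams(context):
        # Airflow passes a context dict to on_failure_callback; grab the task instance.
        ti = context["task_instance"]
        requests.post(
            TEAMS_WEBHOOK_URL,
            json={"text": f"Task {ti.task_id} in DAG {ti.dag_id} failed."},
            timeout=10,
        )

    # Attach it to a task, or via default_args for the whole DAG, e.g.:
    # PythonOperator(task_id="...", python_callable=..., on_failure_callback=notify_teams)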

It will be interesting when language models become more aware of what they know vs. what they merely infer to exist.


Maybe it already wrote the code for itself and humanity just hasn’t caught up


I learned Brainfuck with ChatGPT. I asked it to write some code in Brainfuck and it explained the code to me. Whatever code ChatGPT gives me, it needs to be able to explain it before I take it seriously.


And now you have an instructor who sometimes lies to you.


YMMV but many of my human professors were sometimes wrong.


Are you seriously comparing something that has provided zero utility (web3) to something that has already proved its worth many times over?

Copilot has completely transformed the way I write code to the point where there's a before and after. Nowadays I can't imagine working without an AI assisted tool. For me, it's already revolutionary.


ChatGPT is basically just Eliza on steroids.


Not really. Eliza didn’t come with baked-in knowledge, or the ability to associate concepts.

If ChatGPT is Eliza on steroids, string theory is creationism on steroids.


I use GitHub Copilot at work. At first, I found myself blindly accepting its suggestions. They seemed so plausible!

After I got bit in the behind by enough weird, AI generated bugs, I learned the right level of trust. I now know how to read the code it writes and check the parts it’s likely to get wrong.

I think, with this generation of the technology, being too trusting of AIs is going to be an issue mainly for people who don’t interact with them much or in circumstances where the boundaries between real people and AIs are unclear (e.g. customer service chats).

Once AIs’ error rate goes down, that’s where it’s going to be tricky. Humans are notoriously bad at monitoring low failure rate processes. Once things go right a hundred times, we like to assume they’ll go right forever. It’s what makes getting people to properly monitor things like nuclear power plants such a challenge.


First-hand observation aside, the two ingredients of truth are authority and plausibility.

Robots make great authorities. They're shiny and speak with confidence.


Exactly. It's quite bad. In the case of ChatGPT, it could probably be mitigated by prefacing each interaction or each answer with a warning "I'm just a machine that makes stuff up; don't take what I say too seriously".

But for all the talk about "benefiting humanity" and being good to the world from so-called OpenAI, what we get instead is a constant affirmation that it's never wrong, with the upbeat tone of a dystopian policeman.


Until you realize they’re dumb. I don’t think people are so naive that they’ll ignore clearly dumb robots that they work with every day.

I do think people will make those mistakes with robots they’re not that familiar with.


Experience suggests that people are all too willing to unthinkingly follow instructions just because "the computer said so".

https://theweek.com/articles/464674/8-drivers-who-blindly-fo...


I would say GPS is in the zone of having such a low error rate that people can't process it right. If, as an example, GitHub Copilot added a security vulnerability in one in ten thousand lines of code, that would be a disaster because no one would double check its work. Currently Copilot adds a security vulnerability in one in ten lines of code (a made-up number), so people with even a tad of experience using it know not to blindly trust it.

I don't mean to dismiss the issue of over-trust in computer systems - I think it's a very serious issue, and one that needs to be addressed across a lot of disciplines.


People have certainly done some dumb things in the GPS vein. That said, if you're taking confident navigation directions from either a person or a computer, it can be hard to make a split-second decision that something "doesn't seem right." (Obviously some of the more extreme cases still shouldn't happen.)


> People have certainly done some dumb things

This sentence doesn't need to be continued. It's been the state of affairs since forever.


That's true if your only goal is to generate truthful text. But if you're trying to generate useful text, continuing the sentence is necessary.

Maybe that'll be the next step once LLMs get very good at generating truthful text :)


Challenge the assertions of a shiny-confident-robot and you will be challenged right back with: according to what shiny-confident-robot?

And no, first-hand observation doesn't count, because that's mere anecdote.


It's an anecdote which is meant to illustrate a widely observed behavior among many humans: we're better at accounting for frequent errors than we are at accounting for rare ones. This is well established in the field of human reliability analysis (HRA). Take a look at the HRA method HEART (https://www.epd.gov.hk/eia/register/report/eiareport/eia_224...). There are three different error producing conditions (EPCs) related to rare failures (EPCs 1, 3, and 12 in table 1.3).

As for my argument that people who frequently work with this technology aren't going to be so susceptible to "shiny-confident-robot" - that is based on my first-hand observation. But the counterargument (that people trust LLMs too much) is not based on data either. The Pearce et al. paper everyone cites about security vulnerabilities didn't look at how much people trust the code, just how frequently it gives wrong answers (https://arxiv.org/pdf/2108.09293.pdf). An actual user study (n=20) showed that Copilot users engage critically with the tool by examining and testing suggestions, comparing multiple suggestions, and cross-checking them against documentation (https://arxiv.org/pdf/2206.15000.pdf).


They're not dumb. They're management consultants!


> If you’re looking for factual answers - then you need to verify what comes out of these models. And let’s be honest you should be doing this with any source of information

I've seen a lot of statements like this - equating, often flippantly, ChatGPT's (lack of a) relationship with truth to the fact that humans make mistakes, stackoverflow answers are sometimes wrong, etc. It's really not the same thing, though, neither quantitatively (ChatGPT is "wrong" much more often than existing information sources) nor qualitatively (ChatGPT is "wrong" in a different way to existing sources - it is optimised for plausibility and freely mixes correct and incorrect statements in ways highly likely to trip up even the wary).

If I google the definition of a word and I see the top result contains a definition visible on the results page and sourced from Merriam-Webster, then no I generally don't need to double check it or even follow the link. I have reasons to believe both that the information did come from that source and that the source is unlikely to be fabricated. If I ask ChatGPT I have no reason to believe anything it says.

Here's what I got from ChatGPT for "What is Gell-Mann amnesia?"

> Gell-Mann amnesia is a term coined by science writer Michael Crichton to describe the phenomenon where people tend to forget information that contradicts their preconceptions or beliefs. The term is named after physicist Murray Gell-Mann, who won the 1969 Nobel Prize in Physics for his work on the theory of elementary particles.

> According to Crichton, Gell-Mann amnesia occurs when people encounter information that contradicts their beliefs, but rather than updating their beliefs to reflect the new information, they simply forget the information and continue to hold onto their preconceptions. This type of cognitive bias can be a significant barrier to learning and can lead to flawed decision-making.

The first paragraph of this contains a lot of completely correct factual information: the term was coined by Michael Crichton and named after Murray Gell-Mann, who did win a Nobel prize in that year for that reason. However, the actual statement of what Gell-Mann amnesia means in both paragraphs is completely wrong but utterly plausible. (The real meaning of Gell-Mann amnesia is extremely relevant to the ways some people are currently interacting with ChatGPT.)

For this output to be any use to me, I would have to Google (or rather Kagi, in my case) every individual thing it said. If I don't, I will swallow complete fabrications. Not merely occasionally, but in my experience most times I use it.

I've seen people on hn saying they are doing hundreds of ChatGPT "queries" (they are not queries) a day and "learning a lot" from it. I cannot believe these people are spending anywhere near the amount of effort required to properly validate the results, because it would take more effort than just googling the info in the first place.

Even Sam Altman says not to use ChatGPT as a source of factual information: https://twitter.com/sama/status/1601731295792414720?ref_src=...


A good general rule of thumb: if you want to know what something is or who someone is, at least if they're at all well known, Wikipedia or a broader Google search is probably the way to go. (Or other domain-specific wikis and databases.)

Somehow it conflated Gell-Mann amnesia with confirmation bias, which is a bit odd as there are plenty of references to Crichton's observation online.

This is less true of questions that require some synthesis of information from different sources--but then it's also harder to evaluate ChatGPT's output.


It's like the old academic line about Wikipedia: a good starting point for research, but never trust it.

I appreciate the mention of rubber ducking; that's where ChatGPT can really shine: just exploring new ideas without trawling through mediocre web pages.


I’m only barely up to speed on the tech so I don’t know if this is feasible or not, but I wonder if it would be possible to combine something like ChatGPT with something like Google. Then maybe suggest hints to the model in real time during inference, based on facts found in another dataset.

It seems like if it were trying to complete info about John Prescott (as in the article) that it could do better if it had some separate context about John Prescott to tap into, scraped from Wikipedia or something.
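
A rough sketch of the shape of that idea (with hypothetical helpers standing in for a real search API and a real model call): retrieve some text first, then condition the model on it.

    def search_snippets(query: str) -> list[str]:
        """Hypothetical: query a search engine or Wikipedia and return text snippets."""
        ...

    def llm_complete(prompt: str) -> str:
        """Hypothetical: send the prompt to the language model and return its completion."""
        ...

    def answer_with_context(question: str) -> str:
        # Retrieve first, then ask the model to answer from the retrieved context only.
        context = "\n".join(search_snippets(question)[:5])
        prompt = (
            "Answer the question using only the context below. "
            "If the context does not contain the answer, say so.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        )
        return llm_complete(prompt)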


It's feasible (and has been done). But there are still issues (see at least sources 2 and 3): the model confuses multiple people named John Prescott.

Query: `Describe the early life of John Prescott`

> John Prescott, Baron Prescott of Kingston-upon-Hull, was born on May 31, 1938 in Prestatyn, Wales [1]. He served as a British politician and stood as the Labour candidate in the 2012 election [1]. Prior to his political career he worked at the New England Aquarium for 22 years where he described how whales communicate with each other [2]. He also has a VetMB PhD from Cambridge University and is an expert on leptospirosis in family dogs from a public health perspective [3]. In addition to his work experience he attended Embry-Riddle Aeronautical University and Boston Public Schools [4][5] before becoming an early career teacher himself and eventually entering politics.

> [1]: John Prescott - Wikipedia (https://en.wikipedia.org/wiki/John_Prescott)

> [2]: John Prescott, 63, Ex-Director Of New England Aquarium - The New ... (https://www.nytimes.com/1998/07/06/us/john-prescott-63-ex-di...)

> [3]: Leptospirosis in the family dog: a public health perspective - PMC (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2228361/)

> [4]: Embry-Riddle Aeronautical University (https://erau.edu/)

> [5]: Boston Public Schools (https://www.bostonpublicschools.org/)

https://labs.kagi.com/ai/contextai?question=Describe+the+ear...


This tech works by predicting the next word based on the previous words. It doesn't have a concept of facts, sentences or even concepts. Just "what token is statistically likely to follow the previous tokens"

You can limit the sources of information that it uses to determine next word relationships. But then you give up the predictive power of having the entire internet as training data.

If you first do a web search to limit the available sources, you increase the computation overhead (first search, then summarize at the token level). And if you're already doing a web search, what's the point in using GPT? You lose the power of big-data synthesis and it becomes just a sort of summarization / style-transfer tool.

Individual tokens don't have sources and they can't be fact checked. They're just words or parts of words. Doing any kind of fact checking or citation adds an entirely separate technical process that is distinct from the transformer model.
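
A toy illustration of "what token is statistically likely to follow the previous tokens" (a bigram counter over a tiny made-up corpus, nothing like the real architecture, but it makes the point that no step involves facts):

    import random
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat the cat ran".split()

    # Count which token follows which token in the "training data".
    following = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        following[prev][nxt] += 1

    def next_token(prev):
        counts = following[prev]
        if not counts:  # no observed continuation
            return None
        tokens, weights = zip(*counts.items())
        return random.choices(tokens, weights=weights)[0]

    # Generation only ever asks "what usually comes next?" - never "is this true?"
    word, output = "the", ["the"]
    for _ in range(6):
        word = next_token(word)
        if word is None:
            break
        output.append(word)
    print(" ".join(output))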


> It doesn't have a concept of facts, sentences or even concepts

How different is that from humans, I wonder? For example, when learning math we talk about developing an intuition about a concept to help actually use it in daily life. This is separate from the rigor of the actual theory, but theory is harder to remember or fully internalize. Intuition, to me, is more of a feeling that things are related without necessarily understanding what is going on.


I'm sorry, but this point is trite and incorrect. Humans can manipulate concepts and compare them to other ideas. Which is why we're able to fact check and cite sources, where language models cannot.

It's true that I'm just putting one word after another right now, like a language model. But I can combine those tokens into higher level abstractions like facts and ideas, which I can then compare with other facts and ideas. If ChatGPT could do this, it wouldn't be plausibly wrong nearly as often. Instead it's just "what word is statistically likely to come next?"


> combine something like ChatGPT with something like Google.

Rumor has it that Microsoft is working with OpenAI to integrate it into Bing. https://www.theverge.com/2023/1/4/23538552/microsoft-bing-ch...


You could probably do something along those lines, either by modifying the prompt based on a tool that does a Google search, or by reprompting after the initial response (which you don't show to the user) with search info incorporated, to get a "final" result.


Yes; ChatGPT just has web browsing disabled. But GPT can search the web: https://openai.com/blog/webgpt/


The conversation around chatgpt has been had before. It is more ancient than Plato and Aristotle. ChatGPT can only generate rhetoric. It is incapable of generating dialectic.


The article captures what is evident for many language model users. But it's good that it's captured. I have two more thoughts.

One is that many people have plausible but not true knowledge of things. They often share it online. As I have become more and more specialized in my tech niche, I've started noticing that online advice and discussions of that niche involve only cocktail party-level knowledge, which is wrong in many cases.

The other thought is that our memories also work in an approximate way. So "plausible, but not necessarily true" aligns with our thinking. Flooding the internet with convincing but wrong "facts" could be very toxic to our minds, given how comfortable they are with that type of information.


This is the annoying thing I've found about AI writers past and present.

Almost a year ago, maybe, I was playing around with Rytr.me (and its competitors), trying to generate factual content I could use for SEO purposes on hundreds or thousands of band-specific pages. Many top sites already have this: a bio of the band, a list of albums, awards, some random Q&As, etc.

This should be an easy job for an AI, yet in many cases the output was plausible but not always accurate. It would miss key facts, or highlight songs, albums, and events that weren't really that important (despite the info being there). Sometimes the data would be off and I'd have to go look it up, and if I'm going to do that, I might as well do it all by hand, because I can't trust anything generated. It's painful enough for bands I think I know, but for bands I couldn't care less about, blech.

In theory it would have been good enough, but I didn't want to associate my site with that kind of crappy or incorrect data. The info was dated as well; it felt wrong publishing a page about an artist who died like a year ago without mentioning it.

I've since circled back around and played with ChatGPT for this, but it's still just off enough that I don't want to use it.

The thing about playing with these tools is that you get really good at recognizing content written by an AI, and you start questioning a lot of what you read online as potential misinformation. It's kind of gross and scary to see these simple answers appearing at the top of Google for any question and to realize they could be completely wrong, and people don't know.

Just the other day we cut our dog's nail, hit the quick, and he started bleeding like crazy. We looked up what to do and found a content blog with what appeared to be a few paragraphs giving a clear solution, but the same site also had like 7 other articles showing up for similar queries, all with different answers. I scrolled to the bottom of the WordPress site and it had 7000+ pages; I went to the last one, and they had all been added to the site on the same day. Almost every query I typed into Google led me to this site as one of the first results. It's fucked.


If you don't mind my asking... Is there any merit to the pages you're trying to create?

It sounds as if it feels automatable because it's not really something any human needs to read. Is there anything on that page worth searching for?

I'm sorry that this sounds judgmental; I don't mean it that way. I'm just curious, because it sounds as if the web really is complete, and we're now just repeatedly rearranging it.

That's kind of the open question with all of these AI projects. We deny their creativity because all they can do is rearrange. But it sounds as if we humans are the ones running out of creativity.


Does a duplication of the information need to exist? Definitely not, and I don't want to do it (especially if I have to do it shittily with bad info via AI). However, the only way to show up in a Google search result for what I want is to include this information, to convince Google that I offer unique and relevant content, so my offering ranks up and I can play the game like every other site fighting to be found online.

I am self-employed and run my own online business that needs traffic and sales. Apart from pouring a ton of money into ads, I can work on SEO, which costs only time and some effort. I've done this and found that after 6-9 months many of my desired search queries put me in the first 10 results (without AI or spamming crappy content).

To go further and attempt a long-tail strategy, I offer something customizable and relevant to the fans of every single band and artist out there, so that Googling "Band Name [my offering]" would ideally lead to my result being one of the first, pre-customized to that band. This is how the biggest players that take up page 1 rank for every such result.

Do I want to make the web a shittier place? Definitely not, and unlike many other folks I'm very hesitant to just throw thousands of crap articles that can't provide any value out there to try and get a few ad clicks. I don't have ads on my site. I want to help fans of an artist find the offering they are looking for. I often get amazing feedback and love letters from customers who are so thankful they've found what I sell, but the problem is how to get found, especially when competing for search relevance against some of the biggest websites on the internet. If I can do this instead of giving money to Facebook, I'd be much happier and more profitable.


> Is there any merit to the pages you're trying to create?

No, but maybe they can be monetised for a few extra dollars in ad revenue.

(Do I sound cynical? Yes, I'm tired of the recycled garbage that fills so much of today's web.)


That's my real concern with ChatGPT: that it will be used to fill up ad-infested sites with crap generic content for even less money than paying pennies to a new college grad to spew out 5,000 words a day.


There's lots of new quality content being created all the time. That said, it's reasonable to ask whether there's a need for another music database, another movie database, etc.


The hidden "it" in the title mainly refers to ChatGPT's answers. The topic of the submission seems to be

> we seem quite susceptible to believing things that seem plausible. This is probably quite an important part of just getting through the day - we don’t have time to check everything we read or hear - so we have to make some assumptions about what is true and what is not.

> So, what does this mean for us? Is it safe to use things like ChatGPT?


Many of the friends and relations I talked to at Christmas spend much of their work day writing. None of them are professional wordsmiths; it's just how their work is propagated. They could all see how AI assistance in writing would be useful, and acceptable.


> Large Language Models (LLMs) are great at generating plausible text - but they’re not necessarily great at generating truthful text. They are learning the structure of language, not the facts of the world.

Indeed. Symbol manipulation does not reasoning or critical thinking make.


I think ChatGPT works nice if you treat it like a coworker that you nag on a specific problem.

Sometimes it gives you the right answer right away, sometimes it gets the name of a function wrong, sometimes it fucks up, but it's still better than going it alone, as long as you check its answers and give it some feedback.

But I wouldn't trust it to write code for me tout court, at least for now.


ChatGPT can already generate almost working code. The obvious next step is to close the loop by replacing the human in RLHF with a compiler and unit tests. I think for coding tasks this would fix most of the hallucination issues.
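
A minimal sketch of that loop (ask_llm is a hypothetical stand-in for whatever completion API you use, and the test command is just an example): generate, run the tests, and feed any failure output back into the next prompt.

    import subprocess

    def ask_llm(prompt: str) -> str:
        """Hypothetical: return generated Python source for the prompt."""
        ...

    def run_tests(source: str) -> subprocess.CompletedProcess:
        with open("candidate.py", "w") as f:
            f.write(source)
        # Example test command; swap in your project's compiler or test runner.
        return subprocess.run(
            ["python", "-m", "pytest", "tests/"], capture_output=True, text=True
        )

    def generate_until_green(task: str, max_rounds: int = 5):
        prompt = task
        for _ in range(max_rounds):
            code = ask_llm(prompt)
            result = run_tests(code)
            if result.returncode == 0:
                return code  # tests pass: accept this attempt
            # Otherwise, feed the failure output back and ask for a fix.
            prompt = f"{task}\n\nYour previous attempt failed:\n{result.stdout}\nPlease fix it."
        return None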


Many of the hallucination issues seem to be ChatGPT shorthand. It can only generate N characters, so sometimes it makes up a function that doesn't exist but whose definition is clear enough to get there.

One of my coworkers has a PhD in SIMD-accelerated algorithms. We asked ChatGPT to write an optimal SIMD sort. It did fine, except for declaring that our CPU provided a built-in that doesn't exist. When I stated that my CPU was an Intel chip with AVX-512 and that I needed an implementation of this function for AVX-512, it happily wrote the required assembler.


So it turns out there’s a reason people include footnotes with sources.

Honestly, I imagine that's an ongoing line of research for the next generation of AI bots.

Or maybe I need to get back to graduate school and get to work.


> we’ve come to assume that what we get from a Google or Wikipedia article must be true

This is getting old fast. Google [search] does not produce articles, nor does it produce answers, exactly. It's partly to blame for the confusion, certainly, but still: Google Search returns search results, and in some cases, snippets of text taken from the pages in the search results.

Google Search does not produce "articles", or "answers", or any kind of speech, or linguistic construction. What appears on the Google SERP has not been authored by Google. And that's true of course of all search engines.

ChatGPT is different. While what it says is the result of a compilation of words found on the web, it speaks in its own name, in the first person, and at least pretends to produce original discourse.

And when it makes stuff up and produces imaginary science papers as reference, or non-existing functions in code, it is behaving dangerously.


> This highlights the danger, as humans, we seem quite susceptible to believing things that seem plausible. This is probably quite an important part of just getting through the day - we don’t have time to check everything we read or hear - so we have to make some assumptions about what is true and what is not.

Truth is a relatively new concept. Humans spent eons surviving on 3 rules:

- Can I eat it?
- Can I f*** it?
- Will it kill me?

We're still dependent on / wired for those today.


Great summary:

> What we should avoid:
>
> - Getting factual answers and blindly trusting them
> - Trying to solve maths problems - for the love of everything that is holy, stop trying this and then posting about how it failed
> - Anything that involves deep reasoning and deduction - it's not a human brain
>
> Things that I think are great for ChatGPT are:
>
> - Creating marketing copy - with a human reviewer in the loop
> - Generating code - but check that the APIs it suggests are real
> - Finding problems in code
> - Summarising code/text
> - Rubber ducking - talking to a computer can be a great way to work through a problem
> - Providing inspiration for ideas
>
> And many many others…


ChatGPT is not a search engine or a knowledge base. It’s a hallucinating oracle.

Great at content summaries though.


In short, its strength is ideas and inspiration, not facts and "truth".

I think it's safe to say we all have friends and colleagues that are the same. Some match well for creativity, others would do well on Jeopardy.


"It's true, I checked Wikipedia" is probably still not great. It's a fantastic resource but Wikipedia is not, and by its very nature cannot ever be, a primary source.


I think the new AIs are great!

With a little bit of luck, those machines will teach people that almost all the things they believe to be true are in fact false when you check them.

And if this doesn't work out, humanity will just get what it deserves anyway.



