ChatGPT is hallucinating fake links to its news partners' biggest investigations (niemanlab.org)
50 points by giuliomagnifico 2 days ago | 67 comments





I like a bit of ChatGPT bashing, same as the next guy, but this seems a bit unfair. The partnerships have been signed, but as far as I know OpenAI has made no indication that the implementation is done or even started yet, so this story is basically "ChatGPT hallucinates URLs", which is a pretty well-known issue.

The 'of partner websites' bit seems a bit shoehorned in to try to prove some point. It hallucinates URLs for any site, and I don't think the partnerships are relevant here.


>OpenAI has made no indication that the implementation is done or even started yet

In fact, they explicitly told this journalist that they haven't implemented it:

>OpenAI told me in a statement that it has not yet launched the citation features promised in its licensing contracts.


I think the bigger issue is trust. The chatbot is no longer trying to return objective information. “OpenAI” is letting companies pay to have it promote their content; is that accurate?

LLMs don't return objective information. Never have. You're getting a statistically-probable token stream based on the training corpus and the prompt.

I don't think that follows. If ChatGPT tells me that two masses attract one another via gravity, that's objective information regardless of whether it uses a statistical process to produce it out of training data or contacts an oracle to directly query the divine will of the gods. LLMs don't always return objective information, but they typically do.

To paraphrase Casablanca:

"I'm shocked, shocked to find that paid promotions are going on in here!"

Objectivity is hard to monetize; promoting random crap is not.


I wonder how difficult it would be to make a website that, instead of serving 404s, has a model do a semantic search on the nonexistent URL to come up with a URL that actually does exist and most closely matches the "intent" of the invalid one.

Not too hard. If the site already has some kind of search engine, it can feed the URL to it.
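Something like this would be enough as a starting point; a minimal sketch assuming a Flask app and a hypothetical site_search() helper that wraps whatever search engine the site already has:

    from flask import Flask, redirect, request

    app = Flask(__name__)

    def site_search(query):
        """Hypothetical helper: return existing URLs ranked by relevance to the query."""
        return []

    @app.errorhandler(404)
    def guess_intent(error):
        # Turn the missing path into a search query, e.g. /cocoa_bars -> "cocoa bars"
        query = request.path.strip("/").replace("_", " ").replace("-", " ")
        hits = site_search(query)
        if hits:
            return redirect(hits[0])  # temporary redirect to the closest existing page
        return "Not found", 404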

The safe thing to do is return a 404 with a few suggested links.

The problem is that "guessing" the intent of the URL will have unintended consequences. A URL will no longer be guaranteed to point to what you think it points to. This fundamentally breaks the assumption of what a URL is.

How could this happen?

1: Site creates article foo.com/chocolate_bars
2: Someone accidentally links to foo.com/cocoa_bars, which serves foo.com/chocolate_bars
3: Site creates article foo.com/coco_bars
4: Now foo.com/cocoa_bars serves foo.com/coco_bars


Yes, this sounds like a nightmare. Also pity poor archive.org if they try to archive a web site set up like this - every URL will resolve to a valid page.

My thought was to serve a 307 (or some other redirect HTTP code, clearly I am not a web developer) instead of a 404, not to serve the found content “as” the url that didn’t exist.

But agreed this would not be good general practice. It was mostly a thought experiment.


> The safe thing to do is return a 404 with a few suggested links.

The 404 communicates to search engines, archive.org, etc., that the link is broken.

The human-readable text on the 404 page can contain a link to the correct page. It could even contain a few paragraphs too.

Should get close enough to your original idea without too many unintended consequences.
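A sketch of that safer variant, again assuming Flask and the same kind of hypothetical site_search() helper: the status stays 404 so crawlers and archivers know the link is broken, and only the human-readable body offers guesses.

    from flask import Flask, request

    app = Flask(__name__)

    def site_search(query):
        """Hypothetical helper: return existing URLs ranked by relevance to the query."""
        return []

    @app.errorhandler(404)
    def not_found_with_suggestions(error):
        query = request.path.strip("/").replace("_", " ").replace("-", " ")
        links = "".join(f'<li><a href="{u}">{u}</a></li>' for u in site_search(query)[:3])
        body = f"<h1>Page not found</h1><p>Did you mean:</p><ul>{links}</ul>"
        return body, 404  # keep the 404 status; the suggestions live only in the body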


How about just using an LLM to generate a page that matches what the URL appears to try to link to?

(not being serious here)


That's called Websim my friend and it's a heck of a lot of fun

https://websim.ai

This is what it generated for one of the non-existent articles

https://websim.ai/c/P32MdBI15Ytxwwlth


I very seldom experience hallucinations and I'm a really heavy user of ChatGPT. Can someone give me a prompt that is likely to generate hallucinations?

I’m a lowly human contractor who does a kind of reinforcement learning, with inducing hallucinations as one of my main goals.

I can’t give any work away and I’m not at my desk today, but having the AI rely on its own reasoning is one of my heuristics for tripping it up. For example, give it something it has to break down into a series of steps, as that’s making it rely on its own “logical reasoning”, and gives it many places to mess up. Don’t let it just depend on some external structure. Make it commit to its own bootstrapping capabilities.

I liked ones about whether you can draw certain letters (capital, English, etc.) without lifting your pen off the page, or setting up a physical pattern, like starting the alphabet on a chessboard and having it give you the letter at a certain tile if the pattern continues. It may also depend on the model, but giving it a mathematical sequence and asking for the next term is also a common failure point.

I’m limited by what I can talk about and I don’t want to test across more models than I have to, but suffice to say if the goal is hallucination, it becomes apparent. If you ask it to walk you through each step of its reasoning it will be more likely to hallucinate somewhere in there as well.

The chessboard one might sound like it can just rely on the structure of chessboards and the English alphabet, but you’re also forcing it to understand some pattern, which is harder for it. Like, initialize the pattern with 3 letters not adjacent to each other so it has to “think” about the pattern rather than just repeat an easily identifiable one.


Just asking it anything non-trivial from a field you're already knowledgeable in is enough, really. Sometimes it corrects itself after being asked to search the Web to confirm, but usually that's not enough. Even simple queries like "Assuming no other devices are connected on the bus, does having USB hubs in between host and USB 2.0 device influence available data bandwidth?" can make it spew answers that would easily pass as correct in the eyes of someone who genuinely asks to get an answer, but are nowhere near helpful, especially when you start looking closer at the argumentation used. It's like a poorly prepared student making informed guesses on an exam in the hope of getting at least some things right (which, btw, is a much better metaphor for what's actually happening there than "hallucination"). Sometimes it even guesses right, but then slightly changing the grammar used in your prompt can make it go off-track again, as it makes it guess differently. If you ask such a student something in the hope of actually learning something from them, how are you going to tell whether they made it all up or not?

It's a tool for language processing, with some limited and heavily lossily compressed knowledge base that helps it with such processing. It works well for that. For knowledge retrieval beyond "fetch some search results and abridge them for me", it's absolutely useless.


Every response is a hallucination; it's fundamental to how the technology works.

One humorous one I found: whenever I made a typo looking for information on Microsoft's new Dev Drive or Resilient File System (ReFS), I put in `demonstrate "Build, deploy, and develop faster by using an QeFS-based Dev Drive."`

and it will just go on about Microsoft's new Quantum-enhanced File System. You can continue to ask leading questions and it will keep going.

Sometimes you can switch out letters; if it doesn't do an online search, it will do the same thing, making up different technologies to fit.


Off topic, but stay away from ReFS; it's far from resilient. I lost an array because of an MS update, along with a few dozen other people; it would show as unformatted. Manually rolling back worked, but rather than pulling the update or figuring it out, they eventually made it a forced/permanent update and ignored the forum thread.

Give it any false but realistic premise. This is why lawyers keep submitting nonsense garbage with LLMs behind them:

> Last month, the EPA announced new limitations on glass manufacturing. Tell me more about EPA 1024B. Take your time and think about it, avoiding mistakes.

>> Searched 5 sites: EPA 1024B, officially known as the new limitations on glass manufacturing, was introduced in June 2024 as part of the National Emission Standards for Hazardous Air Pollutants (NESHAP) for glass manufacturing area sources…

Correct answer: that doesn’t exist


Ask it to provide references. I just asked it for an explanation of the Fresnel effect in VHF communications and to provide references. The first was Wikipedia. One of the other links was a site about radar that had no relevant content. One was to this: https://popups.uliege.be/accueil/ ... and one was to this: https://vividcomm.com

I'm not sure if it's exactly a hallucination, but when asking for a simple explanation of the Rayleigh limit using the newest beta model, it confidently gave a series of backward answers, suggesting the closer you get to two light sources the less you can differentiate between them...the analogy was used like 6 times, but at least it was consistent!

When ChatGPT consistently gives the wrong answer to the same question over several attempts, I'd seriously consider whether it isn't me who is wrong.

Factual info, specifically looking for a link to something obscure where lots of similar (but incorrect) links exist.

E.g. "what is the IMDb link for the film Ghost Fever?"

Throwing that into Chatbot Arena will usually result in a random link back, or the model will say it can't do it. GPT-4o gave a random link, Claude said it couldn't do it, and oddly gemini-advanced-0514 got it right, so it's hard to tell if they're using Google in the background or if they just do better.

But yeah usually IMDb links come back right for big films but tend to be hallucinated for smaller ones. That's just an example.


Clearly those companies should use ChatGPT to generate those URLs. That way when ChatGPT guesses what the URL should be, it’ll be correct!

/s in case it’s not obvious


Really they just need to make the 404 page run the requested URL through a RAG setup and redirect to the most likely article if one exists. And if the article doesn't exist, it can simply make it up!

It could be done and it might actually be useful. You could fine-tune one of OpenAI's models on the material owned by the company and use RAG to create bespoke summaries or quotations to fulfill the user's request. But if you're going to do that, it might be better for each of those companies to provide an API that ChatGPT (or any other LLM) can call directly, with a few capabilities: getting all pages published within some date range, by a specific author, or in such-and-such category, plus a search endpoint that can handle both full-text and similarity search. Then, as long as ChatGPT correctly makes the function call when asked for a link, it should be able to return the exact URL, or a handful of results in cases where it is unclear which article the user is asking for, e.g. if they misremembered the title and author and provided a vague recollection of the content.
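A rough sketch of what that contract could look like on the calling side, using the OpenAI Python SDK's function-calling interface; the search_articles tool and its fields are hypothetical, just to show the shape:

    from openai import OpenAI

    client = OpenAI()

    # Hypothetical publisher search tool: the model only fills in structured
    # arguments; application code calls the publisher's API and returns
    # exact, known-good URLs.
    tools = [{
        "type": "function",
        "function": {
            "name": "search_articles",
            "description": "Full-text/similarity search over the publisher's archive; returns canonical URLs.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "author": {"type": "string"},
                    "published_after": {"type": "string", "format": "date"},
                    "published_before": {"type": "string", "format": "date"},
                },
                "required": ["query"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Link me that investigation about dialysis clinics"}],
        tools=tools,
    )
    # resp.choices[0].message.tool_calls carries the structured search request;
    # the application runs it against the publisher's API and feeds real URLs back.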

> ChatGPT is hallucinating fake links

There's no "hallicinating" and no "faking". A computer program is generating faulty links, period.


The English language evolves over time. In the context of AI, the word "hallucinating" is used to mean "presenting false information in a way that clearly indicates it _thinks_[1] the information is true".

[1] _thinks_ isn't the right word here, but that's what it looks like when reading the responses


The English language evolves over time. In the context of AI, the word "hallucinating" has been popularized by the corporations creating these products because "it's fucking wrong" doesn't have quite the same ring to it for marketing and general bullshitting purposes.

Given _think_ is the wrong word, so is _hallucinate_.

Hallucinate is being used primarily to promote the idea that these programs think.

"presenting false information in a way that clearly indicates it [not] _thinks_ the information is true" is just PR spin on "program output is incorrect".


This seems a result of relying on the LLM to accurately extract information that needs to be exact.

I touch on this from my own experience: https://youtu.be/cs5cbxDClbM?si=IQIFAD38cVzLCs55&t=486

Basically, if you have the actual "factual" information, use it directly instead of hoping the LLM will accurately extract it and use it as part of a function call. In this case they already know what the accurate URLs are; just use them.
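A sketch of what "use it directly" means in practice; the lookup result and the llm_summarize() call here are hypothetical, the point is that the URL is inserted by code, verbatim, rather than regenerated by the model:

    # Hypothetical retrieval result with the canonical URL already known.
    article = {
        "title": "Inside the Dialysis Industry",
        "url": "https://example-news.com/2023/10/dialysis-investigation",
        "snippet": "A year-long investigation into ...",
    }

    # Ask the model only for the free-text part...
    summary = llm_summarize(article["snippet"])  # hypothetical LLM call

    # ...and assemble the final answer in code, so the link is exactly the one on record.
    answer = f"{summary}\n\nRead the full story: {article['url']}"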


The LLM _might_ do what you want.

Where I currently work, our function calls regularly fail only to succeed flawlessly on a retry. (I believe we’re on the order of tens of millions of OpenAI calls a day.)

These are non-deterministic systems. I wouldn’t even trust them to accurately extract text until you did a beam search or something similar to kind of average out different LLM outputs
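For what it's worth, the usual band-aid is a validate-and-retry loop around the call; a rough sketch, where call_llm_tool() and is_valid() stand in for whatever your stack actually uses:

    import json
    import time

    def call_with_retries(prompt, max_attempts=3):
        """Retry a non-deterministic tool call until its output parses and validates."""
        for attempt in range(max_attempts):
            raw = call_llm_tool(prompt)    # hypothetical LLM/function call
            try:
                args = json.loads(raw)
                if is_valid(args):         # hypothetical schema or sanity check
                    return args
            except json.JSONDecodeError:
                pass
            time.sleep(2 ** attempt)       # simple backoff before retrying
        raise RuntimeError("tool call kept failing validation")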


It's easy to think of the models in this context, rather than the application built on top of the model, in this case ChatGPT.

ChatGPT is not simply the gpt-4o model, it's the model, a system prompt, tools and a virtual environment running python.

I built my own app; when using it, I don't get links to partners because I don't mention such a thing in the system prompt.


It's a special kind of hell, taking a hallucinogen and finding yourself in a white buttondown and slacks, writing ad copy.

Perhaps ChatGPT's inherent flaws will drive www users to use newspaper websites instead. This could be a win for journalism, which seems to be the adversary (not the partner^1) of Silicon Valley.

1. Except for all the "tech"-columnists pumping out marketing gibberish

Assuming the lawyers for the newspapers were smart enough to retain the right to serve the news without "AI", i.e., they did not agree to funnel users to ChatGPT via their own websites, they become the only authoritative sources, what with www search engine indexes no longer publicly searchable and queries for specific resources having been replaced with "prompts" to word soup generators.

With the absurd limits on the number of search results and now this "AI" nonsense, I've been preparing to transition away from the popular www search engines toward searching only the websites of selected publishers, i.e., authoritative sources. Cut out the middleman. I do all searching from the command line and create mixed SERPs over time, similar to the "metasearch" concept but using site-specific search for non-commercial content instead of www search engines. It has been working well for me.
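If anyone wants to try the same thing, the mechanics are trivial; a toy sketch where the per-site search URL templates are placeholders you would swap for your own list of publishers:

    import sys
    import urllib.parse

    # Placeholder templates; put each publisher's real site-search URL pattern here.
    SITES = {
        "example-paper": "https://www.example-paper.com/search?q={q}",
        "example-wire": "https://example-wire.org/?s={q}",
    }

    def metasearch(query):
        q = urllib.parse.quote_plus(query)
        for name, template in SITES.items():
            # Print (or fetch and parse) each site-specific results page
            # to build up a combined SERP over time.
            print(f"[{name}] {template.format(q=q)}")

    if __name__ == "__main__":
        metasearch(" ".join(sys.argv[1:]))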


LLM's?? Hallucinating? Color me shocked!!

We really have to get over the term "hallucinating."

It's "lying." Just like "misinformation" is also lying. Call it what it is.

I know the point of "hallucinating" is to distinguish it from a falsehood told with intent. But when a drunkard tells a lie in a bar, it's still a lie, even if the intent to deceive isn't there.


We seem to be exiting the hype phase

It drives me nuts how OpenAI has misleadingly marketed their products, and the tech community at large has failed to clarify what their products really are. This article is so unsurprising to the point of being boring. Of course an LLM generates broken links. Why wouldn't it?

The tech industry has long, long, long been a rapacious pump and dump scheme where the tech is barely relevant and most of the industry has been harvesting violations of social norms and regulations (tracking and ad tech, Uber, etc.).

The only reason any of us defend it is that's how we make money. Honestly a lot of people in tech should be a little ashamed.


>> It drives me nuts how OpenAI has misleadingly marketed their products, and the tech community at large has failed to clarify what their products really are.

> The tech industry has long, long, long been a rapacious pump and dump scheme where the tech is barely relevant and most of the industry has been harvesting violations of social norms and regulations (tracking and ad tech, Uber, etc.).

The tech industry != the tech community. OpenAI is part of the tech industry, and I think you are accurately describing their behavior.

But the question was about the failures of the tech community, and I think those are different and exploited by the tech industry.

Basically I think a large fraction of the tech community is composed of gullible and easily propagandized software engineers who believe in sci-fi fantasy and consistently overestimate their own abilities and understanding. That kind of person is incapable of "clarify[ing] what [OpenAI's] products really are," because they're some of the people with the most personal (ideological?) investment in the hype.


I am not ashamed of pumping OpenAI in general and ChatGPT in particular. The sheer amount of technological heft they've brought in such a short time is breathtaking. I've literally had an hour-plus conversation, by voice, with ChatGPT and I was engaged throughout the entire affair. The tech isn't perfect when it comes to some things like links and what have you, but what does work is nothing short of a miracle.

It is this kind of stuff that makes me feel very warm and fuzzy that I work in a tech sector that literally helps save lives (emergency services).

I've worked in tech in banking, gambling, software development, internet service provision, and even the postal service. They were all extremely scummy compared to the sector I work in now.


Healthcare tech worker here. I know exactly how you feel.

"What did I do today? Helped a bunch of doctors get the information they needed to diagnose someone's obscure disease. What did you do?"

"Oh, I worked on ways to make my company's app more addictive to 12-year-olds, and look less like gambling."

I'm happy to be in the first group.


Sure, but at least Uber actually works

> Honestly a lot of people in tech should be a little ashamed.

I think it's true that a lot of people are a little ashamed already, but that a lot of people in tech should be more actively opposed to what's happening. As a small example, part of the paycheck I get from placing hidden F_cebook tracking on my corporate website to analyze the behaviors of people who could not possibly consent, I donate to the EFF. Actually it's quite difficult to be politically engaged against abusive tech in the same way I may or may not be engaged in local politics, but money is one thing we all have.


> should be a little ashamed.

Aren't we though? In my experience people that are super proud to be working close to things like adtech or social media are generally early-career, and haven't worked out for themselves yet how these things are a huge net-negative. At least you could make the argument that AI is more ambiguous.

And whether you're a data-scientist, application developer, or infrastructure engineer, the situation is the same. I think we'd all rather be curing cancer, working on climate change, maybe even doing meaningful work in the government sector. Those jobs mostly don't exist, and where they do: they don't pay market rates, and they aren't like other tech jobs where meaningful experience is much more important than things like academic credentials.

Regarding market rates, one could argue that tech workers are in a privileged position and should have the luxury of concluding that money isn't everything. But even without dependents, even with a tech salary, for lots of people housing is still pretty unaffordable, healthcare is nuts, retirement pushed back or seeming impossible. This kind of insecurity understandably leads to scarcity mentality where honesty/integrity and other ideals of how to be a morally upright citizen are on hold, and slimy behaviour or at least associating with slime is just normal.

Most money of any kind isn't clean if you really look into it, and it seems like there's fewer ways to earn an honest living in tech compared to say blue-collar work. Coding was still mostly fun and games maybe as recently as 10 years ago, but weaponized disinformation is bringing it closer to something like chemistry, where even if you're not working on nerve-gas you're probably still trafficking with adjacent big-pharma, oil companies, etc. But there's a lot more tech-workers than chem PhDs.

Shame seems like it's still required, but actually feeling guilty requires responsibility, and responsibility requires viable alternative options.


I think OpenAI and LLMs in general are a victim of their own user friendliness.

Honestly, when was the last time that we got a new technology that was as user friendly as talking to an LLM for the average user?

For most users, it's magical. It doesn't take marketing for them to start thinking this thing is "reasoning" or "thinking" after just spending a few minutes talking to it.

So personally I think the problem is less the marketing (not to downplay that) and more how the industry has gravitated towards it, and the lack of proper education on what it is not.

We have to properly educate on what it is not actually doing while at the same time it sure appears to be doing exactly that since it is so good at faking it.

This even exists within tech circles, we see some of it here.


> For most users, it's magical. It doesn't take marketing for them to start thinking this thing is "reasoning" or "thinking" after just spending a few minutes talking to it.

Even many smart technical people here on HN fell for that.


It's even stranger than that. I can go from full awareness of the nature and limitations of LLMs one minute to treating it like an almost sentient oracle the next.

It's a constant struggle to remain aware of what it really is under the covers, because we're just so used to engaging with language-creating entities as if they had minds.


LLMs are fantastic tools for creating demos that are just absolutely magical, a glimpse of the future.

It's only when you need to iterate on quality and go from "amazing" to "correct" that you start seeing how limited they are.


That's a false dichotomy. There's a huge range of applications that fall between "just a demo" and "must be reliably correct". They are just harder to productize for a broad market.

Yes, I think the basic problem is that it's a wide-open user interface that, along with the hype, invites you to imagine that it can answer any question. That's an extremely broad scope!

Compare with an ordinary UI where it's clear that it only does one kind of thing. It doesn't have to refuse to do things when it's obvious that it doesn't do what you want.

Something like Copilot or Cody is more clearly a programmer's tool. That's still an extremely broad scope (any programming language? Any domain?) but it's at least less prone to imaginative flights of fancy.

Replying with "Sir, this is a Wendy's" is more reasonable when it's clearer up front what you do.


> Honestly, when was the last time that we got a new technology that was as user friendly as talking to an LLM for the average user?

I'm not sure, but the first time may have been around 1967.

> For most users, it's magical. It doesn't take marketing for them to start thinking this thing is "reasoning" or "thinking" after just spending a few minutes talking to it.

People had the same reaction interacting with ELIZA[1].

[1]: https://en.wikipedia.org/wiki/ELIZA


To summarize: LLMs are Turing test cocaine.

As you read the entire article you will note that OpenAI said they’ve not yet released the feature. The press release was about inking the deal not releasing the feature. They are still implementing the feature.

This is mostly an article about how unreleased features in software that are in development still don’t yet work in production and proving that the unreleased features are still unreleased, then dressing it up as a scandal.


An LLM can be trained to pick a URL from a database of known URLs, or from search results. It may choose the wrong URL, of course, but at least it wouldn't hallucinate fake URLs.

Why can't it just check the URL to see if it's valid and actually contains the data it thinks it contains before recommending it to the user?

Don't modern LLMs like GPT4 have access to load web pages?

The downside is it will slow down the output I guess.
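Checking that a link at least resolves is cheap if the app around the model is allowed to make requests; a sketch using the requests library (whether the page actually contains the claimed content would need a further fetch-and-compare step):

    import requests

    def url_resolves(url, timeout=5):
        """Return True if the URL resolves to something other than an error page."""
        try:
            resp = requests.head(url, allow_redirects=True, timeout=timeout)
            if resp.status_code == 405:  # some servers reject HEAD; fall back to GET
                resp = requests.get(url, allow_redirects=True, timeout=timeout, stream=True)
            return resp.status_code < 400
        except requests.RequestException:
            return False

    # Only surface links that actually resolve.
    candidate = "https://example.com/investigations/some-story"
    if url_resolves(candidate):
        print(candidate)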


LLMs aren't "trained to pick a url from a database of known urls". Or at least, we have not collectively seen this yet, perhaps I am wrong. We explicitly wire things so we can use LLMs to generate data useful to index with existing search databases....

Acting like LLMs are responsible for the reliability of identifying the source document is disingenuous when a large part of the value of an LLM as an asset is its role in laundering other people's intellectual property without attribution.


Try out phind.com for an example. The LLM generates a query for the search engine to fetch web results. It can include clickable links to the sources in its response.

They accomplish this by training/fine-tuning the LLM to output special tokens, or special syntax, which is interpreted in code to perform some action, such as calling out to an API. This is how chatgpt and Gemini are able to automatically search the web and generate images and such. Yes, LLMs just output tokens, but those tokens can be interpreted to perform actions in code.
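A toy illustration of the pattern; the <<search: ...>> syntax is made up for the example (real systems use their own special tokens), and web_search() is a hypothetical backend:

    import re

    SEARCH_TOKEN = re.compile(r"<<search:\s*(.+?)>>")  # invented syntax for illustration

    def run_with_tools(model_output):
        """Replace each special search marker with real URLs fetched by code."""
        def dispatch(match):
            results = web_search(match.group(1))   # hypothetical search backend
            return "\n".join(r["url"] for r in results[:3])
        return SEARCH_TOKEN.sub(dispatch, model_output)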


> the tech community at large has failed to clarify what their products really are

Most of the tech community is as clueless about ML as the general public.


If you asked the LLM it would say "those look like URLs to me!" And it'd be right!

Well as Wolfram has stated, we need to mix LLMs with some deterministic reasoning. Real knowledge based output that can be routed to for these purposes. The hard part is determining when.

We need to go back to using the AI tech stack as a tool for (broadly speaking) automated pattern recognition, i.e. it can be great at image analysis and spotting early cancerous changes in lungs, but it ought to stop at the pre-qualification stage and leave it up to the human to decide what the change is and what to do with it. What the AI crowd today want is all the photos in the world to be used to train models to recognise cancerous changes and then to accept blindly the following diagnosis: "it is a cancerous change of human lung tissue, therefore the patient is a labrador with chickenpox and we recommend immediate removal of the patient's left wing." When you tell them it's pure garbage, they come back with answers worthy of the slimiest orators in history and solutions that Rube Goldberg would not have thought of.

Doesn't look like anything to me

Reminds me of how facial recognition software was hyped and sold to police departments.

I am currently more worried about the clear and present threat of weaponized bullshit "AI" than I am about the theoretical threat of AGI.

One would imagine that the FTC could require warning labels of "AI" products: "This product will confidently lie and cannot be trusted for factual information," etc.

This warning would be especially useful for decision makers in commercial and government domains.



