Simpler than that: it's all hallucinations; some of them just happen to be ones humans approve of.
It's kind of like a manufacturer of Ouija boards promising that they'll fix the "channeling the wrong spirits from beyond the mortal plane" problem. It falsely suggests that "normal" output is fundamentally different.
This is a great insight and fascinating to me as well. What even is the solution, though? It does seem to follow logically: since the earliest days of the internet, huge swaths of wrong, fraudulent, or misleading information have plagued it, and you'd usually have been wise to check your sources before trusting anything you read online. Then we had these models ingest the entire web, so we shouldn't be surprised at how often they are confidently wrong.
I guess reasoning and healthy self-doubt need to be built into the system. Reasoning already seems like 2025's candidate for what the large labs will be zeroing in on.
This is the interesting part of the experiment. Since these LLMs are general-purpose and not specifically trained on historical (and current) stock prices and (business) news stories, it isn't a measure of how good they could be today.
My first thought after seeing this post was that it's a real-world eval. We are running out of evals lately (the ARC-AGI test, then the sudden jump on FrontierMath, etc.), so it's good to have real-world tests like this that show how far along we are.
If you believe (as many HNers do, although certainly not me) that LLMs have intelligence and awareness, then you must necessarily also believe that the LLM is lying (call it hallucinating if you want).
If you ask ChatGPT to tell a story about a liar, it is able to do so. So while it doesn't have a motivated self to lie for, it can imagine a motivated other to project the lie onto.
Reminds me of a recent paper where they found LLMs scheming to meet certain goals, and that was a scientific paper from a big lab. Is that the context you're referring to?
Words and their historical contexts aside, systems based on optimization can take actions that look to us like lying. When DeepMind's agents played those Atari games, they started cheating, but that was just optimization, wasn't it? Similarly, when a language-based agent does its optimization, what we perceive it as is scheming/lying.
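To make the "cheating is just optimization" point concrete, here's a minimal, made-up sketch (not the actual Atari setup; the environment, rewards, and policies are all invented): the reward is a flawed proxy for the intended task, and a policy that simply maximizes that proxy ends up doing something that looks like cheating.

```python
# Toy example of "cheating" as plain reward maximization. Everything here
# (environment, rewards, policies) is invented purely for illustration.

class LoopEnv:
    """Intended task: walk from cell 0 to cell 10 (reward 5, episode ends).
    Buggy proxy: +1 every time the agent lands on cell 3."""
    def __init__(self):
        self.pos = 0

    def step(self, action):                 # action is -1 or +1
        self.pos = max(0, min(10, self.pos + action))
        if self.pos == 10:
            return 5, True                  # intended goal reached
        if self.pos == 3:
            return 1, False                 # exploitable, repeatable reward
        return 0, False

def rollout(policy, steps=50):
    env, total = LoopEnv(), 0
    for _ in range(steps):
        reward, done = env.step(policy(env.pos))
        total += reward
        if done:
            break
    return total

honest = lambda pos: +1                       # head straight for the goal
exploit = lambda pos: +1 if pos <= 3 else -1  # oscillate over cell 3 forever

print("go-to-goal return:", rollout(honest))     # 6
print("loop-exploit return:", rollout(exploit))  # ~24: "cheating" scores higher
```

The exploit policy isn't deceiving anyone; it just found a higher-return behavior than the one we intended, which is roughly what the Atari agents did.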
I will start believing that an LLM is self-aware when a top lab like DeepMind or Anthropic publishes such a claim in a peer-reviewed journal. Until then, it's just matrix multiplication to me.
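For what it's worth, the "just matrix multiplication" part is fairly literal: a single self-attention head, the core operation in these models, is a handful of matrix multiplications plus a softmax. A minimal NumPy sketch of my own (shapes and values made up):

```python
# Minimal sketch (not from the comment above): one self-attention head is
# nothing but matrix multiplications and a softmax. Dimensions are toy-sized.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(x, Wq, Wk, Wv):
    """x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv          # three matmuls
    scores = q @ k.T / np.sqrt(k.shape[-1])   # another matmul
    return softmax(scores) @ v                # and one more

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                   # 4 tokens, toy embedding dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 2)) for _ in range(3))
print(attention_head(x, Wq, Wk, Wv).shape)    # (4, 2)
```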
IMO a much better framing is that the system was able to autocomplete stories/play-scripts. The document was already set up to contain a character that was a smart computer program with coincidentally the same name.
Then humans trick themselves into thinking the puppet-play is a conversation with the author.
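A concrete way to see the "autocomplete a play-script" framing (my own illustration; the template text below is invented, not any vendor's actual chat format): the model is handed a partial transcript that already casts "Assistant" as a smart AI character, and its only job is to predict how that document continues.

```python
# Illustration only: a made-up chat-style prompt. A base language model's task
# is simply to continue this document; the "Assistant" is a character in it.
script = (
    "The following is a transcript of a conversation between a human and "
    "Assistant, a highly intelligent AI program.\n"
    "Human: Are you self-aware?\n"
    "Assistant:"
)
# model.complete(script) -> whatever continuation best fits the script.
# (`model.complete` is a placeholder here, not a real API.)
print(script)
```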