
A less anthropomorphic way to put it: LLMs can predict the correct “shape” of an answer even when they don’t have data that gives them a clear right answer for the content. Since their basic design is to provide the best response they can, they’ll produce an answer of the correct shape filled with fairly random content whenever the shape, and not the content, is all they have good information to predict.



I think parent has hit on the how and GP has hit on the why.

How LLMs are able to give convincing wrong answers: they “can predict the correct ‘shape’ of an answer” (parent).

Why LLMs are able to give convincing wrong answers is a little more complicated, but basically it’s because the model is tuned by human feedback. The reinforcement learning from human feedback (RLHF) that is used to tune LLM products like ChatGPT is a system based on human ranking. It’s a matter of getting exactly what you ask for.

If you tune a model by having humans rank the outputs, despite your best efforts to instruct the humans to be dispassionate and select which outputs are most convincing/best/most informative, I think what you’ll get is a bias towards answers humans like. Not every human will know every answer, so sometimes they’ll select one that’s wrong but likable. And that’s what’s used to tune the model.
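
To make that concrete, the reward-model step in RLHF boils down to a pairwise ranking objective over human preferences. A minimal sketch (assuming PyTorch; not any lab's actual training code):

    # Rough sketch of the reward-model objective behind RLHF-style tuning.
    # A human is shown two candidate answers and picks the one they prefer;
    # the reward model is trained so the preferred answer scores higher.
    import torch
    import torch.nn.functional as F

    def preference_loss(score_chosen, score_rejected):
        # Bradley-Terry style pairwise loss: push the score of the
        # human-preferred answer above the score of the rejected one.
        return -F.logsigmoid(score_chosen - score_rejected).mean()

    # Toy scores a (hypothetical) reward model assigned to each answer.
    chosen = torch.tensor([1.2, 0.3])    # answers the raters liked
    rejected = torch.tensor([0.7, 0.9])  # answers the raters passed over
    print(preference_loss(chosen, rejected))

Nothing in that objective knows whether the preferred answer was true; it only knows a human liked it better, which is exactly the bias described above.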

You might be able to improve this with curated training data (maybe something a little more robust than having graders grade each other). I don’t know if it’s entirely fixable though.

The brilliant thing about the parent’s comment about the “shape” of the answer is that it reveals how much humans have (uh, historically, now, I guess) relied on the shape of information to convey its trustworthiness. Expand the notion of “shape” a bit to include the medium. If somebody bothered to take the time to correctly shape an answer, we take that as a sign of trustworthiness, like how you might trust something written in a carefully-typeset book more than this comment.

Surely no one would take the time to write a whole book on a topic they know nothing about. Implies books are trustworthy. Look at all the effort that went in. Proof of effort. When perfectly-shaped answers in exactly the form you expected are presented in a friendly way and commercial context, they certainly read as trustworthy as Campbell’s soup cans. But LLMs can generate books worth of nonsense in exactly the right shapes without effort, so we as readers can no longer use the shape of an answer to hint at its trustworthiness.

So maybe the answer is just to train on books only, because they are the highest-quality source of training data. And carefully select and accredit the tuning data, so the model only knows the truth. It’s a data problem, not a model problem.


Cool, thanks for tying a neat ribbon around it.

> The brilliant thing about the parent’s comment about the “shape” of the answer is that it reveals how much humans have (uh, historically, now, I guess) relied on the shape of information to convey its trustworthiness.

This is the basis of Rumor. If you tell a story about someone that is entirely false but sounds like something they're already suspected of or known to do, people will generally believe it without verification since the "shape" of the story fits people's expectations of the subject.

To date I've decried the choice of "hallucination" instead of "lies" for false LLM output, but it now seems clear to me that LLMs are a literal rumor mill.


What's the point of the technology if it will provide an answer regardless of the accuracy? And what prevents this from being dangerous when the factual and fictitious answers are indistinguishable?


We have the same problem with people. Somehow, we've managed to build a civilization that can, occasionally, fly people to the Moon and get them back.

Even if LLMs never get any more reliable than your average human, they're still valuable because they know much more than any single human ever could, run faster, only eat electricity, and can be scaled up without all kinds of nasty social and political problems. That's huge on its own.

Or, put another way, LLMs are kind of a concentrated digital extract of human cognitive capacity, without consciousness or personhood.


> without consciousness or personhood.

Hopefully, for the former.

Be a bit terrifying if it turns out "attention is all you need" for that too.


"without all kinds of nasty social and political problems"

I assure you, those still exist in AI. AI follows whatever political dogma it is trained on, regardless of whether you point out how logically flawed it is.

If it is trained to say 1+1=3, then no matter what proofs you provide, it will not budge.


Yes, it could be dangerous if you blindly rely on it for something safety-related. But many creative processes are unreliable. For example, coming up with bad ideas while brainstorming is pretty harmless if nobody misunderstands them.

Generally, you want some external way of verifying that you have something useful. Sometimes that happens naturally. Ask a chatbot to recommend a paper to read and then search for it, and you’ll find out pretty quick if it doesn’t exist.


What happens when the tech isn't only being used to answer a human's questions during a short-lived conversation, though?

The common case we see publicized today is people poking around with prompts, but isn't it more likely, or at least a risk, that mass adoption will look more like AI running as long-lived processes tasked with managing some system on their own?


> The common case we see publicized today is people poking around with prompts, but isn't it more likely, or at least a risk, that mass adoption will look more like AI running as long-lived processes tasked with managing some system on their own?

If by “AI” you mean “bare GPT-style LLMs”, no, they can’t do that.

If you mean “systems consisting of LLMs being called in a loop by software which uses a prompt structure carefully designed and tested for the operating domain, and which has other safeguards on behavior”, sure, that’s more probable.
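
A hand-wavy sketch of what that outer loop looks like (the function names are made-up stand-ins, stubbed so it runs, not any real framework's API):

    # "LLM called in a loop by software with safeguards", in miniature.
    def call_llm(prompt: str) -> str:
        return "send_email"              # pretend the model proposed this action

    def validate_action(action: str) -> tuple[bool, str]:
        allowed = {"read_calendar", "draft_reply"}   # domain-specific allow-list
        return (action in allowed, "not on the allow-list")

    def execute_action(action: str) -> str:
        return "done"

    def run_agent(task: str, max_steps: int = 5) -> str:
        history = [f"Task: {task}"]
        for _ in range(max_steps):
            proposal = call_llm("\n".join(history))  # bare LLM suggests a step
            ok, reason = validate_action(proposal)   # safeguard vets it
            if not ok:
                history.append(f"Rejected {proposal}: {reason}. Try again.")
                continue
            if execute_action(proposal) == "done":   # only vetted actions run
                return "completed"
        return "gave up after max_steps"

    print(run_agent("triage my inbox"))

The LLM only ever proposes; the surrounding software decides what actually happens, which is where the reliability has to come from.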


Yes, people are doing that. I think it's risky.

One way to think about it, though, is that many important processes have a non-zero error rate, particularly those involving people. If you can put bounds on the error rate and recover from most errors, maybe you can live with it?

An assumption that error rates will remain stable is often pretty dubious, though.


Not if they're bad at it. ChatGPT and friends are tools that are useful for some things, and that's where they'll see adoption. Misuses of the technology will likely be exposed as such pretty quickly.


These are the million-dollar questions when it comes to LLMs. How useful is it to talk to a human who likes to talk, and prefers to say something over admitting they don't know? And if you have a person with Münchhausen syndrome in your circles, how dangerous is it to listen to them and accidentally pick up a lie? LLMs with temp > 0.5 are effectively like these people.
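
For anyone who hasn't played with the temperature knob: it just rescales the model's next-token distribution before sampling, so higher values pull more from the unlikely tail. A simplified sketch:

    # Temperature sampling over a toy next-token distribution.
    # Higher temperature flattens the distribution, so less-likely
    # continuations get picked more often.
    import numpy as np

    def sample_token(logits, temperature, rng):
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())    # numerically stable softmax
        probs /= probs.sum()
        return rng.choice(len(logits), p=probs)

    rng = np.random.default_rng(0)
    logits = np.array([4.0, 2.0, 1.0, 0.5])      # toy scores for four tokens
    for t in (0.2, 0.7, 1.5):
        picks = [sample_token(logits, t, rng) for _ in range(1000)]
        print(t, np.bincount(picks, minlength=4) / 1000)

At low temperature you get nearly the same answer every time; crank it up and the long tail, including the confidently wrong stuff, starts showing up.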


I have the same concerns, but am feeling more comfortable about Munchausen-by-LLM not undermining Truth as long as answers are non-deterministic.

Think about it: 100 people ask Jeeves who won the space race. They would all get the same results.

100 people ask Google who won the space race. They'll all get the same results, but in different orders.

100 people ask ChatGPT who won the space race. All 100 get a different result.

The LLM itself just emulates the collective opinions of everyone in a bar, so it's not a credible source (and cannot be cited anyway). Any two of these people arguing their respective GPT-sourced opinions at trivia night are going to be forced to go to a more authoritative source to settle the dispute. This is no different than the status quo...


The number one problem for generalized intelligence is establishing trust.


> What's the point of the technology if it will provide an answer regardless of the accuracy?

The purpose is to serve as a component of a system which also includes features, such as the prompt structure upthread, that mitigate the undesired behavior while keeping the useful behaviors.


for one, telling people something they like to hear is an amazing marketing tactic


I suggest using 'form' instead of 'shape'; the latter is mainly concerned with external appearance. In the context of LLMs, form would be the internal mapping, and shape the decoded text that is emitted.


those things are anthropomorphic by design. there's no point in being cautious, unless it's from an ideological standpoint


I think the social concerns around attributing personhood to LLMs transcend ideological concerns.



