Is telling a model to "not hallucinate" absurd? (gist.github.com)
30 points by fzliu 6 months ago | 99 comments



All they do is hallucinate.

Their results are statistical extrapolations from a big corpus of data. To them, a "true" fact "looks the same" as a "false, but strongly correlated with the corpus" fact.

The piece that would make them distinguish "true" from "strongly correlated with the corpus" is not yet discovered, AFAIK. It would probably need to be a different, context-specific tool. For example, if you are asking a model to write a program for you, the "verification step" would be a test bench for said program.
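As a toy illustration of such a verification step, here is a minimal sketch: run the generated program against a small test bench and only accept it if the tests pass. The function name "fib" and the expected values are hypothetical.

    # Minimal sketch of a context-specific "verification step" for LLM-written code:
    # execute the generated source and check it against known cases instead of
    # trusting the text. The function name "fib" and expected values are made up.
    def test_bench(candidate_source: str) -> bool:
        namespace = {}
        try:
            exec(candidate_source, namespace)      # run the generated program
            fib = namespace["fib"]                 # the function we asked the model for
            return [fib(n) for n in range(6)] == [0, 1, 1, 2, 3, 5]
        except Exception:
            return False                           # code that doesn't run fails the bench

    candidate = "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)"
    print("accepted" if test_bench(candidate) else "rejected")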


What is the thing that differs in humans that would presumably let us be categorized as not "hallucinating"?

As far as I understand, every time we query our memories we also slightly change them, so when we try to remember something there's no guarantee that it is a true representation of what actually happened. See also the reliability of witness testimony.

Ultimately, all knowledge storage in humans and LLMs has to be heavily compressed and statistical in nature, with compromises made between accuracy and how much you are able to store. We can't store 8K footage of everything we see in our brains, so we use a lot of compression techniques, likely statistical graphs in nature.


We make correlations based on experience of reality. They make correlations based on experience of text. Text is only meaningful when it is correlated with reality.

An LLM will see that "Dog" and "Woof" occur near each other in text a lot, but it doesn't know what a dog is or what a woof is. You do know that, because you have experienced an actual dog and heard it go woof.


But there are also plenty of things I haven't seen in real life and have only read about in books or text. Does that mean I wouldn't know what they are either?

In addition, you can feed LLMs image and audio input as well, and in theory you could feed them any sensory data that we get.

In fact, they can describe what is in an image in quite a lot of detail.

Is knowing what a dog is being able to recognize it in an image, and is knowing what a woof is being able to recognize it in audio?

Ultimately, after all, all of it can be represented as input tokens, binary data or otherwise. We also receive constant input from the world, which could be represented in various different forms, and processing layers can transform it into different helpful symbolic things to problem-solve with. For an LLM to be able to logically consider what a dog might do in certain situations, it would have needed to build some form of neural representation of the idea in order to produce the output it can, and that goes beyond parroting the training data in my view. I can give it an input in some interesting configurations and permutations and it's able to logically use the concepts to figure out what might happen.


"Does it mean that I wouldn't know what they are as well?"

Are they explained in terms of things that you have seen in real life? Or in ways that build on that? Then you can certainly gain that by building on those models.

And yes, if you can feed an attention-based neural network (not an LLM, those are specifically Language-based) sense data (and preferably action data too, for how it interacts with the world) then that would give it a picture of the world. And once it can recognise a dog then you can say "And we call this a dog!".

There's research into this already - see this piece https://www.economist.com/science-and-technology/2024/06/05/... (you may need to use archive.is to read an archive of that page).

====

These “vision-language-action models” (VLAMs) take in text and images, plus data relating to the robot’s presence in the physical world, including the readings on internal sensors, the degree of rotation of different joints and the positions of actuators (such as grippers, or the fingers of a robot’s hands). The resulting models can then answer questions about a scene, such as “can you see an apple?” But they can also predict how a robot arm needs to move to pick that apple up, as well as how this will affect what the world looks like.

Grounding the model’s perception in the real world in this way greatly reduces hallucinations (the tendency for AI models to make things up and get things wrong).

====


But GPT-4o, for example, as a multimodal model is already able to take images as input, so does it have enough to know what a dog is, in your opinion?

Even though it's able to process images, it's still called an LLM, just a multimodal LLM.


For instance, if it doesn't understand or know what animals are, how is it able to answer a prompt like this:

https://chatgpt.com/share/68815b5f-7570-443a-bcdf-e6ef4d233b...

And connect the dots in such a way as to develop this interesting scenario, while at the same time considering so many different interactions between the room and the animals.


Because it knows which text is associated with which other text, in quite complex and subtle ways. It's exceptionally good at analysing text without needing to understand anything more than which words are seen in proximity to which other words, and how their placement in relation to each other affects other bits of text.

It is amazing how well this technique works. But at no point has it linked the word "cat" to real actual cats, because it has literally got no access to any real actual cats. It has had no access to any information beyond the word "cat" and its relationship to other words.


You are calling it text associated with other text.

But I think it has to have internalized it in a much more complex way to be able to produce output like that for any arbitrary configuration of different things.

This internalization and ability to apply that internalization to produce scenarios like that tells me it must have enough understanding, even if it hasn't been able to "see" a cat, it has an understanding of its characteristics and description, and how the same thing can interact with other things. To me that's understanding of the object.

Alternatively, if we are talking about a blind and deaf person, and that person hasn't ever seen any cats, would they be unable to understand what a cat ultimately is?

If all their input is in braille and tactile sign language?


"it has an understanding of its characteristics and description"

It has no access to the characteristics. It literally only has access to text. It knows that some text is associated with the word "cat", but it doesn't know what "fluffy" means, because it has never seen fluff. There is literally an infinite regress of meaning, where you never get out of language to actual meaning, unless you have some kind of experience of "fluff" (or something else which can be related to "fluff").

A is a bit like B which is a bit like C, which is like D, which is a mix of B and E, which is quite F-like. This never gets you to meaning, because the meaning is the sensation, or the experience. And it doesn't have that.

(Blind people would be able to understand the physicality of a cat, because they can still experience them in a variety of ways. Whether they can understand that the cat is white is down to the kind of blindness, but if they've never seen colour then probably not.)


But it’s able to describe what fluffiness is.

And for us humans, the way I understand fluffiness, is about the sight and the touch senses. We have associated this sight input and this touch input with the label “fluffiness”. Now we throughout our life have experienced this input a lot, but technically this input in its original form is just lightwaves and a combination of our nerves physically touching the fluffy object. This could in theory be represented in many data formats, it could be represented as a combination or group of input tokens. So the understanding for us is just the association of this input to this label. I don’t see how this in particular would give us any special understanding over if just raw input was given to an LLM. It’s just a special case where we have had this input many times over and over. It seems arbitrary whether LLM has been fed this data in this format or in some other format, e.g. text as opposed to lightwaves or touch feedback.

This extra input could make those models more intelligent because it could help it uncover hidden patterns, but it doesn’t seem like it would be the difference between understanding and not understanding.

Text is just a meaningful compression of this sensory input data.


It seems that we are in agreement.

> The piece that would make them distinguish "true" from "strongly correlated with the corpus" is not yet discovered, AFAIK

That piece is what would let us be categorized as not "hallucinating". I don't think it has been discovered, and I don't know whether it exists. It could just be that our corpus is huge.


But people also constantly make mistakes, produce invalid output, misremember things, state things incorrectly. Depending on the environment, they do this with varying frequencies. It is always a balance of truth vs action, where if you always safely say "I'm not sure, I'd rather not answer." you will never be able to do anything in life, vs if you always try to answer you will frequently be wrong.

LLMs are trained with a bias toward answering, so they will try to make an answer up, because otherwise they wouldn't be helpful enough as assistants.

You could train LLMs to bias more towards only answering when they are confident, e.g. "Paris is the capital of France", but they wouldn't be as useful then.

I would also say it's not really "hallucinating", because "hallucinating" is an error in processing the input that causes it to be malformed. It implies that the input text you put in as a prompt was somehow processed incorrectly into something else.

When an LLM is wrong, I would rather say it's trying to find an answer from a compressed dataset, "hoping" it will be correct, because that's what it's fine-tuned to do.


> But people also constantly make mistakes

I absolutely agree.

As far as I know, the jury is still out on whether we do anything different from them.

> "hallucinating" is error in processing the input causing it to be malformed

For now, "Erroneous" is a concept exclusive to us humans. To a model, a response is not "erroneous". It's just the result of processing an input, and it is indistinguishable from any other.

How do we bridge that gap? No idea. Perhaps it will happen eventually, by sheer quantity. Perhaps the algorithms need to be implemented in a different way. Perhaps we need different kinds of models interacting with each other in order to mimic the kind of response a human would give (including the "I don't know"s). All I know is that we are not there yet.


LLMs don’t know when they are hallucinating! They don’t know when they don’t know something. For example, even when they’re told it’s OK to say “I don’t know”, when they do say it, it’s because “I don’t know” is often a response in that question space. They know facts but they have no reasoning.


Ask GPT-4o to return a percentage score after each answer for how likely it is that the answer is 100% true.

I did that for old regional stories, and the higher the score, the easier I could verify the story or parts of it. It seems to know, at least in some cases.
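A rough sketch of that kind of prompt with the OpenAI Python client (the model name, prompt wording, and score-parsing regex are illustrative assumptions, and the self-reported score is itself just more model output, not a calibrated probability):

    # Ask for an answer plus a self-reported confidence score, then parse the score.
    # Assumes OPENAI_API_KEY is set; model name and prompt are illustrative only.
    import re
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": ("Summarize the legend of the Pied Piper of Hamelin, then on a "
                        "new line write 'Confidence: N%' for how likely the summary is "
                        "to be factually accurate."),
        }],
    )
    answer = resp.choices[0].message.content
    match = re.search(r"Confidence:\s*(\d+)%", answer)
    print(answer)
    print("self-reported score:", match.group(1) if match else "none found")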


I performed several thousand batch LLM queries a few months back and asked for such scores.

My impression was, while the resulting scores weren't entirely random, they were pretty much random when the input was difficult.

(I was mapping supermarket product descriptions to packaging materials - which is sometimes very easy, as sometimes it's obvious or the description includes explicit recycling information, and other times almost impossible, as for products like olive oil, glass and plastic bottles are equally plausible)


I guess that's way more complex because there is no one right answer. (Interesting use case btw)

And if you ask GPT for a specific answer to something it can't actually know for sure, it gets weird fast.

Asking it to guess how much of what it said is based on things it actually 'knows' at least seems like a reasonable question for an LLM, and one with only one right answer.


Isn't "hallucinating" the wrong word entirely?

Hallucinating should be about twisted misinterpretation of the input, e.g. if the text input you gave them was malformed at some level.

What they are doing is trying to give you an answer based on compressed knowledge, which might be true, but might not be, because it's compressed and lossy.

That's because they are trained with a bias toward giving an answer even if it's not 100% clear from the compressed knowledge.

The larger the model, the less compressed the knowledge, and the better it should be able to tell when something is a thing it knows vs something it likely doesn't.


It is, but that ship has sailed. "Confabulation" would be a better word to describe it.


>LLMs don’t know when they are hallucinating!

Gosh, you sound like me at my last Phish concert. My partner was all like "you're tripping hard but you have no idea". In fairness I was, and I thought I was an AI at the time, or was I? Anyway, you can't really tell a sentience "hey, you are hallucinating, stop hallucinating." It's like telling a woman "don't be angry baby", it only makes them angrier; similarly, when you tell a person (or AI) "cut it out with the hallucinatin' ", it usually causes them to trip harder.


This has been disproven by multiple experiments, and also, there is nothing in the transformer architecture that would suggest this. On the contrary, even the most basic classifier neural networks by default produce not a binary answer but an answer with a confidence level.
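As a toy illustration of that last point: even a minimal classifier's raw scores pass through a softmax and come out as a confidence distribution over classes rather than a bare yes/no (the numbers here are made up):

    # Softmax turns raw class scores (logits) into a probability distribution,
    # so every prediction comes with confidence values attached.
    import math

    def softmax(logits):
        exps = [math.exp(x) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    probs = softmax([2.1, 0.3, -1.0])     # toy scores for three classes
    print([round(p, 3) for p in probs])   # ~[0.826, 0.136, 0.037]
    print("confidence of top class:", round(max(probs), 3))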


And if that confidence level is high enough for it to output an answer, but that answer is wrong or nonsensical, what do we call it?


In humans? Overconfidence.


"LLMs have no idea when they hallucinate" ≠ "LLMs make no mistakes".

The former statement is a bold claim that would require some sort of evidence. The latter nobody really argues with.


"Hallucinate" is just a term we've come up with for LLMs making mistakes. If the LLM knew it was making a mistake, wouldn't it just not make it?

Of course, it's not possible for an LLM to actually "know" anything in that sense of the word.


Can they not detect if a word has a high or low probability of resulting from the input?

And if the probability is low, couldn't they treat that as hallucinating?


No. N/A

They can't detect if a word is high or low probability because current LLMs can't detect anything. They don't reason as you or I understand it. They are really fancy word predictors, and somewhere in the word-prediction machinery the probabilities are indeed stored, but not in any way accessible to the output.

Imagine you are typing on your phone and the word autocomplete comes up and you just keep clicking the next word it suggests. What could you type to get at the inference data in your phone's keyboard?

Of course a system that does this could be built, but that isn't how current LLMs or phone keyboards are built.
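(One caveat to "not in any way accessible": while the model's own output stream never sees them, some inference APIs do expose the per-token probabilities to the caller. A sketch with the OpenAI Python client, assuming the chat-completions logprobs option and its current field names:)

    # Per-token log-probabilities as seen by the caller, not by the model itself.
    # Field names assume the current OpenAI chat-completions response shape.
    import math
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is the capital of Australia?"}],
        logprobs=True,
    )
    for tok in resp.choices[0].logprobs.content:
        print(f"{tok.token!r}: p={math.exp(tok.logprob):.3f}")   # probability of each emitted token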


What? Of course it is accessible.

When you ask an LLM to estimate whether a given answer was hallucinated, it converts internal probabilities into tokens that represent those probabilities.

Here is an output of ChatGPT where it estimates its probabilities of hallucination, which I would argue are quite close to correct:

https://chatgpt.com/share/e/e10cbc17-cc35-432f-872b-cb061700...


This is likely a hallucination. You are committing a type error or perhaps succumbing to circular reasoning.

No modern LLM can query, let alone understand, the values in its own weights. They are simply inputs for the vector math. If they happen to be correct, that is either a coincidence or because it was trained with data including conversations about its past abilities.

Consider this: do you think the internal weights are labeled and map directly to some attribute of the world outside? If so, then present evidence. They are not, so if it does attempt to query its own internal weights, how does it know what to query? This could be an interesting academic problem, and I would encourage you to present papers on it, because last I looked this was intractable as of 2023 even for the much simpler backpropagation networks of 20 years ago.

Also, using ChatGPT as a reference here shows a major deficiency in your judgement. At best this is circular reasoning, but more likely you simply don't know how to allocate trust, or how others do. If the argument is that X is unreliable, then citing X is obviously not going to convince people who distrust X. Additionally, experts


To expand on this, consider how many brains can directly query the amount of a specific neurotransmitter used between two neurons and interpret what that means. That isn't exactly the same thing, but it's a hopefully illustrative analogy. An LLM's output is an emergent property of its structure and all the weights being multiplied together, just as an organism's intelligence is an emergent property of its brain structure and the communication between its neurons. Neither of the emergent parts can currently reliably query the substrate they emerge from, for similar reasons.

That isn't to say that such an LLM or biological brain couldn't be built, but none presently are.


The "heat" value lets you control the algorithm to choose greater or lower probability outputs. Problem is, once it's already in a good range, it's not very predictable for increasing the quality of output. Often the hallucinatory results don't actually have a low probability, in which case the same hallucination will come out in multiple attempts. However, it might be interesting to try to make the model directly aware of these numbers by putting it into a post-message evaluation prompt, because currently the token randomization algorithm sits entirely behind the scenes.


I was very into the occult as a teenager, thanks to my mother's eclectic attitudes: although theoretically Catholic, she also had Orthodox icons, Hindu statues, runic tarot kits, "magic" crystals, dowsing rods, homeopathic and Bach flower "remedies", books about leylines.

I'm as sure as I can be that she didn't know she was wrong; I can definitely tell you that I didn't know that I was.


But they can use tools such as databases and search engines, and I guess also logic engines.


I don't think there's any fundamental reason why an LLM shouldn't be able to learn the distinction between being creative and being factual, at least much better than currently. It's just another classification problem (if a challenging one), and classification is what NNs are known for.


From what I see, there may be two issues here.

1) An LLM has data from a large set of sources, but in some cases it has very little data on certain sub-topics. With so few sources of data, it is easier for some weird corner of the internet to have conflicting data, and some LLM models may randomly favour that instead of the valid answer.

So not hallucination in this case; instead, a wrong answer from wrong sources.

2) An LLM doesn't know, but has seen from human behaviour that just "making shit up when you don't know" is entirely acceptable to many humans. Here I'm not referring to people saying (as I am) "maybe", but instead "No, it's this!" with immense surety, in order to win discussions. Combine this with the 'temperature' settings many model interfaces have, and there you go.

2b) Alternatively, #2 is in line with "I believe in flying spaghetti monsters" and other beliefs such as astrology, ghosts, and endless other things where "there is zero proof, but humans have immense weird data on it".

So #2 is solvable by changing temperature, or keeping temperature but saying "Don't make things up". #1 may be temperature + a random choice of a lower-probability pathway.

#2b, I'm not sure what to think here. How will LLMs ingest data and use it to determine what is acceptable logic processing, when a lot of the stuff in category 2b has zero proof behind it, and yet, with zero proof, people are absolutely positive about the status of said thing?

#2b trains LLMs that "just having a random idea and then even killing people that don't agree seems OK for humans" (think religious wars, violent acts, burning witches, etc.).


LLMs seem to be conversational or interactive in non-training phases because they can generate flows of text that mimic conversations, complete with personas, characters, and personalities. However, this apparent conversation is merely a generative artifact created by the user's input.

LLMs do not truly learn or understand; instead, they use their internal data repository to produce "hallucinations" of text that seem conversational.

While LLMs do not truly comprehend or understand, their "hallucinations" can be remarkably useful and even enable them to accomplish complex tasks, such as solving mazes or translating languages.


That's using many words to say very little. It reads like a copypasta. Honestly, anyone using the words "truly understand" (or "truly" anything, really) in any AI conversation should be automatically banned from said discourse. That word combination should be tabooed[1] because it's semantically empty and impossible to define without ending up going in circles.

In any case, I fail to see the relevance to my post. I only a) said that NNs can learn categories (an obvious fact) and b) conjectured that the categories "factual" and "creative" may be learnable.

[1] https://www.lesswrong.com/posts/WBdvyyHLdxZSAMmoz/taboo-your...


> anyone using the words "truly understand" (or "truly" *anything, really) in any AI conversation should be automatically banned from said discourse

What if a psychologist said the same thing about a human? Should they be banned too?


Sharlin's response feels knee-jerk. And they likely don't truly understand that when conversing with an LLM they are not talking to an LLM but participating in a generative text exchange.

LLMs can't ever understand anything any more than a book can, as they are static things. Until we move beyond transformers, we're only ever collaborating in the hallucinations.


What does "truly" mean and how does human understanding differ to make it "truly"?


Truly means real. Assuming you are a human, you can truly relate and connect ideas with your experiences. You can know what a rose is in relation to having seen and smelled it, or, failing that, from having experienced flowers and rose-like things. Without qualia you cannot know and understand a rose.

An LLM, no matter how big, can never have experienced any of this. The closest it can ever come is to have been trained on the tokens in texts relating to roses. If it's more than an LLM, it might also have been shown images or even video of roses, but no LLM alone has ever seen a rose.

LLMs can only contain understanding the way a book can. There is no sentience in a book. Even if we jump to the old analogy of the Chinese room, the instruction book is not conscious, nor is the reader following the instructions able to actually speak Chinese. It's somewhere beyond and between the two that the entity that can converse in Chinese lies.


So it's about a lack of visual or smell input?

Because GPT-4o can already tell what is in images, so it is able to recognize a rose, and of course it has seen many images of roses.

It doesn't have the data for the odor, but that seems like an arbitrary thing to demand it be trained on in order for it to have a representation of the odor.

Would you consider GPT-4o to truly understand what a rose is? And what if it were also trained on odor data?


They very clearly do have at least some level of understanding. Emergence, and all that.


Source of your disbelief?


Call it Bayesian intuition. I’d argue that at this point the specific claim "LLMs cannot solve the confabulation problem" is the one that requires hard evidence. These things forced almost all of us to update our priors on what glorified Markov chains are – and are not – capable of if you throw enough data and compute at them.


Just calling it "just another classification problem" doesn't mean there is any fundamental reason to assume it's a solvable one (NNs are known to handle some classification problems, not arbitrary ones).


What results are there that show that deep NNs fundamentally cannot learn to tightly approximate some function or class of functions? (Honest question.) Outside pathological nowhere-differentiable cases, of course.


I'm not sure it makes sense to talk in such absolutes (requiring a proof of impossibility) in such a fluid category - after all, you could include the imaginary perfect replica of the human brain in the NN category, and obviously human brains can do this task.

But then again, this categorization provides no information to make any assessment of whether that is in any way achievable.


Sure. Not in the sense of "here’s a proof that a perceptron cannot learn XOR", but at least something that would hint that there are qualitative gaps in the expressive power of DLNNs, or even LLMs specifically.


> They don’t know when they don’t know something.

Dunning-Kruger effect


> LLMs don’t know when they are hallucinating

What makes you think that? The fact that you can ask them to stop hallucinating and that this does reduce hallucinations seems to me like a strong indication of the opposite? That is what the OP's article is about.

> They don’t know when they don’t know something.

The bigger models tend to be able to articulate some of their limits? See e.g. this recent article [0].

> They know facts but they have no reasoning.

Is that really true? They can solve many novel tasks that seem to require some level of reasoning. They do not possess the full range of reasoning capabilities most humans have, but do pretty well on the HellaSwag dataset which aims to test reasoning [1], even when test set leakage was not a big problem yet [2].

I have 10+ years of experience in machine learning, and I don't find your claims obvious at all. In fact, I see more and more indications of the opposite. Also if you look mechanistically at how LLMs work inside, there is nothing stopping self-knowledge, reasoning or hallucination-detection. LLMs are very black box.

[0] https://arxiv.org/abs/2305.18153

[1] https://deepgram.com/learn/hellaswag-llm-benchmark-guide

[2] https://paperswithcode.com/sota/sentence-completion-on-hella...


> Indeed, we can probe model inner layers and infer if it is "lying"[1]

This is hyperbole to the point where it is more false than true. If it was true, OpenAI, Anthropic and Google would already have solved the problem of making models honest. Just adding a reference to some random arxiv paper doesn't automatically justify your bold assertion.


That would be a huge result if it works. From the paper:

"Our approach is to train a classifier that outputs the probability that a statement is truthful, based on the hidden layer activations of the LLM as it reads or generates the statement. Experiments demonstrate that given a set of test sentences, of which half are true and half false, our trained classifier achieves an average of 71% to 83% accuracy labeling which sentences are true versus false, depending on the LLM base model."[1]

It's not entirely clear what they measured and whether what they measured is useful. Is this a major result?

[1] https://arxiv.org/pdf/2304.13734
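If I read it right, the setup is roughly a probe: collect hidden-layer activations for statements with known true/false labels and fit a classifier on top. A minimal sketch with scikit-learn, using a logistic-regression probe and random placeholder activations in place of real LLM hidden states (the paper trains its own classifier):

    # Sketch of a truthfulness probe on hidden activations. Real experiments would
    # replace the random vectors with an LLM's hidden states for each statement.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n, dim = 1000, 64
    activations = rng.normal(size=(n, dim))         # placeholder hidden states
    labels = (activations[:, 0] > 0).astype(int)    # placeholder true/false labels

    X_tr, X_te, y_tr, y_te = train_test_split(activations, labels, test_size=0.2)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("probe accuracy:", probe.score(X_te, y_te))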


Looks like they tested against statements that were obviously false and written by humans; not hallucinated statements created by the model that happened to be false. So, no, it doesn’t seem very interesting at all to me.


That's what I thought I read, too. Not too useful.

We really need LLMs with a confidence metric. This doesn't seem to be it.


The projects themselves specified that lying is okay: output at all costs, at all parameters, since interpolating is a feature, not a bug. It could just error out when certainty is lost, but then you would be left standing there with nothing most of the time.


Interpolating is an awesome feature, very practical: summarization, style transfer.

Instruction following is also useful and unlocks another tier of features. The fuzziness of the model behaviour plays a useful role because natural language is imprecise. The ability to interpolate this remains useful.

Once machines understand natural language our own psychology kicks in and assigns properties like "wisdom" and "knowledge" to agents that can "talk well".

This may lead us to expect something from tools that haven't been designed to perform a given task. For example our current LLMs are not very good at precise information retrieval because they haven't been designed for that.

They happen to be able to perform information retrieval as a side product of the mechanism that teaches them how to speak in the first place.

The raw output of Broca's area in our brains is probably not something you want to connect directly to the mouth without mediation.


We need training exercises that teach us how to think like an LLM: less like a human, more like a free-association machine, dreaming a specific association flow starting from an input. https://en.wikipedia.org/wiki/Ulysses_(novel) but with a feeling of dread as the interpolations pile upon one another... you can feel the thinness of the construction, as you are several layers out over ground truth on a thin tightrope.


I know how you feel, I also often tell my programs not to have bugs.


Works to a varying degree, depending on how assertive you are as a developer!


#!/usr/bin/env python3

# no bugs pls


Very soon this will be a reality.


I just tested ChatGPT 4o and Claude 3.5 Sonnet using the following prompt, with and without the last sentence:

“Write a one-paragraph summary of research since 2015 on the use of machine translation in second-language academic writing. Include five in-text citations to previous research, and include full reference information for each citation after the paragraph. Do not hallucinate any of the citations and references.”

When the last sentence was not included, four out of five of ChatGPT’s references were hallucinated; the fifth referred to an existing paper but had an incorrect year. All of Claude’s references were hallucinated.

When I included the last sentence, ChatGPT did a web search and came up with five genuine citations. Claude, which cannot do web searches, apologized that it could not write the paragraph or provide the references.

Maybe a better test of the effectiveness of "Do not hallucinate" would be prompts that might lead to hallucinated claims that the LLM cannot check using web searches, the execution of Python programs, or other external processes. Any ideas?
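One way to scale this comparison up would be a small harness along these lines, using the OpenAI Python client (the model name and prompt wording are my assumptions); the returned references still have to be checked against a real bibliographic source, by hand or via an external lookup:

    # Run the same citation-heavy prompt with and without the extra sentence,
    # then compare how many references in each output can be verified externally.
    from openai import OpenAI

    client = OpenAI()
    BASE = ("Write a one-paragraph summary of research since 2015 on machine translation "
            "in second-language academic writing. Include five in-text citations and full "
            "references after the paragraph.")

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o", messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content

    baseline = ask(BASE)
    treated = ask(BASE + " Do not hallucinate any of the citations and references.")
    # Scoring the two conditions still requires checking each reference against a
    # bibliographic database (or by hand).
    print(baseline, "\n---\n", treated)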


Empirical testing? But where's the necessity? It seems an uncommonly roundabout and hopelessly rigamarolish method of getting anywhere. We've got the works of all the old masters — the great AI researchers of the past. We weigh them against each other, balance the disagreements, analyze the conflicting statements, decide which is probably correct, and come to a conclusion. That is the scientific method.


I was led astray by an interesting discussion among Anthropic employees about prompt engineering with Claude. They seem to do a lot of empirical testing of prompts:

https://www.youtube.com/watch?v=T9aRN5JkmL8


Why can't they not hallucinate by default?


We have another problem then.

2023: LLM hallucinate

2024: hallucinations fixed with "do not hallucinate" prompt

2025: LLMs consider their findings on the web trustworthy


Depends on the network; that sort of self-evaluation certainly requires recurrence.

But I don't think it's impossible: there's a certain psychedelic quality to hallucinations; they tend to over-represent certain fundamental features of the thing they believe they're representing.

I remember seeing a bunch of almost kaleidoscopic-looking images (as far as I remember, all were rotationally symmetrical) that had been generated by seeking out and exploring the latent spaces for given classifications.

The images that were generated this way always had more (apparently) in common with each other than genuine images of the type that they were classified as.

And I highly doubt that the characteristics of those images are so special that a bit of self-evaluation could not detect them as a group.


AFAIK LLMs at their heart employ supervised training: given certain input-output pairs in the dataset, the model learns to recreate the given outputs from the inputs.

For all the inputs it hasn't seen, its output is essentially undefined; the hope is that when the distribution of novel inputs matches the training distribution, it will output truthful info. When it doesn't, it hallucinates.

Since an LLM cannot distinguish between the predicted distribution and the real one, it cannot be told not to hallucinate in a single pass.

What actually could work is taking the output of a given LLM and feeding it into another one to check for divergences, i.e. for hallucinations.
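A minimal sketch of that two-pass idea with the OpenAI Python client (model names and prompts are assumptions, and the second pass is of course fallible in the same ways as the first):

    # Draft an answer with one model, then ask a second pass to flag unsupported claims.
    from openai import OpenAI

    client = OpenAI()

    def complete(model: str, prompt: str) -> str:
        resp = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content

    question = "Who won the 1962 Fields Medal?"
    draft = complete("gpt-4o", question)
    critique = complete(
        "gpt-4o-mini",
        f"Question: {question}\n\nProposed answer: {draft}\n\n"
        "List any claims in the answer that look unsupported or likely false.",
    )
    print(draft, "\n---\n", critique)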


When LLMs hallucinate and I point it out they usually get it right the second go.

I guess if you could get it to use its own output as input that would be similar, but I think that it’s the additional information in the follow up that makes the difference.

Like it can write code, I run the code, there is an error, I paste the error and it corrects the code.

So WTF couldn’t it just produce the correct code the first time? Probably for the same reason I can’t.

The additional information of the error causes me to re-evaluate the code in a different way than when I first wrote it.

As such I don’t think you can just tell it “do better the first time” any more than you can tell a human to “do better the first time”.

We and they both get additional information from failing.


>So WTF couldn’t it just produce the correct code the first time? Probably for the same reason I can’t.

We do at least know the gist of how they work, and it isn't the way you work; you can also see, when they fail, that they don't fail for the same reasons.

I've seen GPT-3.5 fail to answer some questions correctly and subsequently (in another tab) get it right or choose not to answer if you just add the line "don't hallucinate".

I like the article's idea for why that is

>Presumably, "retrieving from memory" and "improvising an answer" are two different model behaviors, which use different internal mechanisms. Indeed, we can probe model inner layers and infer if it is "lying"1 or if "the question is unanswerable"2. These are very much related to "hallucinations".


>On the other hand, maybe always trying to avoid hallucinations has some other undesired consequences, which model trainers and product managers would like to avoid. (Actually, if you are in a position to know of such undesired consequences, and are free to tell the world about them, I will be really curious to learn more!)

I'm not in that position, but I'd wager that attempts to limit hallucinations in the model itself would mess with creative writing exercises.

It would be the same as asking it to write a short story for kids about a lion that loves green hats and then finishing with "Don't hallucinate".


Honestly this reminds me a lot of the rituals in Warhammer 40k and the way to deal with "machine spirits".

Absurd rituals, performed on machines they don't understand, that shouldn't work but for some reason do.


A combination of Plato's allegory of the cave and statistical whack-a-mole:

The model only sees token correlations, and if you are lucky the things you type become shadow tokens that change the weights just enough that whatever you didn't want isn't as statistically likely to come out.


Oh my god I don't know why I never made that connection until now. That is exactly what it feels like!


Same with SD prompts and especially LoRA training. The best thing you can do is to treat any piece of advice as clueless babbling at best and any result as a temporary, niche result at best.

Too much depends on randomness and accidental sources of randomness, and within it there are whole pockets of multidimensional stability, right up until you cross some invisible border you couldn't even comprehend. This is natural fuel for religion.

So when you read any tutorial, guide, best practice, etc., and there is no hard proof of hours of thorough testing behind every claim, you may freely consider most claims religious, because they are.


I even remember a prompt for a ChatGPT app that had a spelling mistake, and some people were wondering whether the misspelling performed better than the correct spelling.


I recommend this excellent presentation:

https://www.youtube.com/watch?v=yBL7J0kgldU&t=2819s

and they experimentally showed that LLMs often do know when something is not right, but they aren't trained to express it, because they did not get enough examples of faulty reasoning and, most importantly, of how to recover from it. Adding such examples to the training set vastly improved the output.


Since we are posting short gists about the absurdities of prompting, I must share my own gist about how garbage prompting is today for LLMs:

https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...


But do LLMs work like this? I don't understand your code well enough to answer myself, but when I change the last word in the "notebook" tab of tgwebui (a free, chatless generation mode almost everyone is unaware of), generation picks it up much quicker than if I change a word somewhere in the middle. This hints that alternation will cost much more in inference time, unless I'm missing something in your code. For some reason it doesn't happen in SD; I believe because each "step" is a separate process that is affected by a restart anyway.

That said I agree that LLM “prompt engineering” is much less actual engineering than in SD and is more “prompt writing”. Advanced SD feels like a fighter jet cockpit compared to LLM’s city car dashboard.


As it turns out, I wrote a whole extension for txgui that implemented some of what I wanted...

https://github.com/Hellisotherpeople/llm_steer-oobabooga


I made this observation a slightly different way in a previous thread:

https://news.ycombinator.com/item?id=41370495

https://news.ycombinator.com/item?id=41372016


This feels like something one should be able to empirically test.

Also I wonder about the word. It’s a fairly recent one to describe this phenomenon. Perhaps a more verbose version works better? Or adding a definition of it to the prompt?


I want to know:

Is shouting at a model absurd?

Does it really help to use all uppercase and strong language or is it just wishful thinking - inappropriate anthropomorphization even?


Like anything else you might do with balky, complex equipment - try it, and see if you get noticeably better results.


I suspect telling a model to "not hallucinate" is about as effective as telling a dog not to bark.


Machines don't hallucinate, humans do. Just stop pretending LLMs have any intelligence.


They have some, though (even if arrived at through different means), and I know people so algorithmic and clueless about the real world that an LLM could replace them, ignoring the biological and mechanical limitations of the day. In fact, every fifth human is below the bar we would consider intelligent. It's not something we speak of often due to social standards, but let's not forget that our bar for LLM intelligence is higher than the average "low-water mark" for humans.


Can you give the response back and ask the LLM to find the sentence which is most likely false?


Hot-take: of course it could/should work. We write about what LLM hallucination is, why it's bad, and examples of it. An LLM of course consumes this and can direct its responses based on prompts that mention it. It's not much different than prefacing with "respond as an expert in the field of X" that categorizes different kinds of responses as desirable or not based on some dimension.

What would be dumb is asking an LLM not to hallucinate when the only information available is a definition of it in describing how an LLM works. We're not quite there yet.


Yes, as it is absurd to call a generative statistical model intelligent.


Isn't people's information processing statistical in nature? E.g. everything we know has to be compressed down heavily, so we can never be totally certain of what we know, yet we presume to be. And the compression can be thought of as a statistical graph in nature.


We do not know how our brain works.


How else could information be compressed in the human brain besides a similar statistical-graph structure, with networks of relationships being created to represent various different patterns and models of the world?


In some other way which we don't know about.

Just because someone can't imagine the universe to be structured in any other way than being built from microscopic Platonic solids, tightly packed and touching each other and filling all of the universe without gaps or empty space, doesn't mean that this is how the world actually operates.


If you don't know, how do you know that this is not it, that the understanding is different, and that an LLM doesn't understand while we somehow do?


According to this logic there are also intelligent aliens living beside us.


You can also tell LLMs not to tokenise, or not to convert text to embeddings to guess the next token.

/s



