
I think it's more fundamental than that. If you start saying "it thinks" in regards to an LLM, you're wrong. LLMs don't think, they pattern match fuzzily.

If the training data contained a bunch of answers to questions which were simply "I don't know", you could get an LLM to say "I don't know" but that's still not actually a concept of not knowing. That's just knowing that the answer to your question is "I don't know".

It's essentially like if you had an HTTP server that responded to requests for nonexistent documents with a "200 OK" containing "Not found". It's fundamentally missing the "404 Not found" concept.
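A minimal sketch of that analogy, assuming a toy in-memory document store rather than a real web server (`DOCUMENTS`, `sloppy_server`, `proper_server`, and `client_found_it` are hypothetical names used only for illustration). A client that trusts the status code cannot tell the first server's "Not found" body from a real document:

```python
from http import HTTPStatus

# Hypothetical document store, for illustration only.
DOCUMENTS = {"/index.html": "<h1>Home</h1>"}


def sloppy_server(path):
    """Always answers 200 OK; a missing page is just another body."""
    body = DOCUMENTS.get(path, "Not found")
    return HTTPStatus.OK, body


def proper_server(path):
    """Signals a missing page out-of-band, through the status code."""
    if path in DOCUMENTS:
        return HTTPStatus.OK, DOCUMENTS[path]
    return HTTPStatus.NOT_FOUND, "Not found"


def client_found_it(response):
    """A client that trusts the status code and ignores the body text."""
    status, _body = response
    return status == HTTPStatus.OK


for server in (sloppy_server, proper_server):
    # Same missing path, same body text, different status code.
    print(server.__name__, client_found_it(server("/missing.html")))
```

Both servers return the same body text for the missing path; only the second one marks it as a different kind of response.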

LLMs just have a bunch of words--they don't understand what the words mean. There's no metacognition going on that would let it think "I don't know", or even think that you would want to know that.



>I think it's more fundamental than that. If you start saying "it thinks" in regards to an LLM, you're wrong. LLMs don't think, they pattern match fuzzily.

I'm not sure if this objection is terribly helpful. We use terms like think and want to describe processes that clearly do not involve any form of understanding. Electrons do not have motivations but they 'want' to go to a lower energy level in an atom. You can hold down the trigger for the fridge light to make it 'think' that the door has not been opened. These are uncontentious phrases that convey useful ideas.

I understand that when people are working towards producing reasoning machines the words might be working in similar spaces, but really, when someone is making claims about machines having awareness, understanding, or thinking, they make the context they are talking about quite clear.

As to the rest of your comment, I simply disagree. If you think of a concept as an internal representation of a piece of information, then it has been shown that they do have such representations. In the Karpathy video I mentioned he talks about how researchers found that models did have an internal representation of not knowing, but that the fine tuning was restricting it to providing answers. They gave it fine-tuning examples where it said "I don't know" for information that they knew the model didn't know. This generalised to provide "I don't know" for examples that were not in the training data. For the fine tuning examples to succeed in that, it requires the model to already contain the concept.

I would agree that models do not have any in-depth understanding of what lack of knowledge actually is. On the other hand, I would think that this also applies to humans; most people are not philosophers.

I think the fact that the models can express details about words shows that they do have detailed information about what each word means semantically. In many respects, because tokenisation indexes into embeddings, it would perhaps be more accurate to say that they have a better understanding of the semantic information of what words mean than of what the words actually are. This is why they are poor at spelling but can give you detailed information about the thing they can't spell.


> We use terms like think and want to describe processes that clearly do not involve any form of understanding.

...and that's why so many people are confused about what's going on with LLMs: sloppy, ambiguous use of language.

> In the Karpathy video I mentioned he talks about how researchers found that models did have an internal representation of not knowing, but that the fine tuning was restricting it to providing answers. They gave it fine-tuning examples where it said "I don't know" for information that they knew the model didn't know.

This is why I included the HTTP example: this is simply telling it to parrot the phrase "I don't know"--it doesn't understand that it doesn't know. From the LLM's perspective, it "knows" that the answer is "I don't know". It's returning a 200 OK that says "I don't know" rather than returning a 404.

Do you understand the distinction I'm making here?

> I would agree that models do not have any in-depth understanding of what lack of knowledge actually is. On the other hand, I would think that this also applies to humans; most people are not philosophers.

The average (non-programmer) human, when asked to write a "Hello, world" program, can definitely say they don't know how to program. And unlike the LLM, the human knows that this is different from answering the question. The LLM, in contrast, thinks it is answering the question when it says "I don't know"--it thinks "I don't know" is the correct answer.

Put another way, a human can distinguish between responses to these two questions, whereas an LLM can't:

1. What is my grandmother's maiden name?

2. What is the English translation of the Spanish phrase, "No sé."?

In the first case, you don't know the answer unless you are quite creepy; in the second case you do (or can find out easily). But the LLM tuned to answer "I don't know" thinks it knows the answer in both cases, and thinks the answer is the same.


>...and that's why so many people are confused about what's going on with LLMs: sloppy, ambiguous use of language.

There is a difference between explanation by metaphor and lack of precision. If you think someone is implying something literal when they might be using a metaphor, you can always ask for clarification. I know plenty of people who are utterly precise in their use of language, which leads to them being widely misunderstood, because they think a weak precise signal is received as clearly as a strong imprecise signal. They usually think the failure in communication is in the recipient, but in reality they are just accurately using the wrong protocol.

>Do you understand the distinction I'm making here?

I believe I do, and it is precisely this distinction that the researchers showed. When they taught a model to say "I don't know" for some information that they knew the model did not know, the model learned to respond "I don't know" for other things it did not know, even though it was never explicitly taught to respond that way for them. For it to acquire that ability to generalise to new cases, the model has to have already had an internal representation of "That information is not available".

I'm not sure how you think a model converting its internal representation of not knowing something into words is distinct from a human converting their internal representation of not knowing into words.

When fine tuning directs a model to profess lack of knowledge, they will usually not give the same specific "I don't know" text as the way to express that it does not know, because they want to bind the concept "lack of knowledge" to the concept of "communicate that I do not know" rather than to any particular phrase. Giving it many ways to say "I don't know" builds that binding rather than the crude "if X then emit Y" that you imagine it to be.
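As a rough sketch of what building such fine-tuning data might look like (this is not the researchers' actual pipeline; the `REFUSALS` phrasings, the `build_examples` helper, the prompt/response record format, and the `model_knows` flag are all assumptions made for illustration), questions the model is known not to know get paired with varied phrasings rather than one fixed string:

```python
import random

# Hypothetical refusal phrasings; a real dataset would presumably use many more.
REFUSALS = [
    "I don't know.",
    "I'm not sure about that.",
    "I don't have that information.",
    "That's not something I can answer reliably.",
]


def build_examples(probes, seed=0):
    """Turn probe results into fine-tuning examples.

    probes: list of (question, reference_answer, model_knows) tuples, where
    model_knows is assumed to come from testing the model beforehand.
    """
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    examples = []
    for question, reference, model_knows in probes:
        if model_knows:
            response = reference
        else:
            # Vary the wording so no single phrase is tied to the refusal.
            response = rng.choice(REFUSALS)
        examples.append({"prompt": question, "response": response})
    return examples


# Hypothetical probe results, for illustration only.
probes = [
    ("What is the capital of France?", "Paris", True),
    ("What is my grandmother's maiden name?", None, False),
    ("What did I have for breakfast today?", None, False),
]

for example in build_examples(probes):
    print(example)
```

The sketch only shows the data side; whether the resulting behaviour generalises to unseen questions is the empirical claim under discussion, not something this code demonstrates.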



