It just seems odd to me that it's not given an incentive to communicate this.
Surely humans using it would find great value in knowing the model's confidence or whether it thinks it's confabulating or not.
These services are created to give the best product to users, so wouldn't this be a better product? Therefore there is an incentive: happier users and a product that beats competitors.
Go read through any mass of training data and count how often "I don't know" appears. It's going to be very small. Internet fora are probably the worst because people who are aware that they don't know usually refrain from posting.
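As a rough illustration, a few lines of Python make that count easy to run yourself (the corpus path and phrase list here are hypothetical placeholders, and the sentence splitting is deliberately crude):

    # Rough estimate of how rarely uncertainty is expressed in a plain-text corpus.
    # "corpus.txt" and the phrase list are placeholders, not a real dataset.
    import re

    PHRASES = ["i don't know", "i do not know", "i'm not sure"]

    def uncertainty_rate(path):
        total_sentences = 0
        hits = 0
        with open(path, encoding="utf-8") as f:
            for line in f:
                # crude sentence split; good enough for an order-of-magnitude estimate
                for sent in re.split(r"[.!?]+", line.lower()):
                    if not sent.strip():
                        continue
                    total_sentences += 1
                    if any(p in sent for p in PHRASES):
                        hits += 1
        return hits, total_sentences

    hits, total = uncertainty_rate("corpus.txt")
    print(f"{hits} of {total} sentences ({hits / max(total, 1):.4%}) express uncertainty")

On most scraped web text that percentage comes out tiny, which is the point: the model has very few examples of what saying "I don't know" even looks like.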
> These services are created to give the best product to users, so wouldn't this be a better product? Therefore there is an incentive: happier users and a product that beats competitors.
Why would the computation care about any of that? I'm talking about the incentive for the model itself.
The incentive for the model is to survive RLHF feedback from contract workers who are paid to review LLM output all day. They're paid for quantity, not quality. Therefore, the optimum strategy is to hallucinate some convincing lies.
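A toy sketch of the standard pairwise preference loss used in reward modeling shows how that plays out (this is not any vendor's actual pipeline, and the scores are made up): if rushed raters tend to prefer the fluent guess over an honest "I don't know", minimizing the loss means learning to score confidence above honesty, and the policy trained against that reward follows.

    # Toy sketch: Bradley-Terry pairwise preference loss, -log sigmoid(r_chosen - r_rejected).
    # Scores below are hypothetical reward-model outputs, not real data.
    import math

    def preference_loss(reward_chosen, reward_rejected):
        return -math.log(1 / (1 + math.exp(-(reward_chosen - reward_rejected))))

    confident_guess = 2.1   # fluent, plausible-sounding, possibly wrong
    honest_idk      = 0.3   # accurate but unsatisfying to a rater paid for throughput

    # If raters label the confident guess as preferred, the loss is already small
    # when it scores higher, so training keeps pushing in that direction:
    print(preference_loss(confident_guess, honest_idk))   # ~0.15
    # The reverse ordering is heavily penalized, so honest hedging gets scored down:
    print(preference_loss(honest_idk, confident_guess))   # ~1.95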