It just seems odd to me that it's not given an incentive to communicate this.

Surely humans using it would find great value in knowing the model's confidence, or whether it thinks it's confabulating.

These services are created to give the best product to users, and so wouldn't this be a better product? Therefore there is incentive. Happier users and a product that is better than competitors.




Go read through any mass of training data and count how often "I don't know" appears. It's going to be very small. Internet fora are probably the worst because people who are aware that they don't know usually refrain from posting.
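
If you want to actually eyeball that, a crude count makes the point. A rough sketch in Python; the corpus file name and the hedge phrases here are my own made-up assumptions, not anything specific:

    # Crude, hypothetical sketch: how often a text dump hedges with "I don't know".
    # The file name and phrase list are assumptions for illustration only.
    import re

    HEDGES = [r"\bi don'?t know\b", r"\bnot sure\b", r"\bno idea\b"]

    def hedge_rate(path):
        text = open(path, encoding="utf-8", errors="ignore").read().lower()
        hits = sum(len(re.findall(p, text)) for p in HEDGES)
        sentences = max(1, len(re.findall(r"[.!?]", text)))
        return hits / sentences  # hedges per sentence, a very rough proxy

    print(hedge_rate("forum_dump.txt"))  # expect a number close to zero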


>These services are created to give the best product to users, and so wouldn't this be a better product? Therefore there is incentive. Happier users and a product that is better than competitors.

Why would the computation care about any of that? I'm talking about incentive for the model.


Incentive for the model is to survive RLHF feedback from contract workers who are paid to review LLM output all day. They're paid for quantity, not quality. Therefore, the optimum strategy is to hallucinate some convincing lies.
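
Roughly the mechanism, for anyone who hasn't looked at how RLHF works: a reward model is fit on pairwise rater preferences (Bradley-Terry style), and the policy is then tuned to maximize that reward. A toy sketch, with invented numbers, of what falls out when raters skimming for throughput prefer confident-sounding answers over hedged ones:

    # Toy Bradley-Terry sketch. Rewards below are invented purely to illustrate
    # the claim: if raters prefer confident answers, so does the learned reward.
    import math

    def preference_prob(r_chosen, r_rejected):
        # P(chosen beats rejected) = sigmoid(reward difference)
        return 1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))

    reward = {"confident_but_wrong": 1.2, "hedged_but_correct": 0.4}  # made up
    p = preference_prob(reward["confident_but_wrong"], reward["hedged_but_correct"])
    print(f"modelled P(rater picks confident-but-wrong) = {p:.2f}")  # ~0.69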


Why are they paid for quantity, not quality, though?

Sounds like it's a choice by the model creators, then, since they could instruct their testers to reward quality.


How would that work? Quantity is easy to measure. Quality is not.


Doesn’t the model want to make the user happy?

Its responses sure seem like it does.

I’d be happier with its responses if it were honest about when it isn’t confident in its answer.


Go look at the first link I sent. Rewarding for "making users happy" destroys GPT-4's calibration.

Why would "making users happy" incentivize for truth ?


Because getting truthful answers would make users happier?

Seems like common sense to me.

Who’s asking the chatbot questions and not looking for or wanting a truthful answer, most of the time?

If the model understood or captured “human interest” at all in its training, this should be pretty fundamental to its behavior.


Yes, the computer wants you to be happy. Happiness is mandatory. Failure to be happy is treason.


"I'm talking about incentive for the model. "

In Douglas Adams' Hitchhiker's Guide to the Galaxy, this is (somewhat) fixed by giving the AIs emotions...



