Stupid question: can an LLM (i.e. a neural network) tell me the probability of the answer it just spewed? i.e. turn it into fuzzy logic? Aaand, can it tell me how much it believes itself? i.e. what's the probability that the above probability is correct? i.e. confidence, i.e. intuitionistically fuzzy logic?
A long time ago at uni we studied these things for a while, and even made a Prolog interpreter with both F+IF (probability + confidence) coefficients for each and every term.
Not out of the box, I think; I wouldn't trust any self-assessment like that. With enough compute, you could probably come up with a metric by doing a beam search and using an LLM to evaluate how many of the resultant answers were effectively the same, as a proxy for "confidence".
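Roughly what I have in mind, as a sketch; `ask_llm` and `same_answer` are placeholders for whatever model/API and LLM-as-judge you'd actually wire up (and I'm sampling at temperature rather than doing a real beam search):

```python
# Sketch: sample the same question N times at non-zero temperature, cluster
# answers an LLM judge says are "effectively the same", and report the size
# of the biggest cluster as a confidence proxy.
# `ask_llm` and `same_answer` are hypothetical callables, not a real library.

def self_consistency_confidence(question, ask_llm, same_answer, n=10, temperature=0.8):
    answers = [ask_llm(question, temperature=temperature) for _ in range(n)]

    # Greedy clustering of equivalent answers.
    clusters = []                                # each cluster holds equivalent answers
    for a in answers:
        for cluster in clusters:
            if same_answer(cluster[0], a):       # LLM-as-judge equivalence check
                cluster.append(a)
                break
        else:
            clusters.append([a])

    biggest = max(clusters, key=len)
    return biggest[0], len(biggest) / n          # (representative answer, agreement fraction)
```

An agreement fraction of 0.9 then just reads as "nine out of ten samples landed on the same answer".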
Similar to bootstrapping a random variable in statistics. Your N estimates (each estimate is derived from a subset of the sample data) give you an estimate of the distribution of the random variable. If the variance of that distribution is small (relative to the magnitude of the point estimate) then you have high confidence that your point estimate is close to the true value.
Likewise in your metric, if all answers are the same despite perturbations then it's more likely to be ... true?
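The plain-statistics version, just to make the analogy concrete (numpy only, made-up data):

```python
# Bootstrap sketch: resample the data with replacement, recompute the estimate
# each time, and use the spread of those estimates as a confidence measure.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=200)   # made-up sample

point_estimate = data.mean()

boot_estimates = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(1000)
])

spread = boot_estimates.std()
print(f"estimate = {point_estimate:.2f} +/- {spread:.2f}")
# Small spread relative to the estimate -> high confidence it is close to the
# true value; the LLM analogue replaces "resample the data" with "resample the
# answer" and the std with "how often the answers disagree".
```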
I'd really like to see a plot of your metric versus the SimpleQA hallucination benchmark that OpenAI uses.
The way I understand it, an LLM response is a chain of tokens where each is the most probable next token. Maybe there are more complicated candidate-selection approaches than that, but "biggest number wins" works for me. For the sake of simplicity, let's just say tokens are words. You'd have access to the probability of each word in the ordering of the sentence, but I'm not sure how that would then be used to evaluate the probability of the sentence itself, or its truthiness.
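For what it's worth, the per-token numbers are easy to pull out of an open-weights model; here's a rough sketch with HuggingFace transformers and gpt2. Summing the per-token log-probs gives the probability of that exact token sequence, which still says nothing about truthiness:

```python
# Sketch: greedy-decode a few tokens and read off the probability of each
# chosen token, then combine them into a log-probability for the whole
# continuation. This is P(this exact token sequence), not P(the claim is true).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"                       # any causal LM works; gpt2 is just small
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tok("The capital of France is", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=5,
    do_sample=False,                # greedy: "biggest number" at every step
    return_dict_in_generate=True,
    output_scores=True,
    pad_token_id=tok.eos_token_id,
)

new_tokens = out.sequences[0, inputs["input_ids"].shape[1]:]
logprobs = []
for step_logits, token_id in zip(out.scores, new_tokens):
    step_logprobs = torch.log_softmax(step_logits[0], dim=-1)
    logprobs.append(step_logprobs[token_id].item())

print(tok.decode(new_tokens))
print("per-token probs:", [round(math.exp(lp), 3) for lp in logprobs])
print("sequence logprob:", sum(logprobs))   # product of the probs, in log space
```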
You can say "give me a % chance you think this thing will happen and why" and it will spit out a lot of context behind it's thinking, I'm not a math guy and I'm aware "probability" has some more complex math stuff in it, but just from a "why do you believe this so strongly?" perspective, I've personally found it's able to have me agree or disagree fairly. You can then give it additional context you know about, and it will refine it's estimation. Basically I've started treating them like context connection systems, just to look to see if dots even possibly could connect before I do the connecting myself.
Suitably modified, they can. Bayesian neural networks provide uncertainty quantification. The challenge is calibrating the predictions, and deciding whether the model capacity devoted to uncertainty quantification would not be better spent on a bigger, uncertain model.
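A minimal flavour of what that looks like, using MC dropout as the cheap stand-in for a proper Bayesian net (it's the commonly used approximation, not the full treatment): keep dropout on at inference, run the same input through many times, and treat the spread of the predictions as uncertainty.

```python
# MC-dropout sketch: dropout stays active at inference time, so each forward
# pass samples a slightly different sub-network; the spread across passes is
# a (rough) predictive uncertainty.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 1),
)

x = torch.randn(1, 10)           # one made-up input

model.train()                    # keep dropout active (the MC in "MC dropout")
with torch.no_grad():
    preds = torch.stack([model(x) for _ in range(100)])

mean = preds.mean(dim=0)
std = preds.std(dim=0)           # larger std = the model is less sure here
print(f"prediction {mean.item():.3f} +/- {std.item():.3f}")
```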
According to the following paper, it's possible to get calibrated confidence scores by directly asking the LLM to verbalize a confidence level, but it strongly depends on how you prompt it to do so:
Maybe a stupid answer, but I've read a few older papers that used ensembles to identify when a prediction is out of distribution. Not sure what the SotA approach is, though.
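Something in the spirit of deep ensembles: train several copies on different seeds / data subsets, then flag inputs where the members disagree. A toy sklearn version of that idea (no claim it's SotA):

```python
# Toy ensemble-disagreement sketch: train several classifiers on bootstrap
# subsets and use the entropy of their averaged predictions as a
# low-confidence / possibly-out-of-distribution signal.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)

ensemble = []
for seed in range(5):
    idx = rng.choice(len(X), size=len(X), replace=True)   # bootstrap subset
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=seed)
    clf.fit(X[idx], y[idx])
    ensemble.append(clf)

def ensemble_uncertainty(x):
    # Average the members' predicted probabilities; higher entropy means the
    # ensemble is collectively less sure about this input.
    probs = np.mean([clf.predict_proba(x.reshape(1, -1)) for clf in ensemble], axis=0)
    return -np.sum(probs * np.log(probs + 1e-12))

print("training point:", ensemble_uncertainty(X[0]))
print("shifted point :", ensemble_uncertainty(X[0] + 50.0))
```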