I think anything requiring strong reasoning will probably have issues. However, I think most enterprises are only interested in knowing that the summary of a document doesn't contain hallucinations, which I think most models will probably get right. If you go by a supermajority rule and use 5 models, I think most businesses will be satisfied that the summary they were given doesn't contain hallucinations.
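
To make the supermajority idea concrete, here is a rough sketch of how the vote could be tallied. The function name, the "4 of 5" cutoff, and the three labels are my own assumptions for illustration. Each vote is taken to be a yes/no answer from one model on whether the summary is free of hallucinations.

```python
def supermajority_verdict(votes, required=4):
    """Combine per-model verdicts on a summary.

    votes: list of booleans, True meaning "this model found no hallucinations".
    required: how many models must agree (assumed 4 of 5 here; a hypothetical
    cutoff, not something fixed by any standard).
    """
    clean = sum(1 for v in votes if v)
    flagged = len(votes) - clean
    if clean >= required:
        return "clean"
    if flagged >= required:
        return "hallucinated"
    # No supermajority either way: escalate to a human or re-run.
    return "undecided"


# Hypothetical verdicts from five different models.
print(supermajority_verdict([True, True, True, True, False]))
print(supermajority_verdict([True, True, True, False, False]))
```

A 3-to-2 split falls into "undecided" on purpose, since a bare majority is exactly the case where I wouldn't trust the result.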

However, like you said, we are dealing with a non-deterministic system, so the best we can hope for is a statistically likely answer.
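
To put a number on "statistically likely", here is a back-of-the-envelope calculation. It assumes each of the 5 models judges a summary correctly with the same probability p and that their errors are independent. Both are strong assumptions (models trained on similar data probably make correlated mistakes), and the value 0.9 is made up for illustration, not a measured accuracy.

```python
from math import comb


def prob_at_least(n, k, p):
    """Probability that at least k of n independent trials succeed,
    each with success probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_correct = 0.9  # hypothetical per-model accuracy

# Chance that at least 4 of 5 models are right (a correct supermajority).
print(round(prob_at_least(5, 4, p_correct), 5))      # 0.91854

# Chance that at least 4 of 5 models are wrong (a wrong supermajority).
print(round(prob_at_least(5, 4, 1 - p_correct), 5))  # 0.00046
```

Under those assumptions a confidently wrong supermajority is rare, and the remaining cases are split votes with no supermajority at all. If the models' errors are correlated, the real numbers will be worse than this.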