> none of this process is needed to answer the actual question which the LLM would know from its training data.
I think this isn't true; even if the model has the answer stored implicitly in its weights, it has no way of "citing its source" or demonstrating that the answer is correct.
This is a great example of something GPT-4 gets confidently wrong, today. I just ran this query:
Prompt: "The year is 894 AD. The capital of France is:"
Response: "In 894 AD, the capital of France was Paris."
This is incorrect. According to Wikipedia, "In the 10th century Paris was a provincial cathedral city of little political or economic significance..."
The problem is that there's no good way to tell from this interaction whether it's true or false, because the mechanism that GPT-4 uses to return an answer is the same whether it's correct or incorrect.
Unless you already know the answer, the only way to be confident that an LLM is answering correctly is to use RAG to find a citation.
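To make that concrete, here's a minimal sketch of the "RAG for citations" idea. It assumes the public Wikipedia search API and the `requests` library, and it stops at building the grounded prompt; the actual model call is omitted since the point is only that the answer arrives with sources attached. The question string and search query are my own illustrative choices, not anything from the thread.

```python
# Minimal retrieval-augmented prompt with citations (sketch, not production code).
import re
import requests

def retrieve_passages(query: str, limit: int = 3) -> list[dict]:
    """Fetch candidate passages from Wikipedia's search API, keeping titles/URLs so they can be cited."""
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "list": "search",
            "srsearch": query,
            "srlimit": limit,
            "format": "json",
        },
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json()["query"]["search"]
    return [
        {
            "title": r["title"],
            "url": "https://en.wikipedia.org/wiki/" + r["title"].replace(" ", "_"),
            # The API returns HTML-highlighted snippets; strip the tags for the prompt.
            "text": re.sub(r"<[^>]+>", "", r["snippet"]),
        }
        for r in results
    ]

def build_prompt(question: str, passages: list[dict]) -> str:
    """Ground the question in retrieved text and ask for an answer that cites its sources."""
    context = "\n".join(
        f"[{i + 1}] {p['title']} ({p['url']}): {p['text']}"
        for i, p in enumerate(passages)
    )
    return (
        "Answer using only the sources below, and cite them by number.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    question = "What was the capital of West Francia in 894 AD?"
    passages = retrieve_passages("capital of West Francia 9th century")
    print(build_prompt(question, passages))  # this is what you'd send to the model
```

Whether the model then answers correctly is still not guaranteed, but at least the claim comes back tied to a source you can check, which is exactly what the bare completion above can't give you.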
lol, you just gamed it with an edge case where the most likely completion is incorrect. You're proving my point: the simple case doesn't need RAG, but weird complex edge cases do.
What, because you want your encyclopedia to be confidently wrong on everything that -- unbeknownst to both you and your encyclopedia reader(s?) -- happens to be "a complicated edge case"? (And what isn't "a complicated edge case", in some way or other?)