Hallucination is unfortunately inevitable with any autoregressive model, even with RAG. You can minimize it through prompting, but you'll still see factually incorrect responses here and there (https://zilliz.com/blog/ChatGPT-VectorDB-Prompt-as-code).
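To make the "minimize by prompting" point concrete, here's a minimal sketch of the kind of grounding prompt used in a RAG setup. The retriever output and the `build_grounded_prompt` helper are assumptions for illustration, not something from the linked post:

```python
# Sketch only: a grounding prompt that asks the model to answer strictly from
# retrieved context, which reduces (but does not eliminate) hallucination.

def build_grounded_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to the supplied context."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using ONLY the context below. "
        'If the context does not contain the answer, say "I don\'t know."\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical usage with placeholder retrieved passages:
prompt = build_grounded_prompt(
    "What does the service's SLA say about uptime?",
    ["<retrieved passage 1>", "<retrieved passage 2>"],
)
```

Even with this kind of constraint, the model can still paraphrase the context incorrectly, which is why the accuracy testing discussed below matters.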
I unfortunately don't think we'll be able to solve hallucination anytime soon. Maybe with the successor to the transformer architecture?
Hallucination is naturally a concern for anyone looking to depend upon LLM-generated answers.
We’ve been testing LLM responses with a CLI that generates accuracy statistics, which is especially useful when the use-case Q/A set is limited.
If a ‘confidence’ score can be returned to the user, they at least have an indication of whether a given response carries a higher quality risk.
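The CLI itself isn't named, so as a rough sketch of what such a harness might look like, here is one way to score responses against a small Q/A set and surface a per-response confidence. The similarity-based confidence metric, the 0.8 threshold, and the `ask_llm` callable are all assumptions for illustration:

```python
# Rough sketch of an accuracy-check harness over a limited Q/A set.
# The real CLI and its metric are unknown; this uses lexical similarity
# as a stand-in "confidence" signal.

from difflib import SequenceMatcher

def confidence(expected: str, actual: str) -> float:
    """Crude confidence proxy: lexical similarity between expected and actual answers."""
    return SequenceMatcher(None, expected.lower(), actual.lower()).ratio()

def score_responses(qa_pairs: list[tuple[str, str]], ask_llm) -> float:
    """Run each question through `ask_llm` and report the share above a confidence threshold."""
    passed = 0
    for question, expected in qa_pairs:
        actual = ask_llm(question)          # ask_llm is a placeholder for the model call
        c = confidence(expected, actual)
        print(f"Q: {question}\n  confidence: {c:.2f}")
        if c >= 0.8:                        # threshold chosen arbitrarily for this sketch
            passed += 1
    return passed / len(qa_pairs)           # accuracy statistic over the limited Q/A set
```

In practice you'd likely swap the lexical similarity for an embedding-based or LLM-judge comparison, but the point is the same: expose the score to the user so a low-confidence answer is flagged as higher risk.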