There was a 2020 paper called Retrieval-Augmented Generation that describes this technique too [1]. That paper used BERT embeddings, which are a little cheaper, and BART as the generator.
One of the common failure modes of RAG (and I assume your technique as well) is hallucination: basically, making stuff up that isn't in any of the docs [2].
[1] https://arxiv.org/abs/2005.11401
[2] https://parl.ai/projects/hallucination/