The word "training" implies creating a new model by fine-tuning an existing model on top of new documents.
As several other comments in this thread have already indicated: this is almost always the wrong direction. Which is confusing because it's the direction everyone always assumes they should go in at first.
The approach that does work is surprisingly simple: take the user's question, search for snippets of your documents that appear to be about that question, then paste all of those snippets into the prompt along with the user's question and see what answer you get.
This is known as RAG: Retrieval Augmented Generation. It's a very powerful approach.
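Here's roughly what that loop looks like, as a minimal sketch; the model name, the example chunks and the crude keyword-overlap "search" are all placeholders for whatever retrieval and model you actually use:

```python
# Minimal RAG sketch: retrieve a few likely-relevant chunks, paste them into
# the prompt ahead of the user's question, ask the model to answer from them.
from openai import OpenAI

client = OpenAI()

CHUNKS = [
    "Refunds are processed within 5 business days.",
    "The API rate limit is 100 requests per minute.",
    # ...your real document chunks go here
]

def retrieve(question: str, k: int = 3) -> list[str]:
    # Crude stand-in for real search: rank chunks by words shared with the question.
    words = set(question.lower().split())
    ranked = sorted(CHUNKS, key=lambda c: len(words & set(c.lower().split())), reverse=True)
    return ranked[:k]

def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Swap `retrieve()` for embeddings, BM25 or plain full-text search and the rest stays the same, which is most of the appeal.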
> take the user's question, search for snippets of your documents that appear to be about that question, then paste all of those snippets into the prompt along with the user's question and see what answer you get.
We use RAG at my job, but we don’t do any preprocessing on the message from the user, so the results are not always great for us.
Do any of you have experience using a small local model just for extracting keywords from messages which you then use for the retrieval? And then feed the search result and your prompt into OpenAI or whatever as normal.
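Something like this is what I have in mind, as a sketch (Ollama and the model name are just one option, and `search_index` is a placeholder for our existing retrieval):

```python
# Sketch: a small local model extracts search keywords from the user's message,
# those keywords drive retrieval, and the final answer still comes from OpenAI.
import ollama

def extract_keywords(message: str) -> list[str]:
    resp = ollama.chat(
        model="llama3.2",  # any small model you have pulled locally
        messages=[{
            "role": "user",
            "content": (
                "Extract 3-5 search keywords from this message. "
                "Reply with the keywords comma-separated and nothing else.\n\n"
                + message
            ),
        }],
    )
    return [kw.strip() for kw in resp["message"]["content"].split(",") if kw.strip()]

# keywords = extract_keywords("Why was my order charged twice last week?")
# snippets = search_index(" ".join(keywords))  # your existing retrieval step (placeholder)
# ...then build the prompt with those snippets and call OpenAI as you do today.
```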
I've been trying out an interesting embedding model that knows how to treat text as either a question or as a phrase about the world, and embeds the question such that it's likely to end up close to phrases that might answer that question: https://til.simonwillison.net/llms/embed-paragraphs
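Here's a rough sketch of that pattern using sentence-transformers; `multi-qa-MiniLM-L6-cos-v1` is just one model trained for question-to-passage retrieval, not necessarily the one from that link:

```python
# Question-vs-passage embeddings: questions land near paragraphs that answer them.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")

paragraphs = [
    "Our support team is available Monday to Friday, 9am to 5pm.",
    "Password resets are handled from the account settings page.",
]
paragraph_embeddings = model.encode(paragraphs, convert_to_tensor=True)

question_embedding = model.encode("How do I reset my password?", convert_to_tensor=True)
hits = util.semantic_search(question_embedding, paragraph_embeddings, top_k=2)[0]

for hit in hits:
    print(round(hit["score"], 3), paragraphs[hit["corpus_id"]])
```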
Embedding and chunking large amounts of documents is expensive though, in both compute and storage.
The other trick I've been planning to explore is using an LLM to turn the user's question into a small number of normal FTS search queries and then run those to try and get context data.
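Roughly what I have in mind, sketched against SQLite FTS5 (the table and model names are illustrative, and the generated queries probably need sanitizing before they hit MATCH):

```python
# Turn the user's question into a few FTS queries, then run them against an
# FTS5 table (CREATE VIRTUAL TABLE docs USING fts5(body);) to gather context.
import sqlite3
from openai import OpenAI

client = OpenAI()

def generate_fts_queries(question: str, n: int = 3) -> list[str]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        messages=[{
            "role": "user",
            "content": (
                f"Rewrite this question as {n} short full-text search queries, "
                f"one per line, keywords only:\n\n{question}"
            ),
        }],
    )
    return [q.strip() for q in resp.choices[0].message.content.splitlines() if q.strip()]

def gather_context(db_path: str, question: str, limit: int = 5) -> list[str]:
    conn = sqlite3.connect(db_path)
    snippets = []
    for query in generate_fts_queries(question):
        rows = conn.execute(
            "SELECT body FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
            (query, limit),
        ).fetchall()
        snippets.extend(row[0] for row in rows)
    return snippets
```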
> The other trick I've been planning to explore is using an LLM to turn the user's question into a small number of normal FTS search queries and then run those to try and get context data.
I have also been working on this. I still fail to see why this approach isn't the default frankly. There's little benefit to vector databases.
Also maybe try to include tags or categories when you index and then you can filter on those when doing the vector search. Might get a similar effect from BM25.
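For the tag/category filtering, here's what that can look like with ChromaDB, picked only as an illustration since no particular store was mentioned:

```python
# Vector search restricted to a category tag stored as metadata at index time.
import chromadb

client = chromadb.Client()
collection = client.create_collection("docs")

collection.add(
    ids=["1", "2"],
    documents=[
        "Refunds are processed within 5 business days.",
        "The API rate limit is 100 requests per minute.",
    ],
    metadatas=[{"category": "billing"}, {"category": "api"}],
)

# Only search within the "billing" category.
results = collection.query(
    query_texts=["how long do refunds take?"],
    n_results=1,
    where={"category": "billing"},
)
print(results["documents"][0])
```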
Also, LlamaIndex does RAG better than some other solutions.
How do RAG implementations work with generic prompts vs specific prompts? Meaning, there are prompts that could easily be answered by the base model itself and don't require RAG, but some prompts might involve questions about something proprietary where RAG is actually useful.
So is the default to just run the RAG search index on every prompt, and if it returns nothing you get the plain answer from the base model, otherwise you get the augmented answer?
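Something like this is what I'm picturing (the `retrieve_with_scores` stub stands in for the real search, and the 0.75 cutoff is an arbitrary number you'd tune):

```python
# Always run retrieval; only augment the prompt when something relevant came back,
# otherwise fall through to a plain answer from the base model.
from openai import OpenAI

client = OpenAI()

def retrieve_with_scores(question: str) -> list[tuple[str, float]]:
    # Stub: replace with real search returning (snippet, similarity) pairs.
    return []

def answer(question: str) -> str:
    hits = retrieve_with_scores(question)
    relevant = [snippet for snippet, score in hits if score > 0.75]  # arbitrary cutoff
    if relevant:
        prompt = "Context:\n" + "\n\n".join(relevant) + f"\n\nQuestion: {question}"
    else:
        prompt = question  # nothing useful found, let the base model answer on its own
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```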