The word "training" implies creating a new model by fine-tuning an existing model on top of new documents.
As several other comments in this thread have already indicated: this is almost always the wrong direction. Which is confusing because it's the direction everyone always assumes they should go in at first.
The approach that does work is surprisingly simple: take the user's question, search for snippets of your documents that appear to be about that question, then paste all of those snippets into the prompt along with the user's question and see what answer you get.
This is known as RAG: Retrieval Augmented Generation. It's a very powerful approach.
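Here's roughly what that loop looks like, as a minimal sketch; the model name, the example chunks and the crude keyword-overlap "search" are all placeholders for whatever retrieval and model you actually use:

```python
# Minimal RAG sketch: retrieve a few likely-relevant chunks, paste them into
# the prompt ahead of the user's question, ask the model to answer from them.
from openai import OpenAI

client = OpenAI()

CHUNKS = [
    "Refunds are processed within 5 business days.",
    "The API rate limit is 100 requests per minute.",
    # ...your real document chunks go here
]

def retrieve(question: str, k: int = 3) -> list[str]:
    # Crude stand-in for real search: rank chunks by words shared with the question.
    words = set(question.lower().split())
    ranked = sorted(CHUNKS, key=lambda c: len(words & set(c.lower().split())), reverse=True)
    return ranked[:k]

def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Swap `retrieve()` for embeddings, BM25 or plain full-text search and the rest stays the same, which is most of the appeal.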
> take the user's question, search for snippets of your documents that appear to be about that question, then paste all of those snippets into the prompt along with the user's question and see what answer you get.
We use RAG at my job, but we don’t do any preprocessing on the message from the user, so the results are not always great for us.
Do any of you have experience using a small local model just for extracting keywords from messages which you then use for the retrieval? And then feed the search result and your prompt into OpenAI or whatever as normal.
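Something like this is what I have in mind, as a sketch (Ollama and the model name are just one option, and `search_index` is a placeholder for our existing retrieval):

```python
# Sketch: a small local model extracts search keywords from the user's message,
# those keywords drive retrieval, and the final answer still comes from OpenAI.
import ollama

def extract_keywords(message: str) -> list[str]:
    resp = ollama.chat(
        model="llama3.2",  # any small model you have pulled locally
        messages=[{
            "role": "user",
            "content": (
                "Extract 3-5 search keywords from this message. "
                "Reply with the keywords comma-separated and nothing else.\n\n"
                + message
            ),
        }],
    )
    return [kw.strip() for kw in resp["message"]["content"].split(",") if kw.strip()]

# keywords = extract_keywords("Why was my order charged twice last week?")
# snippets = search_index(" ".join(keywords))  # your existing retrieval step (placeholder)
# ...then build the prompt with those snippets and call OpenAI as you do today.
```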
I've been trying out an interesting embedding model that knows how to treat text as either a question or as a phrase about the world, and embeds the question such that it's likely to end up close to phrases that might answer that question: https://til.simonwillison.net/llms/embed-paragraphs
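Here's a rough sketch of that pattern using sentence-transformers; `multi-qa-MiniLM-L6-cos-v1` is just one model trained for question-to-passage retrieval, not necessarily the one from that link:

```python
# Question-vs-passage embeddings: questions land near paragraphs that answer them.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")

paragraphs = [
    "Our support team is available Monday to Friday, 9am to 5pm.",
    "Password resets are handled from the account settings page.",
]
paragraph_embeddings = model.encode(paragraphs, convert_to_tensor=True)

question_embedding = model.encode("How do I reset my password?", convert_to_tensor=True)
hits = util.semantic_search(question_embedding, paragraph_embeddings, top_k=2)[0]

for hit in hits:
    print(round(hit["score"], 3), paragraphs[hit["corpus_id"]])
```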
Embedding and chunking large amounts of documents is expensive though, in both compute and storage.
The other trick I've been planning to explore is using an LLM to turn the user's question into a small number of normal FTS search queries and then run those to try and get context data.
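Roughly what I have in mind, sketched against SQLite FTS5 (the table and model names are illustrative, and the generated queries probably need sanitizing before they hit MATCH):

```python
# Turn the user's question into a few FTS queries, then run them against an
# FTS5 table (CREATE VIRTUAL TABLE docs USING fts5(body);) to gather context.
import sqlite3
from openai import OpenAI

client = OpenAI()

def generate_fts_queries(question: str, n: int = 3) -> list[str]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        messages=[{
            "role": "user",
            "content": (
                f"Rewrite this question as {n} short full-text search queries, "
                f"one per line, keywords only:\n\n{question}"
            ),
        }],
    )
    return [q.strip() for q in resp.choices[0].message.content.splitlines() if q.strip()]

def gather_context(db_path: str, question: str, limit: int = 5) -> list[str]:
    conn = sqlite3.connect(db_path)
    snippets = []
    for query in generate_fts_queries(question):
        rows = conn.execute(
            "SELECT body FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
            (query, limit),
        ).fetchall()
        snippets.extend(row[0] for row in rows)
    return snippets
```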
> The other trick I've been planning to explore is using an LLM to turn the user's question into a small number of normal FTS search queries and then run those to try and get context data.
I have also been working on this. I still fail to see why this approach isn't the default frankly. There's little benefit to vector databases.
Also maybe try to include tags or categories when you index and then you can filter on those when doing the vector search. Might get a similar effect from BM25.
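For the tag/category filtering, here's what that can look like with ChromaDB, picked only as an illustration since no particular store was mentioned:

```python
# Vector search restricted to a category tag stored as metadata at index time.
import chromadb

client = chromadb.Client()
collection = client.create_collection("docs")

collection.add(
    ids=["1", "2"],
    documents=[
        "Refunds are processed within 5 business days.",
        "The API rate limit is 100 requests per minute.",
    ],
    metadatas=[{"category": "billing"}, {"category": "api"}],
)

# Only search within the "billing" category.
results = collection.query(
    query_texts=["how long do refunds take?"],
    n_results=1,
    where={"category": "billing"},
)
print(results["documents"][0])
```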
Also, LlamaIndex does RAG better than some other solutions.
How do RAG implementations work with generic prompts vs specific prompts? Meaning, there are prompts that could easily be answered by the base model itself and don't require RAG, but some prompts might involve questions about something proprietary where RAG is actually useful.
So is the default to just run the RAG search index on every prompt, and if it returns nothing you get the plain answer from the base model, otherwise you get the augmented answer?
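Something like this is what I'm picturing (the `retrieve_with_scores` stub stands in for the real search, and the 0.75 cutoff is an arbitrary number you'd tune):

```python
# Always run retrieval; only augment the prompt when something relevant came back,
# otherwise fall through to a plain answer from the base model.
from openai import OpenAI

client = OpenAI()

def retrieve_with_scores(question: str) -> list[tuple[str, float]]:
    # Stub: replace with real search returning (snippet, similarity) pairs.
    return []

def answer(question: str) -> str:
    hits = retrieve_with_scores(question)
    relevant = [snippet for snippet, score in hits if score > 0.75]  # arbitrary cutoff
    if relevant:
        prompt = "Context:\n" + "\n\n".join(relevant) + f"\n\nQuestion: {question}"
    else:
        prompt = question  # nothing useful found, let the base model answer on its own
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```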