I'd really love to hear more details. I plan to build a RAG system based on regulations. It is very hard because the sources differ, and reading and interpreting legal documents is a completely different area of expertise. There are some questions I can't answer:
- How can I start small in a very specific area?
- How can I grow it?
- How can I validate and measure the success of the RAG solution?
- How can I feed it with data continuously?
How to start small in a very specific area:
1. Write down a list of 10 to 20 questions that you want your RAG to be able to answer. Then write down the correct answers next to each question. If you don't know the correct answers, ask a subject matter expert. Build a RAG chatbot that can answer those questions first.
2. Start out with tools that are capable of growing with you. For example, don't build on ChromaDB's default in-memory mode, where the whole index has to fit in RAM. Use a persistent or hosted vector store that can scale up.
3. I cover this in my book: to validate and measure success, you can use the 10 to 20 questions as integration tests and run them every time you commit code to make sure everything still works.
4. On feeding it data continuously: do you mean keeping your databases updated? If so, you can write a script that checks for regulations published in the last week and uploads only those regulations to your database.
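
Here is a rough sketch of how points 1 and 3 could fit together in Python. Everything named in it is a placeholder made up for illustration: `GoldenCase`, `evaluate`, and `fake_rag_answer` are hypothetical, the two sample questions are not taken from any real regulation, and the "does the answer contain the key phrase" check is only one simple way to score an answer. You would swap `fake_rag_answer` for a call into your real pipeline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    question: str
    # Key facts that a subject matter expert says the answer must contain.
    required_phrases: list[str]


def evaluate(answer_fn: Callable[[str], str], cases: list[GoldenCase]) -> list[str]:
    """Return the questions whose answers are missing a required phrase."""
    failures = []
    for case in cases:
        answer = answer_fn(case.question).lower()
        if not all(phrase.lower() in answer for phrase in case.required_phrases):
            failures.append(case.question)
    return failures


# Hypothetical golden set; the real one would hold your 10 to 20 questions.
GOLDEN_CASES = [
    GoldenCase("How long must transaction records be retained?", ["five years"]),
    GoldenCase("Who must be notified of a data breach?", ["supervisory authority"]),
]


def fake_rag_answer(question: str) -> str:
    # Stand-in for the real RAG pipeline (retrieve, then generate).
    canned = {
        "How long must transaction records be retained?":
            "Records must be retained for five years.",
    }
    return canned.get(question, "I don't know.")


if __name__ == "__main__":
    failed = evaluate(fake_rag_answer, GOLDEN_CASES)
    passed = len(GOLDEN_CASES) - len(failed)
    print(f"{passed}/{len(GOLDEN_CASES)} golden questions passed")
    for question in failed:
        print("FAILED:", question)
```

If you wrap the `evaluate` call in a test that asserts the failure list is empty, your CI can run it on every commit, and the pass count doubles as a simple success metric you can track as the question list grows.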
I hope all that helps. Let me know if you have any other questions!