Ask HN: Lessons from Building a Fortune 500 RAG Chatbot (50M Records in 10–30s)
19 points by tylersuard 8 days ago | 15 comments
I’ve spent the past year and a half constructing a Retrieval Augmented Generation (RAG) chatbot for a Fortune 500 manufacturing company, integrating over 50 million records across a dozen databases. Despite that scale, the system can return relevant info in 10–30 seconds, and it’s now at 90% five-star user approval internally.

After tons of trial and error (embedding huge datasets, mixing vector + text search, handling concurrency, and dodging hallucinations), I decided to document it all in a book. It'll be live on Manning.com's Early Access soon (March 27th). If you're tackling large-scale RAG or have questions about my approach (the struggles, the successes), feel free to ask. I'm happy to share lessons, config ideas, or gotchas so you can avoid the pitfalls I hit along the way.






I'd really love to hear the details. I plan to build a RAG system based on regulations. It's very hard because the sources are all different, and reading and interpreting legal documents is a completely different area of expertise. There are some questions I can't answer:

- How can I start small in a very specific area?
- How can I grow it?
- How do I validate and measure the success of the RAG solution?
- How do I feed it with data continuously?

To answer your questions:

1. How to start small in a very specific area: write down a list of 10 to 20 questions that you want your RAG to be able to answer, then write down the correct answer next to each question. If you don't know the correct answers, ask a subject matter expert. Build a RAG chatbot that can answer those questions first.

2. How to grow it: start out using tools that are capable of growing with you. For example, don't use ChromaDB, because it can only fit in memory. Use services that can scale up.

3. How to validate and measure success: I cover this in my book. You can use the 10 to 20 questions as integration tests and run them every time you commit code, to make sure everything still works.

4. How to feed it with data continuously: if you mean keeping your databases updated, you can write a script that checks for regulations published in the last week and uploads only those regulations to your database.
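The golden-question idea can be sketched as a small regression suite that runs on every commit. This is only an illustration: `ask_chatbot` is a hypothetical stand-in (here a canned stub) for your real RAG pipeline, and the questions and keywords are made up.

```python
# Golden-question regression suite: run on every commit.
# ask_chatbot() is a hypothetical stand-in for the real RAG pipeline.

GOLDEN_SET = [
    # (question, keywords the correct answer must contain)
    ("What is the maximum operating temperature of part X-100?",
     ["85", "celsius"]),
    ("Which regulation covers chemical labeling?",
     ["CLP"]),
]

def ask_chatbot(question: str) -> str:
    # Stub: replace with a call to your deployed chatbot.
    canned = {
        "What is the maximum operating temperature of part X-100?":
            "Part X-100 is rated up to 85 degrees Celsius.",
        "Which regulation covers chemical labeling?":
            "Chemical labeling falls under the CLP regulation.",
    }
    return canned[question]

def run_golden_set():
    # Return a list of (question, missing keywords) for every failure.
    failures = []
    for question, keywords in GOLDEN_SET:
        answer = ask_chatbot(question).lower()
        missing = [k for k in keywords if k.lower() not in answer]
        if missing:
            failures.append((question, missing))
    return failures
```

Keyword checks are crude; swapping in an LLM-as-judge or exact-answer comparison is a natural next step once the suite exists.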

I hope all that helps, let me know if you have any other questions!


Focusing on the RA part of the RAG, which techniques or tools would you say contributed the most to the quality of the results? What sort of tradeoffs did you have to make?

EXCELLENT question. We tried about five different ways of retrieving data, and we found that what works best for us is search-as-a-service. We use Azure AI Search, but there are lots of others out there, including Google Vertex AI Search, Algolia, Amazon CloudSearch, and Elasticsearch.

What tradeoffs? It is fast and accurate, but it does get expensive when you have over 50 million records.
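Mixing vector and text search in one query usually comes down to fusing two ranked result lists. Below is a minimal sketch of reciprocal rank fusion, the technique Azure AI Search documents for its hybrid queries; the document IDs are made up for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists into one.

    rankings: list of ranked lists of document IDs, best first.
    k: smoothing constant; 60 is the commonly used default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a keyword ranking with a vector ranking.
text_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc9", "doc3"]
fused = reciprocal_rank_fusion([text_hits, vector_hits])
# doc1 appears high in both lists, so it wins.
```

The appeal of RRF is that it needs only ranks, not comparable scores, so BM25 and cosine-similarity results can be merged without any score normalization.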


How did you measure success of the RAG solution beyond five-star user approvals? Are there any critical metrics that determine success or failure?

We have tests in place to make sure every function and every search is working; otherwise things won't deploy. We do have a dashboard with usage stats and so on, but I haven't spent much time looking at it. I think we just count broad usage across the company as a success. And yes, we consider one-stars and complaints to be a failure, but we get those rarely, maybe once every two weeks, and it is almost always the user's fault. Does that answer your question?

When you are dealing with documents with different structures, how do you do the document chunking efficiently without losing important metadata?

First of all, great question.

Second, we use a search service, and vectors are treated as supplementary to the text search, so chunking matters less. We usually take an entire PDF page and embed it, no matter how the data on that page is structured. We do keep track of the document name and the page number. For SQL records, we just turn each record into a text string and embed that.
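That flattening step can be sketched in a few lines. The field names here are hypothetical; the point is that each SQL row becomes one embeddable string, and each PDF page becomes one chunk that carries the metadata needed for citations.

```python
def sql_record_to_text(record: dict) -> str:
    # Flatten a SQL row into "column: value" pairs for embedding.
    return "; ".join(f"{col}: {val}" for col, val in record.items())

def pdf_page_to_chunk(doc_name: str, page_number: int, page_text: str) -> dict:
    # One chunk per PDF page; metadata travels alongside the text.
    return {
        "id": f"{doc_name}-p{page_number}",
        "content": page_text,
        "source_document": doc_name,
        "page_number": page_number,
    }

row = {"part_id": "A-17", "material": "steel", "weight_kg": 2.4}
text = sql_record_to_text(row)
chunk = pdf_page_to_chunk("catalog-2024", 312, "Torque specs for ...")
```

Keeping `source_document` and `page_number` on every chunk is what lets the chatbot cite "catalog X, page Y" in its answers.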


Thanks for your feedback! Could you share a bit about your team? I’m curious how many people are involved and what kinds of skills or roles are needed to make this happen.

Do share the DB you used, for starters, and the overall stack: MERN, MEAN, Firestore BaaS, Supabase, or something entirely different?

Ok, so the company has about 20 databases, plus over 100,000 pages of PDF catalogs. We tried using agents to query the company's SQL databases directly, but that took 30 seconds per call, which was unacceptable because we wanted to return an answer to the user within 10 to 30 seconds total. So what we ended up doing is creating an Azure AI Search service with a separate search index (like a collection) for each data source: one per database, plus one for our repository of 100,000 PDF pages.
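One way to keep latency bounded with per-source indexes is to fan the query out to all of them concurrently and merge the results, rather than querying them one by one. A sketch of that pattern, with a stubbed `search_index` standing in for the real search-service call and made-up index names:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-source indexes: one per database, plus one for PDFs.
INDEXES = ["orders-db", "parts-db", "pdf-catalogs"]

def search_index(index_name: str, query: str) -> list:
    # Stub: replace with a real search-service call against that index.
    return [{"index": index_name, "query": query, "score": 1.0}]

def search_all(query: str) -> list:
    # Fan the query out to every index in parallel, then flatten and rank.
    with ThreadPoolExecutor(max_workers=len(INDEXES)) as pool:
        result_lists = pool.map(lambda idx: search_index(idx, query), INDEXES)
    hits = [hit for hits in result_lists for hit in hits]
    return sorted(hits, key=lambda h: h["score"], reverse=True)
```

With this shape, total retrieval time is roughly the slowest single index rather than the sum of all of them, which is what makes a 10-to-30-second budget feasible across many sources.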

Our stack was just Python, AutoGen for the agents, and, as I mentioned, Azure AI Search. We use Azure Web Apps for the backend and OpenAI models for the generation. Great questions!


27th marked! What infrastructure did you use?

Great, thank you!

The main program is hosted on Azure Web Apps, the search is Azure AI Search, we use AutoGen for the agents, and we use OpenAI for the generation. Azure has a lot of tools that support AI and search, so we use those too.


the title will be "Lessons from Building a Fortune 500 RAG Chatbot"?

Ah, sorry, I forgot to mention it. The title will be "Enterprise RAG: Scaling Retrieval Augmented Generation"


