I’ve been working on something that provides document search for agents to call if they need the documents. Let me know if you are interested. It’s Open Source. For this many documents it will need some bucketing with semantic relationships, which I’ve been noodling on this last year. Still needs some tweaking for what you are doing, probably. Might get you further along if you are considering rolling your own…
Let me know if you want to go over the code or want to discuss what works and what doesn’t. We had a loop on the action/function call “pipeline” but I changed it to just test if there was a function call or not and then just keep calling.