Yes, but I see it as multiple steps. Perhaps the LLM solution has some probabilistic issues that only get you 80% of the way there, but that process has probably already given you some ideas about how to better solve the problem. And in this case the problem is somewhat intractable because of the size and complexity of how the data is stored. So in my example the first step is LLMs, and the second step is to use what they produce as the structure for building a deterministic pipeline. The problem isn't that there are ten thousand different metadata fields, but that the structure of those metadata is diffuse. The LLM solution first helps identify the main points of what needs to be conformed to the monolithic schema; then I'll build more production-ready, deterministic pipelines (roughly the shape of the sketch below). At least that's the plan. I'll write a Substack post about it eventually if this plan works haha.
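To make the two-step idea concrete, here's a minimal Python sketch. Everything in it is hypothetical (the target schema, the key names, the `llm_propose_mapping` stand-in): step 1 uses the LLM to propose how each diffuse source key maps onto the monolithic schema, and step 2 freezes the reviewed proposals into a plain lookup table, so the production pipeline is deterministic and never depends on the model's probabilistic output.

```python
from typing import Optional

# Hypothetical monolithic target schema.
TARGET_SCHEMA = {"title", "author", "created_at", "license"}

def llm_propose_mapping(source_key: str) -> Optional[str]:
    """Step 1 (exploratory, probabilistic): stand-in for the LLM call.

    Given a messy source key, the model proposes which target field it
    should be conformed to, or None if it can't be mapped.
    """
    # In the real step 1 this would be a prompt along the lines of:
    # "Which field in TARGET_SCHEMA does '<source_key>' correspond to?"
    raise NotImplementedError("replace with an actual LLM call")

# Step 2 input: the LLM's proposals from step 1, after human review,
# frozen into a deterministic table. No model in the loop from here on.
FROZEN_MAPPING = {
    "doc_title": "title",
    "Title": "title",
    "creator": "author",
    "authored_by": "author",
    "ts_created": "created_at",
}

def conform(record: dict) -> dict:
    """Step 2 (production, deterministic): conform one record."""
    out = {}
    for key, value in record.items():
        target = FROZEN_MAPPING.get(key)
        if target in TARGET_SCHEMA:
            out[target] = value
    return out

if __name__ == "__main__":
    messy = {"doc_title": "Q3 report", "creator": "Ana", "junk": 1}
    print(conform(messy))  # {'title': 'Q3 report', 'author': 'Ana'}
```

The design point is just that the LLM's output becomes configuration (the frozen table), not a runtime dependency, which is what makes the second-step pipeline reproducible.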