All the "memory" systems/pipelines for LLMs seems to be using the exact same approach:
1. Chunk Docs + Embed
2. Store in a VectorDB
3. Query embeddings based on semantic relevance
In my work, this has consistently failed to retrieve meaningful context for prompts. Has anyone seen a better way of handling this problem?
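For concreteness, here is a minimal sketch of the standard pipeline I mean. The model name, chunk size, and in-memory "store" are just placeholder choices, not any particular product:

```python
# Minimal sketch of the standard chunk -> embed -> store -> query pipeline.
# all-MiniLM-L6-v2 and the 500-char chunk size are placeholder choices.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text: str, size: int = 500) -> list[str]:
    # 1. Naive fixed-size chunking (no overlap, no structure awareness).
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(docs: list[str]):
    # 2. Embed every chunk and keep the vectors in memory
    #    (a real setup would push these into a vector DB).
    chunks = [c for d in docs for c in chunk(d)]
    vecs = model.encode(chunks, normalize_embeddings=True)
    return chunks, np.asarray(vecs)

def query(index, question: str, k: int = 3) -> list[str]:
    # 3. Retrieve the top-k chunks by cosine similarity.
    chunks, vecs = index
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = vecs @ q  # cosine similarity (embeddings are L2-normalized)
    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]
```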
That being said, people are finding the basic steps you showed above sufficient. There are parameters you can change in these 3 steps. Have you tried changing any of those?
- how you chunk: chunk size and splitting rules
- how you embed: which model, and therefore which vector size
- how you query: the similarity metric(s) used
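Each of those knobs can be treated as an explicit parameter rather than a hard-coded choice. A rough sketch of what that might look like (the names and defaults below are hypothetical, not from any particular library):

```python
# Illustrative set of tunable parameters for the three steps.
from dataclasses import dataclass

@dataclass
class RetrievalConfig:
    # chunking: size, overlap, and whether to respect sentence boundaries
    chunk_size: int = 500
    chunk_overlap: int = 50
    split_on_sentences: bool = True
    # embedding: which model (which also fixes the vector dimension)
    embedding_model: str = "all-MiniLM-L6-v2"
    # querying: comparison metric and how many chunks to return
    metric: str = "cosine"  # or "dot", "euclidean"
    top_k: int = 5
```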
Step 2 probably only matters for result quality in that it determines what is available to you in the other steps, notably the embedding-comparison metric, which is what really defines relevance.
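To make that concrete, the same stored vectors can rank chunks differently depending on the comparison metric. A toy example with made-up numbers:

```python
# Toy illustration: identical stored vectors, different "most relevant"
# chunk depending on the metric. Numbers are made up.
import numpy as np

query_vec = np.array([1.0, 0.0])
stored = {
    "chunk A": np.array([2.0, 2.0]),   # large magnitude, 45 degrees off
    "chunk B": np.array([0.9, 0.1]),   # small magnitude, nearly aligned
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def dot(a, b):
    return float(a @ b)

for name, vec in stored.items():
    print(name,
          "cosine:", round(cosine(query_vec, vec), 3),
          "dot:", round(dot(query_vec, vec), 3))

# Cosine favors chunk B (direction), dot product favors chunk A (magnitude),
# so "relevance" depends on which metric the store is configured to use.
```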