https://arxiv.org/abs/2401.18059
>RAGCache reduces the time to first token (TTFT) by up to 4x and improves the throughput by up to 2.1x compared to vLLM integrated with Faiss.
Are RAG systems really so heavy they add 4x to the TTFT?
Only the pure-Python + unoptimized ORM ones. So, only most of them.