RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation

joshcho · 2024-04-30T13:38:25

Worth a comparison with RAPTOR, another tiered RAG system.

unraveller · 2024-04-30T13:54:13

Cool, but deep integration requires the LLM provider to run it and get privacy right; that'll be $premium plus tip.

> RAGCache reduces the time to first token (TTFT) by up to 4x and improves the throughput by up to 2.1x compared to vLLM integrated with Faiss.

Are RAG systems really so heavy they add 4x to the TTFT?

ComputerGuru · 2024-04-30T15:07:40

> Are RAG systems really so heavy they add 4x to the TTFT?

Only the pure Python + unoptimized ORM ones. So only most of them.