RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation (arxiv.org)
33 points by PaulHoule 17 days ago | 3 comments



Worth a comparison with RAPTOR, another tiered RAG system.

https://arxiv.org/abs/2401.18059


Cool, but deep integration requires the LLM provider to run it and get privacy right. That'll be $premium plus tip.

> RAGCache reduces the time to first token (TTFT) by up to 4x and improves the throughput by up to 2.1x compared to vLLM integrated with Faiss.

Are RAG systems really so heavy they add 4x to the TTFT?


> Are RAG systems really so heavy they add 4x to the TTFT?

Only the pure Python + unoptimized ORM ones. So only most of them.



