
> Then the part that matters: where the KV lives

When your abstract was clearly generated by an LLM and not curated to at least make it sound human, it does not make me want to read your paper.

Especially because this is the most painfully glaring flaw in their plan. Their solution is for an inference provider to... store the KV cache (which they can compute themselves!) on-premise, on their own disks, but pay some third party for it?

Well, it’s one flaw. I would argue that the bigger flaw, which you alluded to, is that the cost of computing the cache yourself maxes out in the single-digit dollars even for very large frontier models, and that’s a one-time cost. Even if you imagine all the logistics are free and all the transfers are instant, what are we even talking about here from an economic perspective?
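To put rough numbers on that, here is a back-of-envelope sketch. It assumes the cost of building a KV cache is just one prefill pass billed at a flat per-token input price. The function name, the 1M-token document, and the $3 per million tokens price are all made-up illustrative values, not figures from the paper or from any provider's price list.

```python
def prefill_cost_usd(num_tokens: int, usd_per_million_tokens: float) -> float:
    """One-time cost of a prefill pass over a document at a flat per-token price.

    Assumption: building the KV cache costs the same as processing the
    document once as input tokens. Real pricing may differ.
    """
    if num_tokens < 0 or usd_per_million_tokens < 0:
        raise ValueError("token count and price must be non-negative")
    return num_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical values, chosen only for illustration.
doc_tokens = 1_000_000        # a very long document
price_per_million = 3.0       # assumed USD per million input tokens

cost = prefill_cost_usd(doc_tokens, price_per_million)
print(f"one-time prefill cost: ${cost:.2f}")
```

Under those assumptions the ceiling on what anyone would pay for a precomputed cache is whatever that function returns, since recomputing is always an option.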

KV caching is a super interesting engineering space, especially when you’re talking about local models where compute and memory bandwidth are highly constrained and you’re trying to trim fractions of a second everywhere you can by flipping between different ICL prefixes. But selling caches for specific documents just makes no sense at all.



