> Then the part that matters: *where the KV lives* When your abstract was clearl...

numeri · 2026-06-12T22:47:14 1781304434

especially because this is the most painfully glaring flaw in their plan. Their solution is for an inference provider to... store the KV cache (which they can compute!) on-premise, on their own disks, but pay some third party for it?

mistercow · 2026-06-12T23:28:26 1781306906

Well, it’s one flaw. I would argue that the bigger flaw, which you alluded to, is that the cost of computing the cache yourself maxes out in the single-digit dollars even for very large frontier models, and that’s a one-time cost. Even if you imagine all the logistics are free and all the transfers are instant, what are we even talking about here from an economic perspective?
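To put rough numbers on that, here is a back-of-the-envelope sketch. Every figure in it is an assumption picked for illustration (a 1M-token document, $5 per million input tokens, and a hypothetical model with 80 layers, 8 KV heads, head dimension 128, fp16 cache), not any real provider's pricing or architecture. The size formula is the usual one for a transformer KV cache: keys plus values, per layer, per KV head, per token.

```python
def prefill_cost_usd(prompt_tokens: int, usd_per_million_input_tokens: float) -> float:
    """One-time cost of computing the KV cache yourself by running prefill."""
    return prompt_tokens * usd_per_million_input_tokens / 1_000_000


def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """Size of the cache you would have to store or ship instead."""
    # Factor of 2 because both keys and values are cached.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Assumed, illustrative inputs (not measurements).
DOC_TOKENS = 1_000_000
PRICE_PER_MILLION = 5.0

cost = prefill_cost_usd(DOC_TOKENS, PRICE_PER_MILLION)
size_gb = kv_cache_bytes(DOC_TOKENS, layers=80, kv_heads=8, head_dim=128) / 1e9

print(f"recompute once: ${cost:.2f}")
print(f"cache to store or transfer: {size_gb:.2f} GB")
```

Under those assumptions you are comparing about five dollars of one-time compute against storing or moving a few hundred gigabytes per document, which is the part I can't make sense of.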

KV caching is a super interesting engineering space, especially when you’re talking about local models where compute and memory bandwidth are highly constrained and you’re trying to trim fractions of a second everywhere you can by flipping between different ICL prefixes. But selling caches for specific documents just makes no sense at all.