Are they quietly compacting context to reduce KV cache usage before the actual compaction? Like there's a slider for how much to compress it, and it's never revealed to us?