Hacker News
KVarN: Native vLLM backend for KV-cache quantization by Huawei (github.com/huawei-csl)
130 points by theanonymousone 16 hours ago | 13 comments
Sinkhorn: Make LLMs even smaller through quantisation while maintaining accuracy (github.com/huawei-csl)
4 points by ilitirit 8 months ago | 1 comment
