
I think quantization (e.g. 4-bit, https://arxiv.org/abs/2212.09720) and sparsity (e.g. SparseGPT, https://arxiv.org/abs/2301.00774) will bring down inference cost.

Edit: This isn’t handwaving, btw; the point is that some fairly decent solutions are already available today.
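To make the quantization half concrete, here is a rough NumPy sketch of plain group-wise round-to-nearest 4-bit weight quantization. It's only the basic idea, not the exact method from either linked paper, and the group size and toy tensor are arbitrary choices of mine:

    import numpy as np

    def quantize_4bit(w, group_size=64):
        # Absmax round-to-nearest 4-bit quantization of a weight vector.
        # Weights are split into groups; each group keeps one float scale
        # plus 4-bit integer codes in [-8, 7], so storage drops ~4x vs fp16.
        w = w.reshape(-1, group_size)
        scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # one scale per group
        q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
        return q, scale

    def dequantize_4bit(q, scale):
        # Reconstruct approximate float weights at inference time.
        return q * scale

    # Toy example: 4096 random weights, check the reconstruction error.
    w = np.random.randn(4096).astype(np.float32)
    q, scale = quantize_4bit(w)
    w_hat = dequantize_4bit(q, scale).reshape(-1)
    print("mean abs error:", np.abs(w - w_hat).mean())

The real methods add things like better rounding and outlier handling, but even this naive version shows where the memory (and hence cost) savings come from.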
