Hacker News | zhisbug's submissions
1. Consistency LLM: converting LLMs to parallel decoders accelerates inference 3.5x (hao-ai-lab.github.io)
461 points by zhisbug 34 days ago | past | 98 comments
2. Throughput Is Not All You Need: Maxing Goodput in LLM Serving via Disaggregation (hao-ai-lab.github.io)
5 points by zhisbug 85 days ago | past | 1 comment
3. Break the Sequential Dependency of LLM Inference Using Lookahead Decoding (lmsys.org)
17 points by zhisbug 6 months ago | past | 2 comments
4. Important and *MUST-KNOW* techniques for a 2023 LLM serving system (twitter.com/haozhangml)
1 point by zhisbug 9 months ago | past
5. Fastchat-T5: 4x smaller but more powerful than Dolly-v2, commercial use ready (twitter.com/lmsysorg)
7 points by zhisbug on April 28, 2023 | past | 1 comment
6. Alpa: Auto-parallelizing large model training and inference (by UC Berkeley) (github.com/alpa-projects)
7 points by zhisbug on June 23, 2022 | past | 1 comment
