Hacker News
zhisbug's submissions
1. Consistency LLM: converting LLMs to parallel decoders accelerates inference 3.5x (hao-ai-lab.github.io)
   461 points by zhisbug 34 days ago | 98 comments
2. Throughput Is Not All You Need: Maxing Goodput in LLM Serving via Disaggregation (hao-ai-lab.github.io)
   5 points by zhisbug 85 days ago | 1 comment
3. Break the Sequential Dependency of LLM Inference Using Lookahead Decoding (lmsys.org)
   17 points by zhisbug 6 months ago | 2 comments
4. Important and *MUST-KNOW* techniques for a 2023 LLM serving system (twitter.com/haozhangml)
   1 point by zhisbug 9 months ago
5. Fastchat-T5: 4x smaller but more powerful than Dolly-v2, commercial use ready (twitter.com/lmsysorg)
   7 points by zhisbug on April 28, 2023 | 1 comment
6. Alpa: Auto-parallelizing large model training and inference (by UC Berkeley) (github.com/alpa-projects)
   7 points by zhisbug on June 23, 2022 | 1 comment