
240 tokens/s for a 70B model requires 16.8 * (bytes per parameter) TB/s of memory bandwidth, since every weight has to be read once per generated token. So unless it's like 4-bit quantized, it doesn't sound plausible?
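A rough sketch of that arithmetic, in case it helps. It assumes a dense model decoding a single stream (batch size 1), with every weight read from memory once per token, and it ignores KV-cache and activation traffic. The function name and the listed precisions are just illustrative choices, not anything from a vendor spec.

```python
def required_bandwidth_tb_per_s(params_billion, tokens_per_s, bytes_per_param):
    """Memory bandwidth (TB/s) needed if all weights are read once per token.

    Assumption: dense model, batch size 1, no KV-cache or activation traffic.
    """
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return bytes_per_token * tokens_per_s / 1e12  # bytes/s -> TB/s


# 70B parameters at 240 tokens/s, for a few common weight precisions.
for name, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    tb_s = required_bandwidth_tb_per_s(70, 240, bytes_per_param)
    print(f"{name}: {tb_s:.1f} TB/s")
```

With batching the weights are shared across requests, so aggregate tokens/s can be much higher than this per-stream bound suggests.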

In the same spirit, LLMs are memory-bound, so what possible hardware advantage can a chip firm have? Buying faster memory?



