Hacker News
aurareturn | 78 days ago | on: Llama 3.1 405B now runs at 969 tokens/s on Cerebra...
Each Cerebras wafer-scale chip has 44 GB of SRAM. You need 972 GB of memory to run Llama 405B at fp16. So you need about 22 of these (972 / 44 ≈ 22.1, so strictly 23 if everything has to fit in SRAM).
I assume they're using SRAM only to achieve this speed and not HBM.
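
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The 44 GB and 972 GB figures are the ones quoted above; the function names are made up for illustration, and the guess that 972 GB is the raw fp16 weights (405B parameters at 2 bytes each, 810 GB) plus roughly 20% overhead is my assumption, not something stated in the comment.

```python
import math

def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Raw weight size in GB (decimal): billions of params * bytes per param."""
    return params_billion * bytes_per_param

def chips_needed(total_gb: float, sram_per_chip_gb: float) -> int:
    """Whole chips required if the full footprint must sit in on-chip SRAM."""
    return math.ceil(total_gb / sram_per_chip_gb)

SRAM_PER_CHIP_GB = 44   # figure quoted in the comment
TOTAL_MEMORY_GB = 972   # figure quoted in the comment

weights_gb = weight_memory_gb(405, 2)        # fp16 = 2 bytes per parameter
overhead_factor = TOTAL_MEMORY_GB / weights_gb  # assumed origin of the 972 GB figure

print(f"fp16 weights only: {weights_gb} GB")
print(f"implied overhead factor: {overhead_factor:.2f}")
print(f"chips for weights only: {chips_needed(weights_gb, SRAM_PER_CHIP_GB)}")
print(f"chips for 972 GB: {chips_needed(TOTAL_MEMORY_GB, SRAM_PER_CHIP_GB)}")
```

Rounding up matters here because a fraction of a wafer is not an option: 972 / 44 is a little over 22, so the strict count lands one above the "22" estimate.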