aurareturn 78 days ago | on: Llama 3.1 405B now runs at 969 tokens/s on Cerebra...

Each Cerebras wafer-scale chip has 44 GB of SRAM. You need 972 GB of memory to run Llama 405B at FP16. So you need about 22 of these (972 / 44 ≈ 22.1).
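
For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. The 44 GB and 972 GB figures are the ones quoted above. The 405e9 parameter count, the 2 bytes per FP16 parameter, and decimal gigabytes are my assumptions, and reading the gap between raw weights and 972 GB as runtime overhead (KV cache, activations) is only a guess.

```python
import math

# Figures quoted in the comment
SRAM_PER_CHIP_GB = 44      # on-chip SRAM per wafer-scale chip
TOTAL_MEMORY_GB = 972      # stated memory needed for Llama 405B at FP16

# Assumptions for illustration (not stated in the comment)
PARAMS = 405e9             # nominal parameter count
BYTES_PER_PARAM = 2        # FP16 uses 2 bytes per parameter

# Raw weight footprint in decimal gigabytes
weights_gb = PARAMS * BYTES_PER_PARAM / 1e9

# How much larger the quoted total is than the raw weights
overhead_factor = TOTAL_MEMORY_GB / weights_gb

# Chip count: exact ratio, then rounded up to whole chips
chips_exact = TOTAL_MEMORY_GB / SRAM_PER_CHIP_GB
chips_needed = math.ceil(chips_exact)

# Chip count if only the raw weights had to fit in SRAM
weights_only_chips = math.ceil(weights_gb / SRAM_PER_CHIP_GB)

print(f"weights alone: {weights_gb:.0f} GB")
print(f"quoted total / weights: {overhead_factor:.2f}x")
print(f"chips for quoted total: {chips_exact:.2f} -> {chips_needed}")
print(f"chips for weights only: {weights_only_chips}")
```

Under those assumptions the weights alone come to 810 GB, so 972 GB is about 1.2x the raw weights. Dividing by 44 GB gives roughly 22.1, so strictly rounding up to whole chips would mean 23 rather than 22, and 19 if only the weights had to fit.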

I assume they're using SRAM only to achieve this speed and not HBM.
