cma, 81 days ago, on: Llama 3.1 405B now runs at 969 tokens/s on Cerebra...
It is worth splitting out the stacked memory silicon layers on both too (if Cerebras is set up with external DRAM). HBM is over 10 layers now, so the total die area is a good bit more than the chip area, but different process nodes are involved.