They are doing it with custom silicon that has several times the area of 8x H100s. I'm sure they are doing some sort of optimization at execution/runtime, but the primary difference is the sheer transistor count.

https://cerebras.ai/product-chip/




To be specific, a single WSE-3 has the same die area as about 57 H100s. It's a big chip.
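The "about 57" figure checks out from the published die sizes (Cerebras lists the WSE-3 at roughly 46,225 mm²; the H100's GH100 die is roughly 814 mm²). A quick sketch of the arithmetic, using those approximate public figures:

```python
# Rough die-area comparison using approximate published figures:
# Cerebras WSE-3: ~46,225 mm^2; NVIDIA H100 (GH100 die): ~814 mm^2.
WSE3_AREA_MM2 = 46_225
H100_AREA_MM2 = 814

ratio = WSE3_AREA_MM2 / H100_AREA_MM2
print(f"WSE-3 area is about {ratio:.0f}x an H100 die")  # about 57x
```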


It is worth splitting out the stacked memory silicon on both sides of the comparison too (assuming Cerebras is set up with external DRAM). HBM stacks are over 10 DRAM layers now, so the total die area is a good bit more than the package's chip footprint, though the DRAM layers are built on different (and cheaper) process nodes than the logic die.
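The point above, that stacked memory pushes total silicon area well past the visible chip footprint, can be sketched with purely illustrative numbers (the stack count, layer count, and per-die areas below are placeholder assumptions, not vendor specs):

```python
# Illustrative sketch: total silicon area of an HBM-equipped accelerator.
# All figures below are placeholder assumptions, not official specs.
logic_die_mm2 = 814      # assumed GPU logic die area
hbm_stacks = 6           # assumed number of HBM stacks on the package
layers_per_stack = 12    # modern HBM stacks exceed 10 DRAM layers
dram_die_mm2 = 110       # assumed per-layer DRAM die area

memory_area = hbm_stacks * layers_per_stack * dram_die_mm2
total_area = logic_die_mm2 + memory_area
print(f"memory silicon: {memory_area} mm^2, total: {total_area} mm^2")
# Under these assumptions the stacked DRAM dies dominate total silicon,
# even though the package footprint is set mostly by the logic die.
```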


Amazing!



