cma 81 days ago | on: Llama 3.1 405B now runs at 969 tokens/s on Cerebra...

It is worth splitting out the stacked memory silicon layers on both too (if Cerebras is set up with external DRAM). HBM stacks are over 10 layers now, so the total die area is a good bit more than the chip's footprint, but different process nodes are involved.
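
To make the arithmetic concrete, here is a rough sketch of how counting every stacked layer changes the total. All numbers are made-up placeholders picked only to show the calculation, not specs for any real part, and the function name is hypothetical. It also lumps logic and memory silicon together, which ignores the process-node difference mentioned above.

```python
def total_silicon_area_mm2(logic_die_mm2, stack_footprint_mm2,
                           layers_per_stack, num_stacks):
    """Sum the logic die area and the area of every stacked memory layer."""
    memory_mm2 = stack_footprint_mm2 * layers_per_stack * num_stacks
    return logic_die_mm2 + memory_mm2


# Placeholder inputs (assumptions for illustration, not real measurements).
logic = 800.0      # logic die area, mm^2
footprint = 100.0  # footprint of one memory stack, mm^2
layers = 12        # DRAM layers per stack
stacks = 6         # number of memory stacks

# Area seen from above: each stack counted once.
flat = logic + footprint * stacks

# Silicon actually fabricated: each layer in each stack counted.
stacked = total_silicon_area_mm2(logic, footprint, layers, stacks)

print(f"footprint area: {flat:.0f} mm^2, total silicon: {stacked:.0f} mm^2")
```

With these placeholder inputs the memory layers dominate the total, which is the point of splitting them out before comparing silicon area across the two designs.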