I'm not sure I follow - 614 GB/sec is pretty squarely in dGPU territory (~5070 l...

bigyabai · 2026-06-26T00:27:11

That's competitive with 16-24 GB dGPUs, but for 100 GB+ inference workloads the memory bandwidth is going to bottleneck decode. Smaller models would be fine, but the same goes for the smaller GPUs.
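
To put rough numbers on the decode side, here is a back-of-envelope sketch. It assumes a dense model whose weights are all read from memory once per generated token, and it ignores KV cache reads, batching, and MoE sparsity, so treat the result as a ceiling rather than a benchmark. The function name and the list of sizes are mine, chosen for illustration.

```python
# Back-of-envelope decode ceiling for a memory-bandwidth-bound dense model.
# Assumption: every generated token streams all weights from memory once.

def decode_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on single-stream decode speed, in tokens per second."""
    return bandwidth_gb_s / weights_gb

BANDWIDTH_GB_S = 614  # figure quoted in the parent comment

for weights_gb in (16, 24, 100):
    rate = decode_tokens_per_sec(BANDWIDTH_GB_S, weights_gb)
    print(f"{weights_gb:>3} GB of weights: at most ~{rate:.1f} tok/s")
```

Under those assumptions the same 614 GB/sec that gives a comfortable ceiling at 16-24 GB drops to roughly 6 tok/s at 100 GB.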

The fatal bottleneck, though, is the weak iGPU. Filling the KV cache for a 100 GB+ model could take a few minutes, or even hours if you're trying to restore a 256K-to-1M-token session.
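
For a sense of scale on prefill, here is a rough sketch using the common approximation of about 2 FLOPs per parameter per prompt token for a dense model. Everything numeric in it is an assumption on my part: 100 GB of weights taken as roughly 100B parameters at 8 bits each, and a hypothetical 20 TFLOPS of sustained iGPU throughput (not a measured figure for any real chip). It also leaves out the attention cost, which grows with context length, so long-context numbers are optimistic.

```python
# Rough prefill-time estimate for a compute-bound dense model.
# Assumption: ~2 FLOPs per parameter per prompt token, attention cost ignored.

def prefill_seconds(params: float, prompt_tokens: int, sustained_flops: float) -> float:
    """Approximate time to process a prompt and fill the KV cache, in seconds."""
    return 2 * params * prompt_tokens / sustained_flops

PARAMS = 100e9            # assumed: ~100 GB of weights at 8 bits per parameter
SUSTAINED_FLOPS = 20e12   # hypothetical sustained iGPU throughput, not measured

for tokens in (8_000, 256_000, 1_000_000):
    secs = prefill_seconds(PARAMS, tokens, SUSTAINED_FLOPS)
    print(f"{tokens:>9,} tokens: ~{secs / 60:.0f} min")
```

With those placeholder numbers, a 256K-token restore lands around 43 minutes and a 1M-token one is closer to 3 hours, before counting attention. Swap in whatever throughput you believe the iGPU actually sustains; the scaling is linear in both tokens and parameters.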