
Bandwidth and latency are as much of a bottleneck as capacity is. That's why recent ML and GPU chips have moved to on-package "high-bandwidth memory." Even if you were to add more high-bandwidth memory off-package (via lots of parallel DIMMs, say), it's difficult to actually manage that memory well, because of a host of competing factors, including:

a) on-chip resources (controllers, etc.) that must consume area and power to manage the new memory,

b) increased design cost to validate handling the extra (crappier-than-HBM) memory,

c) software not knowing how to hide this memory's much higher latency (deep-learning workloads aren't as good at exploiting memory hierarchies as CPUs are),

d) a chip with spare on-chip bus bandwidth left over to feed this kind of memory has, in effect, wasted bus bandwidth.

Et cetera.
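For a rough sense of why bandwidth is the binding constraint, here's a back-of-the-envelope roofline sketch; the peak-FLOP and bandwidth figures are illustrative assumptions, not any particular chip:

    # Roofline sketch in Python; all numbers are made-up assumptions.
    peak_flops = 300e12    # 300 TFLOP/s of matrix compute (assumed)
    hbm_bw     = 2e12      # 2 TB/s of on-package HBM bandwidth (assumed)
    dimm_bw    = 0.2e12    # 0.2 TB/s from a hypothetical bank of off-package DIMMs

    # Arithmetic intensity (FLOPs per byte moved) needed to stay compute-bound:
    print(peak_flops / hbm_bw)     # 150 FLOP/byte against HBM
    print(peak_flops / dimm_bw)    # 1500 FLOP/byte against the DIMMs

Big matmuls can hit the first number by reusing tiles out of on-chip SRAM; bandwidth-bound steps that stream data through once cannot, so every byte served from the slower off-package pool caps throughput regardless of how much capacity it adds.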

So if you are building GPUs or AI accelerators, you tend to just go ahead and put the memory on-package.



