TPUs, LPUs, and other custom ASICs already exist. I also assume Nvidia's AI-focused offerings have optimizations that suit these usage patterns and avoid spending silicon on the typical graphics rendering pipeline.
My understanding is that the biggest issue is keeping the model fed with data, especially during training across multiple GPUs/VMs, which the bigger models require. A sketch of the usual mitigation follows below.
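To make that concrete, here's a minimal PyTorch sketch of the standard single-node mitigation: parallel CPU workers, pinned host memory, and prefetching so batches are staged ahead of the GPU instead of the GPU waiting on the input pipeline. The dataset and all parameter values are made up for illustration; real multi-GPU training would layer DDP and a distributed sampler on top of this.

    import torch
    from torch.utils.data import Dataset, DataLoader

    class SyntheticDataset(Dataset):
        # Stand-in for a real dataset; __getitem__ is where
        # disk I/O and decoding costs would normally live.
        def __len__(self):
            return 10_000
        def __getitem__(self, idx):
            return torch.randn(3, 224, 224), idx % 10

    loader = DataLoader(
        SyntheticDataset(),
        batch_size=256,
        num_workers=8,       # CPU workers prepare batches in parallel
        pin_memory=True,     # page-locked memory enables async host-to-device copies
        prefetch_factor=2,   # each worker keeps 2 batches queued ahead of the GPU
    )

    device = "cuda" if torch.cuda.is_available() else "cpu"
    for images, labels in loader:
        # non_blocking overlaps the copy with ongoing GPU compute
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        # ... forward/backward pass would go here ...

If the GPU still stalls after this, the bottleneck has typically moved to storage or (across VMs) the network, which is where the really expensive interconnect hardware comes in.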