But I can say for sure that Intel iGPUs, AMD GCN, AMD RDNA, AMD CDNA, and multiple NVIDIA generations of GPUs all have hyperthread-like rescheduling of independent workgroups.
In fact, something like 8 wavefronts / warps are kept in flight at once on modern GPUs. When one wavefront / warp stalls on a memory read/write (or a PCIe read/write), the GPUs universally "hyperthread out": they switch to another wavefront / warp that is ready to run, which hides the latency.
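To make the latency-hiding argument concrete, here is a toy sketch in Python. It is not a model of any real GPU: the function name `simulate`, the single issue slot, the "every instruction is followed by a fixed memory stall" rule, and the numbers are all assumptions chosen for illustration.

```python
def simulate(num_warps, mem_latency, instructions_per_warp):
    """Toy model (assumption): each instruction takes 1 issue cycle and then
    stalls its warp for `mem_latency` cycles waiting on memory."""
    ready_at = [0] * num_warps                      # cycle when each warp may issue again
    remaining = [instructions_per_warp] * num_warps # instructions left per warp
    cycle = 0
    issued = 0
    while any(remaining):
        # Switch to the first warp that is ready and still has work.
        for w in range(num_warps):
            if remaining[w] and ready_at[w] <= cycle:
                remaining[w] -= 1
                issued += 1
                ready_at[w] = cycle + 1 + mem_latency  # stalled until the load returns
                break
        cycle += 1  # the cycle passes whether or not anything was issued
    return issued / cycle  # fraction of cycles in which the issue slot was busy


if __name__ == "__main__":
    print("1 warp :", simulate(1, 7, 100))
    print("8 warps:", simulate(8, 7, 100))
```

With a stall of 7 cycles, a single warp keeps the issue slot busy only about 13% of the time, while 8 warps keep it busy every cycle. That is the whole trick: enough independent warps in flight and the stalls overlap.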
It's "different" from how CPUs do it, but the fundamental principles are the same. (CPUs keep a redundant set of registers for each hardware thread, tracked in a register file. GPUs, on the other hand, have one large set of registers, and the kernel scheduler (or whatever handles CUDA streams) carefully assigns those registers so that they don't conflict with any running wavefronts.)
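The register-partitioning point can be sketched as simple arithmetic. This is a rough illustration only: `max_resident_warps` is a hypothetical helper, the register file size (65536), the warp width (32), and the warp cap (64) are assumed example values, and real hardware also has allocation granularity and other limits that are ignored here.

```python
def max_resident_warps(regfile_regs, regs_per_thread,
                       threads_per_warp=32, hw_warp_limit=64):
    """How many warps fit if the register file is split into
    non-overlapping slices, one slice per resident warp.
    All default values are illustrative assumptions."""
    regs_per_warp = regs_per_thread * threads_per_warp
    return min(hw_warp_limit, regfile_regs // regs_per_warp)


if __name__ == "__main__":
    for regs in (32, 128, 255):
        print(regs, "regs/thread ->", max_resident_warps(65536, regs), "warps")
```

Under these assumed numbers, a kernel using 32 registers per thread leaves room for 64 warps, while one using 128 registers per thread leaves room for only 16. Fewer resident warps means fewer candidates to switch to when one stalls.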
The quoted statement is blatantly false, at least for Intel, AMD, and NVIDIA GPUs. Maybe Apple iGPUs are built differently, but I find that unlikely.
> The quoted statement is blatantly false, at least for Intel, AMD, and NVIDIA GPUs. Maybe Apple iGPUs are built differently, but I find that unlikely.
The statement is for Apple GPUs only; that’s the whole point. Software can be easily ported to Metal (in a weekend, according to Roblox devs), but until it’s optimised for TBDR (tile-based deferred rendering) it will underperform.