But I can say for sure that Intel iGPUs, AMD GCN, AMD RDNA, AMD CDNA, and multiple NVidia-generations of GPUs all have hyperthread-like rescheduling of independent workgroups.
In fact, something like 8x wavefronts / warps run in parallel on modern GPUs. When one wavefront / warp stalls due to a memory read/write (or a PCIe read/write), the GPUs universally "hyperthread-out" and hide the latency.
Its "different" from how CPUs do it, but the fundamental principals are the same. (CPUs have a redundant set of registers tracked in a register file. GPUs on the other hand, have a set of registers and the kernel-scheduler (or whatever handles CUDAstreams) carefully assigns those registers to not conflict with any running wavefronts).
-------
The statement so listed is blatantly false, at least for Intel, AMD, and NVidia GPUs. Maybe Apple iGPUs are built different, but I find that unlikely.
> The statement so listed is blatantly false, at least for Intel, AMD, and NVidia GPUs. Maybe Apple iGPUs are built different, but I find that unlikely.
The statement is for Apple GPUs only, that’s the whole point. Software can be easily ported to Metal (in a weekend according to Roblox devs) but until it’s optimised for TBDR it will underperform.
But I can say for sure that Intel iGPUs, AMD GCN, AMD RDNA, AMD CDNA, and multiple NVidia-generations of GPUs all have hyperthread-like rescheduling of independent workgroups.
In fact, something like 8x wavefronts / warps run in parallel on modern GPUs. When one wavefront / warp stalls due to a memory read/write (or a PCIe read/write), the GPUs universally "hyperthread-out" and hide the latency.
Its "different" from how CPUs do it, but the fundamental principals are the same. (CPUs have a redundant set of registers tracked in a register file. GPUs on the other hand, have a set of registers and the kernel-scheduler (or whatever handles CUDAstreams) carefully assigns those registers to not conflict with any running wavefronts).
-------
The statement so listed is blatantly false, at least for Intel, AMD, and NVidia GPUs. Maybe Apple iGPUs are built different, but I find that unlikely.