GPUs in particular have a very hyperthreading/SMT-like model, where multiple true threads (aka instruction pointers) are juggled while waiting for RAM to respond.
Still, the intermediate organizational step where SIMD gives you a simpler form of parallelism (one instruction pointer, many data lanes) is underrated and understudied, IMO.
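
To make the "simpler form of parallelism" concrete, here's a toy sketch in plain Python. It isn't real SIMD, and `LANES` and `simd_abs_then_double` are names I made up for illustration. The point is that there's a single instruction stream, and a branch turns into a per-lane mask rather than a second thread:

```python
# Toy model of SIMD: one instruction pointer drives every lane in lockstep.
# A branch becomes a per-lane mask instead of a separate thread.

LANES = 8  # hypothetical vector width, chosen only for illustration


def simd_abs_then_double(values):
    """Apply `if x < 0: x = -x` then `x *= 2` to all lanes at once."""
    if len(values) != LANES:
        raise ValueError(f"expected {LANES} lanes, got {len(values)}")
    # Step 1: a single compare produces one mask bit per lane.
    mask = [v < 0 for v in values]
    # Step 2: the branch body is computed for every lane;
    # the mask decides which lanes keep the result.
    negated = [-v for v in values]
    merged = [n if m else v for v, n, m in zip(values, negated, mask)]
    # Step 3: the unconditional instruction runs on every lane.
    return [2 * x for x in merged]


result = simd_abs_then_double([3, -1, 4, -1, 5, -9, 2, -6])
print(result)  # [6, 2, 8, 2, 10, 18, 4, 12]
```

Contrast that with the SMT-style model, where each thread would carry its own instruction pointer and could take the branch independently.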