
SIMD, SIMT, SMT - parallelism in NVIDIA GPUs - PopaL
http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
======
zheng
A very good and reasonably approachable discussion of how CUDA programming is
actually realized in hardware, with its pros and cons. The explanation of how
GPUs handle context switching is particularly thoughtful and enlightening. It
took me a long time to figure this out a couple of months ago; a guide like
this would have saved me a few nights.

I was surprised that the author didn't once use the term CUDA, though: they
even discuss actual syntax from it, but never mention the language (extension)
by name.

~~~
DiabloD3
Probably because the hardware is language agnostic. You would consider how the
hardware works no matter whether you were using GLSL (GL's shader language),
HLSL (D3D's shader language), OpenCL (the compute sister language to GL/GLSL),
DirectCompute (the compute sister language to D3D/HLSL), or CUDA.

~~~
PopaL
I really doubt you could use CUDA on AMD or Intel GPUs; to my knowledge, CUDA
runs only on NVIDIA platforms.

------
radarsat1
Very nice article. In my limited experience with OpenCL programming, the most
difficult thing is understanding how memory access patterns affect
performance. It doesn't help that the optimal patterns can differ from one
platform to another.
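
To illustrate the point about access patterns, here is a minimal CUDA sketch
(hypothetical kernel names, not from the article): two kernels that do the
same copy, differing only in indexing. In the first, consecutive threads in a
warp touch consecutive addresses, so the hardware can coalesce their loads
into a few wide memory transactions; in the second, each thread's load lands
in a different memory segment, wasting most of the fetched bandwidth.

```cuda
// Coalesced: thread i reads element i, so a warp's 32 loads are merged
// by the memory controller into a small number of wide transactions.
__global__ void copy_coalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: thread i reads an element `stride` apart from its neighbor's,
// so each warp load scatters across many segments; effective bandwidth
// drops even though the kernel does the same amount of "work".
__global__ void copy_strided(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[(i * stride) % n];
}
```

How large the penalty is depends on the hardware generation, which is exactly
the portability problem described above.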

I wonder if what's needed is a higher-level representation that can compile to
the best access patterns for the given hardware. (And something that can try
several access patterns for your problem and choose the most efficient one.)
GPU programming is still quite new, so I guess it's bound to show up
eventually.

Even if it couldn't handle _all_ possible situations, such a tool would still
be useful; you could drop down to the CUDA/OpenCL level for the problems that
are too difficult to express declaratively.
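
The "try several access patterns and pick the fastest" idea can be sketched in
a few lines of Python (the function and candidate names here are made up for
illustration; a real autotuner would time GPU kernels rather than CPU loops,
but the selection logic is the same):

```python
import time

def sum_row_major(matrix):
    # Traverse rows first: contiguous access for row-major storage.
    total = 0
    for row in matrix:
        for x in row:
            total += x
    return total

def sum_col_major(matrix):
    # Traverse columns first: strided access for row-major storage.
    total = 0
    for j in range(len(matrix[0])):
        for row in matrix:
            total += row[j]
    return total

def pick_fastest(candidates, arg, repeats=3):
    """Time each candidate implementation on `arg`, return the fastest."""
    best_fn, best_time = None, float("inf")
    for fn in candidates:
        elapsed = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn(arg)
            elapsed = min(elapsed, time.perf_counter() - t0)
        if elapsed < best_time:
            best_fn, best_time = fn, elapsed
    return best_fn

matrix = [[1] * 256 for _ in range(256)]
fastest = pick_fastest([sum_row_major, sum_col_major], matrix)
```

Which candidate wins depends on the machine, which is the whole point: the
tuner measures rather than assumes.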

