
How to isolate an algorithm with CUDA - zeryx
http://blog.zeryx.me/cuda/2015/09/15/dynamic-kernel-instructions.html
======
wyldfire
if(commandQueue[itr].first().def== typeHidden ... else
if(commandQueue[itr].first().def == typeMemGateIn)

I don't know if it's still the case, but in the past CUDA/OpenCL kernels would
execute the work for every path in the control-flow graph (CFG) and only write
the results of the path actually taken to global memory.

~~~
zeryx
wyldfire: good point, I should have brought that up! For my use case (neural
network design) there was no divergence between threads, so every thread ran
exactly the same path (the for loop is unrolled at runtime by the compiler).
If my if/else if block had divergent paths, though, you would be correct.
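
To make the distinction concrete, here is a rough host-side sketch (plain C++,
not real device code) of how a warp handles an if/else. It is a toy model under
stated assumptions: a warp is 32 lanes sharing one instruction stream, and a
branch body has to be issued once for each side that at least one lane takes.
NodeType, branchPassesIssued, uniformWarp and divergentWarp are hypothetical
names made up for this illustration; they are not from the article's code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model only: 32 lanes that execute in lockstep, as in a CUDA warp.
constexpr std::size_t kWarpSize = 32;

// Hypothetical stand-ins for the typeHidden / typeMemGateIn values
// compared in the quoted if/else chain.
enum class NodeType { Hidden, MemGateIn };

using Warp = std::array<NodeType, kWarpSize>;

// Counts how many branch bodies the warp must issue for one if/else:
// 1 when every lane agrees on the condition, 2 when the lanes diverge
// (each side is then run with the non-participating lanes masked off).
int branchPassesIssued(const Warp& laneTypes) {
    bool anyHidden = false;
    bool anyMemGateIn = false;
    for (NodeType t : laneTypes) {
        if (t == NodeType::Hidden) {
            anyHidden = true;
        } else {
            anyMemGateIn = true;
        }
    }
    return (anyHidden ? 1 : 0) + (anyMemGateIn ? 1 : 0);
}

// Every lane sees the same command type: the no-divergence case.
Warp uniformWarp(NodeType t) {
    Warp w{};
    w.fill(t);
    return w;
}

// Even lanes see Hidden, odd lanes see MemGateIn: the divergent case.
Warp divergentWarp() {
    Warp w{};
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        w[lane] = (lane % 2 == 0) ? NodeType::Hidden : NodeType::MemGateIn;
    }
    return w;
}
```

In this model branchPassesIssued(uniformWarp(NodeType::Hidden)) gives 1 and
branchPassesIssued(divergentWarp()) gives 2. The first is the situation where
every thread reads the same command queue entry; the second is the case where
the warp pays for both sides of the branch.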

