How to isolate an algorithm with CUDA

wyldfire · on Sept 16, 2015

if(commandQueue[itr].first().def == typeHidden ... else if(commandQueue[itr].first().def == typeMemGateIn)

I don't know if it's still the case, but in the past CUDA/OpenCL kernels would do the execution work for every path in the control-flow graph (CFG) and only write the results for the path actually taken to global memory.

zeryx · on Sept 16, 2015

wyldfire: good point, I should have brought that up! For my use case (neural network design) there was no divergence between threads, so every thread in the kernel ran exactly the same path (the for loop is unrolled at runtime by the compiler). If my if/else if block had divergent paths, though, you would be correct.
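To make the divergence point concrete, here is a rough host-side C++ sketch of the cost model, not real CUDA and not the code from the thread. It assumes the usual SIMT behaviour: threads execute in warps of 32, and a branch body is stepped through once per warp if any lane takes it, with the other lanes masked off. `CmdType`, `Warp`, and `branchPasses` are made-up names for illustration; the two enum values only stand in for `typeHidden` and `typeMemGateIn`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed warp width (32 on NVIDIA hardware).
constexpr std::size_t kWarpSize = 32;

// Hypothetical stand-ins for the typeHidden / typeMemGateIn cases.
enum class CmdType { Hidden, MemGateIn };

// One command type per lane of a single warp.
using Warp = std::array<CmdType, kWarpSize>;

// Toy model: count how many branch bodies the warp has to step through.
// A body is counted once if at least one lane takes it; lanes that did not
// take it would sit idle (masked off) during that pass.
int branchPasses(const Warp& laneCmd) {
    bool anyHidden = false;
    bool anyMemGateIn = false;
    for (CmdType c : laneCmd) {
        if (c == CmdType::Hidden) {
            anyHidden = true;
        } else {
            anyMemGateIn = true;
        }
    }
    return static_cast<int>(anyHidden) + static_cast<int>(anyMemGateIn);
}

void demo() {
    // Every lane takes the same branch: one pass, no divergence cost.
    Warp uniform;
    uniform.fill(CmdType::Hidden);
    assert(branchPasses(uniform) == 1);

    // A single lane takes the other branch: the warp now walks both bodies.
    Warp divergent = uniform;
    divergent[0] = CmdType::MemGateIn;
    assert(branchPasses(divergent) == 2);
}
```

In this model the uniform case pays for one branch body and the mixed case pays for both, which is the difference between the two situations described above. Real hardware and compilers are more subtle (short branches may be turned into predicated instructions, for example), so treat this only as an approximation.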