How to isolate an algorithm with CUDA (zeryx.me)
2 points by zeryx on Sept 16, 2015 | 2 comments



if(commandQueue[itr].first().def == typeHidden ... else if(commandQueue[itr].first().def == typeMemGateIn)

I don't know if it's still the case, but in the past CUDA/OpenCL kernels would do the execution work for every path in the control-flow graph (CFG) and only write the results of the path actually taken to global memory.


wyldfire: good point, I should have brought that up! For my use case (neural network design) there was no divergence between threads, so each kernel ran exactly the same path (the for loop is unrolled at runtime by the compiler), but if my if/else if block had divergent paths, you would be correct.
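
To make the distinction concrete, here is a minimal sketch of the two cases discussed above. It is not code from the article: the kernel names `divergentBranch` and `uniformBranch`, the arithmetic in each branch, and the launch sizes are all made up for illustration. The first kernel branches on a condition that differs between neighbouring threads, so each warp has threads on both sides of the if/else. The second branches on a condition that is the same for every thread in a warp, which is the situation described for the neural network case.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example: the condition alternates between neighbouring
// threads, so every warp contains threads on both sides of the if/else.
__global__ void divergentBranch(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (threadIdx.x % 2 == 0) {
        out[i] = 2.0f * i;
    } else {
        out[i] = 0.5f * i;
    }
}

// Hypothetical example: the condition is constant within each warp,
// so all threads of a warp take the same side of the if/else.
__global__ void uniformBranch(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if ((threadIdx.x / warpSize) % 2 == 0) {
        out[i] = 2.0f * i;
    } else {
        out[i] = 0.5f * i;
    }
}

int main() {
    // Sizes are arbitrary; threads per block is a multiple of the warp size.
    const int n = 1 << 20, threads = 256, blocks = n / threads;
    float *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    divergentBranch<<<blocks, threads>>>(d_out, n);
    uniformBranch<<<blocks, threads>>>(d_out, n);

    // Check for launch errors first, then for errors during execution.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Assuming the file is saved as `divergence.cu`, it should build with `nvcc divergence.cu -o divergence`. Both kernels are functionally valid; any difference between them would show up only in a profiler. With branch bodies this small the compiler may well replace the branch with predicated instructions, so treat this as an illustration of the two access patterns rather than a benchmark.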




