
CUDA 7.5: FP16, GPU Lambdas, GPU Instruction-Level Profiling - bsprings
http://devblogs.nvidia.com/parallelforall/new-features-cuda-7-5/
======
krapht
CUDA has been on my list of things to learn for quite some time. The ecosystem
is just so much nicer than OpenCL... if I didn't have to support non-Nvidia
GPUs, I'd have already gotten on the bandwagon.

~~~
dman
I've been using OpenCL recently, and have had a pretty nice experience
generally. Are there any things in particular from the CUDA ecosystem that you
miss?

~~~
krapht
I don't know much about CUDA, just what I see from my co-workers who do get to
use it. The Visual Studio integration looks amazing. The C++ API is cleaner.
People who have used both tell me there are lots of little quality-of-life
API things that are better.

------
varelse
Lots of good stuff, and some not so good stuff:

"Deprecated: legacy (environment variable-based) command-line profiler. Use
the more capable nvprof command-line profiler instead."

IMO there are way too many use cases for the command-line profiler to get rid
of it, probably ever.

~~~
sipherhex
nvprof is still a command-line profiler, and has even more features.

~~~
varelse
And there are a lot of cases where it doesn't work, particularly with
elaborate MPI scenarios and over a network/VPN. Specifically, I do not wish to
jump through hoops to enable remote profiling over heavily IT-restricted
networks.

For simple apps, nvprof is great. For real low-level blood and guts CUDA
optimization, the command-line profiler is still indispensable. Killing it is
enough reason for me to go code FPGAs in OpenCL instead of GPUs in CUDA.

~~~
bsprings
Hi varelse, can you tell me more about your profiling use case? nvprof should
support MPI profiling scenarios, but perhaps yours is different. I'd love to
know details so I can help improve the product. Feel free to contact me at
first initial last name at nvidia.com (name is Mark Harris).

