CUDA has been on my list of things to learn for quite some time. The ecosystem is just so much nicer than OpenCL's... if I didn't have to support non-Nvidia GPUs, I'd have already gotten on the bandwagon.
I've been using OpenCL recently and have had a pretty nice experience generally. Are there any things in particular from the CUDA ecosystem that you miss?
I don't know much about CUDA, just what I see from my co-workers who do get to use it. The Visual Studio integration looks amazing. The C++ API is cleaner. And people who have used both tell me there are lots of little quality-of-life things in the API that are better.
Any problem where you have large vectors or matrices of floats and must add, subtract, multiply, or divide them in parallel for performance reasons. A GPU is like a miniature datacenter in your computer!
But, unlike a real datacenter, it's only good for floats. Fortunately, many financial, scientific, and gaming applications are bottlenecked only by the need to crunch floats, not arbitrary data types.
My mental model of a GPU is a device that has a million billion miniature 4-function calculators inside it that can all do things at once.
"unlike a real datacenter, it's only good for floats." That's actually not true. You'll find integer throughput is rather high on GPUs also. And the memory bandwidth is very high too. "miniature 4-function calculators": make that 5, where the fifth function is a really fast special function unit that can do very fast sin, cos, sqrt, 1/sqrt, and many other functions.
If you are a web developer, you might try WebGL? Same ideas, different packaging focused on graphics. Once you get the hang of it, you've got the language paradigm to understand CUDA, OpenCL, etc.
If you write business software... I'm so sorry. But I've heard things about GPU-accelerated SQL queries? I don't know much. I've never programmed a database, ever, except small SQL queries here and there.
Yeah, I tend to work on the latter and am trying to get more of the former. I used to work at a color science company, and I wanted to see if it were possible to use CUDA for our math optimizations, but the kind we did wasn't all that intense and there wasn't much traction for doing any kind of GPGPU work. I'm back in finance now, and I'm sure there are opportunities for it here, but the amount of red tape we have is ridiculous (e.g., we can't even get source control for the C++ programs we manage).
Ugh, totally agree. They need to open source this stack and get all vendors on it. It's one of those things that is really holding back GPGPU innovation.
If you've been following Docker at all, they just did this by donating runc to a new effort called the Open Container Project. NVIDIA should develop a reference x86 implementation of CUDA, something that works great on multicore CPUs, and then donate it to the OpenCL effort or something similar. Or maybe OpenCL catches up soon.
And there are a lot of cases where it doesn't work, specifically with elaborate MPI scenarios and over a network/VPN. In particular, I do not wish to jump through hoops to enable remote profiling over heavily IT-restricted networks.
For simple apps, nvprof is great. For real low-level blood and guts CUDA optimization, the command-line profiler is still indispensable. Killing it is enough reason for me to go code FPGAs in OpenCL instead of GPUs in CUDA.
Hi varelse, can you tell me more about your profiling use case? nvprof should support MPI profiling scenarios, but perhaps yours is different. I'd love to know details so I can help improve the product. Feel free to contact me at first initial last name at nvidia.com (name is Mark Harris).
To use nvprof with MPI, you just need to ensure nvprof is available on the cluster nodes and run it as your mpirun target, e.g. "mpirun ... nvprof ./my_mpi_program"
You can have it dump its output to files that the NVIDIA Visual Profiler (NVVP) is able to load. You can even load the output from multiple MPI ranks into NVVP to visualize them on the same timeline, making it easier to spot issues.
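A sketch of how that per-rank workflow can look, assuming Open MPI (the `OMPI_COMM_WORLD_RANK` environment variable is Open MPI-specific) and nvprof available on every node; nvprof's `%q{VAR}` output-name substitution expands an environment variable per process, so each rank writes its own file:

```shell
# Each MPI rank writes its own profile, named by its rank.
mpirun -np 4 nvprof -o profile.%q{OMPI_COMM_WORLD_RANK}.nvprof ./my_mpi_program

# Afterward, import the profile.*.nvprof files into the NVIDIA Visual
# Profiler (NVVP) to see all ranks on a single shared timeline.
```

Under a different MPI implementation, substitute that implementation's rank variable (e.g. MVAPICH2 exposes its own) in the `%q{...}` expression.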