CUDA has been on my list of things to learn for quite some time. The ecosystem is just so much nicer than OpenCL's... if I didn't have to support non-Nvidia GPUs, I'd have already gotten on the bandwagon.
I've been using OpenCL recently and have had a pretty nice experience generally. Are there any things in particular from the CUDA ecosystem that you miss?
I don't know much about CUDA, just what I see from my co-workers who do get to use it. The Visual Studio integration looks amazing. The C++ API is cleaner. And people who have used both tell me there are lots of little quality-of-life things in the API that are better.
Any problem where you have large vectors or matrices of floats and must add, subtract, multiply, or divide them in parallel for performance reasons. A GPU is like a miniature datacenter in your computer!
But, unlike a real datacenter, it's only good for floats. Fortunately, many financial, scientific, and gaming applications are bottlenecked only by the need to crunch floats, not arbitrary data types.
My mental model of a GPU is a device that has a million billion miniature 4-function calculators inside it that can all do things at once.
"unlike a real datacenter, it's only good for floats." That's actually not true. You'll find integer throughput is rather high on GPUs also. And the memory bandwidth is very high too. "miniature 4-function calculators": make that 5, where the fifth function is a really fast special function unit that can do very fast sin, cos, sqrt, 1/sqrt, and many other functions.
If you are a web developer, you might try WebGL? Same ideas, different packaging focused on graphics. Once you get the hang of it, you've got the language paradigm to understand CUDA, OpenCL, etc.
If you write business software... I'm so sorry. But I've heard things about GPU-accelerated SQL queries? I don't know much. I've never programmed a database, ever, except small SQL queries here and there.
Yeah, I tend to work on the latter and am trying to get more of the former. I used to work at a color science company, and I wanted to see if it were possible to use CUDA for our math optimizations, but the kind we did wasn't all that intense and there wasn't much traction for doing any kind of GPGPU work. I'm back in finance now, and I'm sure there are opportunities for it here, but the amount of red tape we have is ridiculous (e.g., we can't even get source control for the C++ programs we manage).
Ugh, totally agree. They need to open source this stack and get all vendors on it. It's one of those things that is really holding back GPGPU innovation.
If you've been following Docker at all, they just did this by donating runc to a new effort called the Open Container Project. NVIDIA should develop a reference x86 implementation of CUDA, something that works great on multicore CPUs, and then donate it to the OpenCL effort or something similar. Or maybe OpenCL catches up soon.
And there are a lot of cases where it doesn't work, specifically with elaborate MPI scenarios and over a network/VPN. In particular, I do not wish to jump through hoops to enable remote profiling over heavily IT-restricted networks.
For simple apps, nvprof is great. For real low-level blood and guts CUDA optimization, the command-line profiler is still indispensable. Killing it is enough reason for me to go code FPGAs in OpenCL instead of GPUs in CUDA.
Hi varelse, can you tell me more about your profiling use case? nvprof should support MPI profiling scenarios, but perhaps yours is different. I'd love to know details so I can help improve the product. Feel free to contact me at first initial last name at nvidia.com (name is Mark Harris).
To use nvprof with MPI, you just need to ensure nvprof is available on the cluster nodes and run it as your mpirun target, e.g. "mpirun ... nvprof ./my_mpi_program"
You can have it dump its output to files that the NVIDIA Visual Profiler (NVVP) is able to load. You can even load the output from multiple MPI ranks into NVVP to visualize them on the same timeline, making it easier to spot issues.
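A sketch of how that per-rank workflow can look, assuming Open MPI (the `OMPI_COMM_WORLD_RANK` environment variable is Open MPI-specific) and nvprof available on every node; nvprof's `%q{VAR}` output-name substitution expands an environment variable per process, so each rank writes its own file:

```shell
# Each MPI rank writes its own profile, named by its rank.
mpirun -np 4 nvprof -o profile.%q{OMPI_COMM_WORLD_RANK}.nvprof ./my_mpi_program

# Afterward, import the profile.*.nvprof files into the NVIDIA Visual
# Profiler (NVVP) to see all ranks on a single shared timeline.
```

Under a different MPI implementation, substitute that implementation's rank variable (e.g. MVAPICH2 exposes its own) in the `%q{...}` expression.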