RapidMind provided a sort of DSL embedded in C++ (built from a combination of C++ macros, operator overloading, etc.) for writing certain kinds of highly parallel computations over arrays.
RapidMind included a JIT compiler that converted this code into optimized multicore x86 code or into GPU code (initially they compiled to shader programs; I'm not sure whether they eventually compiled to something like CUDA code). The allure of RapidMind was that you write the code once and it can run on either the CPU or the GPU. By acquiring RapidMind, Intel now controls one of the APIs people might have used to program GPUs from Nvidia or AMD.
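For those who never saw it: the trick behind this kind of embedded DSL is that the overloaded operators don't compute anything immediately; they record an expression that a backend can later compile. Below is a toy sketch of that mechanism only (this is not RapidMind's actual API, and the "backend" here is just a CPU interpreter standing in for their JIT):

    #include <cstdio>
    #include <memory>
    #include <vector>

    // Expression nodes captured by operator overloading. A real system would
    // hand this tree to a JIT that emits multicore x86 or GPU code.
    struct Expr {
        enum Kind { Input, Add, Mul };
        Kind kind;
        std::shared_ptr<Expr> lhs, rhs;
        const std::vector<float>* data;   // used only by Input nodes (toy: raw pointer to caller's data)
    };

    struct ArrayF {
        std::shared_ptr<Expr> node;
        explicit ArrayF(const std::vector<float>& v)
            : node(std::make_shared<Expr>(Expr{Expr::Input, nullptr, nullptr, &v})) {}
        explicit ArrayF(std::shared_ptr<Expr> e) : node(std::move(e)) {}
    };

    // Overloaded operators build the tree instead of doing the arithmetic.
    ArrayF operator+(const ArrayF& a, const ArrayF& b) {
        return ArrayF(std::make_shared<Expr>(Expr{Expr::Add, a.node, b.node, nullptr}));
    }
    ArrayF operator*(const ArrayF& a, const ArrayF& b) {
        return ArrayF(std::make_shared<Expr>(Expr{Expr::Mul, a.node, b.node, nullptr}));
    }

    // Stand-in backend: interpret the tree elementwise on the CPU.
    float eval(const Expr& e, size_t i) {
        switch (e.kind) {
            case Expr::Input: return (*e.data)[i];
            case Expr::Add:   return eval(*e.lhs, i) + eval(*e.rhs, i);
            case Expr::Mul:   return eval(*e.lhs, i) * eval(*e.rhs, i);
        }
        return 0.0f;
    }

    int main() {
        std::vector<float> x{1, 2, 3}, y{4, 5, 6};
        ArrayF a(x), b(y);
        ArrayF c = a * b + a;                 // builds an expression tree, computes nothing yet
        for (size_t i = 0; i < x.size(); ++i)
            printf("%g ", eval(*c.node, i));  // prints 5 12 21
        printf("\n");
        return 0;
    }

RapidMind's real system did far more than this, of course, but capture-then-compile is the part that the macros and operator overloading buy you.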
(Though I must also add that I didn't really like the RapidMind API very much. It was not very flexible.)
edit: RapidMind was also portable to the Cell processor.
It sounds like you work in this space. I've become interested in GPU parallelism. What's the best stuff out there? And how do you recommend getting started?
The future is certainly hardware-neutral APIs such as OpenCL and DirectX compute shaders. For learning, though, it's best to start with Nvidia's CUDA, because there is a ton of CUDA material online and CUDA is available here and now. Once you learn CUDA, switching to something like OpenCL is easy. For production code, though, there is no longer much reason to write CUDA rather than one of the portable APIs.
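To make the "available here and now" point concrete, here is roughly what a first CUDA program looks like: a vector addition where each GPU thread computes one output element. (A minimal sketch; real code should check the return values of the cuda* calls.)

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Each thread adds one pair of elements: the canonical first CUDA kernel.
    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        // Host buffers.
        float* ha = (float*)malloc(bytes);
        float* hb = (float*)malloc(bytes);
        float* hc = (float*)malloc(bytes);
        for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

        // Device buffers and host-to-device copies.
        float *da, *db, *dc;
        cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
        cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

        // Launch enough 256-thread blocks to cover all n elements.
        const int threads = 256;
        vecAdd<<<(n + threads - 1) / threads, threads>>>(da, db, dc, n);

        cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %f\n", hc[0]);   // expect 3.0

        cudaFree(da); cudaFree(db); cudaFree(dc);
        free(ha); free(hb); free(hc);
        return 0;
    }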
Also, a word of caution for those new to GPUs: don't expect miracles. Set reasonable expectations; GPUs are not magic.
I'd like to know more about reasonable expectations. What sort of algorithms are GPUs best for? I'm interested in using them for computation, not actual pixel shading. If you could give an example of something that would be in the sweet spot for GPU parallelism, and something that wouldn't, I'd appreciate it. Or a pointer to some good sources.
Two things: data-parallel code (the same operation applied independently to many elements) works best, and you need a large enough amount of data to amortize the high cost of transferring it to the GPU. How large "large enough" is depends on your data access patterns and how much work you do per element.
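One simple way to set expectations is to measure the transfer against the kernel itself. The sketch below is a toy (the kernel and its iteration count are made up purely to vary the arithmetic per element) that uses CUDA events to time both; if the copy dominates, the computation is too cheap, or the data too small, to be worth offloading.

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Toy kernel: 'iters' controls how much arithmetic is done per element.
    __global__ void work(float* x, int n, int iters) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float v = x[i];
        for (int k = 0; k < iters; ++k) v = v * 1.000001f + 0.5f;
        x[i] = v;
    }

    int main() {
        const int n = 1 << 24;                  // ~16M floats = 64 MB
        const size_t bytes = n * sizeof(float);
        float* h = (float*)malloc(bytes);
        for (int i = 0; i < n; ++i) h[i] = 1.0f;

        float* d;
        cudaMalloc(&d, bytes);

        cudaEvent_t t0, t1, t2;
        cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);

        cudaEventRecord(t0);
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   // cost of getting data onto the GPU
        cudaEventRecord(t1);
        work<<<(n + 255) / 256, 256>>>(d, n, 1000);        // the actual computation
        cudaEventRecord(t2);
        cudaEventSynchronize(t2);

        float msCopy = 0, msKernel = 0;
        cudaEventElapsedTime(&msCopy, t0, t1);
        cudaEventElapsedTime(&msKernel, t1, t2);
        printf("copy: %.2f ms, kernel: %.2f ms\n", msCopy, msKernel);
        // If the copy time dominates, this workload does not do enough work
        // per byte transferred to benefit from the GPU.

        cudaEventDestroy(t0); cudaEventDestroy(t1); cudaEventDestroy(t2);
        cudaFree(d); free(h);
        return 0;
    }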
From what I've seen, OpenCL is meant more as a compiler target than something people program in directly. CUDA, while tied to Nvidia GPUs, has more abstractions than OpenCL.