I'm definitely happy to see more languages with GPU support, but schedulers that distribute work between CPUs and GPUs are a particular interest of mine. The most full-featured I've seen is StarPU:
But there's still a lot of work to be done; for example, it would be very interesting to remove the need for the developer to estimate time spent on the CPU (or one type of processor) versus the GPU, and to see the effect that has on developer productivity.
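The kind of scheduler I have in mind can be sketched in a few lines: instead of the developer partitioning work by hand, a per-device cost model predicts each task's runtime and the runtime system picks the faster device. Everything below (the `Device` type, the `estimate` model, the constants) is invented for illustration, not StarPU's actual API:

```haskell
-- Hypothetical sketch of cost-model-based CPU/GPU scheduling.
-- All names and numbers here are made up for illustration.
data Device = CPU | GPU deriving (Show, Eq)

-- Toy performance model: the GPU pays a fixed launch/transfer
-- overhead but processes elements ten times faster than the CPU.
estimate :: Device -> Int -> Double
estimate CPU n = fromIntegral n * 1.0
estimate GPU n = 500 + fromIntegral n * 0.1

-- Choose whichever device the model predicts is faster, so the
-- developer never hard-codes the split.
schedule :: Int -> Device
schedule n = if estimate CPU n <= estimate GPU n then CPU else GPU

main :: IO ()
main = mapM_ (\n -> putStrLn (show n ++ " -> " ++ show (schedule n))) [100, 1000]
```

With this toy model, small tasks stay on the CPU (the GPU's fixed overhead dominates) while large ones migrate to the GPU. StarPU's real models are calibrated from measured executions rather than hard-coded.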
Of course, this new package is different because it uses both the CPU and the GPU...
I also found Accelerate programs hard to debug. You cannot use "trace" to print out values during the computation, because trace runs on the CPU, not inside the GPU kernel.
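To make the limitation concrete, here's the ordinary-Haskell technique that stops working: `Debug.Trace.trace` prints as a side effect of evaluation on the CPU, so it's fine in plain code but has no counterpart once an Accelerate expression has been compiled down to GPU code. This snippet is plain Haskell (no Accelerate involved), just showing what trace normally does:

```haskell
import Debug.Trace (trace)

-- trace prints its message as a side effect when the expression is
-- evaluated by the normal (CPU) runtime. Inside a GPU-compiled
-- Accelerate computation there is no equivalent hook, which is what
-- makes those programs hard to inspect.
double :: Int -> Int
double x = trace ("doubling " ++ show x) (2 * x)

main :: IO ()
main = print (double 21)
```

Running this prints the trace message to stderr, then the result.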
Maybe they should have called it "snake oil" instead.
The reason they went with CUDA was to plug into Accelerate's existing framework without reinventing the entire wheel. As meric mentioned, Accelerate is a pain to do anything with, and you can bet dollars to donuts that this package will generate the hard parts for you.
IIRC, ParFunk also has a nice framework in place for distributed computation (though I'm not certain it's completely in working order yet).
I also like how the blog post is available as a literate Haskell file. I think that's a great way to make an introduction more useful, and I wish more language communities took that approach in their articles.
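For anyone who hasn't seen the format: literate Haskell inverts the usual convention, so prose is the default and only lines marked with a leading `>` (the "bird track" style) are compiled. A minimal `.lhs` file looks like this:

```
This paragraph is ordinary prose, and GHC ignores it.

> square :: Int -> Int
> square x = x * x

Only the two >-prefixed lines above are compiled, so the same file
works as both a readable article and a loadable program.
```

That's what makes a blog post like this one directly runnable: you can load the post itself into GHCi.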
Either OpenCL support, or a freely available x86 CUDA implementation, would make combined CPU/GPU programming much more useful here. We might already have what we need re: CUDA on x86; trying PGI Accelerator and GPU Ocelot for this purpose is on our TODO list: