
DCompute: GPGPU with Native D for OpenCL and CUDA - ingve
http://dlang.org/blog/2017/07/17/dcompute-gpgpu-with-native-d-for-opencl-and-cuda/
======
axaxs
D has always interested me a lot. Sure, it's been around a while, but the
community seems rather small, comparatively. That said, that small community
produces some really nice stuff - they're up to what, 3 'official' compilers
now? I hope to see its adoption rise now that the reference compiler is open
source (IIRC) - the community seems second to none as far as signal-to-noise
ratio goes.

------
ptrott2017
Nicholas Wilson has done an awesome job with DCompute - especially the ability
to use D's lambdas and templates when writing compute kernels. It's going to
be fun seeing this evolve.

~~~
nicwilson
Thanks. Halide has recently caught my eye and I'd like to see how far I can
replicate how they do things. It'll have to wait a bit, though.

~~~
ptrott2017
Halide has some interesting ideas for image processing - especially regarding
algorithm separation and scheduling - so it's great to hear it's on your
radar, and I'd be very interested to see what you come up with. My
interest/focus is more on stencil codes, and that is certainly an area I hope
to test with DCompute. Congrats again (and thanks again) for an awesome
project.

------
gravypod
I can't wait until compilers can start auto-generating GPU kernels. That will
be when GPGPU really takes off for most people whose applications aren't
critical enough to spend hours writing these by hand but would benefit from
the significant speed-up.

~~~
pcwalton
I'm not sure that will ever happen, at least without changing languages.

Autovectorization alone is difficult in C and C++ because the languages
provide almost no useful information about aliasing. Precise aliasing info is
just the tip of the iceberg regarding what you would need for GPU-based
autovectorization.

~~~
WalterBright
D has array expressions, which look like:

    a[] = c * b[];

The idea is that they are parallelizable and do not require an auto-
vectorizing loop optimizer. Auto-vectorization is fraught with problems, like
the user not being aware whether it succeeded or not.

Another aspect of D that enables parallelization is the use of ranges +
algorithms. Although currently unexploited, this has the potential to express
parallel algorithms and then take advantage of that.
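As a sketch of both ideas - an array expression plus a range algorithm run
through the standard library's std.parallelism thread pool (the variable
names and data here are illustrative, not from the article):

```d
import std.array : array;
import std.parallelism : taskPool;
import std.stdio : writeln;

void main()
{
    double[] b = [1.0, 2.0, 3.0, 4.0];
    double c = 2.0;
    double[] a = new double[b.length];

    // Array expression: elementwise multiply. No user-written loop,
    // and no reliance on an auto-vectorizer succeeding silently.
    a[] = c * b[];

    // Ranges + algorithms: taskPool.map evaluates the lambda across
    // the pool's worker threads, preserving element order.
    auto squares = taskPool.map!(x => x * x)(b).array;

    writeln(a);       // [2, 4, 6, 8]
    writeln(squares); // [1, 4, 9, 16]
}
```

The point is that the semantics (elementwise, order-independent) are stated
up front in the source, so the compiler or runtime doesn't have to prove
parallelizability after the fact.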

~~~
VHRanger
> Another aspect of D that enables parallelization is the use of ranges +
> algorithms.

Is that similar to the C++ std::algorithm execution policy features introduced
in C++17?

~~~
nicwilson
No. What Walter is talking about in D is: if you have

    Foo[] foo = ...;

    foreach (f; foo) { f.bar.baz.quux; }

and the computation is parallelisable, then you can write

    import std.parallelism;

    foreach (f; parallel(foo)) { f.bar.baz.quux; }

and it will be parallelised across threads.

I guess std.parallelism's parallel might be more analogous to an "instance" of
an executor.

~~~
VHRanger
So that's similar to OpenMP's parallel for loops?

------
p0nce
Interesting. CUDA kernels are plagued by an explosion of entry points, and
OpenCL C kernels by the lack of meta-programming.

