

Compiling Julia for NVIDIA GPUs - jostmey
http://blog.maleadt.net/2015/01/15/julia-cuda/

======
unfamiliar
Since I only have one GPU usually, I'm still waiting for the day I can do
this:

    
    
        using CUDA
    
        # define a kernel
        @kernel function kernel_vadd(a, b, c)
            i = blockId_x() + (threadId_x()-1) * numBlocks_x()
            c[i] = a[i] + b[i]
        end
    
        # create some data
        dims = (3, 4)
        a = round(rand(Float32, dims) * 100)
        b = round(rand(Float32, dims) * 100)
        c = Array(Float32, dims)
    
        # execute!
        @cuda kernel_vadd(CuIn(a), CuIn(b), CuOut(c))
    
        # verify
        @show a+b == c
    
    

i.e. no setup, no "cuda context" (whatever that is), and no tear-down
afterwards. I understand that manual memory management is almost unavoidable
in this kind of application, but it seems that most of it could be automated
in the most common case of "I have a couple of large arrays and a few
operations I want to perform."
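
To make the wish concrete, here is one way that common case could be hidden
behind a helper. This is a hypothetical sketch, not the package's actual API:
`CuArray` and `free` are assumed names for "upload to device" and "release
device memory".

    
    
        # Hypothetical sketch (CuArray/free are assumed names, not the
        # actual API): upload the inputs, run the user's code, and free
        # device memory afterwards.
        function with_gpu(f, arrays...)
            dev = map(CuArray, arrays)      # upload to the device
            try
                return f(dev...)
            finally
                for d in dev                # automatic tear-down
                    free(d)
                end
            end
        end
    
        # usage: the caller never touches a context or device pointer
        with_gpu(a, b, c) do da, db, dc
            @cuda kernel_vadd(da, db, dc)
        end
    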

~~~
ICWiener
Instead of @kernel and @cuda, you might want to use defkernel and with-cuda...
See the example files for cl-cuda:

[https://github.com/takagi/cl-cuda/blob/master/examples/vector-add.lisp](https://github.com/takagi/cl-cuda/blob/master/examples/vector-add.lisp)

I thought that Julia had macros, so I don't understand why what you propose is
not possible (note to self: find time to learn Julia).

~~~
maleadt
It is definitely possible to do that in Julia. The reason I didn't yet is
purely a matter of priorities: I first focused on wrapping the basic
primitives (calling a kernel, marshalling arguments, etc.) in a user-friendly
way.
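
For the unfamiliar, such a wrapper is indeed a small macro in Julia. A minimal
sketch, assuming `create_context`/`destroy_context` exist as runtime
primitives (they are illustrative names, not the project's real API):

    
    
        # Hypothetical: hide context set-up/tear-down behind a macro, so
        # user code never mentions the CUDA context explicitly.
        macro with_cuda(body)
            quote
                ctx = create_context()      # assumed runtime primitive
                try
                    $(esc(body))
                finally
                    destroy_context(ctx)    # assumed runtime primitive
                end
            end
        end
    
        @with_cuda begin
            @cuda kernel_vadd(CuIn(a), CuIn(b), CuOut(c))
        end
    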

------
kfor
The content of the project aside (which is very exciting but early stage), I'm
really impressed with the author's overview of how others can take up the
project and move it forward. I've seen too many projects die because, when
the authors move on to other things, they just leave an incomplete git repo
and no
someone lined up to take the reins, but the crucial thing is that in this case
someone could ascertain the project's state and possible next steps even
months after maleadt is out of the picture.

~~~
maleadt
Thanks for the kind words! This was exactly what I was aiming for: the code
(or insights) to be reusable without too much hassle. More so because part of
it was developed in the scope of my PhD; I wouldn't want to know how many
failed or unpublishable research results are stowed away on some grad
student's computer.

------
e12e
How does this compare/contrast vs opencl.jl[1]?

When working in julia, what are the benefits of tying oneself to CUDA (and not
running accelerated on on-die graphics or on amd gpus) -- or doesn't nvidia
work reliably/well with opencl?

[1]
[https://github.com/JuliaGPU/OpenCL.jl](https://github.com/JuliaGPU/OpenCL.jl)

~~~
maleadt
OpenCL.jl is purely the runtime part, i.e. it still requires you to write the
OpenCL kernel code yourself, after which you can use the Julia wrapper to
manage that code.

My project also provides compiler support for lowering Julia to CUDA assembly,
so you don't need to write CUDA code yourself. Added to that, my runtime also
contains (PoC) higher-level wrappers, making it easier to call CUDA kernels,
upload data, etc.
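
The distinction is easiest to see side by side. With OpenCL.jl the kernel
itself stays foreign code embedded in a string, whereas here it is ordinary
Julia (the OpenCL C below is a standard vector-add kernel written for
illustration; the Julia version is the one from the top of this thread):

    
    
        # OpenCL.jl: the kernel body is OpenCL C in a string; Julia only
        # compiles and launches that foreign code.
        const vadd_source = """
        __kernel void vadd(__global const float *a,
                           __global const float *b,
                           __global float *c) {
            int i = get_global_id(0);
            c[i] = a[i] + b[i];
        }
        """
    
        # This project: the kernel is plain Julia, lowered to CUDA
        # assembly by the compiler.
        @kernel function kernel_vadd(a, b, c)
            i = blockId_x() + (threadId_x()-1) * numBlocks_x()
            c[i] = a[i] + b[i]
        end
    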

Concerning tying yourself to the NVIDIA stack: it's still the most mature and
versatile toolchain, which is why I picked it in the first place. My long-term
plan was to switch over to SPIR (or some other cross-vendor stack) as soon as
possible. At that point, switching user code over to that new back-end would
(theoretically) not require much effort, since the kernels are written in
Julia instead of CUDA C (except for the runtime interactions, of course).

~~~
e12e
Thank you for emphasising the part about kernels being Julia code -- I missed
that entirely!

As for Nvidia/CUDA being more mature -- that was what I feared -- it seems a
common sentiment in the discussions I've seen on OpenCL/CUDA.

------
monochromatic
Don't forget to use the /3.5GB compiler option if it's for a GTX 970!

