
Interactive GPU Programming, Part 1: Hello CUDA - disaster01
http://dragan.rocks/articles/18/Interactive-GPU-Programming-1-Hello-CUDA
======
Everlag
From the perspective of a CUDA beginner, this doesn't seem simpler than
writing CUDA in C (not C++, just C). If you're going to pick up CUDA,
starting with C means you get the best tooling support and community docs. Not
to mention that managing pointers and explicit types in C will genuinely help
your understanding of how the CPU and GPU interact.

If you already know Clojure, this is probably the best chance to extend
something you already love using. If you don't, you're probably better off
learning either CUDA or Clojure rather than both at the same time. Debugging
CUDA errors is already painful; I wouldn't add a new host language on top of
that.

For context, I'm currently taking my school's GPGPU course. We've just started
actually writing non-trivial code.

~~~
jplane
The development feedback loop is _incredibly_ tight when using a Clojure (or
Lisp in general) REPL. The interactivity lets you develop your code
iteratively, including (it appears) the C/CUDA code, since you can compile it
at the REPL and then upload it to the GPU for execution.
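That compile-at-the-REPL step maps onto NVRTC, CUDA's runtime compilation library (the JNvrtc binding mentioned elsewhere in this thread wraps it). A minimal sketch of the same round trip in plain C, assuming the CUDA toolkit is installed; error checks are omitted for brevity:

```cuda
/* Runtime kernel compilation with NVRTC -- the mechanism a REPL can drive
   to recompile a kernel after every edit. Link with -lnvrtc. */
#include <nvrtc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const char *src =
        "extern \"C\" __global__ void axpy(float a, const float *x, float *y) {\n"
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "    y[i] = a * x[i] + y[i];\n"
        "}\n";

    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, src, "axpy.cu", 0, NULL, NULL);
    nvrtcCompileProgram(prog, 0, NULL);          /* re-run this on every edit */

    size_t ptx_size;
    nvrtcGetPTXSize(prog, &ptx_size);
    char *ptx = malloc(ptx_size);
    nvrtcGetPTX(prog, ptx);      /* PTX string, ready for cuModuleLoadData() */

    puts(ptx);
    free(ptx);
    nvrtcDestroyProgram(&prog);
    return 0;
}
```

Uploading the resulting PTX and launching the kernel then goes through the driver API (cuModuleLoadData, cuLaunchKernel), which is roughly the part ClojureCUDA automates for you.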

------
tlarkworthy
I was irritated that it took 8 paragraphs to reveal it was clojure. And it was
extra annoying that the vagueness was entirely deliberate.

------
dragandj
The obligatory links to the GitHub of the libraries used:
[https://github.com/uncomplicate](https://github.com/uncomplicate)

------
shmerl
I hope CUDA will get replaced by Vulkan merged with whatever core OpenCL
features it still needs.

Looks like Khronos is looking into converging them in some way:
[https://www.pcper.com/reviews/Graphics-Cards/Follow-Neil-Trevett-and-Tom-Olson-Khronos-Group-Discuss-OpenCL-and-Vulkan-Roa](https://www.pcper.com/reviews/Graphics-Cards/Follow-Neil-Trevett-and-Tom-Olson-Khronos-Group-Discuss-OpenCL-and-Vulkan-Roa)

CUDA, unfortunately, is Nvidia's lock-in, so it's not a good way forward.

~~~
dragandj
FWIW, all my libraries work with _both_ CUDA and OpenCL.

While I agree with your sentiment, unfortunately Nvidia is the only vendor
that pays a considerable number of people to develop the ecosystem. AMD
basically says "get lost" by refusing to put more than a handful of people on
the job of providing OpenCL libraries. And, BTW, they change their minds every
few years. I hope that HIP won't become abandonware...

~~~
shmerl
_> unfortunately Nvidia is the only vendor that pays a considerable number of
people to develop the ecosystem. AMD basically says "get lost" by refusing to
put more than a handful of people on the job of providing OpenCL libraries. _

Vulkan itself is developed and supported well, and it already can be used for
compute as far as I know. But apparently there are some features that come
from the OpenCL world that need to be filled in. It wouldn't be AMD's
exclusive effort. So hopefully things will start moving.

~~~
dragandj
The language and basic platform are not the problem. OpenCL was and is OK.
However, the _libraries_ are few and far between. CUDA offers cuBLAS, cuFFT,
cuDNN, cuSOLVER, etc. For OpenCL, even the one decent BLAS library (CLBlast)
had to be written by a guy who did it for free, while AMD's clBLAS is more or
less stalled (and I never managed to build it on Linux in the first place),
and that's about it...
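For a sense of what those libraries buy you, here is a hedged sketch of a single cuBLAS call (SAXPY, y = a*x + y) replacing a hand-written kernel; it assumes the CUDA toolkit, links with -lcublas -lcudart, and omits error checks:

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    const int n = 4;
    float hx[] = {1, 2, 3, 4}, hy[] = {10, 20, 30, 40}, alpha = 2.0f;

    /* Allocate device buffers and copy the host data over. */
    float *dx, *dy;
    cudaMalloc((void **)&dx, n * sizeof(float));
    cudaMalloc((void **)&dy, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSaxpy(handle, n, &alpha, dx, 1, dy, 1);   /* y = alpha*x + y */

    cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; i++)
        printf("%.0f ", hy[i]);                     /* 12 24 36 48 */

    cublasDestroy(handle);
    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```

With OpenCL, that one-liner is exactly what you'd have to find a CLBlast-style library for, or write yourself.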

~~~
ryanpepper
The ability to just swap in the cuFFTW header for FFTW3's, making the calls
execute on the GPU (even though it doesn't give the best performance), is also
nice for beginners.
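For the curious, the swap amounts to something like this sketch (assuming the CUDA toolkit; you link with -lcufftw -lcudart instead of -lfftw3):

```cuda
/* With FFTW3:           #include <fftw3.h>    link with -lfftw3
   Drop-in GPU version:  #include <cufftw.h>   link with -lcufftw -lcudart
   The FFTW-style calls below stay unchanged either way. */
#include <cufftw.h>

int main(void) {
    const int n = 1024;
    fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * n);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * n);

    fftw_plan p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p);                 /* runs on the GPU via cuFFT */

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}
```

Note cuFFTW only covers a subset of the FFTW3 API, so nontrivial programs may still need adjustments.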

------
jweather
Followed along until I had to compile a kernel; now I'm facing a
java.lang.UnsatisfiedLinkError: Error while loading native library
"JNvrtc-0.9.0-windows-x86_64". This seems to be a dependency of ClojureCUDA,
but I don't see anything about it in the installation instructions. I have
the CUDA Toolkit installed. Everything worked up to this point.

------
Aardwolf
Why do things using CUDA often require older versions of compilers?

E.g. ccminer: try to build it, and it rejects my gcc or clang as too modern
:(

~~~
shaklee3
The CUDA compiler itself (nvcc) is far behind the features of more recent
compilers. For instance, C++11 is supported, but not the full standard. It
will take a while before 14/17 are supported.

~~~
Aardwolf
How does that stop it from using the latest version of clang++ or g++? They
are backwards compatible with older C++ versions. The context is Linux and a
makefile failing with a message that your g++ or clang++ must be older than
some version.

------
splittingTimes
Are there similar resources/tutorials for GPU programming/CUDA/OpenCL in pure
Java?

~~~
agibsonccc
(Disclaimer: I created and maintain this library, which is now a part of the
Eclipse Foundation):

[http://nd4j.org/](http://nd4j.org/) \- built-in GPU garbage collector and
everything.

If you want raw CUDA primitives (not generally recommended, and hard to do
right), you can take a look at our JavaCPP-based CUDA bindings (we maintain
JavaCPP as well):
[https://github.com/bytedeco/javacpp-presets/tree/master/cuda](https://github.com/bytedeco/javacpp-presets/tree/master/cuda)

Unlike jcuda (which people typically recommend despite it not being updated
as often), we actually depend on these for the nd4j and deeplearning4j
projects.

These CUDA bindings are meant to be a 1-to-1 mapping to the CUDA API as well.
Hope this helps!

If you want a fairly small and minimalistic look at the underlying C code
that uses CUDA, take a look at:
[https://github.com/deeplearning4j/libnd4j](https://github.com/deeplearning4j/libnd4j)

All of this is published on Maven Central for you and runs on Linux, Windows,
and even Mac. It's also the same API; all you do is switch the backend.

~~~
splittingTimes
That's great, thanks.

------
limaoscarjuliet
In Polish, "cuda" means "miracles". Always cracks me up!

~~~
programmer_dude
In Hindi it means "garbage" (not trying to be a jerk here; that's just what
it means).

------
calebm
I'd love something like this with Python and ctypes.

~~~
deepnotderp
PyCUDA is more of a wrapper; performance won't be good compared to native
CUDA.

------
dragonwriter
Okay, so it's easier than directly using the CUDA, etc., C toolchains,
perhaps, but why not compare it to Python + Numba, which has had GPU support
for quite a while, likewise avoids direct exposure to the underlying C
toolchains, provides interactive compilation, can be used with a nice REPL
(or Jupyter Notebook), etc.?

~~~
metalock
The author wrote the tutorial as he pleased. Nobody is stopping you from
submitting a better link.

