
Ask HN: How to learn OpenCL - lettergram
OpenCL seems to have only minimal examples, and although there is a fair amount of documentation, I feel the need to ask: are there any good books/tutorials on how to use/learn OpenCL?
======
profquail
Intel and AMD both have good documentation on their sites for getting
started with OpenCL, so that's a good place to begin. For example:

[http://developer.amd.com/tools-and-sdks/heterogeneous-comput...](http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/introductory-tutorial-to-opencl/)

The Khronos website has a huge page with a list of OpenCL tutorials and books:

[https://www.khronos.org/opencl/resources](https://www.khronos.org/opencl/resources)

Amazon has a number of OpenCL books available:

* [http://www.amazon.com/OpenCL-Action-Accelerate-Graphics-Comp...](http://www.amazon.com/OpenCL-Action-Accelerate-Graphics-Computations/dp/1617290173)

* [http://www.amazon.com/Heterogeneous-Computing-OpenCL-Second-...](http://www.amazon.com/Heterogeneous-Computing-OpenCL-Second-Edition/dp/0124058949)

* [http://www.amazon.com/OpenCL-Programming-Guide-Aaftab-Munshi...](http://www.amazon.com/OpenCL-Programming-Guide-Aaftab-Munshi/dp/0321749642/)

This book is available on Amazon, but the previous edition is available for
free:

[http://www.fixstars.com/en/opencl/book/](http://www.fixstars.com/en/opencl/book/)

Intel's website also has some "Getting Started" articles and optimization
guides for OpenCL (for CPU, GPU, and Xeon Phi):

[http://software.intel.com/en-us/vcsource/tools/opencl](http://software.intel.com/en-us/vcsource/tools/opencl)

~~~
sitkack
This is just a small sample of the OpenCL kernels available in the AMD SDK (it
runs everywhere, at least it did last time I checked).

    
    
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/AESEncryptDecrypt_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BinarySearch_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BinomialOption_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BitonicSort_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BlackScholesDP_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BlackScholes_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BoxFilterGL_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BoxFilter_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BufferBandwidth_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/ConstantBandwidth_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/DCT_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/DeviceFission_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/DwtHaar1D_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/EigenValue_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/FFT_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/FastWalshTransform_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/FloydWarshall_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/FluidSimulation2D_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/GaussianNoise_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/GlobalMemoryBandwidth_Kernels.cl
      AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/HelloCL_Kernels.cl
    

I'd hook into OpenCL from the high-level language of your choice. Look at

* [http://mathema.tician.de/software/pyopencl/](http://mathema.tician.de/software/pyopencl/)

* [http://www.drdobbs.com/open-source/easy-opencl-with-python/2...](http://www.drdobbs.com/open-source/easy-opencl-with-python/240162614)

Or JRuby or Jython with
[https://code.google.com/p/aparapi/](https://code.google.com/p/aparapi/)
(you still have to write your inner kernel in Java and hand that to Aparapi).

If you go the "whole-stack-in-C" route, much of your time will be spent on
memory operations.
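
As a concrete starting point for the PyOpenCL route, here's a minimal vector-add sketch. This is a hedged example, not canonical usage: the `vadd` helper and the pure-Python fallback are names I made up, and it assumes pyopencl (plus a working OpenCL runtime) may or may not be present, falling back to a plain-Python reference when it isn't.

```python
# Minimal PyOpenCL vector-add sketch with a pure-Python fallback.
KERNEL_SRC = """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *out)
{
    int gid = get_global_id(0);
    out[gid] = a[gid] + b[gid];
}
"""

def vadd_reference(a, b):
    """Pure-Python reference for what the kernel computes."""
    return [x + y for x, y in zip(a, b)]

def vadd(a, b):
    try:
        import numpy as np
        import pyopencl as cl
        ctx = cl.create_some_context(interactive=False)
        queue = cl.CommandQueue(ctx)
        a_np = np.array(a, dtype=np.float32)
        b_np = np.array(b, dtype=np.float32)
        mf = cl.mem_flags
        a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np)
        b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np)
        out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a_np.nbytes)
        prog = cl.Program(ctx, KERNEL_SRC).build()
        prog.vadd(queue, a_np.shape, None, a_buf, b_buf, out_buf)
        out_np = np.empty_like(a_np)
        cl.enqueue_copy(queue, out_np, out_buf)  # blocking by default
        return out_np.tolist()
    except Exception:
        # No pyopencl / no OpenCL device available; use the reference.
        return vadd_reference(a, b)

print(vadd([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [11.0, 22.0, 33.0]
```

Even in this tiny example you can see the host-side ceremony (context, queue, buffers, program build) that the C API makes much more verbose.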

------
jaffee
If you're a complete beginner in data-parallel programming, and you're having
trouble finding good intro material for OpenCL, it might almost be worthwhile
to check out CUDA instead. In terms of the programming model, OpenCL and CUDA
are essentially identical; significant differences don't come about until you
start optimizing for specific devices.

I learned CUDA first on my own, and then took an OpenCL class and found that
the whole first section was completely redundant. There's also a pretty great
wealth of CUDA material online and a few published books if that's your sort
of thing.

~~~
jhdevos
To add some reasons why you'd want to learn CUDA first: it turns out that
simple things are a lot simpler, and take a lot less code, in CUDA than in
OpenCL. With CUDA, your kernel and host code sit close together in the same
file. You'll need a /lot/ less boilerplate than OpenCL requires to accomplish
even the simplest things. OpenCL exposes you to a lot more concepts, and a lot
more incidental complexity, than CUDA. All this means that just playing around
is a lot easier in CUDA.

Just take a look at some simple examples in both and you'll quickly see what
I mean. Even though I'm a fan of OpenCL because it is available on more
platforms, CUDA is a lot better suited as a learning platform.

------
exDM69
A word of warning: OpenCL, and heterogeneous computing in general, is very,
very difficult. It will take a lot of effort to get even the simplest
hello-world application working.

And when writing OpenCL, even though you are using a single API, you will
need to rewrite parts of your application for each piece of hardware you
intend to run on if you want good performance. This is unavoidable, since the
same code may end up running on Intel x86 CPUs or on Intel, AMD, or Nvidia
GPU architectures, which are all very different. If you're lucky, it's enough
to rewrite your kernel code (the code running on the device). But you might
also need to change the host-side code (running on the CPU) and the way you
manage memory and DMA transfers, etc.

When it comes to the basics, OpenCL is not too different from CUDA, and
there's a lot more material available for CUDA (because it's been around
longer and is perhaps used a bit more). You should be able to pick up a book
or a tutorial on CUDA and translate it to OpenCL without too much effort.
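
To make that "translate CUDA material to OpenCL" suggestion concrete: most of the translation is just terminology. A rough, informal mapping (from common usage, not any official document):

```python
# Informal CUDA -> OpenCL terminology map, handy when reading CUDA
# tutorials and mentally translating them to OpenCL.
CUDA_TO_OPENCL = {
    "thread": "work-item",
    "thread block": "work-group",
    "grid": "NDRange",
    "shared memory": "local memory",
    "__global__ function": "__kernel function",
    "warp": "wavefront (roughly; AMD terminology)",
}

print(CUDA_TO_OPENCL["thread block"])  # work-group
```

The execution-model concepts line up almost one-to-one; it's the host-side API and the amount of setup code that differ.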

Finally, even though it may take quite a lot of learning to get started,
parallel programming on GPUs is quite fun and it is very rewarding to see your
code run with very high performance.

------
letzjuc
Disclaimer: this is a rant; it is obvious that I don't like OpenCL and that I
think it was designed by monkeys, so take it with a grain of salt.

In short: don't learn OpenCL. Both CUDA and C++ AMP are good languages for
programming heterogeneous machines, and Nvidia's Thrust and Microsoft's PPL
are both excellent libraries for writing efficient and reusable code. These
language extensions are also strongly typed and come with really good tools.
My advice is: learn any of them instead.

Why not OpenCL? AMD's Bolt library is living proof that OpenCL is fxxxxx up
beyond all repair. It is not meant for humans to write, nor for machines to
understand.

Kernels are just character strings!!! This is just so wrong! Forget about
using functors and lambdas as kernels, and forget about mixing kernels with
templates. You will be better off using Python and PyOpenCL (which is great)
than using C or C++. In C++, generating kernels is really hard, and generating
kernels from expression templates is insanely hard.

Furthermore, this also means that the kernel language is not type-checked at
all!! Forgetting a semicolon in a kernel results in a runtime error! Do you
want syntax highlighting? Write your kernels in separate files! This is even
worse than the way people used to write functors far away from the call site
in C++03; at least those were in the same file!
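
To illustrate the kernels-are-strings complaint: because OpenCL takes kernel source as a plain character string, "generic" kernels are typically built by textual substitution, and nothing checks the result until the driver compiles it at runtime. A small sketch (the template and helper names here are hypothetical):

```python
# OpenCL kernel source is just a string, so "templated" kernels are
# done with textual substitution like this.
KERNEL_TEMPLATE = """
__kernel void scale(__global {t} *data, {t} factor)
{{
    data[get_global_id(0)] *= factor;
}}
"""

def make_scale_kernel(type_name):
    # A typo in type_name ("flaot", say) or a missing semicolon still
    # yields a perfectly valid Python string; the error only surfaces
    # when clBuildProgram runs at runtime.
    return KERNEL_TEMPLATE.format(t=type_name)

src = make_scale_kernel("float")
print("__global float *data" in src)  # True
```

No compiler, type checker, or IDE sees the kernel until your program is already running, which is exactly the point being made above.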

As stated above, my advice is don't learn it. Let it die. Your time is better
spent learning CUDA or C++ AMP and their libraries. The design rules for
OpenCL seem to have been "let's not learn anything from OpenGL" + "we need
something, this is something, let's standardize this". This has of course
resulted in a hilarious language that came after CUDA and was worse in every
possible way.

------
kylelutz
If you're using C++, check out Boost.Compute [1]. It provides a high-level
STL-like API for OpenCL (without preventing you from directly using the low-
level OpenCL APIs). It simplifies common tasks such as copying data to/from
the device, and also provides a number of built-in algorithms (e.g. sorting,
reducing, and transforming).

[1] [https://github.com/kylelutz/compute](https://github.com/kylelutz/compute)

------
pflanze
I started with this[1] blog post, then spent quite a bit of time adding
proper error checking to the example code to figure out why it failed :).
The author has since merged my changes, so it's perhaps a worthwhile example
to start from now. I haven't done much with OpenCL since, though: in the end
I found that my ~7-8 year old laptop ran SIMD-optimized C code faster on the
host CPU than on the GPU (I wrote this[2] with heavy SIMD optimization work,
though I'm not sure anymore whether it was exactly what I tested against
OpenCL), which is one reason why.

[1] [http://www.thebigblob.com/getting-started-with-opencl-and-gp...](http://www.thebigblob.com/getting-started-with-opencl-and-gpu-computing/)

[2] [https://github.com/pflanze/mandelbrot/tree/master/c](https://github.com/pflanze/mandelbrot/tree/master/c)

------
wsc981
Perhaps Apple's documentation and WWDC video might be of help:
[https://developer.apple.com/opencl/](https://developer.apple.com/opencl/)

To watch the video you need to be a registered Apple developer.

~~~
profquail
To clarify -- you can read the documentation and code samples without having
to register as an Apple Developer. You only need to do that to watch the
tutorial videos or access the developer forums.

------
melonakos
One great way to start is to use OpenCL libraries. We work on clMath
([https://github.com/clMathLibraries](https://github.com/clMathLibraries)) and
ArrayFire ([http://arrayfire.com](http://arrayfire.com)) which are both easy
to pick up. Once you get comfortable with libraries, you can start trying to
write your own kernels, and you'll know which things you'll need to write that
aren't already in a freely available library. Good luck!

------
mschlafli1
Mess around with open-source applications (such as Rodinia benchmark suite and
NAS parallel benchmarks) after going through the basic tutorials on AMD, Intel
and Nvidia webpages.

~~~
WizzleKake
A good open-source OpenCL application might be cgminer or bfgminer. You might
have to get an older version, though; it looks like the latest ones have
abandoned OpenCL/GPU support.

~~~
tibbon
Any idea why they abandoned it? CPU-only mining seems to be kinda pointless
these days...

~~~
dmm
Cgminer and bfgminer are focused on USB-based ASIC miners now. Bfgminer still
supports CPU and GPU mining; it's just unsupported and not compiled in by
default.

------
tjaerv
You might wish to check out the University of Illinois's "Heterogeneous
Parallel Programming" course, offered through Coursera:

[https://www.coursera.org/course/hetero](https://www.coursera.org/course/hetero)

The course is currently ongoing, but it's not too late to enroll. The course
mainly focuses on CUDA (since the professor believes it's easier to learn),
but covers OpenCL as well.

