

EasyOpenCL – The easiest way to get started with GPU programming - Gladdyu
https://github.com/Gladdy/EasyOpenCL

======
scott_s
There are actually quite a few of these kinds of libraries floating around,
although I'm not sure how many are still actively supported.

Thrust: [http://thrust.github.io/](http://thrust.github.io/)

VexCL: [http://ddemidov.github.io/vexcl/](http://ddemidov.github.io/vexcl/)

Boost.Compute:
[http://boostorg.github.io/compute/](http://boostorg.github.io/compute/)

The author of VexCL provided a comparison of them two years ago:
[http://stackoverflow.com/questions/20154179/differences-
betw...](http://stackoverflow.com/questions/20154179/differences-between-
vexcl-thrust-and-boost-compute)

~~~
oneofthose
There is another library authored by me and some colleagues. It is called
Aura: [https://github.com/sschaetz/aura](https://github.com/sschaetz/aura)

I blogged about these kinds of libraries here (overview): [http://www.soa-
world.de/echelon/2014/04/c-accelerator-librar...](http://www.soa-
world.de/echelon/2014/04/c-accelerator-libraries.html)

A new addition is welcome as we still have not found the perfect API for
accelerator programming. EasyOpenCL seems very simple and easy to use but I
feel like it is very restricted.

For getting started with OpenCL development these days I would recommend
PyOpenCL. Since everything is in Python, data can be generated easily and
results can be plotted using well-known Python tools, which simplifies
debugging.
Kernels developed in PyOpenCL can directly be copied to other APIs (raw OpenCL
C API or some of the other C/C++ wrappers) and reused in production code.
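
That portability is easy to see in a sketch: the OpenCL C kernel is only ever a string in the host program, so the exact source you prototype against PyOpenCL can later be handed to the raw C API's clCreateProgramWithSource. The kernel below and its pure-Python reference model are made up for illustration:

```python
# The OpenCL C source is plain text: prototype it in PyOpenCL,
# then paste the identical string into a C/C++ host program later.
SAXPY_SRC = """
__kernel void saxpy(const float a,
                    __global const float *x,
                    __global float *y) {
    int i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}
"""

def saxpy_reference(a, x, y):
    """Pure-Python model of the kernel, handy for checking device results."""
    return [a * xi + yi for xi, yi in zip(x, y)]

# e.g. saxpy_reference(2.0, [1.0, 2.0], [10.0, 20.0]) == [12.0, 24.0]
```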

~~~
Polytonic
I think the problem with (most of?) these libraries is that they don't solve
the fundamental problem, which is: the OpenCL API is awful to work with.
Various strategies have been attempted, mainly involving some form of "wrap
the C API in less verbose C++!"

------
bratsche
What's the license on this? There doesn't seem to be anything about that in
their Github repo.

~~~
nadams
I really don't know why you were downvoted, as this is a somewhat important
question.

Though - by the GitHub TOS you at least get to fork the repo [1].

[1] [https://help.github.com/articles/github-terms-of-
service/#f-...](https://help.github.com/articles/github-terms-of-
service/#f-copyright-and-content-ownership)

~~~
wyldfire
I find that it's easier to submit a PR than ask on HN. Usually omitting a
license is an unintentional error.

~~~
nadams
Why would you submit a PR - wouldn't you just create an issue (you don't want
to select a license for their project)? In either case - my experiences with
PRs have been really negative (and I'm not surprised).

~~~
wyldfire
Meh, they just need a push in the right direction. If they didn't care to post
a license in the first place they might not care which one I pick for them.
But you're right, in this case they took the PR as an opportunity to select a
license but decided not to take the one I suggested.

------
exDM69
This seems to be a library to make it really easy to invoke a single GPU
kernel on some input buffers that are copied from CPU (an std::vector).
Unfortunately, most practical GPGPU tasks aren't like that.

The latency of getting data from the CPU to the GPU and back is bad enough
that for a small quantity of data (low megabytes), it's better just to compute
it on the CPU. More practical tasks usually involve several kernel
invocations, and keeping the data at the GPU is essential for any kind of
decent performance.
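
A back-of-envelope check makes the point concrete. Both constants below are illustrative assumptions (roughly a PCIe 3.0 x16 link and a modest CPU), not measurements:

```python
# Is it worth shipping N megabytes to the GPU and back?
PCIE_MB_PER_S = 12_000.0   # assumed effective host<->device bandwidth
CPU_FLOPS = 10e9           # assumed sustained CPU throughput

def transfer_ms(megabytes):
    """Round-trip host<->device transfer time, in milliseconds."""
    return 2 * megabytes / PCIE_MB_PER_S * 1000

def cpu_ms(megabytes, flops_per_byte=1.0):
    """Time to just crunch the data on the CPU instead."""
    return megabytes * 1e6 * flops_per_byte / CPU_FLOPS * 1000

# For a low-arithmetic-intensity task on a few MB, the CPU wins:
# transfer_ms(4) ~= 0.67 ms of pure transfer vs cpu_ms(4) = 0.4 ms of compute.
```

Under these assumptions the transfer alone costs more than computing on the CPU, which is why single-kernel round trips over small buffers rarely pay off.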

But there are cases where executing a single kernel over some buffers would be
useful (especially in early development or prototyping). In those cases, I'd
like to write ZERO host-side code and use a CLI or GUI tool to run the code.
So what I'd like to see is something like:

    $ cl-cli --kernel=frobnicate.cl --input0=foos.bin --input1=bars.bin --output0=bazs.bin

Does such a tool exist already?

It would be even better if this would allow building proper pipelines of
multi-kernel programs by defining the inputs and outputs to kernels using a
directed acyclic graph.
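
A minimal sketch of what such a pipeline could look like, in plain Python with made-up kernel names: each edge records whose output feeds whose input, and a topological sort yields a valid launch order. On a real device the intermediate buffers would stay in GPU memory; here a dict stands in for device allocations.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical pipeline: each "kernel" is a plain function here.
kernels = {
    "load":    lambda bufs: list(range(5)),
    "square":  lambda bufs: [v * v for v in bufs["load"]],
    "offset":  lambda bufs: [v + 1 for v in bufs["load"]],
    "combine": lambda bufs: [a + b for a, b in zip(bufs["square"], bufs["offset"])],
}
# node -> set of kernels whose output it consumes
deps = {"square": {"load"}, "offset": {"load"}, "combine": {"square", "offset"}}

def run_pipeline(kernels, deps):
    bufs = {}
    # Launch in dependency order; intermediates never leave "device" memory.
    for name in TopologicalSorter(deps).static_order():
        bufs[name] = kernels[name](bufs)
    return bufs
```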

I do not intend to dishearten you, OP, but think about this when considering
future directions for your project.

~~~
Gladdyu
The framework allows for partial data updates - for instance, for a 3D
renderer it suffices to push the new position to the GPU whilst the vertex
data remains in GPU memory. If you invoke the kernel function again it will
neither recompile the kernel nor re-upload the vertex data.
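
A hypothetical sketch of that caching idea (not EasyOpenCL's actual internals): compile once, and re-upload an argument only when its host-side data has changed since the last invocation.

```python
class CachedKernel:
    def __init__(self, source):
        self.source = source
        self.builds = 0        # how often we (pretend to) compile
        self.uploads = 0       # how often we (pretend to) transfer
        self._program = None
        self._device = {}      # arg name -> copy of last uploaded data

    def invoke(self, **args):
        if self._program is None:
            self._program = "<binary>"   # stand-in for a real build step
            self.builds += 1
        for name, data in args.items():
            if self._device.get(name) != data:   # changed since last call?
                self._device[name] = list(data)  # stand-in for the transfer
                self.uploads += 1
        # ... enqueue the kernel here on a real device ...

# k = CachedKernel("__kernel void draw(...) { }")
# k.invoke(position=[0, 0], vertices=big_vertex_list)
# k.invoke(position=[5, 0], vertices=big_vertex_list)  # only position moves
```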

The DAG idea sounds fun to build and very useful - I have some spare time
anyway, so I'll see what I can whip up. As for the command line interface - it
too sounds pretty useful, and it should only be a bit of parsing as all the
OpenCL-related code has been written already, but ufo-launch already performs
pretty much the same function so it's not very high on my todo list.

~~~
exDM69
> The framework allows for partial data updates

Good! Additionally, it would be useful to memory map buffers and allow using
raw pointers in addition to std::vectors. But more importantly, it would be
necessary to use the output of one kernel as the input of another kernel
invocation.

Anyway, build it to suit _your_ use case primarily. Happy hacking!

~~~
Polytonic
Sorry to keep plugging my stuff, but you might be interested in this ...
[https://github.com/Polytonic/Chlorine](https://github.com/Polytonic/Chlorine)

No raw pointers, but you can use C arrays (was that what you meant by raw
pointers?).

------
Polytonic
I wrote something similar a while back
([https://github.com/Polytonic/Chlorine](https://github.com/Polytonic/Chlorine)).
Always good to see more attention paid to OpenCL though!

------
gjulianm
Seems nice! I would use this to avoid all the OpenCL boilerplate code.
However, there's one inconvenience: why restrict the vectors to all be the
same size? I see that it is used to set the workgroup size. I think that
allowing arrays of arbitrary size and letting the client set the workgroup
size wouldn't add much complexity to the code or the API.

Apart from that, really nice work - the code is well written and commented;
it's a joy to read.

~~~
Gladdyu
I still have some plans to auto-derive the optimal work/global/local group
sizes; however, that still takes some work.

Therefore, I just implemented the most basic straightforward alternative
(which is indeed rather restrictive at the moment) as a temporary solution.

~~~
gjulianm
That's great! IIRC, OpenCL can automatically calculate the local workgroup
sizes, so you only have to provide the global size.
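
(In the raw API this is done by passing NULL as the local_work_size argument of clEnqueueNDRangeKernel, letting the implementation pick.) If a wrapper does want to pick a local size itself, one simple heuristic - a sketch, not what any driver actually does - is the largest divisor of the global size that fits the device limit:

```python
def pick_local_size(global_size, max_work_group_size=256):
    """Largest divisor of global_size not exceeding the device limit.
    (OpenCL 1.x requires the local size to evenly divide the global size.)"""
    for local in range(min(global_size, max_work_group_size), 0, -1):
        if global_size % local == 0:
            return local

# pick_local_size(1024) -> 256; pick_local_size(1000) -> 250
```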

------
pen2l
CUDA is probably the way to go - especially since, if you have to use a GPU
anyway, you might as well get one of the new NVIDIA GPUs.

~~~
Gladdyu
CUDA has a nice toolchain, but the point of this is to remove all of the
low-level stuff. For performance (even on NVIDIA cards) it doesn't really
matter whether you use CUDA or OpenCL [1], and OpenCL runs everywhere as a
bonus.

[1]
[http://pds.ewi.tudelft.nl/pubs/papers/icpp2011a.pdf](http://pds.ewi.tudelft.nl/pubs/papers/icpp2011a.pdf)

~~~
dr_zoidberg
Added bonus: OpenCL runs on multicore processors, so you don't even need a GPU
(though performance is obviously lower).

~~~
lfowles
Not a whole lot lower though - only in the 10-100x range from what I've
experimented with (a couple-generations-old card vs. a couple-generations-old
i7 with a straight-up "process some floats" test [0]). Certainly enough to
test your kernels against.

[0]:
[https://github.com/krrishnarraj/clpeak](https://github.com/krrishnarraj/clpeak)

~~~
dr_zoidberg
Good work, especially on keeping the results from various platforms!

~~~
lfowles
Not mine! Just the tool I used.

