

Android Renderscript from the perspective of an OpenCL/CUDA/C++ AMP programmer - compilercreator
http://codedivine.org/2013/02/01/renderscript-from-the-perspective-of-an-openclcudac-amp-programmer/

======
sparky
There was a talk at the 2011 LLVM dev meeting about Renderscript's design
philosophy and LLVM-based compilers. Cached slides are here, since llvm.org is
down today:
<http://webcache.googleusercontent.com/search?q=cache:http://llvm.org/devmtg/2011-11/Hines_AndroidRenderscript.pdf>

In short, it's not an accident or incompetence that aspects of current desktop
GPU execution models (e.g., thread blocks, scratchpad shared memory) are not
exposed in Renderscript. It's a conscious decision to make sure you can get
decent performance on not only those GPUs, but ARMv5-v8 CPUs (with and without
SIMD instructions), x86, DSPs, etc. Getting good performance on these
platforms from a language that does expose these constructs (e.g., CUDA) is
still an open research problem (see MCUDA
<http://impact.crhc.illinois.edu/mcuda.aspx> and friends).

Though Renderscript aims to achieve decent performance on a huge variety of
platforms, even if they only cared about mobile GPUs, the major contenders
(Imagination, ARM, Samsung, Qualcomm, NVIDIA) have wildly different
architectures, and a language that is close to the metal on one will present a
huge impedance mismatch on the others. Note that things are sufficiently
different from desktop GPU design that we're just now seeing SoCs come out
that support OpenCL (in hardware; driver support seems to be lagging), and you
can't run CUDA on Tegra 4.

~~~
tmurray
Pretty much exactly this. Performance portability is our main concern, and we
are willing to trade off some peak performance to get it because of how badly
you will hurt yourself on different architectures. We are trying to solve low-
hanging problems first before attacking more complex algorithms.

~~~
varelse
So if I read this correctly, you're effectively trying to solve the hybrid
computing problem that everyone else is working on too (and the results so far
are pretty disappointing IMO).

To which I have to respond that better is often the enemy of good enough.

I'd personally rather have a relatively OK solution like OpenCL in my hands
today than a currently nonexistent ideal solution at some vague point in the
future. Smart programmers will overcome hardware limitations all on their own
and dumb programmers will trip you up no matter how much you rabbit-proof
their fences IMO.

------
wwalker3
It does look like mobile GPU vendors are about to start offering OpenCL
support. For example, ARM submitted OpenCL 1.1 Full Profile conformance test
results for the Mali-T604 last year
(<http://blogs.arm.com/multimedia/775-opencl-with-arm-mali-gpu-computingwith-no-compromises/>),
and Imagination Technologies showed mobile OpenCL demos last year at CES
(<http://www.youtube.com/watch?v=sDrz-w1jzEU>).

It's easy to see why OpenCL hasn't rolled out fully on mobile GPUs yet:
writing and debugging a full OpenCL software stack is very expensive and time-
consuming, and there's still not that much real programmer demand for OpenCL
on mobile.

As for Renderscript, it's always sounded like a bit of "not invented here"
syndrome on Google's part -- we've already got CUDA and OpenCL, and RS doesn't
really bring much new to the table. They've already deprecated the 3D graphics
part of Renderscript in Android 4.1, so perhaps they'll do the same to
Renderscript Compute soon.

~~~
cageface
I'd much rather see Google invest their time in an Android version of
something like the Accelerate API from iOS. It would be a lot more generally
useful.

------
varelse
I suspect that as soon as Apple exposes OpenCL in any way on iOS, Android will
shortly follow. Likewise, if Mozilla exposes WebCL in Firefox, Chrome will
shortly follow. What I don't expect is for them to take the lead in doing so.

Say what you want of OpenCL/CUDA, but what other language smoothly subsumes
SIMD, multi-threading, and multi-core awareness? I expected it to already be
available on smart phones by now. What's taking so long?

------
Osiris
If someone with that level of experience can find so many flaws so quickly,
why aren't people with that level of domain knowledge brought in when the API
is originally being developed? Or, if they are, why isn't there released
documentation on why the API isn't as good as they wish it could be?

~~~
tmurray
I worked on CUDA at NVIDIA for over four years and was the primary API
designer for a large part of that time. I started on RS at Google in
September.

Basically, he gives us too little credit for the execution model (it's young,
it's improving very quickly and is not at all designed to emulate anything
else that exists today) and assumes that GPU compute has the same tradeoffs on
mobile as desktop (it doesn't at all). You'll see more from us soon.

~~~
compilercreator
Hi. Author here. Your name is quite famous in the GPGPU community, and it is
great to hear that you now work on RSC. My experience does not compare to
yours and I do hope my post is seen in a positive light. Would love to discuss
the issues in depth sometime.

Anyway, if you were to ignore everything in the post except one item, it
should be this: please fix gather/scatter in RSC. A parallel computing API
without proper gather/scatter is simply not very useful, irrespective of
whether it is on desktop or mobile.

I will keep following RSC and look forward to the developments you are hinting
at.
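
For readers less familiar with the terms, here is a minimal serial sketch in
plain Python of what gather and scatter mean. It is illustrative only and is
not Renderscript code; the function names `gather` and `scatter`, the `fill`
parameter, and the sample data are all assumptions made up for this example.

```python
def gather(src, indices):
    """Gather: output element i READS from an arbitrary input slot indices[i]."""
    return [src[j] for j in indices]


def scatter(src, indices, size, fill=0):
    """Scatter: input element i WRITES to an arbitrary output slot indices[i]."""
    out = [fill] * size
    for value, j in zip(src, indices):
        # In this serial sketch, duplicate indices mean the last write wins.
        # A parallel implementation has to define that behaviour explicitly.
        out[j] = value
    return out


data = [10, 20, 30, 40]   # hypothetical input buffer
perm = [2, 0, 3, 1]       # hypothetical index buffer

gathered = gather(data, perm)
scattered = scatter(data, perm, len(data))
```

In a data-parallel API each loop iteration would be one work item, so the
point of the request is that a kernel needs to read from, and write to,
locations other than its own index.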

~~~
tmurray
I implemented scatter back in October, but it just barely missed Android 4.2.
It's in the next release.

~~~
compilercreator
Awesome. Thanks. Is there an email address where I can send you a short note?
Or perhaps I can send you a PM on the B3D forums?

~~~
tmurray
B3D PM is fine.

------
sippndipp
I think that Renderscript is not meant as a replacement for native C++ code.
Rather, it's a platform-independent and easy way to give a programmer more
performance (beyond Java). I guess that if you need real performance or
more control you'll have to go the NDK route anyway. But if you just want to
write another Instagram clone then Renderscript is the way to go.

