
Does not GPU compute - alexvoica
http://www.alexvoica.com/does-not-gpu-compute/
======
manav
For actual mobile devices, yes, there is no need. FP64 is only useful in
scientific research, perhaps finance, and a few other fields, and even there
you would do a lot of mixed-precision work.
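
A toy illustration of the mixed-precision point (my own sketch, not from the article): keep the bulk data in FP32, but do the accumulation in FP64. `np.add.accumulate` sums sequentially, so it exposes exactly where a pure-FP32 running sum breaks down:

```python
import numpy as np

# Hypothetical mixed-precision sketch: data stored in FP32,
# accumulation done in FP64. np.add.accumulate sums left to right,
# so the FP32 accumulator's breakdown past 2**24 is visible.
ones = np.ones(17_000_000, dtype=np.float32)

fp32_sum = np.add.accumulate(ones, dtype=np.float32)[-1]
fp64_sum = np.add.accumulate(ones, dtype=np.float64)[-1]

print(fp32_sum)  # 16777216.0 -- stuck at 2**24, every later +1 is absorbed
print(fp64_sum)  # 17000000.0 -- FP64 accumulator over FP32 data stays exact
```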

The support was probably there because they wanted to design a single chip:
remove or disable cores for truly mobile or general-purpose boards, while
keeping the logic available for customers who would actually want it.

------
Animats
I once needed FP64 in a GPU for physics calculations. One reason that
impulse/constraint won out over spring/damper is that spring/damper has a
total loss of precision problem with 32-bit floats.
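
A minimal sketch of the failure mode (my illustration, not actual physics-engine code): once the per-step spring/damper increment falls below half a float32 ulp of the position, every update is silently absorbed and the object stops moving:

```python
import numpy as np

# Position far from the origin; the per-step increment (1e-5) is below
# half of float32's ulp at 1000.0 (~1.2e-4), so float32 drops it entirely.
pos32 = np.float32(1000.0)
pos64 = np.float64(1000.0)
step = 1e-5

for _ in range(10_000):
    pos32 += np.float32(step)  # rounds back to 1000.0 every iteration
    pos64 += step

print(pos32)  # 1000.0  -- the object never moved
print(pos64)  # ~1000.1 -- float64 keeps the motion
```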

------
mon_insider
I don't remember there ever being a time when FP64 was considered a big deal
for mobile GPUs. The article is trying to debunk a claim that was never
popular to begin with.

~~~
alexvoica
Allow me to refresh your memory then. How does the following sound to you?

"We also support Full Profile and 64-bit natively, in hardware. After years of
evangelising the benefits of such an approach it is nice to see other players
in the industry join down this avenue." [https://community.arm.com/groups/arm-
mali-graphics/blog/2013...](https://community.arm.com/groups/arm-mali-
graphics/blog/2013/06/03/mali-t622--bringing-full-profile-gpu-compute-to-mid-
range-devices)

"Mali-T622 was specifically tailored for this job. Mali-T622 also supports
OpenCL Full Profile and includes double-precision FP64 and full IEEE-754-2008
floating-point support which are essential features in order to enhance the
user experience" [https://community.arm.com/groups/arm-mali-
graphics/blog/2013...](https://community.arm.com/groups/arm-mali-
graphics/blog/2013/06/03/mali-t622--bringing-full-profile-gpu-compute-to-mid-
range-devices)

I could go on with the examples but I think there's no need to spam the thread
with tens of blog articles that say FP64 and "native 64-bit" (whatever that
means) are essential to the mobile experience.

------
DiabloD3
Is there any point in damning FP64 this hard anymore? There is no reason,
imo, for a modern GPU to run FP64 at worse than 1/3rd of its FP32 rate.

Side note: consumer GeForces (i.e. non-Quadro/Tesla "professional" cards)
have FP64 performance specifically locked out, and AMD has started doing the
same with GCN-era chips.

~~~
alexvoica
My focus was on mobile, not desktop GPUs. However, I've noticed that energy
and area efficiency are starting to become more relevant in desktop too.

At the end of the day, I guess it's all about finding the right balance for
your target application.

------
vessenes
Even if you're doing fintech simulations, FP16 could well be plenty of
precision for a first pass, and then you'd get all those extra cores and
ops/watt.

FP64 seems like a very small use case for most of the parallelized workflows I
can imagine.
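
For scale (my own numbers, not the parent's): FP16 has an 11-bit significand, roughly 3 significant decimal digits, which is why it only works as a first-pass precision:

```python
import numpy as np

# FP16 counts integers exactly only up to 2**11 = 2048; above that the
# spacing is 2, so adding 1 ties-to-even back to the same value.
a = np.float16(2048.0)
print(a + np.float16(1.0) == a)              # True: the +1 is absorbed
print(np.float32(2048.0) + np.float32(1.0))  # 2049.0 in FP32, no problem
```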

~~~
Robi395
For scientific workloads, double precision is a must-have. The ~7 significant
digits of FP32 are not enough. In my lab, we haven't updated our Kepler-based
GPUs since 2013 for this reason.
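
The 7-digit ceiling is easy to demonstrate (a toy example, not our lab code): a 9-digit value is already rounded on input to float32, and a further +1 is below its resolution:

```python
import numpy as np

# float32 has a 24-bit significand: ~7 significant decimal digits.
# At 9 digits the value is already rounded, and a +1 is below resolution.
x32 = np.float32(123456789.0)
x64 = np.float64(123456789.0)

print(int(x32))                       # 123456792 -- rounded on input already
print(x32 + np.float32(1.0) == x32)   # True  -- the +1 vanishes
print(x64 + 1.0 == x64)               # False -- float64 has ~15-16 digits
```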

~~~
bnegreve
I agree but the article has an answer to this.

 _Since I don’t plan to run any DNA analysis or fintech simulations on my
smartphone anytime soon, I am very satisfied having FP32/FP16 precision in
mobile right now. And so should you._

~~~
Robi395
You are right, I thought vessenes meant: "FP64 seems like a very small use
case for most of the parallelized workflows I can imagine" for any platform.

------
sp332
It's not just mobile either. The original Titan card, released in early 2013,
has much higher FP64 throughput than any later consumer Nvidia card.

------
dibanez
It may be that mobile graphics gets by fine with FP32 or less, but I worry
that if FP64 gets sidelined then none of the effort going into this hardware
today will benefit applications that need real precision like science, weather
prediction, GPS, etc.

~~~
clevernickname
Not really a problem. NVIDIA sells cards based around 32-bit (and now
increasingly, 16-bit) ALUs for desktop usage, while offering more expensive
ones with more 64-bit-focused ALUs for workstations and compute. Compute is
important enough to their bottom line to justify it.

The real problem is that NVIDIA has compute locked down with CUDA. Mobile
chipset vendors can't expand into compute if they're barred from entry at the
API level.

~~~
techdragon
Given the quality of OpenCL and its cross-platform nature, it's amazing that
everything is still written directly for CUDA...

There were no CUDA -> OpenCL toolchains last time I checked, which is even
more frustrating.

~~~
mattkrause
There is CU2CL: [http://chrec.cs.vt.edu/cu2cl/](http://chrec.cs.vt.edu/cu2cl/)

The cross-platform nature is actually part of the problem: the whole point of
doing GPGPU work is playing to the hardware's strengths, which is difficult
when the hardware can be nearly anything from a CPU to a GPU to an FPGA.

It doesn't help that until recently, AMD hadn't pushed OpenCL nearly as hard
as NVIDIA pushes CUDA.

~~~
Athas
Modern AMD and NVIDIA GPUs are fairly similar hardware-wise, and it is not
hard to write OpenCL code that executes efficiently on both. I agree that it
is pretty hopeless to write performance-portable OpenCL across entirely
different architectures, however.

~~~
mattkrause
Sure, but if you go with NVIDIA, you also get access to all the other goodies
they distribute (Thrust, cuFFT, cuDNN, etc.) and all the CUDA-compatible
stuff other people have written, like Theano and TensorFlow.

It does seem like people have gotten a little more interested in OpenCL
lately, but it still lags pretty far behind. As dharma1 says below, AMD seems
weirdly uninterested in catching up. If I were in charge of AMD, I'd be
throwing money and programmers at this: "Want to port your library to OpenCL?
Here, have a GPU! We'll help."

------
mjevans
A better approach might be to look at what you are trying to accomplish and
figure out the scale that works best for what a local view appears to be.

For example, if I imagine an open game world, there are still natural limits
to useful render distance, so it is possible to define absolute maximum scale
sizes.

My new to this problems space view is that even if the world is larger than
those sizes, there is probably still some limited observer and scale that
makes sense. Build some spare room and padding into the scale, and it can
then be transformed to center on different points. As movement towards one of
those points happens, the new centering for each object could be pre-computed
in spare cycles (or at least spread out so it isn't a single noticeable hit).
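
Here's a minimal sketch of what I mean (hypothetical names, assuming a grid of recentering anchors): keep authoritative positions in float64, and re-express them relative to the anchor nearest the observer before handing float32 to the renderer:

```python
import numpy as np

# A minimal "floating origin" sketch, my own illustration: authoritative
# positions stay in float64; the renderer only ever sees float32
# coordinates relative to the anchor nearest the observer.
ANCHOR_SPACING = 1000.0  # assumed spacing of the pre-computed centering points

def recenter(world_pos, observer_pos):
    """Re-express a float64 world position relative to the nearest anchor,
    returning float32 for the renderer."""
    anchor = np.round(np.asarray(observer_pos, dtype=np.float64)
                      / ANCHOR_SPACING) * ANCHOR_SPACING
    return (np.asarray(world_pos, dtype=np.float64) - anchor).astype(np.float32)

# A 1 cm offset 5,000 km from the origin survives recentering...
local = recenter([5_000_000.01, 0.0, 0.0], [5_000_000.0, 0.0, 0.0])
print(local[0])  # 0.01

# ...but as an absolute float32 coordinate it is lost outright,
# since float32's spacing at 5,000,000 is 0.5.
print(np.float32(5_000_000.01) == np.float32(5_000_000.0))  # True
```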

~~~
gue5t
> My new to this problems space view

I'm going to be downvoted because this isn't particularly on-topic, but
nonetheless I'd like to suggest you try hyphenating phrases like this to make
them easier to read, so you don't construct garden-path sentences where the
correct parse exists but isn't obvious.

Thanks!

~~~
__s
Garden path sentences make pleasant experiences in parsing that accentuate the
means to the end resulting communication. Smell the roses

~~~
dietrichepp
I like garden path sentences in my Joyce, but keep them off HN, please.

