
AMD reverses GPGPU trend, announces OpenCL SDK for x86 - habs
http://arstechnica.com/hardware/news/2009/08/amd-announces-opencl-sdk-for-x86.ars
======
yan
I guess I'm not understanding something. How are they reversing a trend?

Apple spent a lot of time and money developing the OpenCL standard and gaining
industry support. OpenCL isn't strictly about GPGPU computing, it's about
maximally utilizing available resources. AMD is adopting the standard, and
they've just implemented the CPU portion of it for now and gone ahead and
released it. There is no doubt in anyone's mind that they'll follow up with
support for GPU targets. So if anything, they're embracing the trend, just
trying to bring
their tools to completion.

So yeah, the current SDK doesn't support GPU targets, but how is this in any
way, shape, or form reversing the trend?

~~~
plesn
Exactly. I would even argue that without OpenCL on the CPU (which can serve,
e.g., as a fallback when your GPU can't handle something), OpenCL wouldn't
make sense.
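(Roughly what I mean, as a sketch rather than anything from a particular SDK's
samples: ask the standard OpenCL device query for a GPU, and if that fails, ask
again for a CPU device; everything downstream stays the same.)

    /* Sketch: try a GPU device first, fall back to the CPU.
       Error handling trimmed for brevity. */
    #include <CL/cl.h>
    #include <stdio.h>

    int main(void) {
        cl_platform_id platform;
        cl_device_id device;
        cl_int err;

        clGetPlatformIDs(1, &platform, NULL);

        /* Ask for a GPU device... */
        err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
        if (err != CL_SUCCESS) {
            /* ...and fall back to the CPU if there isn't one. */
            err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, NULL);
        }
        if (err != CL_SUCCESS) {
            fprintf(stderr, "no OpenCL device available\n");
            return 1;
        }

        /* The same context/queue/kernel code runs on whichever device we got. */
        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
        printf("context created: %p\n", (void *)ctx);
        return 0;
    }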

------
dlevine
I feel like CPUs and GPUs are converging anyways. They are both going
multicore, and GPUs are becoming better suited to general-purpose processing.
Sure, GPUs have lots of simple dedicated cores while CPUs have fewer, more
complex cores, but over time this will converge.

In the long run, you won't need dedicated logic for graphics processing, at
least on the low-end. Processors will just dedicate a few cores to graphics
processing.

If you doubt this, look at what people said a few years ago about graphics
integrated into the chipset. Now you have respectable solutions like NVidia's
9400. Within a few years, I bet that some relatively high-end solutions will
be integrated into the silicon, and the low-end processors will do everything
on the CPU. Over the long-term (probably when we have 32-64 cores per CPU),
the GPU will become irrelevant.

~~~
lutorm
I don't think they are converging at all. The fundamental difference is that
GPUs spend the vast amount of silicon real estate that a CPU devotes to cache
on computational units instead. That gives them much higher peak computational
performance, but it also means that your memory is uncached, which will kill
you in applications that do unstructured memory access.

I think of it as analogous to cars vs semitrucks. Trucks are efficient at
hauling massive amounts of stuff to the same place, but not at transporting
100 commuters to their different workplaces. They are optimized for different
problems and I don't think there's a better argument for saying that GPUs and
CPUs will converge than to say that semitrucks and cars will converge.

~~~
scott_s
But GPUs do have an internal memory hierarchy; there is a local cache with
much faster access times than global memory.
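(To make that concrete, here's a sketch in OpenCL C, not tied to any particular
vendor: a work-group stages a tile of global data into the fast on-chip __local
memory, synchronizes, and then works on it there instead of re-reading global
memory.)

    /* Sketch: per-work-group partial sums using on-chip __local memory.
       Assumes a power-of-two work-group size; the host passes the tile
       as a __local argument via clSetKernelArg(..., size, NULL). */
    __kernel void partial_sums(__global const float *in,
                               __global float *out,
                               __local float *tile)
    {
        size_t lid = get_local_id(0);
        size_t gid = get_global_id(0);
        size_t lsz = get_local_size(0);

        tile[lid] = in[gid];              /* one slow global read per item */
        barrier(CLK_LOCAL_MEM_FENCE);     /* wait for the whole tile */

        /* tree reduction entirely in fast local memory */
        for (size_t s = lsz / 2; s > 0; s >>= 1) {
            if (lid < s)
                tile[lid] += tile[lid + s];
            barrier(CLK_LOCAL_MEM_FENCE);
        }
        if (lid == 0)
            out[get_group_id(0)] = tile[0];
    }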

Personally, I agree with the parent. Aside from the obvious solution of
integrating a GPU-like accelerator on the chip, the multicore trend in general
purpose CPUs has more, simpler cores. I agree with your overall
characterization of what CPUs are good at versus what GPUs are good at, but I
still see some convergence.

------
pmorici
This looks to me like a publicity stunt since ATI's original GPGPU offering
was a flop. Nvidia's OpenCL SDK actually works on their graphics cards, though
it was still in limited beta last I checked.

~~~
lutorm
Not to mention that you can compile CUDA code in "emulation" mode for
execution on the CPU, too. The utility is largely for debugging, because the
CPU is darned slow executing it, so I'm not sure it has any use in production.

------
jacquesm
Interesting development! One thing I don't get: if you are developing stuff
for GPGPUs and you don't have one yourself, getting one is hardly a big
hurdle.

The cheapest cards that will work are in the $100 range, hardly a major
expense.

The GPGPU is probably one of the biggest developments of the last couple of
years; it puts enormous power in the hands of individuals with a surprisingly
small power bill and footprint.

~300W for a 480 core machine is not bad at all!

~~~
scott_s
Enormous, but extremely specialized power.

I'm not sure if most people understand the kinds of threads/cores that GPUs
actually have, and with that, the kinds of computations they're actually good
at. A GPU's threads are extremely simple execution pipelines - nothing like
the cores of an Intel Quad Core chip. Groups of GPU threads actually execute
together in lock-step, executing the same code (but with different
parameters).

Such devices are extremely good at handling data parallelism - where the same
computation is performed on a large chunk of data - seen in areas such as
graphics and scientific computing. But you're probably not going to, say,
speed up your webserver with a GPU. If you don't have a computation that is
extremely data parallel - that is, capable of having thousands of execution
contexts working on data at the same time - a GPU won't help.
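(For a feel of what "same code, different parameters" looks like, here's a
minimal OpenCL C sketch: every one of the thousands of work-items executes this
exact body, and the only thing that differs is the index each one gets back
from get_global_id.)

    /* Sketch of a purely data-parallel kernel (a saxpy): one work-item
       per array element, all running the same code in lock-step groups. */
    __kernel void saxpy(float a,
                        __global const float *x,
                        __global float *y)
    {
        size_t i = get_global_id(0);   /* the only per-item difference */
        y[i] = a * x[i] + y[i];
    }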

~~~
apu
Things are changing extremely rapidly.

Researchers (especially in parallel computing) are already figuring out how to
port all sorts of algorithms over to the GPU -- and not just scientific or
graphics code -- we're talking general purpose algorithms like sorting,
ranking, cryptography, networking, database operations, etc:

http://gpgpu.org/2009/04/13/efficient-acceleration-of-asymmetric-cryptography-on-graphics-hardware

http://gpgpu.org/2009/04/28/fast-and-scalable-list-ranking-on-the-gpu

http://gpgpu.org/2008/12/11/wait-free-programming-for-general-purpose-computations-on-graphics-processors

http://gpgpu.org/2008/10/26/gnort-high-performance-network-intrusion-detection-using-graphics-processors

http://gpgpu.org/2008/05/25/a-fast-similarity-join-algorithm-using-graphics-processing-units

The biggest obstacle preventing widespread adoption of GPGPU techniques today
is that things are still in so much flux that mature, stable APIs have
not yet emerged. There is fierce competition between Intel (Larrabee),
Apple/AMD (Grand Central/OpenCL), and NVIDIA (CUDA) to get their respective
APIs to be the dominant ones, and no one's yet come up with a mature wrapper
API that can target any of them.

This is in large part because the hardware itself is still in flux, morphing
away from the graphics-specific pipeline of 10 years ago and into a general
purpose SIMD architecture. This has meant a change in the kinds of operations
allowed on the GPU. For example, branches used to be a big no-no in graphics
hardware: the performance hit caused by a branch was disastrous on a
massively-multicore system. However, all hardware makers are now adding
support for branches, because it's almost impossible to port a lot of CPU
algorithms without them.
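(As a toy illustration, not taken from any of the papers above: in a kernel
like the sketch below, if neighbouring work-items in the same SIMD group take
different sides of the branch, the hardware has to step through both paths,
masking work-items off in turn.)

    /* Sketch of a divergent branch in OpenCL C. */
    __kernel void divergent(__global const int *flags,
                            __global float *data)
    {
        size_t i = get_global_id(0);
        if (flags[i])                  /* data-dependent branch */
            data[i] = data[i] * 2.0f;  /* some items in the group do this... */
        else
            data[i] = data[i] + 1.0f;  /* ...while the rest wait, then do this */
    }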

In any case, I'm confident that within a year or two, things will have settled
down somewhat, at which point there's going to be a mad dash by developers to
start using GPUs (probably systems- and library-level developers rather than
application programmers).

~~~
scott_s
I _am_ a researcher in parallel computing. All of these papers represent good
work and are contributions to the field, but they still obey the limited model
I was referring to: ship a large amount of data to the GPU, do data parallel
computations on that data, and ship the results back. As I pointed out in a
post below, this model does not work well when you only have small quantities
of data at a time, yet there is parallelism to exploit.
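(The host side of that model, sketched in OpenCL with error checks omitted and
ctx, queue, kernel, host_data and the element count n assumed to be set up
already; the two transfer steps are exactly the overhead that eats you when the
chunks of data are small.)

    /* n is the element count (size_t). */
    cl_int err;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(float),
                                NULL, &err);

    /* 1. ship the data to the device */
    clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, n * sizeof(float),
                         host_data, 0, NULL, NULL);

    /* 2. run the data-parallel kernel over it */
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &n, NULL, 0, NULL, NULL);

    /* 3. ship the results back */
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, n * sizeof(float),
                        host_data, 0, NULL, NULL);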

~~~
apu
Ah sorry, I didn't realize that =)

I'll defer to your judgment about this stuff, then. (I'm a PhD student in
computer vision, and we've recently been using a GPU SVM library that's been
amazing for cutting down our processing times, so I guess I've been a little
dazzled by this stuff.)

Anyways, since you're in this field, what's your feeling about the future of
parallel computing, with regards to the different vendors? Which of
CUDA/OpenCL/Larrabee will win out? Or none of the above? When will APIs settle
down?

~~~
scott_s
Honestly, I don't know, and anyone who claims to know is selling you
something.

Your question is _the_ question in parallel computing right now. And it
affects all sizes and scales, from processor architecture (look at the
different architectures of an Intel Quad Core, Cell, GPUs and the upcoming Larrabee
and Fusion) to supercomputers (BlueGene style thousands of slow cores with
fast interconnect, RoadRunner style of typical multicore processors with Cells
as accelerators, Nvidia's giant GPU box, or just lots of SMPs). We don't know
what the future will look like, which makes this an interesting time to be in the
field. People at all levels are experimenting with different architectures. We
don't know what will win, if any _one_ thing will win, and when we'll know.

With that said, I don't think APIs at the processor level will settle down
until the hardware does. My understanding is that OpenCL aims to provide a
programming model that works on architectures as different as GPUs, Cell and
Larrabee, and that it will supplant CUDA. That sounds like a great idea, but
lots of great ideas haven't worked in practice before.

I think it's going to be at least several years of experimentation
before the hardware settles down. My own belief (that is, opinion not based on
experimental data) is that we'll end up with a heterogeneous chip with lots of
simple cores for parallelism, a small number of sophisticated cores for
sequential computation, all part of an integrated memory hierarchy.

------
Keyframe
Do not forget that Intel released a C++ Larrabee Prototype Library for anyone
wanting to experiment. Just sayin'...

------
zandorg
Does this come with an emulator for Windows, so you can test it without a GPU?

