

Engineers boost AMD CPU by 20% with software alone; no overclocking - 11031a
http://www.extremetech.com/computing/117377-engineers-boost-amd-cpu-performance-by-20-without-overclocking

======
wwalker3
It sounds like the NCSU guys are using the CPU as a prefetcher to speed up GPU
kernel execution, not using the GPU to speed up normal CPU programs as the
ExtremeTech article implies.

The CPU parses the GPU kernel and creates a prefetcher program that contains
the load instructions of the GPU kernel. This prefetcher runs on the CPU, but
slightly ahead of kernel execution on the GPU. This warms up the caches, so
that when the GPU executes a load instruction, the data is already there.
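
To make the idea concrete, here is a toy Python model of it, not the NCSU implementation. The kernel listing format, the `LRUCache` class, and the `extract_loads` and `count_gpu_misses` names are all made up for illustration, and a real shared cache is far more complicated than an LRU set of addresses.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache that only tracks which addresses are resident."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def touch(self, addr):
        """Access addr. Return True on a hit; on a miss, fill and return False."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        self.lines[addr] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


def extract_loads(kernel):
    """Build the 'prefetcher program': keep only the kernel's load addresses."""
    return [addr for op, addr in kernel if op == "load"]


def count_gpu_misses(kernel, cache_size, run_ahead):
    """Count load misses seen by the 'GPU'.

    run_ahead is how many loads the 'CPU' prefetcher stays ahead of the GPU;
    0 disables the prefetcher entirely.
    """
    cache = LRUCache(cache_size)
    loads = extract_loads(kernel)
    prefetched = 0
    misses = 0
    for i, addr in enumerate(loads):
        # The prefetcher touches loads up to run_ahead positions past load i.
        limit = min(len(loads), i + 1 + run_ahead) if run_ahead > 0 else 0
        while prefetched < limit:
            cache.touch(loads[prefetched])
            prefetched += 1
        # The GPU then executes its own load.
        if not cache.touch(addr):
            misses += 1
    return misses


# Hypothetical kernel: eight loads from distinct addresses, each followed
# by some arithmetic that the prefetcher ignores.
kernel = []
for n in range(8):
    kernel.append(("load", 0x1000 + 64 * n))
    kernel.append(("fma", None))

print(count_gpu_misses(kernel, cache_size=4, run_ahead=0))
print(count_gpu_misses(kernel, cache_size=4, run_ahead=1))
```

In this toy setup every GPU load misses when the prefetcher is off and hits when it runs one load ahead. The model ignores timing altogether, which is the hard part in practice: the CPU has to stay ahead of the GPU without running so far ahead that the data is evicted before it is used.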

~~~
profquail
_It sounds like the NCSU guys are using the CPU as a prefetcher to speed up
GPU kernel execution, not using the GPU to speed up normal CPU programs as the
ExtremeTech article implies._

The article says the same thing you are -- that the CPU is used as a
prefetcher for the GPU; read the 3rd paragraph:

 _To achieve the 20% boost, the researchers reduce the CPU to a fetch/decode
unit, and the GPU becomes the primary computation unit. This works out well
because CPUs are generally very strong at fetching data from memory, and GPUs
are essentially just monstrous floating point units. In practice, this means
the CPU is focused on working out what data the GPU needs (pre-fetching), the
GPU’s pipes stay full, and a 20% performance boost arises._

------
teamonkey
This is only tangentially related, but with a title like that I was expecting
a brainless regurgitation of a press release, or some kind of extrapolation
from a paper that wasn't claiming that meaning at all.

Instead, I see a news article with a clear description, caveats and
constraints clearly listed, and a portion of how this relates to the parent
company. It's a shame that I find this surprising.

------
bryanlarsen
The fact that it's only a 20% increase makes it sound promising. Normally
press releases will boast about "100x" increases in speed when they switch to
using the GPU. And you can get that sort of increase for highly parallel tasks
with low memory pressure. BitCoin mining, for example. But the low 20% speedup
implies that they're doing this for general purpose computing.

------
faragon
That's hilarious. Using a _whole_ CPU for prefetching data because of poor
shared bus performance for both the CPUs and GPUs (?!). Instead of such a crazy
"software solution", I would rather use a portion of its L2 or L3
cache size (e.g. 1MB for a 3MB L2/L3 cache) for the GPU itself, and reduce the
bus saturation with DMA transfers (e.g. just like the SPE units of the Cell
CPU work).

------
pessimist
So by using a custom compiler someone sped up an unspecified benchmark by
20%. Is this news?

~~~
elemeno
If that was all it was then no.

However, what they did was demonstrate a novel way of making use of two
different processing cores that exist on the chip (namely using both the CPU
and an integrated GPU) to improve the performance of their benchmark - which
certainly is both interesting and news.

Of course, a proof of concept is a long long way from it being of practical
benefit!

~~~
Someone
A very, very long way, I would guess. A 20% performance gain is nice, but
having to power a GPU to get it is not. I would expect that adding a second
CPU instead of that GPU almost always will give you more than that 20%
performance and less heat, for less money.

~~~
EvanKelly
It depends on the application. If the application, as the article puts it,
"pushes polygons around", then I imagine the APU concept may have the
advantage.

Though, as previously noted, this APU concept is highly dependent on tailored
software (compilers, etc.) and AMD has been banking their strategy on the fact
that these critical pieces will take advantage of the APU.

I think the NCSU research (co-sponsored by AMD) is a move in the right
direction for determining whether these APUs are an effective solution when
compared to the multi-CPU architectures.

------
afhof
GPUs are pretty tailored and aren't really good for general purpose computing.
Branching and cache coherency are much easier in the CPU compared to the GPU.
I doubt that any of the advertised gains would be realized by normal users.

~~~
cbsmith
It was the GPU that ran 20% faster by leveraging the CPU, not the other way
around.

------
nivertech
I hope this has something to do with the HSAIL virtual ISA. For example,
general-purpose code in C is compiled to HSAIL, and then the CPU makes
intelligent decisions about which parts of the code to JIT-compile for the CPU
and which for the GPU ...

------
KeyBoardG
Hopefully we can get this into drivers sooner rather than later. AMD has
already been working with Microsoft to get a large performance gain out of
Bulldozer chips in Windows 8 simply by changing the way threads are
prioritized.

------
overshard
The title is deceiving as per usual. It's mostly using the GPU and using the
CPU for prefetching. Nothing too new here, we know the GPU is faster.

~~~
sliverstorm
_we know the GPU is faster_

More specifically, the GPU is more parallel.

------
gcb
summary: they send all the instructions to the CPU to simply encode them for
the GPU, and let the GPU do the heavy lifting.

and end up saying that AMD is dying as the news loves to do.

