

Debunking the 100X GPU vs. CPU myth (PDF) - nkurz
http://portal.acm.org/ft_gateway.cfm?id=1816021&type=pdf&coll=GUIDE&dl=GUIDE&CFID=11111111&CFTOKEN=2222222

======
jacquesm
A couple of notes:

1) the GTX280 is not exactly top of the line anymore; it is now fully two
generations behind. The sweet spot, cost/performance-wise, is the GTX295.

2) you can stick 4 top-notch cards in a machine, with 2 GPUs each, giving you
a net 8-fold increase over the results in the paper; or, taking their 2.5x
figure as a baseline, a 20-fold increase over a regular CPU (if you believe
the rest of the claims the paper makes).

3) GPU speedups can not be 'averaged': you either have something that maps
well to the GPU or you don't. If you do, extreme speedups are possible (see
the sketch after this list); if you don't, you'll end up spending a large
amount of time trying to adapt your problem to the architecture of a GPU. If
your time is cheap, or if the problem is large enough that your time spent is
a small fraction of the total cost, this may be worthwhile.

4) So, if you have a problem you need to solve only once and the trade-off is
your time vs. CPU time, then most of the time CPU time will win out. But if
the problem is recurring in nature and matches the model well, then GPUs are
pretty much unbeatable for FLOPS/$ and FLOPS/W (at the moment; no doubt this
will change at some point in the future).
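
To make point 3 concrete, here's a minimal sketch (my own illustration, not
from the paper) of the kind of workload that maps well: every output element
is independent, so thousands of threads proceed without ever synchronizing.

    // saxpy.cu -- the classic example of a GPU-friendly workload:
    // each output element is independent, so no thread ever waits
    // on another.
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];   // one multiply-add per thread
    }

    // Launch with one thread per element, e.g.:
    //   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);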

~~~
nkurz
Your points are excellent, but number 2 seems weak: why think of it as a
single "machine"? If that's the metric, dual- and quad-processor machines
would counter any advantage.

Plus or minus some overclocking, a GTX280 and a Core i7 960 both consume
roughly the same number of watts (130 for the processor vs 200 for the card?)
and cost roughly the same (at least they did when I bought comparables). So
while you can (and probably should) stick multiple cards in the same machine,
it seems a little silly to measure this in terms of multiples of single-CPU
performance.

NVidia's blog response is interesting, and some of the comments are actually
worth reading:
http://blogs.nvidia.com/ntersect/2010/06/gpus-are-only-up-to-14-times-faster-than-cpus-says-intel.html

To me, the interesting question here is the degree to which the code has been
optimized in each case, and the difficulty of doing so. A while ago there was
a competition to generate a SHA1 hash collision:
http://www.engineyard.com/blog/2009/programming-contest-and-the-winners-are/

The GPUs trounced every other approach. I participated in the contest
minimally (running slightly tweaked versions of other people's code on both
an Nvidia GPU and a Core i7) and I was amazed at how much more efficient the
GPU solution was. I don't recall the exact ratios, but 'orders of magnitude'
was definitely not an exaggeration.
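
To give a feel for why the GPUs trounced everything: each thread hashes a
different candidate and never talks to any other thread. A hypothetical
sketch of the pattern (toy_hash is a stand-in I made up; the contest used
real SHA1 rounds, and this is not the contest code):

    // Brute-force search pattern, illustrative only.
    __device__ unsigned int toy_hash(unsigned long long msg)
    {
        unsigned int h = 2166136261u;            // FNV-1a-style mixing
        for (int i = 0; i < 8; ++i) {
            h ^= (unsigned int)(msg >> (i * 8)) & 0xff;
            h *= 16777619u;
        }
        return h;
    }

    __global__ void search(unsigned long long base, unsigned int target,
                           unsigned long long *hit)
    {
        unsigned long long candidate =
            base + blockIdx.x * blockDim.x + threadIdx.x;
        if (toy_hash(candidate) == target)  // no inter-thread traffic:
            *hit = candidate;               // embarrassingly parallel
    }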

But that's sort of a contrived example. So what I'd love to see here is a
framework that incentivizes each side to come up with their best solution to
some real-world problem. My instinct is that there are definitely problems
for which 100x is realistic. Knowing which problems these are (and aren't)
would be a more effective 'debunking' than one side proclaiming an 'average'
result.

~~~
jacquesm
I think of the GPU as an 'extra', not a replacement: whatever advantage a
regular CPU has, you always get more bang for the buck by adding one or more
GPUs to a machine, assuming your workload is a good match. If not, then it's
money thrown away.

True, dual- and quad-processor machines would work even better, but there too
it is an extra, and quad machines especially are not nearly as cheap as
adding GPU cards; CPUs get more expensive quickly as the number of CPUs on
the motherboard increases.

Personally I'd like to see research like this come out of different corners
than either Nvidia or Intel.

For those jobs that I used GPUs for (image recognition), they outperformed
the CPU on the same task by better than 50 to 1. The trick is to keep
communications down to a minimum and maximize your use of GPU memory, and
that's a lot easier said than done; not all of it is intuitive.
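
As a sketch of that trick (hypothetical kernel and variable names, not my
actual pipeline): upload the image once, chain every stage through device
memory, and copy back only the tiny result.

    // Hypothetical host-side skeleton showing the communication
    // pattern: one upload, many kernels chained entirely in GPU
    // memory, one small download at the end.
    cudaMemcpy(d_img, h_img, img_bytes, cudaMemcpyHostToDevice); // once
    blur<<<grid, block>>>(d_img, d_tmp, w, h);     // intermediates...
    edges<<<grid, block>>>(d_tmp, d_feat, w, h);   // ...never leave
    classify<<<grid, block>>>(d_feat, d_label, w, h); // ...the card
    cudaMemcpy(&label, d_label, sizeof label,
               cudaMemcpyDeviceToHost);            // tiny result back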

------
nvoorhies
Forgive me if I take a debunking of GPU performance advantages written up by
Intel with a fist-sized grain of salt.

~~~
Estragon
Thank you. I came over to the comments precisely in order to find out what the
author's interests might be.

------
strebler
I would say that CUDA is part of their speed problem.

My company has been doing GPU programming for years and we often get
100X-300X+ speedups over the CPU, but we do almost all of our GPU work in
shaders. CUDA/OpenCL is much nicer to deal with, since not all problems map
nicely to pixel/vertex shaders, but we've tended to have a very difficult
time getting CUDA to go from "it's working" to "wow, that's fast". Luckily,
in our space most operations map well onto shaders.
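
For readers who haven't written shaders: a pixel shader is essentially a pure
function per output pixel, with gather-only reads and exactly one write.
Expressed as a CUDA kernel (an illustrative example I made up, not our shader
code), that discipline looks like this:

    // The pixel-shader discipline written as CUDA: one thread per
    // output pixel, gather-only reads, one write, no shared state.
    __global__ void box3x3(const float *in, float *out, int w, int h)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x <= 0 || y <= 0 || x >= w - 1 || y >= h - 1)
            return;                          // skip the border
        float s = 0.0f;
        for (int dy = -1; dy <= 1; ++dy)     // read neighbours...
            for (int dx = -1; dx <= 1; ++dx)
                s += in[(y + dy) * w + (x + dx)];
        out[y * w + x] = s / 9.0f;           // ...write one pixel
    }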

------
rythie
I saw a talk the other day by an independent researcher saying a similar
thing, comparing a high-end Nvidia card to an Intel quad-core chip. The
speedup was really only 1-2x most of the time, and the researcher thought
maybe 10x could be achieved for some codes, but 100x was not achievable.

~~~
mrb
Yes, 100x _is_ achievable. But only with embarrassingly parallel workloads and
when comparing a high-end GPU against a modest dual-core CPU.

For example, the AMD HD 5970 can execute exactly 100x more 32-bit integer or
floating-point operations per second than a dual-core 2.9GHz CPU running
128-bit SSE code.

Embarrassingly parallel workloads such as password hash cracking scale exactly
linearly with this 100x difference.
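
Back-of-envelope (my arithmetic, assuming the usual peak figures: 3200 ALUs
at 725 MHz for the 5970, and one SSE multiply plus one add per core per cycle
on the CPU):

    HD 5970:  3200 ALUs x 0.725 GHz x 2 FLOPs (mul+add) = 4640 GFLOPS SP
    2.9GHz dual-core, 128-bit SSE:
              2 cores x 2.9 GHz x 4 lanes x 2 FLOPs     = 46.4 GFLOPS SP

    4640 / 46.4 = exactly 100x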

~~~
Andys
In terms of raw FLOPs, the current top ATI card (the $2k FireStream) offers
4x the double-precision FLOPs of the current top Intel 6-core ($1k).

~~~
mrb
The Opteron 6176 SE provides 38% more DP FLOPS (110.4 GFLOPS) than the
"current top Intel 6-core" i7-980X (80 GFLOPS). So the former narrows the gap
even further.
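
The arithmetic behind those figures, assuming 4 DP FLOPs per core per cycle
(128-bit SSE multiply + add) on both parts:

    Opteron 6176 SE: 12 cores x 2.3 GHz  x 4 = 110.4 GFLOPS DP
    Core i7-980X:     6 cores x 3.33 GHz x 4 =  ~80  GFLOPS DP

    110.4 / 80 = 1.38, i.e. 38% more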

------
ableal
Relevant chart here:
http://realworldtech.com/page.cfm?ArticleID=RWT090909050230&p=2
("shows the performance per watt and performance/mm2 of silicon for various
CPUs and GPUs.")

