

Ask HN: A is faster than B on GPU but vice versa on CPU - sirtel

Is there an example that:<p><pre><code>    on CPU, Instruction A is faster than Instruction B
    However, 
    on GPU, Instruction B is faster than Instruction A
    (Instructions are assumed to be math operations)</code></pre>
======
dottrap
There is a constant arms race between GPUs and CPUs to be faster. What is
faster today on one might be slower tomorrow.

GPUs are very good at doing lots of floating point math. Historically, CPUs
have been better at branching, multiple instruction issue, out-of-order
execution, and integer math, and they lean heavily on their cache hierarchies.
CPUs have SIMD too, so they are no slouch at heavy floating point work either.

Memory (I/O) is now one of the largest bottlenecks for both GPUs and CPUs,
because memory bus speeds are far slower than either processor, so memory
traffic will often be your dominant factor. Since most of your data starts in
main RAM, the CPU lives closer to it and backs that up with an aggressive
cache hierarchy (L1, L2, L3 caches), giving the CPU an advantage whenever the
data isn't already resident in the GPU's local memory.
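
The cache-locality point above comes down to memory access patterns. A minimal sketch (illustrative only: Python won't show real cache effects, and the helper names are hypothetical, but these are the two traversal orders that matter in a compiled language):

```python
N = 4

# A 2-D array stored as one flat row-major buffer: element (i, j) is at i * N + j.
a = [float(i * N + j) for i in range(N) for j in range(N)]

def sum_row_major(buf, n):
    """Walk memory sequentially -- the cache-friendly order on a CPU."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += buf[i * n + j]
    return total

def sum_col_major(buf, n):
    """Stride through memory n elements at a time -- cache-hostile for large n."""
    total = 0.0
    for j in range(n):
        for i in range(n):
            total += buf[i * n + j]
    return total

# Same answer either way; only the memory access pattern differs.
assert sum_row_major(a, N) == sum_col_major(a, N)
```

Both loops do identical arithmetic; on a CPU with large matrices, the sequential version wins purely because consecutive elements land in the same cache lines.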

I don't know if it still holds true, but NxM matrix math used to be faster on
CPUs for very large values of N and M, because the CPU's caches had an easier
time keeping the matrix values that needed to be reused close at hand. GPUs,
on the other hand, tend to be really good at 4x4 matrices, since that is what
graphics primarily uses.
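
For concreteness, the 4x4 case that graphics hardware is tuned for can be sketched as a plain nested loop; the large NxM version is the same loop, where reusing rows and columns is exactly what the CPU's caches exploit:

```python
def matmul4(a, b):
    """Naive 4x4 matrix multiply -- the fixed shape graphics pipelines favor."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
m = [[float(i * 4 + j) for j in range(4)] for i in range(4)]

# Multiplying by the identity matrix returns the input unchanged.
assert matmul4(m, identity) == m
```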

------
dalke
I don't understand the question, as I don't know what "instruction" means in a
portable way. AMD/Intel chips have instructions like LZCNT and CRC32 that
don't exist as instructions in, say, the R700-Family Instruction Set
Architecture (nor on other GPUs?).
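
Both of those can of course be emulated in software; the hardware instruction is just much faster. A quick sketch (note one subtlety I'd flag as an assumption worth checking: the x86 CRC32 instruction uses the CRC-32C Castagnoli polynomial, which differs from the zlib/PNG CRC-32 shown here):

```python
import zlib

def lzcnt32(x):
    """Software emulation of a 32-bit LZCNT: count leading zero bits."""
    assert 0 <= x < 2 ** 32
    return 32 - x.bit_length()  # (0).bit_length() == 0, so lzcnt32(0) == 32

assert lzcnt32(0) == 32
assert lzcnt32(1) == 31
assert lzcnt32(0x80000000) == 0

# zlib's CRC-32 (polynomial 0x04C11DB7); the x86 CRC32 instruction computes
# CRC-32C instead, so the two produce different checksums for the same input.
assert zlib.crc32(b"123456789") == 0xCBF43926
```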

Even if two instructions do mostly the same thing (e.g., multiply two floats),
doesn't the Intel architecture have more complete support for IEEE 754's
optional alternate exception handling? If so, then they aren't really
identical.

So, which instructions do you think are equivalent enough for your comparison?

Performance is driven by economics. Find where the economics between GPUs and
CPUs are different, and you'll likely find where the performance inversion is.

------
Nadya
GPUs are massively parallel, which makes them better whenever you perform the
same operation across many data elements at once. GPUs are also better
optimized for floating point arithmetic, but not always for integer
arithmetic.

This makes sense, since GPUs are optimized for graphics work, which is mostly
a big pile of floating point operations.

------
MichaelCrawford
Disk I/O. The GPU would have to ask the CPU to do much of the work. I expect
that in principle a storage controller can DMA straight into the GPU's VRAM,
but I don't know that anyone actually does that. Maybe for texture maps.

