
If a GPU has K_1 times the integer or floating-point operations per second of a CPU, and K_2 times the bandwidth to its on-board memory that the CPU has to overall system memory, then the absolute best improvement you can reasonably hope for is somewhere between K_1 and K_2. For a typical CPU and a typical corresponding GPU (note the highly exact terminology here...), K_1 and K_2 are, give or take, 10x or so.
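This bound can be sketched with a simple roofline-style model: achievable throughput is capped either by compute or by memory traffic, depending on the arithmetic intensity (FLOPs per byte moved) of the workload. The numbers below are illustrative placeholders, not measurements of any particular hardware.

```python
# Roofline-style sketch of the K_1/K_2 bound described above.
# All figures are hypothetical, for illustration only.

def attainable(peak_flops, bandwidth, intensity):
    """Throughput is the lesser of peak compute and what the
    memory system can feed at the given FLOPs-per-byte intensity."""
    return min(peak_flops, bandwidth * intensity)

def speedup_bound(k1, k2, cpu_flops, cpu_bw, intensity):
    """Ideal GPU-over-CPU speedup when the GPU has k1x the compute
    and k2x the memory bandwidth of the CPU."""
    gpu = attainable(k1 * cpu_flops, k2 * cpu_bw, intensity)
    cpu = attainable(cpu_flops, cpu_bw, intensity)
    return gpu / cpu

# Hypothetical CPU: 1 TFLOP/s peak, 100 GB/s memory bandwidth.
CPU_FLOPS, CPU_BW = 1e12, 1e11

# Memory-bound workload (1 FLOP/byte): speedup tracks K_2.
print(speedup_bound(10, 8, CPU_FLOPS, CPU_BW, intensity=1))    # 8.0

# Compute-bound workload (100 FLOPs/byte): speedup tracks K_1.
print(speedup_bound(10, 8, CPU_FLOPS, CPU_BW, intensity=100))  # 10.0
```

Whatever the workload's intensity, the ideal speedup stays pinned between K_2 and K_1, which is the point of the paragraph above.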

Now, it might be a lot less than 10x if your use of the GPU is suboptimal in any way, which means you don't have much leeway for non-optimality in your design. Plus, there's the handicap I mentioned regarding main-memory bandwidth - data has to cross a much slower link to reach the GPU in the first place - while the GPU's on-board memory is much smaller.

On the other hand - whoever said that the system you're measuring against is making optimal use of CPU resources? It may very well not, in which case the computation and memory throughput ratios are not upper bounds at all.

Just remember that if someone tells you "I got a 50x speedup by porting this to a GPU!" - then more likely than not, their baseline was a massively sub-optimal system. Which is not to say their work is without merit: improving the performance of a real-life system makes actual work go faster, today, rather than pursuing a dream of future optimality.



Thanks!



