
You’re right that PCI-E bandwidth is the bottleneck for CPU-to-GPU communication. Game devs often have to think in terms of the number of draw calls being sent (especially before DX12/Vulkan), and you can easily saturate that channel.

I believe what OP is talking about, in the context of ML models, is that reading one byte on the GPU from the GPU’s own memory has _much_ higher latency than the CPU reading one byte from system RAM.

This is an intentional choice, and it speaks to the core design of what each system is solving for. CPUs are built for low latency, which buys fast single-threaded execution; GPUs tolerate high latency in exchange for total throughput across all their threads.

CPUs solve for maximum “single thread” performance (for lack of a better term). If you have an operation that reads and mutates one byte of RAM, and you stack many of those instructions into a long sequence, the CPU is very fast at executing it. Most programs we write do this: processing the steps of a single thread of an application.

GPUs optimize for concurrency, and they do that by running many “threads”. Each thread is often memory bound, but because they run in parallel, when one blocks on a memory read, another thread just pops into its place until it, too, has to block.

GPUs are constantly swapping which “threads” are active, so they hide the latency much better. And because of that design, you can trade latency for higher bandwidth and get more total work done.
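If it helps, here’s a minimal CUDA sketch of that idea (the kernel, sizes, and launch configuration are made up for illustration): a memory-bound kernel launched with far more threads than the hardware can execute at once, so whenever one warp stalls on a global-memory read, the scheduler has plenty of others ready to run.

    // saxpy.cu -- memory-bound kernel: almost no math per element, so the
    // speed you see is mostly the GPU hiding memory latency with parallelism.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void saxpy(int n, float a, const float *x, float *y) {
        // Grid-stride loop: many more logical threads than execution slots.
        // When one warp waits on a memory read, another runs in its place.
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += blockDim.x * gridDim.x) {
            y[i] = a * x[i] + y[i];
        }
    }

    int main() {
        const int n = 1 << 24;  // ~16M floats, far bigger than any cache
        float *x, *y;
        cudaMalloc(&x, n * sizeof(float));
        cudaMalloc(&y, n * sizeof(float));
        cudaMemset(x, 0, n * sizeof(float));
        cudaMemset(y, 0, n * sizeof(float));

        // Oversubscribe on purpose: thousands of warps in flight is exactly
        // what lets the GPU hide its high per-access latency.
        saxpy<<<1024, 256>>>(n, 2.0f, x, y);
        cudaDeviceSynchronize();

        cudaFree(x);
        cudaFree(y);
        return 0;
    }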

For image data (what GPUs were originally designed for), that’s exactly what you want: you’re manipulating big two-dimensional arrays of pixels, where concurrency matters most. CPUs have instructions like SSE/AVX that sit somewhere in between these days, but GPUs have the advantage of being able to target only one domain instead of both.

That’s my understanding of it, at least. I was a game dev in a former life :)




The number of draw calls is usually not a problem because it saturates memory, though. It’s usually because of the CPU overhead each call incurs and the (possibly redundant) command-processing work on the GPU’s front end causing bubbles in the whole pipeline. Where DX12/Vulkan-esque APIs help immensely is mostly the first part: CPU overhead.

The memory interface between the GPU’s command-streaming/processing front end and system memory is very efficient, employing prefetching, etc.


You can also easily see this in game benchmarks, e.g. https://www.gamersnexus.net/guides/2488-pci-e-3-x8-vs-x16-pe... : "From a quick look, there is a little below a 1% [game FPS] difference in PCI-e 3.0 x16 and PCI-e 3.0 x8 slots".

Also important to remember: the vast majority of GPUs (and GPU end users) out there are iGPUs, which share memory with the host and can be programmed to take advantage of that fact.


Ah, that makes sense. Thank you for clarifying!


PCIe 3.0 x16 is a 16 GB/s link, which ain’t bad. By comparison, CPU dual channel DDR4-2400 main memory is 38.4 GB/s.
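Working through those numbers (the PCIe figure assumes the usual 128b/130b encoding):

    PCIe 3.0 x16:       16 lanes x 8 GT/s x (128/130) / 8 bits per byte  ≈ 15.75 GB/s
    DDR4-2400, 2 chan:  2 channels x 2400 MT/s x 8 bytes per transfer    = 38.4 GB/s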

All processors are memory bandwidth starved. CPUs don’t benefit from wide, high-latency access to large main memory as much as GPUs do, due to the nature of SISD vs. SIMD. Naturally, GPUs put more focus on a fatter pipe between the processor and main memory (GDDR and 500 GB/s pipes). CPUs operate on less data at a time, so you can crank the speed of computation up if you crank the memory speed and keep latency low. This is why so much of a CPU die is dedicated to cache (I think L1 is often single cycle, which results in absurd throughput numbers).


L1 is typically 3-4 cycles in modern processors (latency), with 2-4 accesses per cycle (throughput).
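To put a rough number on that: assuming (hypothetically) a ~4 GHz core that can sustain two 32-byte loads per cycle,

    2 loads/cycle x 32 B x 4 GHz ≈ 256 GB/s of L1 load bandwidth, per core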


What is the difference between bandwidth and throughput for memory?


In the context of memory, none. However, bandwidth is usually an analog term: the range of frequencies that is within 3 dB of the minimum insertion loss.
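In that analog sense, the 3 dB (half-power) bandwidth is just

    BW = f_high - f_low, where the insertion loss stays within 3 dB of its minimum for f_low <= f <= f_high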


Also, memory access patterns on GPUs are more like batch access: you set up the parameters for a large transfer, then stream it at high bandwidth (textures, VBOs, FBOs, etc.).
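A minimal CUDA sketch of that batch pattern (sizes and setup are made up for illustration): many tiny host-to-device copies each pay a fixed per-call cost, while one large transfer streams at close to link bandwidth.

    // transfer.cu -- compare many small PCIe copies against one big one.
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    int main() {
        const size_t total = 16 << 20;   // 16 MB payload...
        const size_t chunk = 4 << 10;    // ...moved as 4 KB pieces
        std::vector<char> host(total, 0);
        char *dev;
        cudaMalloc(&dev, total);

        cudaEvent_t t0, t1, t2;
        cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);

        cudaEventRecord(t0);
        // Many small copies: each cudaMemcpy pays driver + setup overhead.
        for (size_t off = 0; off < total; off += chunk)
            cudaMemcpy(dev + off, host.data() + off, chunk,
                       cudaMemcpyHostToDevice);
        cudaEventRecord(t1);

        // One batched copy: set it up once, then stream the whole block.
        cudaMemcpy(dev, host.data(), total, cudaMemcpyHostToDevice);
        cudaEventRecord(t2);
        cudaEventSynchronize(t2);

        float msSmall = 0.f, msBig = 0.f;
        cudaEventElapsedTime(&msSmall, t0, t1);
        cudaEventElapsedTime(&msBig, t1, t2);
        printf("4096 x 4 KB copies: %.2f ms, one 16 MB copy: %.2f ms\n",
               msSmall, msBig);

        cudaFree(dev);
        return 0;
    }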



