It seems like unified memory has to be the goal. This all just feels like a kludgy workaround until that happens (kind of like segmented memory in the 16-bit era).
Is unified memory practical for a "normal" desktop/server configuration, though? Apple has been doing unified memory, but they also put the GPU on the same die as the CPU. I would be interested to know whether a discrete GPU plugged into a PCIe slot adds enough latency to make unified memory impractical.
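For what it's worth, CUDA's managed memory already gives a software approximation of this on a discrete card: one pointer is valid on both host and device, and the driver migrates pages over PCIe on demand. A minimal sketch, assuming a CUDA-capable discrete GPU and the CUDA toolkit (the kernel and sizes are purely illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: host and device both touch the same allocation.
__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float *x = nullptr;

    // One allocation, one pointer, visible to CPU and GPU.
    // On a discrete card the runtime migrates pages across PCIe on fault.
    cudaMallocManaged(&x, n * sizeof(float));

    for (int i = 0; i < n; ++i) x[i] = 1.0f;      // CPU writes (pages resident on host)

    scale<<<(n + 255) / 256, 256>>>(x, n, 2.0f);  // GPU touches them (pages migrate to device)
    cudaDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);                  // CPU reads (pages migrate back)
    cudaFree(x);
    return 0;
}
```

It works, but every migration is a page fault plus a PCIe round trip, which is exactly the latency question; on Apple's parts there is only one pool of memory, so there is nothing to migrate.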
Sure, but we're talking about RPC/syscall for disk and network transfers from the CPU side. Almost nothing on the CPU side can sustain 1 TB/s anyway; you only see that for GPU->GPU transforms, and even then only for very specific workloads. And the only reason you reach off the GPU at all is that you either need new data or you need the CPU to chew on something the GPU can't handle because it's too branchy.
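And when you do reach off the GPU for fresh data, the usual idiom is to stage it asynchronously and overlap the PCIe copy with compute, so what you end up caring about is sustained link bandwidth rather than per-transfer latency. A rough double-buffering sketch, with made-up chunk sizes and a stand-in kernel:

```cuda
#include <cuda_runtime.h>

__global__ void crunch(float *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = buf[i] * 2.0f + 1.0f;   // stand-in for real work
}

int main() {
    const int chunk = 1 << 22;                  // floats per chunk (illustrative)
    const int nchunks = 8;

    float *h_buf;                               // pinned host memory so copies can be async
    cudaMallocHost(&h_buf, (size_t)nchunks * chunk * sizeof(float));

    float *d_buf[2];
    cudaStream_t s[2];
    for (int b = 0; b < 2; ++b) {
        cudaMalloc(&d_buf[b], chunk * sizeof(float));
        cudaStreamCreate(&s[b]);
    }

    // Double-buffer: while chunk k computes, chunk k+1 is already crossing PCIe.
    for (int k = 0; k < nchunks; ++k) {
        int b = k % 2;
        cudaMemcpyAsync(d_buf[b], h_buf + (size_t)k * chunk,
                        chunk * sizeof(float), cudaMemcpyHostToDevice, s[b]);
        crunch<<<(chunk + 255) / 256, 256, 0, s[b]>>>(d_buf[b], chunk);
        cudaMemcpyAsync(h_buf + (size_t)k * chunk, d_buf[b],
                        chunk * sizeof(float), cudaMemcpyDeviceToHost, s[b]);
    }
    cudaDeviceSynchronize();

    for (int b = 0; b < 2; ++b) { cudaFree(d_buf[b]); cudaStreamDestroy(s[b]); }
    cudaFreeHost(h_buf);
    return 0;
}
```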