It seems like unified memory has to be the goal. This all just feels like a kludgy workaround until that happens (kind of like segmented memory in the 16-bit era).

Is unified memory practical for a "normal" desktop/server configuration, though? Apple has been doing unified memory, but they also put the GPU on the same die as the CPU. I'd be interested to know whether a discrete GPU plugged into a PCIe slot adds enough latency to make unified memory impractical.
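For what it's worth, CUDA's managed memory already exposes a single address space across the PCIe link today; pages just migrate on demand, at PCIe speeds. A minimal sketch (kernel name and sizes are made up for illustration, and the on-demand migration behavior assumes a Pascal-or-newer GPU on Linux):

    #include <cstdio>
    #include <cuda_runtime.h>

    // Doubles each element; touching x[] here faults its pages over to the GPU.
    __global__ void scale(float *x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= 2.0f;
    }

    int main() {
        const int n = 1 << 20;
        float *x;
        cudaMallocManaged(&x, n * sizeof(float)); // one pointer, valid on CPU and GPU
        for (int i = 0; i < n; i++) x[i] = 1.0f;  // pages start out in host RAM
        scale<<<(n + 255) / 256, 256>>>(x, n);    // pages migrate over PCIe on fault
        cudaDeviceSynchronize();
        printf("%f\n", x[0]);                     // CPU touch migrates pages back
        cudaFree(x);
        return 0;
    }

So the programming model exists; the question is whether the PCIe hop makes it too slow to count as "practical."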

It’s clearly not practical now, but that doesn’t mean it won’t be at some point.

Why is Resizable BAR (ReBAR) insufficient? It's been pretty widely supported for at least five years now.

GPUs benefit from extremely high memory bandwidth. ReBAR helps, but it's a lot slower than a fat bus to a bunch of on-card GDDR6X.

A PCIe 4.0 x16 link gives about 32 GB/s of bandwidth; an RTX 4090 has over 1 TB/s of bandwidth to its on-card memory.
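If you want to check those figures, here's a back-of-the-envelope in C++ using the published specs (16 GT/s per lane with 128b/130b encoding for PCIe 4.0; 21 Gbit/s per pin on a 384-bit bus for the 4090's GDDR6X), not measurements:

    #include <cstdio>

    int main() {
        // PCIe 4.0: 16 GT/s per lane, 128b/130b encoding, 16 lanes, bits -> bytes
        double pcie_gbs = 16e9 * (128.0 / 130.0) * 16 / 8 / 1e9;
        // RTX 4090: 21 Gbit/s per pin, 384-bit memory bus, bits -> bytes
        double gddr_gbs = 21e9 * 384 / 8 / 1e9;
        printf("PCIe 4.0 x16: ~%.1f GB/s\n", pcie_gbs); // ~31.5 GB/s
        printf("4090 GDDR6X:  ~%.0f GB/s\n", gddr_gbs); // ~1008 GB/s
        return 0;
    }

Roughly a 30x gap between the link and the on-card memory.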
Sure, but we're talking about RPC/syscalls for disk and network transfers from the CPU side. Almost nothing on the CPU side can sustain 1 TB/s anyway; you only get that for GPU->GPU transfers, and even then only for very specific workloads. And the only reason you're reaching off the GPU is that you either need new data or you need the CPU to chew on something the GPU can't manage due to branchiness.
