Afaik those unified memory architectures are mostly neither cache coherent nor able to support virtual addresses efficiently (you have to trap into privileged code to pin/unpin the mappings), which means that the relative cost is lower than for a dedicated GPU accessed over PCIe, but still too high. Only the "boring" old Bobcat-based AMD APUs supported accessing unpinned virtual memory from the L3 (aka system-level) cache, and nobody bothered porting code to them.
> Afaik those unified memory architectures are mostly neither cache coherent nor able to support virtual addresses efficiently (you have to trap into privileged code to pin/unpin the mappings), which means that the relative cost is lower than for a dedicated GPU accessed over PCIe, but still too high. Only the "boring" old Bobcat-based AMD APUs supported accessing unpinned virtual memory from the L3 (aka system-level) cache, and nobody bothered porting code to them.
Other way around: Bobcat was the era of the “onion bus”/“garlic bus”, whereas today on things like Apple Silicon you don’t need to explicitly access memory in certain ways, afaik.