This is a good point. If the cores in the non-stacked CCD miss in local L3, it's still quicker to ask the stacked CCD instead of going to RAM. But GP's question is valid: how do you prefer the non-stacked CCD for certain high-intensity tasks without hardcoding their names/IDs?
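On the "without hardcoding IDs" part: on Linux, at least, the topology can be discovered at runtime. Each CPU exposes its caches under `/sys/devices/system/cpu/cpuN/cache/indexM/`, with `level`, `size` and `shared_cpu_list` files, so you can group cores by the L3 they share and pick the group with the smaller L3. Rough Python sketch below. The function names are mine, and "smaller L3 means the higher-clocking CCD" is an assumption that fits a part like this one, not a general rule. It also says nothing about how you decide which tasks count as high-intensity; that part is still policy.

```python
import glob
import os
import re


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0-7,16-23' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def parse_size_kib(text):
    """Convert a sysfs cache size such as '32768K' or '96M' to KiB."""
    match = re.fullmatch(r"(\d+)([KM])", text.strip())
    if match is None:
        raise ValueError(f"unrecognised cache size: {text!r}")
    scale = {"K": 1, "M": 1024}[match.group(2)]
    return int(match.group(1)) * scale


def read_l3_domains(root="/sys/devices/system/cpu"):
    """Map each group of CPUs sharing an L3 to that L3's size in KiB."""
    domains = {}
    pattern = os.path.join(root, "cpu[0-9]*", "cache", "index*")
    for cache_dir in glob.glob(pattern):
        try:
            with open(os.path.join(cache_dir, "level")) as f:
                if f.read().strip() != "3":
                    continue  # only L3 entries matter here
            with open(os.path.join(cache_dir, "shared_cpu_list")) as f:
                cpus = frozenset(parse_cpu_list(f.read()))
            with open(os.path.join(cache_dir, "size")) as f:
                domains[cpus] = parse_size_kib(f.read())
        except (OSError, ValueError):
            continue  # CPU offline or attribute missing: skip it
    return domains


def pick_cpus(domains, largest_l3=False):
    """Return the CPUs of the L3 domain with the smallest (or largest) cache."""
    if not domains:
        return set()
    chooser = max if largest_l3 else min
    cpus, _size = chooser(domains.items(), key=lambda item: item[1])
    return set(cpus)


if __name__ == "__main__":
    domains = read_l3_domains()
    target = pick_cpus(domains, largest_l3=False)
    asymmetric = len(set(domains.values())) > 1
    if asymmetric and hasattr(os, "sched_setaffinity"):
        try:
            os.sched_setaffinity(0, target)  # pid 0 means the calling process
            print("pinned to CPUs", sorted(target))
        except OSError as exc:
            print("could not set affinity:", exc)
    else:
        print("L3 domains look symmetric or affinity is unavailable; not pinning")
```

Pass `largest_l3=True` to `pick_cpus` for the cache-hungry case instead. On a chip where every CCD has the same L3 size, the script deliberately does nothing.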
So one might look at the stacked cores as extravagant "L4" controllers that serve their faster peers and occasionally throw in some compute of their own for particularly parallel workloads? Seems rather wasteful, but perhaps it's actually less bad than the performance-per-watt tradeoffs at high clock rates.
I'd love to see a benchmark with the 3D V-Cache CCD's cores disabled, benched against the non-3D part it is based on with one CCD disabled.
Would be interesting to see, at comparable clocks, what the performance uplift is from hitting the huge cache on the other CCD more often vs. hitting main memory.
Can it actually hit the cache on the other chiplet? I thought that Zen's L3 caches were somewhat private to the local CCX/CCD, such that threads running on one chiplet have no way to cause new data to be prefetched or spilled into someone else's L3.
My assumption was the IO die had a directory of cache lines and would route a request to the other CCD if it were present there. You can't evict to a remote L3 but you can snoop it, I think.
Now, thinking about it... you can't evict to the distant L3, though. So, it really depends on the workload and whether the remote big L3 is "warmed" suitably for you.
Yeah... Bummer about the hybrid layout. Voltage control will be disabled anyway to prevent killing the 3D die, so I don't really care for the high-boost CCD. If it weren't for the EDC VID bug on Zen 3 taming PBO, I'd be running a manual OC to keep high-load voltage and temps in check. Might as well have two tamed 3D CCDs.