Right, that's the trade off. Note that it's the same one both Intel and AMD make with the L2, and also what happens between sockets in multi-socket systems. And separation reduces the cache latency a bit because it costs a couple of cycles to unify the cache. But it's not as good when you have multiple threads fighting over the same data.
> I would probably buy as a desktop, but not for the servers. Also no avx512 which besides the wider instructions the real gain seems to be in an improved instruction set for them.
If you're buying multiple servers the thing to do is to buy one of each first and actually test it for yourself. We can argue all day about cache hierarchies and instruction sets, and that stuff can be important when you're optimizing the code, but it's a complex calculation. If you have the workload where a unified cache is better, but so is having more cores, which factor dominates? How does a 2S Xeon compare with a 1S Epyc with the same total number of cores? What if you populate the second socket for both? How much power does each system use in practice on your actual workload? How does that impact the clock speed they can sustain? What happens with and without SMT in each case?
When it comes down to it there is no substitute for empirical testing.