As with all things, it's a tradeoff. HBM on servers is similar to Apple's choice, and Xeon, EPYC, Nvidia H100, and some other designs incorporate it. There are good things (performance) and bad things (price, non-upgradeability) about it. The best of both worlds would be chip-on-module memory plus expansion slots, so that the fast RAM acts like an L4 cache.
That's essentially how things are likely to go with CXL, though the latency probably won't be quite as good as on a dedicated DIMM connection, or even as "good" as it was with IBM's OMI. The future (imminent in enterprise, and arriving sometime around when PCIe 6.0 hits consumer machines) looks to be mostly a combination of HBM and CXL memory.