Pretty obvious stuff, right? I mean, you don't even need HBM for that, you just need a TON of memory channels. Sure, that kind of setup would only be efficient for highly coalesced reads and writes, but that's exactly what inference needs these days. You could even get by with 64 GB of DDR5. DDR5-4800 (rather modest) is 38.4 GB/s per 64-bit channel (4800 MT/s × 8 bytes), so 26 channels gets you to roughly 1 TB/s (998.4 GB/s peak). With the more expensive DDR5-6400, at 51.2 GB/s per channel, you'd only need 20. That doesn't sound at all insurmountable for a company of Intel's caliber. Heck, you could even split the dies (and the channels) across several chiplets; if the interconnect is decent, it'll still run really well.
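
If anyone wants to check the channel math, here's a quick back-of-the-envelope sketch. The function names are just mine, and it assumes a 64-bit (8-byte) data bus per channel, treats 1 TB/s as 1000 GB/s, and only looks at theoretical peak bandwidth, not what you'd actually sustain.

```python
import math

BUS_BYTES = 8  # assumption: 64-bit data bus per channel


def channel_mb_per_s(mt_per_s: int) -> int:
    """Peak bandwidth of one channel in MB/s (MT/s times bytes per transfer)."""
    return mt_per_s * BUS_BYTES


def channels_needed(target_gb_per_s: int, mt_per_s: int) -> int:
    """Smallest channel count whose combined peak bandwidth meets the target."""
    return math.ceil(target_gb_per_s * 1000 / channel_mb_per_s(mt_per_s))


for speed in (4800, 6400):
    per_channel = channel_mb_per_s(speed) / 1000  # GB/s
    count = channels_needed(1000, speed)
    print(f"DDR5-{speed}: {per_channel:.1f} GB/s per channel, {count} channels for 1 TB/s")
```

Rounding up strictly gives 27 channels for DDR5-4800, since 26 channels top out at 998.4 GB/s, which is close enough to call 1 TB/s. DDR5-6400 comes out at 20 either way.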