Intel is unlikely to have used a low-end, older-generation DDR4 Xeon system for this benchmark. It's far more likely they used an expensive modern DDR5 Xeon with as many memory channels as they could. Single-user LLM inference is bottlenecked by memory bandwidth, so I just can't see Intel using old/deprecated hardware. And if someone other than Intel were to build a DDR4 Xeon system, it wouldn't reach the DDR5 tokens/s speeds reported here.
The reason they used a Xeon is memory channels. Non-server CPUs typically have only 2, while modern Xeons have 8 to 12 depending on generation/type. The Xeons with the most channels are also the most expensive, to the point where it ends up cheaper to just get a GPU or dedicated accelerator.
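To put rough numbers on the channel argument, here is a back-of-the-envelope sketch. It assumes theoretical peak bandwidth is channels x transfer rate x 8 bytes (one 64-bit channel), and that generating each token streams the full set of weights from RAM once, so tokens/s can't exceed bandwidth divided by model size. The configurations and the 40 GB model size are hypothetical, picked for illustration; they are not the benchmark's actual setup, and real throughput will land below these ceilings.

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    Assumes each channel is 64 bits wide (8 bytes per transfer).
    """
    return channels * mt_per_s * bus_bytes / 1000


def max_tokens_per_s(bandwidth_gbs: float, model_gb: float) -> float:
    """Upper bound on single-user decode speed.

    Assumes every generated token reads all model weights from memory once.
    """
    return bandwidth_gbs / model_gb


# Hypothetical configurations: (memory channels, transfer rate in MT/s).
CONFIGS = {
    "desktop, 2ch DDR5-5600": (2, 5600),
    "modern Xeon, 8ch DDR5-4800": (8, 4800),
    "modern Xeon, 12ch DDR5-6400": (12, 6400),
}

# Assumption: a quantized model occupying about 40 GB of RAM.
MODEL_GB = 40.0

if __name__ == "__main__":
    for name, (channels, mt_per_s) in CONFIGS.items():
        bw = peak_bandwidth_gbs(channels, mt_per_s)
        ceiling = max_tokens_per_s(bw, MODEL_GB)
        print(f"{name}: {bw:.1f} GB/s peak, at most {ceiling:.1f} tokens/s")
```

Under these assumptions the 2-channel desktop tops out around 90 GB/s while the 8-channel Xeon reaches about 307 GB/s, which is the whole gap in a nutshell: the tokens/s ceiling scales directly with channel count and memory speed.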