
So... I was struggling with this for a while. I would say anywhere from 2x to an order of magnitude faster with a GPU. (I've been looking at a lot of GPU benchmarks lately, and they are REALLY hard to compare since they are all so specific.)

I do think there is more long-term hope for CPUs here with inference, largely because memory bandwidth becomes more important than raw compute. You can see this in reports of the MI300 series outperforming the H100, largely because it has more memory bandwidth. MCR DIMMs give you close to 2x the existing memory bandwidth in Intel CPUs, and when coupled with AMX you may be able to exceed V100 and might touch A100 performance levels.
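
Rough numbers for why bandwidth dominates: a memory-bound decoder has to stream every weight through memory once per generated token, so tokens/sec is roughly bandwidth divided by model size. A back-of-envelope sketch using public spec-sheet bandwidth figures (not a benchmark, and it ignores that a 140 GB model won't actually fit on one GPU; the point is the ratio):

    # Memory-bound token rate ~= bandwidth / bytes touched per token.
    # Bandwidths are spec-sheet numbers in GB/s.
    hw_bandwidth = {
        "8-ch DDR5-4800 Xeon": 307,   # 8 x 38.4 GB/s
        "12-ch MCR-8800 Xeon": 845,   # 12 x ~70.4 GB/s
        "A100 (HBM2e)":        2039,
        "H100 SXM (HBM3)":     3350,
        "MI300X (HBM3)":       5300,
    }
    model_gb = 70 * 2  # 70B params in fp16 ~= 140 GB of weights

    for name, bw in hw_bandwidth.items():
        print(f"{name:22s} ~{bw / model_gb:5.1f} tok/s (batch 1)")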

HBM and the general GPU architecture give the GPU a huge memory advantage, especially with the chip-to-chip interconnect. Even if you add HBM to a CPU, you are likely to find the CPU is unable to use the memory bandwidth effectively unless it was specifically designed to do so. Even then you'd likely have limited performance, with things like UPI being a really ugly bottleneck between CPUs.

If someone releases DDR5- or DDR6-based PIM (processing in memory), then most of the memory bandwidth advantage of GPUs evaporates overnight. I expect CPUs to be king at inference in the future.
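
The reason PIM changes the math: today every byte funnels through the DIMM pin interface, while the aggregate internal bandwidth of the DRAM banks behind those pins is orders of magnitude higher. A sketch of that ratio; everything below the interface line is an assumption, since no DDR5/DDR6 PIM product exists to quote:

    # Illustrative only: how much bandwidth hides behind the DIMM pins.
    channels          = 12     # per socket
    channel_if_gbs    = 70.4   # MCR DIMM at 8800 MT/s, per channel
    chips_per_channel = 20     # hypothetical DIMM layout (assumption)
    banks_per_chip    = 32     # DDR5 bank count
    bank_gbs          = 4.0    # assumed usable per-bank PIM bandwidth

    pins     = channels * channel_if_gbs
    internal = channels * chips_per_channel * banks_per_chip * bank_gbs
    print(f"through the pins : ~{pins:,.0f} GB/s")
    print(f"inside the DRAMs : ~{internal:,.0f} GB/s")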


But then GPUs will get GDDR7 or HBM5 or whatever comes next. I don't think CPUs will ever really keep up on memory bandwidth, because for most CPU applications it doesn't matter.

MCR DIMMs give you something like half the memory bandwidth possible with HBM4, plus getting there requires buying something like 2TB of memory. It might get there, but I'd keep my money on HBM and GPUs.
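
Sanity-checking the "about half" figure; the HBM4 per-stack number is still a target spec, so treat it as an assumption:

    # MCR socket bandwidth vs. a single HBM4 stack (target spec).
    mcr_socket = 12 * 70.4   # 12 channels of MCR-8800 -> ~845 GB/s
    hbm4_stack = 1600        # ~1.6 TB/s per-stack target (assumption)
    print(f"MCR socket / HBM4 stack: {mcr_socket / hbm4_stack:.2f}x")
    # A GPU package carries 6-8 stacks, so the per-device gap is much
    # larger. On capacity: 12 x 128 GB or 256 GB MCR DIMMs puts you
    # at 1.5-3 TB, i.e. the bandwidth comes bundled with capacity
    # you may not need.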


