That sounds like 'running on CPU is considered competitive with running on GPU' and my BS alarms are going off...
...buuuuut, to be fair, I a) don't particularly care, and b) don't really have that much knowledge about the Ryzen AI 9 HX 370 CPU.
Anyone care to point me vaguely in the direction of something meaningful that would suggest that CPU-based inference would, in general, be even remotely comparable to dedicated GPU performance for AI?
I remain pretty skeptical.
(That reddit thread kicks off with someone saying 'LLM isn't particularly compute intensive workload', which a) is obviously false, and b) gives me no confidence in their review. If LLMs were just memory intensive, we would all just be getting machines with more RAM, and no one would be going crazy with 4 parallel 24GB GPUs just to eke out the performance for inference; they'd just slap a 64GB DIMM in their machine and be done with it.)
LLM inference is memory intensive, but it IS compute intensive too.
Being able to run on a CPU at 1 token per century is not what anyone wants.
> GPD Pocket 4 uses LPDDR5x memory with a speed of 7500MT/s, available in 16GB or 32GB or 64GB capacities. It can allocate up to 16GB of memory to the GPU, allowing AI applications that require large amounts of VRAM to perform optimally.
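To put some rough numbers on the bandwidth side of the argument, here's a back-of-envelope sketch (not a benchmark): single-stream decode is roughly memory-bandwidth bound because every weight gets streamed once per token, so tokens/sec is capped near bandwidth divided by model size. The 128-bit bus width, the ~4GB 4-bit 7B model, and the ~1TB/s discrete-GPU figure below are my assumptions for illustration, not specs from the quoted page.

```python
# Back-of-envelope decode-speed bound: during single-stream decoding every weight
# is read roughly once per token, so tokens/sec <= memory_bandwidth / model_bytes.
# All figures are illustrative assumptions, not measurements.

def peak_bandwidth_gbs(mt_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s: transfers/sec * bytes per transfer."""
    return mt_per_s * 1e6 * (bus_width_bits / 8) / 1e9

def decode_tokens_per_s(bandwidth_gbs: float, model_gb: float) -> float:
    """Upper bound on tokens/sec when decode is memory-bandwidth bound."""
    return bandwidth_gbs / model_gb

model_gb = 4.0  # ~7B parameters at 4-bit quantization (assumed)

# Quoted LPDDR5x-7500, on an assumed 128-bit bus: ~120 GB/s
cpu_bw = peak_bandwidth_gbs(7500, 128)
# A discrete GPU in the RTX 4090 class, ~1 TB/s GDDR6X (assumed figure)
gpu_bw = 1008.0

print(f"Laptop CPU/iGPU bound: ~{decode_tokens_per_s(cpu_bw, model_gb):.0f} tok/s ({cpu_bw:.0f} GB/s)")
print(f"Discrete GPU bound:    ~{decode_tokens_per_s(gpu_bw, model_gb):.0f} tok/s ({gpu_bw:.0f} GB/s)")
# Prompt processing (prefill) is compute bound rather than bandwidth bound,
# so the CPU-vs-GPU gap there is even larger than this decode estimate.
```

Under those assumptions you get roughly 30 tok/s vs 250 tok/s as ceilings, which is why shared LPDDR5x is usable but nowhere near dedicated-GPU territory.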