
Well, llama.cpp running on CPUs at decent speed, and improving quickly, hints towards CPUs. There the size of the model matters less, since RAM is the limit rather than VRAM. At least for inference this is now a viable alternative.
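To make the "RAM is the limit" point concrete, a rough back-of-envelope sketch; the parameter counts, quantization widths and overhead are assumptions for illustration, not measurements:

    # Rough RAM estimate for CPU inference on a quantized model:
    # the weights dominate, plus some overhead for KV cache and activations.
    # All figures are illustrative assumptions.
    def est_ram_gb(params_billion, bits_per_weight, overhead_gb=2.0):
        weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
        return weights_gb + overhead_gb

    for params, bits in [(7, 4), (13, 4), (70, 4)]:
        print(f"{params}B @ {bits}-bit: roughly {est_ram_gb(params, bits):.0f} GB of RAM")
    # 7B and 13B models at 4-bit fit comfortably in ordinary desktop RAM,
    # and even a 4-bit 70B fits in a 64 GB machine, whereas consumer GPUs
    # top out around 24 GB of VRAM.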



Outside of Macs, llama.cpp running fully on the CPU is more than 10x slower than a GPU.
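That gap is mostly a memory-bandwidth story: at batch size 1, generating a token streams essentially all of the weights once, so tokens/s is roughly bandwidth divided by model size. A back-of-envelope sketch, with ballpark bandwidth figures assumed for illustration:

    # Token generation at batch size 1 is roughly memory-bandwidth bound:
    #   tokens/s ~ memory bandwidth / model size in bytes.
    # Bandwidth numbers below are ballpark assumptions, not benchmarks.
    MODEL_GB = 3.5  # 7B model at 4-bit quantization

    bandwidth_gb_s = {
        "desktop CPU, dual-channel DDR5": 80,
        "Apple Silicon unified memory": 400,
        "high-end discrete GPU (GDDR6X/HBM)": 1000,
    }

    for name, bw in bandwidth_gb_s.items():
        print(f"{name}: ~{bw / MODEL_GB:.0f} tokens/s upper bound")
    # The ~80 vs ~1000 GB/s gap is where the "more than 10x slower" figure
    # comes from, and why Macs with fast unified memory sit in between.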


But having 32 real cores in a CPU is so much cheaper than having multiple GPUs. RAM is also much cheaper than VRAM.
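A very rough price-per-gigabyte comparison of the memory that has to hold the model; all prices are loose ballpark assumptions and swing with the market:

    # Rough $/GB of model-holding memory. Prices are loose assumptions
    # for illustration only.
    options = {
        "128 GB DDR5 kit":           (128, 400),   # (GB, approx USD)
        "24 GB consumer GPU (VRAM)": (24, 1600),   # whole card, not just memory
        "2x 24 GB consumer GPUs":    (48, 3200),
    }

    for name, (gb, usd) in options.items():
        print(f"{name}: ~${usd / gb:.0f} per GB")
    # Plain RAM comes out roughly an order of magnitude cheaper per GB;
    # the trade-off is capacity and price versus raw bandwidth.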


For local use, yes, but at the data center level the parallelism of GPUs is still often worth the cost.



