If you're doing inference on a dense neural network, every weight has to be read at least once per token. That means each generated token requires reading at least the full model size from wherever the weights are stored.
If your model is 60GB and you're reading it from the SSD, then your bare-minimum inference time per token is limited by your drive's read throughput. MacBooks have ~4GB/s sequential read speed, which means your inference time per token will be at least 15 seconds (60 GB / 4 GB/s).
If your model is in RAM, then (according to Apple's advertising) your memory bandwidth is 400GB/s, 100x the SSD speed, so the floor drops to 0.15 seconds per token and memory throughput is much less of a bottleneck.
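The arithmetic above can be sketched as a quick back-of-the-envelope calculation. This is a lower bound only, assuming every weight streams once per token with no caching or batching; the 60GB / 4GB/s / 400GB/s figures are the ones from this thread:

```python
# Lower bound on per-token latency: model must be read once per token,
# so latency >= model size / storage bandwidth.

def min_seconds_per_token(model_gb: float, bandwidth_gb_per_s: float) -> float:
    """Per-token time floor if weights stream at the given bandwidth."""
    return model_gb / bandwidth_gb_per_s

MODEL_GB = 60.0

print(min_seconds_per_token(MODEL_GB, 4.0))    # from SSD: 15.0 s/token
print(min_seconds_per_token(MODEL_GB, 400.0))  # from RAM: 0.15 s/token, ~6.7 tokens/s
```

Real systems add compute time and non-sequential access patterns on top of this floor, so actual throughput will be somewhat worse.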
There will be LLM specific chips coming to market soon which will be specialized to the task.
Tesla has already been building AI chips for the FSD features in their vehicles. Over the next few years, everyone will be racing to be first to market with LLM-specific chips, with AI-specific hardware devices following.
What exactly is the ideal hardware for running and training large models? Do you basically just need a high-end version of everything?
That's the only setup I can think of, and not everyone will have the latest high-end GPUs to run such software.