Yes, my mistake: I read your answer to mean that you thought the model could fit into memory with the help of efficiency gains.
I would be sceptical about efficiency gains. I'm not that familiar with the subject, but as far as I know, LLMs serving a single user (i.e. with batch size 1) are practically always limited by memory bandwidth: the whole model (if it is monolithic) has to be read from memory once for each new token (a token is about 4 characters). At 400 GB/s of memory bandwidth and 4-bit quantisation, which puts a model of this size at roughly 200 GB of weights, you are limited to about 2 tokens per second, no matter how efficient the software is. That is not unusable, but it is still quite slow compared to online services.
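As a sanity check, here is that arithmetic as a tiny Python sketch. The 400B-parameter figure is my assumption for "a model of this size"; this is back-of-the-envelope only:

    # Upper bound on batch-size-1 decode speed for a bandwidth-bound,
    # monolithic LLM: every weight is read from memory once per token.
    def max_tokens_per_second(params, bits_per_weight, bandwidth_gb_per_s):
        weight_gb = params * bits_per_weight / 8 / 1e9  # total weight size in GB
        return bandwidth_gb_per_s / weight_gb           # tokens per second

    # Assumption: 400B parameters at 4-bit quantisation -> ~200 GB of weights.
    print(max_tokens_per_second(400e9, 4, 400))  # -> 2.0 tokens/s

The same formula also shows why quantisation helps linearly: halving the bits per weight doubles the bandwidth-limited token rate.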
Got it, thanks, that makes sense. I was aware that memory was the primary bottleneck, but wasn't clear on the specifics of how model sizes mapped to memory requirements or the exact implications of quantization in practice. It sounds like we're pretty far from a model of this size running on any halfway common consumer hardware in a useful way, even if some high-end hardware might technically be able to initialize it in one form or another.
GPU memory costs about $2.5/GB on the spot market, so that is $500 for 200 GB. I would speculate that it might be possible to build such an LLM card for $1-2k, but I suspect that the market for running larger LLMs locally is just too small to be worth it, especially now that the datacentre business is so lucrative.
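For concreteness, the cost arithmetic (the $2.5/GB spot price and 200 GB capacity are just the guesses above):

    # Illustrative only: memory cost of a hypothetical 200 GB "LLM card".
    spot_price_per_gb = 2.5               # USD/GB, assumed spot-market price
    memory_gb = 200                       # assumed capacity (400B params at 4-bit)
    print(spot_price_per_gb * memory_gb)  # -> 500.0 USD for the memory alone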
Maybe we'll get really good LLMs on local hardware when the hype has died down a bit, memory is cheaper and the models are more efficient.