How does this compare to vLLM or exllama? Can it run Llama 2 30B on one 3090 (24 GB), or 70B on two 3090s (24 GB each)? (A rough vLLM sketch for the two-GPU case is below the links.)

https://github.com/vllm-project/vllm

https://github.com/turboderp/exllama

https://github.com/turboderp/exllamav2
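
For reference, the two-GPU case with vLLM looks roughly like the sketch below. This is a minimal example, assuming a community 4-bit AWQ checkpoint (TheBloke/Llama-2-70B-AWQ is one such, not an official release); unquantized 70B fp16 weights alone are ~140 GB, so they won't fit in 2x24 GB with any engine.

    # Minimal vLLM sketch: tensor parallelism across two GPUs.
    # Assumes a 4-bit AWQ checkpoint; fp16 70B will not fit in 48 GB.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="TheBloke/Llama-2-70B-AWQ",  # assumed community quant
        quantization="awq",
        tensor_parallel_size=2,            # split across both 3090s
    )
    out = llm.generate(["Why is the sky blue?"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)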

Llama 2 wasn't released with a 30B variant, or was it?


While the Llama 2 34B base model hasn't been released, Code Llama is effectively a fine-tuned version of that 34B model, and some people are working with it.

As Ollama uses a llama.cpp fork on the backend, I'd expect its memory usage to be very similar to llama.cpp's.
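
A back-of-the-envelope VRAM estimate makes the fits/doesn't-fit question concrete. The bits-per-weight and overhead figures below are assumptions (roughly a llama.cpp Q4_K_M-style quant plus KV cache), not measurements:

    # Rough VRAM estimate for a quantized model, params_b in billions.
    # bits_per_weight ~4.5 assumes a Q4_K_M-style quant; overhead covers
    # the KV cache and CUDA buffers and grows with context length.
    def vram_gb(params_b, bits_per_weight=4.5, overhead=1.15):
        return params_b * bits_per_weight / 8 * overhead

    print(f"34B: {vram_gb(34):.1f} GB")  # ~22 GB -> tight on one 24 GB 3090
    print(f"70B: {vram_gb(70):.1f} GB")  # ~45 GB -> needs two 3090s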


Oh no, you're 100% correct; I was thinking of the first LLaMA. My buddy is running the 70B Llama 2 on two 3090s and the 30B LLaMA 1 on a single 3090.
