How does this compare to vLLM or exllama? Can it run Llama 2 30B on one 3090 (24 GB), or 70B on two 3090s (24 GB each)? (A rough vLLM sketch for the two-GPU case is below the links.)

https://github.com/vllm-project/vllm

https://github.com/turboderp/exllama

https://github.com/turboderp/exllamav2
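
For reference, the two-GPU case with vLLM looks roughly like the sketch below. This is a minimal example, assuming a community 4-bit AWQ checkpoint (TheBloke/Llama-2-70B-AWQ is one such, not an official release); unquantized 70B fp16 weights alone are ~140 GB, so they won't fit in 2x24 GB with any engine.

    # Minimal vLLM sketch: tensor parallelism across two GPUs.
    # Assumes a 4-bit AWQ checkpoint; fp16 70B will not fit in 48 GB.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="TheBloke/Llama-2-70B-AWQ",  # assumed community quant
        quantization="awq",
        tensor_parallel_size=2,            # split across both 3090s
    )
    out = llm.generate(["Why is the sky blue?"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)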

Llama 2 wasn't released with a 30B variant, or was it?


While the Llama 2 34B base model hasn't been released, Code Llama is effectively a fine-tuned version of that 34B model, and some people are working with it.

As Ollama uses a llama.cpp fork on the backend, I'd expect its memory usage to be very similar to llama.cpp's.
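
A back-of-the-envelope VRAM estimate makes the fits/doesn't-fit question concrete. The bits-per-weight and overhead figures below are assumptions (roughly a llama.cpp Q4_K_M-style quant plus KV cache), not measurements:

    # Rough VRAM estimate for a quantized model, params_b in billions.
    # bits_per_weight ~4.5 assumes a Q4_K_M-style quant; overhead covers
    # the KV cache and CUDA buffers and grows with context length.
    def vram_gb(params_b, bits_per_weight=4.5, overhead=1.15):
        return params_b * bits_per_weight / 8 * overhead

    print(f"34B: {vram_gb(34):.1f} GB")  # ~22 GB -> tight on one 24 GB 3090
    print(f"70B: {vram_gb(70):.1f} GB")  # ~45 GB -> needs two 3090s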


Oh no, you're 100% correct; I was thinking of the first LLaMA. My buddy is running the 70B Llama 2 on two 3090s and the 30B LLaMA 1 on a single 3090.
