For llama, look at the 4-bit quantized ones, small models like the 7B, in the ggml format. Those will run on your local CPU. Google those terms too. You can look on Hugging Face for the actual model to download, then load it and send prompts to it.
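As a minimal sketch of that workflow in Python: this downloads a quantized ggml file from Hugging Face and prompts it through the llama-cpp-python bindings. The repo and file names are placeholders, not real identifiers; check Hugging Face for an actual 4-bit ggml build of the 7B model.

    # pip install huggingface_hub llama-cpp-python
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    # Hypothetical repo/filename; substitute a real ggml 4-bit quantized model.
    model_path = hf_hub_download(
        repo_id="someuser/llama-7b-ggml",
        filename="llama-7b.ggmlv3.q4_0.bin",
    )

    # Load the model on CPU and send it a prompt.
    llm = Llama(model_path=model_path, n_ctx=512)
    out = llm("Q: What is the capital of France? A:", max_tokens=32)
    print(out["choices"][0]["text"])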



Thanks, maybe it's as easy as downloading the ggml and running it with Llama.cpp. I'll try that, thanks!


There is also a Python wrapper for llama.cpp with a web UI built in, if it wasn't easy enough already. A sketch is below.
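Assuming the wrapper meant here is llama-cpp-python (an assumption; other wrappers exist), it ships an OpenAI-compatible HTTP server you can start and query locally, roughly like this:

    # Start the server in another terminal (model path is illustrative):
    #   python -m llama_cpp.server --model ./llama-7b.ggmlv3.q4_0.bin
    import requests

    # Default local endpoint (assumption; check the server's startup output).
    resp = requests.post(
        "http://localhost:8000/v1/completions",
        json={"prompt": "Q: What is 2 + 2? A:", "max_tokens": 16},
    )
    print(resp.json()["choices"][0]["text"])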



