For llama, look at the 4-bit quantized ones, small models like the 7B, in the ggml format. Those will run on your local CPU. Google those terms too. You can look on Hugging Face for the actual model to download, then load it and send prompts to it.
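As a minimal sketch of that workflow in Python: this downloads a quantized ggml file from Hugging Face and prompts it through the llama-cpp-python bindings. The repo and file names are placeholders, not real identifiers; check Hugging Face for an actual 4-bit ggml build of the 7B model.

    # pip install huggingface_hub llama-cpp-python
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    # Hypothetical repo/filename; substitute a real ggml 4-bit quantized model.
    model_path = hf_hub_download(
        repo_id="someuser/llama-7b-ggml",
        filename="llama-7b.ggmlv3.q4_0.bin",
    )

    # Load the model on CPU and send it a prompt.
    llm = Llama(model_path=model_path, n_ctx=512)
    out = llm("Q: What is the capital of France? A:", max_tokens=32)
    print(out["choices"][0]["text"])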



Thanks, maybe it's as easy as downloading the ggml and running it with Llama.cpp. I'll try that, thanks!


There is also a Python wrapper for llama.cpp with a web UI built in, if it wasn't easy enough already. A sketch is below.
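Assuming the wrapper meant here is llama-cpp-python (an assumption; other wrappers exist), it ships an OpenAI-compatible HTTP server you can start and query locally, roughly like this:

    # Start the server in another terminal (model path is illustrative):
    #   python -m llama_cpp.server --model ./llama-7b.ggmlv3.q4_0.bin
    import requests

    # Default local endpoint (assumption; check the server's startup output).
    resp = requests.post(
        "http://localhost:8000/v1/completions",
        json={"prompt": "Q: What is 2 + 2? A:", "max_tokens": 16},
    )
    print(resp.json()["choices"][0]["text"])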



