Ask HN: Local LLMs
15 points by Ms-J 9 months ago | 12 comments
I've been wanting to run LLMs locally, and it looks like there is a huge amount of interest from others as well in finally running and creating our own chat-style models.

I came across https://github.com/jmorganca/ollama in a wonderful HN submission a few days ago. I have a MacBook Pro M1 that was top of the line in 2022; the only problem is that it runs Debian, as I use Linux.

Could someone point a beginner like myself in the right direction on how to run, for example, Wizard Vicuna Uncensored locally on Linux? I would very much appreciate it. Thanks for reading.




Llama.cpp, and you can download one of the quantized models directly from "TheBloke" on Hugging Face. I can't 100% vouch for it because I have no idea how it builds under Linux on Apple Silicon; I'd be very interested to know whether there are any issues and how well it uses the processor. Rough build-and-run steps are sketched at the end of this comment.

https://github.com/ggerganov/llama.cpp
https://huggingface.co/TheBloke

You should be able to run at least the 7B, and probably the 13B.

For reference, I can run the 7B just fine on my 2021 Lenovo laptop with 16 GB of RAM (and Ubuntu 20.04).
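Roughly, getting this going on Linux looks like the following. This is only a sketch, since I haven't tried it on Apple Silicon Debian, and the model filename is a placeholder for whichever quantized file you grab from TheBloke:

    # build llama.cpp from source (plain `make` does a CPU build)
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    make

    # drop a quantized model from TheBloke into ./models, then run it
    # (-m model path, -p prompt, -n max tokens to generate; filename is a placeholder)
    ./main -m ./models/your-model.q4_0.bin -p "Hello," -n 128

The -t flag sets the thread count if it doesn't use all your cores by default.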


Thanks, yes, I've seen someone else mention trying Llama.cpp. I'll see if I can set it up; I'm new to this, so I'll look for a guide on how to use Llama.cpp and report back on whether it builds and runs well on Apple Silicon. I think it would be a nice write-up for the community, as there isn't much out there about Linux on Apple Silicon in general.


Ollama does work on Linux; it's just that we haven't (yet) added GPU acceleration beyond Metal. We'll get there soon, but we're a small team and wanted to make sure everything was working well before adding more platforms.

You can build it yourself with `go build .` if you've cloned the repository.
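For anyone following along, a rough sketch of building and running it from source on Linux; this assumes Go is installed, the model name is just an example, and (per the above) inference is CPU-only on Linux for now:

    # build the ollama binary from the cloned repository
    git clone https://github.com/jmorganca/ollama
    cd ollama
    go build .

    # start the server in one terminal...
    ./ollama serve

    # ...then pull and run a model in another (model name is an example)
    ./ollama run llama2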


This is great to know! Thanks for the update; I will take a look at the Linux version.


Thanks, all, for replying; I'm sorry I didn't realize there were replies until now. The advice is great, and I'll see if I can get some of the models I referenced running under Linux. I'll report back with a write-up on how it was achieved if successful.


Koboldcpp (a nice frontend for llama.cpp) is The Way.

You really want to run macOS though, as it's not very fast without Metal (or Vulkan). Also, you need a relatively high-memory M1 model to run the better Llama variants. Rough Linux setup steps are below if you want to try it anyway.
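A minimal sketch for Linux, assuming the standard Koboldcpp repo; the model filename is a placeholder for one of TheBloke's quantized files:

    # build the bundled llama.cpp libraries, then launch the frontend
    git clone https://github.com/LostRuins/koboldcpp
    cd koboldcpp
    make

    # pass a model file; this serves a local web UI you open in a browser
    python koboldcpp.py ./your-model.ggmlv3.q4_0.bin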


I'll take a look at Koboldcpp; a frontend is always nice, thanks! I do have the max specs on this M1.


I believe that to get the M1's efficiency for LLMs, these tools use the Metal API, which I don't think will work on Linux. You may have to dual-boot to use it for ML.


I only use Ubuntu on my computer, and oobabooga's text-generation-webui really helped. I hope this helps you!
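Roughly, a CPU-only setup on Linux looks like this; the flags and model name here are indicative, so check the repo's README for the current install steps:

    # install the web UI and its Python dependencies
    git clone https://github.com/oobabooga/text-generation-webui
    cd text-generation-webui
    pip install -r requirements.txt

    # put a downloaded model under ./models, then start the UI
    # (--cpu forces CPU-only inference; the model name is a placeholder)
    python server.py --cpu --model your-model-name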


I've seen this mentioned in some guides I have read (I believe this is the one you are referencing: https://github.com/oobabooga/text-generation-webui) and will definitely look into it, thanks!


There shouldn't be any real roadblocks in your setup. If you can find an inferencing tool with ARM support, you should be good to go.


Do you have any suggestions? I'm new to using LLMs in general. Thanks.



