Ask HN: Local LLMs
15 points by Ms-J 9 months ago | 12 comments
I've been wanting to run LLMs locally, and it looks like there is a huge amount of interest from others as well in finally running and creating our own chat-style models.

I came across https://github.com/jmorganca/ollama in a wonderful HN submission a few days ago. I have a MacBook Pro M1 that was top of the line in 2022; the only problem is that it runs Debian, as I use Linux.

Could someone point a beginner like myself in the right direction on how to run, for example, Wizard Vicuna Uncensored locally on Linux? I would very much appreciate it. Thanks for reading.




Llama.cpp, and you can download one of the quantized models directly from "TheBloke" on Hugging Face. I can't 100% vouch for it because I have no idea how it builds under Linux on Apple Silicon; I'd be very interested to know whether there are any issues and how well it uses the processor. Rough build-and-run steps are sketched at the end of this comment.

https://github.com/ggerganov/llama.cpp
https://huggingface.co/TheBloke

You should be able to run at least the 7B, and probably the 13B.

For reference, I can run the 7B just fine on my 2021 Lenovo laptop with 16 GB of RAM (and Ubuntu 20.04).
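Roughly, getting this going on Linux looks like the following. This is only a sketch, since I haven't tried it on Apple Silicon Debian, and the model filename is a placeholder for whichever quantized file you grab from TheBloke:

    # build llama.cpp from source (plain `make` does a CPU build)
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    make

    # drop a quantized model from TheBloke into ./models, then run it
    # (-m model path, -p prompt, -n max tokens to generate; filename is a placeholder)
    ./main -m ./models/your-model.q4_0.bin -p "Hello," -n 128

The -t flag sets the thread count if it doesn't use all your cores by default.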


Thanks, yes, I've seen someone else mention trying Llama.cpp. I'll see if I can set it up; I'm new to this, so I'll look for a guide on how to use Llama.cpp and report back on whether it builds and runs well on Apple Silicon. I think it would be a nice write-up for the community, as there isn't much out there about Linux on Apple Silicon in general.


Ollama does work on Linux; it's just that we haven't (yet) added GPU acceleration beyond Metal. We'll get there soon, but we're a small team and wanted to make sure everything was working well before adding more platforms.

You can build it yourself with `go build .` if you've cloned the repository.
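For anyone following along, a rough sketch of building and running it from source on Linux; this assumes Go is installed, the model name is just an example, and (per the above) inference is CPU-only on Linux for now:

    # build the ollama binary from the cloned repository
    git clone https://github.com/jmorganca/ollama
    cd ollama
    go build .

    # start the server in one terminal...
    ./ollama serve

    # ...then pull and run a model in another (model name is an example)
    ./ollama run llama2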


This is great to know! Thanks for the update; I will take a look at the Linux version.


Thanks, all, for replying; I'm sorry I didn't realize there were replies until now. The advice is great, and I'll see if I can get some of the models I referenced running under Linux. I'll report back with a write-up on how it was achieved if successful.


Koboldcpp (a nice frontend for llama.cpp) is The Way.

You really want to run macOS though, as it's not very fast without Metal (or Vulkan). Also, you need a relatively high-memory M1 model to run the better Llama variants. Rough Linux setup steps are below if you want to try it anyway.
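A minimal sketch for Linux, assuming the standard Koboldcpp repo; the model filename is a placeholder for one of TheBloke's quantized files:

    # build the bundled llama.cpp libraries, then launch the frontend
    git clone https://github.com/LostRuins/koboldcpp
    cd koboldcpp
    make

    # pass a model file; this serves a local web UI you open in a browser
    python koboldcpp.py ./your-model.ggmlv3.q4_0.bin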


I'll take a look at Koboldcpp; a frontend is always nice, thanks! I do have the max specs on this M1.


I believe that to get the M1's efficiency for LLMs, these tools use the Metal API, which I don't think will work on Linux. You may have to dual-boot to use it for ML.


I only use Ubuntu on my computer, and oobabooga's text-generation-webui really helped. I hope this helps you!
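Roughly, a CPU-only setup on Linux looks like this; the flags and model name here are indicative, so check the repo's README for the current install steps:

    # install the web UI and its Python dependencies
    git clone https://github.com/oobabooga/text-generation-webui
    cd text-generation-webui
    pip install -r requirements.txt

    # put a downloaded model under ./models, then start the UI
    # (--cpu forces CPU-only inference; the model name is a placeholder)
    python server.py --cpu --model your-model-name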


I've seen this mentioned in some guides I have read (I believe this is the one you are referencing: https://github.com/oobabooga/text-generation-webui) and will definitely look into it, thanks!


There shouldn't be any real roadblocks in your setup. If you can find an inferencing tool with ARM support, you should be good to go.


Do you have any suggestions? I'm new to using LLMs in general. Thanks.



