Saving you some time: if you have a MacBook Pro M1/M2 with 32GB of RAM (I presume a lot of HN folks do), you can comfortably run the `34B` models on CPU or GPU.
And... if you'd like to do it by hand, here's how to get llama running locally:
- https://github.com/ggerganov/llama.cpp
- follow the instructions to build it (note the `METAL` flag; there's a build sketch just after this list)
- https://huggingface.co/models?sort=trending&search=gguf
- pick any `gguf` model that tickles your fancy; download instructions will be there
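For reference, a rough sketch of the build on Apple Silicon, assuming the `LLAMA_METAL` Makefile flag documented in the README at the time of writing (double-check the repo's current build docs):

```sh
# clone llama.cpp and build it with Metal (Apple GPU) support
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
LLAMA_METAL=1 make
```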
And a little script like this will get it running swimmingly:
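Something along these lines, as a sketch; the model filename is a placeholder and the flag names are from the 2023-era `main` binary, so check `./main --help` for your build:

```sh
#!/usr/bin/env bash
# Minimal run script for llama.cpp's `main` binary.
# MODEL is a placeholder; point it at whatever gguf you downloaded.
MODEL=./models/your-model.Q4_K_M.gguf

# -n: max tokens to generate, -c: context size,
# --n-gpu-layers: offload layers to the Metal GPU (any nonzero value enables it)
./main -m "$MODEL" \
  -p "Hello from llama.cpp" \
  -n 256 \
  -c 4096 \
  --n-gpu-layers 1
```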
GGUF is a new format introduced by the llama.cpp team on August 21st, 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. The key benefit of GGUF is that it is an extensible, future-proof format which stores more information about the model as metadata.
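One handy aside (not from the llama.cpp docs, just a sanity check): GGUF files begin with the ASCII magic `GGUF`, so you can quickly confirm a download is actually GGUF:

```sh
# print the first four bytes of the file; a valid GGUF model prints "GGUF"
head -c 4 ./models/your-model.Q4_K_M.gguf; echo
```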
Enjoy the next hours of digging through flags and the wonderful pit of time ahead of you. NOTE: I'm new at this stuff, feedback welcome.