
Saving you some time: if you have a MacBook Pro M1/M2 with 32 GB of RAM (I presume a lot of HN folks do), you can comfortably run the `34B` models on CPU or GPU.
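
Rough back-of-the-envelope for why that fits (assuming a 4-bit quant, i.e. roughly 0.5 bytes per parameter, plus a few GB for the KV cache and the OS): 34B x 0.5 bytes ≈ 17 GB, which sits comfortably inside 32 GB of unified memory.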

And... if you'd like to be more hands-on, here is a manual way to get llama running locally:

    - https://github.com/ggerganov/llama.cpp 
    - follow the instructions to build it (note the `METAL` flag; see the build sketch below)
    - https://huggingface.co/models?sort=trending&search=gguf
    - pick any `gguf` model that tickles your fancy, download instructions will be there
and a little script like this will get it running swimmingly:

   ./main -m ./models/<file>.gguf --color --keep -1 -n -1 -ngl 32 --repeat_penalty 1.1 -i -ins
Enjoy the next hours of digging through flags and the wonderful pit of time ahead of you.
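
For the build step in the list above, the Metal-enabled compile looked roughly like this at the time (just a sketch of the Makefile-based build; check the repo README for the current invocation):

    # clone and build with Metal (Apple GPU) support
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    LLAMA_METAL=1 make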

NOTE: I'm new at this stuff, feedback welcome.




With an M1 or M2 Max with 64 GB of RAM and up, you can run the original 65B LLaMA model from Facebook using llama.cpp.

Here is the starting output of running Llama 65B in a gist

https://gist.github.com/zitterbewegung/4787e42617aa0be6019c3...
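
If you're starting from the original Facebook weights rather than a pre-converted download, the rough flow at the time was convert, then quantize, then run (a sketch using the convert.py and quantize tools bundled with llama.cpp; paths are illustrative):

    # convert the original PyTorch checkpoints to an f16 GGUF file,
    # then quantize to 4-bit so the 65B model fits in 64 GB of RAM
    python3 convert.py models/65B/
    ./quantize models/65B/ggml-model-f16.gguf models/65B/ggml-model-q4_0.gguf q4_0
    ./main -m models/65B/ggml-model-q4_0.gguf -p "Hello" -n 128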


> GGUF

GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. The key benefit of GGUF is that it is an extensible, future-proof format which stores more information about the model as metadata.
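
One way to see that it's a self-describing container: a .gguf file starts with the ASCII magic "GGUF" and a version number, and the model metadata lives in the file header itself rather than in a separate config file. A trivial check on whatever model you downloaded:

    # the first bytes are the magic "GGUF" plus a little-endian version
    xxd -l 16 ./models/<file>.gguf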


You can do the first step a lot faster with nix: `nix shell github:ggerganov/llama.cpp -c llama`


Note that the OP repo doesn't yet support the GGUF format.



Yes it does. Or do you mean the OP's GitHub repo?


Yeah, I was referring to the OP's repo, oops.

>We currently support models in GGML format. However, the GGML format has now been superseded by GGUF.

>Future versions of OnPrem.LLM will use the newer GGUF format.



