Saving you some time, if you have a Macbook pro M1/M2 with 32GB of RAM (I presum...

zitterbewegung · on Sept 7, 2023

With your M1 or M2 Max with 64 GB of ram and up you can run using llama.cpp the original llama from Facebook the 65B model.

Here is the starting output of running Llama 65B in a gist

https://gist.github.com/zitterbewegung/4787e42617aa0be6019c3...

MuffinFlavored · on Sept 7, 2023

> GGUF

GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. The key benefit of GGUF is that it is a extensible, future-proof format which stores more information about the model as metadata.

leoh · on Sept 7, 2023

You can do the first step a lot faster with nix: `nix shell github:ggerganov/llama.cpp -c llama`

seaal · on Sept 7, 2023

Note that the OP repo doesn't yet support GGUF format.

kgwgk · on Sept 7, 2023

What do you mean?

https://github.com/ggerganov/llama.cpp/releases/tag/master-6...

keyle · on Sept 7, 2023

Yes it does. Or do you mean the OP's github repo?

seaal · on Sept 7, 2023

Yeah I was referring to OP, oops

>We currently support models in GGML format. However, the GGML format has now been superseded by GGUF.

>Future versions of OnPrem.LLM will use the newer GGUF format.