Llamafile – The easiest way to run LLMs locally on your Mac (ppaolo.substack.com)
27 points by paolop on Dec 4, 2023 | 17 comments



Why?

It's unsafe and it takes all the choice and control away from you.

You should, instead:

1) Build a local copy of llama.cpp (literally clone https://github.com/ggerganov/llama.cpp and run 'make').

2) Download the model version you actually want from Hugging Face (for example, from https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGU..., which clearly indicates the required RAM for each variant)

3) Run the model yourself (rough commands sketched below).
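
Roughly, assuming an Apple Silicon Mac (the model filename is only an example; pick whichever GGUF quantization fits your RAM):

  git clone https://github.com/ggerganov/llama.cpp
  cd llama.cpp
  make
  # download your chosen GGUF file from Hugging Face into this directory, then:
  ./main -m ./mistral-7b-instruct-v0.1.Q4_K_M.gguf -p "Hello" -n 128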

I'll say this explicitly: these llamafile things are stupid.

You should not download arbitrary user-uploaded binary executables and run them on your local laptop.

Hugging Face may do its best to prevent people from taking advantage of this (heck, they literally invented safetensors), but long story short: we can't have nice things because people suck.

If you start downloading random executables from the internet and running them, you will regret it.

Just spend the extra 5 minutes to build llama.cpp yourself. It's very, very easy to do and many guides already exist for doing exactly that.


It only takes away choice if you use the demo files with the models baked in. Under Releases->Assets there are versions that are just the plain, OS-portable llama.cpp binaries, which you pass a model file path to as normal.

Compiling llama.cpp is relatively easy. Compiling llama.cpp for GPU support is a bit harder. I think it's nice that these OS-portable binaries of llama.cpp applications like main, server, and llava exist. Too bad there are no OpenCL ones. The only problem was baking in the models. Downloading applications off the internet is not that weird. After all, it's the recommended way to install Rust, etc.
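
As a rough example, running one of those model-free server binaries from the release assets would look something like this (the binary and model filenames are illustrative, not the actual asset names):

  chmod +x ./llamafile-server
  ./llamafile-server -m ./mistral-7b-instruct-v0.1.Q4_K_M.gguf
  # then open http://localhost:8080 in a browser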


> Compiling llama.cpp is relatively easy. Compiling llama.cpp for GPU support is a bit harder.

It is not.

For a Mac with an M1, as in this specific post, you run “make”.

If you have an older Intel Mac and have to run on the CPU, you run “make”.

> Downloading applications off the internet is not that weird. After all, it's the recommended way to install Rust, etc.

Downloading applications from trusted sources is not that weird.

E.g. Rust, from the Rust organisation.

Downloading and running user-uploaded binaries is a security nightmare.


While in general I agree with your security concerns, here the links are from very trusted sources (Mozilla Internet Ecosystem and Mozilla's innovation group) and the user is well known (present on X too with a large following).

Re: "simplicity", sure for you and I it's simple to compile llama.cpp, but it's like asking a regular user to compile their applications themselves. It's not that simple for them, and should not be required if we want to make AI and OSS AI in particular more mainstream.


The commands to run are:

Open terminal

curl -LO https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/ma...

chmod 755 …

./…

Open localhost:8080 in browser

To make this accessible to a broader cohort you would package it into an app and put it somewhere with provenance, e.g. a well-known GitHub account or the App Store.

The solution, as shown, doesn’t solve either of the problems you say it attempts to solve.

It is a bad solution.


Totally agreed it's not yet ideal - absolutely. But I feel we are expanding the pie of users with this step, which is just an intermediate step. Do you want to work on that packaging ;-)?


See also,

Llamafile is the new best way to run a LLM on your own computer (simonwillison.net)

https://news.ycombinator.com/item?id=38489533

And

https://news.ycombinator.com/item?id=38464057


Thanks! Macroexpanded:

Llamafile is the new best way to run a LLM on your own computer - https://news.ycombinator.com/item?id=38489533 - Dec 2023 (45 comments)

Llamafile lets you distribute and run LLMs with a single file - https://news.ycombinator.com/item?id=38464057 - Nov 2023 (286 comments)


Do you think it would be useful to explain how to macroexpand whenever you do it so the folks you are responding to can learn and do it themselves next time?

(myself included)


I just say "macroexpanded" as a fun metaphor. The Arc code that generates the formatted references is at https://news.ycombinator.com/item?id=35723423.

The rest consists of going through HN Search results and the relevant threads with the help of a lot of keyboard shortcuts (https://news.ycombinator.com/item?id=35668525).

One of these years I do want to make all this available!


I'd like to train one of the provided LLMs with my own data; I've heard that RAG can be used for that. Does anyone have any pointers on how this could be achieved with llamafiles, all locally on my server?
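
Something like this is what I'm imagining, purely as a sketch (it assumes the llamafile server exposes llama.cpp's /completion endpoint on localhost:8080, and it substitutes a naive grep for real embedding-based retrieval):

  # crude local "RAG": keyword retrieval + prompt stuffing against the local server
  QUESTION="What did my notes say about backups?"
  CONTEXT=$(grep -ri "backup" ~/notes | head -n 20)
  PROMPT="Answer using only this context:
  $CONTEXT
  Question: $QUESTION
  Answer:"
  jq -n --arg p "$PROMPT" '{prompt: $p, n_predict: 256}' \
    | curl -s -H 'Content-Type: application/json' -d @- http://localhost:8080/completion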


What's your experience with open source LLMs like LLaVA 1.5 or Mistral 7B?


The CollectiveCognition fine-tune of Mistral-7B is by far the best small model I've found. https://huggingface.co/TheBloke/CollectiveCognition-v1.1-Mis...

The LLaVA multi-modal models are fun. I find that requesting JSON-formatted output lets you overcome the limited response length that's baked in. https://huggingface.co/mys/ggml_bakllava-1 (CLIP+Mistral-7B instead of CLIP+Llama-2-7B) is my favorite.


The Mistral 7B fine-tunes OpenHermes-2.5 and OpenOrca are good. Zephyr is underwhelming.


Why does this keep popping up on here?


Because people on Hacker News are more interested in the prompt engineering. Convenience and satisfaction > 5 minutes of git pull and make.


Agreed - and ultimately, you remove the need for a git client and git knowledge to pull and compile... it's not just the 5 minutes; you also open up the market to way more people. Ideally it should be as simple as installing an app, but this is a good step in that direction.



