Ask HN: Recommendations for Local LLMs in 2024: Private and Offline?
48 points by j4ckie 8 months ago | 29 comments
I'm in search of a local LLM that can run completely offline for processing personal documents. Key requirements include privacy (no data leaves my machine) and performance (efficient with large datasets). Any recommendations for open-source / commercial solutions that fit the bill in 2024? Also, what's the current state of local LLMs: are they practical and useful, or still facing significant limitations?



> Are they practical and useful, or still facing significant limitations?

They are. I'm working on a product using a fine-tuned Mistral-7B-Instruct-v0.2 model, and it's pretty mind-blowing. It works flawlessly on my RTX 3090 and is serviceable on my M1 MBP as well. I'm building in Rust (using the candle crate), but for personal usage Python is probably the better choice, since it's easier to get up and running.
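If you go the Python route, a minimal sketch using the llama-cpp-python bindings could look like this (the GGUF path is a placeholder; grab a quantized build of Mistral-7B-Instruct-v0.2 from HF first):

    # Minimal local-inference sketch with llama-cpp-python (pip install llama-cpp-python).
    # The model path is a placeholder; point it at a quantized GGUF of
    # Mistral-7B-Instruct-v0.2 downloaded from Hugging Face.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # placeholder path
        n_ctx=4096,       # context window
        n_gpu_layers=-1,  # offload all layers to the GPU if one is available
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize this note: ..."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])

Nothing in there leaves your machine, and the same few lines work on an RTX card or an M-series Mac (via Metal).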


Mistral-7B-Instruct-v0.2 is amazing. I've set it as the default model of RecurseChat (a local AI chat app).

It works great. But recently WizardLM 2 came out and I tried it with local RAG, and the difference is quite significant: https://twitter.com/chxy/status/1780101542311579942

I'm considering switching the default model once they put it back on HF.


Mistral-7B-Instruct-v0.2 - I'm using this exact model too, and it is mind-blowing. To get the most out of it, make sure you use llama.cpp and turn on self-extend (I'm not sure whether self-extend support has been merged into main yet; I manually merged a dev branch).


I asked this elsewhere in this thread, but I'm curious how you plan to run and deploy this model. What's the cheapest, or I guess most cost-efficient, way to do this without burning money? For reference, I'm a college student, so I don't have a lot of money lying around to experiment with.


You're right, it's expensive.

I have infra in my house; not gonna lie, it cost a lot. I have a rack with $30k of equipment in it (including 5 GPUs).

But this would probably run on an AWS P2 instance, which is 0.90 USD an hour, or there's Lambda Labs, which is also pretty cheap (no affiliation, just a satisfied customer).


Thanks for the info! Appreciate it


What's the cheapest way to run and deploy a fine-tuned model today? If I understand correctly, I'll need to run cloud GPUs, right?


+1 interested to know this as well.


What's the cheapest machine I could buy to do this?


Shooting from the hip here, I'd say $800 would be a good start for all-new parts. You might get down to $500 with used parts.

The most expensive piece in either build is the video card. You want to be able to load the LLM file into your video card's RAM.

I just got a new 16GB card for this. I can load up to 34B models on it, but they run poorly. Anything 13B or smaller runs perfectly. A 12GB card would be able to run 7B models, and with the right training I think 7B models can be awesome.

New: 4060 Ti. Used: 3090.
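For anyone wondering why those sizes line up the way they do, here's my rough back-of-the-envelope math (the ~20% overhead for KV cache and buffers is an assumption, not an exact figure):

    # Back-of-the-envelope VRAM estimate for a quantized model.
    # overhead=1.2 is a rough allowance for KV cache and buffers, not an exact number.
    def approx_vram_gb(params_billions: float, bits_per_weight: float, overhead: float = 1.2) -> float:
        bytes_for_weights = params_billions * 1e9 * bits_per_weight / 8
        return bytes_for_weights * overhead / 1e9

    for size in (7, 13, 34):
        print(f"{size}B @ 4-bit ~ {approx_vram_gb(size, 4):.1f} GB")
    # 7B  @ 4-bit ~  4.2 GB -> fits a 12 GB card easily
    # 13B @ 4-bit ~  7.8 GB -> fits a 16 GB card
    # 34B @ 4-bit ~ 20.4 GB -> tight even on 24 GB once the context grows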


Mistral-7B will run on a Raspberry Pi at Q4 with a little bit of patience. For good acceleration, though, you'll want an Nvidia machine with enough VRAM to comfortably store the model.


We recently added support for local document chat in RecurseChat (https://recurse.chat), including chatting with PDFs and markdown. You can see a demo here: https://twitter.com/chxy/status/1777234458372116865

RAG happens entirely locally (local embedding model and local vector DB).

The app is sandboxed via the Mac App Sandbox, meaning it only has access to files you select in the system dialog or drag and drop in. If you use a local LLM, everything works offline.
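For anyone curious what fully local RAG looks like in principle, here's an illustrative Python sketch (not RecurseChat's actual implementation) using a local embedding model and plain cosine similarity standing in for the vector DB:

    # Illustrative fully-local RAG sketch: local embeddings + brute-force cosine search.
    # Not RecurseChat's code; all-MiniLM-L6-v2 is just a common small embedding model.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # runs locally after the first download

    docs = [
        "Invoice #1234 was paid on 2024-03-01.",
        "The lease agreement renews every June.",
        "Passport expires in September 2027.",
    ]
    doc_vecs = embedder.encode(docs, normalize_embeddings=True)

    def retrieve(query: str, k: int = 2) -> list[str]:
        q = embedder.encode([query], normalize_embeddings=True)[0]
        scores = doc_vecs @ q  # cosine similarity, since the vectors are normalized
        return [docs[i] for i in np.argsort(-scores)[:k]]

    context = "\n".join(retrieve("When does my passport expire?"))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: When does my passport expire?"
    # `prompt` would then be sent to the local LLM (e.g. Mistral-7B via llama.cpp).

A real app swaps the brute-force search for a proper local vector store, but the data flow is the same and nothing leaves the machine.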


I run local LLMs (Mistral-7B-Instruct-v0.2) using LM Studio (Ollama works well too, I believe) and host a local server on my Mac. I can hit the endpoints the same way you would OpenAI's chat completions API, and can trigger it inline across my other applications using MindMac.
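For reference, hitting a local server like that from Python looks almost identical to calling OpenAI; the port below is LM Studio's default and the model name is whatever your server reports, so adjust both for your setup:

    # Talk to a local OpenAI-compatible server (LM Studio and Ollama both expose one).
    # base_url/port is LM Studio's default; the API key is ignored by local servers
    # but the client library still requires one.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="mistral-7b-instruct-v0.2",  # whatever model name your local server exposes
        messages=[{"role": "user", "content": "Give me three uses for a local LLM."}],
    )
    print(resp.choices[0].message.content)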


What about hosting? How would you recommend doing that?


> I'm in search of a local LLM that can run completely offline for processing personal documents. Key requirements include privacy (no data leaves my machine) and performance (efficient with large datasets). Any recommendations for open-source / commercial solutions that fit the bill in 2024? Also, what's the current state of local LLMs: are they practical and useful, or still facing significant limitations?

We've added support for it in our app if you wanna give it a try: https://curiosity.ai


You need suitable hardware (ideally a 3090, 4090, or an Apple M-series device with a decent amount of memory).

Then set up software: Ollama for easy mode (but less control), or text-generation-webui for more control.

After that you can just try models. The subreddit /r/localllama has whatever is flavour of the week. The Mixtral model at around Q3 quantization is probably a good starting point.
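Once Ollama is running, you can also hit it programmatically over its local REST API; a rough Python sketch (the model name and default port 11434 are assumptions, adjust to taste):

    # Rough sketch: query a running Ollama server over its local REST API.
    # Assumes you've already pulled a model (e.g. `ollama pull mixtral`).
    import json
    import urllib.request

    payload = json.dumps({
        "model": "mixtral",
        "prompt": "Explain quantization in one paragraph.",
        "stream": False,  # return one JSON object instead of a token stream
    }).encode()

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        print(json.loads(r.read())["response"])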


With ROCm 6 I think new-ish AMD GPUs are pretty good too.

https://ollama.com/blog/amd-preview

With that said, I don't think you need anything special to run LLMs these days. I can run 7B models on a 4-year-old AMD or Intel CPU (no GPU) for programming tasks.


Indeed... I bought AMD shares on the bet that they'll catch up.

However, this is still true only in a fairly narrow space (inference); once you stray off that very narrow path, they're still a fair bit behind.


So I can run local LLMs on my Apple M1 Air?


Yes, you can. Download LM Studio and then search for a model such as Mistral-7B-Instruct-v0.2. LM Studio will suggest variants of that model suited to your hardware. Here's a demonstration: https://www.youtube.com/watch?v=VXHryjPu52k


You can do this with Llama 2. There are multiple ways to compile it unless you use Python. If you aren't familiar with C++, I would just stick to Python and save yourself time. Buy a big PC that can handle it.


Use ollama and browse the available models, download some, and try them out. Ollama is a llama.cpp front end.

https://ollama.ai


That domain redirects to the new one, which should probably be linked directly instead now.

https://ollama.com


How does that compare to gpt4all?


Are there any which can generate consistent characters (especially faces)?


My kingdom for a local LLM supporting my trusty Intel MacPro and AMD RX6900 XT!


This video demonstrates using LM Studio to run an LLM on an RX 7600 XT 16GB, so you should be able to run it on an RX 6900 XT: https://www.youtube.com/watch?v=VXHryjPu52k


You should be able to run something via OpenCL and llama.cpp on it.


jan.ai



