Local LLMs to run on old iMac / hardware
1 point by umtksa on Nov 11, 2023 | 3 comments
I have an old iMac on which I run Syncthing and Node-RED. I use an M2 for my daily tasks. I SSH into the iMac whenever I need to, and if I require a GUI, I use screen sharing.

My iMac is quite old, dating back to 2012. While I can run most LLM models on my M2, I'm struggling to run even the smallest LLMs on my iMac. I've attempted Ollama and llama.cpp without success. So, is there any local LLM suitable for running on older hardware? I enjoy experimenting with LLMs and writing small shell scripts to interact with them. I'm not looking for a replacement for GPT-4; I'm looking for something fun to play with.

iMac's hardware:

- 2.7 GHz Quad-Core Intel Core i5

- 16 GB 1600 MHz DDR3

- NVIDIA GeForce GT 640M 512 MB




Your hardware should be fine for inferencing, as long as you don't bother trying to get the GPU working.

My $0.02 would be to try getting LocalAI running on your machine with OpenCL/CLBlast acceleration for your CPU. If you're running other things, you can limit the inference process to 2 or 3 threads. That should get it working; I've been able to run inference with even 13B models on cheap Rockchip SoCs. Your CPU should be fine, even if it's a little dated. (There's a rough usage sketch after the model links below.)

LocalAI: https://github.com/mudler/LocalAI

Some decent models to start with:

TinyLlama (extremely small/fast): https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGU...

Dolphin Mistral (larger size, better responses): https://huggingface.co/TheBloke/dolphin-2.1-mistral-7B-GGUF
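
For reference, once LocalAI is up it speaks the OpenAI-compatible API, so calling it from a script is straightforward. A minimal sketch, assuming the default port 8080 and a model installed under the name "tinyllama-chat" (that name is an example; swap in whatever your LocalAI model config actually uses):

  # Minimal sketch: query a local LocalAI server over its OpenAI-compatible API.
  # Assumptions: LocalAI listens on localhost:8080 (its default) and a GGUF model
  # was installed under the name "tinyllama-chat" -- adjust to your own config.
  import requests

  resp = requests.post(
      "http://localhost:8080/v1/chat/completions",
      json={
          "model": "tinyllama-chat",  # hypothetical model name from your config
          "messages": [{"role": "user", "content": "Say hello in one sentence."}],
          "temperature": 0.7,
      },
      timeout=120,  # a 2012 i5 can take a while on the first load
  )
  resp.raise_for_status()
  print(resp.json()["choices"][0]["message"]["content"])

Thread count is set on the server side (LocalAI has a threads setting), so the client script stays the same whether you give it 2 threads or 4.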


Thank you so much. I actually tried to run Mistral and TinyLlama via Ollama without success, but I never limited the inference threads or anything. Let me try LocalAI.


When you say "run", what do you actually mean?

If you can install transformers on your iMac, the easiest way is to check in code real quick: just run the transformers code to load the model and tokenizer. Start with some basic BERT models and move up. If you use a model-loader app, especially one with a full interface like text-generation-webui or LM Studio, you're adding overhead. Ollama's model server/loader is quite fast in my experience, likely because it only ships with a CLI out of the box, but it doesn't support all models, and its smallest model is a quantized orca-mini, which is unlikely to work for you.
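
To make that concrete, a minimal transformers sketch might look like this on CPU; the model ids are examples (the TinyLlama checkpoint name is my assumption, check the hub for the exact id you want):

  # Minimal sketch of "just run the transformers code": load a small model
  # directly with a pipeline, no model-loader app in between. Runs on CPU.
  from transformers import pipeline

  # Start small: a fill-mask pipeline with a basic BERT model.
  fill = pipeline("fill-mask", model="bert-base-uncased", device=-1)  # device=-1 -> CPU
  print(fill("Running LLMs on a 2012 iMac is [MASK]."))

  # Then move up to a small generative model, e.g. a TinyLlama chat checkpoint
  # (assumed id below), still on CPU.
  gen = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", device=-1)
  print(gen("Write a one-line shell tip:", max_new_tokens=40)[0]["generated_text"])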



