Local LLMs to run on old iMac / hardware
1 point by umtksa on Nov 11, 2023 | 3 comments
I have an old iMac on which I run Syncthing and Node-RED. I use an M2 for my daily tasks. I SSH into the iMac whenever I need to, and if I require a GUI, I use screen sharing.

My iMac is quite old, dating back to 2012. While I can run most LLM models on my M2, I'm struggling to run even the smallest LLMs on my iMac. I've attempted Ollama and llama.cpp without success. So, is there any local LLM suitable for running on older hardware? I enjoy experimenting with LLMs and writing small shell scripts to interact with them. I'm not looking for a replacement for GPT-4; I'm looking for something fun to play with.

iMac's hardware:

- 2.7 GHz Quad-Core Intel Core i5

- 16 GB 1600 MHz DDR3

- NVIDIA GeForce GT 640M 512 MB




Your hardware should be fine for inferencing, as long as you don't bother trying to get the GPU working.

My $0.02 would be to try getting LocalAI running on your machine with OpenCL/CLBlast acceleration for your CPU. If you're running other things, you can limit the inference process to 2 or 3 threads. That should get it working; I've been able to run inference with even 13B models on cheap Rockchip SoCs. Your CPU should be fine, even if it's a little dated. (There's a rough usage sketch after the model links below.)

LocalAI: https://github.com/mudler/LocalAI

Some decent models to start with:

TinyLlama (extremely small/fast): https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGU...

Dolphin Mistral (larger size, better responses): https://huggingface.co/TheBloke/dolphin-2.1-mistral-7B-GGUF
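
For reference, once LocalAI is up it speaks the OpenAI-compatible API, so calling it from a script is straightforward. A minimal sketch, assuming the default port 8080 and a model installed under the name "tinyllama-chat" (that name is an example; swap in whatever your LocalAI model config actually uses):

  # Minimal sketch: query a local LocalAI server over its OpenAI-compatible API.
  # Assumptions: LocalAI listens on localhost:8080 (its default) and a GGUF model
  # was installed under the name "tinyllama-chat" -- adjust to your own config.
  import requests

  resp = requests.post(
      "http://localhost:8080/v1/chat/completions",
      json={
          "model": "tinyllama-chat",  # hypothetical model name from your config
          "messages": [{"role": "user", "content": "Say hello in one sentence."}],
          "temperature": 0.7,
      },
      timeout=120,  # a 2012 i5 can take a while on the first load
  )
  resp.raise_for_status()
  print(resp.json()["choices"][0]["message"]["content"])

Thread count is set on the server side (LocalAI has a threads setting), so the client script stays the same whether you give it 2 threads or 4.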


Thank you so much. I actually tried to run Mistral and TinyLlama via Ollama without success, but I never limited the inference threads or anything. Let me try LocalAI.


When you say "run", what do you actually mean?

If you can install transformers on your iMac, the easiest way is to check in code real quick: just run the transformers code to load the model and tokenizer. Start with some basic BERT models and move up. If you use a model-loader app, especially one with a full interface like text-generation-webui or LM Studio, you're adding overhead. Ollama's model server/loader is quite fast in my experience, likely because it only ships with a CLI out of the box, but it doesn't support all models, and its smallest model is a quantized orca-mini, which is unlikely to work for you.
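
To make that concrete, a minimal transformers sketch might look like this on CPU; the model ids are examples (the TinyLlama checkpoint name is my assumption, check the hub for the exact id you want):

  # Minimal sketch of "just run the transformers code": load a small model
  # directly with a pipeline, no model-loader app in between. Runs on CPU.
  from transformers import pipeline

  # Start small: a fill-mask pipeline with a basic BERT model.
  fill = pipeline("fill-mask", model="bert-base-uncased", device=-1)  # device=-1 -> CPU
  print(fill("Running LLMs on a 2012 iMac is [MASK]."))

  # Then move up to a small generative model, e.g. a TinyLlama chat checkpoint
  # (assumed id below), still on CPU.
  gen = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", device=-1)
  print(gen("Write a one-line shell tip:", max_new_tokens=40)[0]["generated_text"])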



