
Have a repo anywhere?

It's just the same llama.cpp repo everyone else is using. You just git clone it to your Android phone in Termux, run make, and you're done. https://github.com/ggerganov/llama.cpp

Assuming you have the model file downloaded (you can use wget to download it), these are the instructions to install and run:

pkg install git
pkg install cmake
pkg install build-essential
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make -j
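A note on that last step: a bare make -j launches an unbounded number of parallel compile jobs, which can exhaust RAM on a phone. A safer sketch, assuming nproc is available (it ships with Termux's coreutils package):

```shell
# nproc prints the number of available processing units.
nproc
# Cap the parallel build at that count instead of leaving -j unbounded:
#   make -j"$(nproc)"
```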


Yeah, I’ve already been running llama.cpp locally, but haven’t found it to perform at the level attested in the comment (a 30B model as a chatbot on commodity hardware). 13B runs okay, but inference is generally too slow to do anything useful on my MacBook. I wondered what you might be doing to get usable performance in that context.

You can change the number of threads llama.cpp uses with the -t argument. By default it uses only 4. For example, if your CPU has 16 physical cores, you can run ./main -m model.bin -t 16

16 cores would be about 4x faster than the default 4 cores. Eventually you hit memory bottlenecks, so 32 cores is not twice as fast as 16 cores, unfortunately.
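To pick a reasonable -t value on a given machine, nproc is a quick check. Note it reports logical CPUs, so on a chip with SMT/hyperthreading the physical-core count the comment above refers to may be half of what it prints:

```shell
# Print the number of logical CPUs; use this (or half of it, on SMT
# chips) as the -t argument, e.g.  ./main -m model.bin -t 8
nproc
```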

Thanks! Will test that out!
