It's just the same llama.cpp repo everyone else is using. You git clone it to your Android phone in Termux, run make, and you're done. https://github.com/ggerganov/llama.cpp
Assuming you have the model file downloaded (you can use wget to download it), these are the instructions to install and run:
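Roughly, it looks something like this in a Termux shell. This is just a sketch: the package list and the model URL/filename are placeholders, not from the original comment, so substitute whatever GGML model file you actually downloaded.

    # install build tools inside Termux
    pkg install git make clang wget

    # clone and build llama.cpp
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    make

    # download a model file (placeholder URL -- use your own)
    wget https://example.com/model.bin

    # run it
    ./main -m model.bin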
Yeah, I’ve already been running llama.cpp locally, but I haven’t found it to perform at the level attested in the comment (a 30B model as a chatbot on commodity hardware). 13B runs okay, but inference is generally too slow to do anything useful on my MacBook. I wondered what you might be doing to get usable performance in that context.
You can change the number of threads llama.cpp uses with the -t argument. By default it only uses 4. For example, if your CPU has 16 physical cores, then you can run ./main -m model.bin -t 16
16 cores would be about 4x faster than the default 4. Eventually you hit memory bottlenecks, though, so 32 cores is unfortunately not twice as fast as 16.
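A quick way to check how many physical cores you have before picking -t (a sketch; sysctl is the macOS command since the parent mentioned a MacBook, while nproc is the Linux/Termux equivalent and reports logical CPUs, which may include hyperthreads):

    # macOS: physical core count
    sysctl -n hw.physicalcpu

    # Linux / Termux: logical CPU count (may include hyperthreads)
    nproc

    # then pass that number to llama.cpp, e.g. with 8 physical cores:
    ./main -m model.bin -t 8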