If you want to run on GPU, use the official Python-based stack, which, by the way, takes about 10 GB of runtime binaries on disk and only supports Nvidia GPUs because it depends on CUDA.
The link from GP is the CPU-only version, implemented in C++.
The Python + GPU one can be found in the official Facebook repo: https://github.com/facebookresearch/llama (presumably GP assumed everyone already knew about it, so they pasted the other link).