The link GP posted is the CPU-only implementation, written in C++.
The Python + GPU one is in the official Facebook repo: https://github.com/facebookresearch/llama (presumably GP assumed everyone already knew about it, so they pasted the other link).