If you want to run on GPU, use the official Python-based stack, which, by the way, takes about 10 GB of runtime binaries on disk and only supports Nvidia GPUs because it depends on CUDA.
The link from GP is the CPU-only version, implemented in C++.
The Python + GPU one can be found in the official Facebook repo: https://github.com/facebookresearch/llama (presumably GP assumed everyone already knew about it, so they pasted the other link).