It cannot; that particular library is CPU-only.

If you want to run on GPU, use the official Python-based stack, which BTW takes about 10 GB of runtime binaries on disk, and only supports Nvidia GPUs because it depends on CUDA.
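
A minimal sketch of that dependency, assuming a local PyTorch install; it just reports whether a CUDA-capable Nvidia device is visible, which is the gate the GPU path sits behind:

    import torch

    # The official stack runs through PyTorch's CUDA backend, so on a
    # machine without an Nvidia GPU this check comes back False.
    if torch.cuda.is_available():
        print("CUDA device:", torch.cuda.get_device_name(0))
    else:
        print("No CUDA device visible; the GPU path won't work here.")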




Do you have a link for that?



My apologies, @Const-me: are you referring to the script "convert-pth-to-ggml.py" in that repo? That appears to be the only Python in there.
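
For anyone else looking at it: that script converts the PyTorch checkpoint files into the ggml format the C++ code reads. A rough sketch of what it consumes, assuming the usual LLaMA weight layout (the path below is an assumed example, not something from this thread):

    import torch

    # LLaMA weights ship as PyTorch checkpoints; the converter loads them
    # on the CPU and rewrites each tensor in ggml's own format.
    # "models/7B/consolidated.00.pth" is an assumed example path.
    checkpoint = torch.load("models/7B/consolidated.00.pth", map_location="cpu")
    for name, tensor in checkpoint.items():
        print(name, tuple(tensor.shape), tensor.dtype)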


The link from GP is the CPU-only one, implemented in C++.

The Python + GPU one can be found in the official Facebook repo: https://github.com/facebookresearch/llama (presumably GP assumed this was already known to everyone, so they pasted the other link).
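
The CUDA requirement follows from the usual PyTorch pattern that code base relies on: weights and inputs are moved onto the CUDA device. A generic sketch of that pattern (not the repo's actual code):

    import torch

    # Generic PyTorch device-placement pattern, not code from the llama
    # repo: tensors and modules are moved to the CUDA device, which is
    # why an Nvidia GPU is required for the GPU path.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    layer = torch.nn.Linear(4096, 4096).to(device)  # stand-in for a real layer
    x = torch.randn(1, 4096, device=device)
    print(layer(x).shape, "on", device)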


It seems the question can be interpreted in two ways:

1. Do you have a link which proves the implementation being discussed is CPU-only?

2. Do you have a link to the official Python-based implementation?

When I wrote the above answer, I was only aware of the first interpretation.



