This is not surprising, because GPU support in GGML is said to be preliminary and the library is optimized for running on CPUs.
Judging by the times reported by other people, using the GPU with GGML instead of the CPU still provides a speed improvement, but only a small one.
Nevertheless, I appreciated that, after following this project's instructions exactly, everything was up and running within a few minutes and could be tested.
Past attempts to install the entire environment needed to run such models have required much more work.