The point of llama.cpp is that most people don't have a GPU with enough RAM; Apple's unified memory ought to solve that.
Apparently some people have it working:
https://github.com/remixer-dec/llama-mps
Before I found the repo above, I made a naive attempt to get LLaMA running with MPS, and it didn't "just work": a bunch of ops weren't supported, etc.
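For anyone curious, this is roughly the shape of that naive attempt (a minimal sketch with a tiny stand-in module in place of the actual LLaMA model): move the model and tensors to the mps device, and set PYTORCH_ENABLE_MPS_FALLBACK=1 so ops the MPS backend doesn't implement fall back to the CPU instead of erroring out.

    import os
    # Must be set before importing torch: lets ops the MPS backend
    # doesn't implement yet fall back to the CPU instead of erroring.
    os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

    import torch

    # Use Apple's Metal (MPS) backend if the build and hardware support it.
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    # A tiny stand-in module; the real case would be the LLaMA weights.
    model = torch.nn.Linear(16, 4).to(device)
    x = torch.randn(1, 16, device=device)
    print(model(x))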