
The unified memory ought to be great for running LLaMA on the GPU on these MacBooks (since it can't run on the Neural Engine currently).

The point of llama.cpp is that most people don't have a GPU with enough RAM; Apple's unified memory ought to solve that.
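As a rough back-of-the-envelope sketch (the parameter counts and bytes-per-weight here are my assumptions, not measured numbers): the fp16 weights alone for the bigger models blow past a typical consumer GPU's 8-24 GB of VRAM, while quantization plus a large unified memory pool makes them fit:

    # Rough weight-only memory estimate for LLaMA-sized models (assumed sizes).
    # Ignores activations, KV cache, and quantization block overhead,
    # so treat the numbers as lower bounds.
    PARAMS_BILLIONS = {"7B": 7, "13B": 13, "30B": 30, "65B": 65}
    BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

    for name, billions in PARAMS_BILLIONS.items():
        sizes = ", ".join(
            f"{dtype} ~{billions * 1e9 * nbytes / 2**30:.0f} GiB"
            for dtype, nbytes in BYTES_PER_WEIGHT.items()
        )
        print(f"{name}: {sizes}")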

Some people apparently have it working:

https://github.com/remixer-dec/llama-mps




Thank you, that's exactly what I was looking for: specific info on perf.


I think GPU inference performance is currently limited by the immaturity of PyTorch's MPS (Metal) backend.

Before I found the repo above, I made a naive attempt to get LLaMA running with MPS, and it didn't "just work": a bunch of ops weren't supported, etc. (a minimal sketch of the device check and CPU fallback is below).
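For reference, a minimal sketch of the device selection, assuming a plain PyTorch setup (the MPS availability check and the CPU-fallback env var are standard PyTorch; load_llama_model is a hypothetical placeholder for however your fork loads the weights, and which ops actually run on Metal depends on your torch version):

    import os

    # Must be set before torch initializes: lets MPS fall back to CPU for
    # ops the Metal backend doesn't implement yet (slower, but no hard errors).
    os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

    import torch

    if torch.backends.mps.is_available():
        device = torch.device("mps")
    elif torch.cuda.is_available():
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")
    print(f"Using device: {device}")

    # Hypothetical loading step: swap in your fork's loader here.
    # model = load_llama_model("path/to/weights")  # placeholder, not a real API
    # model = model.half().to(device)
    # with torch.no_grad():
    #     logits = model(tokens.to(device))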



