> The ability to run generative AI models like Llama 2 on devices such as smartphones, PCs, VR/AR headsets
Maybe it's an upcoming feature for the Quest 3?
To that end, I've been pretty amazed by how far quantization has come. Some early Llama 2 quantizations[0] have gotten down to ~2.8 GB, though I haven't tested one yet to see how it performs. Still, we're now talking about models that can comfortably run on pretty low-end hardware. It will be interesting to see where Llama crops up with so many options for inference hardware.
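For a rough sense of how the math works out, model size scales with parameter count times bits per weight. A sketch (the ~3.2 effective bits/weight is my assumption to roughly match the ~2.8 GB file size of the aggressive GGML quants; real formats carry extra metadata and mixed precisions):

```python
# Back-of-the-envelope: on-disk size ≈ params * bits_per_weight / 8 bytes.
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate model size in GB, ignoring metadata and overhead."""
    return n_params * bits_per_weight / 8 / 1e9

# Llama-2-7B at fp16:
print(f"fp16:   {quantized_size_gb(7e9, 16):.1f} GB")   # 14.0 GB
# At ~3.2 effective bits per weight (assumed, to match the ~2.8 GB quants):
print(f"~3-bit: {quantized_size_gb(7e9, 3.2):.1f} GB")  # 2.8 GB
```

So going from fp16 to ~3 bits is a ~5x reduction, which is what puts a 7B model within reach of phones and headsets.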
[0] https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/tree/ma...