> The ability to run generative AI models like Llama 2 on devices such as smartphones, PCs, VR/AR headsets
Maybe it's an upcoming feature for the Quest 3?
To that end, I've been pretty amazed by how far quantization has come. Some early Llama 2 quantizations[0] have gotten down to ~2.8 GB, though I haven't tested one yet to see how it performs. Still, we're now talking about models that can comfortably run on pretty low-end hardware. It will be interesting to see where Llama crops up with so many options for inference hardware.
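For a rough sense of how the math works out, model size scales with parameter count times bits per weight. A sketch (the ~3.2 effective bits/weight is my assumption to roughly match the ~2.8 GB file size of the aggressive GGML quants; real formats carry extra metadata and mixed precisions):

```python
# Back-of-the-envelope: on-disk size ≈ params * bits_per_weight / 8 bytes.
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate model size in GB, ignoring metadata and overhead."""
    return n_params * bits_per_weight / 8 / 1e9

# Llama-2-7B at fp16:
print(f"fp16:   {quantized_size_gb(7e9, 16):.1f} GB")   # 14.0 GB
# At ~3.2 effective bits per weight (assumed, to match the ~2.8 GB quants):
print(f"~3-bit: {quantized_size_gb(7e9, 3.2):.1f} GB")  # 2.8 GB
```

So going from fp16 to ~3 bits is a ~5x reduction, which is what puts a 7B model within reach of phones and headsets.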
[0] https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/tree/ma...