Locally it's actually quite easy to set up. I've made an app https://recurse.chat/ which supports LLaVA 1.6. It takes a zero-config approach, so you can just start chatting and the app downloads the model for you.
Nope, I am self-hosting. Support is pretty good actually: llama.cpp supports it (v1.6 too, and through its OpenAI-compatible API server as well), ollama supports it, and Open WebUI chat does too.
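For the OpenAI-compatible route, something like this works against a local llama.cpp server. A rough sketch only: the server flags, port, model/mmproj filenames, and image path below are assumptions, so adapt them to your setup.

```python
# Minimal sketch: query a locally served LLaVA 1.6 through llama.cpp's
# OpenAI-compatible endpoint. Assumes the server was started roughly like:
#   llama-server -m llava-v1.6-mistral-7b.Q4_K_M.gguf --mmproj mmproj-model-f16.gguf --port 8080
# (filenames and port are placeholders)
import base64
import requests

def image_data_url(path: str) -> str:
    """Read an image file and return it as a base64 data URL."""
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

payload = {
    # the model name is mostly informational for a single-model llama.cpp server
    "model": "llava-1.6",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": image_data_url("photo.jpg")}},
            ],
        }
    ],
    "max_tokens": 256,
}

resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

The same message format should work with any OpenAI-compatible front end pointed at the local server, which is what makes this setup convenient for swapping clients.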
Using it now on my desktop (I am in China, so no OpenAI here) and in a cloud cluster for a project.