Funny you say that, because nobody serious about AI is actually using Nvidia unless they're already locked into CUDA.
The highest-performing inference engines all use Vulkan, and are faster per dollar and per watt on the CDNA3 cards or (surprisingly) the RDNA3 cards, not Lovelace.
Meta has an in-house accelerator that the Triton inference engine supports (which they use almost exclusively for their fake content/fake profiles project). Triton is legacy software and, afaik, does not have a Vulkan backend, so Meta may be locked out of better options until it does.
That doesn't stop Meta's Llama family of models from running on anything and everything _outside_ of Meta, though. llama.cpp, for example, runs on practically everything, but Meta itself doesn't use it.
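To make that concrete, here's a minimal sketch of running a Llama model through llama.cpp's Vulkan backend. The cmake flag and the llama-cli binary name are taken from llama.cpp's upstream build docs and have changed across versions, and model.gguf is a placeholder path, so treat this as illustrative rather than exact:

```sh
# Build llama.cpp with its Vulkan backend enabled (flag name per upstream docs).
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Run inference, offloading up to 99 layers to whatever Vulkan-capable
# GPU is present (AMD, Nvidia, or Intel alike).
# "model.gguf" is a placeholder for a quantized Llama checkpoint.
./build/bin/llama-cli -m model.gguf -ngl 99 -p "Hello"
```

The point being: the same binary and the same model file run on basically any GPU with a Vulkan driver, no CUDA required.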