Support for high level backends almost always comes down to C++ either way.
That goes for NPUs (which are really fragmented anyway, i.e. there is no uniform API), and also for JAX's TPU backend (which IIRC uses XLA).
C++ stings less as an API used in the low level machinery under the hood, as long as you as an application author don't have to write code in it.
I haven't done an in-depth look, but most matrix math accelerators (e.g. AMD, Intel and Apple) seem to provide C/C++/Python APIs for describing the computations; the code executing on the NPU is not compiled from user C++ code.
"Will be interesting to see if other high-level accelerator-supporting languages like Chapel or Futhark or JAX end up getting NPU backends, it might give them a nice boost over the proprietary C++ inspired language."
As you say, the GPU (or NPU, TPU, ...) doesn't run C++ or anything derived from it.
The "runtime" (~backend) will usually emit some kind of hardware-dependent format (or, again, an IR) like SPIR-V, PTX, etc.
But the backend itself is usually written in C++ (for performance reasons), and there is really no way to get around that.
Interacting with that from Python (or JAX) is a usability win, but there is zero difference in functionality. I.e. there is no proprietary C++-inspired language in play here, hence no way to get a boost.
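To make that concrete, here's a minimal sketch (assuming JAX is installed) of what the Python-side "usability win" looks like: a plain Python function is traced into JAX's own IR (a jaxpr), which the backend then lowers toward XLA and hardware code. At no point does the user write C++.

```python
import jax
import jax.numpy as jnp

def f(x):
    # An ordinary numeric function written in Python.
    return jnp.sin(x) * 2.0

# Trace the function into JAX's intermediate representation (a jaxpr).
# The printed jaxpr shows primitives like sin and mul -- no C++ in sight.
print(jax.make_jaxpr(f)(1.0))

# The same function lowered further toward the compiler backend;
# as_text() shows the IR handed to XLA, which finally emits device code.
print(jax.jit(f).lower(1.0).as_text())
```

The C++ lives entirely inside the XLA backend that consumes this IR; the application author only ever sees the Python surface.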
Right. I was focusing more on the CUDA-kernels-on-NPU line of thought from patrikthebold's message and its alternatives, which, as you say, is not a reality now either.
In the JAX-style implementation scenario, the compiler part of JAX is the better inspiration, maybe along the lines of this case study of a path tracer running on a TPU: https://blog.evjang.com/2019/11/jaxpt.html - I don't think Chapel or Futhark would adopt the same approach as such, but it's at least some kind of existence proof of a compiler targeting the hardware from a high-level language, for non-machine-learning code.