I bet these can all run on ANE. I’ve run gpt2-xl 1.5B on ANE [1] and WhisperKit [2] also runs larger models on it.
The smaller ones (1.1B and below) will be usably fast and with quantization I suspect the 3B one will be as well. GPU will still be faster but power for speed is the trade-off currently.
the ANE seemed to be optimised for small vision models like you might run on an iPhone a couple of years ago
these will be running on the GPU