The Neural Engine is intended solely, or almost solely, for inference, not training. For instance, in a post from last November [1], Apple mentioned that their tensorflow_macos fork can use "the GPU in both M1- and Intel-powered Macs for dramatically faster training performance", but didn't mention the Neural Engine. (Incidentally, I think that is the same fork used for the benchmark here.)
[1] https://machinelearning.apple.com/updates/ml-compute-trainin...