Raspberry Pi provides documentation for its GPU architecture, so it would be possible to add support for it to open source machine learning frameworks. That would involve quite a bit of work, though, and the Pi isn't really competitive with modern hardware in performance-per-watt terms, even when using GPU compute.
I believe Idein did that. At least they regularly post impressively fast (for the Pi) examples to /r/raspberry_pi, like https://redd.it/a5o6ou. It seems the results aren't available standalone or as open source, though, only in the form of a service (https://actcast.io/).
There are some well-optimised libraries, for example a port of darknet that uses NNPACK and some other NEON goodies. You can get about 1 fps with Tiny YOLO. Not sure if it uses anything on the GPU, though.
Yes, I know. My point was that CPU-only deep learning is possible on the Pi if you don't need real-time inference. What I wasn't sure of was whether that specific port does anything on the GPU at all, or if it's only using NEON intrinsics.