
Reading through some (admittedly very early) MLX docs, it seems that convolutions (used heavily in GANs and particularly Stable Diffusion) aren't seeing meaningful speedups on MLX at all, and in some cases are slower than on the CPU.

Not sure if this is a hardware limitation or just unoptimized MLX kernels, but I find it hard to believe they would have simply ignored such a prominent use case. More likely, convolutions use high precision and much larger tile sets, requiring expensive context switching whenever the entire transform can't fit on the GPU.


