He is not wrong, convolutions between an image and a small kernel can be done fa...

He is not wrong, convolutions between an image and a small kernel can be done faster by direct multiplication than by padding the kernel and performing FFT + iFFT. This is what tensor cores are aiming to do really fast. However, doing a convolution betwen an image and a kernel with the similar size is the general use case for the convolution theorem and is the thing that is currently implemented in VkFFT.