You may be right, but I think in practice people use direct convolution for small kernels and FFT convolution for large kernels (blocked FFT convolution, e.g. overlap-add, if the signal is substantially longer than the kernel). I haven't looked into the math behind this intuition, though.
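For concreteness, here is a minimal sketch of the three options, assuming NumPy. The function names and the default block size of `max(len(h), 64)` are hypothetical choices, not taken from any particular library. Roughly: direct convolution costs about `len(x) * len(h)` multiply-adds. A single FFT over the whole signal costs O(N log N) in the full output length N. Overlap-add ties the FFT size to the kernel length, so the per-sample cost grows with the log of the kernel length rather than the signal length. The actual crossover points depend on hardware and implementation and are not measured here.

```python
import numpy as np

def direct_convolve(x, h):
    """Full linear convolution straight from the definition: ~len(x) * len(h) multiply-adds."""
    n = len(x) + len(h) - 1
    y = np.zeros(n)
    for k, hk in enumerate(h):
        # Each kernel tap adds a scaled, shifted copy of the signal.
        y[k:k + len(x)] += hk * x
    return y

def fft_convolve(x, h):
    """Full linear convolution via one zero-padded FFT: O(N log N), N = len(x) + len(h) - 1."""
    n = len(x) + len(h) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n, so no circular wrap-around
    return np.fft.irfft(np.fft.rfft(x, nfft) * np.fft.rfft(h, nfft), nfft)[:n]

def overlap_add_convolve(x, h, block_size=None):
    """Blocked FFT convolution (overlap-add): the FFT size depends on the kernel, not the signal."""
    m = len(h)
    if block_size is None:
        block_size = max(m, 64)  # hypothetical heuristic, not a tuned value
    nfft = 1 << (block_size + m - 2).bit_length()  # power of two >= block_size + m - 1
    H = np.fft.rfft(h, nfft)  # kernel spectrum computed once and reused for every block
    y = np.zeros(len(x) + m - 1)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        seg = np.fft.irfft(np.fft.rfft(block, nfft) * H, nfft)
        seg_len = len(block) + m - 1
        # Each block's output overlaps the next one by m - 1 samples; summing them
        # reconstructs the full linear convolution.
        y[start:start + seg_len] += seg[:seg_len]
    return y
```

All three should agree with `np.convolve(x, h)` up to floating-point error. SciPy already packages this choice: `scipy.signal.fftconvolve`, `scipy.signal.oaconvolve`, and `scipy.signal.choose_conv_method`, the last of which picks between direct and FFT convolution for given inputs.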