
Understanding Convolution in Deep Learning (2015) - teptoria
https://timdettmers.com/2015/03/26/convolution-deep-learning/
======
nabla9
Everyone must first get over the terminology confusion. Convolution in DL is
actually cross-correlation, not convolution. In practice it does not matter,
the kernel is just flipped, but it can be very confusing when you try to learn
and go through examples.
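
A minimal NumPy sketch of that flip (toy values of my own):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])   # input signal
    k = np.array([1.0, 0.0, -1.0])       # kernel

    # True convolution flips the kernel before sliding it over the signal;
    # cross-correlation (what DL frameworks call "convolution") does not.
    conv = np.convolve(x, k, mode="valid")      # [ 2.,  2.]
    xcorr = np.correlate(x, k, mode="valid")    # [-2., -2.]

    # Flip the kernel by hand and the two operations agree.
    assert np.allclose(xcorr, np.convolve(x, k[::-1], mode="valid"))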

~~~
r_c_a_d
The terminology comes from signal processing, where a convolution in the
frequency domain is equivalent to a multiplication in the time domain. I don't
think anyone is thinking about the frequency domain in deep-learning, but they
still call the operators convolution kernels.

~~~
qppo
I mean ultimately it comes from functional analysis and differential equations
(not signal processing).

It's a binary operator on functions that yields a third function. It has a lot
of useful properties and equivalences, like that its Fourier transform is the
product of the two functions' Fourier transforms (although that's very
roundabout).

You're actually introduced to convolution in middle school when you're taught
how to multiply polynomials term by term (at my middle school they called it
"FOIL").
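
A sketch of that connection, assuming NumPy: multiplying two polynomials is
exactly a discrete convolution of their coefficient vectors.

    import numpy as np

    # (1 + 2x) * (3 + 4x) = 3 + 10x + 8x^2
    a = np.array([1, 2])   # coefficients of 1 + 2x, lowest degree first
    b = np.array([3, 4])   # coefficients of 3 + 4x

    print(np.convolve(a, b))   # [ 3 10  8]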

~~~
elcritch
It appears to be a discrete Fourier transform, no? Does it apply to all
convolutions or just a specific instance or subset? As in, is there a proof
showing that as the sample size N goes to a limit it approaches the continuous
case? I still natively think in continuous convolutions from physics. The whole
discretization of these operators is oddly harder for me despite it
technically being simpler to compute.

~~~
qppo
No, it's true of both the continuous-time and discrete-time Fourier transforms.
Convolution in time is multiplication in frequency, and vice versa. You don't
need to prove this with limits directly, just use the definitions of the
convolution integral and the Fourier transform integral.

> technically being simpler to compute.

They're equivalent, since the only meaningful way to "compute" a continuous
convolution is symbolically, and discrete convolutions obey most of the same
identities.
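
A quick numerical check of the discrete identity (my own toy signals,
zero-padded so the circular convolution matches the linear one):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(64)
    h = rng.standard_normal(16)

    n = len(x) + len(h) - 1                 # length of the linear convolution
    direct = np.convolve(x, h)              # convolution in time
    via_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

    assert np.allclose(direct, via_fft)     # multiplication in frequency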

If one can place a lower bound on the time step resolution of a simulation,
then continuous convolutions can be evaluated using discrete convolutions,
which can represent the continuous case exactly via the Nyquist-Shannon
sampling theorem.

Interestingly enough, to prove the sampling theorem you need to rely on the
identity that multiplication in frequency is convolution in time, and the same
identity shows that ideal reconstruction can't be realized in a physical
system: it breaks causality, because you multiply by a rectangular window in
frequency (a superposition of Heavisides), which in time is a sinc function
stretching infinitely far in both directions.
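
A rough sketch of that ideal (non-causal) reconstruction, the
Whittaker-Shannon interpolation formula, using a toy band-limited tone of my
own:

    import numpy as np

    T = 1.0                                  # sample period (fs = 1, Nyquist = 0.5)
    n = np.arange(-200, 201)                 # sample indices
    f0 = 0.1                                 # test tone well below Nyquist
    samples = np.sin(2 * np.pi * f0 * n * T)

    # Ideal reconstruction: x(t) = sum_n x[n] * sinc((t - n*T) / T).
    # Every sample's sinc is nonzero for all t, past and future -> non-causal.
    t = np.linspace(-5, 5, 101)
    x_hat = np.sum(samples[:, None] * np.sinc((t[None, :] - n[:, None] * T) / T), axis=0)

    print(np.max(np.abs(x_hat - np.sin(2 * np.pi * f0 * t))))   # tiny; only truncation error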

And more interesting is that signals and systems is mostly applied dynamics
and statistics, so it shouldn't be surprising if there's overlap.

------
lacker
IMO calling it "convolution" in deep learning is extra confusing, because the
word "convolution" means many fairly different things in other contexts.

The idea behind convolution in deep learning is that, if a particular pattern
of pixels is meaningful, then it is probably also meaningful if you shift the
whole thing in some direction. So you can force some layers of the network to
behave the same under translation (by sharing weights across positions), and
it'll be faster at picking up some sorts of patterns.
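
A toy NumPy sketch of that weight sharing (a plain cross-correlation, which is
what DL frameworks call convolution); the kernel and image here are made up:

    import numpy as np

    def conv2d(image, kernel):
        """'Valid' cross-correlation: one shared kernel slid over every position."""
        kh, kw = kernel.shape
        h, w = image.shape
        out = np.zeros((h - kh + 1, w - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    rng = np.random.default_rng(0)
    kernel = rng.standard_normal((3, 3))    # the shared weights
    image = np.zeros((8, 8))
    image[2, 2] = 1.0                       # a lone "pattern" (one bright pixel)
    shifted = np.roll(image, 1, axis=1)     # same pattern, one pixel to the right

    # Shifting the input shifts the response the same way (away from the borders).
    assert np.allclose(np.roll(conv2d(image, kernel), 1, axis=1)[:, 1:-1],
                       conv2d(shifted, kernel)[:, 1:-1])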

~~~
Der_Einzige
You didn't explain why it's faster though.

It's faster because it reduces the dimensionality of the inputs down to
something manageable (hundreds or low thousands). You can replace convolutions
with most other types of dimensionality reduction (including other types of
layers) and, outside of image tasks, you'll get very similar or even better
performance.
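
A back-of-the-envelope comparison of my own, just to put rough numbers on the
savings from weight sharing:

    # Mapping a 32x32x3 input to a 32x32x64 output:
    dense_params = (32 * 32 * 3) * (32 * 32 * 64)   # fully connected: ~201 million weights
    conv_params = (3 * 3 * 3) * 64                   # one shared 3x3 kernel per channel: 1,728
    print(dense_params, conv_params)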

~~~
elcritch
I wonder if that'd work by doing a 2D Fourier transform on an image
beforehand, though you're not reducing dimensionality that way.

------
timkofu
Thanks for sharing this.

------
ellisv
Published in 2015

