
Understanding Convolution in Deep Learning - p1esk
https://timdettmers.wordpress.com/2015/03/26/convolution-deep-learning/
======
therobot24
It's nice to see basic signal processing [finally] entering machine learning.
As an EE in the machine learning field, it hurts to read "sliding window" when
you can do the same thing with a few FFTs.

The author mentions correlation, but doesn't properly give the intuition of
what's happening:

> When we perform convolution of an image of a person with an upside-down image of
> a face, then the result will be an image with one or multiple bright pixels
> at the location where the face was matched with the person.

Well yeah, but the clearest way to explain it is that when you flip the
template and convolve, the summation is largest where the two align. Simple as
that.
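
A minimal numpy sketch of that point, on toy signals of my own choosing: flipping the template and convolving is the same as correlating, and the output is largest where the template lines up with the signal. The same numbers fall out of a few FFTs, since convolution in time is multiplication in frequency.

```python
import numpy as np

signal = np.array([0.0, 0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 0.0])
template = np.array([1.0, 3.0, 2.0])   # the pattern hidden at index 2

# Correlation = convolution with the flipped template
corr = np.convolve(signal, template[::-1], mode='valid')
best = np.argmax(corr)                 # -> 2: the sum peaks where the two align

# Same numbers via FFTs: zero-pad to the full convolution length,
# multiply spectra, transform back
n = len(signal) + len(template) - 1
full = np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(template[::-1], n), n)
# the 'valid' part of the full convolution matches corr exactly
```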

I'm sure we'll see the Circular Harmonic Transform for rotation-invariant (up
to a point) Deep Nets, and maybe even the Mellin Transform for scale-invariant
(again, up to a point) Deep Nets. Note I haven't done the math per se
showing that these will work, but I can't think of a reason they wouldn't.

~~~
fchollet
> I'm sure we'll see the Circular Harmonic Transform for rotation invariant
> (up to a point) Deep Nets

A fairly easy way to introduce rotation invariance in DCNNs is to perform
random rotations on the inputs during training. Likewise for scale invariance.
Translation invariance is already introduced by the convolution operation
itself.
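
A numpy-only sketch of that augmentation idea: rotate each input by a random angle before it reaches the network, so the learned filters see many orientations. The angle set here (multiples of 90°, which numpy can do exactly without interpolation) and the batch shape are arbitrary choices of mine, not anything from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_batch(images):
    """Rotate each 2-D image in the batch by a random multiple of 90 degrees."""
    return np.stack([np.rot90(img, k=rng.integers(0, 4)) for img in images])

batch = rng.random((8, 32, 32))    # 8 toy "images"
augmented = augment_batch(batch)   # same shape, random orientations
```

For arbitrary angles (or random zooms for scale invariance) you would need an interpolating rotation such as `scipy.ndimage.rotate`, typically followed by a crop back to the original size.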

The thing about deep learning, is that transform kernels are _learned_ , not
pre-computed as in classical signal processing. A DCNN will learn whatever
convolution kernels it needs to perform the task at hand. I wouldn't be
surprised if a DCNN trained on a classical signal processing task ended up
rediscovering some well-known transform kernels originally derived from
physical first principles...

~~~
Nvn
> A fairly easy way to introduce rotation invariance in DCNNS is to perform
> random rotations on the inputs during training. Likewise for scale
> invariance.

It is a bit silly to call these invariances: different filter/kernel
combinations will be activated when a rotated or scaled input is encountered,
so the individual filters are not rotation- or scale-invariant. The entire
network can only deal with rotations and scales it encountered during
training, whilst having to learn 'redundant features' to a certain extent.

It will get the job done for many tasks, but it's a brute force sort of
approach that will complicate the learning process (i.e. more scales and
rotations require more filters, thus needing a more complex network that is
harder to train).

I think there's definitely a lot that can be learnt from (classical) signal
processing in order to come up with a much more elegant and efficient
solution.

------
flipp3r
I've been working on a project for matching template images for a couple of
months. I'm using a self-made Java library to match static images with pixels
on a user's screen. Mainly my project is about finding things that look like
static images, on a user's screen, fast (i.e. <10ms).

The post here is a really good resource. Are there any Java libraries
(excluding OpenCV bindings) for this kind of template matching?

~~~
therobot24
Check out correlation filters - there's a ton out there - OTSDF, MMCF, MOSSE,
ZACFs, etc. They're basically designed to do template matching, but in such a
way that the input statistics are considered to refine the output for better
matching (fewer errors, improved separation between classes, etc.). I don't
know of any Java libraries, but here is a MATLAB library of different types
([https://github.com/vboddeti/CorrelationFilters](https://github.com/vboddeti/CorrelationFilters))
and here is a very basic implementation of OTSDF in C++ via the Eigen
library
([https://github.com/jsmereka/PatchBasedCorrelation/tree/maste...](https://github.com/jsmereka/PatchBasedCorrelation/tree/master/Linux/ApplyFilters)).
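
Not one of those libraries, but a minimal numpy sketch of the idea they all build on: correlate a template with an image in the frequency domain and look for the peak. MOSSE/OTSDF-style filters additionally shape the filter from the input statistics; this is plain matched filtering on toy data I made up.

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.random((64, 64))
template = image[20:28, 30:38].copy()   # the patch we want to find

# Zero-mean both so flat regions don't dominate the correlation
t = template - template.mean()
im = image - image.mean()

# Cross-correlation via FFT: multiply the image spectrum by the
# conjugate of the (zero-padded) template spectrum
F_im = np.fft.rfft2(im)
F_t = np.fft.rfft2(t, s=im.shape)
corr = np.fft.irfft2(F_im * np.conj(F_t), s=im.shape)

peak = np.unravel_index(np.argmax(corr), corr.shape)
# peak lands at (20, 30), the top-left corner of the embedded patch
```

Doing this with `java.util` alone would be painful; an FFT library (e.g. JTransforms) would be the Java equivalent of numpy's `fft` module here.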

------
anupshinde
I am an "average" ML researcher and enthusiast, and have worked on quite a few
small and large projects myself. Every time I see the kind of engineering
complexity involved in something like deep learning, I wonder how evolution
could ever lead to general intelligence like human intelligence, or whether
learning is a logical process at all.

