
How Convolutional Neural Networks Work - eternalban
http://brohrer.github.io/how_convolutional_neural_networks_work.html
======
drewm1980
I am supportive of clear explanations of some of the building blocks, but I
worry that repeatedly describing things as grade-school-level math gives the
wrong impression about the actual learning curve for getting up to speed on
working with CNNs. Yes, the building blocks are easy to understand, but
actually understanding why a given network structure or optimization
technique isn't working is a black art. And if you don't have a workstation
with a $2k GPU or two, you're probably not going to have a good time.

~~~
lettergram
I set up an automated script that spins up an AWS g2 instance, trains my
neural net using TensorFlow, copies the model to my personal computer, and
spins the instance down. It costs around $5-$10 to train/test most neural
network models. My most expensive model cost around $100 and required a ton of
time and resources; it took about 4 days.

You really don't need a $2k workstation.

Of course, for personal use I do have a GTX 1080, because I like to game and
play with TensorFlow/Caffe.

~~~
anantzoid
Spot instances are much cheaper. The only downside is that you need to create
an AMI every time before termination. But, also, AWS g2 has an NVIDIA GRID
K520 with 4 GB of memory, so performance isn't great.

~~~
alien_
The AMI creation is only needed if you store data on the machines, which you
shouldn't do anyway, not even with on-demand ones.

You should always try to keep the instances stateless and store any data
outside the instances, such as on S3 or EFS.

------
spiderfarmer
Question: I have a database with 1,000,000 vehicle pictures, organized by make
and model. What would be the easiest way to play with this data, so that I can
train a model to predict the make/model? I don't want to reinvent the wheel
now that so many tutorials have been written and so much software is being
released. What would be the easiest way to start?

~~~
polite_cancer
Oh, if you don't want to get into the nitty-gritty of it, you should use
DIGITS ([https://github.com/NVIDIA/DIGITS](https://github.com/NVIDIA/DIGITS)).

This is the easiest way to set up a CNN and train it with your sample images
(at least compared to Caffe, TensorFlow, and Theano). I say that because it's
all GUI-based! Real convenient.

~~~
mrfusion
Wow that is cool! What if I wanted to make a model to detect cats? Can I just
load a bunch of pictures from the web? Do I need negative examples?

~~~
polite_cancer
Good question. Detecting just one class (in this case, cats) will require
negative examples. Finding good negatives is somewhat of a challenging task
because they should be pretty comprehensive, but if you create an account on
ImageNet ([http://image-net.org/](http://image-net.org/)), then you can
download thousands of images.
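To make the point concrete, here is a toy sketch (purely illustrative: random
vectors stand in for image features, and all numbers are made up) of why
negatives are needed. A classifier learns a boundary _between_ cat and
not-cat, so it needs examples on both sides of that boundary:

```python
import numpy as np

# Toy illustration: "cat vs. not-cat" as binary classification.
# The feature vectors are random stand-ins; in practice they would come
# from real positive AND negative images.
rng = np.random.default_rng(0)
cats = rng.normal(loc=1.0, scale=0.5, size=(100, 8))       # positives
not_cats = rng.normal(loc=-1.0, scale=0.5, size=(100, 8))  # negatives

X = np.vstack([cats, not_cats])
y = np.concatenate([np.ones(100), np.zeros(100)])

# Logistic regression trained by gradient descent: without the negative
# rows there would be no gradient pushing the boundary anywhere useful.
w, b = np.zeros(8), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(cat)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

accuracy = np.mean(((X @ w + b) > 0) == y)
```

Delete the `not_cats` rows and the learned boundary becomes meaningless,
which is exactly the problem with training on positives alone.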

Any questions, feel free to pm me.

------
nilved
Good post, but the author needs to read this article. I find the tone in
some places condescending.

[https://css-tricks.com/words-avoid-educational-writing/](https://css-
tricks.com/words-avoid-educational-writing/)

------
dicroce
This video is a great introduction to convolutions and pooling.

The other best resource, IMHO, is
[http://karpathy.github.io/neuralnets/](http://karpathy.github.io/neuralnets/).

------
j1vms
> CNNs can be used to categorize other types of data too. The trick is,
> whatever data type you start with, to transform it to make it look like an
> image.

This is an interesting point, and I assume that 'make it look like an
image' means the same thing as 'think of it as an image'. Can others here who
work with CNNs regularly or professionally comment on whether the author's
intuition is essentially correct (give or take some details, of course)?
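For what it's worth, one common version of this trick is reshaping 1-D data
into a 2-D array so that nearby entries are related the way nearby pixels
are. A minimal sketch (the frame and hop sizes here are arbitrary choices):

```python
import numpy as np

# Hypothetical example: turn a 1-D signal (say, audio samples) into a
# 2-D "image" by stacking overlapping windows, one window per row.
# A spectrogram is the same idea with an FFT applied to each window.
def frame_signal(signal, frame_len=64, hop=16):
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len]
                     for i in range(n_frames)])

signal = np.sin(np.linspace(0, 100, 1024))  # stand-in for real samples
image = frame_signal(signal)                # shape (61, 64)
# Nearby entries in `image` are related, just as nearby pixels are,
# which is the property a CNN's convolutions exploit.
```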

~~~
arketyp
It comes down to the defining architectural feature of convolutional nets,
that is, weight sharing, and the assumption about the data that it makes. If
by image one means something where you can expect any pattern (at some level
in the hierarchy) to be equally likely to occur anywhere across an input
dimension, then yes, this is true. Personally, I would say that this is too
narrow a definition of an image (too strong an assumption), and, interestingly
enough, perhaps too broad as well. I am not a pro.

[Edit] Too broad in the sense that, intuitively, there is perhaps an implied
assumption of continuity of the input function defining the image. Note that
such assumptions can be made explicit with various so-called statistical
priors incorporated into the network.
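The weight-sharing point can be made concrete: one small kernel is slid over
the whole input, so the same pattern gets the same response wherever it
appears. A minimal sketch in plain NumPy (sizes and kernel are arbitrary):

```python
import numpy as np

# Weight sharing made concrete: ONE small kernel is slid across the whole
# input, so the same pattern produces the same response wherever it occurs.
def conv2d_valid(x, k):
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

kernel = np.array([[1.0, -1.0],
                   [1.0, -1.0]])  # a tiny vertical-edge detector

x = np.zeros((8, 8)); x[2, 2] = 1.0                  # pattern at one spot
x_shifted = np.zeros((8, 8)); x_shifted[5, 4] = 1.0  # same pattern, moved

r = conv2d_valid(x, kernel)
r_shifted = conv2d_valid(x_shifted, kernel)
# r_shifted is just r translated by the same (3, 2) offset: translation
# equivariance, which is what "any pattern anywhere" buys you.
```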

------
partycoder
I think the most intuitive example of a neural network in action is this:
[http://swaption.net](http://swaption.net)

This is not convolutional though.

------
elcct
This is brilliant, so far one of the easiest to understand explanations.

------
jimkittridge
Great explanation. Thanks for sharing.

------
oh_sigh
Really great write-up. I've been trying to wrap my not-too-mathematically-
talented head around convolutional filters, and this really helped in
visualizing what is happening.

------
armandtamzarian
Great post. I like how he didn't go into too much detail on the math of
backprop, etc. As a layperson, I find the conceptual understanding of ML more
interesting.

~~~
banned4life
If you google "Hinton machine learning" on YouTube, you will find Hinton's
lectures. They are non-mathematical; he is a psychologist/math guy, and he
helped invent much of this stuff: backprop, dropout, and more.

You will find his lectures very entertaining and easy to understand. Being a
psychologist whose desire is to make a computer operate like a human brain,
he's more interested in how the brain actually works than in hacking ML code.

Hinton describes backprop, why he invented it, and exactly how it emulates the
way the human brain works.
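For readers who want the mechanism rather than the history: backprop is just
the chain rule applied layer by layer. A minimal hand-rolled sketch (a tiny
network learning XOR; this is an illustration, not code from the lectures):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

for _ in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: chain rule through each layer in turn
    d_out = (out - y) * out * (1 - out)  # error at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)   # propagated back to the hidden layer
    W2 -= h.T @ d_out; b2 -= d_out.sum(axis=0)
    W1 -= X.T @ d_h;   b1 -= d_h.sum(axis=0)

mse = np.mean((out - y) ** 2)
```

The whole algorithm is those two `d_*` lines: each layer's gradient is the
next layer's gradient pushed back through that layer's weights and
nonlinearity.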

Hinton now works at Google; he is considered the modern-day 'godfather' of
deep learning/ML.

~~~
eli_gottlieb
> You will find his lectures to be very entertaining and easy to understand,
being a psychologist whose desire is to make a computer operate like a human
brain, he's more interested in how the brain actually works, than hacking ML
code.

> Hinton describes backprop, why he invented it, and exactly how it emulates
the way the human brain works.

Of course, basically no actual neuroscientists or cognitive scientists think
the brain actually works via supervised backpropagation. So he actually has a
bit of a holy war going on with the people who properly work on human learning
rather than machine learning.

------
MrFeynmannsJoke
So it works just like I thought it would. Why are CNNs so hyped? Wasn't all
this already known decades ago? Or is it just that we can now afford the
computing power?

~~~
dougabug
The basic CNN structure was in place, but as the saying goes, "the Devil's in
the details." Early CNNs were applied to problems such as handwritten
character recognition, with rows of small grayscale image cells as inputs, and
were much shallower, smaller models. Today's CNNs operate on full-resolution,
multi-channel images and video, and can be orders of magnitude deeper and
larger. For instance, ResNets have been trained out to over 1200 layers on
benchmark datasets. This would have been unthinkable even a couple of years
ago. By way of comparison, even the state-of-the-art VGG network architecture
of a couple of years ago originally had to be trained in stages to reach 16
and 19 layers for submission to ILSVRC 2014 (Xavier/MSRA initialization makes
this unnecessary now). At the time, VGG and GoogLeNet (22 layers) were
considered extraordinarily deep CNNs.

------
nstj
Did someone say burritos?

