
Neural Network Architectures - billconan
http://culurciello.github.io/tech/2016/06/04/nets.html
======
rustyfe
Could anyone recommend a starting point on Neural Networks for the
uninitiated? The parts of this I understood were fascinating, but I quickly
realized I was looking up every third word, and not really absorbing much.

If I could only read one thing to gain the technical grounding for this
history, what should it be?

~~~
trophygeek
This was the one that flipped the lightbulb for me.

`Hacker's guide to Neural Networks`
[http://karpathy.github.io/neuralnets/](http://karpathy.github.io/neuralnets/)

~~~
trophygeek
Actually, read that one 2nd. Start here:
[https://ujjwalkarn.me/2016/08/11/intuitive-explanation-
convn...](https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/)

~~~
rustyfe
Hey, the premise was I could only read one thing! I kid, much obliged.

~~~
trophygeek
lol. point

Read the 2nd one then.

------
gok
A more accurate title might be "Convolutional Neural Network Architectures" or
"Neural Network Architectures for Computer Vision" but still a nice overview!

~~~
AgentME
One thing I'm confused about is that everyone seems to treat "Convolutional
Neural Networks" as synonymous with or as being the thing that enabled "Deep
Learning", but convolutional neural networks are only for image processing,
right? Are many-layer ("deep") networks useless outside of image processing?
Were there other break-through techniques besides convolutional nets that are
necessary for deep networks to work well?

~~~
gok
Convolutional neural networks and many-layered networks are useful for things
outside of image processing. CNNs are used for acoustic modeling in speech
recognition, and character-convolutional layers are used in language modeling.
And pretty much all neural networks in use today are many-layered.

As mentioned in the article, using convolutional layers in ANNs was an idea
from the 1980s, but networks that could be trained on the hardware available
at the time were never all that competitive until recently. Once we figured
out how to train big/deep networks (use GPUs, have lots of data, maybe use
pre-training), CNNs started to perform really well. This did make a positive
feedback loop: as CNNs started to work better, deeper networks in general
started to get more attention, which got more people into CNNs, etc.

~~~
AgentME
Are there many-layered deep networks that aren't convolutional neural nets, or
are CNNs practically necessary to make deep networks work? Are there specific
extra techniques not necessary for CNNs that are necessary to make deep non-
convolutional networks work well?

~~~
nl
In natural language processing tasks you see a lot of non-CNN architectures.
These usually are designed to be able to deal with sequential data, so some
kind of "memory" is needed.

Sometimes you see this combined with a CNN. There have been a few question-
answering systems that have one or more CNN layers. I don't entirely
understand these designs, but presumably the convolutional layers are an
attempt to capture the different orderings of words.

There are lots of techniques that people use to try to make deep networks work
well. Mostly these are about making error backpropagation work better. One of
the most
successful recent innovations is the ResNet architectures
([https://arxiv.org/abs/1512.03385](https://arxiv.org/abs/1512.03385)), and
the related highway networks.
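
As a rough illustration (my own sketch in PyTorch, not something from the
article): a ResNet-style residual block computes F(x) + x, so the identity
shortcut gives errors a direct path back through many layers during
backpropagation.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Minimal residual block: output = ReLU(F(x) + x)."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU()

        def forward(self, x):
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return self.relu(out + x)  # identity shortcut

    # Output shape equals input shape, so blocks can be stacked very deep.
    x = torch.randn(1, 64, 32, 32)
    y = ResidualBlock(64)(x)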

------
Question1101
Isn't data and processing power the most important thing with neural networks?
Even if I knew how they worked, I would have no idea what to do with them that
hasn't been done already, as a hobbyist without access to huge amounts of data
like companies have.

~~~
p1esk
No. As a researcher, you can make it your goal to find/invent the smallest
possible architecture for a given task (in terms of number of parameters, or
number of operations). Alternatively, you can try to invent an architecture
that learns faster from data (or requires less data to achieve state-of-the-art
results).

~~~
tree_of_item
> As a researcher

But the post you replied to specifically said "as a hobbyist", so it doesn't
really sound like there's much hope.

~~~
Joof
Hobbyist researcher? Seems like a more accessible plan anyway; fuck bothering
with huge datasets and long training times and just focus on optimizing small
architectures.

------
taliesinb
> The NiN architecture used spatial MLP layers after each convolution, in
> order to better combine features before another layer. Again one can think
> the 1x1 convolutions are against the original principles of LeNet

Why would it be against the original principles of LeNet?

~~~
troyastorino
As far as I understood from the description, in LeNet the convolutional layer
lets you avoid training parameters that will effectively be doing the same
thing as a convolution. Adjacent pixels are highly correlated, so convolutions
can capture most of the information in groups of adjacent pixels without
having to train a fully-connected layer of neurons. Effectively, you're kinda
downsampling the image without losing information.

So, if you're using 1x1 convolutions, I think you're basically having a neuron
per pixel, so you're forcing your fully-connected layers to learn the spatial
correlations of pixels, instead of capturing that information in a
convolutional layer. In other words, you're wasting training on capturing
spatial correlations of adjacent pixels instead of other correlations.

~~~
taliesinb
> So, if you're using 1x1 convolutions, I think you're basically having a
> neuron per pixel, so you're forcing your fully-connected layers to learn the
> spatial correlations of pixels, instead of capturing that information in a
> convolutional layer.

Saying "a neuron per pixel" doesn't mean anything, really, that way of
thinking isn't helpful unless you're looking at small multi-layer perceptrons.
The right way to think about things is that you have tensors and layers that
compute new tensors from old tensors.

A 1x1 convolution only 'sees' the feature channels of a pixel, and does the
same thing to each pixel. So a 1x1 convolution on a grayscale input (e.g. a
1x28x28 tensor in the case of MNIST) does nothing, basically, other than scale
and bias every pixel by the same linear function. It doesn't "force the
network to learn" anything, it's just totally pointless.

One of the uses of 1x1 convolutions is to collapse the feature dimension when
you're deeper in the network (e.g. 100 channels to 10 channels) to reduce
the number of parameters subsequent layers need to operate on. It's a "channelwise
fully connected layer".
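
A quick sketch of what I mean (PyTorch is just for illustration here, nothing
from the article): a 1x1 convolution applies the same linear map at every
spatial position, which is exactly a channelwise fully connected layer.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 100, 28, 28)             # batch x channels x height x width
    conv1x1 = nn.Conv2d(100, 10, kernel_size=1)
    print(conv1x1(x).shape)                     # torch.Size([1, 10, 28, 28])

    # Same map, written as a Linear layer applied to each pixel's 100-dim
    # feature vector (the "channelwise fully connected" view).
    linear = nn.Linear(100, 10)
    linear.weight.data = conv1x1.weight.data.view(10, 100)
    linear.bias.data = conv1x1.bias.data
    y = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
    print(torch.allclose(conv1x1(x), y, atol=1e-6))   # True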

I think what you're thinking of (and perhaps what the author was thinking of)
is the practice, prior to convnets, of collapsing the image into a vector and then
doing a fully connected layer on it. That indeed doesn't exploit translation
invariance of natural images, requires the net to learn the same features in
every required spatial position at great expense, and so on. But that has
nothing to do with 1x1 convolutions.

~~~
troyastorino
Ah yes, you're right, I was thinking of it that way. Thanks a bunch for your
clear and thorough explanation, it makes a lot of sense! So if I understand
what you're saying, a 1x1 convolutional layer for collapsing 100 channels to
10 channels would take a 100x512x512 tensor and collapse it to a 10x512x512
tensor?

[Also, sorry for attempting to answer your question incorrectly. I was
thinking of putting a disclaimer saying I hadn't worked with CNNs and so might
be misunderstanding what the convolutions are doing; probably should have
haha]

Maybe when the author was saying 'one _can_ think the 1x1 convolutions are
against the original principles of LeNet', he was anticipating my kind of
confusion? :)

~~~
lightcatcher
> So if I understand what you're saying, a 1x1 convolutional layer for
> collapsing 100 channels to 10 channels would take a 100x512x512 tensor and
> collapse it to a 10x512x512 tensor?

Correct. As I understand it, this would be applying a 1x1 convolution with 10
filters to a 100x512x512 tensor.
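
A quick shape check of that example (sketch only, assuming PyTorch and a batch
size of 1):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 100, 512, 512)           # 100 channels over a 512x512 grid
    y = nn.Conv2d(100, 10, kernel_size=1)(x)    # 10 filters of size 1x1
    print(y.shape)                              # torch.Size([1, 10, 512, 512])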

------
partycoder
Not directly related, but never a time waste:
[http://www.scholarpedia.org/article/Encyclopedia:Computation...](http://www.scholarpedia.org/article/Encyclopedia:Computational_neuroscience)

------
phodo
Check out: [http://lazyprogrammer.me](http://lazyprogrammer.me) [his courses
and books are accessible, good, and relatively cheap]

~~~
phodo
Oops that was meant as a reply to "rustyfe" on a starting point resource! Can
i edit the response?

