
Deep learning - joeyespo
http://neuralnetworksanddeeplearning.com/chap6.html
======
idunning
I was really impressed that the author included this caveat:

> A word on procedure: In this section, we've smoothly moved from single
> hidden-layer shallow networks to many-layer convolutional networks. It's all
> seemed so easy! We make a change and, for the most part, we get an
> improvement. If you start experimenting, I can guarantee things won't always
> be so smooth. The reason is that I've presented a cleaned-up narrative,
> omitting many experiments - including many failed experiments. This cleaned-
> up narrative will hopefully help you get clear on the basic ideas. But it
> also runs the risk of conveying an incomplete impression. Getting a good,
> working network can involve a lot of trial and error, and occasional
> frustration. In practice, you should expect to engage in quite a bit of
> experimentation.

There is a lot of "magical thinking" amongst people not actively doing
research in the area (and maybe a bit within that community too), and I think
it at least partly stems from mainly seeing very successful nets, and never
seeing the many failed ideas before those network structures and
hyperparameters were hit upon - a sampling bias type thing, where you only
read about the things that work.

~~~
jacek
Yes, the difficulty of finding the right hyperparameters is often overlooked,
and it is a very frustrating part of building a model. Methods like grid
search just don't work, because of the number of parameters to tune and the
time it takes to train a network.

~~~
DavidSJ
Actually, random search works a lot better than grid search for hyperparameter
optimization. Usually only a small number of hyperparameters actually matter;
the trick is figuring out which ones. Grid search wastes time on irrelevant
dimensions.

That said, any sort of hyperparameter optimization is extremely
computationally intensive so random search is far from a panacea.
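
Just to make the idea concrete, random search can be as simple as the sketch
below. The `evaluate` function is a hypothetical stand-in for "train a network
with these hyperparameters and return the validation error"; in practice that
step dominates the cost, and the parameter names are only illustrative.

    # Minimal random-search sketch; `evaluate` is a cheap stand-in for
    # training a network and returning its validation error.
    import math
    import random

    random.seed(0)

    def evaluate(learning_rate, hidden_units):
        # Hypothetical objective: only learning_rate really matters here,
        # which is exactly the case where random search beats grid search.
        return (math.log10(learning_rate) + 3.0) ** 2 + 0.001 * hidden_units

    best_score, best_config = float("inf"), None
    for _ in range(50):
        config = {
            "learning_rate": 10 ** random.uniform(-5, -1),  # log-uniform draw
            "hidden_units": random.randint(16, 256),
        }
        score = evaluate(**config)
        if score < best_score:
            best_score, best_config = score, config

    print(best_config, best_score)

The point is that every trial probes a fresh value along every dimension, so
the one dimension that actually matters gets 50 distinct samples instead of
the handful a grid would give it.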

~~~
anantzoid
So when you search randomly and arrive at a set of optimised parameters, how
do you know they can't be optimised any further, since you haven't evaluated
all possible sets the way a grid does?

~~~
shazeline
You generally don't know whether you've reached a suitable maximum, which is
why it is good to run a nondeterministic optimizer a few times (if computation
power allows) and see whether any parameters come out reliably.

There are also somewhat better-than-random strategies such as Bayesian
optimization and particle swarm optimization that can help you to search more
efficiently.
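
If you want to try Bayesian optimization without writing it yourself, a
minimal sketch with the scikit-optimize library (assuming it's installed)
looks roughly like this; the objective is a cheap synthetic stand-in for
training and validating an actual network, and the parameter names are made up
for the example.

    # Minimal Bayesian-optimization sketch using scikit-optimize (skopt).
    from skopt import gp_minimize
    from skopt.space import Integer, Real

    search_space = [
        Real(1e-5, 1e-1, prior="log-uniform", name="learning_rate"),
        Integer(16, 256, name="hidden_units"),
    ]

    def objective(params):
        learning_rate, hidden_units = params
        # Stand-in for "train and return validation error"; lower is better.
        return (learning_rate - 1e-3) ** 2 + 1e-8 * hidden_units

    result = gp_minimize(objective, search_space, n_calls=30, random_state=0)
    print(result.x, result.fun)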

------
jcr
As a different chapter, this is not exactly a dupe, but it's not the first
time links to parts of this book have been posted. Over the last two years,
there have been a lot of HN discussions on the various chapters of this book.
Here are the ones with comments:

16 days ago -
[https://news.ycombinator.com/item?id=9863832](https://news.ycombinator.com/item?id=9863832)

8 months ago -
[https://news.ycombinator.com/item?id=8719371](https://news.ycombinator.com/item?id=8719371)

a year ago -
[https://news.ycombinator.com/item?id=8258652](https://news.ycombinator.com/item?id=8258652)

a year ago -
[https://news.ycombinator.com/item?id=8120670](https://news.ycombinator.com/item?id=8120670)

a year ago -
[https://news.ycombinator.com/item?id=7920183](https://news.ycombinator.com/item?id=7920183)

a year ago -
[https://news.ycombinator.com/item?id=7588158](https://news.ycombinator.com/item?id=7588158)

two years ago -
[https://news.ycombinator.com/item?id=6794308](https://news.ycombinator.com/item?id=6794308)

~~~
zo1
How did you do that? Genuinely curious.

~~~
film42
HN search:
[https://hn.algolia.com/?query=neuralnetworksanddeeplearning....](https://hn.algolia.com/?query=neuralnetworksanddeeplearning.com&sort=byPopularity&prefix&page=0&dateRange=all&type=story)

~~~
visarga
Is there a similar search engine for reddit? I can't access my old reddit
posts by search because reddit has a cutoff point at 1000 results.

~~~
mryan
How about "site:reddit.com visarga"?

------
return0
It's worth reading Nielsen's essay "Will neural networks and deep learning
soon lead to artificial intelligence?", which was added today:

[http://neuralnetworksanddeeplearning.com/chap6.html#AI](http://neuralnetworksanddeeplearning.com/chap6.html#AI)

~~~
sampo
And his answer is:

 _I believe that we are several decades (at least) from using deep learning to
develop general AI._

~~~
goodness
I think this is a better summary of his conclusions from that same paragraph:

 _I conclude that, even rather optimistically, it's going to take many, many
deep ideas to build an AI._

The appendix linked there doesn't seem to be ready yet though. In any case, I
like how this is phrased. I'd like to see some of the hype around deep
learning calm down.

~~~
michael_nielsen
The appendix is at
[http://neuralnetworksanddeeplearning.com/sai.html](http://neuralnetworksanddeeplearning.com/sai.html)

------
thearn4
I've sort of run adjacent to the field of machine learning in the last few
years, but haven't really dived into the existing literature. This seems to be
a pretty interesting overview.

Out of curiosity, do many implementations of convolutional neural networks
take advantage of FFT, DCT, or some other fast orthonormal transform to
compute the transition between layers, or are the kernel sizes small enough
that there isn't a great advantage to that?

~~~
Houshalter
Facebook actually does something like that:
[https://research.facebook.com/blog/879898285375829/fair-open...](https://research.facebook.com/blog/879898285375829/fair-open-sources-deep-learning-modules-for-torch/)

They have a patent on it but did open-source the code. They claim it's up to
24x faster than standard. But that is only true for an extreme use case; it's
only 2x faster on average.

~~~
varelse
The bigger the convolution kernel, the bigger the speedup, because a
convolution in real space is a multiplication in frequency space.

It breaks even at 5x5 or so and gets dramatically better shortly thereafter.
However, most of the convolutional nets in use rely on 3x3 convolutions
because I guess reasons:

[http://arxiv.org/pdf/1409.1556.pdf](http://arxiv.org/pdf/1409.1556.pdf) (all
3x3)

[http://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf](http://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf)
(3x3 and 5x5)

There's probably a new Imagenet winner in this somewhere IMO...
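
To make the frequency-space claim concrete, here's a quick numpy/scipy sanity
check (just an illustrative sketch, not how any particular framework actually
implements its convolution layers):

    # Convolution in real space == pointwise multiplication in frequency space.
    import numpy as np
    from scipy.signal import convolve2d

    rng = np.random.default_rng(0)
    image = rng.standard_normal((32, 32))
    kernel = rng.standard_normal((5, 5))

    # Direct (spatial) linear convolution.
    direct = convolve2d(image, kernel, mode="full")

    # FFT-based convolution: zero-pad both arrays to the output size,
    # multiply their spectra, and transform back.
    out_shape = (image.shape[0] + kernel.shape[0] - 1,
                 image.shape[1] + kernel.shape[1] - 1)
    spectral = np.real(np.fft.ifft2(np.fft.fft2(image, out_shape) *
                                    np.fft.fft2(kernel, out_shape)))

    print(np.allclose(direct, spectral))  # True, up to floating-point error

The FFT's cost barely depends on the kernel size, while the direct method's
cost grows with it, which is why the break-even point sits around 5x5.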

~~~
onnoonno
One thing I wonder about is whether it is possible to somehow reflect symmetry
in the input data in the structure of the neural network?

For example, the usual way to have a DNN learn rotation/scaling/translation is
to do data augmentation and simply learn with all the data
rotated/translated/shifted.

But there must be a way to have these input space symmetries reflect somehow
in the structure of the network?

I tried googling this a bit but wasn't really successful - does anyone know
whether this has been done?
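
(For reference, the augmentation baseline I mean is just something like the
following sketch, here using scipy.ndimage; the angles, shifts, and sizes are
made up for the example.)

    # Rotation/translation data augmentation: generate perturbed copies of
    # each training image so the network sees the symmetry in the data.
    import numpy as np
    from scipy.ndimage import rotate, shift

    rng = np.random.default_rng(0)

    def augment(image, n_copies=5):
        copies = []
        for _ in range(n_copies):
            angle = rng.uniform(-30, 30)          # degrees
            offset = rng.uniform(-2, 2, size=2)   # pixels
            out = rotate(image, angle, reshape=False, mode="nearest")
            out = shift(out, offset, mode="nearest")
            copies.append(out)
        return copies

    image = rng.standard_normal((28, 28))
    augmented = augment(image)
    print(len(augmented), augmented[0].shape)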

~~~
kmavm
So, convolution is by itself an attempt to exploit translation-invariance in
the visual world, and typical deep convnets end up picking up a certain amount
of scaling _tolerance_ (though I would not call it invariance) by having
features that are sensitive to larger and larger patches of the input as you
go up the hierarchy of features. This is not real scale-invariance, and many
people run a Laplacian pyramid of some sort at test time to get real scale-
invariance when eking out the best possible numbers.

Rotation-invariance is probably not really a thing you want. The visual world
is not, in fact, rotation-invariant, and the "up" direction on Earth-bound,
naturally-occurring images has different statistics than the "down" direction,
and you'd like to exploit these. Animal visual systems are not rotation-
invariant either; an entertainingly powerful demo of this is "the Thatcher
Effect"
([https://en.wikipedia.org/wiki/Thatcher_effect](https://en.wikipedia.org/wiki/Thatcher_effect)).

Reflection across a vertical axis, on the other hand, often is exploitable, at
least in image recognition contexts (as opposed to, say, handwriting
recognition). If you look at the features image-recognition convnets learn,
they are often symmetric around some axis or other, or sometimes come in
"pairs" of left-hand/right-hand twins. As far as I know, nobody has tried to
exploit this architecturally in any way other than just data augmentation, but
it's a big world out there and people have been trying this stuff for a long
time.

~~~
onnoonno
I was thinking more about a machine vision context with e.g. different parts
coming in at any rotation angle.

I know that some translation invariance comes from e.g. the usual conv+maxpool
layer structure, but mustn't there still be several representations in the
first hidden layer of the network, one for each translation shift?

Rotation especially looks like something that should produce a lot of symmetry
and shared parameters, but it also looks difficult enough that I'd rather hear
from someone with mad math/group-theory(?) skills who has looked at it.

But thank you for the detailed reply anyways!

------
billconan
This is the best neural network tutorial out there. I have been waiting for
the missing deep learning chapter for so long, and it's finally here!

My reading for today; thanks for sharing!

------
MrBra
I just wish the code were in Ruby... but since the author has released all
this material for free, I don't feel like actually complaining :) It's more of
a subtle hint than anything else.. ;)

~~~
MBlume
Read the material, skip the code, and write your own in Ruby; that's what I've
been doing in Clojure.

~~~
MrBra
That would be a highly effective way of learning but not everybody has the
time for it... and also it's not just the ready made code that I envy, but
also the added value of someone having already picked the relevant, working
libraries, tools and so on..

------
buserror
Thank you for this. Even if it might be a repost, I had missed it, and it's
very, very interesting.

~~~
dang
Not a repost; apparently this chapter was just released.

