
Neural nets typically contain smaller “subnetworks” that can often learn faster - pizza
http://news.mit.edu/2019/smarter-training-neural-networks-0506
======
sanxiyn
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks,
[https://arxiv.org/abs/1803.03635](https://arxiv.org/abs/1803.03635)

This is in fact the most interesting hypothesis on why neural networks work I
have ever read.
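
For anyone who hasn't read the paper: the core recipe is roughly "train the
full network, prune the smallest-magnitude weights, rewind the survivors to
their original initialization, retrain". A rough PyTorch sketch of one round
of that loop (my paraphrase, not the authors' code; the layer sizes are
arbitrary):

    import copy
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
    init_state = copy.deepcopy(model.state_dict())  # the "winning ticket" init

    def prune_masks(model, fraction=0.2):
        """Mask out the lowest-magnitude `fraction` of weights per layer."""
        masks = {}
        for name, p in model.named_parameters():
            if p.dim() < 2:  # skip biases
                continue
            k = int(fraction * p.numel())
            threshold = p.abs().flatten().kthvalue(k).values
            masks[name] = (p.abs() > threshold).float()
        return masks

    # ... train `model` to convergence here (omitted) ...
    masks = prune_masks(model)
    model.load_state_dict(init_state)  # rewind survivors to initialization
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])  # zero out the pruned connections
    # ... then retrain, re-applying the masks after each update ...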

~~~
amelius
This makes me wonder: if a lottery ticket network is roughly 10% of the size
of the original network, then I would expect that statistically you'd have to
train a _randomly_ initialized version of it 10 times to get the same result
once. Is that true? Or is the actual topology of importance here?

~~~
mdda
No. Suppose that 5 of 50 channels in a particular network layer made up the
'lottery ticket' for that layer. The number of 'combos' of 5 channels that
were trained at the same time is 50C5 (i.e. ~2 million [0]).

Whereas training a 5-channel network (5C5 = 1 combo) 10 times only gives you
10 chances to get the right 5 together.

At least, that's one way to think about it.

[0]: [https://www.mathway.com/popular-problems/Finite%20Math/601828](https://www.mathway.com/popular-problems/Finite%20Math/601828)
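
A quick sanity check of those numbers (Python; purely illustrative):

    from math import comb

    print(comb(50, 5))      # 2118760 five-channel subsets co-trained in the wide layer
    print(10 * comb(5, 5))  # 10 subsets total across ten 5-channel training runs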

------
ineedasername
_" after training it on a huge amount of data it magically works"_

The magic part can be problematic. I think there have been some recent(ish)
advances in introspection of neural networks to better understand how the
initial input features influence the output, but the "black box" is still a
problem in some practical environments where, for example, deployment of a
model may need to show there's no disproportionate impact on effected sub
populations.

Then there's things like the most recent Tesla Autopilot Crash [0] where
comprehension of the failure mode & cause can be difficult to obtain.

[0]
[https://news.ycombinator.com/item?id=19932918](https://news.ycombinator.com/item?id=19932918)

------
atian
This layout is known as autism: reduced global plasticity, local
hyperplasticity.

~~~
Jedi72
I feel like you mean this as a joke, but I also feel it touches on one of the
great unaddressed problems of strong AI - the human brain, optimized through
evolution, still has a pretty large defect rate. On the march of progress it
seems pretty likely that we will accidentally create an AI which qualifies as
"insane" far before we can create a stable, adult-like being.

~~~
SubiculumCode
He might not be joking, btw.

~~~
SubiculumCode
To expand on this, ASD has been characterized as having hyper local
structural/functional connectivity, but hypo long-range structural/functional
connectivity. I think the grandparent was basing his comment on this type of
evidence, but I may be wrong.

------
AstralStorm
Interesting but not unexpected.

How to find winning tickets without actually generating zillions of ANNs? I
smell a genetic algorithm...

~~~
poppis
Tried it. It’s actually faster to prune a very oversized neural network (also
stated in the paper).

~~~
AstralStorm
I didn't mean making the ANN itself; that part is well described. I meant
optimizing the pruning, rather than pruning at random or using the ancient
"optimal" algorithms. (Those do not optimize for retraining ability; their
goal is robustness.) The pruning algorithm used in the study is rudimentary.

An interesting bit of biology: humans go through some sort of pruning phase
during adolescence. It is not random, it is not one-shot, and real neurons
cannot be resurrected or reset.

~~~
sdenton4
I haven't heard of particularly compelling advantages for one pruning
algorithm over another... So at this point, it seems best to just keep it dead
simple.

Assuming the lottery ticket hypothesis, you're just looking for the strong sub
network, which should stand out under a number of approaches. The vestigial
neurons shouldn't be doing much...
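
As a concrete example of "dead simple": rank units by weight magnitude and
keep the heavy ones. An illustrative numpy sketch (the layer shape and the
cutoff of 5 channels are made up, not from the paper):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(50, 784))              # hypothetical trained layer, 50 channels
    channel_strength = np.abs(W).sum(axis=1)    # L1 norm per output channel
    strong = np.argsort(channel_strength)[-5:]  # the 5 strongest "ticket" channels
    print(strong)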

~~~
AstralStorm
They had to train the various pruned nets 15+ times each to know the cutoff
for counting as a success...

Also, what is a strong subnetwork? What we do know is that their pruning
algorithm produces a network that is vulnerable to weight randomization but
performs better with the original weights. If there were a better pruning
algorithm, it could tell us more about the structure of such subnetworks.

------
asplake
Wondering if there’s a way to iterate between large scale and localised
optimisation. Years ago I did a bit with multigrid for PDE solving which
iterates through ranges of resolution - not quite the same but maybe something
analogous is possible?

------
Aardappelkroket
Why don't you just try training a smaller network? That way you 'force' the
network to train on the small number of filters you have, right?

~~~
sanxiyn
Empirically, training a smaller network fails. This paper is an attempt to
explain why.

------
m0zg
Sparsity is antithetical to performance, unfortunately, in any kind of
realistic computer architecture unless you get _really_ sparse, or at least
block-sparse (s.t. entire blocks of computation could be skipped), and you're
not doing sgemm()-like things. So while this looks great on paper (and the
paper is quite nice), in practice quantized, dense MobileNets will continue to
dominate.
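
To make that concrete: a dense GEMM does the same work whether or not most of
the weights are zero, so unstructured sparsity buys nothing without kernels
that can actually skip the zeros. A quick illustrative timing (numpy; exact
numbers will vary by machine):

    import time
    import numpy as np

    x = np.random.rand(1024, 1024).astype(np.float32)
    w_dense = np.random.rand(1024, 1024).astype(np.float32)
    w_sparse = w_dense * (np.random.rand(1024, 1024) < 0.1)  # ~90% zeros

    for name, w in [("dense", w_dense), ("90% sparse", w_sparse)]:
        t0 = time.perf_counter()
        for _ in range(20):
            x @ w
        print(name, time.perf_counter() - t0)  # near-identical timings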

~~~
sanxiyn
This paper is mostly about understanding what's going on. It doesn't claim to
propose any new method of training, or any practically applicable result
whatsoever, because it doesn't.

