
The Lottery Ticket Hypothesis: Finding Small, Trainable Neural Networks - jonbaer
https://arxiv.org/abs/1803.03635
======
opwieurposiu
Working on DNNs is more similar to gambling on slot machines than it is to
traditional programming. I gather training data, write a Python script, and
then pay around $1 for GPU time to see if it's a winner. Instead of watching
spinning reels I watch loss function graphs. 95% of the time it is a bust, but
every once in a while you get a big payoff. I feel very smart when I win, but
deeper investigation inevitably reveals I have no idea why A works and B does
not.

~~~
taneq
That's the difference between you and evolution, though, right? You care about
the outcome and the reasons behind it. Evolution doesn't care about either.

~~~
okket
Evolution does have a strong bias towards offspring survival, though in some
cases it is OK with the law of large numbers and statistical probability.

~~~
taneq
I'd say that's less something evolution _has_ and more something it _is_.
Although "overall surviving fecundity" or something probably describes what
it's optimising for pretty well.

~~~
chx
Evolution doesn't optimize for anything, but the results are such that, to
quote:

> The prevalent genes in a sexual population must be those that, as a mean
> condition, through a large number of genotypes in a large number of
> situations, have had the most favourable phenotypic effects for their own
> replication.

~~~
taneq
It optimizes for _something_, not in the sense of intent but in the sense
that there's something that increases as evolution takes place. It's just
pretty hard to describe exactly what. Darwin called it by the delightfully
circular name "fitness", where evolution is the survival of the fittest and
the fittest are those that survive.

(Of course, fitness is relative to an environment and the environment changes
as things evolve, so it's not like there's some global 'fitness' property that
increases. Just that, all else held constant, the thing which evolves tends to
become better suited to reproducing in its environment.)

------
dmichulke
A very cynical and provocative follow-up to this hypothesis (if it turns out to
be true) would be:

DNNs are nothing more than a huge set of NNs where at least one happens to
solve your problem and the DNN finds out which one.

The approach is then essentially the same as a Random Forest with bagging, but
with the decision trees replaced by neural networks.

This in turn would mean that:

1\. The "Deep <Your Algorithm here>" revolution is in fact not a revolution
but just throwing many models (and thereby resources) at the same problem,
while obscuring the fact that you have no idea which one will work because the
DNN sorts that out for you.

2\. The age-old problem of initializing the neural network is not at all
solved, and could give drastically better results if it were finally addressed
somehow (see the sketch below).

~~~
skinner_
"DNNs are nothing more than a huge set of NNs"

As in deep neural networks are nothing more than a huge set of shallow neural
networks? That's directly contradicted by tons of evidence. See for example
the visualizations in
[https://distill.pub/2017/feature-visualization/](https://distill.pub/2017/feature-visualization/)

Your claim 2 is of course correct, supported by the phenomenon of transfer
learning.

------
Mathnerd314
Discussion on Reddit:
[https://www.reddit.com/r/MachineLearning/comments/85eo8v/r_t...](https://www.reddit.com/r/MachineLearning/comments/85eo8v/r_the_lottery_ticket_hypothesis_training_pruned/)

------
rdlecler1
I think the authors are confused about what's happening here. It's not
revealing a subnetwork; you're removing _spurious_ interactions in the fully
connected ANN, thereby visually revealing the functional circuit topology
that's driving the behavior.

It's helpful to understand the effect of pruning on artificial gene regulatory
networks, another class of ANN that mathematically follows the same rules:

[https://www.researchgate.net/publication/23151688_Survival_o...](https://www.researchgate.net/publication/23151688_Survival_of_the_sparsest_Robust_gene_networks_are_parsimonious)

------
cosmic_ape
I should say it's very easy to construct synthetic datasets on which it is
clear that the only role of large layers is to supply more opportunities for
the initialization to get it right.

So, not much surprise there. But they claim they can extract the smaller
subnet, which could be useful. Except that they only provide experiments on
very small nets so far, as the comments on Reddit point out.
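
One such construction (my own toy example, not one from the paper or the
thread): make the target depend on a single random direction w*, so only one
"good" hidden unit is ever needed. A wider layer then just buys more random
draws at initialization, and the best-aligned unit gets closer to w* as the
width grows:

    # Hypothetical toy illustration: width only adds more initial guesses.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 100
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)   # the one direction the labels depend on

    def best_alignment(width):
        """Max cosine similarity between w* and `width` randomly initialized units."""
        W = rng.standard_normal((width, d))
        W /= np.linalg.norm(W, axis=1, keepdims=True)
        return np.max(W @ w_star)

    for width in (10, 100, 1000, 10000):
        print(width, round(best_alignment(width), 3))
    # Alignment climbs with width even though only one unit is needed;
    # the extra units are lottery tickets, and most of them lose.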

