
Show HN: DeepSwarm – Optimising CNNs Using Swarm Intelligence - Pattio
https://github.com/Pattio/DeepSwarm
======
dimatura
I'm old enough to have started doing machine learning before the deep learning
revolution. When it happened, I predicted we'd see a resurgence in
metaheuristics (of which ACO, used here, is an example), which, just like
neural networks in the pre-deep-learning era, have a poor reputation among
researchers. And it pretty much happened, though at first disguised as
"Bayesian optimization" [1].

I took an undergrad course in evolutionary optimization and became briefly
excited about it, so I'm fairly familiar with the ideas in that area. I think
that, similarly to neural networks, they don't really have a great mathematical
basis to say what will work and what won't -- it's a lot of empirical
experimentation. Do they work? Yeah, kinda. There are really few other options
when you have to deal with large, combinatorial spaces such as neural network
architectures. I do think a lot of research in the metaheuristics area, at
least as of a few years ago (I haven't really kept up with it), is pretty
bogus -- I lampooned it in a couple of "papers"
([http://oneweirdkerneltrick.com/spectral.pdf](http://oneweirdkerneltrick.com/spectral.pdf)
and
[http://oneweirdkerneltrick.com/catbasis.pdf](http://oneweirdkerneltrick.com/catbasis.pdf)).
Yes, all the citations are real.

[1] Bayesian optimization is great, though I find it amusing that people who
wouldn't touch a genetic or swarm algorithm are totally fine with BO when it's
really not that different.
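To make that concrete: the outer loop is basically the same across all of
these methods. Here's a toy sketch (my own illustration, not any particular
library's code) in which only the proposal rule would differ between BO, a GA,
or a swarm method:

    import random

    # Stand-in objective: in NAS this would be "train a network, get accuracy".
    def objective(x):
        return -(x - 0.3) ** 2

    # Shared skeleton: propose candidates, evaluate the black box, update.
    history = [(0.5, objective(0.5))]
    for _ in range(20):
        best_x = max(history, key=lambda h: h[1])[0]
        # The only real difference between methods is this proposal step:
        # BO maximizes an acquisition function over a surrogate model, GAs
        # mutate/crossover a population, ACO samples paths by pheromone.
        candidate = best_x + random.gauss(0, 0.1)
        history.append((candidate, objective(candidate)))

    print(max(history, key=lambda h: h[1]))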

~~~
_raoulcousins
There are some people fighting the good fight for more disciplined
metaheuristics research. "Metaheuristics - the Metaphor Exposed" is a nice read
([https://www.cs.ubc.ca/~hutter/EARG.shtml/stack/2013_Sorensen...](https://www.cs.ubc.ca/~hutter/EARG.shtml/stack/2013_Sorensen_MetaheuristicsTheMetaphorExposed.pdf)).
The author is spot on that a lot of the nature-inspired algorithms don't even
make sense as metaphors, and that they just obfuscate the descriptions of
algorithms unnecessarily.

~~~
dimatura
Looks like an interesting paper - I agree with your single-sentence summary,
at least. I'll check it out.

------
p1esk
Link to a paper with results on ImageNet and a comparison with related work?

~~~
Pattio
The paper is not published yet, as this work was done for my dissertation;
however, we should publish it in the coming few weeks. As for the results,
they are not as good as state-of-the-art methods, but they seem to be pretty
competitive when compared to other open-source libraries. The current problem
is that complex nodes like add and skip nodes are not implemented yet, so it
can only generate sequential structures. Once the paper is published I will
update the GitHub README file.

~~~
p1esk
What about compute resources needed?

~~~
Pattio
Runtime on the CIFAR-10 dataset for different ant counts:
[https://edvinasbyla.com/assets/images/deepswarm-runtime.pdf](https://edvinasbyla.com/assets/images/deepswarm-runtime.pdf)

Runtime compared to genetic architecture search (using similar settings):
[https://edvinasbyla.com/assets/images/devol-deepswarm-runtime.pdf](https://edvinasbyla.com/assets/images/devol-deepswarm-runtime.pdf)

The error rate on CIFAR-10, before the final training (meaning that topologies
weren't fully trained and no augmentation was used):
[https://edvinasbyla.com/assets/images/ant-before-train.pdf](https://edvinasbyla.com/assets/images/ant-before-train.pdf)

The error rate on CIFAR-10, before the final training, compared to genetic
architecture search (using similar settings):
[https://edvinasbyla.com/assets/images/devol-deepswarm-cifar.pdf](https://edvinasbyla.com/assets/images/devol-deepswarm-cifar.pdf)

The two main factors that contribute to the faster search are: (1) ants search
for architectures progressively (meaning that early architectures can be
evaluated really quickly), and (2) ants can reuse the weights, as the weights
are associated with the graph.
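
To illustrate both points with a toy sketch (a simplified illustration of the
idea, not the actual DeepSwarm code):

    # Ants grow architectures one layer at a time, so early candidates are
    # short and cheap to evaluate, and weights trained for a shared prefix
    # are cached on the graph and reused by longer architectures.
    weight_cache = {}  # architecture prefix (tuple of layer specs) -> weights

    def evaluate(architecture):
        prefix = architecture[:-1]
        reused = weight_cache.get(prefix, ())       # reuse cached prefix weights
        fresh = len(architecture) - len(reused)     # only new layers train fresh
        weights = reused + tuple("w_" + spec
                                 for spec in architecture[len(reused):])
        weight_cache[architecture] = weights
        return fresh                                # stand-in for training cost

    print(evaluate(("conv3x3",)))             # depth-1 candidate: trains 1 layer
    print(evaluate(("conv3x3", "pool2x2")))   # reuses conv weights, trains 1 new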

All tests were done using Google Colab. Even though the results might not seem
that impressive, I am still really excited to see what will happen when ants
are allowed to search for more complex architectures which use multi-branching.

~~~
p1esk
I just looked up the ant colony algorithm, and intuitively the idea of
pheromones and path reinforcement makes a lot of sense. Are you the first one
to try it for NN search?
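
For anyone else who looks it up, the core mechanic is only a few lines. A
generic ACO sketch (my own paraphrase of the textbook algorithm, not this
project's code):

    import random

    ALPHA, BETA, RHO = 1.0, 1.0, 0.1  # pheromone weight, heuristic weight, evaporation

    def choose_next(candidates, pheromone, heuristic):
        # Ants pick the next node with probability proportional to
        # pheromone^alpha * heuristic^beta.
        weights = [pheromone[c] ** ALPHA * heuristic[c] ** BETA
                   for c in candidates]
        return random.choices(candidates, weights=weights)[0]

    def update_pheromone(pheromone, best_path, reward):
        for edge in pheromone:          # all trails slowly evaporate...
            pheromone[edge] *= 1 - RHO
        for edge in best_path:          # ...while the best path is reinforced
            pheromone[edge] += reward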

~~~
Pattio
To the best of my knowledge, there are no published papers that use ACO for
CNN neural architecture search. However, I found three published papers that
used ACO for other kinds of NAS, but they used static graphs and no
heuristics.

------
conmarap
This is an interesting project. Is there a way to save the produced model? I
didn't see anything in the examples.

~~~
Pattio
Yes, once you start the search, a directory corresponding to the date of your
run will be created inside the saves directory. Inside this directory all your
models and progress will be saved automatically (you can later resume the
search by changing the save_folder value inside the configuration file). Once
the search is done, you will find a best_topology file, which will contain the
topology with its weights. Furthermore, inside the deepswarm.log file you will
see all the previously evaluated models with their loss/accuracy values.
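
For reference, a minimal usage sketch along the lines of the README example
(treat the exact class and method names as subject to change between
versions):

    from tensorflow.keras.datasets import cifar10
    from deepswarm.backends import Dataset, TFKerasBackend
    from deepswarm.deepswarm import DeepSwarm

    (x_train, y_train), (x_test, y_test) = cifar10.load_data()

    # Wrap the data, pick a backend, and run the search; progress is
    # checkpointed to the saves directory automatically, as described above.
    dataset = Dataset(training_examples=x_train, training_labels=y_train,
                      testing_examples=x_test, testing_labels=y_test)
    backend = TFKerasBackend(dataset=dataset)
    deepswarm = DeepSwarm(backend=backend)

    topology = deepswarm.find_topology()               # run the ACO search
    trained = deepswarm.train_topology(topology, 50)   # final training, 50 epochs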

------
raghavsub
I usually just optimize CNN architectures using Google's intelligence, i.e.
let the big companies with infinite compute find good architectures for me.

~~~
Pattio
That's fair, but I think it is nice to have some alternatives that are free
and open. Furthermore, some people might have sensitive data which they
wouldn't feel comfortable uploading to the cloud.

Also, I wanted to ask: were you using Google Vision? When I was doing my
research, it seemed that they do not allow you to export the model.

