Show HN: DeepSwarm – Optimising CNNs Using Swarm Intelligence (github.com)
73 points by Pattio 18 days ago | 21 comments

I'm old enough to have started doing machine learning before the deep learning revolution. When it happened, I predicted we'd see a resurgence in metaheuristics (of which ACO, used here, is an example), which just like neural networks in the pre-deep learning era, have a poor reputation among researchers. And it pretty much happened, though at first disguised as "bayesian optimization" [1].

I took an undergrad course in evolutionary optimization, and became briefly excited about it, so I'm fairly familiar with the ideas in that area. I think that, similarly to neural networks, they don't really have a great mathematical basis to say what will work and what won't -- it's a lot of empirical experimentation. Do they work? Yeah, kinda. There are really few other options when you have to deal with large, combinatorial spaces such as neural network architectures. I do think a lot of research in the metaheuristics area, at least as of a few years ago (I haven't really kept up with it), is pretty bogus -- I lampooned it in a couple of "papers" (http://oneweirdkerneltrick.com/spectral.pdf and http://oneweirdkerneltrick.com/catbasis.pdf). Yes, all the citations are real.

[1] Bayesian optimization is great, though I find it amusing that people who wouldn't touch a genetic or swarm algorithm are totally fine with BO when it's really not that different.

There are some people fighting the good fight for more disciplined metaheuristics research. "Metaheuristics - the Metaphor Exposed" is a nice read (https://www.cs.ubc.ca/~hutter/EARG.shtml/stack/2013_Sorensen...). The author is spot on that a lot of the nature-inspired algorithms don't even make sense as metaphors, and they're just obfuscating the descriptions of algorithms unnecessarily.

Looks like an interesting paper - I agree with your single-sentence summary, at least. I'll check it out.

Update: read the paper. It's spot on, great read.

What do you mean they "kinda" work? NAS is all the rage these days. SOTA on ImageNet [1], SOTA for mobile [2]. Still needs a ton of GPUs, but the search algorithms are getting smarter every month.

PS I have to admit, your papers made me laugh :)

[1] https://arxiv.org/abs/1811.06965

[2] http://www.arxiv-sanity.com/1807.11626v2

Oh yeah, can't argue with results. Similar to deep learning. I've used both deep learning and metaheuristics a fair amount -- I don't care too much about mathematical rigor ;). I just mean, it's the sort of thing that usually needs experimentation, domain knowledge and maybe a bit of luck.

I want to know what [1]'s CIFAR transfer results are w/o cutout.

FYI, they compare their CIFAR results to [1], which is more effective than plain cutout.

[1] https://arxiv.org/abs/1805.09501

Heh, I'm familiar with this one too. It implies that, for instance, the Shake-Shake and Shake-Drop papers employ cutout, which they don't report. It's hard to make apples to apples comparisons when they're changing lots of things at the same time.

"The normal (left) and paranormal (right) distributions."

Your papers are fantastic.

:) I forgot to say "our papers" -- credit also goes to the coauthor.

Link to paper with results on Imagenet and comparison with related work?

The paper is not published yet, as this work was done for my dissertation; however, we should publish it in the coming few weeks. As for the results, they are not as good as state-of-the-art methods, but they seem pretty competitive compared to other open source libraries. The current limitation is that complex nodes like add and skip nodes are not implemented yet, so it can only generate sequential structures. Once the paper is published I will update the GitHub readme file.

What about compute resources needed?

Runtime on the CIFAR-10 dataset for different ant counts: https://edvinasbyla.com/assets/images/deepswarm-runtime.pdf

Runtime compared to genetic architecture search (using similar settings): https://edvinasbyla.com/assets/images/devol-deepswarm-runtim...

The error rate on CIFAR-10, before the final training (meaning that topologies weren't fully trained and no augmentation was used): https://edvinasbyla.com/assets/images/ant-before-train.pdf

The error rate on CIFAR-10, before the final training compared to genetic architecture search (using similar settings): https://edvinasbyla.com/assets/images/devol-deepswarm-cifar....

The two main factors that contribute to faster search are: (1) ants search for architectures progressively, meaning that early architectures can be evaluated very quickly; (2) ants can reuse weights, as the weights are associated with the graph.
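To illustrate the two speed-ups, here is a toy sketch (purely hypothetical names, not DeepSwarm's actual code): architectures deepen one layer per round, and "training" reuses cached weights for the longest previously seen prefix, so only newly added layers cost anything.

```python
# Toy sketch of progressive search + weight reuse (not DeepSwarm's code).
weight_cache = {}  # maps an architecture prefix (tuple of layer specs) to its "weights"

def train(arch):
    """Stand-in for real training: find the longest cached prefix, reuse its
    weights, and only create fresh weights for the newly added layers.
    Returns how many layers were reused."""
    reused = 0
    for cut in range(len(arch), 0, -1):
        if tuple(arch[:cut]) in weight_cache:
            reused = cut
            break
    weights = list(weight_cache.get(tuple(arch[:reused]), []))
    weights += [f"fresh:{layer}" for layer in arch[reused:]]  # only new layers cost time
    weight_cache[tuple(arch)] = weights
    return reused

# Progressive deepening: early rounds evaluate tiny (cheap) architectures,
# and each round reuses all but the newest layer's weights.
arch = []
for depth, layer in enumerate(["conv3x3", "pool2x2", "conv5x5"], start=1):
    arch.append(layer)
    reused = train(arch)
    print(f"depth {depth}: reused {reused} layer(s)")
```

Running this prints that each new round reuses all previously trained layers, which is where the claimed speed-up comes from.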

All tests were done using Google Colab. Even though the results might not seem that impressive, I am still really excited to see what will happen when ants are allowed to search for more complex architectures that use multi-branching.

I just looked up the ant colony algorithm, and intuitively the idea of pheromones and path reinforcement makes a lot of sense. Are you the first one to try it for NN search?

To the best of my knowledge, there are no published papers that use ACO for CNN neural architecture search. However, I found three published papers that used ACO for other kinds of NAS, but they used static graphs and no heuristics.
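For readers unfamiliar with the pheromone idea mentioned above, here is a minimal, self-contained ACO sketch applied to a toy layer-choice problem (hypothetical names and a fake scoring function, not DeepSwarm's implementation): each ant samples a layer at every depth in proportion to pheromone, and the best architecture found so far deposits extra pheromone on its choices while the rest evaporates.

```python
import random

random.seed(0)

# Toy ACO for architecture search: pick one layer type per depth level.
CHOICES = ["conv3x3", "conv5x5", "pool2x2"]
DEPTH = 3
# One pheromone table per depth level, all paths start equally attractive.
pheromone = [{c: 1.0 for c in CHOICES} for _ in range(DEPTH)]

def sample_architecture():
    """Each ant picks a layer at every level, weighted by pheromone."""
    arch = []
    for level in pheromone:
        r, acc = random.uniform(0, sum(level.values())), 0.0
        for choice, tau in level.items():
            acc += tau
            if r <= acc:
                arch.append(choice)
                break
        else:
            arch.append(choice)  # float-rounding safety net
    return arch

def score(arch):
    # Stand-in for validation accuracy; a real ACO-NAS trains the network here.
    return sum(1 for layer in arch if "conv" in layer) / len(arch)

def run(ants=5, iterations=10, evaporation=0.1, deposit=1.0):
    best, best_score = None, -1.0
    for _ in range(iterations):
        for _ in range(ants):
            arch = sample_architecture()
            s = score(arch)
            if s > best_score:
                best, best_score = arch, s
        # Evaporate everywhere, then reinforce the best path found so far.
        for level, choice in zip(pheromone, best):
            for c in level:
                level[c] *= (1 - evaporation)
            level[choice] += deposit
    return best, best_score

best, acc = run()
print(best, acc)
```

The reinforcement loop is the whole trick: good paths accumulate pheromone and get sampled more often, while evaporation keeps the search from locking in too early.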

This is an interesting project. Is there a way to save the produced model? I didn't see anything about it in the examples.

Yes, once you start the search, a directory corresponding to the date of your run will be created inside the saves directory. All your models and progress will be saved there automatically (you can later resume the search by changing the save_folder value inside the configuration file). Once the search is done, you will find a best_topology file which will contain the topology with its weights. Furthermore, the deepswarm.log file lists all the previously evaluated models with their loss/accuracy values.

I usually just optimize CNN architectures using Google's intelligence, i.e. let the big companies with infinite compute find good architectures for me.

That's fair, but I think it is nice to have some alternatives that are free and open. Furthermore, some people might have sensitive data which they wouldn't feel comfortable uploading to the cloud.

Also, I wanted to ask: were you using Google Vision? When I was doing my research, it seemed that they do not allow you to export the model.
