However, it's important to note that this paper doesn't show that NAS algorithms as originally designed, with completely independent training of each proposed architecture, are equivalent to random search. Rather, it shows that weight sharing, a technique introduced by ENAS that tries to minimize the necessary compute by training multiple models simultaneously with shared weights, doesn't outperform random baselines. Intuitively, this makes sense: weight sharing dramatically reduces the number of independent evaluations, and thereby provides far less signal for the controller that proposes architectures.
The paper itself makes this fairly clear, but I think it's easy to misinterpret this distinction from the abstract.
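To make the distinction concrete, here is a minimal toy sketch of the weight-sharing idea (my own illustration, not the ENAS implementation; names like SuperNet and sample_architecture are made up): every candidate architecture is a path through one over-parameterized network, so all candidates update and are ranked with the same shared weights instead of being trained independently.

```python
# Toy illustration of weight sharing in NAS (not ENAS itself).
import random
import torch
import torch.nn as nn

class SharedLayer(nn.Module):
    """One supernet layer: a pool of candidate ops whose weights are shared across architectures."""
    def __init__(self, dim, num_ops=3):
        super().__init__()
        # All candidate ops live here permanently; an architecture only chooses among them.
        self.ops = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_ops)])

    def forward(self, x, op_index):
        return torch.relu(self.ops[op_index](x))

class SuperNet(nn.Module):
    def __init__(self, dim=16, depth=4, num_ops=3):
        super().__init__()
        self.layers = nn.ModuleList([SharedLayer(dim, num_ops) for _ in range(depth)])
        self.head = nn.Linear(dim, 1)

    def forward(self, x, architecture):
        # `architecture` is just a list of op indices, one per layer.
        for layer, op_index in zip(self.layers, architecture):
            x = layer(x, op_index)
        return self.head(x)

def sample_architecture(depth=4, num_ops=3):
    # A "random search" controller: pick an op uniformly at random per layer.
    return [random.randrange(num_ops) for _ in range(depth)]

if __name__ == "__main__":
    net = SuperNet()
    opt = torch.optim.SGD(net.parameters(), lr=0.01)
    x, y = torch.randn(32, 16), torch.randn(32, 1)

    # Each step trains a *different* sampled architecture, but gradients flow into
    # the same shared weights -- this is what makes per-architecture evaluation
    # cheap, and also what entangles the quality estimates of the candidates.
    for step in range(100):
        arch = sample_architecture()
        loss = nn.functional.mse_loss(net(x, arch), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Candidates are then ranked with the shared weights, without independent retraining.
    with torch.no_grad():
        candidates = [sample_architecture() for _ in range(5)]
        scores = [nn.functional.mse_loss(net(x, a), y).item() for a in candidates]
    print(sorted(zip(scores, candidates)))
```

The point of the sketch is only that no architecture ever gets its own independently trained weights, which is where the reduced signal for the controller comes from.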
H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean. Efficient neural architecture search via parameter sharing. ICML, 2018.
On a perhaps related note, it seems a bit surprising to me because when I first started with neural networks about a year ago, I tried to shortcut hyperparameter search by reusing weights and noticed that independently trained models with the same hyperparameters would produce different performance. I naively assumed that such correlation was something I didn't want and that it was something everyone knew about, so I just moved on.
Ouch... I guess random policy is the state-of-the-art then, no?
I understand that there is still clear value in these papers (defining the search space, achieving state-of-the-art results).
But it does seem there is some partially broken science there (not evaluating against baselines, an algorithm that performs the same as a random baseline).
There are some references to other papers, so perhaps the details are in those papers.