
Evaluating the Search Phase of Neural Architecture Search - reedwolf
https://arxiv.org/abs/1902.08142
======
circuithunter
This is a really nice paper which asks some critical questions for the future
of NAS research.

However, it's important to note that this paper doesn't show that NAS
algorithms as originally designed, with completely independent training of
each proposed architecture, are equivalent to random search. Rather, it shows
that weight sharing, a technique introduced by ENAS [1] which tries to
minimize necessary compute by training multiple models simultaneously with
shared weights, doesn't outperform random baselines. Intuitively, this makes
sense: weight sharing dramatically reduces the number of independent
evaluations, and thereby leads to far less signal for the controller, which
proposes architectures.
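
To make the distinction concrete, here is a toy sketch of the two evaluation
schemes (this isn't the authors' or ENAS's actual code; every name and number
below is made up for illustration). In the classic setup each candidate gets
its own training run, so the controller sees one clean reward per
architecture; with weight sharing every candidate is scored through the same
shared supernet, so rewards are cheap but entangled:

    import random

    # Toy search space: an architecture is just a choice of op per layer.
    OPS = ["conv3x3", "conv5x5", "maxpool", "skip"]
    NUM_LAYERS = 4

    def sample_arch(rng):
        return tuple(rng.choice(OPS) for _ in range(NUM_LAYERS))

    def classic_nas(num_candidates=10, seed=0):
        """Each candidate is trained independently: one clean reward each."""
        rng = random.Random(seed)
        rewards = {}
        for _ in range(num_candidates):
            arch = sample_arch(rng)
            # Placeholder for a full, independent training run (expensive).
            rewards[arch] = rng.random()
        return max(rewards, key=rewards.get)

    def weight_sharing_nas(num_candidates=10, seed=0):
        """All candidates are scored through one shared supernet: cheap,
        but each reward is entangled with the other candidates seen so far."""
        rng = random.Random(seed)
        shared = {op: 0.0 for op in OPS}  # stand-in for shared supernet weights
        rewards = {}
        for _ in range(num_candidates):
            arch = sample_arch(rng)
            # Placeholder for a quick eval with inherited weights (no retraining).
            rewards[arch] = sum(shared[op] for op in arch) / NUM_LAYERS
            # Every sampled candidate nudges the shared weights.
            for op in arch:
                shared[op] += rng.random() * 0.1
        return max(rewards, key=rewards.get)

    print(classic_nas())
    print(weight_sharing_nas())

The point is just the shape of the loops: the second version trades
per-candidate training for a shared, constantly moving set of weights, which
is where the noisy ranking signal comes from.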

The paper itself makes this fairly clear, but I think it's easy to
misinterpret this distinction from the abstract.

[1] H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean. Efficient neural
architecture search via parameter sharing. ICML, 2018.

~~~
hmwhy
Thank you for pointing that out! I did misinterpret that part of the abstract!

On a perhaps related note, it seems a bit surprising to me, because when I
first started with neural networks about a year ago, I tried to shortcut
hyperparameter search by reusing weights and noticed that independently
trained models with the same hyperparameters would produce models with
different performance. I naively assumed that such correlation was something
I didn't want, and that it was something everyone already knew about, so I
just moved on.

Edit: typo (pointed --> pointing)

~~~
heyitsguay
I've noticed this too. I've got a paper coming up on the arxiv soon that
discusses this phenomenon, and structured random architecture search, in the
context of semantic segmentation networks.

------
bjornsing
> On average, the random policy outperforms state-of-the-art NAS algorithms.

Ouch... I guess random policy _is_ the state-of-the-art then, no?

------
corporateslave5
The problem with neural architecture search is that we don't know what to
optimize over. We need some kind of topological understanding of the data to
help build algorithms that can optimize performance. Clearly, passing in
error rates is not doing the trick.

~~~
machinelearning
I agree, I've been thinking about this too. Do you want to work together on
this?

------
dchichkov
Should this be cause to correct the affected papers? And if they aren't
corrected, maybe even retract them?

I understand that there is still clear value in these papers (defining the
search space, achieving state-of-the-art results).

But it does seem there is some partially broken science here (not evaluating
against baselines, an algorithm that performs the same as a random baseline).

------
Roark66
I wish they wrote more about how the search space is created. Appendix A
talks a little about the search space representation, but it doesn't give any
details about how the search space is created in the first place. Is it
choosing architectures from a set of manually created ones, or is it actually
generating them at random?

There are some references to other papers, so perhaps the details are in
those papers.
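
For what it's worth, in the ENAS/DARTS-style setups this paper evaluates, the
search space itself is hand-designed (a fixed menu of operations and a fixed
cell topology), and a "random" architecture is just a uniform sample from
that space, not a freeform random graph. A rough sketch of what such sampling
usually looks like (the op names and node count below follow DARTS-style
conventions and are only assumptions for illustration):

    import random

    # Hand-designed menu of candidate operations (illustrative names).
    OPS = ["sep_conv_3x3", "sep_conv_5x5", "max_pool_3x3", "avg_pool_3x3",
           "skip_connect", "none"]
    NUM_NODES = 4  # intermediate nodes; nodes 0 and 1 are the cell inputs

    def sample_cell(rng):
        """Uniformly sample one cell: each intermediate node picks two
        earlier nodes as inputs and an operation for each of those edges."""
        cell = []
        for node in range(2, 2 + NUM_NODES):
            inputs = rng.sample(range(node), 2)
            cell.append([(i, rng.choice(OPS)) for i in inputs])
        return cell

    print(sample_cell(random.Random(0)))

So the randomness is only in which connections and ops get picked; the
overall skeleton is fixed by whoever designed the search space.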

------
sbierwagen
Yo dog, we heard you like neural architecture search, so we wrote a NAS for
your NAS so you can optimize your hyperhyperparameters while you optimize your
hyperparameters.

