However, it's important to note that this paper doesn't show that NAS algorithms as originally designed, with completely independent training of each proposed architecture, are equivalent to random search. Rather, it shows that weight sharing, a technique introduced by ENAS that tries to minimize the necessary compute by training multiple models simultaneously with shared weights, doesn't outperform random baselines. Intuitively, this makes sense: weight sharing dramatically reduces the number of independent evaluations, and thereby provides far less signal for the controller that proposes architectures.
The paper itself makes this fairly clear, but I think it's easy to misinterpret this distinction from the abstract.
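To make the distinction concrete, here is a minimal toy sketch of the weight-sharing idea (my own illustration, not the ENAS implementation; names like SuperNet and sample_architecture are made up): every candidate architecture is a path through one over-parameterized network, so all candidates update and are ranked with the same shared weights instead of being trained independently.

```python
# Toy illustration of weight sharing in NAS (not ENAS itself).
import random
import torch
import torch.nn as nn

class SharedLayer(nn.Module):
    """One supernet layer: a pool of candidate ops whose weights are shared across architectures."""
    def __init__(self, dim, num_ops=3):
        super().__init__()
        # All candidate ops live here permanently; an architecture only chooses among them.
        self.ops = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_ops)])

    def forward(self, x, op_index):
        return torch.relu(self.ops[op_index](x))

class SuperNet(nn.Module):
    def __init__(self, dim=16, depth=4, num_ops=3):
        super().__init__()
        self.layers = nn.ModuleList([SharedLayer(dim, num_ops) for _ in range(depth)])
        self.head = nn.Linear(dim, 1)

    def forward(self, x, architecture):
        # `architecture` is just a list of op indices, one per layer.
        for layer, op_index in zip(self.layers, architecture):
            x = layer(x, op_index)
        return self.head(x)

def sample_architecture(depth=4, num_ops=3):
    # A "random search" controller: pick an op uniformly at random per layer.
    return [random.randrange(num_ops) for _ in range(depth)]

if __name__ == "__main__":
    net = SuperNet()
    opt = torch.optim.SGD(net.parameters(), lr=0.01)
    x, y = torch.randn(32, 16), torch.randn(32, 1)

    # Each step trains a *different* sampled architecture, but gradients flow into
    # the same shared weights -- this is what makes per-architecture evaluation
    # cheap, and also what entangles the quality estimates of the candidates.
    for step in range(100):
        arch = sample_architecture()
        loss = nn.functional.mse_loss(net(x, arch), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Candidates are then ranked with the shared weights, without independent retraining.
    with torch.no_grad():
        candidates = [sample_architecture() for _ in range(5)]
        scores = [nn.functional.mse_loss(net(x, a), y).item() for a in candidates]
    print(sorted(zip(scores, candidates)))
```

The point of the sketch is only that no architecture ever gets its own independently trained weights, which is where the reduced signal for the controller comes from.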
H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean. Efficient neural architecture search via parameter sharing. ICML, 2018.
On a perhaps related note, it seems a bit surprising to me because when I first started with neural networks about a year ago, I tried to shortcut hyperparameter search by reusing weights and noticed that independently trained models with the same hyperparameters would produce different performance. I naively assumed that such correlation was something I didn't want and that it was something everyone knew about, so I just moved on.
Ouch... I guess random policy is the state-of-the-art then, no?
I understand that there is still clear value in these papers (defining the search space, achieving state-of-the-art results).
But it does seem there is some partially broken science there (not evaluating against baselines, an algorithm that performs the same as a random baseline).
There are some references to other papers, so perhaps the details are in those papers.