They do a small study in section 4.1 comparing batchnorm against adaptive gradient clipping (AGC) for ResNets over a range of hyperparameters, and they also compare performance to batchnorm versions in table 6. The results indicate AGC does give a real boost over batchnorm.
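For anyone unfamiliar with AGC, the core idea is just to rescale a gradient whenever its norm gets too large relative to the norm of the weights it updates. Here's a rough sketch of that rule; the function name and default constants are my own, and for simplicity this uses one norm per tensor rather than the unit-wise (per-row) norms the paper actually uses:

```python
import numpy as np

def adaptive_gradient_clip(weight, grad, clipping=0.01, eps=1e-3):
    # Simplified AGC sketch: if ||g|| exceeds clipping * max(||w||, eps),
    # rescale the gradient so its norm equals that threshold.
    # (The paper applies this per output unit; defaults here are illustrative.)
    w_norm = max(np.linalg.norm(weight), eps)
    g_norm = np.linalg.norm(grad)
    max_norm = clipping * w_norm
    if g_norm > max_norm:
        return grad * (max_norm / g_norm)
    return grad

# A gradient much larger than the weights gets scaled down hard:
w = np.ones(4)            # ||w|| = 2
g = np.full(4, 10.0)      # ||g|| = 20, way above 0.01 * 2
clipped = adaptive_gradient_clip(w, g)
```

The appeal over plain gradient-norm clipping is that the threshold adapts to each layer's weight scale instead of being one global constant.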
They do a bunch of manual hyperparameter tuning that seems necessary to get the state-of-the-art results. From my reading it doesn't seem like they actually used NAS, just that the baseline they compare against was found with NAS.