So what do they attribute the gains to? Adaptive clipping? Or $$$ spent on NAS?

They do a little study in section 4.1 comparing batchnorm to adaptive gradient clipping (AGC) for ResNets over a range of hyperparameters, and they also compare performance against the batchnorm versions in table 6. The results indicate AGC does give a real boost over batchnorm.
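For anyone unfamiliar with the idea, here is a rough sketch of unit-wise adaptive gradient clipping as I understand it: each output unit's gradient is rescaled whenever its norm exceeds a fixed fraction of that unit's weight norm. The function name `agc_clip`, the list-of-rows layout, and the default `clip` and `eps` values are my own illustrative assumptions, not code from the paper.

```python
import math

def agc_clip(weights, grads, clip=0.01, eps=1e-3):
    """Sketch of unit-wise adaptive gradient clipping for one 2-D layer.

    weights, grads: lists of rows, one row per output unit (assumed layout).
    clip, eps: illustrative defaults, not tuned values.
    """
    clipped = []
    for w_row, g_row in zip(weights, grads):
        # Floor the weight norm at eps so zero-initialised units can still move.
        w_norm = max(math.sqrt(sum(w * w for w in w_row)), eps)
        g_norm = math.sqrt(sum(g * g for g in g_row))
        max_norm = clip * w_norm
        if g_norm > max_norm:
            # Rescale so the gradient norm equals clip * weight norm.
            scale = max_norm / g_norm
            clipped.append([g * scale for g in g_row])
        else:
            # Small enough relative to the weights: leave untouched.
            clipped.append(list(g_row))
    return clipped
```

The point of scaling the threshold by the weight norm, rather than using one global constant as in ordinary gradient clipping, is that the allowed step size adapts per unit.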

They do a bunch of manual hyperparameter tuning that seems necessary to get the state-of-the-art results. From my reading, it doesn’t seem like they actually used NAS; it’s just that the baseline they compare against was found with NAS.
