On the Difficulty of Extrapolation with NN Scaling

nerdponx · on Jan 26, 2022

It seems like fancy hyperparameter optimization techniques (e.g. Bayesian black-box optimization) probably don't help here either, because they don't solve the problem of extrapolating outside the range of hyperparameter values that have already been tried. Is that a valid conclusion?

cgearhart · on Jan 26, 2022

I think in theory those techniques should still work, in the sense that they give you the best prediction (for some definition of “best”) of the next point to test given all the previous information. But the more hyperparameters you can vary, and the further you extrapolate from observations, the more likely it is that something surprising happens. You should not expect a fancy tuning algorithm to anticipate surprises; such algorithms are designed to do the opposite, by exploiting predictable trends.
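
To make the extrapolation point concrete, here is a minimal sketch of the kind of surrogate model a Bayesian optimizer might use: a Gaussian process with an RBF kernel, written in plain NumPy. Everything in it is an assumption for illustration: the kernel, the length scale, the single hyperparameter `x`, and the toy "loss" curve are all made up. It compares the model's predictive uncertainty at a point inside the range already tried with one far outside it.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential kernel between two 1-D arrays of inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Standard zero-mean GP regression: posterior mean and std at x_query.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    K_ss = rbf_kernel(x_query, x_query)
    solved = np.linalg.solve(K, K_s)
    mean = solved.T @ y_train
    cov = K_ss - K_s.T @ solved
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std

# Hypothetical setup: six values of one hyperparameter tried in [0, 1],
# with a made-up smooth loss curve standing in for real measurements.
x_train = np.linspace(0.0, 1.0, 6)
y_train = np.sin(3.0 * x_train)

# Query one point inside the tried range and one far outside it.
x_query = np.array([0.5, 3.0])
mean, std = gp_posterior(x_train, y_train, x_query)

print("inside range:  mean=%.3f std=%.3f" % (mean[0], std[0]))
print("outside range: mean=%.3f std=%.3f" % (mean[1], std[1]))
```

Far from the data, this surrogate falls back to its prior: the mean returns to zero and the standard deviation returns to the prior value of 1. So the model can report that it is uncertain out there, but it has no information about what actually happens, which is the sense in which it cannot anticipate a surprise.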