
Yep, I've been guilty of that one lately. That and solving problems by simply overfitting a neural net to the data in the problem domain.

I mean, it works, but the result is less interesting than what I should have done. :)




Definitely. Problem is that doing this helps you get published rather than hurting you. I think this is why there's often confusion when industry tries to use academic models: they don't generalize well because of this overfitting. But also, evaluation is fucking hard, and there's just no way around that. Trying to make it easy (i.e. benchmarkism) just ends up creating more noise instead of the intended decrease.


What about this: add more dropout or noise layers, train an ensemble of models, and submit the best one. Is this considered dirty?


It's unclear what you're actually saying. An ensemble of models combines the predictions of multiple models, so I'm not sure how you submit "the best one." What is standard practice, though, is to do some hyper-parameter search and submit the best configuration. The status quo is that "the best one" is determined by test performance rather than by validation performance (which would be the proper way, with no information leakage). Those hyper-parameters also include the number of dropout layers and the dropout rate. As for "noise layer," I'm also not sure what you mean. Noise injection? I don't see that commonly in practice, though it definitely is a useful training technique.
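
To make "standard practice" concrete, here's a rough sketch (purely illustrative: the model, the synthetic data, and the dropout grid are all made up, not from any particular paper). Configurations are compared on the validation split, and the test set is only touched once at the end:

    # Sketch: search over dropout rates, pick the best configuration on the
    # *validation* split, and only then report test accuracy once.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Synthetic binary-classification data, split into train / val / test.
    X = torch.randn(3000, 20)
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).long()
    X_train, y_train = X[:2000], y[:2000]
    X_val, y_val = X[2000:2500], y[2000:2500]
    X_test, y_test = X[2500:], y[2500:]

    def make_model(p_drop):
        return nn.Sequential(
            nn.Linear(20, 64), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(64, 2),
        )

    def accuracy(model, X, y):
        model.eval()
        with torch.no_grad():
            return (model(X).argmax(dim=1) == y).float().mean().item()

    def train(model, epochs=50):
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()
        model.train()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(model(X_train), y_train)
            loss.backward()
            opt.step()
        return model

    best_val, best_model, best_p = -1.0, None, None
    for p_drop in [0.0, 0.1, 0.3, 0.5]:          # hyper-parameter grid
        model = train(make_model(p_drop))
        val_acc = accuracy(model, X_val, y_val)  # selection uses validation only
        if val_acc > best_val:
            best_val, best_model, best_p = val_acc, model, p_drop

    # The test set is touched exactly once, after the choice is made.
    print(f"chosen dropout={best_p}, val acc={best_val:.3f}, "
          f"test acc={accuracy(best_model, X_test, y_test):.3f}")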

But if, in general, you're talking about trying multiple configurations and parameters and submitting the best one, then no, that's not dirty, and it is standard practice. Realistically, I'm not sure how else you could do this, because it's indistinguishable from having a new architecture anyway. People should put variance on their reported numbers, but that can be computationally expensive, so I definitely understand why it doesn't happen.
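
And on the variance point, a minimal sketch of what I mean (again, the data and model are placeholders; the point is just re-running the same configuration across seeds and reporting mean +/- std instead of a single best number):

    # Sketch: put variance on reported numbers by re-running one configuration
    # with several seeds and aggregating, rather than reporting the single best run.
    import statistics
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    scores = []
    for seed in range(5):  # same configuration, different seeds
        clf = SGDClassifier(random_state=seed).fit(X_train, y_train)
        scores.append(clf.score(X_test, y_test))

    print(f"test acc: {statistics.mean(scores):.3f} "
          f"+/- {statistics.stdev(scores):.3f} over {len(scores)} seeds")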



