
It is unclear what you are actually saying. An ensemble of models combines the predictions of multiple models, so I'm not sure how you submit "the best one." What is standard practice, though, is to do some hyper-parameter search and submit the best configuration. The status quo is that "the best one" is determined by test performance rather than by validation performance (the latter being the proper approach, with no information leakage). Those hyper-parameters also include the number of dropout layers and the dropout percentage. As for "noise layer," I'm also not sure what you mean. Noise injection? I don't actually see that commonly used in practice, though it's definitely a useful training technique.
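
To make the search concrete, here's a minimal sketch of what I mean, with selection done on the validation split and the test set touched only once. Everything here is a placeholder (the train_and_score stand-in, the dropout grid, the scores), not anyone's actual setup:

  import itertools
  import random

  random.seed(0)

  def train_and_score(dropout_rate, n_dropout_layers, split):
      # Stand-in for real training: in practice this would train a network
      # with the given dropout settings and return its score on `split`.
      base = 0.90 - abs(dropout_rate - 0.3) - 0.02 * n_dropout_layers
      return base + random.gauss(0, 0.01)

  search_space = itertools.product([0.1, 0.3, 0.5],  # dropout percentage
                                   [1, 2, 3])        # number of dropout layers

  best_cfg, best_val = None, float("-inf")
  for rate, n_layers in search_space:
      val_score = train_and_score(rate, n_layers, "validation")
      if val_score > best_val:
          best_cfg, best_val = (rate, n_layers), val_score

  # Only the single selected configuration is ever evaluated on the test set.
  test_score = train_and_score(*best_cfg, "test")
  print(best_cfg, round(best_val, 3), round(test_score, 3))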

But if, in general, you're talking about trying multiple configurations and parameters and submitting the best one, then no, that's not dirty; it is standard practice. Realistically, I'm not sure how else you could do this, because trying a new configuration is indistinguishable from trying a new architecture anyway. People should report variance on their values, but that can also be computationally expensive, so I definitely understand why it doesn't happen.
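
"Variance on values" just means rerunning the chosen configuration over several seeds and reporting mean +/- standard deviation instead of a single number. A sketch, reusing the hypothetical train_and_score and best_cfg from above; each seed is one full training run, which is exactly where the computational expense comes from:

  import statistics

  scores = []
  for seed in range(5):
      random.seed(seed)  # each seed corresponds to one full (stand-in) training run
      scores.append(train_and_score(*best_cfg, "test"))

  mean, std = statistics.mean(scores), statistics.stdev(scores)
  print(f"test score: {mean:.3f} +/- {std:.3f}")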



