I'm aware of multiple papers by top labs where the state-of-the-art results actually come from some very minor change (e.g., a pre-processing technique, bigger images) that was swept under the rug in the paper. All the math and the complex model had minimal impact on the numeric result, while the actual reason was unpublishable at a top venue.
Is this more evidence that we have really entered the "click-bait" era, where the end result of the research is secondary to getting published at big conferences and gaining some notoriety?
All of the things you mentioned go against everything I was taught about being rigorous with research.
If you're a pro cyclist who isn't doping, good for you! But there's no medal for finishing fourth; you'll be rewarded with medals and sponsorship cash for doping undetectably.
Academics whose rigorous papers aren't quite getting into top-tier journals are in a similar situation.
I strongly disagree here. Researchers, please keep trying to prove that your proposed optimizers are correct, even if you can only manage to prove it for a subclass of problems, and even if it risks publishing an accidentally incorrect proof.
>> and even if it risks publishing an accidentally incorrect proof.
No. If a researcher has even the slightest doubt about the correctness of their proof, they should either spend the time to verify it or talk to an expert. There is no lack of experts in convex optimization in particular. Or, instead of publishing the proof, one could just say "we believe this is provable by these methods".
But one cannot try to shift the burden of verification onto the readers while still claiming the results. That is not how trust is built.
It smells of laziness...
I think everyone is looking for gains from architecture modifications, since more compute and clever hyperparameter tuning can only go so far.
But given that any of these systems requires hyperparameter tuning, and that tuning takes a lot of time, it is inherently hard to distinguish between tuning effort and a genuinely novel architecture. If someone says "X worked for me but Y didn't", tuning could always be the explanation.
It seems like one could only really scientifically distinguish two architectures if the hyperparameter-search methodology were fixed. But that's about the opposite of current practice, as far as my amateur exposure to the field goes.
That would be an apples-to-apples comparison... or best-to-best. At the same time, the fragility of the hyperparameters could be evaluated.
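To make that concrete, here's a minimal sketch of what such a fixed protocol could look like: the same grid, budget, seed, and selection rule applied to every architecture. All names are hypothetical stand-ins (`train_and_eval` would be your real training loop, not a library call):

```python
import itertools
import random

# Hypothetical stand-in for a real training run: trains `arch` with the
# given hyperparameters and returns a validation score (higher is better).
def train_and_eval(arch, lr, batch_size):
    # Placeholder so the sketch runs; substitute your actual training loop.
    return -abs(lr - 1e-3) * 100 - abs(batch_size - 64) / 100 + len(arch) * 0.001

# One fixed protocol, applied identically to every architecture.
GRID = list(itertools.product([1e-4, 3e-4, 1e-3, 3e-3],  # learning rates
                              [32, 64, 128]))            # batch sizes

def best_under_fixed_protocol(arch, budget=6, seed=0):
    rng = random.Random(seed)           # same seed -> same sampled configs
    configs = rng.sample(GRID, budget)  # identical budget for every arch
    scores = [train_and_eval(arch, lr, bs) for lr, bs in configs]
    # Reporting the full score list (not just the max) also exposes how
    # fragile each architecture is with respect to its hyperparameters.
    return max(scores), scores

print(best_under_fixed_protocol("resnet"))
print(best_under_fixed_protocol("transformer"))
```

The point of fixing the protocol is that any remaining score difference is attributable to the architecture, not to one side having tuned harder.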
I really appreciate the detailed analysis in the article.
You simply put another layer of (mixed-integer) non-linear optimization on top of the deep-learning hyperparameters, i.e., the hyperparameters like learning rate, batch size, category weights, even the loss-function composition, become the variables you optimize. The (mixed-integer) non-linear optimization method will have its own set of parameters (they all do), so later you might indeed want to optimize those as well, if you get hold of some hyper-computing device allowing you to compress the first two steps into less than your lifetime. You can probably do this kind of hyper^n optimization as deeply as you like, but I doubt it would lead you anywhere, given how terrible non-linear optimization performance on general functions is and how short the existence of the Universe is.
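For concreteness, a toy sketch of that first extra layer (all names here are made up; `train` stands in for a real training run). Note that the outer search already carries its own knobs, `n_trials` and the sampling ranges, which is exactly where the hyper^2 regress starts:

```python
import math
import random

# Hypothetical inner objective: train a model and return a validation loss.
def train(lr, batch_size, class_weight):
    # Placeholder so the sketch runs; substitute a real training run here.
    return (math.log10(lr) + 3) ** 2 + abs(batch_size - 64) / 64 + class_weight

# The extra layer: a (mixed-integer) search over the hyperparameters.
def outer_search(n_trials=20, seed=0):
    rng = random.Random(seed)
    best_loss, best_cfg = float("inf"), None
    for _ in range(n_trials):
        cfg = dict(
            lr=10 ** rng.uniform(-5, -1),              # continuous variable
            batch_size=rng.choice([16, 32, 64, 128]),  # integer variable
            class_weight=rng.uniform(0.5, 2.0),        # continuous variable
        )
        loss = train(**cfg)
        if loss < best_loss:
            best_loss, best_cfg = loss, cfg
    return best_loss, best_cfg

print(outer_search())
```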
>"You can probably do this kind of hyper^n optimization as deeply as you like but I doubt it would lead you anywhere, given how terrible non-linear optimization performance on general functions is and how short the existence of Universe is."
In practice you don't need to find the actual optimum, though, just do better than the alternatives. This reminds me of "a single hidden layer (if wide enough) can approximate any function". Sure, but in practice a reasonable approximation will arise much more easily with other architectures.
By "101" I thought you meant something like "low level" or "first thing you learn/try". Did you mean something else?
After all, the models that work "well enough" now, that attain state-of-the-art results, don't have to have the absolute best parameters, but rather parameters that come from rules of thumb, trial-and-error, and sometimes the 101-optimizations mentioned. I mean, we know things work, there are working results; the challenge is determining exactly how those results arise.
I hope the research from the community becomes mainstream soon.
1. Smarter grid search - you only test parameter settings that look promising (based on some criterion). Instead of covering the whole grid, you want to be smart about which points to try out.
2. You might still do grid search, but preferentially allocate resources to configurations based on how promising they seem. Hyperband is an example (a simplified sketch of this idea follows the references below).
3. You can do both: be smart about picking parameters, and for the ones you pick, preferentially allocate resources. E.g., *Bayesian Optimization and Hyperband* (BOHB).
Hyperband: https://arxiv.org/abs/1603.06560
BOHB: https://arxiv.org/abs/1807.01774
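Here is a minimal sketch of the successive-halving core that Hyperband builds on: each round, the worst configurations are dropped and the survivors get more budget. Hyperband proper wraps this loop in several brackets trading off the number of configs against per-config budget; this is only the inner idea, with hypothetical names (`evaluate` stands in for a real partial training run):

```python
import random

# Hypothetical stand-in: train config `cfg` for `budget` units of resource
# (e.g. epochs) and return a validation score (higher is better).
def evaluate(cfg, budget):
    # Placeholder so the sketch runs; the noise mimics training stochasticity.
    return cfg["quality"] * budget + random.gauss(0, 0.1)

def successive_halving(configs, min_budget=1, eta=3):
    """Each round, keep the top 1/eta configs; survivors get eta x the budget."""
    budget = min_budget
    while len(configs) > 1:
        scored = sorted(((evaluate(c, budget), c) for c in configs),
                        key=lambda s: s[0], reverse=True)  # best first
        configs = [c for _, c in scored[:max(1, len(scored) // eta)]]
        budget *= eta  # survivors get more resources next round
    return configs[0]

# Toy usage: nine random configurations, each with a latent "quality".
pool = [{"quality": random.random()} for _ in range(9)]
print(successive_halving(pool))
```

The appeal is that bad configurations are killed cheaply after a small budget, so most of the compute goes to the handful of promising ones.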
Anecdotal evidence: I wrote a technical report describing an algorithm and its novel applications, etc. It was, in my opinion, a good compromise between a high-level description of the concepts and documentation of the actual code (and thus a guide for implementing it).
I tried to turn it into a scientific paper, and it was rejected. I then changed the entire notation to a "math-y" notation that actually "obfuscates or impresses rather than clarifies" (a direct quote from the article), and it was accepted, with positive comments on the very same aspects that got it rejected in the first place. This was at a Computer Science conference.
Just an aside, it's not just psychology. When you google "replication crisis" most articles say "... in science" for good reason.
When mentioning "replication crisis" I should probably add articles like this one too: https://www.insidehighered.com/news/2018/04/05/scholar-chall...
> to conclude that, although misconduct and questionable research methods do occur in “relatively small” frequencies,
which is kind of expected in a career track that doesn't exactly come with hedge-fund-type money -- with so little to gain, why would many people be dishonest just for grants?
> there is “no evidence” that the issue is growing.
which might mean that the issue isn't in how science is practiced nowadays, but that the scientific process (grants, publication, peer review, research programs) has always been broken.
It's not clear that any of these is the case, but it's depressing to see "well, people are generally not crooks and the game has always been played this way" as an optimistic view of the replication crisis.