Ignoring the fact that the title of this post is misleading (and not even used by the linked article), I was interested to read that MaxNorm was _so_ effective. In my experience it is rarely used in state-of-the-art ConvNets, though maybe that's because they are trained on large datasets like ImageNet, where overfitting is less of an issue? Weight decay/L2 regularization seems almost ubiquitous in comparison.
Have other HN readers found MaxNorm to be that useful? Am I missing out?
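For anyone who hasn't used it: max-norm is just a projection applied after each weight update — clip each unit's incoming weight vector back to a ball of radius c. A rough numpy sketch (the constant c and the axis convention here are typical choices, not from the article):

```python
import numpy as np

def max_norm_project(W, c=3.0, axis=0):
    """Rescale any column of W whose L2 norm exceeds c down to norm c."""
    norms = np.linalg.norm(W, axis=axis, keepdims=True)
    scale = np.clip(norms, 0.0, c) / (norms + 1e-12)
    return W * scale

# Example: large random weights get projected onto the c-ball
W = np.random.randn(64, 10) * 5.0
W_proj = max_norm_project(W, c=3.0)
# max per-column norm is now at most 3.0
print(np.linalg.norm(W_proj, axis=0).max())
```

Columns already inside the ball are left (almost) untouched; only the oversized ones get rescaled, which is why it behaves differently from L2 decay (which shrinks everything all the time).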
It's interesting to see all the techniques listed, but there is no indication of when and how to use any of them. Is it better to start with dropout or with weight regularization? Can I use both? (These are rhetorical questions...)
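For what it's worth, nothing stops you combining them — dropout and L2 act at different points in training. A toy SGD step on linear regression applying both (inverted dropout on the inputs, L2 added to the gradient; hyperparameters here are arbitrary, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 32 samples, 8 features, linear target
X = rng.normal(size=(32, 8))
y = X @ rng.normal(size=8)

W = np.zeros(8)
lr, l2, p = 0.1, 1e-3, 0.5  # learning rate, L2 strength, dropout prob

for _ in range(100):
    # Inverted dropout: zero inputs w.p. p, rescale survivors by 1/(1-p)
    mask = (rng.random(X.shape) >= p) / (1.0 - p)
    Xd = X * mask
    grad = Xd.T @ (Xd @ W - y) / len(y)
    # L2 regularization contributes l2 * W to the gradient (weight decay)
    W -= lr * (grad + l2 * W)
```

The two don't interfere: dropout perturbs the forward pass, while the L2 term just gets summed into the gradient, so using both is common in practice.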
And as another commenter said, none of this is unique to TF.