
A New Angle on L2 Regularization - lainon
https://thomas-tanay.github.io/post--L2-regularization/
======
abenedic
I thought this was a really good example of using images and examples to make
a point. I have seen a few other papers like it, but this is one of the best
executed. I hope more people do this in the future.

~~~
stared
There is the [https://distill.pub/](https://distill.pub/) journal.

------
madhadron
I feel like this boils down to: imagine a distribution from which you draw
data sets for fitting a hyperplane that separates the data into two classes.
The direction of the hyperplane has lower variance if you impose
regularization on the norm.

In a high-dimensional linear space, you can add an epsilon to lots and lots of
dimensions in such a way that you cross that hyperplane, and not all the norm
regularization in the world will save you.
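
To put a number on that second point, here is a minimal sketch (made-up
unit-norm weights and a synthetic point, nothing from the article): with a
per-coordinate budget eps, the worst-case perturbation shifts the score by
eps * ||w||_1, which grows like sqrt(d) even when ||w||_2 is pinned to 1.

```python
import numpy as np

rng = np.random.default_rng(0)

for d in [10, 1_000, 100_000]:
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)          # "regularized": L2 norm pinned to 1
    x = rng.standard_normal(d)
    y = np.sign(w @ x)              # class of the clean point

    eps = 0.01                      # tiny per-coordinate budget
    delta = -eps * y * np.sign(w)   # worst-case L-infinity perturbation
    shift = eps * np.abs(w).sum()   # score moves by eps * ||w||_1
    crossed = np.sign(w @ (x + delta)) != y
    print(f"d={d:>6}  score shift={shift:.2f}  crossed={crossed}")
```

For small d the 0.01-per-coordinate nudge does nothing; at d = 100,000 it
moves the score by roughly 2.5 standard deviations and crosses the hyperplane.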

Both statements are fairly obvious if you've studied functional analysis and
probability.

This is a case where pre-ML computer vision didn't have the same problem. I've
always felt that something got lost in the transition.

------
plopilop
I've not read everything yet, but the adversarial attacks remind me of an
article I read about evading SVMs and perceptrons via simple gradient
descent.

Biggio et al., "Evasion Attacks Against Machine Learning at Test Time",
[http://www.ecmlpkdd2013.org/wp-content/uploads/2013/07/527.pdf](http://www.ecmlpkdd2013.org/wp-content/uploads/2013/07/527.pdf)
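
For a linear model that kind of attack is nearly a one-liner, since the
gradient of the decision function with respect to the input is just the
weight vector. A rough sketch (synthetic 2-D data and a scikit-learn SVM;
this is not the paper's code):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(+1, 1, (100, 2))])
y = np.array([-1] * 100 + [+1] * 100)
clf = LinearSVC().fit(X, y)

w = clf.coef_[0]
x0 = np.array([2.0, 2.0])           # a point deep inside the positive class
x = x0.copy()
while clf.predict([x])[0] == +1:    # descend g(x) = w.x + b until the label flips
    x -= 0.05 * w / np.linalg.norm(w)

print("size of the evading perturbation:", np.linalg.norm(x - x0))
```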

------
jclos
If it hasn't been already, it should be submitted to distill.pub.

~~~
ttanay
Thanks! We did write this article with Distill in mind (and we used the
Distill template). Unfortunately, it didn't make it through the selective
review process. The three reviews and my answers are accessible on the
GitHub repository if you're interested:
[https://github.com/thomas-tanay/post--L2-regularization/issues](https://github.com/thomas-tanay/post--L2-regularization/issues)

------
logane
We took a look at adversarial examples for linear classifiers (and, more
generally, at the properties that adversarial training induces) here:
[https://arxiv.org/abs/1805.12152](https://arxiv.org/abs/1805.12152). For
$\ell_\infty$ adversarial examples on linear classifiers, we found that
adversarial training forces a tradeoff between the $\ell_1$ norm of the
weights (which is directly associated with adversarial accuracy) and standard
accuracy.
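
For reference, the reason the $\ell_1$ norm appears is that the worst-case
$\ell_\infty$ perturbation of a linear classifier has a closed form:
$\min_{\|\delta\|_\infty \le \epsilon} y \, w^\top (x+\delta) = y \, w^\top x - \epsilon \|w\|_1$,
achieved by $\delta = -\epsilon y \, \mathrm{sign}(w)$, so robust training
behaves like ordinary training with an $\ell_1$ penalty on the weights. A
quick numerical check of the identity (synthetic numbers, minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
w, x = rng.standard_normal(50), rng.standard_normal(50)
y, eps = 1.0, 0.1

delta = -eps * y * np.sign(w)                # the worst-case perturbation
assert np.isclose(y * (w @ (x + delta)), y * (w @ x) - eps * np.abs(w).sum())
```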

It looks like this article works through something vaguely similar for
$\ell_2$ adversarial examples. It would be interesting to compare the
authors' approach with explicit adversarial training.

------
TheCabin
Just want to say thanks to the author for going the extra mile and making the
article as visual as possible.

------
verdverm
From my understanding, Capsule Networks seem to be the answer to the last
paragraph. We need hardware gains before we can realize their potential.
TPU2 Pods?

