
Breaking 7 / 8 of the ICLR 2018 adversarial example defenses - anishathalye
https://github.com/anishathalye/obfuscated-gradients
======
anishathalye
Hi HN! I’m one of the researchers who worked on this. Both Nicholas Carlini (a
co-author on this paper) and I have done a bunch of work on machine learning
security (specifically, adversarial examples), and we’re happy to answer any
questions here!

Adversarial examples can be thought of as “fooling examples” for machine
learning models. For example, for image classifiers: given an image x that
is classified correctly, an adversarial example is an image x* that is
visually similar to x but is classified incorrectly.
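
To make that concrete, here’s a minimal sketch of crafting one such example
with the Fast Gradient Sign Method. This is just an illustration, not the
attack from our paper or the code in the repo; it assumes a PyTorch image
classifier `model`, an input `x` in [0, 1], and its correct class `label`:

    import torch
    import torch.nn.functional as F

    def fgsm(model, x, label, eps=8 / 255):
        # Start from the original image and track gradients w.r.t. the pixels.
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), label)
        loss.backward()
        # Step in the direction that increases the loss, at most eps per pixel.
        x_adv = x_adv + eps * x_adv.grad.sign()
        return x_adv.clamp(0, 1).detach()

The resulting image looks essentially identical to x, but the model assigns
it the wrong label.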

We evaluated the security of 8 defenses accepted at ICLR 2018 (one of the top
machine learning conferences) and found that 7 are broken. Our attacks succeed
where previous ones failed because we show how to work around defenses that
cause gradient-descent-based attack algorithms to fail.
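
One of the techniques we use for this, Backward Pass Differentiable
Approximation (BPDA), targets defenses that insert a non-differentiable
preprocessing step in front of the classifier. Roughly: run the real defense
on the forward pass, but substitute a differentiable approximation (often
just the identity) on the backward pass so gradient descent still works. A
rough sketch of the idea, using a made-up quantization “defense” and a
placeholder `model` rather than any actual defense from the paper:

    import torch
    import torch.nn.functional as F

    class Quantize(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            # Non-differentiable "defense": quantize pixels to 8 levels.
            return torch.round(x * 7) / 7

        @staticmethod
        def backward(ctx, grad_output):
            # BPDA: approximate the defense's gradient with the identity,
            # since quantization is close to x -> x.
            return grad_output

    def bpda_step(model, x_adv, label, step_size=2 / 255):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        # Attack the full pipeline: defense followed by the classifier.
        logits = model(Quantize.apply(x_adv))
        loss = F.cross_entropy(logits, label)
        loss.backward()
        return (x_adv + step_size * x_adv.grad.sign()).clamp(0, 1).detach()

Iterating steps like this lets a gradient-based attack optimize through a
defense whose true gradient is zero, undefined, or deliberately obscured.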

