
Real and Stealthy Attacks on State-Of-the-Art Face Recognition [pdf] - okket
https://www.cs.cmu.edu/~sbhagava/papers/face-rec-ccs16.pdf
======
unabridged
The fact that humans can easily see through the disguise means there are huge,
glaring faults in the recognition algorithm. These types of attacks will
probably be wiped out one day by a single breakthrough.

You can't fool a human by covering up a small percentage of something with
noise. Humans can recognize parts of a face pretty well, so classifying smaller
parts of an image separately may help defend against this (and would also make
it possible to detect composite images, for example where the eyes and mouth of
different people are combined).

Also, humans instantly detect that those glasses are noise; there probably
needs to be a filter that removes obvious noise before the face recognition
begins.
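
Something like this toy pre-filter plus part-based check is what I have in
mind; the part boxes and per-part classifiers here are made up for
illustration, not anything from the paper:

    import numpy as np
    from scipy.ndimage import median_filter

    def strip_obvious_noise(face_img):
        # A median filter knocks out high-frequency patterns (like a printed
        # adversarial texture) before the recognizer ever sees the image.
        return median_filter(face_img, size=3)

    def recognize_by_parts(face_img, part_boxes, part_classifiers):
        # part_boxes: dict of part name -> (top, bottom, left, right)
        # part_classifiers: dict of part name -> callable returning an identity label
        votes = []
        for name, (t, b, l, r) in part_boxes.items():
            votes.append(part_classifiers[name](face_img[t:b, l:r]))
        # If the eyes, nose, and mouth disagree about who this is, treat the
        # image as a possible composite or attack instead of forcing a match.
        return votes[0] if len(set(votes)) == 1 else None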

~~~
rawnlq
On the flip side, I wonder how many of the face recognition failures humans
make would be considered "huge glaring faults" by machines?

For a famous example, take the
[https://en.wikipedia.org/wiki/Thatcher_effect](https://en.wikipedia.org/wiki/Thatcher_effect)
(where a face will look normal upside down even if you individually flip the
eyes and mouth). Or the
[https://en.wikipedia.org/wiki/Hybrid_image](https://en.wikipedia.org/wiki/Hybrid_image)
(where you see two different images depending on how far away you are standing).
I bet machines can already handle those cases better than we do, or eventually will.

Is trying to emulate our own recognition system the best possible approach? It
might end up inheriting all our edge cases as well.

(For another example, relevant to Halloween, with a bit of face paint you can
fool any human:
[http://gfycat.com/DeliriousEdibleAustraliansilkyterrier](http://gfycat.com/DeliriousEdibleAustraliansilkyterrier).
But you can totally imagine a sufficiently advanced algorithm that is capable
of ignoring color information and reconstructing a 3D surface for comparison
instead.)

~~~
TrainedMonkey
Regarding the Thatcher effect, one of the Hinton lectures from
[https://www.coursera.org/learn/neural-networks](https://www.coursera.org/learn/neural-networks)
talks about difficulties in recognizing reflected/upside-down objects. I can't
easily look it up, so here is my paraphrasing:

Humans exhibit a noticeable delay when recognizing and reasoning about
reflected objects compared to normal ones. He then gives a few examples, such
as answering "is this a left-foot or a right-foot shoe", and suggests that
brains do rotational transformations as part of object recognition. The
machine solution is probably similar to the human one: do a bunch of
transformations and pick the highest-confidence one.

The interesting part is that humans can answer the "is this a shoe" question
immediately, but take longer to reason about reflected images.
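
A rough sketch of that "try a bunch of transformations and keep the best"
idea; the classifier interface here (returning a label and a confidence) is
hypothetical:

    import numpy as np

    def classify_with_transforms(img, classifier):
        # Run the classifier on a few canonical transforms of the input and
        # keep the most confident answer.
        candidates = [
            img,                # as-is
            np.fliplr(img),     # mirrored
            np.rot90(img, 2),   # upside down
        ]
        best_label, best_conf = None, float("-inf")
        for candidate in candidates:
            label, conf = classifier(candidate)
            if conf > best_conf:
                best_label, best_conf = label, conf
        return best_label, best_conf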

------
kabdib
When we were doing face recognition on Kinect, I'd pass by the Q/A areas and
_eventually_ got used to seeing Richard Nixon, Mao Tse-tung, Marilyn Monroe
and others dancing around, exercising the Kinect software stack. The masks
were pretty startling at first.

I think the next version of the camera did analysis in IR to try to detect
masks; I don't know how effective it was.

------
savanaly
I'm curious: since the facial recognition algorithms they are testing are
created with ML, would adding instances of someone wearing these glasses to
the training set result in a version that wouldn't be fooled? Or is the method
something ML algorithms are inherently weak against? Is this an exploit that
could be patched, in other words?

~~~
username223
Adding some instances of these glasses to the training set would probably fix
the model for these glasses, but then it might break for lipstick, hats,
scarves, other glasses, or what-not.

The most interesting thing to me about this is that the models could be so
badly fooled while obscuring such a small amount of the face. It shows the
limits of solving problems by training zillion-parameter models to categorize
zillion-dimensional data points: they won't necessarily be doing what you
think they are, and may go completely off the rails when presented with data
not close enough to their training sets.

~~~
BickNowstrom
> ... so badly fooled while obscuring such a small amount of the face ... may
> go completely off the rails when presented with data not close enough to
> their training sets.

Older research has looked at this, in the context of neural networks in self-
driving cars. The nets take the easy way out: use just the most significant
features (like road markers) to solve the problem. However, training sets may
not contain roads where snow or fog is obscuring these significant features,
which makes the nets fail, because they haven't learned to rely on other
features too. In contrast, human vision is remarkably robust.

[http://repository.cmu.edu/cgi/viewcontent.cgi?article=1176&c...](http://repository.cmu.edu/cgi/viewcontent.cgi?article=1176&context=compsci)
[pdf] "FeatureBoost: A Meta Learning Algorithm that Improves Model Robustness"

~~~
username223
Thank you! My knowledge of how these things work is somewhat general and out-
of-date. I think I remember reading about FeatureBoost back in the day, and
will try to refresh my memory of how it compares to max-margin methods.

------
lifeisstillgood
tl;dr Using spectacle frames with another person's image "perturbed" onto the
frame, you can fool facial recognition algorithms into thinking your face is
someone else's.

(There is a weirder ability to hide a human face using a similar approach.)

This is surprisingly cool - the fact that the algorithm can be fooled is not
amazing, but that they could _find_ a fairly practical attack is quite
impressive.

And at least now a computer can think I look like George Clooney, even if it
won't work in a singles bar.

------
bsenftner
Saw this last night, and since I work on the application side of facial
recognition, I sent the paper and this discussion to our neural R&D lab (on
the other side of the planet, where it was morning for them). The chief
scientist read the paper with interest, ran their images through our system,
and found that none of them worked as the paper claimed. I suspect the
technique described had already been anticipated by some security-industry
technologists and measures put in place to combat these types of attacks;
those who foresaw these developments have had their suspicions confirmed.

~~~
hokkos
The paper presents different techniques. The first one is, as they say in the
threat model, a "white-box scenario: the attacker knows the internals
(architecture and parameters) of the system being attacked". It means the
images or glasses are crafted for a particular instance of an algorithm using
its internal data. In their black-box scenario they "have no access to the
internals of the training process and the model used"; they do report trying
"dodging attacks with randomly colored glasses and found that it worked
immediately for several images" using the previously generated model, but they
did use "Particle Swarm Optimization" to refine their impersonation attack
based on the results of the black-box recognizer. It means you can't take
their images and expect them to work on your model, because what they mostly
claim is that a specially crafted image can impersonate someone against a
white-box or black-box model, using internal features or trial and error.
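
For anyone curious what that black-box step looks like in practice, here is a
toy particle-swarm sketch; the query_model interface (returning the
recognizer's confidence that the wearer is the target) and all the
hyperparameters are my own assumptions, not the paper's code:

    import numpy as np

    def pso_impersonate(query_model, target_id, glasses_shape=(64, 176, 3),
                        n_particles=20, n_iters=100, w=0.7, c1=1.5, c2=1.5):
        # query_model(texture, target_id) -> confidence that the attacker,
        # wearing glasses with this texture, is recognized as target_id.
        dim = int(np.prod(glasses_shape))
        pos = np.random.rand(n_particles, dim)      # candidate textures in [0, 1]
        vel = np.zeros_like(pos)
        score = np.array([query_model(p.reshape(glasses_shape), target_id) for p in pos])
        pbest, pbest_score = pos.copy(), score.copy()
        gbest = pbest[pbest_score.argmax()].copy()
        gbest_score = pbest_score.max()

        for _ in range(n_iters):
            # Standard PSO update: pull each particle toward its personal best
            # and the swarm's global best, with random jitter.
            r1, r2 = np.random.rand(*pos.shape), np.random.rand(*pos.shape)
            vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
            pos = np.clip(pos + vel, 0.0, 1.0)
            score = np.array([query_model(p.reshape(glasses_shape), target_id) for p in pos])
            better = score > pbest_score
            pbest[better], pbest_score[better] = pos[better], score[better]
            if score.max() > gbest_score:
                gbest, gbest_score = pos[score.argmax()].copy(), score.max()

        return gbest.reshape(glasses_shape), gbest_score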

~~~
bsenftner
Yes, I gathered that from the paper. In discussion with our lead scientist, he
found it interesting and brought up another recent paper using genetic
algorithms to reverse-engineer any black-box service, and how current research
focuses on that type of attack. It looks like the larger security industry is
at least one step ahead of this type of research.

------
EGreg
The real way to fool facial recognition via anything below UV light is with
these bad boys:

[https://m.youtube.com/watch?v=770nI13-MJs](https://m.youtube.com/watch?v=770nI13-MJs)

It takes care of the liveness test as well. However, we still have a long way
to go to overcome the uncanny valley.

~~~
EGreg
The real question is, how would you fool wifi silhouette technology?

[https://gizmodo.com/wifi-networks-can-now-identify-who-you-a...](https://gizmodo.com/wifi-networks-can-now-identify-who-you-are-through-wall-1738998333)

------
nl
In the real world, these attacks are fairly trivial to work around - you use
an ensemble of independent architectures (say ResNet/Inception/VGG) and train
them independently on the faces.

(Also, it's well known that if you include the perturbations in the training
set, the networks won't be fooled by them.)
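
A minimal sketch of the ensemble defence, assuming each model exposes a
probability vector over identities (nothing here is a specific library API):

    import numpy as np

    def ensemble_identify(face_img, models):
        # Average the softmax outputs of independently trained recognizers
        # (e.g. hypothetical ResNet/Inception/VGG heads) and pick the
        # consensus identity.
        probs = np.mean([m(face_img) for m in models], axis=0)
        return int(np.argmax(probs)), float(probs.max())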

~~~
BickNowstrom
Ensembles are also vulnerable to adversarial images.

From one of the canonical papers on adversarial images:

> In addition, the specific nature of these perturbations is not a random
> artifact of learning: the same perturbation can cause a different network,
> that was trained on a different subset of the dataset, to misclassify the
> same input.

> ... suggest that adversarial examples are somewhat universal and not just
> the results of overfitting to a particular model or to the specific
> selection of the training set

[https://arxiv.org/abs/1312.6199](https://arxiv.org/abs/1312.6199)

More recent research shows that one can make adversarial images which fool
ensembles. Once you have these images, there is a very high chance they will
also fool any individual architecture you pass them to next.

[https://twitter.com/goodfellow_ian/status/790330213963923456](https://twitter.com/goodfellow_ian/status/790330213963923456)

These attacks show the inherent flaws of ML models (not just convnets, as per
[http://karpathy.github.io/2015/03/30/breaking-convnets/](http://karpathy.github.io/2015/03/30/breaking-convnets/) ), and as
far as I know, they are not trivial to work around. Research on adversarial
images is still very active, in part because the military would like to rely
on deep learning, and does not like to mistake a tank for a puppy (and vice
versa).

~~~
nl
[https://arxiv.org/abs/1412.6572](https://arxiv.org/abs/1412.6572) is a
Goodfellow et al. (i.e., the person whose tweet you cite above) paper where
they show that adversarial examples do apply across architectures (as you
state above), but also that using the adversarial examples during training has
a regularization effect and improves performance:

[https://arxiv.org/pdf/1511.05432v3.pdf](https://arxiv.org/pdf/1511.05432v3.pdf)
takes a similar approach:

 _We show that adversarial training of ANNs is in fact robustification of the
network optimization, and that our proposed framework generalizes previous
approaches for increasing local stability of ANNs. Experimental results reveal
that our approach increases the robustness of the network to existing
adversarial examples, while making it harder to generate new ones.
Furthermore, our algorithm improves the accuracy of the network also on the
original test data._

I agree this is an area of active research, but not that it is an inherent
flaw.
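
For reference, the adversarial-training idea boils down to something like the
following sketch (FGSM as in arXiv:1412.6572; the 50/50 loss mix, the epsilon,
and the PyTorch framing are my assumptions, not taken from either paper):

    import torch
    import torch.nn.functional as F

    def fgsm_adversarial_training_step(model, optimizer, x, y, eps=0.03):
        # Craft adversarial versions of the batch with the fast gradient sign
        # method, then train on a mix of clean and adversarial examples.
        x = x.clone().detach().requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x), y), x)[0]
        x_adv = (x + eps * grad.sign()).clamp(0.0, 1.0).detach()

        optimizer.zero_grad()
        loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
        return loss.item()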

------
gohrt
historical context: [https://cvdazzle.com/](https://cvdazzle.com/)

------
smoyer
Very cool ... we should all get eyeglass frames made to trick the system into
thinking we're Guy Fawkes.

------
ChuckMcM
Nice! Eye wear fashion will be changing in 3.. 2.. 1..

I have always been more interested in defeating the whole "tagging where
cameras have seen you" thing without makeup or tattoos, and this is great for
that. But if I can leave behind a false trail pointing at someone else? Wow,
that is even better.

------
guelo
And thus a new cat and mouse game begins.

------
hammeiam

      ...Colin Powell, a 79-year-old white male;

~~~
some-guy
Well he does have more Irish ancestry than African :)

------
debt
Iris scanners, Minority Report style, ya dig?

------
DennisAleynikov
This is incredible. The pattern looks great, and I'm really impressed with how
well it dodges!

