Pretty cool stuff, but also if my assumption is correct it means that if you _didn't_ use the widely available ImageNet weights for Inception v3 then this attack would be less effective (or might not work at all). Given that most actors you don't want recognizing your images don't open-source their weights, this may not scale or be very helpful...
A simple example would be if an image identifies as a basketball but you blur it slightly and it identifies as a cat, you might be looking at an adversarial attack.
The simpler ones don't even really work after basic transformations (like rotating, scaling, etc.) on the target model, so those attacks are often brittle. But there are lots of techniques, and some of them are more robust across more models and transformations. The tradeoff is sometimes that the manipulation becomes more noticeable to the human eye.
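For concreteness, here's a rough sketch of that kind of stability check, assuming a local torchvision Inception v3 stands in for the target model; the file name and the particular transforms are illustrative assumptions, not part of the tool being discussed:

```python
# Hedged sketch: does a (possibly adversarial) image keep its predicted label
# under cheap transforms like rotation, scaling, and slight blur?
import torch
from torchvision import models, transforms
from PIL import Image, ImageFilter

model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1).eval()
preprocess = transforms.Compose([
    transforms.Resize(299),
    transforms.CenterCrop(299),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def top_label(img):
    with torch.no_grad():
        return model(preprocess(img).unsqueeze(0)).argmax(dim=1).item()

img = Image.open("maybe_adversarial.png").convert("RGB")  # hypothetical input
base = top_label(img)
variants = {
    "rotate 5 deg": img.rotate(5),
    "scale 0.9x": img.resize((int(img.width * 0.9), int(img.height * 0.9))),
    "slight blur": img.filter(ImageFilter.GaussianBlur(radius=1)),
}
for name, v in variants.items():
    print(name, "-> label changed" if top_label(v) != base else "-> label stable")
```

A brittle attack will typically lose its effect under one of these, which is also the intuition behind the blur test in the comment above.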
Adversarial attacks are a bit of a cat-and-mouse game between new attacks and new attempts to find where they fail.
If these are hacking the specific, underspecified realization of the classifier (i.e. the particular set of public weights trained from a specific seed and visit order through the data), the adversarial examples are probably just as fragile as the classifier.
Practically speaking it wouldn't fool anyone for more than a split second, not least since our input is video instead of snapshots, but it's an interesting thing to wonder about. Maybe we could build an AI which would be in most senses as smart as us, but which would be more vulnerable to such things?
It's not entirely the same method as these "adversarial noise" inputs, but some optical illusions are pretty close in how they mess with the localized parts of our optical processing (e.g. https://upload.wikimedia.org/wikipedia/commons/d/d2/Caf%C3%A...).
We can't backprop the human vision system to find "nearby" misclassifications as easily, and presumably our own "classifiers" are more robust to such pixel-scale perturbations, but especially lower-resolution images can trip us up quite easily too (see e.g. https://reddit.com/r/misleadingthumbnails/).
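As a toy illustration of what "backprop to find nearby misclassifications" looks like for a CNN, here is a minimal fast-gradient-sign (FGSM) sketch. The model choice, epsilon, and input file are assumptions for illustration, not the method of the tool under discussion:

```python
# Hedged FGSM sketch: nudge the pixels in the direction that most increases the
# model's loss on its own prediction, producing a "nearby" misclassification.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1).eval()
prep = transforms.Compose([transforms.Resize(299), transforms.CenterCrop(299), transforms.ToTensor()])
norm = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

x = prep(Image.open("photo.jpg").convert("RGB")).unsqueeze(0)  # hypothetical input
x.requires_grad_(True)

logits = model(norm(x))
pred = logits.argmax(dim=1)                 # the model's current prediction
F.cross_entropy(logits, pred).backward()    # gradient of the loss w.r.t. the pixels

epsilon = 4 / 255                           # small, barely visible perturbation budget
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

print("before:", pred.item(), "after:", model(norm(x_adv)).argmax(dim=1).item())
```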
The proliferation of tools like this and the "LowKey" paper/tool linked below (an awesome paper!) will fundamentally change the distribution of image data that exists. I think that widespread usage of this kind of tool should trend towards increasing the irreducible error of various computer vision tasks (in the same way that long term adoption of mask wearing might change the maximum accuracy of facial recognition).
Critically, while right now the people who do something like manipulate their images will probably be very privacy conscious or tech-interested people, tools like this seriously lower the barrier to entry. It's not hard to imagine a browser extension that helps you perturb all images you upload to a particular domain, or something similar.
Hard to see why that wouldn't be the case, especially for techniques that are general rather than exploiting bugs in individual models. As long as a person can quickly tell the difference, it's within the grasp of deep learning for ~perception problems, and the economics of the arms race determines the rest of what happens and when.
One thing that seems unique to technologies that are mostly just statistical learning is that each new manipulation approach can basically widen the distribution of possible inputs. In particular, I'm thinking that as more obfuscation and protest technologies like this are made public, the distribution of "images of faces available for computer vision training" becomes more complex. That is to say, whenever an adversarial tool creates a combination of pixels that has never been seen before, if that "new image" can't be reduced back to a familiar image via de-noising or pre-processing, the overall difficulty of computer vision tasks increases.
All a long-winded way of saying: I think for ML systems there's a unique opportunity to "stretch the distribution of inputs" that may not exist for other security arms races (a rough sketch of the de-noising check is below).
Totally agree that the economics of the arms race(s) will be a huge factor in determining how much of an impact obfuscation and protest can have.
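To make the "can it be reduced back via de-noising or pre-processing" question concrete, a quick check might look like the following. The model, the preprocessing choices, and the file name are illustrative assumptions, not a claim about how any real pipeline handles uploads:

```python
# Hedged sketch: re-classify a (hypothetical) perturbed image after cheap
# preprocessing, to see whether the perturbation survives or is "denoised away".
import io
import torch
from torchvision import models, transforms
from PIL import Image, ImageFilter

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
prep = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def label(img):
    with torch.no_grad():
        return model(prep(img).unsqueeze(0)).argmax(1).item()

def jpeg(img, quality=75):
    # round-trip through JPEG compression, a common cheap "denoiser"
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

perturbed = Image.open("perturbed_upload.png").convert("RGB")  # hypothetical file
print("as uploaded:  ", label(perturbed))
print("after JPEG 75:", label(jpeg(perturbed)))
print("after blur:   ", label(perturbed.filter(ImageFilter.GaussianBlur(1))))
```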
Ideally I’d like to see something like this be part of the camera filter itself.
Why can’t Apple, if they choose to do so, just add something like this as part of their camera app itself?
That's the power of open source.
Similar pitch -- use a small adversarial perturbation to trick a classifier -- but LowKey is targeted at industry-grade black-box facial recognition systems, and also takes into account the "human perceptibility" of the perturbation used. Manages to fool both Amazon Rekognition and the Azure face recognition systems almost always.
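As I read it, the LowKey objective is roughly "push the face embedding far from the original while keeping the change hard to see." A very rough sketch of that trade-off, where `embed` is a hypothetical surrogate face-embedding model and a plain L2 pixel penalty stands in for the perceptual (LPIPS) term the paper actually uses:

```python
# Hedged sketch of an embedding-evasion objective with a visibility penalty.
import torch

def perturb(x: torch.Tensor, embed, steps: int = 100, lr: float = 0.01, lam: float = 10.0):
    """x: original face image as a (1, 3, H, W) tensor in [0, 1].
    embed: hypothetical surrogate face-embedding network."""
    target = embed(x).detach()
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = (x + delta).clamp(0, 1)
        # push the embedding away from the original, keep the visible change small
        loss = -torch.norm(embed(x_adv) - target) + lam * torch.norm(delta)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).clamp(0, 1).detach()
```

As I understand it, the actual paper optimizes against an ensemble of surrogate face recognition models, which is what lets the perturbation transfer to black-box systems it was never optimized against.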
However, machine learning is nowhere near what we'd consider AI, in the sense of something equivalent to our intelligence.
You can compare machine learning to training a hamster to jump on command. If you repeat the training enough times, the hamster will jump. But change anything in the environment and it won't.
Machine learning is just a hamster that has been trained thousands of times.
It can do one thing, sometimes quite well, but it is still only as intelligent as a hamster.
Machine learning does not aim to become an intelligence. It is just a well-trained hamster. Nothing more.
It is just a fuzzy algorithm.
That is why it is so easy to fool the algorithm. I hesitate to call machine learning any kind of AI, precisely because it generates this kind of confusion.
When it comes to developing real AI, we are nowhere near it currently. What we do have are machine-based models that are easier to brute-force train with the processing power we have today.
At a mile high conceptual level, AI is nothing but a program created by a computer based on the data it is provided
Which is why it is extremely easy to fool using techniques that it is not trained to handle, today, but might be able to handle tomorrow. It is a race...
I found a quote from Geoff Hinton where he talked about this last year.
“I can take an image and a tiny bit of noise and CNNs will recognize it as something completely different and I can hardly see that it’s changed. That seems really bizarre and I take that as evidence that CNNs are actually using very different information from us to recognize images,” Hinton said in his keynote speech at the AAAI Conference.
“It’s not that it’s wrong, they’re just doing it in a very different way, and their very different way has some differences in how it generalizes,” Hinton says.
This. In a nutshell, every sort of algorithm we call "AI" today is a reductive pattern matcher. This limitation isn't due to computational capacity or even, IMO, algorithm design, but due to our collective lack of understanding of how intelligence itself works. We'll get there eventually, but not for a long while.
Your brain is but a preprogrammed, biological computer that reacts to data obtained from its interfaces and attached peripherals.
This a common refrain but fairly obviously untrue. It assumes there's some secret sauce in human brains that makes us "intelligent" whereas AI is "just a machine".
It's pretty clear that human brains are just programs. Extraordinarily complicated highly optimised programs, sure. But nobody has even found a shred of evidence that there's anything fundamentally different to programs in them.
Thinking otherwise is along the same lines as thinking that animals don't have feelings.
Every time there's an advance in AI the "it's not really intelligent" goalpost shifts. Clearly intelligence is a continuum.
Artificial neural networks don't have "memory addresses" to store data in the same way that a conventional program does either. But they can still store data. GPT-3 knows the first page of Harry Potter, but if you look through its weights you won't find any of the text. The knowledge is distributed somehow (in a way that we don't fully understand).
Despite that GPT-3 is clearly a program.
My general thought is, that since our brain is made of matter, it can be dissected and understood and eventually copied. Except, we are reverse engineering millions of years of evolution, which is exceedingly hard! We have had access to the information about all the proteins that make our brain cells for almost two decades, and still, their function has to be teased out in year long experiments. Not to speak of understanding the workings of the Homo Sapiens brain as a whole.
It is also an analog 'computer', the performance of which we have no conceivable chance of matching, at least in the near future and most likely also the distant future.
That's not the case. Or rather, we don't know, we only have models and some are useful. In your statement there's a whiff of you having only a hammer, and everything looking like a nail.
I'm not saying I entirely disagree, only that the computer analogy is only an analogy, and has problems and detractors.
For example, this from 2005: "In cognitive science, the interdisciplinary research field that studies the human mind, modularity is a very contentious issue. There exist two kinds of cognitive science, computational cognitive science and neural cognitive science. Computational cognitive science is the more ancient theoretical paradigm. It is based on an analogy between the mind and computer software, and it views mind as symbol manipulation taking place in a computational system (Newell and Simon, 1976). More recently a different kind of cognitive science, connectionism, has arisen, which rejects the mind/computer analogy and interprets behavior and cognitive capacities using theoretical models which are directly inspired by the physical structure and way of functioning of the nervous system. These models are called neural networks—large sets of neuronlike units interacting locally through connections resembling synapses between neurons. For connectionism, mind is not symbol manipulation. Mind is not a computational system, but the global result of the many interactions taking place in a network of neurons modeled with an artificial neural network."
Google is making it nearly impossible for me to get a URL for this, here's a link I hope works:
So it's a little more sophisticated than just adding random noise, it's adding very specific quantities of noise to very specific locations, which are based on perfect knowledge of how the predictive system (the deep model) works.
Is this stuff interesting? Absolutely. Is it worth studying? Yes, again. Does it mean that CNNs as we know them are poor computer vision systems and fundamentally flawed? No. It's a limitation of existing deep models, and one which may be overcome eventually. (A sketch of how the gradients produce those "specific quantities of noise" is below.)
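For what it's worth, under white-box assumptions the "specific quantities of noise in very specific locations" fall straight out of the model's own gradients. A minimal projected-gradient-descent (PGD) sketch with illustrative hyperparameters; `model` here is any differentiable classifier:

```python
# Hedged PGD sketch: repeatedly step in the gradient direction that raises the
# loss, while projecting back into a small epsilon ball around the original image.
import torch
import torch.nn.functional as F

def pgd(model, x, y, epsilon=8 / 255, alpha=2 / 255, steps=10):
    """x: (N, 3, H, W) images in [0, 1]; y: true labels."""
    x = x.clone().detach()
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()       # step to increase the loss
        x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)   # project into the epsilon ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```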
Is the idea that it's useful if you're trying to stop targeted facial recognition of individual people?
There's a lot of work on compressing/denoising images so that only the human-salient parts are preserved, and without seeing this hold up past that kind of processing I think it's better to interpret "adversarial" in the machine learning sense only, where "adversarial" means useful for understanding how models work, but not carrying any strong security implications.
I also don't see how this would do much against object recognition or face recognition. More insight to the types of recognition this actually fights against would be helpful.
That’s precisely the point: you are creating noise that humans are insensitive to, but that severely affects the AI.
The idea, as I understand it, is that if you need to upload an image (of yourself, for instance), you can use this to complicate matters for AIs by uploading the modified picture.
"The ai does not love you, the ai does not hate you. But you are made out of atoms, it can use for something else."
On the thought experiment side, I think the moral implications cut both ways. Mass image recognition is not always bad - think about content moderation or detecting the transfer of images of abuse. As a society we want AI to flag these things.
Shame, I thought I would be able to trick Google Images and stop giving away answers for my movie quiz game that easily.
The only method that works, if somewhat unreliably, as an anti-cheat measure is to flip the image horizontally. It fools Google Images a lot of the time.
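Something like this, assuming Pillow is available (the file names are made up):

```python
# Horizontal flip of a quiz still; ImageOps.mirror flips left-to-right.
from PIL import Image, ImageOps

img = Image.open("quiz_still.jpg")
ImageOps.mirror(img).save("quiz_still_flipped.jpg")
```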
I'd sooner spend the effort on legal challenges.
> it works best with 299 x 299px images that depict one specific object.
Wow. How incredibly useful.
In the end, this is a completely useless exercise and will not have any impact on mass image recognition. For this to even work the attack needs to be tailored to the exact weights in the neural network that is being attacked.