Fooling Neural Networks [pdf] (illinois.edu)
149 points by dvkndn on Aug 6, 2021 | 37 comments



I have a background in classic image processing and machine vision, and back in the olden days we had the opposite problem: algorithms were just too specific to build useful applications. It's easy to detect lines and circles with a Hough Transform, or to do template matching for features that very closely match a sample. But working up the chain it never came together: detecting cars in a parking lot, a relatively simple task with CNNs, was a very difficult problem.
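
For reference, the sort of thing that was easy back then looked roughly like this (a sketch of my own with a made-up filename, using OpenCV's Hough circle detector):

    # Classic approach: Hough circle detection with OpenCV (filename is made up).
    import cv2

    img = cv2.imread("parking_lot.png")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)                    # denoise before voting

    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                               param1=100, param2=30, minRadius=5, maxRadius=50)
    # Great when the target really is a circle of roughly known size;
    # "car in a parking lot" never reduced to such clean geometric primitives.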

I wonder if enough work is being done to combine the achievements of each field. Whenever I see adversarial examples I wonder why people aren't doing more preprocessing to root out obvious problems with normalization of scale, color, perspective, etc. Also, if we could feed networks higher-level descriptors instead of low-information-density color images, wouldn't that make life easier?
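
As a concrete (entirely made-up) example of what I mean by preprocessing plus higher-level descriptors, something like normalizing the image and handing a HOG descriptor to a classifier instead of raw pixels:

    # Normalize scale/contrast, then hand a HOG descriptor to a simple classifier.
    import cv2
    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    img = cv2.imread("sample.png", cv2.IMREAD_GRAYSCALE)   # made-up filename
    img = cv2.resize(img, (128, 128))                      # normalize scale
    img = cv2.equalizeHist(img)                            # normalize contrast

    descriptor = hog(img, orientations=9, pixels_per_cell=(8, 8),
                     cells_per_block=(2, 2))               # higher-level features

    # clf = LinearSVC().fit(train_descriptors, train_labels)  # train on descriptors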

I'm sure I'm not the only one thinking this. Is there any good research being done in that space?


I often think the same, and I'm only aware of a single popular library doing that: rnnoise [0].

This library combines "classic" digital signal processing with a smaller RNN. As a result, it's smaller and faster, and probably has fewer uncanny edge cases than approaches that use an RNN for the complete processing chain. I think many use cases could benefit from this approach.

[0] https://jmvalin.ca/demo/rnnoise/


My feeling is that:

- lots of people in the DNN for machine vision community do not have a background in classical techniques.

- a lot of classical techniques and preprocessing passes make no real difference when applied to the input of a DNN, and are thus worth eliminating from the pipeline to simplify it (this has been my experience).

However, I do think that there are gains to be had by combining classical image processing ideas with neural networks. It just hasn't really happened yet.


There's some stuff happening, e.g. applying anti-aliasing to improve shift invariance: https://richzhang.github.io/antialiased-cnns/ (also check out the related papers)
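
A minimal PyTorch sketch of the idea as I understand it (blur the feature map before striding; this is my paraphrase, not the linked project's actual code):

    # Sketch of anti-aliased downsampling: low-pass filter, then subsample.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BlurPool2d(nn.Module):
        def __init__(self, channels, stride=2):
            super().__init__()
            k = torch.tensor([1.0, 2.0, 1.0])
            k = torch.outer(k, k)
            k = (k / k.sum()).repeat(channels, 1, 1, 1)  # depthwise 3x3 binomial filter
            self.register_buffer("kernel", k)
            self.stride = stride
            self.channels = channels

        def forward(self, x):
            x = F.pad(x, (1, 1, 1, 1), mode="reflect")
            return F.conv2d(x, self.kernel, stride=self.stride, groups=self.channels)

    # e.g. replace nn.MaxPool2d(2) with
    # nn.Sequential(nn.MaxPool2d(2, stride=1), BlurPool2d(channels=64))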


A library to do this that integrates well with typical inference and training systems would be a great contribution!


Because the new wave don't know how to do this. Deep Learning as AI was sold as "statistics in place of physics" -- i.e., no need to understand reality (the target domain), simply brute-force it.

The peddlers of such a message were bamboozled by early successes and lacked sufficient experience of empirical science to realise this was never going to work.

No NN will discover the universal law of gravitation from any dataset not collected on the basis of knowing this universal law. With 'statistics as theory' there can never be new theory, as a new theory is a precondition of a new dataset.


Combine this with Apple's new photo-scanning tech and this could be a new way for bad people to SWAT someone. Assuming you could fool Apple's classifier, just text the manipulated image to the target and they'll get flagged.

Any image on the internet that you save to your phone is now a risk to yourself, even if it looks innocent.


Probably easier to slip a "naughty" image that doesn't contain a minor into the image database. An image that resides in a target's iCloud storage. Or soon will.

Then it would take a super-high-level human reviewer to determine that the original image, although sexual in nature, is not in fact illegal.

There are so many parties that can submit images to this database, it presumably wouldn't be too hard to subvert one of them with cash or laziness.


This research indicates that every neural-net-classified input sourced from an uncontrolled environment probably needs human review (if the neural net's owner wants to minimize maliciously induced misclassifications).


The input space for these neural networks is huge: it is roughly the number of colors raised to the power of the number of pixels. What neural networks do is subdivide the input space and assign a label to each region. Because the input space is so high-dimensional, it is very likely possible to find images that sit right on the boundary between two labels. Using more advanced techniques might make it more difficult for an adversary to find such examples, but it does not eliminate their existence.
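
To put a number on "huge" (a quick back-of-the-envelope calculation of my own, for a 224x224 RGB image with 8-bit channels):

    # Back-of-the-envelope size of the input space for a 224x224 RGB image.
    import math

    values = 224 * 224 * 3                  # one 8-bit value per channel
    digits = values * math.log10(256)
    print(f"about 10^{digits:,.0f} possible images")   # a number with over 360,000 digits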

One of the big problems with neural networks (and other AI techniques as well) is that they cannot explain their classifications, which makes it difficult to determine whether a classification is correct. Most people seriously underestimate how difficult this task is. Humans can do it quite easily because our hardware has been optimized by eons of evolution. Neural networks are only in their infancy.


The problem goes much deeper than these adversarial examples. The main issue is Solomonoff Uncomputability (or the No Free Lunch in Search and Optimization theorem, or any of the other hard limiting theorems).

In short, it’s not only that you can devise adversarial examples that find the blind spots of the function approximator and fool it into mispredicting; it’s that for any learning optimization algorithm you can abuse its priors and biases and create an environment in which it will perform terribly. This is a fundamental and inherent feature of how we go about machine learning — equating it with optimizing functions — and we will need a paradigm shift to get around it.

It’s curious to me how most of these results are known for decades, yet most researchers seem dead set on ignoring them.


I think machine learning researchers are well aware that successful optimisation is only possible using the right priors. This is explicit in Bayesian machine learning, but also implicit in neural networks in the choice of architecture, optimisation algorithm and hyperparameters. It's a well-discussed problem, and a lot of researchers have a serious background in optimisation, theoretical machine learning and other related areas.


What exactly are the right priors for general intelligence? And keep in mind, whichever prior you choose, I can design a learning problem where it will lead you astray.

This paper provides some interesting results on the weakness inherent in universal priors: https://arxiv.org/abs/1510.04931


Related question: What are the adversarial examples for human intelligence? We know some for the visual and auditory systems, but what about the arguably general intelligence of humans?

Maybe we can work our way backwards from the adversarial examples to the inductive biases?


'Thinking Fast and Slow' is basically all about the rough edges of human thinking.

The interesting tradeoff with ML systems is that you trade lots of individual human crap for one big pile of machine crap. The advantage of the machine crap is that you can actually go in and find systemic problems and work on fixing them at a 'global' level. On the human side, you're always going to be stuck with an unknown array of individual human biases which are incredibly difficult to correct.


I think fractional reserve banking has done a pretty good job of fooling everyone.


That's for reinforcement learning, right? What is the adversarial learning problem in say, classification based on Solomonoff?

If hypercomputation is possible, then anything based on Kolmogorov complexity would be SOL, but if not... is Solomonoff induction just too expensive in practice?


Regarding: "What neural networks do is subdivide the input space and assign a label to it."

I've made such plots when the input is 2D: break the input space into discrete chunks/pixels, have the net classify each one, and color that pixel according to the classification. What usually happens is something like what an SVM would produce: large contiguous regions of the same class.
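
If anyone wants to reproduce that kind of picture, a toy sketch of my setup (scikit-learn MLP on 2D data; all details are illustrative):

    # Toy example: color each point of a 2D grid by the class a small MLP assigns to it.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_moons
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
    net = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000).fit(X, y)

    xx, yy = np.meshgrid(np.linspace(-2, 3, 300), np.linspace(-1.5, 2, 300))
    labels = net.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

    plt.contourf(xx, yy, labels, alpha=0.3)   # the "subdivided" input space
    plt.scatter(X[:, 0], X[:, 1], c=y, s=10)
    plt.show()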

But when the input space is high-dimensional and the net is super deep, who is to say what this classification looks like... My guess is it looks less like oil and water carefully poured into a bottle, and more like oil and water shaken vigorously in a bottle.

Do you have any citations about how NNs subdivide the input space, or how regular it is?

The way I have thought of it so far is that we humans subdivide the input space, then stick those blocks into an NN that could have a huge Lipschitz bound, and observe the output of a highly irregular function.

When you say "what neural networks do is subdivide the input space and assign a label to it," it sounds as though subdividing the input space helps solve the NN's problem (minimizing the loss). But it seems to me that that is not so related to minimizing the loss, partly because the NN never sees most of the input space during training, and partly because it isn't relevant to what humans actually want: generalization.



Anyone have a sense of how much of a problem this is?

It's not surprising that a network can be fooled by small input changes, but if some image preprocessing is enough to solve this, it's not a big problem.

On the other hand, if I can make a sign that looks like a stop sign to people but looks like a road work sign to a Tesla, that's obviously a big deal.

These slides touch on the difference by saying that physical examples of adversarial inputs are harder, and they mention some mitigation techniques, but they don't seem to really quantify how effective mitigation is in real world scenarios.


Tesla identifying the moon as a yellow traffic light? https://twitter.com/JordanTeslaTech/status/14184133078625853...


Can’t we use the same method of generating adversarial inputs to iteratively train multiple models? After each model is generated, we expand the dataset by using the prior model to generate adversarial inputs, then train a classifier that maximizes performance on both the original inputs and the adversarial inputs.

Now we just use n models in production and use voting to produce the label.

As n gets large, does this become robust to adversarial inputs?
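
Roughly what I have in mind for one round of the expand-and-retrain loop, sketched with a PyTorch classifier and an FGSM-style attack (all names and hyperparameters are made up for illustration):

    # One round: craft adversarial inputs against the current model, train on both.
    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps=0.03):
        x = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x), y).backward()
        return (x + eps * x.grad.sign()).clamp(0, 1).detach()

    def train_step(model, optimizer, x, y, eps=0.03):
        x_adv = fgsm(model, x, y, eps)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
        return loss.item()

    # Voting across n independently trained models:
    # votes = torch.stack([m(x).argmax(dim=1) for m in models])
    # label = votes.mode(dim=0).values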


This is basically adversarial training, which is a typical (and very practical) benchmark heuristic defense for this problem. An ongoing question is to precisely characterize when and how AT works. This line of work has also proved very fruitful for the theoretical community and has produced very general results about problems which can be solved by neural networks but not by other techniques, e.g. kernel methods.

https://arxiv.org/abs/2001.04413


Thanks for the link. It seems like the text is focused on correcting errors across layers. I guess fundamentally there is no difference between the multi-model challenge of correcting errors across models and that of correcting errors across layers. This is dense, but I’m going to dive into the discussion around figure 14 as a starting point.

Thanks again.


Teacher student!!


:) we are all students


Can someone please ELI5 why we can't just blur every image before passing it to the classifier? Wouldn't this defeat this sort of attack? Obviously you lose some accuracy, but that seems acceptable.

Edit: I guess that's similar to "image quilting" (whatever that is) in this slide deck. This is the first time I've seen something like this mentioned. Seems like a straightforward solution.
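
For what it's worth, the blur itself would just be something like this (kernel size and sigma picked arbitrarily; the classifier call is hypothetical):

    # Blur the input before classifying it.
    import cv2

    img = cv2.imread("input.png")                        # made-up filename
    blurred = cv2.GaussianBlur(img, (5, 5), sigmaX=1.5)
    # prediction = model.predict(preprocess(blurred))    # hypothetical classifier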


It turns out that a similar technique, where you basically apply noise multiple times to a single image and average predictions over all the noisy copies (equivalent to convolving your NN with Gaussian noise), yields near state-of-the-art bounds on provable robustness (under a specific class of attacks). The issue is that the magnitude of noise you need in order to get practically robust networks is quite large relative to the data you are dealing with.

https://arxiv.org/abs/1902.02918

https://arxiv.org/abs/1906.04584
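
A minimal sketch of the prediction rule, as I understand it (my own paraphrase, assuming a pretrained PyTorch classifier `model`; the certified-robustness procedure in the papers is more involved):

    # Classify one image (shape C x H x W) by majority vote over noisy copies.
    import torch

    def smoothed_predict(model, x, sigma=0.25, n=100):
        with torch.no_grad():
            noisy = x.unsqueeze(0) + sigma * torch.randn(n, *x.shape)  # n noisy copies
            votes = model(noisy).argmax(dim=1)                         # one label each
        return votes.mode().values.item()                              # majority label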


These are the papers I've been wondering why they didn't exist ;)

Thanks!


Assuming the attacker knows how you are blurring the image, they could do exactly the same attack.


Adversarial attacks are a super interesting field, but unfortunately I feel that a lot of papers are just incremental attack or defense improvements in a cat-and-mouse game. I originally did some research on 3D point cloud attacks, but later stopped because making super successful attacks (e.g., attacks with higher success rates than all previous techniques for some very specific task) doesn't really help us understand much more about neural nets; it's just optimizing a metric for publishing papers. This kind of research is quite common, even at top conferences.

Despite this, we recently made a one-minute explainer video introducing adversarial attacks on neural nets as a submission for the Veritasium contest: https://youtu.be/hNuhdf-fL_g Give it a watch!


Does this attack require access to the neural network's internals, or merely high-volume access to the input and output channels so you can keep passing perturbed images in until you see the classification flip?
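
To be concrete, the query-only version I'm imagining is something like this naive loop (purely illustrative, and surely far less query-efficient than real black-box attacks):

    # Naive black-box loop: random perturbations until the returned label flips.
    import torch

    def naive_flip(query_fn, x, eps=0.01, max_queries=10_000):
        original = query_fn(x)
        for _ in range(max_queries):
            candidate = (x + eps * torch.randn_like(x)).clamp(0, 1)
            if query_fn(candidate) != original:
                return candidate      # found an input the model labels differently
        return None                   # gave up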


To keep my movie quiz free of cheaters (at least most of them), I tried to fool Google Images reverse search by adding some noise to my movie snapshots, as described here. Unfortunately it didn't work.

The only trick that mostly works is to flip the image horizontally, at random. When it works, Google is unable to find similar images.



Looking at the examples from far, far away, it seems networks are still trained towards a local optimum and are easily tripped up. Basically the equivalent of a card trick failing horribly the moment the audience steps onto the stage.


Is there an equivalent case of supernormal stimuli for NLP?


Natural language is a bit too complex for that, I think.

There can't be any universal stimuli, simply because there are multiple languages and cultures that don't all have the same response to stimuli.

There's a relationship between neural activity and writing system, for example [0].

Then there are stimuli that activate the language centre in some languages (e.g. the click sounds of the Khoisan language families in Africa) but not in others. Some languages (especially Southeast Asian languages like Vietnamese) also use tone to distinguish lexical or grammatical meaning, while Indo-European languages do not, which is another significant difference in (here: verbal) language processing.

All this leads me to conjecture that supernormal stimuli are highly unlikely in this context due to the high-level nature of the subject as well as the differences and the diversity in the involved regions of the brain.

[0] https://www.sciencedirect.com/science/article/abs/pii/S01680...



