“In some ways, adversarial policies are more worrying than attacks on supervised learning models, because reinforcement learning policies govern an AI’s overall behavior. If a driverless car misclassifies input from its camera, it could fall back on other sensors, for example.” TIL fail-safe components are 1) ubiquitous 2) work 3) only an option for supervised learning components.
“A supervised learning model, trained to classify images, say, is tested on a different data set from the one it was trained on to ensure that it has not simply memorized a particular bunch of images. But with reinforcement learning, models are typically trained and tested in the same environment.”
First, an RL environment is not equivalent to a supervised learning data set. Second, the train-validate-test paradigm is not thrown out in RL research; it's why OpenAI put their Starcraft agent on public ladders.
“The good news is that adversarial policies may be easier to defend against than other adversarial attacks.” This sentence refers to Graves et al. adversarially training their agents. Adversarial training is, of course, also conducted frequently in supervised learning.
Website with videos: https://adversarialpolicies.github.io/ (that would make a better submission imho)
You have to stretch the definition of "new" somewhat to come up with the title TR chose; adversarial effects in all kinds of learning settings certainly aren't. The paper itself seems to contain quite interesting thoughts on how to assess them, though (as opposed to just using them to steer the training process).