
Training Differentiable Models by Constraining Their Explanations - makmanalp
https://arxiv.org/abs/1703.03717
======
ice109
Someone want to do us all a favor and define what an explanation is in this
context?

~~~
anewhnaccount2
Quick explanation here: https://github.com/marcotcr/lime#what-are-explanations

~~~
avital
That explains LIME, an older paper that's not the one being discussed here
(but is referenced).

~~~
anewhnaccount2
It also explains what an explanation is in this context (which is what was
asked): a local linear approximation of the model. Additionally it has a
diagram which is nice. Obviously it's not the one being discussed here though
-- I'd hardly be adding useful information if I just linked to the submission
again as a reply.
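
For a concrete sense of what "local linear approximation" means, here's a
minimal sketch of the idea in Python (a hypothetical helper, not LIME's actual
API; real LIME also weights samples by proximity and fits to an interpretable
representation of the input):

    import numpy as np
    from sklearn.linear_model import Ridge

    def local_linear_explanation(predict_fn, x, n_samples=500, scale=0.1):
        # Sample points in a small neighborhood around the instance x
        rng = np.random.default_rng(0)
        neighborhood = x + scale * rng.standard_normal((n_samples, x.shape[0]))
        # Query the black-box model on those points
        scores = predict_fn(neighborhood)
        # Fit a simple linear surrogate; its weights are the "explanation"
        surrogate = Ridge(alpha=1.0).fit(neighborhood, scores)
        return surrogate.coef_  # one weight per input feature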

~~~
asross
Yeah, so local linear approximations are what we and LIME are using as
explanations, but that's not what an explanation is in general.

In the paper we do define an explanation as basically any artifact that
"provides reliable information about the model’s implicit decision rules for a
given prediction." It's kind of a rough and over-general definition, but it
gets at the idea that explanations can be partial. All we want to do is turn a
completely black-box model into something slightly more transparent.
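
As a rough illustration: for a differentiable model, the gradient of the
prediction with respect to the input gives exactly that kind of local linear
approximation. A minimal PyTorch-style sketch (an assumed setup, not the
paper's exact formulation):

    import torch

    def input_gradient_explanation(model, x):
        # model: a differentiable classifier; x: an input with a batch dimension
        x = x.clone().detach().requires_grad_(True)
        log_probs = torch.log_softmax(model(x), dim=-1)
        # Gradient of the predicted log-probabilities w.r.t. the input features
        log_probs.sum().backward()
        return x.grad  # per-feature sensitivity, i.e. a local linear explanation

And since that gradient is itself differentiable, it can be penalized during
training, which is roughly what "constraining" the explanation refers to in
the title.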

Ideally, we could have explanations that were at a higher level of
abstraction, e.g. "this image is a picture of a husky and not a wolf because
of the shape of the nose and the color of the coat," but a neural network has
no idea what "nose" and "coat" mean. Sometimes its intermediate layers will
end up corresponding to meaningful abstract concepts like that, but not
always.

