
Explainable Deep Learning: A Field Guide for the Uninitiated - BERTHart
https://arxiv.org/abs/2004.14545
======
troelsSteegin
This is a review paper. It's long. Past the first sentence I find it readable
and well organized. It mentions some of the work on interpretability I'd
expect to see, Doshi-Velez, Rudin, Wallach, and LIME, but does not appear to
mention Shapley. The bottom-line conclusion is "In the end, the important
thing is to explain the right thing to the right person in the right way at
the right time." That's both an obvious truth and a differentiating mindset in
a research-first space. It's worth a skim.

~~~
XuMiao
> This is a review paper. It's long. Past the first sentence I find it
> readable and well organized. It mentions some of the work on
> interpretability I'd expect to see, Finale-Velez, Rudin, Wallach, and LIME,
> but does not appear to mention Shapley. The bottom line conclusion is "In
> the end, the important thing is to explain the right thing to the right
> person in the right way at the right time." That's both an obvious truth and
> a differentiating mindset in research-first space. It's worth a skim.

People want to know whether some mathematical formulas work, then how they
work, and then what could make them work in a different way.

Explainability or interpretability ultimately leads to controllability.

I would rather see NNs with semantic meanings than semantic meanings extracted
from NNs. If humans would like to control NNs, why not make them meaningful
modules that can be composed like a regular program?

For example, instead of using CNNs or RNNs, we simply make a model by stating
the definition:

Jaywalk :: (p: Person, scene: Image) :=
    p in scene
    & exist s: Street in scene, walk_cross(p, s) in scene
    & not exist z: ZebraCross in scene, inside(p, z) in scene

Here the predicates walk_cross and inside are neural network modules that
might be reused in many different problems. We can identify cases where the
model makes wrong predictions and modify the definition accordingly.
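
A minimal sketch of what composing such modules could look like in plain
Python (my own illustration, not from the paper; the stub detectors and
predicates below stand in for trained neural modules):

    # Stubs standing in for trained neural modules, reusable across definitions.
    def detect_streets(scene):
        return scene.get("streets", [])

    def detect_zebra_crossings(scene):
        return scene.get("zebra_crossings", [])

    def walk_cross(person, street, scene):
        # Predicate: does `person` walk across `street` in `scene`?
        return (person, street) in scene.get("crossing_pairs", set())

    def inside(person, zebra, scene):
        # Predicate: is `person` inside zebra crossing `zebra` in `scene`?
        return (person, zebra) in scene.get("inside_pairs", set())

    def jaywalk(person, scene):
        # Jaywalk(p, scene) := exists Street s with walk_cross(p, s)
        #                      and not exists ZebraCross z with inside(p, z)
        crosses = any(walk_cross(person, s, scene) for s in detect_streets(scene))
        on_crossing = any(inside(person, z, scene) for z in detect_zebra_crossings(scene))
        return crosses and not on_crossing

    # Example: person "p1" crosses street "s1" with no zebra crossing in view.
    scene = {"streets": ["s1"], "crossing_pairs": {("p1", "s1")}}
    print(jaywalk("p1", scene))  # True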

This would be a much more human-friendly way to develop models than tweaking
parameters. After all, not everyone is fond of programming in NNs directly.

------
uoaei
I think there ought to be a distinction between _explainable_ (what does this
neuron activate most strongly on?) models and _interpretable_ (what do the
model's parameters tell me about the data?) models.

The distinction is this: explanations can only be made _ex post facto_, about
why the model acted a certain way on specific inputs; interpretations can be
made from the model's parameters themselves, i.e., "feature X is very
important and feature Y is almost always ignored, and I know this because my
NN is one layer deep, all the weights for feature X are large in magnitude,
and all the weights for feature Y are small in magnitude." This does not
require feeding specific inputs or studying specific outputs, so it is a
different concept, which is why I am suggesting we make the distinction
explicit.
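
As a toy sketch of that interpretable case (made-up data; feature names are
illustrative), the fitted weights of a one-layer model can be read off
directly, with no specific inputs involved:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    data = rng.normal(size=(500, 2))        # column 0 = feature "X", column 1 = feature "Y"
    labels = (data[:, 0] > 0).astype(int)   # labels depend only on feature "X"

    model = LogisticRegression().fit(data, labels)
    print(model.coef_)  # weight for "X" is large in magnitude, weight for "Y" is near zero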

~~~
owenshen24
Zachary Lipton has a really good taxonomy of the different things people refer
to when they talk about interpretability and explainability here:

[https://arxiv.org/pdf/1606.03490.pdf](https://arxiv.org/pdf/1606.03490.pdf)

------
cs702
I'm surprised there is no mention of capsules and capsule-routing algorithms.

Capsules are groups of neurons that represent _discrete entities_ in different
contexts. For example, a 4x4 pose matrix is a capsule representing a
particular object in different orientations seen from different viewpoints.
Similarly, a subword embedding can be seen as a capsule with vector shape
representing a particular subword in different natural language contexts. More
generally, a capsule can have any shape, but it always represents only one
entity in some context.

In certain new capsule-routing algorithms -- e.g., EM routing[a], Heinsen
routing[b], dynamic routing[c], to name a few off the top of my head[d] --
each capsule can activate or not _depending on whether the entity it
represents is detected or not in the context of input data_.

Models using these algorithms therefore make it possible for human beings to
interpret model behavior _in terms of capsule activations_ -- e.g., "the
final layer predicts label 2 because capsules 7, 23, and 41 activated the most
in the last hidden routing layer."
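
As a toy reading of that kind of explanation (made-up activation values, not
tied to any particular routing algorithm):

    import torch

    torch.manual_seed(0)
    hidden_activations = torch.rand(64)   # one activation per hidden-layer capsule
    class_activations = torch.rand(10)    # one activation per class capsule in the final layer
    predicted_label = class_activations.argmax().item()
    top_capsules = hidden_activations.topk(3).indices.tolist()
    print(f"predicted label {predicted_label} because capsules {top_capsules} "
          f"activated the most in the last hidden routing layer")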

While these new routing algorithms are not yet widely used, in my humble
opinion they present a promising avenue of research for building models that
are explainable and/or enable assignment of causality at high levels of
function composition.

--

[a]
[https://research.google/pubs/pub46653/](https://research.google/pubs/pub46653/)

[b] [https://arxiv.org/abs/1911.00792](https://arxiv.org/abs/1911.00792)

[c] [https://arxiv.org/abs/1710.09829](https://arxiv.org/abs/1710.09829)

[d] If you're aware of other routing algorithms that can similarly
activate/deactivate capsules, please post a link to the paper or code here.

~~~
Eridrus
It should be pretty obvious why: they don't work as well as the standard
architectures. This is about ways to explain the models we get good
performance with.

~~~
cs702
Not sure I agree, for two reasons. First, capsule networks have been shown to
outperform standard architectures in at least some tasks (see the above
papers). Second, and perhaps more importantly, the large and growing chorus of
people -- from corporate executives to government regulators -- asking for
models that are "explainable" and "interpretable" really couldn't care less as
to what kinds of models are used. (In my experience, non-technical people with
decision-making power are almost always willing to trade performance for
better explainability/interpretability/assignment.)

~~~
visarga
If you want interpretability you can use a Transformer and look at the
attention heads. Or, as in a recent paper, train a language model to give
textual justifications for its decisions.
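
For instance, a minimal sketch of pulling attention weights out of a single
self-attention layer in PyTorch (random inputs, purely to show the call
pattern):

    import torch

    attn = torch.nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
    tokens = torch.randn(1, 5, 16)  # batch of 1 sequence with 5 tokens
    _, weights = attn(tokens, tokens, tokens, need_weights=True)
    print(weights.shape)  # (1, 5, 5): attention from each output token to each
                          # input token, averaged over the 4 heads
    print(weights[0, 0])  # attention distribution for the first output token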

~~~
cs702
Yes. Been there, done that (by "that" I mean looking at attention heads, not
generating verbal "justifications" -- the latter is on my want-to-try-it
list, even if only out of curiosity) :-)

FYI, Vaswani-style query-key-value self-attention mechanisms can be understood
as a type of capsule-routing algorithm -- one in which the capsules are in the
form of vector embeddings (each representing a token in a context), the
activations are in the form of attention heads (representing which input
tokens are most active for each output token), and the number of input and
output capsules is the same (for every input token there is an output token).

Here, I'm talking more generally about using capsule-routing algorithms in
which the capsules can be of any shape (they can be vectors, matrices, or
higher-order tensors), the activations can be computed via different proposed
mechanisms (including self-attention of course), and the number of input and
output capsules need not be the same (e.g., with some algorithms it's possible
to have a variable number of input capsules and a fixed number of output
capsules).

As I wrote elsewhere on this thread, the routing algorithms I find most
interesting are those in which each output capsule is a probabilistic model
that "must explain input data better than other output capsules" in order for
the capsule to activate.[a]

[a]
[https://news.ycombinator.com/item?id=23067556](https://news.ycombinator.com/item?id=23067556)

------
orionr
For those interested in this area for PyTorch models, take a look at Captum
([https://captum.ai/](https://captum.ai/)). Still a lot of work to do, but
we’ve implemented a number of the algorithms described in this field guide in
the library. Always looking for collaborators and contributions from others.

Disclosure: I support the team that developed Captum.

~~~
arolihas
That's really cool! My Deep Learning class used Captum for our assignments
with GradCAM activations; it's a very convenient tool for interpreting model
activations on images.
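
For reference, a minimal sketch of the call pattern, assuming Captum's
LayerGradCam with a torchvision ResNet (the input tensor here is random and
only illustrates usage):

    import torch
    from torchvision.models import resnet18
    from captum.attr import LayerGradCam, LayerAttribution

    model = resnet18().eval()
    image = torch.randn(1, 3, 224, 224)            # stand-in for a real image
    target_class = model(image).argmax(dim=1).item()

    gradcam = LayerGradCam(model, model.layer4)    # attribute w.r.t. the last conv block
    attribution = gradcam.attribute(image, target=target_class)
    heatmap = LayerAttribution.interpolate(attribution, (224, 224))
    print(heatmap.shape)                           # (1, 1, 224, 224) relevance map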

------
explainable
Thanks for posting. This seems interesting. It takes a while to get to its
main points, but the authors provide a good overview of how the various
concerns of safety, trust, and ethics are addressed by explanatory methods
like visualization, model distillation, and "intrinsic" methods. It seems like
they're not actually addressing the issue of explainability, though, so much
as the tools available for trying to debug extremely large programs composed
of matrix multiplications and function applications. "Explainable" in this
context really means "debuggable" and "debugging tools".

Because fundamentally, neural networks have a debuggability problem. It's
impossible to say whether the program/code is actually correct, and I'm not
sure how visualization is going to solve the problem of correctness. If
someone explained something to me, I'd want to know why it actually addresses
the problem they claim it addresses, and appealing to distilled models would
not convince me: as long as we are looking at a compressed version of the
program, why would we conclude that the larger program is actually correct and
never misclassifies a pigeon as a stop sign? And if I can't be sure that a
pigeon will never be classified as a stop sign, what has actually been
explained, and of what value is it?

------
mindgam3
Lost me at the first sentence.

> Deep neural network (DNN) is an indispensable machine learning tool for
> achieving human-level performance on many learning tasks.

Not to be pedantic, but words matter. Is anyone actually claiming that deep
learning achieves true “human-level performance” on any real world open-ended
learning task?

Even the most state of the art computer vision/object classification
algorithms still don’t generalize to weird input, like familiar objects
presented at odd angles.

I get that the author is trying to write something motivating and
inspirational, but it feels like claiming “near” or “quasi”-human performance,
with disclaimers, would be a more intellectually honest way to introduce the
subject.

~~~
svara
> Is anyone actually claiming that deep learning achieves true “human-level
> performance” on any real world open-ended learning task?

No, but the text you quoted doesn't say that.

Human level performance in this context means humans perform no better than
some algorithm on some specific dataset.

Incidentally, that's also how you get to claim superhuman performance on
classification tasks. Just include some classes that aren't commonly known in
your dataset, e.g. dog breeds, plant species, or something like that. ;)

~~~
mindgam3
> No, but the text you quoted doesn't say that [deep learning achieves
> human-level performance]

Uh, it says DNNs are indispensable for achieving human level performance. That
clearly implies that this level of performance is achievable, despite all
evidence to the contrary.

~~~
jdminhbg
This is a weird interpretation of that sentence. There are lots of fields
where human-level performance has been achieved. See Go, for example.

~~~
asah
Maybe you need an RNN to help parse that sentence!! :-)

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo

etc

