Explainable Deep Learning: A Field Guide for the Uninitiated (arxiv.org)
321 points by BERTHart 26 days ago | 25 comments



This is a review paper. It's long. Past the first sentence I find it readable and well organized. It mentions some of the work on interpretability I'd expect to see, Doshi-Velez, Rudin, Wallach, and LIME, but does not appear to mention Shapley. The bottom-line conclusion is "In the end, the important thing is to explain the right thing to the right person in the right way at the right time." That's both an obvious truth and a differentiating mindset in a research-first space. It's worth a skim.



People want to know whether some mathematical formulas work, then how they work, and then what could make them work differently.

Explainability or interpretability ultimately leads to controllability.

I'd rather see NNs built from semantic meanings than semantic meanings extracted from NNs. If humans would like to control NNs, why not make them meaningful modules that can be composed like a regular program?

For example, instead of using CNNs or RNNs, we simply make a model by stating the definition:

Jaywalk :: (p: Person, scene: Image) := p in scene & exist s: Street in scene, walk_cross(p, s) in scene & not exist z: ZebraCross in scene, inside(p, z) in scene

Here the predicates walk_cross and inside are neural network modules that might be reused across many different problems. We can identify cases where the model makes wrong predictions and modify the definition accordingly.

This is a much more human-friendly way to develop models than tweaking parameters. After all, not everyone is fond of programming in NNs directly.
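
To make that concrete, here is a minimal PyTorch-style sketch of what such a composed model might look like. The feature dimensions, module shapes, and helper names are my own placeholders, not something from the comment or the paper:

    import torch
    import torch.nn as nn

    # Hypothetical neural predicate modules: each scores a pair of entities,
    # given their concatenated 128-d feature vectors from an upstream detector.
    def predicate(in_dim=256):
        return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                             nn.Linear(64, 1), nn.Sigmoid())

    walk_cross = predicate()   # walk_cross(person, street)
    inside     = predicate()   # inside(person, zebra_crossing)

    def jaywalk(person, streets, zebra_crossings):
        # Soft reading of the definition: exists a street the person walks
        # across, AND NOT exists a zebra crossing the person is inside.
        crosses  = (torch.stack([walk_cross(torch.cat([person, s]))
                                 for s in streets]).max()
                    if streets else torch.tensor(0.0))
        on_zebra = (torch.stack([inside(torch.cat([person, z]))
                                 for z in zebra_crossings]).max()
                    if zebra_crossings else torch.tensor(0.0))
        return crosses * (1 - on_zebra)   # fuzzy AND / NOT

    # e.g., jaywalk(torch.randn(128), [torch.randn(128)], []); wrong cases get
    # fixed by editing the definition rather than the weights.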


I think there ought to be a distinction between explainable (what does this neuron activate most strongly on?) models and interpretable (what do the model's parameters tell me about the data?) models.

The distinction is this: explanations can only be made ex post facto, about why the model acted a certain way on specific inputs; interpretations can be made from the model's parameters themselves, e.g., "feature X is very important and feature Y is almost always ignored, and I know this because my NN is one layer deep, all the weights for feature X are large in magnitude, and all the weights for feature Y are small in magnitude." This does not require feeding in specific inputs or studying specific outputs, so it is a different concept, which is why I am suggesting we make the distinction explicit.
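
For instance, a toy version of the weight-reading interpretation described above (the one-layer model and the feature count are arbitrary placeholders):

    import torch.nn as nn

    # A one-layer "NN" whose parameters can be read directly, with no inputs fed.
    model = nn.Linear(in_features=4, out_features=1)

    # Mean absolute weight per input feature: large magnitude ~ "important",
    # near zero ~ "almost always ignored" (assuming comparably scaled features).
    importance = model.weight.abs().mean(dim=0)
    for i, w in enumerate(importance):
        print(f"feature {i}: mean |weight| = {w.item():.3f}")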


Zachary Lipton has a really good taxonomy of the different things people refer to when they talk about interpretability and explainability here:

https://arxiv.org/pdf/1606.03490.pdf


I'm surprised there is no mention of capsules and capsule-routing algorithms.

Capsules are groups of neurons that represent discrete entities in different contexts. For example, a 4x4 pose matrix is a capsule representing a particular object in different orientations seen from different viewpoints. Similarly, a subword embedding can be seen as a capsule with vector shape representing a particular subword in different natural language contexts. More generally, a capsule can have any shape, but it always represents only one entity in some context.

In certain new capsule-routing algorithms -- e.g., EM routing[a], Heinsen routing[b], dynamic routing[c], to name a few off the top of my head[d] -- each capsule can activate or not depending on whether the entity it represents is detected or not in the context of input data.

Models using these algorithms therefore make it possible for human beings to interpret model behavior in terms of capsule activations -- e.g., "the final layer predicts label 2 because capsules 7, 23, and 41 activated the most in the last hidden routing layer."
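
To make that kind of statement concrete, here is a purely illustrative sketch with made-up sizes and activation values; a real model would produce these activations from its routing layers:

    import torch

    hidden_activations = torch.rand(64)   # last hidden routing layer, 64 capsules
    class_activations  = torch.rand(10)   # final layer, one capsule per label

    predicted_label = class_activations.argmax().item()
    top_values, top_capsules = hidden_activations.topk(3)

    print(f"predicted label {predicted_label} because hidden capsules "
          f"{top_capsules.tolist()} activated most strongly "
          f"({[round(v, 2) for v in top_values.tolist()]})")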

While these new routing algorithms are not yet widely used, in my humble opinion they present a promising avenue of research for building models that are explainable and/or enable assignment of causality at high levels of function composition.

--

[a] https://research.google/pubs/pub46653/

[b] https://arxiv.org/abs/1911.00792

[c] https://arxiv.org/abs/1710.09829

[d] If you're aware of other routing algorithms that can similarly activate/deactivate capsules, please post a link to the paper or code here.


It should be pretty obvious why: they don't work as well as what is standard. This paper is about ways to explain the models we actually get good performance with.


Not sure I agree, for two reasons. First, capsule networks have been shown to outperform standard architectures on at least some tasks (see the above papers). Second, and perhaps more importantly, the large and growing chorus of people -- from corporate executives to government regulators -- asking for models that are "explainable" and "interpretable" really couldn't care less what kinds of models are used. (In my experience, non-technical people with decision-making power are almost always willing to trade performance for better explainability/interpretability/assignment of causality.)


My personal experience with capsule networks is that they didn't work better than a similar number of ungrouped neurons in any case.

If capsules work wonders for you, my first guess would be that you can improve your training of the standard network to make it work equally well.

In general, my hunch is that capsules are still too low level and too much of a local change to make a strong difference.

To give an example, all of the state-of-the-art optical flow models are based on building cost volumes and then resolving them. There are edge cases where one can prove mathematically that reducing the cost volume to a flow direction will make it impossible to produce the correct result. So to make a significant contribution, it doesn't help to use capsules in the feature processing stage; you need to replace the entire architecture.


Thank you. You may be right. To some extent we're all guessing based on our own hunches :-)

FWIW, I've had the most success with EM/Heinsen-type routing algorithms -- that is, those in which each output capsule is generated by a probabilistic model (such as a Gaussian mixture), and the output capsule activates only to the extent its model can explain (i.e., generate) its view of input data better (in some quantifiable manner) than other output capsules. The notion that an output capsule "must explain input data better than other capsules in order to activate" is very appealing to me as a mechanism for inducing per-layer "explainability" in models.
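
In case it helps, here is a deliberately oversimplified sketch of that activation idea: one diagonal Gaussian per output capsule instead of a mixture, no iterative routing steps, and made-up shapes, so it captures the flavor of EM/Heinsen-type routing rather than either actual algorithm:

    import torch

    n_in, n_out, d = 32, 10, 16
    votes     = torch.randn(n_out, n_in, d)   # each input capsule's vote for each output capsule
    mu        = torch.randn(n_out, 1, d)      # per-output-capsule Gaussian mean
    log_sigma = torch.zeros(n_out, 1, d)      # per-output-capsule log std dev

    # Log-likelihood of the votes under each output capsule's Gaussian
    # (constant term dropped).
    log_lik = (-0.5 * ((votes - mu) / log_sigma.exp()) ** 2 - log_sigma).sum(dim=(1, 2))

    # A capsule activates to the extent it explains its view of the input
    # better than the competing output capsules do.
    activations = torch.softmax(log_lik, dim=0)   # shape: (n_out,)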

In my experience so far, routing tends to work better on top of conventional architectures, e.g., use a ResNet for feature detection and stack two or more routing layers on top for classifying into hidden factors and then into training labels. Also, to get models to converge, I have found it helps to apply a nonlinear transformation to the features and then stack at least two routing layers on top. (I don't have a good explanation as to why two or more tend to work better than only one.) Finally, I usually feed only the capsule activations to the loss function -- that is, during training I let the capsules themselves "do whatever they want" to learn to explain the input data.


If you want interpretability you can use a Transformer and look at the attention heads. Or, as in a recent paper, train a language model to give textual justifications for its decisions.


Yes. Been there, done that (by "that" I mean looking at attention heads, not generating verbal "justifications" -- the latter is on my want-to-try-it list, even if only out of curiosity) :-)

FYI, Vaswani-style query-key-value self-attention mechanisms can be understood as a type of capsule-routing algorithm -- one in which the capsules are in the form of vector embeddings (each representing a token in a context), the activations are in the form of attention heads (representing which input tokens are most active for each output token), and the number of input and output capsules is the same (for every input token there is an output token).
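
Roughly, for a single head (the sizes and random weights below are placeholders, just to make the mapping concrete):

    import torch

    n_tokens, d = 5, 32
    x = torch.randn(n_tokens, d)                      # input capsules: one vector per token
    Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

    q, k, v = x @ Wq, x @ Wk, x @ Wv
    routing = torch.softmax(q @ k.T / d ** 0.5, dim=-1)  # attention weights read as routing coefficients
    out = routing @ v                                    # output capsules (same count as inputs)

    print(routing[0])   # which input tokens output token 0 drew from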

Here, I'm talking more generally about using capsule-routing algorithms in which the capsules can be of any shape (they can be vectors, matrices, or higher-order tensors), the activations can be computed via different proposed mechanisms (including self-attention of course), and the number of input and output capsules need not be the same (e.g., with some algorithms it's possible to have a variable number of input capsules and a fixed number of output capsules).

As I wrote elsewhere on this thread, the routing algorithms I find most interesting are those in which each output capsule is a probabilistic model that "must explain input data better than other output capsules" in order for the capsule to activate.[a]

[a] https://news.ycombinator.com/item?id=23067556


> train a language model to give textual justifications for its decision.

This doesn't work for humans. Sure, they'll give an explanation, but they don't fully understand their own decision making process so they can't reliably explain it. I am not sure which paper you're referring to, but how did the researchers address this issue?


I think it should be obvious why there's no mention. In fact, you said it yourself: these new routing algorithms are "not yet widely used" but are a "promising avenue of research." The stated purpose of the paper is to help people explain how commonly used deep learning tools work to laypeople, and including an aside about a niche subfield of deep learning research doesn't align with that goal (regardless of how interesting you personally find it).


For those interested in this area for PyTorch models, take a look at Captum (https://captum.ai/). There's still a lot of work to do, but we've provided a number of the algorithms described in this field guide in the library. We're always looking for collaborators and contributions from others.

Disclosure: I support the team that developed Captum.
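
For anyone curious what using it looks like, here's a minimal Integrated Gradients example; the toy model and input are placeholders, and the Captum docs cover the full set of algorithms and options:

    import torch
    import torch.nn as nn
    from captum.attr import IntegratedGradients

    # Any differentiable PyTorch model can be attributed the same way.
    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
    model.eval()
    inputs = torch.randn(1, 8, requires_grad=True)

    ig = IntegratedGradients(model)
    attributions = ig.attribute(inputs, target=2)   # attribute the class-2 score to the 8 inputs
    print(attributions)                             # per-feature contribution estimates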


That's really cool! My Deep Learning class used Captum for our assignments with GradCAM activations; it's a very convenient tool for interpreting model activations on images.


Thanks for posting. This seems interesting. It takes a while to get to its main points, but the authors provide a good overview of how the various aspects of safety, trust, and ethics are explored by explanatory methods like visualization, model distillation, and "intrinsic" methods. It seems like they're not actually addressing the issue of explainability so much as the tools available for trying to debug extremely large programs composed of matrix multiplications and function applications. It seems like "explainable" in this context really means "debuggable" and "debugging tools".

Because fundamentally, neural networks have a debuggability problem. It's impossible to say whether the program/code is actually correct, and I'm not sure how visualization is going to solve the problem of correctness. If someone explained something to me, I'd want to know why it actually addresses the problem they claim it addresses, and appealing to distilled models would not convince me: as long as we are looking at a compressed version of the program, why would we conclude that the larger program is actually correct and never misclassifies a pigeon as a stop sign? And if I can't be sure that a pigeon will never be classified as a stop sign, what has actually been explained, and of what value is it?


Lost me at the first sentence.

> Deep neural network (DNN) is an indispensable machine learning tool for achieving human-level performance on many learning tasks.

Not to be pedantic, but words matter. Is anyone actually claiming that deep learning achieves true “human-level performance” on any real world open-ended learning task?

Even the most state of the art computer vision/object classification algorithms still don’t generalize to weird input, like familiar objects presented at odd angles.

I get that the author is trying to write something motivating and inspirational, but it feels like claiming “near” or “quasi”-human performance, with disclaimers, would be a more intellectually honest way to introduce the subject.


> Is anyone actually claiming that deep learning achieves true “human-level performance” on any real world open-ended learning task?

No, but the text you quoted doesn't say that.

Human level performance in this context means humans perform no better than some algorithm on some specific dataset.

Incidentally, that's also how you get to claim superhuman performance on classification tasks. Just include some classes that aren't commonly known in your dataset, e.g. dog breeds, plant species, or something like that. ;)


> No, but the text you quoted doesn't say that [deep learning achieves human-level performance]

Uh, it says DNNs are indispensable for achieving human level performance. That clearly implies that this level of performance is achievable, despite all evidence to the contrary.


This is a weird interpretation of that sentence. There are lots of fields where human-level performance has been achieved. See Go, for example.


Maybe you need an RNN to help parse that sentence!! :-)

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo

etc


If you've been following the field at all (i.e. who the paper is aimed at), the sentence is obvious and non-controversial. There have been many tasks where deep learning has even exceeded human performance (a stronger claim than that sentence).

>Even the most state of the art computer vision/object classification algorithms still don’t generalize to weird input, like familiar objects presented at odd angles.

"some x are not y" does not invalidate "many x are y"


Words matter, so concretely define "open-ended". Did you just add that phrase to preemptively nullify evidence to the contrary?

Deep learning has surpassed human-level performance on many tasks [1][2]... (could add more, but you get the point).

[1] https://www.sciencedirect.com/science/article/pii/S2215017X1... [2] https://arxiv.org/pdf/1502.01852v1.pdf


Agreed. Also, nobody is, or should be, using deep neural networks for legislation and law enforcement. Explainability should be a core design decision when making an algorithm, not slapped on top of an inherently black-box algorithm. Black boxes and even their explanations are used to launder bias and unfairness. And most of these tricks are not even explanations that can be trusted. "Oh look, the cat's head is highlighted, so that's why this picture was classified as a cat!" No insight, no justification, just hoping the network learned some higher-level features like humans do, but oh no, when we flip the picture it is suddenly a dog, and when we photoshop the background to be snow, now it is suddenly a polar cat or a penguin.

Let deep learning do what it is good at, without explaining its performance and errors to anyone: invading your privacy on social networks, helping hedge funds make more money by analyzing Elon Musk's tweets, and building military surveillance.

Leave the justifications and explanations to inherently white-box models (they are nearly as good in performance as black-box models now, at least for structured data), and hold off on firing radiologists for a few decades, even though your train-set performance is overfitted to be on par with "human-level".

Somehow, somewhere, the deep learning revolution started to drink its own kool-aid and became allergic to critique and to solid, verifiable computer science. Explainable deep learning does not exist, since half of the time the engineer who built the system can't even explain why it works in the first place. "Strong, inspectable feature engineering is hard and time-consuming, so here we shook a box of legos a million times, burned six holes in the ozone layer, and out comes a deep net optimized with gradient descent." End-to-end learning is supposed to be really end-to-end, including the explanation.


“Many learning tasks” is a wiggle term. Sure, edge cases exist, but the methods do work impressively well in many cases.




