
Lime: Explaining the predictions of any machine learning classifier - polm23
https://github.com/marcotcr/lime
======
bjterry
Seeing this reminded me of an episode of TWiML&AI [0] (the This Week in
Machine Learning & AI podcast). It features a very interesting discussion
of how Stripe explains the decisions of its black-box fraud model to
customers when it blocks a transaction. They have essentially built an
explanatory model, trained against the black-box model, which they then
run against the blocked transaction. It outputs the first explanation,
from a list of human-interpretable explanations, that would have changed
the decision from "Fraud" to "Not Fraud" (rough sketch below). Judging by
how often it has come up in conversation, this is one of the most
interesting episodes of the show.
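
To make the idea concrete, here is a rough sketch of that counterfactual
scheme as I understood it from the episode. The function and field names
are hypothetical; this is not Stripe's actual implementation:

```python
# Hypothetical sketch: return the first explanation whose edit flips the
# black-box decision. predict, candidate_edits, and all names are assumptions.
from typing import Callable, Dict, List, Tuple

def first_flipping_explanation(
    predict: Callable[[Dict], str],           # black-box fraud model
    transaction: Dict,
    candidate_edits: List[Tuple[str, Dict]],  # (explanation, field overrides)
) -> str:
    """Return the first human-interpretable explanation whose edit
    would have changed the decision from "Fraud" to "Not Fraud"."""
    # candidate_edits might hold entries like
    # ("card country matches billing country", {"country_mismatch": False})
    for explanation, overrides in candidate_edits:
        edited = {**transaction, **overrides}
        if predict(edited) == "Not Fraud":
            return explanation
    return "no single explanation flips the decision"
```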

In looking up the show notes for that episode, I see that there is also a
mention of this paper (in the OP), and that the authors were previously
interviewed in TWiML #7 (which I haven't listened to).

0: [https://twimlai.com/twiml-talk-73-exploring-black-box-predic...](https://twimlai.com/twiml-talk-73-exploring-black-box-predictions-sam-ritchie/)

------
taliesinb
LIME dates from 2016. For a more recent approach (“anchors”) from the same
authors, see
[https://homes.cs.washington.edu/%7Emarcotcr/aaai18.pdf](https://homes.cs.washington.edu/%7Emarcotcr/aaai18.pdf)

------
jimfleming
SHAP[0] is another model-agnostic method for interpreting predictions. It's
a bit newer and builds on LIME, Shapley values, and a few other works.
There's also an associated tool[1].

[0] [https://arxiv.org/abs/1705.07874](https://arxiv.org/abs/1705.07874)

[1] [https://github.com/slundberg/shap](https://github.com/slundberg/shap)
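
For anyone curious, here's a minimal usage sketch with the tool's
KernelExplainer (the model-agnostic, LIME-flavored estimator). The dataset
and model are purely illustrative, and the API may have shifted in newer
releases:

```python
# Sketch of model-agnostic SHAP values via KernelExplainer.
# Dataset/model are illustrative; check the shap README for the current API.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# A background sample is used to integrate out "missing" features.
explainer = shap.KernelExplainer(model.predict_proba, X[:50])
# nsamples bounds the number of perturbed evaluations (speed/accuracy knob).
shap_values = explainer.shap_values(X[0], nsamples=100)
print(shap_values)  # per-class feature attributions for this instance
```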

~~~
abhgh
Adding to its credibility, this was also accepted at NIPS 2017.

------
crishoj
Essentially, the tool seems to remove individual tokens (for example) from
the input to a text-based classifier in order to determine their
significance for the predicted class.

From the documentation:

> In order to figure out what parts of the interpretable input are
> contributing to the prediction, we perturb the input around its neighborhood
> and see how the model's predictions behave. We then weight these perturbed
> data points by their proximity to the original example, and learn an
> interpretable model on those and the associated predictions. For example, if
> we are trying to explain the prediction for the sentence "I hate this
> movie", we will perturb the sentence and get predictions on sentences such
> as "I hate movie", "I this movie", "I movie", "I hate", etc.

~~~
nerdponx
Yes, but the perturbation principle applies beyond text classification.
There is also specialized support for image data, as well as a general
method for tabular numeric data (example below).
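
For example, the tabular variant looks roughly like this (dataset and model
chosen just for illustration):

```python
# Sketch of lime's tabular explainer; perturbation happens in feature space.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(data.data,
                                 feature_names=data.feature_names,
                                 class_names=list(data.target_names))
exp = explainer.explain_instance(data.data[0], model.predict_proba,
                                 num_features=3)
print(exp.as_list())  # (feature condition, weight) pairs
```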

------
vivin
> The model's decision function is represented by the blue/pink background,
> and is clearly nonlinear. The bright red cross is the instance being
> explained (let's call it X). We sample instances around X, and weight them
> according to their proximity to X (weight here is indicated by size). We
> then learn a linear model (dashed line) that approximates the model well in
> the vicinity of X, but not necessarily globally.

So is this sort of like a local SVM classifier?

~~~
aje403
Not exactly, but you can "sort of" look at it that way, especially depending
on the distance metric they're using to penalize distance from X, and any bias
in sampling perturbed samples that are "near" X - the regression line will fit
in the middle of a bunch of "support points" in the biased direction. It'd be
much closer to an SVM than a regular regression line
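
Still, the local fit itself is an ordinary weighted regression rather than
a max-margin one. A minimal sketch of the idea from the quoted figure
(noise scale, kernel width, and sample count are arbitrary illustrative
choices, not LIME's exact defaults):

```python
# Locally weighted linear surrogate around a point x: sample near x,
# weight by proximity, fit a weighted linear model (the dashed line).
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate(black_box, x, n_samples=500, kernel_width=0.75, seed=0):
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.size))  # perturb around x
    dists = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-dists**2 / kernel_width**2)  # proximity kernel
    y = black_box(Z)                               # query the black box
    return Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)

# Usage: surrogate = local_surrogate(lambda Z: model.predict_proba(Z)[:, 1], x)
# surrogate.coef_ then gives the local feature weights around x.
```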

