Yes. Captum is a great library, and a few of my colleagues have used it with good results in the past. Like you mention, the few examples that demonstrate gradient-based explanation methods for Huggingface models typically focus on PyTorch. The example here looks at things from the Tensorflow 2.0/Keras perspective.
One thing to note is that model-agnostic SHAP can be resource intensive to compute, especially compared to gradient methods, which only require a single forward/backward pass through the model for a datapoint.
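To make that concrete, here is a rough sketch of what a single-pass gradient saliency could look like in TF 2.0/Keras. The checkpoint name is just a placeholder, and the way the embedding table is accessed via .weight may differ between transformers versions:

    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = TFAutoModelForSequenceClassification.from_pretrained(model_name)

    text = "A surprisingly enjoyable film."
    enc = tokenizer(text, return_tensors="tf")

    # Gradients cannot flow through integer token ids, so look up the word
    # embeddings first and feed them in via inputs_embeds. Access to the
    # embedding table (.weight here) can vary between transformers versions.
    embedding_table = model.get_input_embeddings().weight
    input_embeds = tf.gather(embedding_table, enc["input_ids"])

    with tf.GradientTape() as tape:
        tape.watch(input_embeds)
        logits = model(inputs_embeds=input_embeds,
                       attention_mask=enc["attention_mask"]).logits
        score = tf.reduce_max(logits, axis=-1)  # score of the predicted class

    # Simple gradient-norm saliency per token (one forward + one backward pass).
    grads = tape.gradient(score, input_embeds)   # shape: (1, seq_len, hidden)
    saliency = tf.norm(grads, axis=-1)[0]        # shape: (seq_len,)

    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    for tok, s in zip(tokens, saliency.numpy()):
        print(f"{tok:>12s}  {s:.4f}")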
One weak point I see: this tool only measures how much an individual input token would have to change to decrease the loss. The reason a token might have large gradients could be related to how many times it appears in the training set and how consistent the training set is with the evaluation set, not just how much the prediction disagrees with the target label.
So it jointly measures data coverage, consistency, and data-to-target fit. Just my intuition, might be wrong.
1. Huggingface models are supported by Captum, a framework for gradient-based explanations of any PyTorch model: https://captum.ai/tutorials/Bert_SQUAD_Interpret (a minimal sketch of that pattern is below, after this list)
2. There are several Huggingface "spaces" which showcase, in the browser, model explanations on Huggingface models using a variety of techniques, such as LIME: https://huggingface.co/spaces/Hellisotherpeople/Interpretabl...
or with SHAP: https://huggingface.co/spaces/Hellisotherpeople/HF-SHAP
and there is definitely an example already of doing it with gradient-based techniques, but I'm having trouble finding it!
3. It's cool to see someone do this with from-scratch code, since gradient-based explanation techniques are quite complicated and also vary a lot from one technique to another.
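For item 1, here is roughly what the Captum + Huggingface pattern from that tutorial looks like, applied to sequence classification instead of SQuAD. The checkpoint name, the pad-token baseline, and the target class index are assumptions, not anything the tutorial prescribes:

    import torch
    from captum.attr import LayerIntegratedGradients
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    def forward_func(input_ids, attention_mask):
        # Return the class logits; Captum selects the target column itself.
        return model(input_ids=input_ids, attention_mask=attention_mask).logits

    text = "A surprisingly enjoyable film."
    enc = tokenizer(text, return_tensors="pt")

    # Baseline: the same sequence with every position replaced by the pad token.
    baseline_ids = torch.full_like(enc["input_ids"], tokenizer.pad_token_id)

    # Attribute the class-1 logit (assumed "positive") to the word embeddings.
    lig = LayerIntegratedGradients(forward_func, model.get_input_embeddings())
    attributions = lig.attribute(
        enc["input_ids"],
        baselines=baseline_ids,
        additional_forward_args=(enc["attention_mask"],),
        target=1,
    )
    scores = attributions.sum(dim=-1).squeeze(0)  # collapse the hidden dimension

    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    for tok, score in zip(tokens, scores.tolist()):
        print(f"{tok:>12s}  {score:+.4f}")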