
Show HN: Learning Explainable Machine Learning Models with Attribution Priors - gabeerion
https://github.com/suinleelab/attributionpriors
======
gabeerion
I wanted to share this work from a machine learning paper
(https://arxiv.org/abs/1906.10670) we recently submitted. TL;DR: there has
been a lot of recent research on explaining neural networks by attributing
importance to each input feature. We go one step further and incorporate
attribution priors - prior beliefs about what these feature attributions
should look like - into the training process. We develop a fast,
differentiable feature attribution method called expected gradients, and
optimize differentiable functions of these feature attributions to improve
performance on a variety of tasks.
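
For a concrete picture, here is a minimal sketch of an expected gradients
estimator: average (x - x') * grad f(x' + alpha * (x - x')) over references
x' drawn from background data and alpha drawn uniformly from [0, 1]. The toy
tanh model, its hand-written gradient, the weights, and all function names
are assumptions made for illustration; none of this is code from the
repository.

```python
import math
import random


def model(x, w):
    """Toy differentiable model (an assumption): f(x) = tanh(w . x)."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def model_grad(x, w):
    """Analytic gradient of the toy model with respect to the input x."""
    scale = 1.0 - model(x, w) ** 2
    return [scale * wi for wi in w]


def expected_gradients(x, background, w, n_samples=2000, seed=0):
    """Monte Carlo estimate of expected gradients attributions for x."""
    rng = random.Random(seed)
    attr = [0.0] * len(x)
    for _ in range(n_samples):
        ref = rng.choice(background)   # reference x' sampled from the data
        alpha = rng.random()           # interpolation coefficient in [0, 1)
        point = [r + alpha * (xi - r) for xi, r in zip(x, ref)]
        grad = model_grad(point, w)
        for i in range(len(x)):
            attr[i] += (x[i] - ref[i]) * grad[i]
    return [a / n_samples for a in attr]


# Hypothetical weights, input, and background set.
w = [1.0, -0.5, 0.0]
x = [1.0, -1.0, 3.0]
background = [[0.0, 0.0, 0.0], [0.5, 1.0, -1.0], [-1.0, 0.5, 2.0]]

attr = expected_gradients(x, background, w)
baseline = sum(model(b, w) for b in background) / len(background)
print("attributions:", attr)
print("sum of attributions:", sum(attr))
print("f(x) - mean f(background):", model(x, w) - baseline)
```

The attributions should sum approximately to the model output at x minus the
average output over the background set, and the third feature, which has
zero weight in this toy model, gets zero attribution.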

Our results include: In image classification, we encourage smoothness of
nearby pixel attributions to get more coherent prediction explanations and
robustness to noise. In drug response prediction, we encourage similarity of
attributions among features that are connected in a protein-protein
interaction graph to achieve more accurate predictions whose explanations
correlate better with biological pathways. Finally, with health care data, we
encourage inequality in the magnitude of feature attributions to build sparser
models that perform better when training data is scarce. We hope this
framework will be useful to anyone who wants to incorporate prior knowledge
about how a deep learning model should behave in a given setting to improve
performance.
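
The three priors above can each be written as a scalar penalty on the
attributions that is added to the training loss. The sketch below shows one
plausible form for each: total variation for pixel smoothness, squared
differences across graph edges, and a negative Gini coefficient for
sparsity. The exact formulas and normalizations used in the paper may
differ, and the function names and example inputs are hypothetical.

```python
def smoothness_penalty(attr_map):
    """Total variation: sum of |differences| between adjacent pixel attributions."""
    rows, cols = len(attr_map), len(attr_map[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:
                total += abs(attr_map[r + 1][c] - attr_map[r][c])
            if c + 1 < cols:
                total += abs(attr_map[r][c + 1] - attr_map[r][c])
    return total


def graph_penalty(attr, edges):
    """Sum of squared attribution differences over connected feature pairs."""
    return sum((attr[i] - attr[j]) ** 2 for i, j in edges)


def gini(attr):
    """Standard Gini coefficient of attribution magnitudes (0 means all equal)."""
    mags = [abs(a) for a in attr]
    total = sum(mags)
    if total == 0:
        return 0.0
    pairwise = sum(abs(a - b) for a in mags for b in mags)
    return pairwise / (2 * len(mags) * total)


def sparsity_penalty(attr):
    """Negative Gini: minimizing it encourages unequal (sparser) attributions."""
    return -gini(attr)


print(smoothness_penalty([[0.0, 1.0], [0.0, 1.0]]))
print(graph_penalty([1.0, 1.0, 3.0], [(0, 1), (1, 2)]))
print(sparsity_penalty([0.0, 0.0, 0.0, 4.0]))
```

In training, the attributions would come from a differentiable method such
as expected gradients, and the chosen penalty would be scaled by a
regularization strength and added to the usual prediction loss.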

