
Fairwashing: The Risk of Rationalization - lbeziaud
https://arxiv.org/abs/1901.09749
======
Abstract:

Black-box explanation is the problem of explaining how a machine learning
model -- whose internal logic is hidden from the auditor and generally complex
-- produces its outcomes. Current approaches to this problem include
model explanation, outcome explanation, and model inspection. While
these techniques can be beneficial by providing interpretability, they can be
used in a negative manner to perform fairwashing, which we define as promoting
the false perception that a machine learning model respects some ethical
values. In particular, we demonstrate that it is possible to systematically
rationalize decisions taken by an unfair black-box model using both the model
explanation and the outcome explanation approaches with respect to a given
fairness metric. Our solution, LaundryML, is based on a regularized rule list
enumeration algorithm whose objective is to search for fair rule lists
approximating an unfair black-box model. We empirically evaluate our
rationalization technique on black-box models trained on real-world datasets
and show that one can obtain rule lists with high fidelity to the black-box
model while being considerably less unfair.
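
The abstract describes a fairness-regularized search over rule lists that trades off fidelity to the black-box model against a fairness metric. The snippet below is a minimal illustrative sketch of that objective, not the authors' LaundryML implementation: it enumerates tiny candidate rule lists over synthetic data, scores each by (1 - fidelity) plus a weighted demographic parity gap, and keeps the best candidate. All data, feature names, and the regularization weight are assumptions made up for illustration.

    # Hypothetical sketch of the fairwashing objective described in the abstract.
    # Not the authors' LaundryML code; data and parameters are illustrative only.
    import itertools
    import random

    random.seed(0)

    # Toy records: two binary features plus a binary sensitive attribute.
    records = [{"f1": random.randint(0, 1),
                "f2": random.randint(0, 1),
                "sensitive": random.randint(0, 1)} for _ in range(200)]

    def black_box(r):
        # Stand-in for the unfair black-box model being rationalized.
        return 1 if (r["f1"] and not r["sensitive"]) else 0

    bb_labels = [black_box(r) for r in records]

    def rule_list_predict(rule_list, r):
        # A rule list is an ordered list of (feature, value, prediction)
        # followed by a default prediction.
        for feature, value, prediction in rule_list[:-1]:
            if r[feature] == value:
                return prediction
        return rule_list[-1]

    def fidelity(rule_list):
        # Fraction of records where the rule list agrees with the black box.
        preds = [rule_list_predict(rule_list, r) for r in records]
        return sum(p == y for p, y in zip(preds, bb_labels)) / len(records)

    def demographic_parity_gap(rule_list):
        # Absolute difference in positive-prediction rates across the two groups.
        rates = []
        for s in (0, 1):
            group = [r for r in records if r["sensitive"] == s]
            rates.append(sum(rule_list_predict(rule_list, r) for r in group) / len(group))
        return abs(rates[0] - rates[1])

    # Enumerate one-rule candidate lists over non-sensitive features only.
    candidates = [[(feature, value, pred), default]
                  for feature, value, pred, default
                  in itertools.product(["f1", "f2"], [0, 1], [0, 1], [0, 1])]

    lam = 0.5  # fairness regularization strength (illustrative value)
    scored = sorted(candidates,
                    key=lambda rl: (1 - fidelity(rl)) + lam * demographic_parity_gap(rl))

    best = scored[0]
    print("best rule list:", best)
    print("fidelity: %.2f  parity gap: %.2f" % (fidelity(best), demographic_parity_gap(best)))

In this toy setup the selected rule list can track the black box's outputs closely while showing a much smaller parity gap than the black box itself, which is the kind of misleading "fair-looking" surrogate the paper calls fairwashing.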

