
What is Math washing? - mihau
http://www.mathwashing.com/
======
a_puppy
The article is correct that machine learning doesn't remove bias. If the
training data is biased, the output will be biased in the same way. But
machine learning doesn't add bias, either. If the training data is unbiased,
the output will be unbiased. In this sense, algorithms are fairer than humans.

So if someone wants to argue that a machine learning model is or is not
biased, they should base that argument on how the model is trained. For
example: suppose a bank wants to use a machine learning model to predict who
to make loans to. Historically, human bank managers made those decisions, and
they tended to have a bias against people from the wrong side of the tracks.
There are several possibilities:

* If the bank trains the model on the bank managers' decisions, and it uses ZIP code as a feature, then it will discriminate against people from the wrong side of the tracks just like the human bank managers did.

* If the bank trains the model on the bank managers' decisions, but the only features it uses are monthly income and existing debts, then it will probably be unbiased (although some residual bias is still conceivable).

* If the bank runs a controlled experiment by approving loans for 100 people at random, and trains the model on which loans were paid back, then the results of the model will be fair; it will accurately predict how likely people are to pay back loans, regardless of which side of the tracks they live on.

* If the bank trains the model on loans made by human bank managers, but it trains the model to predict loan repayment instead of loan approval, then the algorithm will actually _invert_ the bank managers' biases. If the bank managers never approved loans for people from the wrong side of the tracks unless they were an extraordinarily safe bet, then the algorithm will conclude "people from the wrong side of the tracks always pay back their loans!"
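The fourth scenario is easy to demonstrate with a toy simulation (all numbers are made up for illustration, not real loan data): if managers demand a much higher bar from "wrong side" applicants, then among the loans that actually get approved, that group's repayment rate looks inflated, and a model trained to predict repayment learns the inverse of the managers' bias.

```python
import random

random.seed(0)

def simulate(n=10_000):
    """Simulate biased approvals, then record who repaid among approvals."""
    approved = []
    for _ in range(n):
        wrong_side = random.random() < 0.5
        # True creditworthiness (probability of repaying) has the same
        # distribution for both groups in this toy example.
        p_repay = random.random()
        # Biased approval rule: managers demand an extraordinarily safe bet
        # from applicants on the wrong side of the tracks.
        threshold = 0.95 if wrong_side else 0.50
        if p_repay >= threshold:
            repaid = random.random() < p_repay
            approved.append((wrong_side, repaid))
    return approved

loans = simulate()

def rate(group):
    """Repayment rate among approved loans for one group."""
    subset = [repaid for ws, repaid in loans if ws == group]
    return sum(subset) / len(subset)

print(f"repayment rate, wrong side: {rate(True):.2f}")
print(f"repayment rate, right side: {rate(False):.2f}")
```

Because the biased filter only lets near-certain repayers from the disfavored group through, the "wrong side" flag ends up positively associated with repayment in the training data, which is exactly the inversion described above.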

Arguments about machine learning bias should be based on these sorts of
specific details, rather than assuming "algorithms aren't biased" or
"algorithms are biased".

~~~
OtterCoder
I don't think you really understand the kinds of details that AI picks up on.
If that data pool shows that people who apply for loans on a Tuesday and have
a salary whose third digit is 7 almost always default on loans, then neural
networks will pick up on that, regardless of the connection to reality.
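This point is easy to reproduce: give a model enough irrelevant features and one of them will correlate with the training labels by pure chance, while predicting nothing on fresh data. A minimal sketch, using made-up random binary features standing in for things like "applied on a Tuesday":

```python
import random

random.seed(42)

N_TRAIN, N_TEST, N_FEATURES = 200, 200, 500

def make_data(n):
    # Labels and features are all independent coin flips: by construction,
    # NO feature has any real connection to the label.
    labels = [random.random() < 0.5 for _ in range(n)]
    features = [[random.random() < 0.5 for _ in range(N_FEATURES)]
                for _ in range(n)]
    return features, labels

train_X, train_y = make_data(N_TRAIN)
test_X, test_y = make_data(N_TEST)

def accuracy(X, y, feat, flipped):
    """Accuracy of predicting the label from a single (possibly inverted) feature."""
    hits = sum((row[feat] != flipped) == label for row, label in zip(X, y))
    return hits / len(y)

# "Train" a one-feature classifier: pick whichever irrelevant feature
# (possibly inverted) best fits the training labels.
best_feat, best_flip, best_train = max(
    ((f, flip, accuracy(train_X, train_y, f, flip))
     for f in range(N_FEATURES) for flip in (False, True)),
    key=lambda t: t[2],
)
test_acc = accuracy(test_X, test_y, best_feat, best_flip)
print(f"train accuracy of best spurious feature: {best_train:.2f}")
print(f"its accuracy on fresh data:              {test_acc:.2f}")
```

The "best" feature looks meaningfully predictive on the training set and collapses to roughly coin-flip accuracy on new data, which is the overfitting-on-noise failure mode described above.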

An algorithm, especially a trained one, should be assumed biased until proven
otherwise. It's not humanly possible to cleanse and normalize a dataset
thoroughly enough to prevent every possible form of overfitting, and even if
you could squash every irrelevant signal with godlike precision, the result
still depends on your personal definition of unbiased.

As someone who has been boots on the ground for political surveys, I
frequently fielded surveys that respondents from both parties would say were
biased against their own side. I don't trust human judgement, and I don't
trust the judgement of human tools.

------
rdtsc
I like the idea of algorithm transparency. There have been a few cases where
people tried to subpoena the source code of red light cameras or DUI
breathalyzers.

[http://digital.law.washington.edu/dspace-
law/bitstream/handl...](http://digital.law.washington.edu/dspace-
law/bitstream/handle/1773.1/1069/7WJLTA123.pdf?sequence=4)

It's not as clear-cut as it seems, though: states can side-step discovery of
the source code by arguing that the state prosecutor does not control or own
it.

This is of course moot with regard to private entities like FB, Google, or
Twitter. They can ban anyone for any reason. Do they even need the
"algorithms" excuse? Maybe when banning someone earns them a lot of negative
PR, they can issue an apology: "Sorry you feel this way, but the algorithms
did it, we swear."

------
klondike_
It's amazing how big companies like Google can be so oblivious to how biased
their algorithms are. See: YouTube "hate speech" detection, Facebook and fake
news.

Algorithms more often than not inherit the same biases their programmers have.
Having those biases expressed mathematically makes no difference.

