
Be Careful What You Code For - miraj
https://points.datasociety.net/be-careful-what-you-code-for-c8e9f3f6f55e#.mow3f6y8l
======
beat
The money quote: "I don’t care what your politics are. If you’re building a
data-driven system and you’re not actively seeking to combat prejudice, you’re
building a discriminatory system."

This is extremely important. The weight of existing prejudice drives and
reinforces future prejudice. Garbage in, garbage out. If you start with
biased, flawed data, and you look for patterns like the biased, flawed data,
you'll just add to the biases and flaws. You need to account for the poor data
quality explicitly.

~~~
selectron
This proposal shows more concern about being politically correct than actually
correct. A truly unbiased system would just make the best analysis possible
given the data available. What you propose is to make a biased system in order
to give an advantage to one group of people rather than the other group. This
type of political correctness fuels resentment.

~~~
Joof
If the data itself is bad and known to be bad, the results will reflect the
bad data.

In the case of policing, the arrest record is racially biased (the US releases
data that shows this bias and it's well researched). If you train on arrest
records, it won't predict actual crime because the results will be racially
biased.

Being unbiased is actually a very difficult problem. Especially when we are
predicting things rooted in social / cultural trends.

~~~
teacup50
There's a difference between something being "offensive" because it's
_incorrect_ , and being "incorrect" because it's _offensive_.

Our job is to be correct. If we avoid offense in the process, great.

------
PeCaN
I feel like this is a politically-correct rant thinly disguised as programming
related.

> I don’t care what your politics are. If you’re building a data-driven system
> and you’re not actively seeking to combat prejudice, you’re building a
> discriminatory system.

Fuck that. Build _correct_ systems first. If you’re making a system that
intentionally distorts data for the sake of “combating prejudice”, you’re
_lying_. That doesn’t help anyone.

~~~
MereInterest
What you call "intentionally distorting data", I would call "correcting for a
confounding variable".

Unrelated Example: Every morning, I weigh myself. Today, I found that my scale
is inaccurate, and understates my weight by 5%. Therefore, if I am to predict
my weight in the future, I must account for the inaccuracy of the scale.

Related Example: There exist police statistics for arrests. Studies have found
that arrests are racially biased, and therefore those police statistics do not
accurately represent the crime rate, only the biased arrest rate. Therefore,
if I am to predict the crime rate in the future, I must account for the
inaccuracy of the data set.
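
Both examples above come down to undoing a known distortion before using the numbers. A minimal Python sketch, assuming the distortion is a simple multiplicative factor that is already known. The function names and the `over_arrest_factor` parameter are hypothetical, and for arrest data that factor would have to be estimated from independent research rather than assumed:

```python
def correct_scale_reading(reading: float, understatement: float = 0.05) -> float:
    """Recover the true weight from a scale that reads low by a known fraction.

    If the scale shows reading = true * (1 - understatement),
    then true = reading / (1 - understatement).
    """
    if not 0 <= understatement < 1:
        raise ValueError("understatement must be in [0, 1)")
    return reading / (1.0 - understatement)


def correct_arrest_rate(observed_rate: float, over_arrest_factor: float) -> float:
    """Adjust an observed arrest rate by a (hypothetical) over-arrest factor.

    Assumes observed_rate = underlying_rate * over_arrest_factor, which is
    the same shape of correction as the scale, only with a factor that is
    much harder to measure.
    """
    if over_arrest_factor <= 0:
        raise ValueError("over_arrest_factor must be positive")
    return observed_rate / over_arrest_factor
```

With these definitions, `correct_scale_reading(95.0)` gives roughly 100, since the scale hides 5% of the true weight. The hard part in the policing case is not the arithmetic but justifying the factor.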

~~~
programmarchy
But the problem you describe isn't biased data, it's state law enforcement
agencies practicing discrimination. Seems like it'd be more effective to
strike at the root, not the branches.

~~~
MereInterest
Whether or not data are biased depends on what you want to measure. If you are
trying to measure the arrest rate, then the arrest rates are an unbiased
predictor. If you are trying to measure the true crime rate, then the arrest
rates are a biased predictor.

I certainly agree that racially motivated arrests are a large issue, and
should be dealt with. Misinterpreting data is a widespread problem that in
these cases happens to have prejudicial consequences.
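
The point that the same numbers are unbiased for one target and biased for another can be sketched with a toy model. This assumes, purely for illustration, that a group's arrest rate is its true crime rate multiplied by the chance that an offence leads to an arrest. The `Group` class and all the rates used with it are hypothetical, not real statistics:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    name: str
    crime_rate: float   # true offences per person (hypothetical)
    arrest_prob: float  # chance an offence leads to an arrest (hypothetical)


def expected_arrest_rate(group: Group) -> float:
    """Arrests per person under the toy model: crime rate times arrest chance."""
    return group.crime_rate * group.arrest_prob


def disparity(rate_a: float, rate_b: float) -> float:
    """Ratio of two rates; 1.0 means no difference between the groups."""
    return rate_a / rate_b
```

If two groups share the same `crime_rate` but one has twice the `arrest_prob`, the arrest rates differ by a factor of two while the crime rates do not differ at all. The arrest data describes arrests exactly and describes crime wrongly.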

------
niftich
I don't care what your politics are. If you're building a data-driven system
and you forget that empirically observed covariation is a necessary but not
sufficient condition for causality, you're building a flawed system, which is
inaccurate at best, discriminatory at worst.
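
Covariation without causation is easy to reproduce in a simulation. The sketch below is an illustration under stated assumptions, not anything from the article: a hidden variable `z` drives both `x` and `y`, and `x` has no effect on `y` at all. All names and the noise model are made up for the example:

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def residuals(ys, zs):
    """Remove the part of ys explained by a straight-line fit on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [y - my - slope * (z - mz) for z, y in zip(zs, ys)]


def simulate(n=5000, seed=0):
    """Return (raw correlation of x and y, correlation after adjusting for z)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]      # hidden common cause
    x = [zi + rng.gauss(0, 1) for zi in z]       # driven by z, not by y
    y = [zi + rng.gauss(0, 1) for zi in z]       # driven by z, not by x
    raw = pearson(x, y)
    adjusted = pearson(residuals(x, z), residuals(y, z))
    return raw, adjusted
```

In this setup `x` and `y` are clearly correlated, yet the correlation largely disappears once `z` is accounted for. A system that only sees `x` and `y` would learn a relationship that is not there.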

------
NetTechM
It seems perfectly reasonable to station police in places which are
indisputable hotspots for crime.

I mean, is it discriminatory to say that Compton has a lot of crime and police
should patrol it?

Or is this just loosely using the tech community as a demographic to drum up
clicks while scraping the bottom of the barrel as far as looking for topics to
write about?

~~~
rconti
Hotspots for crime? Or hotspots for arrests?

~~~
PeCaN
Why not both? What if all races commit roughly proportional amounts of crime,
but there's a higher _density_ of crime in black neighborhoods?

There are way too many variables to say anything about whether “blacks commit
more crime” OR “police arrest blacks too much”.

------
jordigh
There are those funny/sad stories of how face recognition software has
categorised black people as either invisible[1] or gorillas:

[http://www.businessinsider.com/google-tags-black-people-as-g...](http://www.businessinsider.com/google-tags-black-people-as-gorillas-2015-7)

[http://edition.cnn.com/2009/TECH/12/22/hp.webcams/index.html](http://edition.cnn.com/2009/TECH/12/22/hp.webcams/index.html)

The prejudice of the programmer (unintentional, as all prejudice really is)
can show up in the code just because they forgot to include enough data in
their training set.

---

[1] In light of which Ralph Ellison's novel is particularly prescient:

[https://en.wikipedia.org/wiki/Invisible_Man](https://en.wikipedia.org/wiki/Invisible_Man)

~~~
prodigal_erik
If you forget I exist, you are not judging me in any way. If my training data
consists of one face, I'm obviously incompetent at machine learning but not
prejudiced against everyone else in the world.

------
analognoise
Ugh, more garbage from the technical bubble children who are perpetually an
exposed nerve.

------
hueving
"ignore the data and do what's good for the feels."

If you are writing a system that is designed to predict arrests for any reason
other than telling police officers where to go, then arrest records are
precisely the data you want to use without these political correctness
coefficients. So advertising legal help for criminal arrests to someone that
matches arrest demographics is the correct thing to do. Anything else is
stupidity.

~~~
CoryG89
This is a good point. If you are trying to design a system that "tells police
officers where to go" then you shouldn't be using data to predict arrests, you
should be predicting crime, which we don't have unbiased data for.

------
rndmize
I thought this was going to be something interesting until I hit the
"Environmental Consequences" part. I mean, really? We have environmental
problems as a result of agriculture and manufacturing and transportation and
construction and literally any other industry, and you're going to complain
about the cost of running some servers? We are _orders of magnitude_ away in
impact. Actually, I suspect it's even worse - the more computing power we make
available to those industries, the more efficient they become. Please, tell me
how making digital designs is environmentally worse than physical prototypes,
or simulations are less power efficient than real-world tests, or data-driven
irrigation is more wasteful than spraying water into the air for hours.

