
Delayed Impact of Fair Machine Learning - jonbaer
http://bair.berkeley.edu/blog/2018/05/17/delayed-impact/
======
joe_the_user
"Machine learning systems trained to minimize prediction error may often
exhibit discriminatory behavior based on sensitive characteristics such as
race and gender. One reason could be due to historical bias in the data."

The thing about this discussion is that it aims to balance social-welfare
goals and profit-maximizing goals, but without any criterion for attaining
basic fairness at a general level.

To wit, suppose a machine learning algorithm is looking at twenty or fifty
pieces of data about a given individual, all of which are entirely irrelevant
to the individual's chance of repaying a loan. But by random chance, one of
those pieces of data, say handedness, happens to be correlated with a group's
repayment history. So individuals with a "good" handedness are given loans
more frequently, and individuals with a "bad" handedness are given loans
less frequently. This situation doesn't matter to the company, since the data
is irrelevant and they only give out so many loans anyway, and shitting on,
say, left-handed people gives them no grief. Moreover, if this trend is noted
by all the companies, soon it will be made "true".
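A toy sketch of that mechanism (every number and feature name here is made up
for illustration): generate features that are truly independent of repayment,
and watch one of them correlate with repayment anyway in a finite sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 binary features observed for 200 loan applicants,
# all of them causally irrelevant to repayment (one of them could be
# "handedness", as in the example above).
n_applicants, n_features = 200, 50
X = rng.integers(0, 2, size=(n_applicants, n_features))
repaid = rng.integers(0, 2, size=n_applicants)  # independent of every feature

# In a finite sample, some feature will correlate with repayment anyway.
corrs = np.array([np.corrcoef(X[:, j], repaid)[0, 1]
                  for j in range(n_features)])
worst = int(np.argmax(np.abs(corrs)))
print(f"feature {worst} 'predicts' repayment at r = {corrs[worst]:+.2f} "
      f"by pure chance")
```

With enough irrelevant features, the strongest chance correlation is well
above zero, and a model trained to minimize error will happily use it.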

Which is to say, companies making life-defining decisions like mortgages or
parole should simply be prohibited from using a lasagna of random data to
make their decisions, and instead should be required to use specific rules
with specific reasons behind them.

And cry me a river about missed chances for optimization. This is about the
structure of society, and optimization doesn't benefit society here imo.

~~~
lambdaphagy
This isn't about overfitting. Race is almost exactly the opposite problem: the
signal in the data is so strong that you can't help but pick it up, even after
you censor everything you can think of. If you try to predict any kind of
consumer behavior from a racially heterogeneous dataset you will end up
finding something that correlates with race, because practically everything
correlates with race.
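A minimal illustration of that point (the setup and numbers are hypothetical):
even when the sensitive column is dropped entirely, an innocuous-looking
feature that happens to co-vary with it lets the attribute be recovered.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Sensitive attribute -- never shown to any model.
group = rng.integers(0, 2, size=n)

# A "neutral" feature (say, a zip-code bucket) that agrees with the
# sensitive attribute 80% of the time; the 80% is purely illustrative.
zip_bucket = np.where(rng.random(n) < 0.8, group, 1 - group)

# Censoring the sensitive column doesn't help: predicting the attribute
# straight from the proxy already works most of the time.
accuracy = float((zip_bucket == group).mean())
print(f"sensitive attribute recoverable from the proxy "
      f"with {accuracy:.0%} accuracy")
```

Any model that finds the proxy useful is, in effect, conditioning on the
censored attribute.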

~~~
paulsutter
> practically everything correlates with race.

Hold on, excuse me?

~~~
mlevental
lol. what this person is basically saying is that racism affects almost every
part of a person's life. I know it seems unbelievable but "practically
everything correlates with race" is the most nerd-intelligible way I've seen
of explaining/expressing that racism is pervasive.

~~~
lambdaphagy
That's not what I'm saying.

~~~
dang
You've been using this account primarily for political and ideological battle.
That's destructive of what HN is for—regardless of your ideology—and we ban
accounts that do it, as explained here:
[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html).

If you want to keep commenting here, please (re-)read the guidelines and use
this site as intended from now on. The intention is intellectual curiosity,
and that is the first casualty of ideological war.

~~~
lambdaphagy
Apologies, and thank you for the correction. My comments betray a lack of
intellectual curiosity and I will strive to harmonize them going forward.

~~~
pokemongoaway
Wow :(

------
ralusek
If you use data such as repayment history, income, assets, and other metrics
highly related to whether or not somebody is going to pay back a loan, and the
resulting model does not output equal representation among arbitrary
population groups, I fail to see a problem. If the data being used to train
the model has absolutely no knowledge of arbitrary group identifiers, such as
"gender" or "race," the resulting output is quite simply _not_ biased.

However, if something like "race" or "gender" is actually being used as a
_feature_ input, then the output from most ML strategies is highly likely to
pick up on correlations between racial and gender groups and certain outputs.
That is undoubtedly going to lead to negative discriminatory outputs.

So while I see absolutely no problem with the first scenario I mentioned, and
I clearly see a problem with the second scenario...I'm inclined to believe
that the politics of many people would lead them to have a problem with both.

~~~
pooya13
I am not saying that this is the case in your examples, but sometimes these
"predictors" create an unfair feedback loop. e.g. Cops patrol majority black
neighborhoods more because of higher misdemeanor rate and because misdemeanor
is a strong predictor of crime. This higher patrol rate leads to more arrests
(for crimes, such as having weed on you, whose underlying rates might not
differ significantly across races and neighborhoods). The arrest data is then
fed back into the system, leading to even more patrols, essentially creating a
forced equilibrium. Whereas if we didn't use the model the system might have
naturally evolved to a different state. This is one possibility of why these
models might be effective in the short term but not "fair" (i.e. not effective
in the long term). But in any case, I think it is obvious that the benefits of
a private company (such as the insurance company) are not necessarily aligned
with the benefits of the society.
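A deliberately minimal, deterministic sketch of the loop described above (the
rates and the initial allocation are invented): two neighborhoods with
identical true offense rates, where patrols follow past arrest counts and
arrests can only happen where the patrols are.

```python
import numpy as np

true_rate = np.array([0.3, 0.3])    # identical underlying behavior
patrols   = np.array([0.55, 0.45])  # slightly uneven initial allocation
arrests   = np.zeros(2)

for _ in range(100):
    arrests += patrols * true_rate     # arrests scale with patrol presence
    patrols = arrests / arrests.sum()  # next round follows the arrest data

# The allocation never corrects toward 50/50: the arrest data keeps
# "confirming" the initial imbalance, a forced equilibrium.
print(patrols)
```

Even in this idealized version, where nothing amplifies the bias, the loop
locks in whatever imbalance it started with, because the model only ever sees
arrests, not offenses.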

~~~
lambdaphagy
Misdemeanors are not a predictor of crime, they _are_ crime. Further, many
types of misdemeanors are quality of life issues that have outsize impacts on
the people who live in those neighborhoods. Go to a town hall meeting in a bad
neighborhood and see what people complain about. It's often not so much
burglaries as the guys who are chronically drinking and breaking bottles on
the corner at 2 am. It's not terribly enlightened to criticize the police for
going after QOL violations you would never tolerate in your own neighborhood.

As for whether police attention creates feedback loops, we can check by
looking at crime victimization surveys, which generally show the same patterns
of racial disparities as arrest records. For that matter, the black / white
murder gap has been stable at about 6-8:1 for as long as we've been keeping
track, and you can't change that by selectively hassling certain people for
weed.

~~~
pooya13
I fail to see how crime victimization surveys disprove possibility of feedback
loops. You can have higher murder rates among the black population, yet have
overall similar misdemeanour rates. You can even have a higher misdemeanour
rate among the black population but have a propostionally even higher arrest
rates due to these models and their feedback loops. Or maybe I am missing
something in your argument? Care to elaborate?

~~~
lambdaphagy
Well, the victimization surveys tell us that for every category of crime that
matters, there are large disparities in base rates that (a) predate policing
decisions and (b) recommend focusing on black neighborhoods, because that's
where the crime is. The idea that higher arrest rates are a self-reinforcing
statistical artifact has no support from the data. The notion that "crime is
where you look for it" is specifically undercut by victimization surveys as
well as by murder, a crime that is largely impossible to conceal from the
state indefinitely. If the baseline disparities are already so high to begin
with, there's hardly any variance left over to be explained by self-
reinforcing patrol strategies.

Put another way, if you took the first-order strategy you'd come up with by
looking at the body count, and then layered the most moustache-twirlingly
racist jaywalking policy that you could think of on top of it, the two
strategies wouldn't look that much different.

That's all before questioning why it is a bad thing for neighborhoods to be
policed for misdemeanors. I lived in West Baltimore for years and I can assure
you that my neighborhood was, if anything, chronically underpoliced.

~~~
pooya13
So now you have moved the goalpost to:

>> There might be unfair feedback loops, but they would be effectively
negligible.

First of all, it is curious to me that you mention the 8x disparity in murder
rate but don't mention that the violent-crime survey disparity is less than
20%. Secondly, you simply claim that policing has nothing to do with people
being arrested at different rates, but do not offer any evidence.

Here are the actual numbers, if you really care about facts. The
victimization rates are 20.5% for whites vs 24.1% for blacks: less than an
18% difference. On the other hand, incarceration rates are 0.7% for whites vs
4.5% for blacks: more than a 540% difference. This means blacks are being
arrested at about 32 times the rate at which they commit violent crimes. So
your claim that:

>> "we can check by looking at crime victimization surveys, which generally
show the same patterns of racial disparities as arrest records"

is not only ignorant and false; it is so far from the truth that it is
laughable.
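For what it's worth, the relative-difference arithmetic behind these quoted
figures can be reproduced directly (this only restates the comment's own
calculation, using the percentages it cites; it comes out just under the
quoted 32x):

```python
# Figures quoted in the comment above.
white_victimization, black_victimization = 0.205, 0.241
white_incarceration, black_incarceration = 0.007, 0.045

vict_gap = black_victimization / white_victimization - 1    # under 18%
incar_gap = black_incarceration / white_incarceration - 1   # over 540%

ratio = incar_gap / vict_gap  # ratio of the two relative gaps, roughly 31
print(f"victimization gap {vict_gap:.1%}, incarceration gap {incar_gap:.0%}, "
      f"ratio of gaps ~{ratio:.0f}x")
```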

~~~
lambdaphagy
I'd love to respond substantively to this post, and in particular to your
figures, but hn has threatened to ban me for discussing this topic. I suppose
each of us can only make of that what we will. Best wishes!

~~~
pooya13
I would have personally liked to hear your response but I guess it is what it
is. Best wishes to you as well!

