Discrimination in the Age of Algorithms (arxiv.org)
49 points by martinlaz 21 days ago | 44 comments

The paper's authors have the best of intentions, but to me race-aware algorithms programmed for social justice sounds completely dystopian.

If a black man and a white man rob a bank together, is it fair that they receive the same sentence? Or is it fair that a sentencing algorithm give the black man probation and send the white man to jail to correct for disparate outcomes based on race?

Affirmative action stands for affirmatively elevating the status of disenfranchised groups to equality, not above it. In the example you gave, that would mean giving both of them probation, since probation is what the privileged group is afforded.

(As an aside: our current eagerness to imprison people for relatively minor crimes strikes me as fundamentally unfair, especially in light of our inability to provide corrective justice. As such, while it would be equal to send both to prison for robbery, I don't think it would be correct to call it fair.)

That doesn't really work. It's impossible to restore individual justice by using statistical aggregates as KPIs. How would you know the system had become "fair"? By comparing the racial ratio of incarceration to the ratio in the general population? That doesn't make any sense once you account for the disproportionate representation of black people at the lower end of the income scale, which is generally associated with violent crime and reoffending. What level of disproportionate incarceration would you target? Or would you prefer giving white people harsher sentences just for the colour of their skin? Statistics may point to problems (so we do investigate why exactly black people are more likely to end up in prison), but simple ratios aren't nearly enough to bring justice.

I'm not interested in ratios or statistics when it comes to justice. I'm interested in fairness, which entails doing right by individuals.

One of the ways that we can do right by individuals, especially individuals who have been and are currently disenfranchised, is to take affirmative steps towards improving their lives. One of those affirmative steps (and a relatively easy one at that!) is simply not punishing them more harshly than we would the privileged. Nothing about this entails treating white people (or any particular privileged group) worse.

The problem you mentioned is not the one that is being tackled. The real challenge is that being blind to just race isn't enough. For example, zip code is often highly correlated with race, so using zip code to predict whether an applicant would repay a loan may implicitly be discriminatory. We need to know race to determine whether a given covariate is okay to use. But we would usually not use race to predict outcomes (except possibly in a limited number of affirmative action settings, such as college admissions).
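The "we need race to detect the proxy" point can be made concrete with a toy simulation (everything below is hypothetical: made-up groups, zip codes, and weights, purely for illustration). Without the race column, the zip-code variable looks neutral; with it, the proxy relationship is obvious:

```python
# Toy sketch: you need the protected attribute to *detect* that a seemingly
# neutral covariate (zip code) acts as a proxy for it. All numbers hypothetical.
import random

random.seed(0)

# Hypothetical population: race influences which zip code someone lives in.
population = []
for _ in range(10_000):
    race = random.choice(["A", "B"])
    # Assumption for illustration: group B is concentrated in zip code 2.
    weights = [0.8, 0.2] if race == "A" else [0.3, 0.7]
    zip_code = random.choices([1, 2], weights=weights)[0]
    population.append((race, zip_code))

def rate(group):
    """Share of a group living in zip code 2."""
    members = [z for r, z in population if r == group]
    return sum(1 for z in members if z == 2) / len(members)

print(f"P(zip=2 | A) = {rate('A'):.2f}, P(zip=2 | B) = {rate('B'):.2f}")
```

A lender never looking at the race column could not even run this check, which is the commenter's point: race is needed to audit the covariate, even if it is never used to predict outcomes.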

Ok, help me reason through this. Let's say that native americans are poorer on average and therefore less likely to repay a $1k loan. Therefore, if you give out loans based on the probability that someone will be able to pay them back you will be discriminating against native americans (not the individual native americans with good credit scores, but you will be denying more native americans on average). So, you shouldn't try to compute how likely it is for someone to pay their loan back. I keep hearing things like, "you can't use zipcodes as input because zipcodes are correlated with race and it is illegal to discriminate on race." However, some races have better financial situations on average, which means that financial situations correlate with race, and by the same argument you shouldn't take finances into account when you review a loan application.

Obviously that is absurd, but where does it go wrong?

Nope. You are exactly correct. The way to approach removing bias in all personal financial decision making, algorithmic or otherwise, is to only allow it to be based on previous individual financial behavior.

The solution is pretty much that simple. I would argue that all correlative models dealing with human beings will always be discriminatory.

Will loans become more expensive for many people? Sure. But that is the true solution. Sometimes the right decision is simple but uncomfortable. In my opinion, this is one of those times.

>is to only allow it to be based on previous individual financial behavior

Your financial history includes your rent or mortgage payments, which are for living at your zip code. Restriction to financial history is not all that much of a restriction.

In principle I agree with what you are saying. We have to be a bit careful, however, since past financial history may have itself been biased by prejudice, e.g., if you were unfairly denied a job based on your race. Of course, we sometimes have to draw the line somewhere, and personally I agree that using previous financial behavior probably is fair.

Where things go wrong is when race comes into play to begin with. In your example it is clear that credit scores - or whatever equivalent is used in the country in question - are a factor in deciding whether an applicant should be allowed to take out a certain loan. That check should be done on the basis of that applicant's credit score irrespective of where he lives, what car he drives, what his name is, what colour his skin has or any other unrelated characteristic. While there might be relationships between average credit score and the aforementioned data points, they are not in themselves indicative of the likelihood of the applicant defaulting on the loan. The credit score is a better indicator, as is their income for the last X months and other similar data.

It might well be the case that a totally 'colour-blind' scoring algorithm still ends up sorting people by one of those mentioned, unrelated data points. This is not racism, it is just the consequence of the mentioned relationships. Calling it racism is a sign of ignorance and does a disservice to the effort to get rid of true racism.

The problem is, e.g. banks will want to use all the data they can to make a decision on what sort of loan to offer you. Employment history, travel history, family medical history, as much data on your spending habits as they can get, how much money your relatives have, which properties you own and are they likely to rise or drop in value, where, when, and how you drive (which affects likelihood of having a car accident -> unexpected expenses or death -> affects loan repayment), which websites you visit, how much time you spend on your phone, in bars, etc. etc..

Each has a plausible justification for inclusion. But the more data points you have, the better you're able to predict race from them (or anything else you're not supposed to).

So how do you decide if an algorithm is discriminatory? Maybe a whitelist or blacklist of what data it can use?

I think we can use dangerous data in safe ways. If you use how you drive only to predict the likelihood of a car accident, that's ok. If you use it to predict future wage growth, it's not ok. If you use it in a blackbox model, it's not ok.

If you use time in bar to predict cirrhosis it's ok.


>If you use how you drive only to predict likelihood of car accident that's ok.

We may say that now, but what if later some algorithm finds there's a hidden correlation between driving performance and race/gender? will it still be an ok metric then? where do we draw the line?

I said that expecting such a correlation to exist.

The line is between inferences of bad behavior -> bad result such as "you drive risky so you must be at risk of an accident" and pure correlation "you drive risky so you must be a man so you must be at increased risk of prostate cancer".

And conclusions should be assumed to be of the latter kind unless shown otherwise.

Your personal financial history is a lot more specific than your zip code. They may be on different ends of the same scale, but it seems far more defensible to make choices based on financial situation than it does zip code.

Setting aside for a moment the fact that your financial history includes your home loan which includes your home as collateral which is located in your zipcode, suppose that the model got more accuracy by looking at the zipcode. Isn't that all the defense that's needed? What is meant, exactly, by "defense?"

AFAIK the credit information in your file does not include zip code as part of the home loan accounting.

Defense means 'what are you going to say when the feds come knocking and want proof you are not discriminating against a protected class?'

As a practical matter it isn't too hard to come up with a legal defense for that, of course. Lots of plausible deniability here. IMO most of this effort is by people trying to help prevent computer nerds from becoming (or staying, as we may already be there) complicit in discrimination by race while hiding behind algorithms as if they are somehow infallible.

This part doesn't make sense to me:

want proof you are not discriminating against a protected class

If black people are poorer on average, then even if the system only takes personal history into account, black people will inevitably get worse credit ratings on average. It just follows from the premise, unless banks participate in some form of race-based redistribution, which I find distasteful. How's it supposed to work in your opinion?

This is what makes fairness so challenging. There isn't a clear place to draw the line, where certain covariates are permitted and others are not. Intuitively, what you are saying makes sense: your financial situation is a consequence of your choices, whereas things like zip codes are correlated both with your choices (i.e., I chose to live here) and with your race (e.g., due to lasting consequences of racial bias). Thus, we might decide that it's fine to use financial history, but not zip code.

Of course, things are not so clear cut. Your financial situation can also be a consequence of discrimination, e.g., if you were denied a job or unfairly arrested. At the end of the day, some human has to sit down and decide what is OK and what is not. Personally, I believe the goal of research on algorithmic fairness should be to give the people who will ultimately make these decisions (e.g., judges, politicians, etc.) the tools to understand both broadly, the kinds of things that can go wrong when using algorithms, and also to understand what might have gone wrong in a specific situation.

I don't know about this. I'm on board with not using race as a covariate -- people are born with their skin color and to penalize someone's loan application over something they're born with and have no control over is unfair.

But zip code? I understand there is empirical data showing how race is correlated with zip code, but you still choose your zipcode at the end of the day. If you have data saying that applicants from "Zipcode X" are less likely to repay loans, why is one's choice of zipcode out-of-bounds but their choice to have a hard credit check several months ago not? What if a poor credit history is correlated with race (fwiw I don't know if it is or not)? Is it then discriminatory to use credit history as a covariate?

The truth is that machine learning to predict people's behavior is inherently discriminatory and involves generalizations by definition. Once you start disallowing training on covariates that exist due to voluntary behavior, it's not really clear whether you should be allowed to use machine learning to predict individual behavior at all (I'd probably listen to that argument actually).

There's clearly a tradeoff to be made in terms of how much discrimination is allowable in exchange for a given unit of predictive performance, but that sounds incredibly difficult to regulate for every business's ML problem. I don't know what the answer is, but I think it's hasty to argue that anything that's correlated with race is out-of-bounds.

> I understand there is empirical data showing how race is correlated with zip code, but you still choose your zipcode at the end of the day.

Unless you live in a city that was redlined[1], which is to say, unless you live in virtually any city. In which case, you really did not and in many cases still do not actually have much of a choice where you live. And that's without even looking at the fact that the racial wealth gap means that even if there weren't still strong barriers to integrated neighborhoods, lots of members of groups that were and are discriminated against would be unable to move into the neighborhoods that would theoretically let them escape that zip code bias.

[1]: https://www.chicagomag.com/city-life/August-2017/How-Redlini...

You make a good point, but you overstate the effects of the wealth gap. Going by absolute numbers, there are more white people living in poverty in the US than any other ethnicity [1], so you should be able to find a majority-white slum and move there. Of course, having to do so would be (is?) ridiculous.


Because zip codes in themselves do not get assigned based on data relevant to whether someone is likely to pay off a loan, while credit scores do. Base decisions on data relevant to the matter at hand, not on unrelated data.

As an example, some 'professional' sports are dominated by people of certain skin colours. Basketball is predominantly 'black', ice hockey predominantly 'white', while soccer is more representative of the population at large. If you're looking for people who have the potential to become good basketball players you might look at data on their athletic achievements, their height and other such things. I don't think there is a need to have dark skin to be able to play basketball at a high level even though the majority of players seem to. I assume scouts for the NBA, NHL and MLS don't profile people based on their skin colour but on the aforementioned things and others related to a person's ability to play the sport.

[1] http://www.chineseorjapanese.com/wp-content/uploads/2009/04/...

Ironically I think algorithms would do the opposite, given that the training data reflects the current justice system's antics.

It's not so much about programming for social justice, but accounting for biases that are present in the current status quo and that end up in training data

There was a recent talk from a Meetup engineer about this but I can't find the link right now.

Then shouldn't the algorithms incorporate the unbiased data rather than incorporate something we know is biased to correct for other biases?

A question. How do we know the algorithm is biased if it gives a racially disparate result? Don't tell me the evidence is that the result is racially disparate.

There usually isn't unbiased data (think: law enforcement databases), though you could maybe remove the variables more likely to be discriminated against and train on that

There are statistical analyses that can answer your second question, and it's not only based on disparate results (though that's probably part of it). And the solution is not "changing the signal" of the bias
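One simple, widely cited analysis of this kind is the disparate-impact ("four-fifths") ratio used in US employment law: compare selection rates between groups and flag the outcome if the lower rate is below 80% of the higher one. A minimal sketch, with made-up approval counts:

```python
# Sketch of the "four-fifths rule" disparate-impact check.
# The approval counts below are hypothetical, purely for illustration.
def disparate_impact_ratio(selected_a, total_a, selected_b, total_b):
    """Ratio of the two groups' selection rates (lower rate / higher rate)."""
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# Hypothetical numbers: 50/100 of group A approved vs 30/100 of group B.
ratio = disparate_impact_ratio(50, 100, 30, 100)
print(f"ratio = {ratio:.2f} -> {'flags' if ratio < 0.8 else 'passes'} the 4/5 rule")
```

Note this particular test is purely outcome-based; other analyses (e.g. checking calibration of scores within each group) look at the model rather than the results, which is closer to what the parent comment is asking for.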

If there are statistical analyses that can determine whether the data is biased, why isn't the focus of attention on all biases, rather than just biases that may result in racial discrimination?

Shouldn't all the arguments be over the data and any biases present (and how to obtain unbiased data) and not the result?

As soon as you label people based on superficial personal traits that are outside of their control and without their consent, that's discrimination.

If you start giving some people special treatment just because of their ethnicity or gender, you're going to be opening up more cracks in the system for other people to fall through - and it will hurt those people even more because they will rightly feel that the whole system is working against them personally; down to the core specifics of who they are (unfortunate individuals who don't fit under any special label).

Modern anti-discrimination approaches are often racist or sexist. They should not try to classify individuals into superficial groups, instead, they should be decided on an individual case-by-case basis based on the individual's history.

The root of injustice is simply bad luck. Ethnicity and gender are only loosely correlated with bad luck but there is no causal relationship between them (not so much anymore at least).

Anti-discrimination efforts should be aimed at averaging-out the effects of bad luck in people's lives; so that means we need a way to quantify luck on an individual basis. It's a difficult problem to solve, but it can't be solved by making gross generalizations.

> The root of injustice is simply bad luck. Ethnicity and gender are only loosely correlated with bad luck but there is no causal relationship between them (not so much anymore at least).

This is wildly ignorant and ahistorical. There are literally thousands of books, articles, studies, movies, etc. disproving this. Literally just this week a study was released showing that predominantly white school districts collectively receive $23 billion more in funding than predominantly black districts, and that: "For every student enrolled, the average nonwhite school district receives $2,226 less than a white school district,".[1] That isn't bad luck; that is a system designed to reinforce a system of racial segregation, and it's just one example of thousands, all of which are a moment of research away if you actually cared about this.

[1]: https://www.npr.org/2019/02/26/696794821/why-white-school-di...

The NPR story you cite contains this quote: "As Rebecca Sibilia, founder and CEO of EdBuild, explains, a school district's resources often come down to how wealthy an area is and how much residents pay in taxes."

Based on that, any causal relationship between race and school funding is far from simple.

> That isn't bad luck; that is a system designed to reinforce a system of racial segregation

Who you believe is designing the "system" for the purposes of segregation, and what evidence do you have to support it? Asking because everybody I've met who works in education seems to genuinely want all students to thrive.

You can only say "The root of injustice is simply bad luck. Ethnicity and gender are only loosely correlated with bad luck but there is no causal relationship between them" if you ignore the years of segregation and red-lining and other racist policies designed to keep the Black man down.

The root reason why those schools get more funding is not because the kids are white. It's because the kids' parents are rich.

There are valid historical reasons to explain why there is a correlation between wealth, ethnicity and gender but the causality is not there anymore. My point is that a poor white kid will be just as disadvantaged today as an equally poor kid of any other race.

>There are valid historical reasons to explain why there is a correlation between wealth, ethnicity and gender but the causality is not there anymore.

Why not? What allows you to make the claim that the time of extreme disenfranchisement (one generation ago), or even slavery (two to three generations ago), bears no effect in 2019? Because they are not formally recognised in law?

>My point is that a poor white kid will be just as disadvantaged today as an equally poor kid of any other race.

Some would say that the white kid still has access to structural privileges afforded by society through extra-legal means. That's not to say his situation isn't bad, but it's probably less bad than the poor black kid's fate. This is reflected in biases about intelligence, for instance when an employer is deciding whether or not to hire.

I have an issue with this style of reasoning. To me it seems just incredibly lazy to point at some arbitrary statistical fact and infer a "system designed to reinforce racial segregation". A system might be in place, but that sole number doesn't prove it. It just says something about the incredibly complex world we live in, but nothing about cause and effect or how the world is supposed to be.

Same goes for the "there are multiple books" thing. That statement by itself doesn't reinforce your point, both because you aren't specific enough with that generic "books", and because there is a multitude of books written about vaccines causing autism. Book authors can be wrong.

I'd also like to point out that the very article you've posted contradicts your point.

As Rebecca Sibilia, founder and CEO of EdBuild, explains, a school district's resources often come down to how wealthy an area is and how much residents pay in taxes.

"We have built a school funding system that is reliant on geography, and therefore the school funding system has inherited all of the historical ills of where we have forced and incentivized people to live," she says.

So basically school funding partially depends on how affluent the area is. Due to various, mostly historical, reasons poor people are disproportionately black, and so schools with lower funding tend to have more black pupils. There is nothing requiring a grand racist conspiracy in this explanation; limited social mobility and income disparity are sufficient to explain the phenomenon. And you can never understand root causes just from the number you've cited. You'd need those causes to fix stuff, unless the only "solution" you're trying to justify is "white people bad and rich, black people poor and good, take money from white people, give it to black people". That one isn't really helpful.

From Jon Kleinberg also, I highly recommend watching his presentation [1] on the inherent trade-offs when trying to achieve algorithmic fairness.

It's a little more technical than this paper, and presents very well why removing discrimination in algorithmic decision making is a complex task.

[1] https://www.youtube.com/watch?v=4X3Z7FPwkA8
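One of the trade-offs Kleinberg formalizes can be shown with a tiny numeric sketch (the score distributions below are made up for illustration): when two groups have different base rates, a score that is perfectly calibrated in both groups cannot also equalize false-positive rates at a shared threshold.

```python
# Each group is a list of (score, positives_in_bucket, bucket_size).
# Both are calibrated: a score of 0.4 really means 40% of that bucket is positive.
group_a = [(0.2, 10, 50), (0.4, 20, 50)]   # base rate 30/100
group_b = [(0.4, 20, 50), (0.8, 40, 50)]   # base rate 60/100

def false_positive_rate(group, threshold=0.5):
    """Share of true negatives flagged when everyone with score >= threshold is flagged."""
    flagged_negatives = sum(n - pos for s, pos, n in group if s >= threshold)
    all_negatives = sum(n - pos for s, pos, n in group)
    return flagged_negatives / all_negatives

# Same threshold, calibrated scores in both groups, unequal false-positive rates:
print(false_positive_rate(group_a), false_positive_rate(group_b))  # 0.0 0.25
```

Because the higher-base-rate group has more mass at high scores, more of its true negatives sit above the cutoff; that is the arithmetic behind the impossibility result, not anything specific to these made-up numbers.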

Note that this paper is arguing for race-aware algorithms, not removing discrimination.

If I'm understanding the authors correctly, they're saying that blind algorithms don't have equal outcomes by race -- so they're reintroducing race to the algorithms so they can adjust for disparate outcomes.

Basically they're arguing for affirmative action algorithms.
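Mechanically, one simple form such an "affirmative action algorithm" could take (a hypothetical sketch, not necessarily what the paper proposes) is per-group score thresholds chosen to equalize approval rates across groups:

```python
# Hypothetical sketch: pick a per-group cutoff so that each group's approval
# rate matches the same target, even though the score distributions differ.
def group_threshold(scores, target_rate):
    """Smallest score cutoff that approves roughly `target_rate` of this group."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(target_rate * len(ranked)))
    return ranked[k - 1]

# Made-up score distributions for two groups.
scores_a = [0.9, 0.8, 0.7, 0.6, 0.5]
scores_b = [0.7, 0.6, 0.5, 0.4, 0.3]

# Equalizing approval at 40% in both groups yields different cutoffs per group.
print(group_threshold(scores_a, 0.4), group_threshold(scores_b, 0.4))  # 0.8 0.6
```

Note that computing the second cutoff requires knowing who is in which group, which is exactly why the authors argue the algorithm must be race-aware rather than race-blind.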

it is way more complicated than this. i know i underestimated how complicated these issues can be at first. there are many different failure modes.

the race-blind models aren't necessarily really race-neutral; they could implicitly have something like "affirmative action" in either direction, and reintroducing an explicit race variable could be the only way to get rid of it. or the opposite might be true! it takes real care to get this stuff right, whichever notion of equality you aim for.

This cuts both ways, of course - in some ways, I think we want automated systems to be able to discriminate. A good example is speed cameras - in days past, a man speeding to get his wife to the maternity ward could be pulled over and then hurriedly wished good luck, maybe even getting an escort. These days, he gets a fine in the post three weeks later.

Isn't it better now, since they arrive at the maternity ward sooner?

How would an automated system actually deal with that scenario?

In theory, the same way as a policeman - assessing the situation and the context, and then making a decision not to issue the fine.

But in practice, I don't think it can, without having total surveillance capabilities - not to mention essentially sci-fi quantities of AI advancement.

In principle both systems permit this kind of correction: you could contest the ticket by going to court.

There should always be a human in the loop when justice is meted out. There's no way to write a law or an algorithm that covers every circumstance. Unthinking "by the book" enforcement is tyranny. (Like those wretched "zero tolerance" laws, where everyone knows they are unjust, but feel impelled to literally interpret the law.)

