Hacker News
Copyright Law Can Fix Artificial Intelligence’s Implicit Bias Problem (ssrn.com)
6 points by miraj 11 months ago | 10 comments

I consider it very important to fight bias in AI, but I fundamentally do not understand this article's take.

The thing stopping AI researchers from obtaining unbiased training data is not that we're waiting for lawyers to give us permission. The thing stopping us is that the right training data is already hard to find, and unbiased training data is the hardest of all to find, because it doesn't exist.

Google, for example, does not need permission. They can acquire and train on basically any data they want if they put their mind to it. And Google is completely shit at deploying unbiased AI. (They write great blog posts and presentations about it! Then they don't do it in their actual products.)

You won't just find a naturally unbiased dataset. The way to fight AI bias is deliberately and artificially, like in Bolukbasi et al. [1]

[1] https://arxiv.org/abs/1607.06520
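For a sense of what "deliberately and artificially" looks like in Bolukbasi et al., here is a minimal sketch of their "neutralize" step: subtract a word vector's projection onto a gender direction, so that a word that should be gender-neutral, like "programmer", ends up orthogonal to that direction. Assumptions: the vectors are made-up 3-d toys rather than real word2vec embeddings, the direction comes from a single he/she pair instead of the PCA over several pairs used in the paper, and the paper's companion "equalize" step is left out.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def neutralize(word_vec, bias_dir):
    """Remove the component of word_vec along bias_dir, then renormalize."""
    g = normalize(bias_dir)
    proj = dot(word_vec, g)
    debiased = [w - proj * gi for w, gi in zip(word_vec, g)]
    return normalize(debiased)

# Hypothetical 3-d toy embeddings (real word2vec vectors have ~300 dimensions).
he = [0.9, 0.1, 0.3]
she = [-0.9, 0.1, 0.3]
programmer = [0.4, 0.8, 0.2]

gender_dir = [a - b for a, b in zip(he, she)]  # crude one-pair bias direction
fixed = neutralize(programmer, gender_dir)
print(round(dot(fixed, normalize(gender_dir)), 6))  # → 0.0
```

The point is that the data alone never produces this result; someone has to decide which words are supposed to be neutral and which direction counts as "gender".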

The term "implicit bias" is a poor choice here.

First, it appears only in the title and not in the abstract, for good reason: they are not talking about implicit bias, which as a concept only makes sense when contrasted with explicit bias. Humans can have implicit and explicit bias. An AI can just have "bias" (well, at least until we get true human-level general AI, and we'll be able to talk to it and differentiate what it does from what it says it does).

Second, "implicit bias" brings in a lot of unfortunate connotations. It is tied up with the IAT (implicit association test) [1], which is highly controversial. That controversy has no bearing here (again, since there is no explicit bias to contrast it with); bringing it in only hurts.

Unless I'm missing something, I can only guess that they use the term to grab attention, which is cynical and sad.

[1] https://en.wikipedia.org/wiki/Implicit-association_test#Crit...

The criticisms of the Implicit Association Test don't apply to testing AI systems for racial bias.

You ask a word2vec-based system about predominantly-black names and it associates them with negative emotions, gives them negative sentiment scores, hides their comments with PerspectiveAPI, etc. Predominantly-white names are more positive. This is a problem.

The effect is so blatant and so repeatable (it's not about something fuzzy like response times, it's the actual predictions that come out of the system) that you don't have to quibble about the predictive validity of giving the IAT to humans.
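The name test described above can be sketched as an association score in the spirit of the Word Embedding Association Test: for each name vector, take its mean cosine similarity to "pleasant" words minus its mean similarity to "unpleasant" words, then compare the averages of two name groups. The 2-d vectors below are made-up stand-ins; a real test would look up actual embeddings for real name lists and attribute word lists.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def association(w, pleasant, unpleasant):
    """Mean cosine similarity to pleasant words minus mean similarity to unpleasant words."""
    pos = sum(cosine(w, p) for p in pleasant) / len(pleasant)
    neg = sum(cosine(w, u) for u in unpleasant) / len(unpleasant)
    return pos - neg

def group_gap(names_x, names_y, pleasant, unpleasant):
    """Mean association of name group X minus that of group Y (0 means no measured gap)."""
    ax = sum(association(n, pleasant, unpleasant) for n in names_x) / len(names_x)
    ay = sum(association(n, pleasant, unpleasant) for n in names_y) / len(names_y)
    return ax - ay

# Hypothetical toy vectors standing in for real embedding lookups.
pleasant = [[1.0, 0.0], [0.9, 0.1]]
unpleasant = [[0.0, 1.0], [0.1, 0.9]]
names_x = [[1.0, 0.2]]
names_y = [[0.2, 1.0]]

print(group_gap(names_x, names_y, pleasant, unpleasant))  # positive: X sits closer to "pleasant"
```

Because the score is computed directly from the vectors the system actually uses, there is no response-time noise to argue about; the gap is either there or it isn't.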

You need to actively de-bias your data, because data encodes and propagates the biases of the past.

However, I do not follow this article's argument about why Fair Use will provide automatically de-biased data, especially when it leads with the fact that Google News doesn't.

The problem is then debiasing data. That in itself requires strong AI to work reliably. Dumb methods like, say, PCA/ICA do not work on multidimensional datasets. (They become unworkably slow and often trade biases between dimensions.)

If you debias manually, or declare the bias subspace manually, you're actually cheating and potentially introducing your own biases.
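To make the "declare the subspace manually" point concrete: in Bolukbasi et al. the gender direction is the top principal component of a short, hand-picked list of definitional pairs such as he/she, each pair centered on its own midpoint. The stdlib-only sketch below computes that component by power iteration instead of calling a PCA library, using made-up 3-d vectors; it shows that the resulting direction is determined entirely by which pairs someone chooses to feed in.

```python
import math

def bias_direction(pairs, iters=200):
    """Top principal component of definitional pairs, each centered on its own midpoint."""
    rows = []
    for a, b in pairs:
        mid = [(x + y) / 2 for x, y in zip(a, b)]
        rows.append([x - m for x, m in zip(a, mid)])
        rows.append([y - m for y, m in zip(b, mid)])
    dim = len(rows[0])
    # Rows already have zero overall mean, so this is the (unnormalized) covariance matrix.
    cov = [[sum(r[i] * r[j] for r in rows) for j in range(dim)] for i in range(dim)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * dim
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical 3-d toy vectors; the pair list itself is a hand-made choice.
pairs = [
    ([1.0, 0.3, 0.1], [-1.0, 0.3, 0.1]),  # e.g. "he" / "she"
    ([0.8, 0.1, 0.5], [-0.6, 0.3, 0.5]),  # e.g. "man" / "woman"
]
print([round(x, 3) for x in bias_direction(pairs)])
```

Swap in different pairs and you get a different subspace, which is exactly the manual choice being argued about here.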

Enough of this false equivalence.

You can call what I do "introducing my own biases". I am "biasing" NLP algorithms toward treating people equally whenever possible, toward not discriminating based on their name or how they describe themselves.

The default biases are really bad, and whatever my biases are, they're not as bad.

I don't even know what you say I'm "cheating" at. That is a very weird accusation.

You always make a choice of what algorithms and heuristics to deploy. Deploying a system like pre-trained word2vec or GloVe that believes everything it reads on the Web is a harmful choice. Not even waiting forever for strong AI will fix it. You wouldn't even let a child believe everything that they read, and a child has real intelligence.

When you deploy harmful AI, you can't blame its effects on the code or the data, because the code and data have no moral agency. You wrote the code and it's your responsibility.

Quoting the paper, an example of this "terrible" implicit bias is associating men with programming and women with housekeeping.

What exactly does an AI that's free from biases look like? Will it presume, for instance, that all men and all women are the same, all the time?

How is that AI at all?

Correct me if I'm wrong, but I think the whole point of AI is to find patterns in the data that can help make decisions that lead to a desired outcome.

If you're not using real-world data, are you really doing AI?

AI is just a tool, a way to automate making more kinds of decisions. The decisions are not better or inherently more right by virtue of taking a human out of the loop. In fact, they're usually worse, because humans are intelligent for real and computers are not.

But the decisions are automated, and that makes more things possible. And sometimes you need to deal with the effects of automation.

Nobody could reasonably declare an AI "free from biases", but you can work on particular aspects. Here's an example: Meetup de-biases their recommender system so that it doesn't automatically recommend knitting instead of coding meetups if you are a woman. [1]

Yes, in the data, there are relatively more women going to knitting meetups and fewer going to coding meetups, but it doesn't have to use that data or its correlates to make predictions. In fact, it shouldn't. If you do like knitting, it can still learn that from your actual personal preferences and not infer it bluntly from your gender.

[1] https://civichall.org/civicist/meetup-counters-invisible-sex...
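A toy illustration of the Meetup idea: rank meetups from the user's own attendance history and never pass demographic fields into the ranking function. This is not Meetup's actual system; the function, meetup list, and user records below are all hypothetical.

```python
from collections import Counter

def recommend(attended_topics, meetups, k=2):
    """Rank meetups by how often this user attended the same topic.

    Only the user's own history is an input; demographic fields such as
    gender are never passed in, so they cannot drive the ranking.
    """
    history = Counter(attended_topics)
    # Most-attended topic first; ties broken alphabetically by name.
    ranked = sorted(meetups, key=lambda m: (-history[m["topic"]], m["name"]))
    return [m["name"] for m in ranked[:k]]

meetups = [
    {"name": "Python Night", "topic": "coding"},
    {"name": "Yarn Club", "topic": "knitting"},
    {"name": "Rust Study Group", "topic": "coding"},
]

alice = {"gender": "female", "attended": ["coding", "coding"]}
bob = {"gender": "male", "attended": ["coding", "coding"]}

print(recommend(alice["attended"], meetups))  # ['Python Night', 'Rust Study Group']
print(recommend(bob["attended"], meetups))    # ['Python Night', 'Rust Study Group']
```

Someone who actually attends knitting meetups still gets "Yarn Club" first, learned from their own behavior. Note that dropping the gender field alone does not remove correlated proxies, which is why the comment above mentions "its correlates" as well.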

A little off topic but I find most recommendation systems to be abysmal failures. It's like, as soon as you look into something, they start assuming you're very interested in it, and they don't show recommendations for anything other than things related to what you've looked into before.

Artificial coin flipping.

As they say, garbage in, garbage out. This is especially true in automated decision making. We do not want the system to actually make bad decisions for the sake of equality, right? Now, if you want to debias datasets without hurting accuracy or decision results, you really need a true superintelligence, much better than a human. (People are biased, more or less subtly. Bias is somewhat reduced by taking a weighted or flat average of opinions.)

Indeed, the merits of the science behind Implicit Bias seem suspect.


