
To study possibly racist algorithms, professors have to sue the US - nols
http://arstechnica.com/tech-policy/2016/06/do-housing-jobs-sites-have-racist-algorithms-academics-sue-to-find-out/
======
danso
Interesting...I would've guessed that this was referring to proprietary
algorithms used by the criminal justice system to determine recidivism [1],
but it's actually about pushing back against the CFAA, which may restrict
researchers' ability to write scrapers, deploy bots, and create fake
profiles on services.

[1] https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm

~~~
yummyfajitas
The ProPublica article is self-debunking: the R script written by the authors
is unable to find statistically significant evidence of bias.

https://www.chrisstucchio.com/blog/2016/propublica_is_lying.html

That's probably why the article is nothing but a bunch of anecdotes.

~~~
marketforlemmas
So I've read the PP piece and your article, and I think your criticism is way
off-base. The most technical part of your argument hinges on a p-value being
.057 rather than .05, which is not a strong objection. No one seriously
believes that .05 is a magic number separating true from false; results that
are close to .05 but not quite below it are not automatically false.

They go on to give supporting analysis in the form of the false positive and
false negative rates by race, which is pretty compelling evidence. You claim
not to believe it because you can't find it in the notebook, but it's
literally right underneath the Cox model section.

I was intrigued by this article and went a step further to plot the ROC
curves, and the evidence is solid. It's messy, but you can see it here:
https://github.com/stoddardg/compas-analysis/blob/master/my_analysis.ipynb
(cell 78). It's quite clear that the algorithm is choosing a different
operating point on the ROC curve for white people (a more lenient one) than
for black people. A white defendant with a risk score of 5 is as likely to
commit a crime as a black defendant with a score of 7. That's an obvious case
where you could simply relabel and be more fair, but their algorithm chooses
not to.
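The comparison being made here, tracing per-group (FPR, TPR) operating points across thresholds of a 1-10 risk score, can be sketched in a few lines. The data below is invented for illustration, not the COMPAS dataset:

```python
# Per-group ROC operating points for an integer 1-10 risk score.
# Toy data only; real analysis would use the COMPAS records.

def roc_points(scores, outcomes):
    """For each threshold t in 1..10, the (FPR, TPR) of flagging score >= t."""
    pos = sum(outcomes)
    neg = len(outcomes) - pos
    points = []
    for t in range(1, 11):
        tp = sum(1 for s, y in zip(scores, outcomes) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, outcomes) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# hypothetical (score, reoffended) data for two groups
group_a = ([3, 5, 7, 9, 2, 6, 8, 4], [0, 1, 1, 1, 0, 0, 1, 0])
group_b = ([2, 4, 6, 8, 1, 5, 7, 3], [0, 0, 1, 1, 0, 1, 1, 0])

for name, (scores, outcomes) in [("A", group_a), ("B", group_b)]:
    print(name, roc_points(scores, outcomes))
```

If the curves nearly coincide but the deployed thresholds sit at different points along them for each group, you get exactly the "score 5 vs. score 7" equivalence described above.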

I also hate when people abuse bad statistics and reasoning to sell page views.

~~~
yummyfajitas
I have plenty of criticisms of hypothesis testing and p-values. Nevertheless,
if you choose to run that type of analysis, _do it right_: that means
sticking with your analysis and not reaching for weasel words like "almost
statistically significant" when it doesn't come out the way you want.
Incidentally, the real p-value is 11.075%, since they ran _two hypothesis
tests_ and didn't adjust for multiple comparisons.
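The 11.075% figure appears to come from a Šidák-style correction for two independent tests at p = 0.057 each; a minimal sketch, assuming that is the adjustment intended:

```python
# Šidák correction: the family-wise chance of at least one false positive
# across k independent tests, given a per-test p-value.

def sidak_adjust(p, k):
    """Family-wise p-value for k independent tests, each with p-value p."""
    return 1 - (1 - p) ** k

# Two hypothesis tests, each at p = 0.057:
print(round(sidak_adjust(0.057, 2), 5))  # 0.11075
```

In other words, running the same test twice roughly doubles the false-positive risk, pushing an "almost significant" 0.057 well above any conventional threshold.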

Your analysis might be right - if so, that's interesting, and I'll take a
closer look and write a follow-up piece if it holds up. Among other things,
glancing at your ROC curves suggests they are pretty close, performing better
for whites in some regions and better for blacks in others. But it's 7:30 AM
(pre-coffee) and I haven't looked closely yet.

But since PP did not do any of this, my criticism of them holds: they ran an
NHST, got the wrong result, and then spouted a bunch of anecdotes instead of
admitting that their analysis went against what they wanted to find.

~~~
danso
What's your response to the GP's point that you seem to have missed the part
of the notebook where the false positive/negative rates are calculated? In
your blog post, you said this:

> _Finally, the article includes a table of false positive probabilities (FPP)
> and false negative probabilities (FNP). This may or may not be evidence of
> bias - the authors would need to run a statistical test to determine that,
> which they don't. In fact, I can't even find the place in their R notebook
> where they did that calculation. Is this the result of bad statistics? Is it
> merely random chance? Who knows!_

Looking at PP's Jupyter Notebook, the calculations seem to be performed at
lines 50 onwards (if you're referring to the table that I think you're
referring to).

FWIW, the "weasel words" you allege appear in the writeup of the methodology,
where the audience is expected to follow along and see how the 0.057 is
calculated. I'm not sure how you're interpreting that calculation...My read is
that it's not the bedrock on which all of the other analyses are based. Where
in the story do you see that particular calculation being used as the main (or
even ancillary) thrust of the piece?

~~~
yummyfajitas
Aggregate false positive/false negative rates don't prove anything on their
own. They can be caused by composition differences between the groups, which
the analysis shows do exist.

But I'm sure they look convincing to ProPublica's readers who are not
statistically sophisticated.

~~~
danso
What's even more concerning is that these statistical noobs sometimes look to
experts, unaware that these math geniuses may lack the literacy to read a
plaintext notebook to the end.

------
oldmanjay
What does this have to do with racist algorithms? I know that Ars is big on
pushing whatever the latest progressive agenda is, but the connection is
tenuous even considering that.

~~~
maldusiecle
The article cites academics looking at "possible racial discrimination in
online advertising for housing and employment." Since online advertising tends
to be selected algorithmically, I think the connection is pretty clear.

~~~
toehead2000
Unfortunately whatever algorithm computers come up with to assess
creditworthiness is going to be "racist" by the broad disparate impact
definition of racism that the government uses. Ironically, the algorithms will
have to be made actually racist to correct for this.

------
nieksand
Umm. They want to commit click and impression botnet fraud without
consequences? Do they plan on compensating the advertisers after their
studies?

~~~
justinlardinois
Are advertisers owed compensation for non-humans wasting their ad exposures?

If you're a site owner and you're using bots to screw over your advertisers,
that's one thing.

But otherwise, what's their claim here? I'm sure web crawlers interact with
ads all the time in the course of their crawling, and no one's complaining
about that.

~~~
ep103
To the contrary, advertisers complain about it constantly. But they're only
one piece of the ad industry, and everyone else in the industry is okay with
it, if only because it's another method of auditing the advertisers.

------
xufi
This reminds me of this famous fiasco:
http://blogs.wsj.com/digits/2015/07/01/google-mistakenly-tags-black-people-as-gorillas-showing-limits-of-algorithms/

