

Racism is Poisoning Online Ad Delivery, Says Harvard Professor - antichaos
http://www.technologyreview.com/view/510646/racism-is-poisoning-online-ad-delivery-says-harvard-professor/

======
jerf
This is one of those cases where I find myself wanting the professor to first
sit down and very precisely say what they think "non-discriminatory" would
mean in this context. The devil is in the details. Just flinging around wild
accusations of racism at Google, this company, and finally society as a whole,
without giving any action items on how one would discharge the accusation of
racism is just being mean, hitting people with a very big stick without giving
them any chance to dodge.

(Perhaps she does somewhere; however, I will freely admit my Bayesian priors
on that probability suggest it is not worth my time to try to find it.)

I'm not demanding this to the nth degree, obviously; I don't expect her to
submit a working patch to Google's results engine. But what amounts to a vague
wave in a direction that can't even be nailed down beyond "society" is not
helpful to anyone, just an incendiary attack.

~~~
scott_s
_Sweeney says there are essentially three possibilities. One is that
www.instantcheckmate.com has set up the arrest-mentioning ads to be served up
to black identifying names. Another is that Google has somehow biased its ad
serving mechanism in this way.

A more insidious explanation is that society as a whole is to blame. If
Google’s Adsense service learns which ad combinations are more effective, it
would first serve the arrest-related ads to all names at random. But this
would change if it were to discover that click-throughs are more likely when
these ads are served against a black-identifying name. In other words, the
results merely reflect the discriminatory pattern of clicks from ordinary
people._

Keep in mind that the professor did not write this article, which buries her
methods and conclusions. In her actual paper
(<http://arxiv.org/abs/1301.6822>) she has possible solutions in the
conclusions.

~~~
jerf
Thank you for the link. The conclusions seem to me to be accurately relayed by
the first article, and my priors were accurate. There is still not a statement
that I see of what the desired outcome should be.

Let me go ahead and spell out what I mean. Should the Google search results
give the exact same % of results for each race? Should it accurately reflect
the % of searches? Should it accurately reflect the % of ads put in? (Those
are all 3 very different things.) If so, should Google tweak the percentage of
those ads? If so, in what manner? If a combination, which combination?

The _implication_ in the conclusion is that first one, but in that case, why
does that one override the others? Might it in fact be racist to _hide_ the
incidence of these searches, based on race? A case could be made for that,
after all.

These are rich and interesting questions, and it does not simply go without
saying what the desired outcome is.

~~~
anigbrowl
The null hypothesis is described at the beginning of page 4, under the heading
'problem statement': _Our hypothesis: no difference exists in the delivery of
ads suggestive of an arrest record responding to online searches of racially
associated names. Then, when presented with evidence of a pattern to the
contrary, examine the pattern’s credibility, likelihood and circumstances of
occurring._ This hypothesis is briefly restated in the conclusion.
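
A null hypothesis of this shape ("no difference in ad delivery between two groups of names") is commonly checked with a chi-square test on a 2x2 table of counts. The sketch below is illustrative only: the counts are invented, the function name is hypothetical, and it is not claimed to be the paper's procedure.

```python
import math

def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square statistic and p-value (1 degree of freedom)
    for the 2x2 table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# HYPOTHETICAL counts, invented for illustration only.
# Rows = name group; columns = (arrest-suggestive ad shown, neutral ad shown).
group_a = (60, 40)
group_b = (45, 55)

stat, p = chi_square_2x2(*group_a, *group_b)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

A small p-value would be evidence against "no difference"; it would say nothing about which of the possible causes is responsible.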

Methods are spelled out beginning on page 10, observations and how they
compare to specific expectations are spelled out beginning on page 20. I can
only infer that you didn't bother to read the paper.

------
hncommenter13
I have spent quite a bit of time working with arrest and incarceration data
that has names attached. Rates of arrest and incarceration are not precisely
correlated to demographics, in the sense that certain "races" (as defined by
the government) are over- and under-represented in the criminal justice
system, for a variety of reasons. Some of this divergence may be explained by
systemic racism, some may be explained by confounding variables (poverty,
education), etc.

All that said, my point is that if one bought ads with keywords for 100% of
the unique first name-last name pairs found among prisoners/arrestees,
"stereotypically black" names would be over-represented relative to the total
population and relative to the internet-using population.

In other words, in my view, this is a symptom of the criminal justice system,
not a racist policy choice by Google's algorithm. Google's algorithm sees only
one color: green.

------
martey
I am a bit disappointed by the comments here. Several of them ask questions
that are clearly explained in the actual paper -
<http://arxiv.org/abs/1301.6822> - which is linked at the bottom of the
article (albeit broken because of a spurious colon). Some of them even ask
questions that are explained in the abstract.

------
stcredzero
So, after a century of entire societies based on race-based slavery, is
legislative and judicial reform going to completely wipe out racism, or is it
more likely to be subject to some sort of asymptotic decay? Is everyone going
to change their minds instantly about racism, or will it persist in the minds
of large numbers of people? Is racism likely to disappear with everyone's
logical and rational acknowledgement, or is it more likely to persist in
forms which are deniable and publicly invisible? Everything we know about
human nature indicates the latter choices are going to be true.

This is precisely why witch-hunt attitudes towards racism are counter-
productive. Racism isn't an evil or a personal shortcoming [1], it's a
consequence of unfortunate history combined with shortcomings in the way our
minds process social information. Treating racism as a kind of evil makes
communicating rationally about it impossible. And since it's a once huge
social factor asymptotically decaying, it's going to be all around us. It
would be much better for us as a society to be able to talk about it
rationally. That's not what the current social climate is conducive to,
however.

Basically, everyone's attitude towards stuff like racism and sexism should be
somewhat like the stance "Everyone Poops" takes towards, well, poop. It's not
the most pleasant thing in the world. It's just a consequence of where we came
from. The only difference is that there is hope that eventually we will
overcome it. (Well, maybe when people's minds are uploaded into computers, we
won't poop or judge people overwhelmingly on external morphology.)

EDIT: [1] - Harboring some racist attitudes or ideation is perfectly
understandable, but if you go and perpetrate some sort of crime or act of
cruelty as a result, this is certainly wrong. We all experience hate and
negative emotions. This doesn't excuse you from acting like a civilized human
being.

~~~
ruds
Where is the witch hunt here? Discovering discrimination, publicizing it, and
creating a dialogue are the only way it gets better. Ignoring it doesn't make
it go away.

~~~
stcredzero
All the actions you list are good, provided those are pursued with the
correct attitude. Please re-read and note that the _attitude_ one approaches
this with is my key point.

 _> Ignoring it doesn't make it go away._

How is advocating a less tense attitude towards the whole notion of racism to
achieve a calmer, more rational discussion of it, "ignoring?" Are you
emotionally invested in the notion that it should be punished? Or, did you
read some sort of meta-witch-hunt accusation into what I wrote?

------
bcoates
Skimming the paper, it doesn't look like she accounted for how common each
name is overall. Assume instantcheckmate, which runs the "arrest" themed
searches, uses a broader set of names than the competition, which runs generic
"looking for foo?" ads. If the black-identifying names are less common than
white-identifying ones, that would be enough to cause the correlation.

Do these scam sites even check public records? "The Cesspool of Online Ads is
Poisoning Online Ad Delivery" might be a better title.

~~~
astrodust
Someone's bought up keywords and created ads for them. This is not Google
being racist but some advertiser trying to narrow in on a particular
demographic that they think will be profitable.

Sadly, as the researcher purchased that information, I think it's working.

~~~
bcoates
The advertiser claims they're using a generic name list and applying all their
templates uniformly across the board.

They're professional liars, so take it with a grain of salt.

------
DanBC
The article doesn't mention a few points.

EDIT: my point 1 and 2 are dumb and clearly explained in the paper.

3) As I understand it the US has very many black people in prison. I've heard
a variety of stats; 1 in 3 black men are either in prison, on probation or on
parole. Wikipedia says that the US Bureau of Justice Statistics says that 39%
of the prison population is non-hispanic black (while the black including
hispanic population is just 13% of the US population.) That suggests that
people with a black name will need legal services more than someone with a
white name. The algorithm hasn't been tweaked by racists; the algorithm is
just responding to a racist society.

This post is not meant to bash the professor's work! I haven't read the paper
yet. I'm about to give it a read.

~~~
omonra
Your suggestion is logical. Or we could just ascribe things to racism. The
good professor thinks it's the latter.

~~~
DanBC
That's not the impression I get from reading the paper. I admit I'm lousy at
sociology stuff.

------
cschmidt
I've built gender identification models from first names, using census data.
In playing with that data, it seems to me that African American names have a
longer tail distribution. That is, the top 100 names cover a much smaller
fraction of the African American population. I'd be interested to see actual
data on that, but that is my semi-informed opinion.

Given that, those long tail names are going to be cheaper on Google ads. In my
experience, the headwords are always more expensive. Thus, if this website is
scooping up cheap traffic, that will tend to be biased toward "black sounding"
names.

Google isn't doing anything other than selling keywords.
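
The "top 100 names cover a smaller fraction" claim above could be checked with something like the following sketch. The two name distributions are made up purely to show the calculation; real census counts would have to be substituted, and `top_k_coverage` is a hypothetical helper name.

```python
from collections import Counter

def top_k_coverage(name_counts: Counter, k: int) -> float:
    """Fraction of all people covered by the k most common names."""
    total = sum(name_counts.values())
    top = sum(count for _, count in name_counts.most_common(k))
    return top / total

# HYPOTHETICAL toy populations of 1,000 people each, invented for illustration.
concentrated = Counter({"name_a": 500, "name_b": 300, "name_c": 100,
                        "name_d": 50, "name_e": 50})
long_tailed = Counter({f"name_{i}": 10 for i in range(100)})

print(top_k_coverage(concentrated, 2))  # 0.8
print(top_k_coverage(long_tailed, 2))   # 0.02
```

A lower coverage figure for the same k means more of the population sits in the long tail, which is where the cheaper keywords would be.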

------
finnw
I would like to know how the database of names was built.

That is, how would one qualify a name as "black-sounding" or "white-sounding"?
I hope it was not based only on intuition as that may give misleading results.
Is there a public data set somewhere that correlates given names with
ethnicity?

Edit: On second thoughts, it might be more interesting to perform similar
google search experiments with both intuitive _and_ empirical name/ethnicity
pairings and see if the results differ.

~~~
martey
The paper - <http://arxiv.org/abs/1301.6822> - explains that the names were
taken from Bertrand and Mullainathan's 2003 study looking at racial
discrimination and names on resumes - <http://www.nber.org/papers/w9873> -
and a study from Freakonomics -
[http://pricetheory.uchicago.edu/levitt/Papers/FryerLevitt200...](http://pricetheory.uchicago.edu/levitt/Papers/FryerLevitt2004.pdf)
. Both studies used actual names given to black and white children in
Massachusetts and California.

------
freshhawk
> they ought to be able to reason about the legal and social consequences of
> certain patterns of click-throughs

I've worked with the people who optimize things like this (although they never
thought of this particular one, probably a result of being Canadian)

They can and do reason about the legal and social consequences. They don't
give a shit as long as it makes money, and if there are legal consequences
then they hide behind anonymous proxies and VPSes paid for with prepaid credit
cards (hides them from google as well). Plenty of them make jokes about how
shady and dishonest their business is and then go to church every weekend with
their families and think nothing of it.

If they were forced to they would justify things like this by pointing out
that they are just optimizing keywords, if society is racist and the data
algorithms detect that then so be it. Mostly they would just not care and
continue to worry about a bad daily fluctuation wiping out a month's profits or
google catching on to the blackhat tactics they use sometimes and shutting
their adwords accounts down.

There is literally zero chance of convincing people doing ad network arbitrage
to consider social consequences, and IANAL, but what are the legal
consequences of using automated keyword optimization tools that would, by
definition, reflect any biases of society at large?

------
gnu8
I'm inclined to think that the company running the ads typed in a bunch of
black sounding names as keywords for their ad campaign. Having run ads on
google before I believe that would produce the result observed. I have no
information that discounts the other possibilities though.

~~~
qeorge
I would agree with you, except that these look like remnant ads (i.e., no one
specifically bid on this keyword, which is common with names).

You see this with Amazon and eBay too - if you search Google for something
weird which no one has bid on, you'll often see "Find [bananaphone] on
eBay/Amazon". These "people search" results are similar: extremely long-tail,
very generic ad.

So, I think it's probably algorithmic rather than intentional. Could be ML that's
learned racism from the internet itself.

Who knows though! Your explanation would certainly make sense. It's a
fascinating problem.

~~~
jeremyjh
> Could be ML that's learned racism from the internet itself.

This was my first thought.

------
gyardley
This is likely happening because the public is more likely to click on an
arrest-related ad for a black-sounding name. Higher click-through rates mean
higher ad quality scores, which in turn mean lower minimum cost-per-click
rates, which in turn means instantcheckwhatever's bottom-feeding ads appear
more often. In other words, society is more interested in the arrest records
of people with black-sounding names, so Google adapts. The professor does
raise this as a possibility in her article.

But if that's the case, _why_ is society more interested in the arrest-records
of people with black-sounding names? Perhaps it's because in America, blacks
are disproportionately likely to have criminal records. (Some quick Googling -
in 2010, according to the census, blacks were 13.6% of the population, but in
2009, according to the FBI, 28.3% of arrests were of a black person.)

I'm not making any judgements about black people by quoting these statistics -
perhaps these arrest rates reflect institutional racism, or disproportionate
levels of poverty, or lack of access to opportunity. But I'd rather the
professor focus her time on correcting _that_ disparity instead of trying to
make Google's AdWords algorithm correspond to something other than the
interests of the public.
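
The click-through feedback loop described in the first paragraph above can be illustrated with a toy model. This is a deliberately simplified sketch, not a description of how AdWords actually ranks ads: the re-weighting rule, the function name and the click-through rates are all assumptions made for illustration.

```python
def simulate_share(ctr_arrest: float, ctr_neutral: float, rounds: int = 20) -> float:
    """Toy feedback loop: each round, a template's share of impressions is
    re-weighted by the clicks it earned. Returns the final share of
    impressions going to the arrest-themed template."""
    share = 0.5  # start by serving both templates equally
    for _ in range(rounds):
        clicks_arrest = share * ctr_arrest
        clicks_neutral = (1 - share) * ctr_neutral
        share = clicks_arrest / (clicks_arrest + clicks_neutral)
    return share

# HYPOTHETICAL click-through rates, invented for illustration.
slightly_favoured = simulate_share(ctr_arrest=0.022, ctr_neutral=0.020)
no_preference = simulate_share(ctr_arrest=0.020, ctr_neutral=0.020)

print(f"arrest-ad share with a small CTR edge: {slightly_favoured:.2f}")
print(f"arrest-ad share with equal CTRs:       {no_preference:.2f}")
```

Even in this crude model, a small difference in click-through rate compounds into a large difference in how often a template is shown, with no one in the loop ever choosing that outcome explicitly.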

~~~
Anechoic
_But I'd rather the professor focus her time on correcting that disparity_

But it's all circular. If society associates "black-sounding" names with
criminal records, that leads to employment discrimination
(<http://www.nber.org/papers/w9873>) which results in "disproportionate levels
of poverty, or lack of access to opportunity." It's all related and it all
needs to be at least understood.

~~~
gyardley
Yes, you're absolutely right - it does need to be understood.

------
darkxanthos
Referencing the data below... a rough estimate is that black people are 2.3x more
likely to be arrested. Given that, the pattern would make sense, though it is REALLY
awkward as a company practice. Not sure how I feel about that, as advertising
can also shape/reinforce reality.

Quick data:

                  White       Black       Total
    Arrested      7389208     3027153     10416361
    Not Arrested  216164057   35902166    252066223
    Total         223553265   38929319

References:
[http://en.wikipedia.org/wiki/Demographics_of_the_United_Stat...](http://en.wikipedia.org/wiki/Demographics_of_the_United_States#Race_and_ethnicity)
[http://www.census.gov/compendia/statab/2012/tables/12s0325.p...](http://www.census.gov/compendia/statab/2012/tables/12s0325.pdf)
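
For anyone who wants to check the arithmetic, the 2.3x figure follows from the counts above as a ratio of two rates. This is only a sketch of the calculation on the numbers as posted; it assumes the two sources define their categories compatibly.

```python
# Counts copied from the quick data above.
arrested = {"white": 7_389_208, "black": 3_027_153}
not_arrested = {"white": 216_164_057, "black": 35_902_166}

def arrest_rate(group: str) -> float:
    """Arrest count divided by the group's total population in the table."""
    total = arrested[group] + not_arrested[group]
    return arrested[group] / total

rate_ratio = arrest_rate("black") / arrest_rate("white")

print(f"white rate: {arrest_rate('white'):.4f}")
print(f"black rate: {arrest_rate('black'):.4f}")
print(f"ratio:      {rate_ratio:.2f}")  # a little above 2.3
```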

~~~
brixon
Careful with those two sets of data. One includes Hispanic or Latino separated
and the other does not. Hispanic or Latino is a larger minority than the black
population.

Targeted advertising is nothing new. Do you think you will see the same
advertising on BET as you will on TCM? They are going to use what data is on
hand to focus ads. Most ads don't really accurately target most people, but
enough do to keep them advertising. If it did not work then they would not do
it.

Most ads on Facebook want me to get an MBA online.... I already have one, but
they have to work with the data they have available.

~~~
darkxanthos
Great catch brixon. And exactly. There comes a point where for whatever
reason, race, gender, or other physical attributes become great predictors for
a market _cough_ etsy _cough_. There's one great reason they're (thankfully)
supporting a ton more female development... That's their main consumer
demographic.

------
bo1024
Am I missing something here? It sounds like what is going on is that some
sketchy company (instantcheckmate.com) is willing to pay more than their
competitors for Google ad slots for certain names. They may even be getting
positive feedback in the machine learning sense from the scare tactic effect,
getting people who pay to sign up just to see if their name is associated with
an arrest record (as Dr. Sweeney did).

Of course media sites always overhype research with incredible titles, but I
can't see that this tells us much about online ad delivery in general ...
though it perhaps raises interesting (unanswered) questions about Google's ad
delivery....

------
jessaustin
_If the algorithms behind Adsense can reason about maximising [sic] revenues,
[Sweeney] says they ought to be able to reason about the legal and social
consequences of certain patter[n]s of click-throughs._

What?!? Does not follow!

~~~
yid
Doesn't follow because Adsense isn't "reasoning", it's optimizing an ill-
defined objective function. Optimization isn't the same as reasoning.

------
gfodor
I think it's funny that this inference based approach to discovering "racism"
is, in its own way, "racist" itself. The only thing this ad tells you is that
people searching for "Latanya Sweeney" seem to warrant advertisers purchasing
ad space relevant for people who have been arrested. It says nothing directly
about the race of the person doing the search.

 _That_ takes a leap that is based upon correlations: that there is a
correlation between someone searching for "Latanya Sweeney" and them being
black, based upon historical birth records. Of course, it's this same type of
blind correlation-based thinking that results in racism. Swap out "birth name"
with some other less appealing attribute and "black" with your race, ethnic
group, or other group of choice and you have a textbook example of racist
thinking. It doesn't exactly serve their point well to use the same mechanism
which brings about racism as a means to make an argument.

~~~
wmf
I think the complaint is that such ads discriminate against black people who
are being searched for, not people doing the searching.

~~~
gfodor
Fair enough -- but that has nothing to do with race. The authors discovered a
correlation there but that's not the point. If it turned out Foo Bar was a
name that happened to correlate highly with people who would pay for arrest-
related services, then advertisers would fill in the gap.

So sure, should there be mechanisms in place that can prevent "libel" in
Google because of correlations between names and less-than-savory behavior
(regardless of race)? Yes, of course. If my last name is Manson it is probably
a bit unfair to me as a person if Google is advertising stuff to people
searching for me as if I am associated with a famous serial murderer, too. But
the implication that this is all about race and is some systemic racist thing
is just data mining and likely the authors projecting their own biases. I'm
sure you could find all kinds of interesting clusters of names that have
certain negative advertising, and only a subset of them would be clusters that
highly corresponded to race or ethnic groups.

------
incision
I'm inclined to go with the "more insidious explanation" that "the results
merely reflect the discriminatory pattern of clicks from ordinary people".
Though, I don't think it's necessarily discriminatory.

On the topic of *-sounding names...

I've worked at least one technical job where having a "black" or even
just generally "American" name would send a job application straight to the
trash.

However, I'd expect any established employer to have an automatic background
check / verification system in place, so a possibly suggestive Google search
wouldn't be particularly relevant.

I'm thinking prospective dates are more likely to be Googling names than
employers.

~~~
Anechoic
_I've worked at least one technical job where having a "black" or even
just generally "American" name would send a job application straight to the
trash._

Relevant: "Are Emily and Brendan More Employable than Lakisha and Jamal?"
<http://www.chicagobooth.edu/pdf/bertrand.pdf>

edit: I see martey beat me to it.

------
danso
_edit: I was too brief in the intro here; this comment is meant to provide
additional context on how others (namely, Reuters) have interpreted Sweeney's
work and is not meant as a judgment of the actual study itself._

It's funny ( _"funny", as in, it was a confusing coincidence, not as in, the
study is suspicious_ ) that the OP mentions the ad-delivery on Reuters.com.

I tried it out for myself using the professor's name and got this massive
correction note to a December Reuters story involving her study (the
correction was so major that the story has been removed from Reuters' archive):

[http://www.reuters.com/article/2012/12/13/us-usa-internet-pr...](http://www.reuters.com/article/2012/12/13/us-usa-internet-profiling-idUSBRE8BC19S20121213)

 _(Reuters) - Please be advised that a November 25 article reporting that
Instantcheckmate.com's advertising relies on racial profiling has been
withdrawn. The story, "Professor finds profiling in ads for personal data
website," contains errors.

The headline of the article and the article itself incorrectly assert that
Harvard Professor Latanya Sweeney's research showed that Instantcheckmate.com,
an online background research website, had engaged in racial profiling in its
advertisements.

Sweeney says the preliminary results of the research found "significant
discrimination" in Instantcheckmate.com's online ad search results, but were
insufficient for the article's assertion of deliberate racial profiling by
Instant Checkmate. Her research is ongoing. Instant Checkmate denies any such
activity, which it describes as being at odds with the company's values. The
company says further that it hasn't seen Sweeney's research.

There will be no substitute story._

---

This doesn't have any bearing on the legitimacy of the OP's summation of Ms.
Sweeney's work, just that her work has been written about before, and
apparently, easily misinterpreted by the media.

edit: If you want to read the pulled-Reuters story, this appears to be a copy
of it:

[http://technewthings.blogspot.com/2012/11/reuters-technology...](http://technewthings.blogspot.com/2012/11/reuters-technology-news-professor-finds.html)

The Reuters story focused more on Instantcheckmate.com's practices and
apparently made too strong of a conclusion. Strangely, the author of the
piece, a Reuters correspondent, is also a Harvard fellow who is collaborating
with Sweeney for a book.

~~~
pekk
A quick reading of your post suggests that there is some reason to dismiss
Prof. Sweeney's findings as discussed in the posted article (massive
correction, easily misinterpreted, etc.) However, this is not the case. It is
very clear from OP that unless Prof. Sweeney has falsified her data or
described the findings incorrectly - something you have managed to imply
deniably without actually showing it - the site is in fact targeting the
'arrest' variant of the ads at queries with black names in them.

~~~
Anechoic
Read danso's post again, the problem isn't with Prof. Sweeney's research, the
problem is with the Reuters story that jumped the gun and came to a conclusion
that Sweeney hadn't. As a result, the correction note was attached to the
_Reuters article_ not Sweeney's research.

(I had the same reaction you did when first reading the post, but when I re-
read it, I realized what it was saying. The post could have been clearer
though.)

------
jiggy2011
This is more a problem with name collisions than "racism" per se. I know some white
people who share a name with a known criminal; google searching their names
comes up with similar results. I also know people who share relatively rare
names with well known celebrities which makes it difficult to get their own
content to come up in google searches at all.

So this is probably just a reflection of black sounding names being
statistically more likely to be shared with criminals. In the same way that
googling for "teen girls" has a high likelihood of returning porn.

------
drpgq
Would having such ads in a magazine like Jet or Ebony be racism too? I've
worked on ethnicity detection from faces and this is just the tip of the
iceberg.

------
hugh4life
Can someone please explain to me the scientific(not political) justification
for racial egalitarianism?

I'm sorry, but I look at you people like you people look at young earth
creationists.

~~~
anigbrowl
The repeated falsification of various racist scientific theories throughout
history: <http://en.wikipedia.org/wiki/Scientific_racism>

It's not clear from your comment what you consider to be an accurate
descriptor of reality, so perhaps you'd do better to offer your own position
rather than leading off with a pronouncement that everyone else ('you people')
is wrong.

------
abraininavat
Racism is based, in part, on the fact that our brain's pattern recognition and
extrapolation faculties are often too simplistic and shallow to see through
correlation and make real judgments about causation.

Google's AdSense is nothing more than a pattern matcher, and it is (of course)
fundamentally simplistic and stupid when compared to a human. To ask it not to
be racist is to ask it to be smarter than ourselves. No doubt some of Dr.
Sweeney's colleagues in the Computer Science department at Harvard are working
on that very thing -- she should take it up with them.

~~~
jiggy2011
This is precisely it. The conventional advice goes along the lines of "treat
each person on their individual merits and characteristics". This scales fine
when you only deal with a handful of people on a day to day basis.

As we do more stuff algorithmically and globally this approach does not scale
at all. This can cause technology to reinforce our biases no matter how
subtle.

