This is one of those cases where I find myself wanting the professor to first sit down and very precisely say what they think "non-discriminatory" would mean in this context. The devil is in the details. Just flinging around wild accusations of racism at Google, this company, and finally society as a whole, without giving any action items on how one would discharge the accusation of racism is just being mean, hitting people with a very big stick without giving them any chance to dodge.
(Perhaps she does somewhere; however, I will freely admit my Bayesian priors on that probability suggest it is not worth my time to try to find it.)
I'm not demanding this to the nth degree, obviously; I don't expect her to submit a working patch to Google's results engine. But what amounts to a vague wave in the direction that can't even be nailed beyond "society" is not helpful to anyone, just an incendiary attack.
Sweeney says there are essentially three possibilities. One is that www.instantcheckmate.com has set up the arrest-mentioning ads to be served up to black identifying names. Another is that Google has somehow biased its ad serving mechanism in this way.
A more insidious explanation is that society as a whole is to blame. If Google’s Adsense service learns which ad combinations are more effective, it would first serve the arrest-related ads to all names at random. But this would change if it were to discover that click-throughs are more likely when these ads are served against a black-identifying name. In other words, the results merely reflect the discriminatory pattern of clicks from ordinary people.
Keep in mind that the professor did not write this article, which buries her methods and conclusions. In her actual paper (http://arxiv.org/abs/1301.6822) she has possible solutions in the conclusions.
Suppose you'd never heard of this study, and some conservative institution announced a finding that names associated with males (Steve, Darnell) were more likely to bring up ads mentioning arrest records than names associated with females (Emily, Latisha). Suppose the paper announcing this discovery was full of terms like "discrimination" and "fairness," and speculated about who or what was to "blame" for treating men in this discriminatory way.
Would the politically enlightened consider this increased association of men with arrest records (vs women) to be unfair bias against men? Would there be a call for a followup to determine whether it was a biased algorithm, a discriminatory corporation, or our entire anti-male sexist society that was to blame? Might they shrug it off, placing most of the blame, not on the ad system, but on the statistical behavior of men themselves?
If there is, as I suspect, a similar "bias" against men in this ad system, then any discussion of how to "fix" it should not address the question of black/white names. It ought to deal with the larger question of what to do when the data show that the probability of X given name 'A' is greater than the probability of X given name 'B'.
Thank you for the link. The conclusions seem to me to be accurately relayed by the first article, and my priors were accurate. There is still not a statement that I see of what the desired outcome should be.
Let me go ahead and spell out what I mean. Should the Google search results give the exact same % of results for each race? Should it accurately reflect the % of searches? Should it accurately reflect the % of ads put in? (Those are all 3 very different things.) If so, should Google tweak the percentage of those ads? If so, in what manner? If a combination, which combination?
The implication in the conclusion is the first one, but in that case, why does that one override the others? Might it in fact be racist to hide the incidence of these searches, based on race? A case could be made for that, after all.
These are rich and interesting questions, and it does not simply go without saying what the desired outcome is.
The null hypothesis is described at the beginning of page 4, under the heading 'problem statement': Our hypothesis: no difference exists in the delivery of ads suggestive of an arrest record responding to online searches of racially associated names. Then, when presented with evidence of a pattern to the contrary, examine the pattern’s credibility, likelihood and circumstances of occurring. This hypothesis is briefly restated in the conclusion.
Methods are spelled out beginning on page 10, observations and how they compare to specific expectations are spelled out beginning on page 20. I can only infer that you didn't bother to read the paper.
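To make that null hypothesis concrete: one standard way to check "no difference exists" is a chi-square test of independence on a 2x2 table of name group versus ad type. This is only a minimal sketch; the counts are invented for illustration (they are not the paper's data), the function name is hypothetical, and it is not a claim about the exact procedure the paper uses.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    # Shortcut formula, equivalent to summing (observed - expected)^2 / expected.
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# HYPOTHETICAL counts, for illustration only (NOT taken from the paper):
# rows = name group, columns = (arrest-themed ad, neutral ad)
stat = chi_square_2x2(60, 40, 40, 60)

# 3.841 is the 5% critical value for chi-square with 1 degree of freedom.
CRITICAL_5_PERCENT = 3.841
reject_null = stat > CRITICAL_5_PERCENT
print(f"chi-square = {stat:.2f}, reject 'no difference' at 5%: {reject_null}")
```

A statistic above the critical value only says the pattern is unlikely to be chance; it says nothing about which of the three proposed mechanisms produced it.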
Oh rubbish. She's demonstrated the bias in ad delivery exists and is unlikely to be the result of chance, identified possible vectors, and observed that there's no way of knowing where the bias originates at present. The paper is cautious and methodical.
It is you who is 'just flinging around wild accusations'. Read the paper and critique it on its merits, by all means. As it is, your little rant tells us far more about you than its subject.
> there's no way of knowing where the bias originates
Just off the top of my head, why not go make a Google Ads account and bid on keywords that sound like black names, and then bid on keywords that sound like white names (according to the professor's criteria)? Wouldn't the price differences show that there is greater demand for one set or another?
My hypothesis would be that the discrimination happening here is simply happening in the ad marketplace, and that while it may reveal something unpleasant about our culture which we would like to see changed, it doesn't lend itself to prescriptive intervention-type remedies. I'd like to believe that it may only be revealing that when a bunch of white marketing guys buy adwords in bulk they aren't cognizant of their list of "common names" being essentially "white names," and the arrest record websites caught on to this fact first.
> Just flinging around wild accusations of racism at Google, this company, and finally society as a whole, without giving any action items on how one would discharge the accusation of racism is just being mean, hitting people with a very big stick without giving them any chance to dodge.
It should be dealt with rationally and clinically, with the idea that everyone is probably affected by it, and probably embodies some of it. Treating racism as something wrong is like treating lust as something wrong. It makes it impossible to talk about things that would be awkward to talk about even without such strong emotional reactions attached to them. Just as lust is simply a consequence of our biology, racism is an unfortunate consequence of our psychology and our history.
We are all creatures "of our time." Our only hope of transcending that is to admit of the possibility and freely question ourselves.
Yes, it's definitely way worse to be called racist (it's just being mean!) than to be a victim of racism.
She identifies possible mechanisms for the discrimination; presumably the desired remedy would differ based on which mechanism is actually responsible. This paper is step one: tell people who may not know that the discrimination is occurring.
I have spent quite a bit of time working with arrest and incarceration data that has names attached. Rates of arrest and incarceration are not precisely correlated to demographics, in the sense that certain "races" (as defined by the government) are over- and under-represented in the criminal justice system, for a variety of reasons. Some of this divergence may be explained by systemic racism, some may be explained by confounding variables (poverty, education), etc.
All that said, my point is that if one bought ads with keywords for 100% of the unique first name-last name pairs found among prisoners/arrestees, "stereotypically black" names would be over-represented relative to the total population and relative to the internet-using population.
In other words, in my view, this is a symptom of the criminal justice system, not a racist policy choice by Google's algorithm. Google's algorithm sees only one color: green.
I am a bit disappointed by the comments here. Several of them ask questions that are clearly explained in the actual paper - http://arxiv.org/abs/1301.6822 - which is linked at the bottom of the article (albeit broken because of a spurious colon). Some of them even ask questions that are explained in the abstract.
edit: I was too brief in the intro here; this comment is meant to provide additional context on how others (namely, Reuters) have interpreted Sweeney's work and is not meant as a judgment of the actual study itself.
It's funny ("funny", as in, it was a confusing coincidence, not as in, the study is suspicious) that the OP mentions the ad-delivery on Reuters.com.
I tried it out for myself using the professor's name and got this massive correction note to a December Reuters story involving her study (the correction was so major that the story has been removed from the Reuters archive):
(Reuters) - Please be advised that a November 25 article reporting that Instantcheckmate.com's advertising relies on racial profiling has been withdrawn. The story, "Professor finds profiling in ads for personal data website," contains errors.
The headline of the article and the article itself incorrectly assert that Harvard Professor Latanya Sweeney's research showed that Instantcheckmate.com, an online background research website, had engaged in racial profiling in its advertisements.
Sweeney says the preliminary results of the research found "significant discrimination" in Instantcheckmate.com's online ad search results, but were insufficient for the article's assertion of deliberate racial profiling by Instant Checkmate. Her research is ongoing. Instant Checkmate denies any such activity, which it describes as being at odds with the company's values. The company says further that it hasn't seen Sweeney's research.
There will be no substitute story.
This doesn't have any bearing on the legitimacy of the OP's summation of Ms. Sweeney's work, just that her work has been written about before, and apparently, easily misinterpreted by the media.
edit: If you want to read the pulled-Reuters story, this appears to be a copy of it:
The Reuters story focused more on Instantcheckmate.com's practices and apparently drew too strong a conclusion. Strangely, the author of the piece, a Reuters correspondent, is also a Harvard fellow who is collaborating with Sweeney on a book.
A quick reading of your post suggests that there is some reason to dismiss Prof. Sweeney's findings as discussed in the posted article (massive correction, easily misinterpreted, etc.) However, this is not the case. It is very clear from OP that unless Prof. Sweeney has falsified her data or described the findings incorrectly - something you have managed to imply deniably without actually showing it - the site is in fact targeting the 'arrest' variant of the ads at queries with black names in them.
Read danso's post again: the problem isn't with Prof. Sweeney's research; the problem is with the Reuters story that jumped the gun and came to a conclusion that Sweeney hadn't. As a result, the correction note was attached to the Reuters article, not Sweeney's research.
(I had the same reaction you did when first reading the post, but when I re-read it, I realized what it was saying. The post could have been clearer though.)
I can see why you'd infer that from my wording, but FWIW, I'm not intending to imply anything against Prof. Sweeney's work as I've not yet read the actual study. I was posting what I had found when trying the searches myself; for some reason, Google isn't serving me any ads...and Reuters served me what I had posted above.
If anything, this post is directed at commenters who have reflexively dismissed the study because they dislike the OP's summation. Just because the article summarizing the study may jump to a contentious conclusion does not mean that the study itself did. The fact that Reuters apparently messed up here is just an example of how a study with possible controversial implications can be misread by someone trying to write about it.
Skimming the paper, it doesn't look like she accounted for how common each name is overall. Assume instantcheckmate, which runs the "arrest" themed searches, uses a broader set of names than the competition, which runs generic "looking for foo?" ads. If the black-identifying names are less common than white-identifying ones, that would be enough to cause the correlation.
Do these scam sites even check public records? "The Cesspool of Online Ads is Poisoning Online Ad Delivery" might be a better title.
So, after a century of entire societies based on race-based slavery, is legislative and judicial reform going to completely wipe out racism, or is it more likely to be subject to some sort of asymptotic decay? Is everyone going to change their minds instantly about racism, or will it persist in the minds of large numbers of people? Is racism likely to disappear with everyone's logical and rational acknowledgement, or is it more likely to persist in forms which are deniable and publicly invisible? Everything we know about human nature indicates the latter choices are going to be true.
This is precisely why witch-hunt attitudes towards racism are counter-productive. Racism isn't an evil or a personal shortcoming; it's a consequence of unfortunate history combined with shortcomings in the way our minds process social information. Treating racism as a kind of evil makes communicating rationally about it impossible. And since it's a once huge social factor asymptotically decaying, it's going to be all around us. It would be much better for us as a society to be able to talk about it rationally. That's not what the current social climate is conducive to, however.
Basically, everyone's attitude towards stuff like racism and sexism should be somewhat like the stance "Everyone Poops" takes towards, well, poop. It's not the most pleasant thing in the world. It's just a consequence of where we came from. The only difference is that there is hope that eventually we will overcome it. (Well, maybe when people's minds are uploaded into computers, we won't poop or judge people overwhelmingly on external morphology.)
EDIT: Harboring some racist attitudes or ideation is perfectly understandable, but if you go and perpetrate some sort of crime or act of cruelty as a result, this is certainly wrong. We all experience hate and negative emotions. This doesn't excuse you from acting like a civilized human being.
Everything you list as actions are good, provided those are pursued with the correct attitude. Please re-read and note that the attitude one approaches this with is my key point.
> Ignoring it doesn't make it go away.
How is advocating a less tense attitude towards the whole notion of racism to achieve a calmer, more rational discussion of it, "ignoring?" Are you emotionally invested in the notion that it should be punished? Or, did you read some sort of meta-witch-hunt accusation into what I wrote?
EDIT: my point 1 and 2 are dumb and clearly explained in the paper.
3) As I understand it the US has very many black people in prison. I've heard a variety of stats; 1 in 3 black men are either in prison, on probation or on parole. Wikipedia says that the US Bureau of Justice Statistics says that 39% of the prison population is non-hispanic black (while the black including hispanic population is just 13% of the US population.) That suggests that people with a black name will need legal services more than someone with a white name. The algorithm hasn't been tweaked by racists; the algorithm is just responding to a racist society.
This post is not meant to bash the professor's work! I haven't read the paper yet. I'm about to give it a read.
I've built gender identification models from first names, using census data. In playing with that data, it seems to me that African American names have a longer tail distribution. That is, the top 100 names cover a much smaller fraction of the African American population. I'd be interested to see actual data on that, but that is my semi-informed opinion.
Given that, those long tail names are going to be cheaper on Google ads. In my experience, the headwords are always more expensive. Thus, if this website is scooping up cheap traffic, that will tend to be biased toward "black sounding" names.
Google isn't doing anything other than selling keywords.
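The long-tail claim is easy to check if you have name counts per group: compute what fraction of the group is covered by its k most common names. A minimal sketch, assuming a plain dict of name to count; the function name and the toy counts below are invented for illustration and are not census data.

```python
def top_k_coverage(name_counts, k):
    """Fraction of all people whose first name is among the k most common."""
    if k < 0:
        raise ValueError("k must be non-negative")
    total = sum(name_counts.values())
    if total == 0:
        return 0.0
    top_counts = sorted(name_counts.values(), reverse=True)[:k]
    return sum(top_counts) / total


# HYPOTHETICAL toy counts, invented for illustration (not census data).
concentrated = {"name_a": 50, "name_b": 30, "name_c": 10, "name_d": 10}
long_tailed = {"name_a": 20, "name_b": 15, "name_c": 15, "name_d": 10,
               "name_e": 10, "name_f": 10, "name_g": 10, "name_h": 10}

print(top_k_coverage(concentrated, 2))
print(top_k_coverage(long_tailed, 2))
```

A lower coverage figure at the same k means more of the group sits in the tail, which is where the cheap keywords would be.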
I would like to know how the database of names was built.
That is, how would one qualify a name as "black-sounding" or "white-sounding"? I hope it was not based only on intuition as that may give misleading results.
Is there a public data set somewhere that correlates given names with ethnicity?
Edit: On second thoughts, it might be more interesting to perform similar google search experiments with both intuitive and empirical name/ethnicity pairings and see if the results differ.
I'm inclined to think that the company running the ads typed in a bunch of black sounding names as keywords for their ad campaign. Having run ads on google before I believe that would produce the result observed. I have no information that discounts the other possibilities though.
I would agree with you, except that these look like remnant ads (i.e., no one specifically bid on this keyword, which is common with names).
You see this with Amazon and eBay too - if you search Google for something weird which no one has bid on, you'll often see "Find [bananaphone] on eBay/Amazon". These "people search" results are similar: extremely long-tail, very generic ad.
So, I think it's probably algorithmic vs intentional. Could be ML that's learned racism from the internet itself.
Who knows though! Your explanation would certainly make sense. It's a fascinating problem.
This reeks of affiliate marketing. Background check websites are big business among affiliate marketers (https://www.google.com/search?q=background+check+affiliate). Advertisers only pay for clicks, so creating 10,000 campaigns targeting first names is basically free. The CPC on most first names is nearly $0.00, too.
Google is involved to the extent that they serve ads in response to specific keywords. Instant Checkmate is involved only to the extent that they ignore the source of their referrals. Someone with a little more free time could probably trace HTTP traffic to find the specific affiliate responsible.
> they ought to be able to reason about the legal and social consequences of certain patterns of click-throughs
I've worked with the people who optimize things like this (although they never thought of this particular one, probably a result of being Canadian).
They can and do reason about the legal and social consequences. They don't give a shit as long as it makes money, and if there are legal consequences then they hide behind anonymous proxies and VPSes paid for with prepaid credit cards (which hides them from Google as well). Plenty of them make jokes about how shady and dishonest their business is and then go to church every weekend with their families and think nothing of it.
If they were forced to, they would justify things like this by pointing out that they are just optimizing keywords; if society is racist and the data algorithms detect that, then so be it. Mostly they would just not care and continue to worry about a bad daily fluctuation wiping out a month's profits or Google catching on to the blackhat tactics they use sometimes and shutting their AdWords accounts down.
There is literally zero chance of convincing people doing ad network arbitrage to consider social consequences, and IANAL, but what are the legal consequences of using automated keyword optimization tools that would, by definition, reflect any biases of society at large?
This is likely happening because the public is more likely to click on an arrest-related ad for a black-sounding name. Higher click-through rates mean higher ad quality scores, which in turn mean lower minimum cost-per-click rates, which in turn means instantcheckwhatever's bottom-feeding ads appear more often. In other words, society is more interested in the arrest records of people with black-sounding names, so Google adapts. The professor does raise this as a possibility in her article.
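To show how little it takes for that loop to run away, here is a toy model: two ads compete for the same searches, and each round an ad's share of impressions is reweighted by the clicks it just earned. This is a deliberately crude sketch and an assumption on my part, not a description of how Google's system actually scores ads; the function name and the click-through rates are made up.

```python
def arrest_ad_share(ctr_arrest, ctr_neutral, rounds=200, share=0.5):
    """Toy model: each round, an ad's share of impressions is reweighted
    by the clicks it earned in the previous round."""
    for _ in range(rounds):
        clicks_arrest = share * ctr_arrest
        clicks_neutral = (1.0 - share) * ctr_neutral
        share = clicks_arrest / (clicks_arrest + clicks_neutral)
    return share


# HYPOTHETICAL click-through rates, chosen only to illustrate the dynamic.
group_a = arrest_ad_share(ctr_arrest=0.021, ctr_neutral=0.020)  # 5% relative gap
group_b = arrest_ad_share(ctr_arrest=0.020, ctr_neutral=0.020)  # no gap
print(f"group A arrest-ad share: {group_a:.3f}")
print(f"group B arrest-ad share: {group_b:.3f}")
```

In this model a small, persistent difference in click rates is enough to push one group's searches almost entirely toward the arrest-themed ad, while the group with no difference stays at an even split.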
But if that's the case, why is society more interested in the arrest-records of people with black-sounding names? Perhaps it's because in America, blacks are disproportionately likely to have criminal records. (Some quick Googling - in 2010, according to the census, blacks were 13.6% of the population, but in 2009, according to the FBI, 28.3% of arrests were of a black person.)
I'm not making any judgements about black people by quoting these statistics - perhaps these arrest rates reflect institutional racism, or disproportionate levels of poverty, or lack of access to opportunity. But I'd rather the professor focus her time on correcting that disparity instead of trying to make Google's AdWords algorithm correspond to something other than the interests of the public.
> But I'd rather the professor focus her time on correcting that disparity
But it's all circular. If society associates "black-sounding" names with criminal records, that leads to employment discrimination (http://www.nber.org/papers/w9873) which results in "disproportionate levels of poverty, or lack of access to opportunity." It's all related and it all needs to be at least understood.
Referencing the below data... a rough estimate is black people are 2.3x more likely to be arrested. Given that, it would make sense, though it is REALLY awkward as a company practice. Not sure how I feel about that, as advertising can also shape/reinforce reality.
Arrested 7389208 3027153 10416361
Not Arrested 216164057 35902166 252066223
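For anyone who wants to check the arithmetic, the estimate is a relative risk computed from the two rows above. The column meanings are an assumption here, since the table has no headers: the sketch treats the first column as the comparison group, the second as black, and the third as the row total (the first two columns do sum to the third).

```python
# Counts copied from the two rows above. Column meaning is an ASSUMPTION:
# first column = comparison group, second column = black, third = row total.
arrested = {"comparison": 7_389_208, "black": 3_027_153}
not_arrested = {"comparison": 216_164_057, "black": 35_902_166}


def arrest_rate(group):
    """Arrests divided by group size (arrested + not arrested)."""
    return arrested[group] / (arrested[group] + not_arrested[group])


relative_risk = arrest_rate("black") / arrest_rate("comparison")
print(f"black: {arrest_rate('black'):.4f}")
print(f"comparison: {arrest_rate('comparison'):.4f}")
print(f"relative risk: {relative_risk:.2f}")
```

Under that column assumption the ratio lands between 2.3 and 2.4, consistent with the rough figure quoted.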
Careful with those two sets of data. One includes Hispanic or Latino separated and the other does not. Hispanic or Latino is a larger minority than the black population.
Targeted advertising is nothing new. Do you think you will see the same advertising on BET as you will on TCM? They are going to use what data is on hand to focus ads. Most ads don't really accurately target most people, but enough do to keep them advertising. If it did not work then they would not do it.
Most ads on Facebook want me to get an MBA online.... I already have one, but they have to work with the data they have available.
Great catch brixon. And exactly. There comes a point where for whatever reason, race, gender, or other physical attributes become great predictors for a market (cough, Etsy, cough). There's one great reason they're (thankfully) supporting a ton more female development... That's their main consumer demographic.
Am I missing something here? It sounds like what is going on is that some sketchy company (instantcheckmate.com) is willing to pay more than their competitors for Google ad slots for certain names. They may even be getting positive feedback in the machine learning sense from the scare tactic effect, getting people who pay to sign up just to see if their name is associated with an arrest record (as Dr. Sweeney did).
Of course media sites always overhype research with incredible titles, but I can't see that this tells us much about online ad delivery in general ... though it perhaps raises interesting (unanswered) questions about Google's ad delivery....
If the algorithms behind Adsense can reason about maximising [sic] revenues, [Sweeney] says they ought to be able to reason about the legal and social consequences of certain patter[n]s of click-throughs.
I think it's funny that this inference based approach to discovering "racism" is, in its own way, "racist" itself. The only thing this ad tells you is that people searching for "Latanya Sweeney" seem to warrant advertisers purchasing ad space relevant for people who have been arrested. It says nothing directly about the race of the person doing the search.
That takes a leap that is based upon correlations: that there is a correlation between someone searching for "Latanya Sweeney" and them being black, based upon historical birth records. Of course, it's this same type of blind correlation-based thinking that results in racism. Swap out "birth name" with some other less appealing attribute and "black" with your race, ethnic group, or other group of choice and you have a textbook example of racist thinking. It doesn't exactly serve their point well to use the same mechanism which brings about racism as a means to make an argument.
Fair enough -- but that has nothing to do with race. The authors discovered a correlation there but that's not the point. If it turned out Foo Bar was a name that happened to correlate highly with people who would pay for arrest-related services, then advertisers would fill in the gap.
So sure, should there be mechanisms in place that can prevent "libel" in Google because of correlations between names and less-than-savory behavior (regardless of race)? Yes, of course. If my last name is Manson it is probably a bit unfair to me as a person if Google is advertising stuff to people searching for me as if I am associated with a famous serial murderer, too. But the implication that this is all about race and is some systemic racist thing is just data mining and likely the authors projecting their own biases. I'm sure you could find all kinds of interesting clusters of names that have certain negative advertising, and only a subset of them would be clusters that highly corresponded to race or ethnic groups.
I'm inclined to go with the "more insidious explanation" that "the results merely reflect the discriminatory pattern of clicks from ordinary people". Though, I don't think it's necessarily discriminatory.
On the topic of *-sounding names...
I've worked at least one technical job where having a "black" or even just generally "American" name would send a job application straight to the trash.
However, I'd expect any established employer to have an automatic background check / verification system in place, so a possibly suggestive Google search wouldn't be particularly relevant.
I'm thinking prospective dates are more likely to be Googling names than employers.
This is more a problem with name collisions than "racism" per se. I know some white people who share a name with a known criminal, and Google searching their names comes up with similar results. I also know people who share relatively rare names with well known celebrities, which makes it difficult to get their own content to come up in Google searches at all.
So this is probably just a reflection of black-sounding names being statistically more likely to be shared with criminals. In the same way that googling for "teen girls" has a high likelihood of returning porn.
It's not clear from your comment what you consider to be an accurate descriptor of reality, so perhaps you'd do better to offer your own position rather than leading off with a pronouncement that everyone else ('you people') is wrong.
Racism is based, in part, on the fact that our brain's pattern recognition and extrapolation faculties are often too simplistic and shallow to see through correlation and make real judgments about causation.
Google's AdSense is nothing more than a pattern matcher, and it is (of course) fundamentally simplistic and stupid when compared to a human. To ask it not to be racist is to ask it to be smarter than ourselves. No doubt some of Dr. Sweeney's colleagues in the Computer Science department at Harvard are working on that very thing -- she should take it up with them.
This is precisely it. The conventional advice goes along the lines of "treat each person on their individual merits and characteristics". This scales fine when you only deal with a handful of people on a day to day basis.
As we do more stuff algorithmically and globally, this approach does not scale at all. This can cause technology to reinforce our biases, no matter how subtle.