
Amazon scraps secret AI recruiting tool that showed bias against women - wyldfire
https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G
======
fuscy
The eye-opening thing here is not that the AI failed, but why it failed.

At the start the AI is like a baby: it doesn't know anything and has no
opinions. By training it on a set of data, in this case a set of resumes and
their outcomes, it forms an opinion.

The AI becoming biased tells us that the "teacher" was biased too. So
Amazon's recruiting process actually seems to be a mess, with the technical
skills on the resume amounting to zilch, and gender and the aggressiveness of
the resume's language mattering most (because that's how the human recruiters
actually hired when someone submitted a resume).

The number of women and men in the data set shouldn't matter (algorithms learn
that even if there was only 1 woman, if she was hired then the model will be
positive about future women candidates). What matters is the rejection rate,
which it learned from the data. The hiring process is inherently biased
against women.

Technically one could say that the AI was successful because it emulated the
current Amazon hiring status.
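
As a toy illustration (my own sketch, with entirely made-up numbers, nothing
from the article): train a classifier on historical hire/no-hire decisions
and it simply reproduces them, bias included.

    # Toy sketch: a model trained on past hiring decisions reproduces them.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 10_000
    gender = rng.integers(0, 2, n)               # 1 = woman
    skill = rng.normal(size=n)
    # Biased historical process: equal skill, lower acceptance for women.
    hired = (skill + rng.normal(size=n) - 0.8 * gender > 0).astype(int)

    X = np.column_stack([skill, gender])
    model = LogisticRegression(max_iter=1000).fit(X, hired)
    print(model.coef_)   # the gender column gets a clearly negative weight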

~~~
gambler
The article didn't specify how they labeled resumes for training. You're
assuming that it was based on whether or not the candidate was hired. Nobody
with an iota of experience in machine learning would do something like that.
(For obvious reasons: you can't tell from your data whether people you did not
hire were truly bad.)

A far more reasonable way would be to take resumes of people who were hired
and train the model based on their performance. For example, you could rate
resumes of people who promptly quit or got fired as less attractive than
resumes of people who stayed with the company for a long time. You could also
factor in performance reviews.
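
A rough sketch of that labeling scheme (field names and weights are entirely
made up; this is just one way it could look):

    # Hypothetical label construction: score a hired employee's resume by
    # tenure and reviews, never by the original hire/no-hire decision.
    def label(employee):
        score = min(employee["tenure_years"] / 5.0, 1.0)   # cap at 5 years
        score *= employee["avg_review"] / 5.0              # reviews on a 1-5 scale
        if employee["quit_or_fired_within_a_year"]:
            score *= 0.25                                  # penalize quick exits
        return score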

It is entirely possible that such a model would search for people who aren't
usually preferred. E.g. if your recruiters are biased against Ph.D.'s, but you
have _some_ Ph.D.'s and they're highly productive, the algorithm could pick
this up and rate Ph.D. resumes higher.

Now, you still wouldn't know anything about people whom you didn't hire. This
means there is some possibility your employees are not representative of
general population and your model would be biased because of _that_.

Let's say your recruiters _are_ biased against Ph.D.'s and so they undergo
extra scrutiny. You only hire candidates with a doctoral degree if they are
amazing. This means within your company a doctoral degree is a good predictor
of success, but in the world at large it could be a bad criterion to use.
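
You can simulate that selection effect in a few lines (all numbers invented):

    # If Ph.D.'s face a higher hiring bar, the Ph.D.'s *inside* the company
    # look stronger on average even when the trait is neutral in general.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    phd = rng.random(n) < 0.1
    ability = rng.normal(size=n)          # same distribution for both groups
    bar = np.where(phd, 1.5, 0.5)         # extra scrutiny for Ph.D.'s
    hired = ability > bar

    print(ability[hired & phd].mean())    # ~1.9
    print(ability[hired & ~phd].mean())   # ~1.1: the degree "predicts" success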

~~~
jonny_eh
Men are promoted quicker, and more often, than women.

~~~
deegles
There was a company meeting one year at Amazon when they proudly announced
that men and women were paid within 1-2% of each other for the same roles. It
completely missed the point which you raise.

I want to see reports of average tenure and time between promotions by gender.
I suspect that the reason we don't see those published is that the numbers are
damning.

~~~
zaarn
Or possibly no one did a study of sufficient size that passed peer review.

It's also not hard to make the pay gap 1-2% just like it's not hard to make it
25% (both values are valid). Statistics is a fun field. Don't trust statistics
you didn't fake yourself.

Amazon could easily cook the numbers to get to 1-2%; I doubt anyone checked
whether the process used to arrive at that number was unbiased, fair, and
accounted for other factors.

------
HashHishBang
Hold on here. This article seems to have buried a pretty important piece of
information wayyy down in the middle of the text.

> Gender bias was not the only issue. Problems with the data that underpinned
> the models’ judgments meant that unqualified candidates were often
> recommended for all manner of jobs, the people said. With the technology
> returning results almost at random, Amazon shut down the project, they said.

Granted, an article isn't going to get as much attention without an attractive
headline, but that seems a far more likely reason to have an AI-based
recruiting recommendation scrapped. The discovery of a negative weight
associated with "women's" or graduates of two unnamed women's colleges is
notable, but if it's tossing out results "almost at random" then... well,
there seem to be bigger problems?

~~~
user812
The media is no longer reporting things. You can't make money with reporting.
The media is actively creating narratives, and one of the narratives that
people are fed nowadays is that women are victims.

Men and women are pitted against each other.

Due to the way the media has evolved people consume their own biases and most
often just read the headlines.

~~~
HashHishBang
I mean, yeah you're not wrong. I try not to be too cynical about the whole
thing even if I think the narrative is suspect. Yes women and minority
representation in tech is a potential issue but I really want to know more
about the AI recommendation system for potential hires. Especially if it was
giving out spurious recommendations.

It's Amazon; I can't imagine how many millions went into something like that.
We'll almost certainly not get a postmortem but it's definitely intriguing.

------
jkingsbery
(Disclaimer: I am an Amazon employee sharing his own experience, but do not
speak in any official capacity for Amazon. I don't know anything about the
system mentioned in this article.)

I am a frequent interviewer for engineering roles at Amazon. As part of the
interview training and other forums, we often discuss the importance of
removing bias, looking out for unconscious bias, and so on. The recruiters I
know at Amazon all take reaching out to historically under-represented groups
seriously.

I don't know anything about the system described in the article (I didn't even
know we had such a system), but if it was introducing bias I'm glad it's being
shelved. Hopefully this article doesn't discourage people from applying to
work at Amazon - I've found it a good place to work.

To say something about the AI/ML aspect of the article: I think as engineers
our instinct is "Here's some data that's been classified for me, I can use
ML/AI on it!" without thinking through all that follows, including doing
quality assurance. I think a lot of focus in ML (at least in what I've read)
has been on generating models, and not nearly enough focus has been on
generating models that are interpretable (i.e., that give a reason along with
a classification).

~~~
darawk
It seems like they did think it through, though? And that's why it's being
shelved. I don't really see what the story is here. It seems like the whole
process worked exactly as it should - Amazon tried something, it had some
unintended consequences, they caught it, and shelved it.

~~~
weliketocode
Agreed. There is no story here.

~~~
hannasanarion
The story is, some ML researchers did their job properly and detected ethical
issues before they became a problem. That's more rare than you'd think.

~~~
rrcaptain
Yes. Software engineers taking ethics seriously and not letting technical
enthusiasm blind them is news, not normality.

------
macinjosh
This will be unpopular but I don't care. What is the evidence that the source
data for this 'AI' is biased because the men it came from did not want to hire
women? Is there a reserve of unemployed non-male engineers out there? If so
what evidence is there of that?

Technical talent is both expensive and a rare commodity for tech companies.
The non-male engineers I've worked with have always been exceedingly
competent, smart, and their differing perspectives invaluable. If there was an
untapped market of engineers you'd better believe every tech company would be
taking advantage of it.

~~~
petsormeat
Oh, there are a bunch of us, even here in the SF Bay Area. Trouble is, we're
older than 35, or don't have degrees from "top" schools, and/or don't have the
"passion" for bizarre extended hiring rituals. I could staff an entire dev
team with non-male people within a week.

~~~
jhfhhhf
So what do you do now? Btw some men in tech are also over 35 and tired of
hiring rituals.

~~~
petsormeat
After 15 years of front-end dev, I now work in retail. Some of my other peers
are scraping by with Uber/Lyft. Some are muddling through as housewives or
substitute teaching.

And, yes, Bay Area tech hiring is needlessly hostile for men over a certain
age as well.

~~~
jhfhhhf
Ageism, OK - but still I find it hard to believe that you can't find a job if
you can code. Maybe competition or demands are especially high in the Bay
Area?

------
gfodor
This is a direct and clear example of bias which made it easy to flag the ML
algorithm. But what about ML algorithms that are inducing benefits to groups
in less obvious contexts? What about groups that are not so easily identified
as being protected classes by simple, human-understandable model features?
What about cases where the features are merely correlated with a
subpopulation of a protected class?

If we're being honest, a system only needs to be in a decision-making
_capacity_ for discriminatory behavior to be scrutinized, since in many cases
human operators will not be able to identify the specific features being used
to make decisions about people -- the features could be highly correlated with
some subpopulation of protected class. If you take that to be true, the
question reduces onto what decision-making roles ML algorithms have that could
be discriminatory, and it's hard to argue this is not a massive part of their
current and expected roles.

I think this is going to be a long, winding ethical nightmare that is probably
just getting started by human-digestible examples such as these. One can
imagine things like this one being looked back on as quaint in the naivety
with which we assume we can understand these systems. Where do we draw the
line, and how much control do we give up to an optimization function? Surely
there is a balance -- how do we categorize and make good decisions around
this?

As far as I know, a cohesive ethical framework around this is pretty much non-
existent -- the current regime is simply "someone speaks up when something
absurdly and overtly bad happens."

~~~
jakelazaroff
_> What about cases where the features are merely correlated with a
subpopulation of a protected class?_

This question can be rephrased as "is there a difference between de facto and
de jure discrimination?"

My answer is no, causality doesn't matter here: if feature A is a good
predictor that some person belongs in group B and not group C, then filtering
out feature As is effectively the same as filtering out only group Bs.
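
To put numbers on that (invented for illustration):

    # If feature A is common in group B and rare in group C, screening on A
    # screens on group membership, whatever the stated intent.
    pool = {"B": 500, "C": 500}          # equal-sized applicant groups
    p_feature = {"B": 0.8, "C": 0.1}     # how often feature A appears
    passed = {g: pool[g] * p_feature[g] for g in pool}
    print(passed)                        # {'B': 400.0, 'C': 50.0} -- 8:1 skew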

~~~
darawk
Ok, so if you're hiring professional arm wrestlers, and your model looks at
bicep muscle mass, is that discrimination because it selects against women?

If you're hiring therapists, and your candidates take a personality test, and
your ML model weights the 'nurturing' feature highly, is that discrimination
because it selects against men?

~~~
jakelazaroff
Underlying your examples is the implication that a preference shouldn't be
considered discriminatory if the trait being selected for correlates with
fitness. I agree with this position!

What I don't agree with is the assumption that, in this case, the preferred
traits _do_ correlate with fitness, since there's at least one — gender — for
which this model is biased even though it has no apparent correlation.

~~~
darawk
Ya, I just mean to say that uncorrelation with fitness is an important
qualifier.

------
strict9
At best, AI amplifies existing patterns and biases when handling repetitive
work. Over and over we hear how Facebook, Twitter, Google, and others will
solve the problem of problematic content and bad actors through AI and neural
networks. It's a fraud, and the digital Potemkin village of our era.

~~~
fx32s
AI learns from the training data it's given and copies any biases this data
exhibits. Pretty much all software today uses ML in some form to improve its
services. I feel it's here to stay and is not bad by default. We just have to
make sure we are aware of its current limitations.

Facebook is already auto-flagging content this way but it's just a very hard
problem (even for humans).

~~~
mindcrime
_AI learns from the training data it's given and copies any biases this data
exhibits_

I hate to sound like "that pedantic guy", but I'd argue that the quote above
is only partially true. It's the case that _some subset_ of AI techniques
"learn from the training data they're given and copy any biases this data
exhibits". There are AI techniques that aren't based on supervised learning
from a pre-existing training set. That doesn't mean that those techniques
can't wind up adopting the biases of their human overlords, but I believe some
aspects of AI are _less_ susceptible to this kind of bias than others.

------
hahan
Call me cynical, but I find it amusing that no one points out that engineering
work is laborious and dry, to say the least, for most people. That's why there
are so few people who have other options (women, upper-middle-class people,
people of means) in the scene. There are plenty of low-paid, low-status,
unsought-after sectors where the majority of workers are male, say janitors at
universities. Why do I never see any discussion of that bias?

~~~
skywhopper
You're saying that software engineers are in the role because they have no
other option? No upper-middle-class people in the position? What are you
talking about?

You're missing the point anyway. The article made it pretty clear that this AI
amplified biases humans already have about women applicants to tech positions.
Stating your own biases about women adds nothing to the topic, or to the
argument you seem to be trying to make.

Janitors are worth talking about as well (women in the same job usually have a
different title with less authority and less pay), but high-status, highly-
paid, highly influential jobs are where it's most important to avoid bias, and
so we talk about those more.

~~~
hahan
And if you think programming is high-status, high-paying, and highly
influential, why do we import immigrants to do it? You think the talent pool
of a few hundred million American people with the most extensive education
system is insufficient?

------
patwolf
The explanation seems overly simplistic. If the difference in volume of male
candidates mattered, then I would also expect to see a bias in favor of
applicants from larger universities. That seems like too obvious an issue in
the way the algorithm was designed.

I see four possibilities here:

1. The algorithm was designed in a completely inept fashion

2. The algorithm design was sound, but ultimately ineffective

3. The algorithm was sound and effective, but the results were considered
discriminatory.

4. There's something biased about how employees are rated--the data that
would feed into the algorithm, which is possibly more of a human element.

Edit: Added fourth possibility

~~~
bicubic
We're talking about Amazon, one of the biggest powerhouse ML employers. I
don't buy that the model was poorly designed or ineffective. They also didn't
just scrap the model without understanding how or why it failed to meet its
objectives.

And whatever the cause was, it was not the poor quality of the training data.
They tried to stop the model from downranking women based on obvious keywords,
only to find it learning to downrank them based on more subtle language cues:

> Amazon edited the programs to make them neutral to these particular terms.
> But that was no guarantee that the machines would not devise other ways of
> sorting candidates that could prove discriminatory, the people said.

So the answer is 3 or 4.

If the answer was 4 then they would have probably mentioned the cause of the
bias somewhere in that otherwise detailed article. But they didn't, possibly
because the cause is controversial - probably option 3 but possibly still
option 4.

And then there's the subtle cop-out:

> Gender bias was not the only issue. Problems with the data that underpinned
> the models’ judgments meant that unqualified candidates were often
> recommended for all manner of jobs, the people said. With the technology
> returning results almost at random, Amazon shut down the project, they said.

If the model was actually useless and returning random noise, then there
wouldn't be any bias, and the article wouldn't need to talk about
discrimination. This paragraph reads to me like they decided to mention long-
tail results (that you'd find in any ML model) as supportive 'evidence' that
the model was somehow broken rather than producing valid but controversial
results.

~~~
moate
Well, it's also what you're training for. If you're building a machine to
return "John Doe, The Company's Best AI Dev" then there are a few things that
you might get back from a working and effective machine. The problem is that
while the machine is doing its best to replicate a John Doe, the humans who
designed the machine might realize that there are so many variables in what
they're looking for that scoping the design is impossibly complex.

Basically, people WANT bias, but they want specific bias. One of the
difficulties in training a machine to understand what you consider acceptable
bias vs. problematic bias is all the tiny nuances. Yes, you want a great
engineer on
paper, but you also need to have as diverse a cast as you can in your company
(both for optics and creative solutions) AND you need to get people you can
afford AND you need someone who's enjoyable to work with etc etc.

Hiring is always going to be part art and part science. There will always be
some type of discrimination because of the perceptions of what makes a good
qualification for the job. Any hiring group is just going to have their own
hierarchy of what they think are the most important skills to have. You can
only approach perfection/unbiased hiring, you can never actually achieve it.

------
electrograv
I wish we could move away from resumes for tech role screening anyway, since
they convey very little reliable information. I’ve seen too many great
hires from candidates with relatively weak resumes, and failed interviews from
candidates with great resumes (and obviously vice versa).

I’m not sure what the best alternative should be, though. I am a fan of open
source work as a sort of code portfolio, but it doesn’t work for every kind of
engineering/science (edit: and also would introduce bias against professionals
too busy for open source.)

Regarding bias — it seems the only way to truly eliminate it (including
unconscious bias) is author-blind reviews, i.e. reviewing code written by a
candidate without knowing anything about that candidate’s identity. (And the
nice thing about code is it usually doesn’t signal any identity traits of the
author via side channels.)

~~~
stef25
> author-blind reviews

Should I ever be in a position to hire a colleague, I wouldn't ever do so
without having a chat with them.

I spend 8hrs a day in an office with my colleagues (sometimes more than with
my wife & kid), and the ones I can't stand are about the only thing wrong with
my job.

If we can't even see the person's face over some gender bias hysteria then I
wonder how the hell we got here.

People should just get over the fact that men and women are different.

~~~
electrograv
I’m not saying that candidates should be evaluated without ever meeting or
interviewing them face-to-face; rather, I was speaking specifically of the
coding portions of the interview. In those segments, face-to-face doesn’t
really matter IMO, since it’s all about the candidate’s problem-solving
ability.

Yes there’s a lot of “bias hysteria” out there, as you put it, but I would
dispute that advocating “author-blind meritocracy” falls into that category.

Quite the contrary: An author-blind review process would actually make any
bias impossible — either for or against any particular identity group. It
seems to me most people should be able to get behind that, but maybe I’m
wrong.

In fact, the main opposition to author-blind meritocracy is the
“post-meritocracy” movement, which is slowly making its way into open-source
projects' codes of conduct.

~~~
stef25
You're totally right in that coding tests can and probably should be done
"blind".

When it gets to the "let's meet" stage, it's possible that the bias just comes
back. Yes, this woman got a perfect score on the coding test, but she's a
woman and I'm not going to hire her for reason X that just bubbled up out of
my biased brain. I can totally see that happening, unfortunately.

------
arandr0x
The idea that Amazon is trying to enforce diversity by using an algorithm that
is made to detect, and match, patterns boggles the mind. Why, yes, if your
recruiting cost function is "is that person just like all the others we
hired", you will end up with a non-diverse workforce, no matter whether the
model optimized with this function has 20 layers, 250 hyperparameters, or two
legs, two arms and a fast-receding hairline.

You can de-bias by explicitly controlling for gender, but now everyone in your
company went to CMU and likes dogs.

The more I see news about what recruiting for ultra-large corporations looks
like, the more I think one of two things is true:

* ultra-large corporations are doomed to hire less and less well in a way that is more and more biased, and we should regulate against such corporations in a way that forces them to redistribute their wealth to SMBs;

* ultra-large corporations need to start exclusively growing through acquisitions, which will have the effect of redistributing their wealth to SMBs, and also of hiring a more diverse base of employees because there is a priori a greater diversity of backgrounds leading to success in the free market than the diversity of backgrounds leading to success in the Amazon interview.

------
thoughtexplorer
Anyone who has worked at tech companies and has been involved in the hiring
process knows the following:

The best thing to be right now is a woman engineer. You can easily get hired
within the week.

Unfortunately this doesn't seem to be well known outside of those involved in
hiring.

~~~
fizwhiz
Is this anecdotal? Are you insinuating that women have an upper hand now for
reasons beyond their skillset?

~~~
thoughtexplorer
Yes. There is a bonus to being a woman in this situation, all other things
being equal.

Silicon Valley did a good bit on this
[https://www.youtube.com/watch?v=Dek5HtNdIHY](https://www.youtube.com/watch?v=Dek5HtNdIHY)

It's funny because it's true.

------
beat
Garbage in, garbage out.

The system failed because they were trying to solve the wrong problem, or
maybe more specifically, didn't solve the problem that led to the problems
with the AI. Amazon was treating the hiring problem as an _efficiency_ problem
alone, and ignoring the _bias_ problem. So they wound up training the AI to do
a shitty job much faster than humans ever could be shitty - and, by analyzing
the data in a way the human results weren't analyzed, showed the failings of
the human hiring process.

Existing process is sexist. Automate to "improve" it, and you wind up with
something even more sexist. What this means is that Amazon needs to go back
and revamp their whole hiring process to make it fair, before trying to make
it faster.

~~~
pcmaffey
> So they wound up training the AI to do a shitty job much faster than humans
> ever could be shitty

If nothing else, modeling our existing behavior in this way is a great use
case for ML. As it allows us to "fast forward" and thus--hopefully!--identify
our flaws based on modeled iterations.

------
ncallaway
I would guess that the training data for the ML set was the set of all resumes
and an indicator of whether the candidate was eventually hired (maybe with
supplemental data about how far in the process the candidate got).

Could this be a direct indicator of a powerful subconscious bias in Amazon's
existing hiring process?

~~~
michaelt

      Could this be a direct indicator of a powerful
      subconscious bias in Amazon's existing hiring
      process?
    

Maybe - but maybe not.

Imagine a company with 2 men in HR, 2 women in HR, 40 men in engineering, and
10 women in engineering. That's with gender-blind hiring, reflecting only the
4:1 ratio of male to female CS graduates.

If you picked a random male hire, there's a 40/42=95% chance they're an
engineer whereas if you picked a random female hire, there's a 10/12=83%
chance they're an engineer.

Thus if you look over all hires' CVs, due to Bayes’ law the dataset says being
male increases the conditional probability you meet engineering hiring
requirements - and the ML system picks up on that.
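
The same arithmetic, spelled out (toy numbers from above):

    # P(engineer | gender) in the hypothetical gender-blind company.
    staff = {("m", "hr"): 2, ("f", "hr"): 2,
             ("m", "eng"): 40, ("f", "eng"): 10}
    for g in ("m", "f"):
        p_eng = staff[(g, "eng")] / (staff[(g, "eng")] + staff[(g, "hr")])
        print(g, round(p_eng, 3))
    # m 0.952, f 0.833 -- "male" predicts "engineer" with zero hiring bias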

~~~
sanxiyn
Why would one train on "all hires' CVs"? It'd be "engineering CVs"; moreover,
it'd be "engineering applicants' CVs", not "engineering hires' CVs".

~~~
Bartweiss
In the article I noticed: _"Problems with the data that underpinned the
models’ judgments meant that unqualified candidates were often recommended for
all manner of jobs"_

That language is a bit ambiguous: it could just mean that the algorithm failed
on a wide variety of jobs beyond engineering. But another reading suggests
that the algorithm was not asked "is this person a good fit for this role" but
instead "what, if anything, is this person qualified for?"

If that's the case, then the problem starts to make more sense: the algorithm
learned a correlation between male-sounding resumes and being hired for
engineering roles. That could produce a biased approach even if the decisions
in the training data were gender-neutral but position-specific. Of course, it
would also mean that an Amazon ML team trained an algorithm on inputs that
didn't match its eventual task, which makes me wonder what they used as a test
set...

(Anecdotally, Amazon spent quite a while recruiting me for SysEng work I'm
wildly unqualified for and uninterested in, even suggesting a switch to
applying for that team when I was already in the funnel for something I'm more
qualified at. When my resume eventually made it to a SysEng engineer, they
were rightly baffled that I had landed in their stack, giving me the sense
that _something_ was screwy with how Amazon decides who heads towards which
role.)

------
zdragnar
I'm reminded of why Watson failed, and the problem with ML and AI in general:
you can't peek under the hood to see why something happened, or how to keep it
from happening, without a lot of time, a lot of hard work, and a whole lot of
carefully groomed data.

~~~
bglusman
This is especially memorable/well treated in the Cory Doctorow novella/short
story "Human Readable"

[https://craphound.com/stories/2005/10/12/human-readable/](https://craphound.com/stories/2005/10/12/human-readable/)

------
noetic_techy
The problem is you can't feed the ML algorithm training data based on what
your company currently looks like; you have to feed it an idealized set of
what you want it to look like. It almost needs to be fictitious training data
to hide the ugly bias that's already built in.

I don't think this will ever work. There is too much variability in resume
wording that correlates to gender and even culture of origin even when you
take out names and any other protected class identifying markers. The Dutch
tried this and ended up with less diversity.

I'm going to go out on a limb and say you almost want to leave all that
identifying data in, but put each candidate into buckets with separate rating
algorithms trained against only that "type" of candidate. The top candidates
from each culture, and the top candidates from each gender, etc etc, however
you want to do it. Feed them into a picking algorithm that builds a composite
of what you want your team to look like diversity wise based on the top
candidates from each bucket, and go from there.
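
Something like this, maybe (a very hand-wavy sketch; the scoring functions are
placeholders):

    # Rank within each bucket, then assemble a slate from the top of every
    # bucket instead of ranking the whole pool with one model.
    def shortlist(buckets, score_fns, per_bucket=3):
        slate = []
        for name, candidates in buckets.items():   # e.g. keyed by demographic
            ranked = sorted(candidates, key=score_fns[name], reverse=True)
            slate.extend(ranked[:per_bucket])
        return slate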

Don't take my opinion seriously, I'm not an ML guy.

------
daenz
It seems that the only socially acceptable output for the AI would have been
hiring women at 50% or more and hiring minorities at a rate greater than or
equal to their representation in the population. Anything else is clear bias
and discrimination.

The project was doomed from the start.

------
ummonk
Machine learning should not be used in this way on humans, whether for resume
screening or, even more dystopian, for sentencing.

~~~
mcherm
Why not?

Seriously: let us take as given that the AI models are biased. Will you also
admit that the existing processes are biased? If so, then what we need to ask
is which is MORE biased. Otherwise it's like complaining that we shouldn't
release self-driving cars because on rare occasions they cause accidents.

There is, however, another criterion besides how biased it is: how biased it
will be in the future. Human-driven processes have the opportunity to become
less biased in the future (also the chance to become more biased, but overall
things tend to improve). AI processes that are opaque might lock in bias in a
fashion that is unreviewable. I believe that the solution is to build AI
models that are more transparent -- that could be BETTER (in terms of avoiding
bias) than the human-driven processes we use today.

~~~
ummonk
I think we basically have the same view. I don't support black box machine
learning models. I do support using automated tests and simple well defined
objective criteria though, which is basically a transparent AI model.

It's just that generally what seems to distinguish whether something is called
"machine learning" rather than "data science and modelling" is that the former
is black box and the latter is not.

------
leeny
I tried to do something similar a while ago (for eng hiring specifically). It
turned out that the number of grammatical errors and typos mattered way more
than anything else on a resume.

[http://blog.alinelerner.com/lessons-from-a-years-worth-of-hiring-data/](http://blog.alinelerner.com/lessons-from-a-years-worth-of-hiring-data/)

That aside, what sucks is that attempts to automate resume scoring rarely look
at harder-to-quantify features and focus on low-hanging fruit like keyword
occurrences... though in my experience it's such a low-signal document for
engineering hiring that the whole thing is a fool's errand.

------
vishal_pym
This is not very surprising - Machine Learning algorithms trained on biased
datasets tend to pick up the hidden biases in the training data. It’s
important that we be transparent about the training data we are using and look
for hidden biases in it; otherwise we are building biased systems.
Fortunately, there are open source tools out there that help audit
machine learning models for bias, such as Audit AI, released by pymetrics -
[https://github.com/pymetrics/audit-ai](https://github.com/pymetrics/audit-ai)
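
Even without a library, the most basic audit is easy to run. A sketch of the
EEOC "four-fifths" adverse-impact check (not Audit AI's actual API, just the
underlying idea):

    # Adverse impact ratio: each group's selection rate divided by the rate
    # of the most-selected group; below 0.8 is the classic red flag.
    def adverse_impact(selected, applied):
        rates = {g: selected[g] / applied[g] for g in applied}
        best = max(rates.values())
        return {g: round(r / best, 2) for g, r in rates.items()}

    print(adverse_impact({"men": 50, "women": 15},
                         {"men": 100, "women": 100}))
    # {'men': 1.0, 'women': 0.3} -- well under the 0.8 threshold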

------
trustmath
I hate this industry. Shooting themselves in the foot over and over again
because no one can get past the idea that possibly, women can be just as good
at math, logic and computer science - if people would just let them. This
never ends. It's just one place after another, when it gets discovered. It
never changes.

~~~
exo762
Why not "women are just not as interested in math, logic and computer science
to pursue it AS OFTEN as men"? Why are you not considering this possibility?

~~~
komali2
Ah, the Damore argument. Besides the fact that his pseudoscience has been
summarily handled[0], to consider his argument you then have to equally
consider the possibility of sexism in academia pressuring women not to study
these subjects, and societal pressure their whole lives pushing them away from
pursuing these career paths.

There's also the idea that lack of women scientist "heroes" can be limiting
(lack of role models). Basically the idea that if you stack the cards against
a population, you're gonna see population-wide effects.

Given these data points, a biased hiring AI contributes to the problem.
Therefore, it should be fixed, along with the above points.

[0][https://www.bbc.co.uk/news/world-40865261](https://www.bbc.co.uk/news/world-40865261)

[0][https://www.theguardian.com/technology/2017/aug/13/james-damore-google-memo-youtube-white-men-radicalization](https://www.theguardian.com/technology/2017/aug/13/james-damore-google-memo-youtube-white-men-radicalization)

~~~
21
> _There's also the idea that lack of women scientist "heroes" can be
> limiting (lack of role models)_

This one is a bit weird, computer guys were always "nerds" and "geeks" to stay
away from.

~~~
crooked-v
...starting in the 80s, which is also when the percentage of women going into
computer fields started dropping like a rock.

------
sanitycheck
Given that Amazon is so far unable to successfully recommend any product which
I actually want, even given the vast dataset of my Amazon purchase history, I
am not remotely surprised that their engineers can't successfully develop a
people recommendation engine either.

~~~
maym86
\- If you hired this male candidate you may be interested in this identical
male candidate.

------
village-idiot
We’re going to keep seeing stuff like this until people finally realize that
AI isn’t some magic tool that solves every problem. It still reflects the
biases and assumptions of its creators and its training data set.

------
kaitai
I'm doing a bunch of ML on a very different data set -- looking at what people
eat (survey data). What's interesting to me is that if you do principal
component analysis, for instance, there are some differences between the boys
& girls in the sample, but they're not very distinct. If you do clustering or
random forests on the dietary intakes of the whole cohort, you get mushy and
unclear signals. If you split the survey respondents and bin by age and
gender, and run different models for each, suddenly signals jump out of
clustering _incredibly_ clearly! What's weirdest is that you get some of the
_same_ dietary clusters for the different demographic groups -- but those
clusters were not evident when you did clustering across the cohort.

It's surprising to me that Amazon didn't (apparently) try different models for
different populations. Sure, it might open you up to criticism, but there are
some good data-driven reasons to do so. Women's colleges won't show up with
regularity on men's resumes, for instance. Similarly, there are fraternities
and sororities around engineering and STEM that may provide different signals,
but won't appear equally distributed on men's & women's resumes. Language use
on resumes does differ by gender, and using "Captured value of $100 million
by..." rather than "Created value of $100 million by..." may describe the same
project. (I gotta say, using verbs at all seems silly, since it really is
about how well you market, rather than what you did.)

So, curious about the model. Different models for different subsets of the
training data can lead to big wins.
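
In sklearn terms, the comparison looks roughly like this (schematic; assume X
is the intake matrix and bins holds each respondent's age/gender bin):

    # Compare cluster quality pooled vs. within each demographic bin.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    def clarity(X, k=4):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        return silhouette_score(X, labels)     # higher = cleaner clusters

    def compare(X, bins):
        print("pooled:", clarity(X))
        for b in np.unique(bins):
            print(b, clarity(X[bins == b]))    # often much sharper per bin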

~~~
rfeather
Off topic, but I'm looking at similar data. Are you talking about a public
source (eg NHANES) or something else?

------
crimsonalucard
Once I worked for a startup selling fresh human baby milk to mothers who
couldn't produce milk. I was contracted to write a supply side AI that would
find people willing to sell the milk they produced to us. The AI had the exact
opposite bias as the one in this article... it showed extreme bias against
men.

I also wrote a similar AI for finding surrogate pregnancy candidates and it
also showed bias against men.

Goes to show how AI can fail and be incredibly sexist.

------
gumby
I hate the clickbait way in which this story has spread across the net.
"secret AI recruiting tool" sounds like Amazon did something nefarious.
Instead they built a tool, found out it was broken, and didn't deploy it.

The actual newsworthy part, which is getting slightly stale, is that it was
influenced by the data bias.

I am not even a fan of Amazon but I think this is unfair to them. They did the
right thing here.

------
whatgoodisaroad
I feel like many comments here are taking this story at face value, but the
cynic in me reads this as a planned leak to scapegoat a (hitherto unknown) AI
system for Amazon's existing hiring biases. Public perception of AI is more
aware of model biases nowadays, and we seem all too willing to accept this
explanation over the simpler explanation that tech hiring at Amazon is broken
in the same way it is everywhere.

------
darawk
There's something I don't understand in stories like this. It ought to be
relatively straightforward to correct biases like this. All you need to do is
train a model to explicitly classify gender from resumes, and then use _that_
model to de-gender resumes before passing them on to the hiring model. Is
there some reason people aren't doing this?
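
A naive version of that pipeline might look like this (my own sketch; the
top-k zeroing heuristic is an assumption, not anything Amazon described):

    # Naive sketch: learn which features predict gender, then null them out
    # before the hiring model ever sees them.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def degender(X, gender, top_k=20):
        clf = LogisticRegression(max_iter=1000).fit(X, gender)
        leaky = np.argsort(np.abs(clf.coef_[0]))[-top_k:]  # most gendered columns
        X = X.copy()
        X[:, leaky] = 0
        return X

The catch, as other comments here point out, is that gender signal tends to be
smeared across many weakly correlated features, so zeroing the top k rarely
removes it.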

------
Novashi
I think there's also a subtle point here that an AI figured out that Amazon
hiring was biased more quickly than Amazon itself.

I wouldn't go so far as to say that running your history through an AI is
necessarily proof of anything though (esp. in court). Imagine using another
company's historical data to suggest that they're discriminatory.

------
c3534l
I'm honestly pretty shocked that Amazon went ahead with this, which is such an
obvious legal liability in an extremely touchy area of business practice.
Everything about this should have been a red flag that meant it never should
have gotten off the ground, let alone been in place for years.

~~~
megy
You don't think they should try new things, even though they might not be
successful?

~~~
c3534l
When it systematically and illegally oppresses a class of people in an
obviously ethically dubious manner, I would expect a certain level of caution.

------
iamleppert
The thought that you can somehow suss out, with any consistent performance,
who will be good or not at their job based solely on a training set consisting
of resumes is so banal and inept that it's laughable. Team building has so
much more to do with assembling the right kinds of people with the right
talent and
personalities that no present day machine learning could ever hope to come
close in effectiveness to a thoughtfully applied interviewing and hiring
process.

It is clear the executives in charge of this project haven't the first clue
what they are doing. It not only lacks any technical deliberateness but also
fails the common sense test. I really wouldn't want to be working for such
people.

------
40acres
I know diversity in tech is a hot-button issue, but I think that, hidden
underneath the politics of it all, the most important reason to champion more
diversity is AI.

I don't think we're anywhere close to general AI so any AI system out there is
built on what we feed it. If your music suggestion algorithm has a biased AI
that's one thing, but when you're using AI to make critical decisions in
society like who gets hired/recurited, medical diagnoses and other things you
need to be extremely careful.

When designing AI models the broadest set of viewpoints should be considered.
Any piece of AI is simply a reflection of its creators; we need to make sure
that some sort of equitable consensus is reached before deploying AI to
critical human-scale issues.

------
otakucode
It seems strange to me that they would train the system on resumes submitted
to them over the past 10 years. I presume they judged the ones they hired as
'successes' and those they did not as 'failures'? Or did they judge all
submissions as successes, expecting to be testing for 'does this person fit in
with the group of people who submitted to us over the past 10 years?'

Neither matters much, neither makes sense. What they should have been doing
was training it on the resumes submitted by employees who then went on to be
very successful within the company. Those are the successes. Those are what
you want more of. And, probably far less likely to be weirdly biased by gender
or race or whatnot.

------
LinuxBender
What would happen if you removed gender from all HR and recruiting systems and
then retrained the AI? Or, for that matter, removed ethnicity, age, creed,
etc... Is there any reason we need to be more specific?

Race: Human

Gender: Yes

Age of legal contractual consent: Yes

~~~
TangoTrotFox
The AI did not have access to gender. It was just word weighting, and it
turned out that words that could be linked to women ended up correlated with a
negative outcome. As the article says, the AI ended up giving a negative
weight to any resume containing the word _women's_, as in _women's [---]
club_, or those that mentioned certain all-women's colleges.

~~~
LinuxBender
It should be fairly easy to filter / replace all of that. The same logic can
apply. Can we add some simple filters and re-train it?

~~~
daenz
> Amazon edited the programs to make them neutral to these particular terms.
> But that was no guarantee that the machines would not devise other ways of
> sorting candidates that could prove discriminatory, the people said.

If you've ever trained a NN, you'll know that they are exceedingly clever in
finding patterns that fit what you're training for. You can remove the word
"women's" and other obvious things from being considered, but I promise you,
if there are other non-obvious patterns that are more likely to apply to the
women candidates, the AI will find them and use them.
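
You can see this with a toy model (synthetic data, invented numbers):

    # Toy demonstration: with the explicit "women's" feature removed, a
    # model still recovers gender from an innocuous correlated feature.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 5000
    gender = rng.integers(0, 2, n)                # 1 = woman
    proxy = 0.5 * gender + rng.normal(0, 0.4, n)  # e.g. a word-use pattern
    X = proxy.reshape(-1, 1)                      # no explicit gender column

    clf = LogisticRegression().fit(X, gender)
    print(clf.score(X, gender))                   # well above 50% chance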

------
mankypro
Probably being used for age bias as well...

------
kgwgk
I’m curious: how is it that this submission has not “reused” the submission I
made eight hours before? In my experience, sending a link that is already
there simply upvotes the existing submission.

~~~
adventured
After a certain amount of time, the same link can be posted by a different
user. I'm not sure if there is a point or point vs time qualification on that.
You can post old, popular stories again for example.

Your submission was made 11 hours ago. If they had merely applied an upvote to
that existing story after 10 hours, the decay of point value over time would
have greatly reduced its impact on ranking the story to the front page (i.e.
it would be nearly useless for discovery purposes).

------
chiefalchemist
> "Gender bias was not the only issue. Problems with the data that underpinned
> the models’ judgments meant that unqualified candidates were often
> recommended for all manner of jobs, the people said. With the technology
> returning results almost at random, Amazon shut down the project, they
> said."

Doesn't this mean the headline is incorrect, incomplete and/or (intentionally)
misleading?

------
briandear
“Instead, the technology favored candidates who described themselves using
verbs more commonly found on male engineers’ resumes, such as “executed” and
“captured”

This is the most interesting part to me. It’s suggesting that men and women
think about things differently. Which, if true, suggests a lot of other
possibilities that have relevance when hiring for certain positions.

------
RangerScience
One of the first good critiques of the "oncoming AI apocalypse" that I've seen
is exactly this: the encoding of existing biases into the resulting AI system.

But, it also means that bias is now measurable...

...and there were more than a few papers at NIPS that were directly dealing
with "fairness" in a NN, aimed at addressing and using these issues and
effects.

~~~
Jtsummers
_Weapons of Math Destruction_ offers good discussion on this problem, if you
want to read more about it.

------
foxhop
I'm pretty sure this tool reached out to me at one point. Here is the email
excerpt:

[http://pad.yohdah.com/638/amazon-ai-recruiting-email](http://pad.yohdah.com/638/amazon-ai-recruiting-email)

------
mankypro
Wouldn't an easy way to eliminate bias be to remove any algorithms that use
name recognition and gender? If the AI doesn't have this data to "reason"
from, wouldn't it level the playing field?

~~~
falcolas
If, and only if, those are the only differences between candidate resumes. I
don't think that's a reasonable assumption to make. Work history differences,
sentence structure, word choices - all of these can quietly reflect gender
differences.

------
Jyaif
Every product in development is going to be secret, but Reuters tries to make
it sound as if some sort of conspiracy was going on. What's next? "Jeff Bezos
secretly goes to the restroom"?

------
foolfoolz
this sounds like a bad ML tool to begin with because the input data sucks.
it’s probably judging who can write a “good” resume. plenty of bad candidates
write good resumes, and good hires write bad ones

------
ramblerman
How do they measure non-bias? i.e when is it working as intended?

My only worry is that they are aiming for 50/50 which doesn't reflect the
underlying gender ratio of the developer pool.

------
apercu
The discussion here (versus other channels) is so much better. Not that I
expected any different. I'm just glad no one is using this media report to
confirm their own biases.

------
patrickg_zill
What if we trained a bunch of robots with AIs embedded and they turned out to
be racist misogynists? There has got to be a science fiction story about
that...

~~~
castlecrasher2
You mean like Tay?

------
arountheworld
Have they not removed protected characteristics from the data? If they had, AI
wouldn't know about gender and there would be no bias.

------
izzydata
It is no secret that more men pursue careers in this field. How can you expect
any algorithm to produce an equal number of men and women applicants? If it
did, then it would actually be biased in favor of women.

If this is what they want, then they have to feed their neutral algorithm
pre-biased data to get their expected results.

~~~
briandear
I think it isn’t about equality in numbers but about actively screening out
women.

So there may still be an 8:1 ratio of men to women, but according to the
article, it would seem that even that 1 woman would have been negatively
impacted.

I am not arguing the validity of their conclusions, just saying that I don’t
think that it’s about equal numbers: given a male and female applicant,
according to the article, women would have had an unequal chance of passing
the screening. (Now if that inequality was for valid reasons, that to me, is
an open question, but the article indicates that there was unjustifiable
bias.)

~~~
izzydata
Fair enough. I'm still skeptical of their interpretation without more details,
but anything is possible.

------
xarill
If the gender was passed as a "feature" in the machine learning model, then
the company may have been biased. If not, then the majority of the women's CVs
were inferior. This important information is not present in the article.

~~~
core-questions
This entire article is a strong hint that the AI system simply picked the
better resumes, where better = more likely to get hired, exactly as it was
designed to do. The fact that more men than women satisfy these requirements
should come as a surprise to precisely nobody.

Fact is, there are more men than women working in tech, especially in the
seriously hard-core stuff these big companies need. This is most likely of
their own volition, and no matter how much outreach we do, it is likely to
stay the case for generations.

Of course, that's also an egregiously wrongthink position to take. Double plus
ungood.

------
nobody271
This never has been a problem about gender issues. It's about finding bugs in
magical machine learning algorithms. I'm sure the algorithm suffers from many
other deficiencies, but gender bias is the one people write articles about.

------
lawnchair_larry
So why is it not ok that women are penalized by being statistically less
likely to get an offer yet it’s just fine to penalize men on auto insurance
for being statistically more likely to cause an accident?

~~~
maym86
Because men are statistically more likely to cause auto accidents. There is no
evidence that women are statistically more likely to be bad at getting good
work done at companies like Amazon. It's pretty simple.

~~~
crimsonalucard
Do you have a source for your second statement?

------
paulcnichols
Stupid question probably, but why not have two models?

~~~
T-hawk
Because evaluating differently based on gender _actually is discriminatory_.

~~~
commandlinefan
Well, that sounds about right, then - every proposed solution to subtle,
unconscious bias inevitably turns out to be explicit bias in the other
direction.

------
screye
The article never mentions what form of supervision Amazon used for building
this model.

I can easily see a bias towards a particular gender arising, even when the
team had no intention of creating one, simply because of how the data was
selected.

________

Case 1 : They used similarity statistics between candidates that were hired
and applicants.

This is the easy one: no need to label datasets; the approach is semi-
supervised. It will also 100% cause a bias towards candidates with profiles
similar to those already working at Amazon (i.e. men).
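
Sketched out (the vectorization is hand-waved; assume resumes are already
embedded as vectors):

    # Case 1, roughly: score applicants by cosine similarity to the centroid
    # of current employees' resume vectors. If the workforce is mostly men,
    # "looks like us" quietly becomes "reads male".
    import numpy as np

    def score(applicants, employees):        # rows = resume vectors
        centroid = employees.mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        A = applicants / np.linalg.norm(applicants, axis=1, keepdims=True)
        return A @ centroid                  # higher = more "like us"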

________

Case 2 : They manually labelled/ranked a dataset of resumes and assigned them
scores. (more likely)

Here the implicit bias of the mechanical turks / rubric would be visible. If
higher scores were assigned to traditionally masculine activities, then male
resumes would stand out. I doubt this was the case though, as people at Amazon
are generally competent enough not to make such a trivial mistake. Also,
gender only gets mentioned in non-technical skills (men's team, women's club,
etc.), which in general are not the most relevant part of the profile anyway.

________

Speculation:

1.

> Instead, the technology favored candidates who described themselves using
> verbs more commonly found on male engineers’ resumes, such as “executed” and
> “captured,” one person said.

I wonder if this has anything to do with gender at all. All good resumes that
I've read use action words irrespective of gender. Maybe type A personalities
use words like “captured” more often than type B ones, and the % of men in the
type A category is greater. Would discrimination against women in such a case
be unfair? Maybe... maybe not.

2.

Extracurricular activities are more prominent on weaker resumes than stronger
ones, so the model may be down-weighting resumes with too much extracurricular
fluff vs. technical skills. Men's activities are rarely prefaced with the word
"men" (they would just say Football team, Chess team). Women's activities, on
the other hand, almost always have the word "women" attached. If
extracurricular activities were penalized, then the words inside them,
including "women", would also be penalized. Thus, the model learns a latent
gender bias without any bad intentions.

_________

In ML, one of my favorite statements is : "The model is only as good as the
data it is trained on." If the data is not sufficient, rich enough or prepared
in the correct manner, then unintended consequences are nearly guaranteed.

------
fromthestart
Things are going to get very interesting when AI inevitably reveals consistent
differences in performance among genders.

There's only so much you can sweep under a rug.

------
jiveturkey
Is it a Microsoft Tay problem?

------
just_myles
Gender. I am more concerned about race and ethnic background. Why is this
never addressed?

