For example, we're generally not allowed to discriminate based on gender, but even without a gender field, ML will happily imply a gender out of its firehose of other data, and discriminate based on it anyway.
"AI finds that employees are less likely to stay at the company if their names end in 'a' and they score higher on 'cultivation' rubrics." This same AI then recommends hiring applicant 2 over applicant 1. Is this allowable?
Bear in mind "bias" is just a pejorative for "generalisation", or alternatively "lesson learned". AI algorithms are good at detecting patterns in data and bad at being politically correct. This is not a flaw of the algorithms; it's a flaw in people who can't accept measured reality and go into denial.
So an AI trying to hire programmers discriminates against female sounding names, because it's learned that this is correlated with success? Apply it to hiring nurses or primary school teachers and it'll probably do the opposite. This is only "bias" if you start from ideologically driven blank slate assumptions. Otherwise it's just common sense.
Instead of someone drilling into your head that your initials = good, and others = bad via your environment, it happens through natural processes.
Most biases are just pattern matching, which is necessary for efficient memory and powers the quick on-the-spot judgements/decisions we need to make. People make too big a deal out of stereotypes, as if it's evil to hold them, even though it's a basic function of the brain. Without constant vigilance it's entirely possible to make misjudgements based on that, and it will still happen to everyone, even the most socially aware people.
For example, imagine that you wanted to train an algorithm to distinguish photos of dogs from photos of humans. So you collect a bunch of photos of both dogs and humans and use them to train a classifier. You do all the proper cross-validation, bootstrapping, etc. to ensure that you are not overfitting, and you get really good results. Then, looking at the mis-classifications, you notice something: all the photos that are taken looking at an angle down toward the ground are classified as dog photos, and all the photos taken looking straight ahead are classified as human photos. It turns out that in your training set, most of the dog photos are taken at a downward angle while most of the human photos are taken facing straight ahead, because humans are taller than dogs, and your machine learning algorithm identified this feature as the most reliable way to distinguish the two groups of photos in your training set.
In this hypothetical example, no overfitting occurred. The difference in photo angles is a real difference in the training sets that you provided to the algorithm, and the algorithm did its job and correctly identified this difference between the two groups of photos as a reliable predictor. The problem is that your training set has a variable (photo angle) that is highly correlated with what you want to classify (species). This is considered an unwanted bias (and not a reliable indicator) because the correlation is caused by the means of data collection (most photos are taken from human head height) and has nothing to do with the subject of the photos.
(Though maybe the term as used in industry is less strict.)
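A toy version of this angle-artifact story is easy to simulate. Everything here is made up: "angle" and "fur" are stand-in features and the numbers are arbitrary. A linear classifier given a spurious data-collection feature leans on it almost exclusively, and its accuracy collapses on data where the artifact is gone:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)                  # 0 = human photo, 1 = dog photo

# "angle": data-collection artifact, strongly correlated with the label
angle = y + rng.normal(0, 0.3, n)
# "fur": a genuine but weak signal about the subject itself
fur = 0.3 * y + rng.normal(0, 1.0, n)
X = np.column_stack([angle, fur])

clf = LogisticRegression().fit(X, y)

# Test data where both species are photographed at the same angles:
# the artifact no longer predicts the label, only "fur" does
angle_shift = rng.normal(0.5, 0.3, n)
X_shift = np.column_stack([angle_shift, 0.3 * y + rng.normal(0, 1.0, n)])

print(clf.score(X, y))        # high: the artifact separates the classes
print(clf.score(X_shift, y))  # near chance: the learned rule was the artifact
```

No overfitting is involved: the model found a real, reliable regularity of the training set. The problem is that the regularity belongs to how the photos were taken, not to what they depict.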
It would be like if your car was driving in circles and you called a mechanic to fix your steering, and they told you that the actual problem was that both right wheels were missing. That's not a steering problem, and no repair to the steering system will fix it. The only fix is to put new wheels on.
When AI makes a decision, right now, people only use the probability output. Hiring A has a .6 probability while hiring B has .4, so we hire A instead of B. However, if we consider the confidence intervals, the decision might not be that clear: say +/- .5 for A but +/- .2 for B. If exploration is considered too, it's very likely we would give B a chance.
AI operates in the realm of probabilistic decision making, while most people don't. The bias is not from the training side; it's the decision-making process incorporating AI that should change.
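A minimal sketch of an interval-aware decision rule like this (the function name and the symmetric-interval simplification are mine; the numbers come from the hiring example above):

```python
def decide(score_a, half_width_a, score_b, half_width_b):
    """Pick a candidate only when the confidence intervals don't overlap."""
    lo_a, hi_a = score_a - half_width_a, score_a + half_width_a
    lo_b, hi_b = score_b - half_width_b, score_b + half_width_b
    if lo_a > hi_b:
        return "A"
    if lo_b > hi_a:
        return "B"
    return "unclear"  # overlapping intervals: a case for exploration

print(decide(0.6, 0.0, 0.4, 0.0))  # point estimates only -> "A"
print(decide(0.6, 0.5, 0.4, 0.2))  # wide intervals -> "unclear"
```

With point estimates alone A always wins; once the uncertainty is wide enough, the honest answer is "unclear", which is exactly where an exploration policy would give B a chance.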
...which means that whether a model is "biased" depends on where and how it's applied. This is an important point that is missing from most discussions, articles and even research papers on the so-called "AI ethics".
If the biases are consistent with other real-world data, then it's not overfitting.
If the results are odious to us, it should be impetus to critically analyze not only the AI/ML systems, but also the underlying assumptions that they're built on. Instead, developers become defensive and cage-y about their processes.
If you don't want systems to have disparate impact, you have to be adamant about it in your design. If you think society is better off with systems that reflect preexisting biases, then fine, but be ready for the backlash.
At the end of the day, it really is up to what humans want to do with themselves. It's an opportunity to be truer to our intent, not a bug to be covered up.
When you say unconscious bias, you are kind of implying that the model learns something that is false. But more often it's the case that the model learns something true that we don't want it to learn. That's what makes the problem so hard, you are trying to hide the truth from a system you only half-understand processing data you only half-understand. There is a big risk the truth slips through the cracks if you aren't careful.
It's more that the model learns something that is undesirable. It could be the case, for example, that the true thing that the AI learns is that your resume screening process tends to exclude women. This is true, sure, but it could lead to the undesirable outcome where the presence of a female name on a resume might be weighted heavily against the candidate.
I agree that this process is one of resolving blind spots, but I disagree that the blind spots are simply areas devoid of light. AI/ML systems are frequently employed to augment or stand in for human perception, which is known to be necessarily incomplete with respect to reality. In other words, they can learn things that seem true to us but that are false from another perspective, or undesirable once exposed. What's exciting about them is that they provide an opportunity to interrogate the flaws in our individual perception with a systematized observation and analysis, in a much more sophisticated manner than in the past. But fulfilling that potential requires humility.
Their failure was not just in lacking diverse training sets, but also in lacking diverse QA, or at least QA looking for the blind spots that eventually became evident.
So, correct, it's not as simple as having "sufficient" data.
Your expectation was the same one they had, and it was wrong, which is the crux of the issue.
If you asked the developers of the facial recognition library, "does your software have problems with very low contrast conditions?", they'd surely have answered yes. Fully conscious of the issue, but that's software. It's hard to get everything right 100% of the time.
Do you have a source for data set mis-labelings being a problem?
However, ML is often sold as a solution for generating outcomes, not for finding truths, whether those outcomes turn out to be true or false.
The distinction is huge.
No matter what humans do, they will reap what they sow. Consequences and outcome matter more than "truth" (which may be in the eye and competence of the beholder).
What people think they should be is far, far more complicated and nebulous. I'm not sure I fully understand, but people have been fed a lot of nonsense about AI systems, from Deep Blue to Watson to AlphaGo, etc, etc, showing them as being very powerful in limited domains, and extrapolating that out into overestimations of what they could do.
The other main problem is that people seem to think that these AI systems will be a complete replacement for human thought and decision-making, which, frankly, knowing even a little about how the sausage of software is made, is completely terrifying.
The scary part is, they can do so on a different level than your average human, poring over larger sets of data more quickly than you or I could consume in our entire lives.
The largest issue, however, is that an AI can find and shed light on dislikable realities of the world.
Racism, sexism, culturism, opposing political opinions... Perspectives that are not "PC" still exist and permeate the digital world along with the physical one. Creating unbiased data is, imo, impossible, as I am also biased, and so are you. I don't know what unbiased data is.
I can certainly say that I have held racist, sexist, religious, and political views at various stages of my life - based on small sample set data, and biased trainers. I have grown a better understanding and no longer hold many of the naive beliefs that I held when I was younger, and will continue to realize how ignorant I am as I live.
The same process will probably happen for any AI.
What about just learning based on the entire web?
I think you're using the word "unbiased" to mean "heavily adjusted for US centric views on racism and sexism" which isn't what the word really means.
If you train an AI on everything written - all books, all web pages, all newspaper articles etc ... a not impossible task these days - then you can argue you're as close to bias free as possible.
However a small number of AI researchers don't like the results they get when they do this, because the AI learns the world that truly exists instead of the one they wish would exist. But that's not a bug in the software. It's a bug in the researchers.
Becoming less ignorant is a life-long process, as far as I can tell.
I'm just somewhat more aware of my capacity to over-generalize based on my individual experiences, and allow biases to settle in my subconscious in the form of racism, sexism, ageism, or what have you. I try to find where I have internalized these thoughts so I can do a little internal reforming. I also pay a lot more attention to "_______ is/are _______" statements, as they are almost always over-generalizations.
Being aware of this mental mechanism doesn't really stop me from doing it though. I know I'll tend to cluster experiences to create generalizations indefinitely, as it seems to be an evolved trait (makes sense for survival reasons, to assume the worst until you find evidence otherwise) - even though it's not perfect.
What we should be aiming for instead is not the total lack of bias (an impossibility if you are to learn anything at all from the data), but _explainability_. A system must be able to show me the set of statistics that led to a particular decision. I.e., women in the Los Angeles area are known to be much safer drivers according to this subset of data, so we offer a lower insurance rate to women in the Los Angeles area, to use just one hypothetical example. Such systems are a rarity nowadays, and research into them is relatively sparse.
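For a linear model, this kind of per-decision explanation is almost free: each feature's contribution to the score can be listed directly. A sketch, with feature names and weights invented for the insurance example:

```python
def explain(names, weights, features):
    """Per-feature contributions to a linear score, largest magnitude first."""
    contribs = [(name, w * x) for name, w, x in zip(names, weights, features)]
    return sorted(contribs, key=lambda t: -abs(t[1]))

# hypothetical pricing model and applicant
names = ["years_licensed", "claims_last_5y", "annual_miles_10k"]
weights = [-0.02, 0.30, 0.05]
driver = [10, 1, 1.2]

for name, c in explain(names, weights, driver):
    print(f"{name}: {c:+.3f}")
```

Deep models need heavier machinery (surrogate models, Shapley-value approximations) to produce the same kind of breakdown, which is part of why explainable systems remain rare.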
It seems a lot of people don't realize that there's likely a large disconnect between what people think they're training an AI to do and what they're actually training it to do.
If they're lucky, the two are close enough that the learning program will be useful for what was hoped, and if they're unlucky it will look like it's useful and correct but will include behavior learned from the training data that was not predicted and which taints the result.
I.e., if you train an AI with what you think is unbiased data but which includes a subtle bias, then the results you get from it may be biased... and the bias may be so subtle it's undetectable except by another AI, which is a problem if you assume that since your training data was unbiased, your AI must be unbiased.
Putting it another way... garbage in, garbage out.
If ML is trained on biased data where an optimality exists only at an unbiased solution (think a shaped reward function in RL to disincentivize class bias or something similar), then no, the ML is most certainly not supposed to echo the bias they've been fed.
On the other hand, if an optimal solution to the ML optimization problem exists at a biased solution, ex. a naive prediction of if a nurse is a man or a woman, then yeah, we would say that it was supposed to echo the bias.
All too often I feel people forget that ML is just an optimization problem. What you're trying to optimize really matters - generalizing about all ML without talking about the optimization problem in question is pointless.
As a trainer of an AI, simply looking at a picture and saying that it's a cat, and then telling the AI that it's a cat, is biased. You are automatically assuming you are correctly identifying it as a cat, and teaching the mind of an AI to follow your presumption.
The only way to make an AI less biased (but still biased), is to diversify your labeled training data. And by diversify, you need to diversify the presumptions on the labels of the data themselves. What is the confidence among several people that this picture is of a cat? Rather than one trainer.
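The multiple-annotator idea can be sketched in a few lines: instead of one trainer's hard label, aggregate several trainers' votes into a confidence distribution (a "soft label"). The function name and votes are made up:

```python
from collections import Counter

def soft_label(votes):
    """Turn several annotators' labels into a confidence distribution."""
    counts = Counter(votes)
    return {label: count / len(votes) for label, count in counts.items()}

# five hypothetical trainers look at the same photo
print(soft_label(["cat", "cat", "cat", "dog", "cat"]))  # {'cat': 0.8, 'dog': 0.2}
```

Training against these distributions propagates the annotators' disagreement into the model, rather than baking in a single trainer's presumption.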
This is a simplistic example. If you scale it to more complicated tasks, such as understanding natural language, body language, intent of users, or anything related to ethics, you'll likely end up with a crazy AI. And yes, I mean crazy. As in, simultaneously holding very seemingly opposing understandings of reality and spitting out data that reflects those oppositions.
If you've ever had more than one authoritative figure in your life give you contradicting advice to another - one parent holding differing beliefs to another, or one teacher to another in school, you may notice yourself starting to qualify your teacher's competence as a factor in the fidelity of their teachings.
A scary, but simultaneously cool thought: An AI that questions my capacity to teach it, while I'm teaching it.
Bias will always remain an element in the equation however.
One AI will likely have interacted with different trainers or data, and create contradicting understandings of reality to that of another AI.
I would say that a less biased (wiser) AI, would learn that this duck/rabbit image is, itself, questionable, and can be perceived both ways (high confidence values for both identifiers).
Me: Is this a duck or a rabbit, AI?
Padawan AI: It's a rabbit (51% confidence as rabbit. 49% as a duck).
Master Yoda AI: Rethink your question, you must. Yes, the most correct answer is.
> Passages from the Life of a Philosopher (1864), ch. 5 "Difference Engine No. 1"
A very long time indeed!
I think the key source of confusion (both in 1864 and today) is that non-technical people hope that the machine has some wisdom correlated with its arithmetic prowess. Charles Babbage's mechanical calculator did not have wisdom. Even when you give a bleeding-edge algorithm Big Data and lots of processing power, it doesn't yet have wisdom. But hope springs eternal.
I have worked closely with BERT and other language models for our startup. There is a disconnect between the capabilities and state of AI research today and the public's expectations and imagination. There's also fundamental confusion between scale and intelligence. That is, the public largely believes the efficacy of many "AI" models on large scale problems is equivalent to intelligence. That assumption is problematic in both overestimating the capabilities of these models and misdirecting the focus of critical inquiry.
Hopefully, there will be more education around the limitations and capabilities of these technologies. We should be more cautious in applying them to use cases where there is high potential for negative consequences.
For instance, people have detected that AI models associate technical words with men more often than women and point to this as evidence of bias. I would argue that this is the opposite situation. There's social stigma attached to acknowledging differences between groups, so people have developed biases against acknowledging those differences. The AI, on the other hand, sees that 70-80% of technologists are men, and associates accordingly. So the problem isn't that the AI is biased; it's that it lacks the biases we expect.
Now, there are good arguments why this bias may be a good thing. Unbiased does not automatically mean good.
A good example is the criminal justice system in the US. Minority communities are much more intensively policed, prosecuted, and jailed. A model will, without care, use race or proxies thereof to predict "criminality" without understanding or accounting for the bias inherent in the label. If that is used in policy, it runs the risk of amplifying existing social problems and injustices.
We always have to ask: "bias for what?", or the conversation will be hopelessly confused.
There are some crimes, like drug use, that are indeed over-reported in some communities due to over-policing.
But there are other crimes, like homicide, that are nearly universally reported, regardless of where they are committed. And those crimes are much more frequent in minority-majority parts of the country.
Turning a blind eye to this problem is a disservice to those communities, because they are the same communities that are most commonly the victims of crimes.
In this case, the increased crime in these communities is not caused by their minority status, but rather by a multitude of other factors with historical and societal origins.
I'm not sure why causation vs. correlation is relevant here. When, say, someone is up for parole the judge will review their past criminal history, conduct in prison, whether or not they will have a support structure when they are released, etc. None of these factors are causal. A judge cannot point to a past crime, or to conduct in prison and say "this factor will cause you to reoffend". No, the decision is made based on factors that are correlated with higher rates of recidivism.
White Americans = 70% of the population, $100 trillion in wealth.
African Americans = 14% of the population, $2.9 trillion in wealth.
African Americans in particular owned almost 10-12% of the land in this country (true wealth) and were promised more from the government in reparations (40 acres) not long ago, but discriminatory policies stripped that land away from them over 100 years.
Being in the USA for 150-200 years results in at least $500k-$1 million in wealth purely due to land and home value appreciation.
The average black person has about ~$500 in wealth. This isn't a fluke; this was designed...
This is a country that criminalizes being poor more and more in many ways, so we can fall into the trap of revictimizing the underclass we created, this time using algos, if we are not careful.
When a judge is evaluating whether or not to grant parole, is he or she looking for a causal factor that will directly cause the convict to re-offend or is the judge evaluating the convict's situation with factors that are correlated with re-offending?
Showing causation with whole-group statistics is very, very hard.
The majority of the $500k-1m baseline in white middle-upper class wealth is in homes and land handed down as inheritances, this is not debatable. That property allows a certain amount of leverage to invest in education, businesses etc...
The Ben & Jerry's founder talked about this in vivid detail on the campaign trail with Bernie Sanders. If he had been black where he grew up, no GI Bill = no cheap housing = no appreciation of property/land over his childhood = no financial leverage to build his company.
I can take risks and fail without going bankrupt thanks to familial wealth, that is a tremendous luxury not afforded to the group I'm mentioning.
(And in any case, my point was just that it wasn't shown as causation in the stats you cited)
You are correct on that point. More than 50% of the $100 trillion is held by the top 10%.
On the other point, my point is that the first $500k to $1 million of wealth was due to inheritance... not the total wealth of the 1%.
Follow up: At one point, any white male could move westward and get free land (whether indigenous-owned or not). 40-100 acres after a century-plus of property value appreciation is quite a bit of wealth. The $500k-$1m number I use as a baseline for white wealth is very conservative.
This reminds me of the argument for the UBI. Instead of having thousands of tiny charity programs sprinkled all over society, why not let the economy make itself as efficient as possible and then hand out the charity in units of dollars?
More specifically, the current breed of machine learning is a correlation engine; the only strength these networks have is finding correlations in the absence of context or explanation.
Sure... and what about the chronic lead exposure issues in many of these areas, the effects of long-term overpolicing of minor crimes on family stability, and the impact of (lack of) generational wealth and education from being used as de facto or literal slaves for generations?
The 'minority' part here is a correlative factor, not a causative one.
Honestly, my preferred method of solving this would be to train the algorithm on a data set with all of the forbidden values included along with anything else the creator feels relevant - zip code, income, familial status, favorite sport, education - and then when running in production, against real people, don't give it the restricted information. Yes, you could theoretically extract race, gender and other protected stats from the information the algorithm actually uses in prod - but it has no incentive to, since a less-noisy signal is already provided.
For instance, suppose the optimal algorithm for your data set is some linear function of X, Y and Z - let's say X+Y+Z to keep things simple. X,Y and Z are all normally distributed variables, mean of 0 and the same standard deviation. Y has a 0.5 correlation with X, and a -0.5 correlation with Z. If not provided Y, your algorithm might come up with 1.5X+0.5Z as an approximation - extracting a bit of the signal for Y from the things it does have access to. It's suboptimal, but better than just X+Z. Unfortunately, Y is verboten - we're not allowed to discriminate on it, and this approximation ends up with results that track Y. So instead we train with X, Y and Z as inputs, so the derived model is X+Y+Z - and we can drop Y from that model in production, leading to a model that (while less accurate) shouldn't unfairly track Y.
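The X/Y/Z story above checks out numerically. A sketch with the stated correlations (I additionally assume corr(X, Z) = 0, which the comment doesn't specify):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Unit-variance X, Y, Z with corr(X,Y) = 0.5, corr(Y,Z) = -0.5, corr(X,Z) = 0
cov = np.array([[1.0,  0.5,  0.0],
                [0.5,  1.0, -0.5],
                [0.0, -0.5,  1.0]])
X, Y, Z = np.linalg.cholesky(cov) @ rng.standard_normal((3, n))
target = X + Y + Z

# Train without the forbidden Y: the model scavenges Y's signal from X and Z
coef_no_y, *_ = np.linalg.lstsq(np.column_stack([X, Z]), target, rcond=None)
print(coef_no_y)     # ~ [1.5, 0.5]

# Train with Y included, so X and Z keep their honest weights...
coef_with_y, *_ = np.linalg.lstsq(np.column_stack([X, Y, Z]), target, rcond=None)
print(coef_with_y)   # ~ [1.0, 1.0, 1.0]
# ...then drop Y's term in production, leaving a model that doesn't proxy for Y.
```

Without Y in training, the fitted weights drift to roughly 1.5X + 0.5Z, silently tracking the forbidden variable; with Y included and then dropped at prediction time, X and Z retain their true coefficients.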
The problem is that the output of those algorithms is used to drive decision making that has the effect of maintaining the status quo, by removing opportunities to change it.
I think your comment shows that you understand this. You accept that a decision may be correct, when measured in totally cold and statistical terms. But such decisions would not "change the status quo" and that would be a problem.
But that position is a deeply political one. Why should decisions at banks, tech firms, or wherever be deliberately biased to change the status quo? It's social engineering, a field with a long and terrible track record of catastrophic failure. Failure both to actually change reality, and failure in terms of the resulting human cost.
Injecting bias into otherwise unbiased decisions by manipulating ML models, or by manipulating people (threatening them if they don't toe the line), is never a good thing.
The whole point of using the algorithm was to make sure personal biases aren't impacting the decision. If we're going to alter the algorithm because we don't like the result, then why are we bothering to use an algorithm in the first place? Just use a human to make the decision. At least in that scenario potential biases have an identifiable source, as opposed to an opaque program that may have been made by engineers who deliberately tuned it to avoid any disparities because they think any disparity in outcome is fundamentally problematic.
Is the goal of the AI model to predict crime rates in a hypothetical world where everyone has equal rates of lead exposure? Or is the goal of the model to predict crime rates in the real world?
> Is the goal of the AI model to predict crime rates in a hypothetical world where everyone has equal rates of lead exposure? Or is the goal of the model to predict crime rates in the real world?
The goal is to use the results of a model for something (pet peeve, I hate the use of "AI" to describe what are usually pretty standard statistical or ML models). The model you create, and how you apply/interpret it, depend entirely on what you're actually trying to accomplish or change with the results.
Depending on what that is, the kind of "bias reflection" we're discussing is hugely problematic.
For example, crime rates are not equal between men and women. If we force our AI to assign equal risk of crime to men and women, then we will have introduced a bias that either under-predicts the rate of male crime or over-predicts the rate of female crime.
What matters is: What are the outcomes and consequences of active systems, AI or not.
For instance: How do the algo cope with derivatives of its own output being fed into itself as input at a later stage?
The reality is, truth is relevant, and sometimes the truth is inconvenient. Tech workers may want to build an AI that measures risk of recidivism and produces uniform risks across race and gender. But the truth is, rates of recidivism are not the same across all groups. If we produce the desired outcome of equal reported risk, then the consequence is that men have their risk under-reported to put them on parity with women, or vice versa.
The justice system should absolutely not use this system for any purposes, since justice is based on the circumstances of the individual case in front of it, not the societal statistics which apply in the aggregate but may not apply to that specific case.
And if an activist engineer deliberately biased the model to avoid indicating disparities in crime, then we will have sabotaged the police's ability to allocate resources. Hence why the assumption that disparate outcomes are indicative of a biased model is a problem.
We already know where crime is occurring. We don't need an AI model for that.
People aren't arguing about not using biased data, they're arguing that the model needs to be designed and trained so that the bias in the data doesn't affect the predictions in the model. And yes, that means deliberately de-engineering bias out of the model, which may involve introducing a counter-bias.
For example, you and others kept bringing up race earlier as a legitimate bias for criminal profiling. But socioeconomic status is far more correlated with propensity to commit criminal acts than race. A model of crime in LA based on race, for example, would assume that people in Ladera Heights are just as likely to commit crime as people in South LA because they have the same race...but Ladera Heights has a fraction of the crime as South LA (and several times the average income). Similarly, you would expect South LA to have less crime than the largely Caucasian Joshua Tree or Fontana...but both cities have higher crime rates than South LA, and for a period were some of the most dangerous cities in California. (Joshua Tree was the inspiration for, and original setting of, Breaking Bad. Fontana used to be known as the Felony Flats.)
What I am saying is that pointing to disparities in the outcome of these models to claim that the models are biased is not, on its own, a reasonable conclusion. As you point out, people in Joshua Tree and South LA have higher rates of crime than average. So if our model flags people from these areas as high risk more frequently than other places, is our algorithm biased? If we deliberately make the model produce uniform results across different locations because an engineer feels it's problematic to have a model that produces different results between different geographic cohorts, then have we mitigated bias? No, that engineer has intentionally introduced a bias of his or her own to make the model adhere to his or her worldview.
See the 1999 film Blast from the Past.
But I think we're diverging considerably from the original point: that forcing an AI to produce equal outcomes despite unequal behavior in the real world is not the elimination of bias, it's the deliberate introduction of bias. If we have an AI that predicts recidivism rates, and we engineer it to produce equal predicted rates across all groups despite different rates between groups in the real world, then we are deliberately introducing bias. The truth, regrettable though it may be, is that a magical AI that operates with 100% accuracy - where the only people it flags would have re-offended - is going to produce disparities, because recidivism rates are not equal.
The blindspots end up being extremely problematic.
Statisticians refer to this process as "controlling for confounding factors." What really matters is what questions you're asking. Data is too often abused, not always intentionally, by people with vague questions.
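A small simulation of a confounding factor (every variable here is an invented stand-in and the coefficients are arbitrary): the outcome depends only on "income", but a naive regression attributes an effect to "group" because the two are correlated.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
ones = np.ones(n)

group = rng.integers(0, 2, n).astype(float)     # stand-in demographic flag
income = -0.8 * group + rng.normal(0, 1, n)     # group correlates with income
outcome = -0.5 * income + rng.normal(0, 1, n)   # outcome driven by income alone

# Naive regression: outcome ~ group
naive, *_ = np.linalg.lstsq(np.column_stack([ones, group]), outcome, rcond=None)
# Adjusted regression: outcome ~ group + income (controls for the confounder)
adjusted, *_ = np.linalg.lstsq(np.column_stack([ones, group, income]), outcome, rcond=None)

print(naive[1])     # ~ 0.4: spurious "group effect"
print(adjusted[1])  # ~ 0.0: vanishes once income is controlled for
```

The naive model is a perfectly good predictor, which is the trap: it answers "what is correlated with the outcome?", not "what drives it?", and which question you needed depends entirely on what you intend to do with the result.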
If you're going to use a model trained on simple, biased data, to get, say, insurance estimates for a for-profit company, the model will probably successfully increase profits, so it was a good model.
On the other hand, if you're going to use the same model to help with sentencing, where your goal is to seek equality and justice, then the model will do very badly, since it will punish many people for the community/skin they happened to be born into.
For instance, men commit more crimes than women. If we are building an AI that predicts risk of committing crime (say, estimating rates of recidivism) and we forcibly make it report equal rates between men and women then we will be creating a discriminatory system because it will either under report the risk of men or over report the risk of women in order to achieve parity. Engineering parity of outcome in the model when the real world outcomes have disparities necessarily results in bias.
What a terrible example. There are many other features being omitted that would predict crime rates. If anything, this is an example of enforcing your own bias on the model by not including all relevant features.
Problems like violence have multiple causes, but are widely understood to be linked to problems of poverty, inequality, and marginalization. Violence is also self-perpetuating through social networks. When an incautious user of statistical tools fails to investigate the causal story behind such findings, they're going to get the wrong answer. When a policymaker acts on incomplete or misleading findings, they're going to make the problem worse.
*Unless you're arguing that it's the fact of being non-white which makes people more violent, in which case I have nothing to say to you.
This is not how I interpreted OPs comment at all. I just read it as a note that data shows correlation between minority communities and crime. Minority is a flexible, relative term, to my understanding - which, notably in recent years is frequently attributed to "non-white", but over time has been attributed to many different groups of people (caucasian as well).
Interpreting it as meaning "non-white" in this context is yet another good example of our predisposed biases and relativistic understandings at play, I suppose.
Not to harp on you - just noting the differences in our perceptions.
At the end of the day, there's enough real, vile bigotry out there in our society, I think it's important to be extra clear when discussing topics like this.
I agree, there are many negative biases in the real world. More so, I'd argue, on the internet, where people feel safer at a distance stating disagreeable opinions.
It's worth noting, especially when we are discussing the nature of how an AI learns.
As I wrote in my original comment, there are potential justifications as to why we might want to introduce bias in our machine learning models to prevent them from identifying things we don't want. But the point remains: building a model that, for example, associates technology with men and women at equal rates despite the disparity that exists in reality is not the elimination of bias. It's the deliberate introduction of bias to make the model produce a result that is in line with what we consider a more ideal worldview.
I would argue that the case is precisely the opposite. I was literally talking to someone from Colombia yesterday who was complaining to me about how all Venezuelans are lazy criminals that are stealing their jobs, when in an American context, people wouldn't even distinguish between them -- they'd both be 'hispanics' or 'latinos'.
That person took a difference caused by a temporary political and economic situation and applied it to a population of people as an inherent property. I think that is actually the way that _most_ people think, and to make people think otherwise requires a lot of education.
(Not saying whether it's "true", "false" or 78% likely here)
Many people have been repressed their whole lives. It doesn't get better by digging in one's heels, but sometimes compromise, and not trying to win every war, may yield improved outcomes.
A big problem is the whole dull-minded "right vs left" duality. Tribal mentality that replaces actual issues and solutions.
With enough education, you can make most people think just about anything.
That is true for people in the western world that have been heavily dosed with the ideas coming out of the western social sciences in the past 100 years or so, and not really anyone anywhere else, ever.
You're assuming that that disparity is not itself a result of bias in any way.
Again, there may be justifications to deliberately introduce bias into the model to produce higher rates of association between women and technology than would normally occur, if the creators of the AI so desire. But this is not elimination of bias; this is deliberate introduction of bias to make the model match the creators' worldviews.
This is incredibly reductive - we use models for MANY purposes - prediction, inference, description, evaluation. Until you outline a specific use case for the model, understand where the data its trained on comes from and how it may or may not reflect reality, and think about whether the phenomena which you're measuring are important/sufficient for your application you CANNOT possibly justify your broad claims.
Say you have an AI that predicts probability of re-offending for prospective parolees. If we observe that this system is flagging men as high risk more frequently than women, then this is not indicative of bias.
But, one might say, men are socialized to be more violent and are more frequently recruited by gangs. These factors are being absorbed by the AI. Yes, they are, and this does not make the AI biased. The job of the AI is to predict risk of re-offending in real-world conditions, not risk of re-offending in a hypothetical world where everyone has the same environment and experiences.
If we were to manually adjust the model to make it display equal risks for both men and women then that is not elimination of bias. It is deliberate introduction of bias to make the output fit our ideal worldview.
Already you've made one of the most common mistakes in this area: confusing the label for the latent truth. We don't measure "re-offending", we measure rearrest and re-incarceration, which as I've discussed elsewhere are biased (in the sense of "unjust").
> If we were to manually adjust the model to make it display equal risks for both men and women then that is not elimination of bias. It is deliberate introduction of bias to make the output fit our ideal worldview.
Again, you are using the word "bias" in a vague and meaningless way. In this context, our goals are constrained by the law - in the US we guarantee people certain rights, and prohibit certain kinds of discrimination. In this example, it would indeed be wrong (and illegal) to punish men more harshly just for being men, since gender is a protected class. In the same way, it would be wrong (and illegal) to punish black people more harshly just for being black. In my state, whites and blacks use cannabis at equivalent rates, and black people are arrested 8 times as often per capita - an algorithm looking at arrests will see that and make the problem worse.
You've staked out some sort of naive and bizarre "AI Purity" stance which completely ignores how such models work, and which misunderstands how we use models to learn things and solve problems. You're also mixing up different definitions of "bias". Some facts which you might find interesting:
* Penalization/shrinkage, which is ubiquitous and incredibly useful in many predictive models, is the deliberate introduction of bias to improve prediction performance
* Adjusting for confounding in regression/classification type models is one way of accounting for or removing bias in a statistically rigorous way - as you (imprecisely and inaccurately) say, "modifying the model to make it display equal risks". This allows us to measure effects, conditional on known risks or existing patterns.
* All statistical and machine learning algorithms suffer from sampling bias, where the data we observe doesn't match the reality in important ways - a fact which you're completely ignoring. If a model reflects the bias in a bad sample, why on earth should we accept that?
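The first bullet is easy to demonstrate: ridge regression deliberately shrinks coefficients toward zero (introducing statistical bias), and under noise it beats the unbiased OLS estimator on mean squared error. A minimal numpy sketch with invented dimensions and noise levels:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 30, 10, 5.0          # toy problem size and penalty (arbitrary)
w_true = rng.normal(size=p)

ols_err, ridge_err = [], []
for _ in range(200):
    X = rng.normal(size=(n, p))
    y = X @ w_true + rng.normal(scale=3.0, size=n)
    # OLS: unbiased estimator, but high variance on noisy data
    w_ols = np.linalg.solve(X.T @ X, X.T @ y)
    # Ridge: (X'X + lam*I)^{-1} X'y -- deliberately biased toward zero
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    ols_err.append(np.sum((w_ols - w_true) ** 2))
    ridge_err.append(np.sum((w_ridge - w_true) ** 2))

# The deliberately biased estimator recovers the truth more accurately
print(np.mean(ridge_err) < np.mean(ols_err))
```

The point of the sketch: "biased" in the estimator sense is not a slur; it is a tool traded off against variance.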
If all it's doing is associating being a man with a higher risk factor, without taking into account anything else in that man's life, then yes, it's being biased. "Being a man" in this situation is correlated with, not a cause of, the actual factors making a person more likely to commit crime.
This is a real problem that has been happening for years in Florida: the automated risk assessments used for a variety of decisions, including bail and sentencing factors, label black people as inherently more likely to reoffend than white people, even after controlling for actual crime rates and recidivism rates. https://www.propublica.org/article/machine-bias-risk-assessm...
Briefly, (and simplifying for clarity) it worked like this: the ML algorithm scored the risk of criminals reoffending. It gave a "high risk" score if someone was 80% likely to re-offend, and a "low risk" score if they had a 20% chance. Out of 100 blacks and 100 whites, here are the scores the ML algorithm gave, and how many ultimately did re-offend:
100 black criminals
* 50 "high risk", 40 re-offended
* 50 "low risk", 10 re-offended
100 white criminals
* 10 "high risk", 8 re-offended
* 90 "low risk", 18 re-offended
On the other hand, the failure mode disproportionately punished the black criminals: 10 were "high risk" who never re-offended, while only 2 white criminals were. Meanwhile, 18 "low risk" white criminals did re-offend while only 10 black criminals did. So the score was more strict for some unlucky blacks and more lenient for some lucky whites.
However, a key point is that because the underlying re-offense rates were different between the populations (50/100 for the black criminals, 26/100 for the white criminals), the algorithm could not have done otherwise. That is, given some n% re-offenders, if you have to fit them into 20% and 80% buckets, your "high risk" and "low risk" counts are mathematically fixed. In other words, it wasn't the ML by itself that was the problem, but the COMPAS score that it was trying to compute that had these issues inherent in it.
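The bucket arithmetic above can be checked directly from the quoted counts: with different base rates, equal calibration (the same re-offense rate within each score bucket) mathematically forces unequal false-positive rates between groups.

```python
# Counts quoted in the example above (illustrative numbers, not real data)
groups = {
    "black": dict(high=50, high_re=40, low=50, low_re=10),
    "white": dict(high=10, high_re=8, low=90, low_re=18),
}

for name, g in groups.items():
    total = g["high"] + g["low"]
    reoffended = g["high_re"] + g["low_re"]
    base_rate = reoffended / total           # 0.50 vs 0.26
    # Calibration: P(re-offend | "high risk") is 0.8 for BOTH groups
    ppv = g["high_re"] / g["high"]
    # But the false-positive rate -- "high risk" among those who never
    # re-offended -- differs: 10/50 = 20% vs 2/74 = ~2.7%
    fpr = (g["high"] - g["high_re"]) / (total - reoffended)
    print(name, base_rate, ppv, round(fpr, 3))
```

This is the core of the well-known impossibility result: once base rates differ, a score cannot be simultaneously calibrated and error-rate-balanced across groups.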
I think this is a good example where ML wasn't biased (at least, not anymore than reality), but where people were too eager to turn to it for a poorly considered project. By wrapping up important questions in high tech algorithms, it's too easy to fool yourself that what you're doing is the best thing you could be doing and miss problems in the fundamental framing.
2) The whole point of such systems is to impact the world, not to replicate already-recorded history or escalate problems.
Can the ML algo predict what the outcomes and consequences of its outputs are, and assess that? No?
Would you prefer an outcome where half of the white men were also marked as high risk, but with 16 reoffending? (to give the low risk pools the same recidivism rate) Now instead of unjustly marking ten black men and two white men as high risk, you're doing so for thirty-four white men. Is that somehow less discriminatory?
Because in the end, the choices are "accept that there will be a correlation between the outputs and race", "use no system that produces estimates or imperfect outputs", or "explicitly discriminate based on race to remove the race-result correlation". That's it.
I agree that an AI shouldn't use race or gender as an input. But we should not be surprised when AIs show disparities when predicting risk of committing crimes, when there are disparities in the rates of actually committing crimes.
Perhaps you think there are certain feedback loops that need to be broken (your model is usually a static representation of the reality), or perhaps you prefer low rate of false negatives / false positives, or perhaps if a model is uncertain you would like to defer to a human.
If a model predicts high probability of re-offence, you might decide to delay an action as an example, or re-examine the case in more detail.
I get a feeling that machine learning practitioners are not properly trained to recognize such subtleties, everything is a binary softmax output these days :)
EDIT: This seems to stem from the obsession with purely discriminative models that either directly model a binary function y = f(x) or model a function that returns some kind of score, which is later compared to a threshold y = f(x) > theta. Neither of these lend themselves nicely to this conceptual separation between modeling and decision making.
Just because bad data may tend (in hindsight) to yield inaccurate decisions, or decisions based on faulty data, doesn't promote data as the main driver behind sound decisions. That would just be another belief system, and we don't have all possible data or omniscience to prove it!
Not a rhetorical question. Reading through the comment thread I'm slowly shifting more towards being on the fence, as opposed to having a "don't hack the AI" position.
But rather than hacking the AI, why not just get rid of the AI? What is the point of the AI in the first place if you're going to hack it to get the results you want anyway?
This is not a meaningful phrase - it adds literally nothing to the conversation but confusion. To bend over backwards to give you a reasonable answer: when you're interested in conditional effects.
Let's say you're interested in the risk of cancer associated with alcohol consumption. People who drink some are often found to have lower cancer rates than people who don't drink at all. Reasonable models adjust for wealth/income - estimating the risk of cancer association with alcohol CONDITIONING ON wealth/income changes the picture; the positive association is clear.
Adjusting for confounding is "removing bias", changing the effect estimates.
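The alcohol/cancer example is a classic instance of Simpson's paradox, and it can be reproduced in a few lines. All counts below are invented purely to illustrate the mechanism: wealth is associated both with drinking and (for unrelated reasons) with lower cancer rates, so the pooled numbers reverse the within-stratum effect.

```python
# Hypothetical counts per stratum: (number of people, cancer cases)
wealthy = dict(drink=(800, 40), nodrink=(200, 5))    # 5% vs 2.5%
poor    = dict(drink=(200, 30), nodrink=(800, 80))   # 15% vs 10%

def rate(cases, n):
    return cases / n

# Pooled (unadjusted): drinking looks PROTECTIVE...
pool_drink = rate(40 + 30, 800 + 200)     # 7.0%
pool_nodrink = rate(5 + 80, 200 + 800)    # 8.5%
print(pool_drink < pool_nodrink)          # True: drinkers have less cancer overall

# Conditioning on wealth: drinking is harmful within EVERY stratum
for stratum in (wealthy, poor):
    d_n, d_c = stratum["drink"]
    nd_n, nd_c = stratum["nodrink"]
    print(rate(d_c, d_n) > rate(nd_c, nd_n))  # True in both strata
```

"Adjusting for the confounder" here just means comparing within strata instead of across the pooled table, and it flips the sign of the estimated effect.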
In a predictive context, using ensembles, NNs etc, the problems don't go away, they're just sometimes harder to detect (and they're dressed up in sexy marketing-speak like "AI").
Repeat after me three times:
"AI is not a magic truth telling oracle"
"AI is not a magic truth telling oracle"
"AI is not a magic truth telling oracle"
Your cancer example doesn't answer the question, which is a pity because I mean it when I say it's not a rhetorical question. I honestly want to understand your point of view better. In the cancer example, we wouldn't go in and filter the training set to force the AI to think that rich people have higher cancer than they actually do, etc. But that's precisely what it seems like people want to happen with AI: they don't want us to condition on wealth/income, rather, they flat-out want us to filter the training set to force the AI to think group A and group B are equal when that's not what the data says.
Isn't "biased in a way we don't expect", itself, a bias?
Unless I'm misunderstanding you, you're saying the AI is biased, and people are biased. Unbiased is an AI that will not miss an outlier.
My field is medical imaging. The most difficult thing to get the AI vendors that come calling on us to understand is that being able to recognize well known patterns in data is of limited use to us. That's what the radiologists at remote sites do all day. It's being able to recognize pathologies that 99.999% of rads would definitely have missed that has value. And the AI's always fail this test.
I've come to believe that "bias" is the primary reason these AI's perform so poorly. They are basically "biased towards the known stuff" in layman's terms. At least in my field, we need AI's that are able to make connections that humans cannot. I've been going to RSNA for decades now, and I would say we haven't seen a single AI that could be said to pass this kind of a test yet.
This isn't quite true as it has been successful in limited scopes since the 90's (i.e. well before the existence and rise of deep learning). See for example microcalc TP rates on CAD breast screening that was beating average radiologists back then, but not on general screening. This is somewhat specialized though, and there are important constraints to screening also, as opposed to diagnostic radiology.
But you are right in an important way - it isn't a feature for the current throw-a-ton-of-data-at-a-deep-model approach that a lot of "AI companies" are built around, and I agree they are mostly knocking on the wrong doors in medical imaging.
Importantly, if you are trying to replace people at a relatively simple task and make it cheaper your approach will look much different than the case you describe, where you are trying to learn unusual configurations and long-tail sort of things.
ML in medical imaging is complicated by a lot of factors, but people that think a little transfer learning and a few hundred case reports is going to get somewhere useful are being incredibly naive. Without exception, all the promising work I have seen has involved painful and expensive labeling and a lot of modeling and pre-processing effort.
AI, at least current AI, cannot do this. I'm really confused at the way you say AI perform poorly because they're biased towards "known stuff". We train AI on known data. AI being biased towards "known stuff" is fundamental to the way our current AI technology functions.
This is a poor result. Would you argue that the model should not be adjusted?
Yes, if people feed in a bad data set the results are going to be bad. Like how a pentagon project tried to use AI to classify Russian vs NATO vehicles. The photos of the former were taken on a sunny day and the latter on an overcast day. The result was that their model was just detecting the average brightness of the input.
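That failure mode is easy to reproduce. A toy numpy sketch (invented data, not the actual study): the training labels are perfectly correlated with brightness, so a trivial brightness threshold looks flawless in-sample and collapses the moment the lighting changes.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_images(n, brightness):
    # 8x8 "photos" flattened to 64 pixels; only the mean brightness varies
    return rng.normal(loc=brightness, scale=0.1, size=(n, 64))

# Training set: one class photographed on sunny days, the other overcast
X_train = np.vstack([make_images(100, 0.8), make_images(100, 0.2)])
y_train = np.array([1] * 100 + [0] * 100)

# A "learned" rule: threshold on mean pixel value -- perfect in-sample
threshold = X_train.mean(axis=1)[y_train == 1].min()
train_acc = ((X_train.mean(axis=1) >= threshold) == y_train).mean()

# Deploy on an overcast day: class-1 vehicles, but every image is dark
X_test = make_images(100, 0.2)
test_acc = ((X_test.mean(axis=1) >= threshold) == 1).mean()
print(train_acc, test_acc)  # perfect in-sample, zero out of sample
```

Cross-validation on the original data would never catch this, because the spurious feature is consistent across every split of the training set.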
But what happens when the data is valid, but people don't like the results because it doesn't match their worldview? Altering the model to match one's world view is not elimination of bias, it's deliberately introducing bias.
So when we're evaluating these systems, (again, systems that always fail due to bias), our only concern from a diagnostic viewpoint are patient-centric metrics. The most important being improving outcomes.
By way of illustration, let's suppose a woman goes for a mammogram. Now a radiologist and these AI's can all identify tumors in the resultant DX study at about the same point in said tumor's development. This leads to our current 2 and 5 year survivability rates. Consider thoughtfully here, the name of the game is not liberalism, or conservatism, the object in our game is to improve those survivability metrics.
Here's the thing though, due to the fact that the AI is biased towards things we already know, using it doesn't allow us to identify problems any earlier. Being able to spot issues earlier would improve 2 and 5 year survivability metrics, but current state of the art AI's won't do that. This is the problem in a nutshell.
The problem is not that the resultant data does not fit the "narratives" of our radiologists.
I'm not certain you understand that this is not a political argument. The AI's that vendors are attempting to apply towards problems in this domain simply don't improve healthcare outcomes. If they did, we would aid in deploying them. Without hesitation.
Now my assertion is that these AI's fail to improve healthcare outcomes due to the bias inherent in the data they use to train on. Again, that's not a political statement. It's just fact. The AI's seem mostly trained to internalize, and replicate as output, all of the well known patterns in the data. This is why these AI's are of severely limited use.
Some see this fact as problematic, and advocate that we should deliberately bias the results in order to create a more equal outcome. And the arguments over whether or not we should bias our models are frequently political in nature. The complaints that AI is biased frequently arise because it produces results that some people find problematic, even though the underlying pattern it's identifying is true. It's really that the AI matches patterns without our cultural sensitivities about what topics we're supposed to tiptoe around.
The AI results are the way they are, because the data that is input tells the AI to spit out those results. In other words, the AI isn't telling you anything that a human didn't tell it to tell you. The valuable insights are insights that no human saw coming.
So let's take your industry as an example with which you can maybe better relate. Humans give the AI data that says that Africans and men have higher recidivism. The AI then spits out results that say that Africans and men have higher recidivism. Basically spitting back the information that we already gave it. OK, so far, so expected.
Now here's the thing: the valuable insight would be for the AI to suss out some counterintuitive result that no one would ever have gotten. The result that this particular African, or this particular man, will have lower recidivism. That's the valuable insight. That insight saves the enterprise money that the enterprise would not have saved otherwise.
My point is that current AI's can't do that, which makes them not terribly useful. At the moment, they just parrot what any human would tell you anyway. So why pay 2 mil for that? Can you see what I'm saying? This problem is even more acute in diagnostic imaging, because the AI is not improving outcomes. So what, exactly, would we be paying 2 mil for? It's absolutely problematic for me to go out and put an enterprise on the hook to the tune of 2 mil for a black box that sits in a room, looks cool, and tells us things we already know. (In fact, the box would tell us things we told it to tell us.) We can save the money, by a 30 dollar digital recorder, and have it play the doctors' diagnoses back to the doctor after every patient meeting if that enterprise is really that much into the parroting.
My problem with the bias in the AI's I've seen is not political, it's practical. This stuff, as it is, is not terribly practical because it is too biased to offer any truly unique actionable insights that add value which was not there previously.
The allegations of bias in AI are frequently about inter-group differences in things like job advertisement, credit ratings, parole decisions, etc., which are not explicitly political, but the allegations are often political in nature.
Sure, there's not much controversial about predicting tumor growth. But are the allegations of bias in AI about models predicting tumor growth? No, they're usually about the topics I listed above.
Suppose an AI does detect that men are more technical, on average, than women. It would be a colossal mistake to then assume that our society should be a society where men are on average more technical than women based merely on the fact that historically that has been the case.
Would this sort of AI have entrenched past social evils, such as slavery, if in the 1860s it classified people according to the data available at the time? The fact that an AI classifies things accurately given the statistical data at the time should not be confused with giving a normative claim that that is the desired state of affairs of the world.
Neither. The problem referred to is that if the data sets contain biases then the machine learning will learn them.
Sure, some may point out that biases in other parts of society may be part of the reason why tech is 70-80% male, but the AI itself is not biased. If we were to engineer the AI to produce equal results despite unequal input we are engineering biases into the AI, not removing them.
There are a few meanings of bias that are important to keep clear about when discussing things. (1) There is the statistical sense of a biased estimator - one with a consistent trend in its error. (2) There is the notion of bias introduced by data sampling. No matter how perfect your algorithm is, if the training data is a poor sampling of the general population you are targeting, you are likely introducing systematic bias (for example, early face detection approaches had their best performance on Caucasian, male, college-aged faces - people were using the data that was easiest for them to collect). Finally, (3) if the data you have access to has encoded a systematic bias, even if the first two have been avoided, you are at best able to reinforce that bias.
Here we are mostly talking about the latter one, and the problems being encountered with it (and a bit of 2). This is exacerbated by a combination of machine-learning and AI people being fairly unsophisticated about data on average (as opposed to data handling), and popular techniques these days (specifically, deep learning) de-emphasizing feature design, making it harder sometimes to see what is happening.
Nobody serious I have seen is advocating engineering outcomes into AI to adjust outcomes.
For the sake of argument let's assume there is in fact an over representation of men (compared to women) in technology relative to capability and desire. And that we are designing an AI to make or aid hiring decisions for entry level jobs. The answer then is not to engineer in a quota for women applicants, but to remove gender entirely from the training and evaluation inputs. This would give you exactly the desired outcome, no?
You refer to the type (2) bias issue if we systematically under represent women in tech in this case, which would be a problem. However the article is focused on the type (3), which is not merely an issue of what is in the data set, but what you are trying to do with it.
The deep issue here is that deep learning approach of throw everything at the inputs and let the network sort it out will capture both (2) and (3) types of biases whether or not we are aware of them. At least in areas where there is very objective proof of potential problems in the historical data (e.g. redlining impact on mortgage decisions) we could get ahead of it and normalize the inputs. But what about areas where it is less clear cut or more contentious?
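One reason that simply dropping the protected column (as suggested above) doesn't settle the question: with enough correlated inputs, a model can reconstruct the attribute anyway. A minimal numpy sketch with invented numbers, where a retained "proxy" feature (think: zip code under redlining) recovers the dropped attribute almost perfectly:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# A protected attribute we remove from the model's inputs...
protected = rng.integers(0, 2, size=n)
# ...and a feature we keep, which happens to correlate with it strongly
proxy = protected * 1.0 + rng.normal(scale=0.3, size=n)

# The model never "sees" the attribute, yet a trivial threshold on the
# proxy recovers it ~95% of the time
recovered = (proxy > 0.5).astype(int)
recovery_rate = (recovered == protected).mean()
print(recovery_rate)
```

So removing the column changes what is explicit, not what is learnable; the type (3) bias can flow in through any sufficiently correlated input.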
There is the fact that humans fail in exactly the same way. If I'm making the decision to hire, it's very difficult for me to avoid my own biases. I think one of the things people are concerned about with applying ML techniques to things like this is that it gives the appearance of being unbiased, and potentially provides cover for those who would like to benefit from it.
I suspect the real answer is to start thinking about these things the same way we have learned to think about security and cryptography. In other words, for serious work a system is not considered ready for prime time until real professionals have tried to find the weak points and break it.
It is also worth noting that this is not necessarily a problem so much as an operational feature of these approaches that one should be aware of when designing and using them. And in some applications it is probably a very significant problem.
One of the interesting things about this and similar problems is how completely predictable they were, and how surprising it was for practitioners to discover them. In this way the ML community needlessly recapitulated lessons learned by other disciplines decades earlier.
If AI is going to be useful, it should be able to find deeper underlying things, faster than we can ourselves. If bias exists in people today, it's going to skew outcomes, so looking at stats on current outcomes alone isn't going to tell you much. An AI that's blind to conditioning and upbringing and everything else that's part of society is not a particularly useful AI; you could replace it with any average person off the street's naive "common sense."
If only... An AI with common sense is an AI researcher's dream come true.
(As a philosophical question, why do we expect the future to be like the past? Well, because that is how things have worked in the past. And thus we arrive at a nice circular argument. :)
So far we aren't doing a very good job of coming up with ways to live in harmony with AI, instead of in an antagonistic relationship. We can do better than this.
Regardless, the cause of the disparity is not relevant to the discussion at hand. The real world population of people involved in technology is disproportionately of one gender, so an unbiased machine learning model meant to associate terms with genders will associate technology terms with men at higher rates. Sure, some may claim that the reason why disproportionately more men go into technology is because society is biased. But the AI's role was not to model a hypothetical world with no biases. Its role was to model the real world.
As I wrote in my root comment, some may see value in biasing their models to achieve results that fit what they see as an ideal worldview. But the point remains: to do this is to deliberately introduce biases, not remove them.
Of course, simply saying this triggers some creativity: You could generate artificial data and train on that. However, the problem is then to generate "unbiased" artificial data, and somehow make it useful! ;-)
So whose vision should be realized then becomes the problem, or the solution for some subset of people.
It might be a subtle difference, but "My mom is a doctor" is as correct grammatically as "My dad is a doctor". If a language model is assigning a much lower probability to the former, then it's modelling language _use_ but not language _structure_.
So it's not doing the job it's supposed to be doing.
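The use-vs-structure distinction can be made concrete with a toy bigram model. All counts below are invented, but they mirror the skew present in much real text: both sentences are equally grammatical, yet the model scores them by frequency.

```python
# Toy bigram counts (invented); "is_a_doctor" is treated as one token for brevity
corpus_counts = {
    ("my", "dad"): 1000, ("my", "mom"): 1000,            # phrases equally common
    ("dad", "is_a_doctor"): 90, ("dad", "other"): 910,   # usage skew in the data
    ("mom", "is_a_doctor"): 10, ("mom", "other"): 990,
}

def p_next(prev, word):
    # Maximum-likelihood bigram probability: count(prev, word) / count(prev, *)
    total = sum(c for (a, _), c in corpus_counts.items() if a == prev)
    return corpus_counts.get((prev, word), 0) / total

p_dad = p_next("my", "dad") * p_next("dad", "is_a_doctor")  # 0.5 * 0.09
p_mom = p_next("my", "mom") * p_next("mom", "is_a_doctor")  # 0.5 * 0.01
print(p_dad > p_mom)  # True: the model ranks by frequency of use, not grammar
```

Whether that gap is a bug depends entirely on the job: it is correct behavior for a compression or prediction model, and wrong behavior if the model is supposed to score grammatical well-formedness.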
I hope that AI can be used as a mirror for the engineers working with it. AI allows amplification of the subtle assumptions we make in design and hopefully that amplification leads to better understand and appropriate measures to reduce bias.
So would any reasonable human. I think the real problem these advocacy groups have with AI models is that they're not biased. They reflect the real world based on data and evidence, rather than conforming to progressive dogma. I fear that instead of using AI to overcome any wrong assumptions we may have, we're just going to get AI diversity officers to "correct" models that draw any uncomfortable conclusions.
That's the entire point of the article, our biases, as reflected in the world, are learned by models. Consider the following:
* More computer programmers are men than women (descriptive statement, no problem)
* A predictive model correctly identifies that more computer programmers are men than women (a prediction based on observed data, no problem)
* A recruiting agency uses a predictive model to recruit computer programmers. Due to the way the model was trained, it excludes qualified women (not OK, a clear misapplication of an algorithm)
Feel free to try this thought experiment in other contexts where existing biases can be amplified through algorithms.
It may well be the case that the gender split is 80% / 20% male to female (as is roughly the case today). It may not. However, the left-leaning zeitgeist opinion would seem to be that this outcome is impossible, and current observed differences are only due to systematic oppression. The right-leaning zeitgeist opinion would seem to be that this outcome would make sense.
I tend to think that the left-leaning opinion on this is so wrapped up in double-think that it can't even understand itself- it tends to argue too much in favor of "biological differences would mean permanent, uncorrectable injustice, therefore it is impossible that biological differences exist."
Ask the same question in 1965 and get the opposite result.
There’s a lot of motivated reasoning by men that their innate characteristics are selected for computer programming. There’s a lot of hostile behavior by men towards women who are in computer science. You can view this like any other resource scarcity turf protection gambit.
It’s impossible to separate any apparent belief in the natural order of male dominance of the field from this behavior, so to draw conclusions from it is extremely dangerous.
To be fair, though, a LOT has changed about the nature of computer programming since 1965.
Has it though? The von Neumann architecture was already a thing back then, programming languages (BASIC, Fortran, ...) already had many of the things you are still using today (such as for-loops), and any algorithm and data structure thought up back then is still perfectly usable today in most modern languages.
Sure, the whole tooling and library situation is not comparable to back then, but the fundamentals haven't really changed.
Consider these things:
- Binary search trees (1960)
- Linked lists (1955)
- Quicksort (1959)
- Hash tables (1953)
Looking back I rather have to say I am not impressed with advances in practical computer programming since then. The only major change was the introduction of type systems and OO imho, though these were technically a thing already back then too on an academic level.
When you examine the fundamental nature of programming though, not much has changed. In fact things have become significantly easier over time.
Really? Reference, please.
Women were literally employed as computers circa WWII. When the transition was made to programmable machines, they were largely the first ones to use, operate, program, and design algorithms for them. The field was regarded as uninteresting, tedious, and not real engineering by their male counterparts.
That is entirely your opinion. Framing biological differences as turf protection is laughable in my opinion.
In particular, Eastern Europe of all places is relatively egalitarian in men-vs-women in programming.
For example men on average have more muscle mass than women. If you found a random population of women that had proportionally higher muscle masses compared with their male peers, that doesn't discount the biological fact.
One that is refuted by the history of computer programming, which used to be a female-dominated field back when programming was significantly harder than it is today.
Please note that this doesn't discount any actual gender bias that may exist. The biological claim is only that men and women in general have a given set of personality traits. That isn't controversial science. It also isn't controversial because it doesn't claim that men are better engineers or better at anything, just that, in general, more of one gender may naturally gravitate towards certain activities.
Careful on misframing the thought experiment: no such assertion was made. The assertion through this thought experiment was that biological nature has an effect, no more no less. The degree to which the effect matters is uncertain, but it is reasonable to expect it does play a role. The reason biological explanation is mentioned at all is because many seem determined to think that no such effect is possible, full-stop.
If you agree that biological differences has a non-zero effect on preference, then there shouldn't be anything to disagree with here.
"It may well be the case that the gender split is 80% / 20%"
That's a preposterous hypothetical. You're also incorrect in your implication - the existence of minor (and as yet unsubstantiated) fundamental differences in propensity to pursue technology in no way precludes the (well established) social biases and inequities that result in the same. Neither would it prevent these issues from being reflected in statistical models.
Thought experiments about magical perfect AIs are completely and totally irrelevant to this real problem which everyone who uses models should be aware of.
To get on my cranky, gatekeeping high horse, I find it super frustrating to see people who call themselves data scientists misunderstanding this problem. I teach this in introductory statistics to non-technical audiences. I'd hope for better from engineers and computer scientists.
Thought experiments are relevant because they can tease out our moral intuitions. Calling it a preposterous hypothetical is not useful.
gbrown- I think that you have read my original post with a hostile interpretation. I left open the possibility of biological differences in preferences OR zero biological differences in preference. You act as if there is scientific consensus that biological difference in preference is impossible, and that all current differences in outcomes are based on culturally-imposed biases. This is not the case.
>You're also incorrect in your implication - the existence of minor (and as yet unsubstantiated) fundamental differences in propensity to pursue technology in no way precludes the (well established) social biases and inequities that result in the same.
I or other posters did not imply this. You are conflating my hypothetical case (where cultural bias was made irrelevant as much as possible) with real life. Real life does have bias. The reason the hypothetical was presented was for comparison.
I've got a couple of (low-ball, civil) questions as a sanity check:
1. Would you agree that men have a stronger biological preference to be warriors than women?
2. Would you agree that there is a greater cultural expectation for men to be warriors than for women?
I understand this, but I don't agree that your thought experiment usefully does so. You're essentially begging the question: "Well, what if this is the way it's 'supposed' to be?". My understanding of the science is that there's little actual evidence of difference in fundamental propensity to enjoy certain types of intellectual labor, but lots of evidence of the impact of socialization on the development of young humans. As has been addressed elsewhere in the thread, we have a directly relevant historical example: the distribution of tech labor was quite different when computing was seen as "women's work". To beg the question as you have, in the face of evidence to the contrary, is unhelpful. One can easily imagine the same hypothetical form applied to other groups - minorities, language groups, etc. While you've couched your argument in terms of "propensity", the structure works just as well (or poorly) for "ability" - and there's a long history in science and society of laundering the latter into the former.
> I think that you have read my original post with a hostile interpretation.
You are entirely correct - both with respect to the framing of your argument, and your apparent understanding of the methods discussed in the article. As to the former, you can't expect to receive a generous response when you accuse those you disagree with of being hopeless left-wing double thinkers. As to the latter, I'm not trying to be dismissive or condescending, but this is literally my area of expertise. I'm also an educator, and it is my responsibility to fight against explicit or implicit biases which affect my students (and which affect who is likely to become my student).
> I or other posters did not imply this. You are conflating my hypothetical case (where cultural bias was made irrelevant as much as possible) with real life. Real life does have bias. The reason the hypothetical was presented was for comparison.
Drawing the analogy between your hypothetical "perfect" system (which I maintain is still under-defined) and the actual problems being discussed is itself a misleading thing to do. There is not a meaningful analogy between (AI/ML/Stat) as practiced today and "perfect" AGI systems.
> 1. Would you agree that men have a stronger biological preference to be warriors than women?
Maybe, though I actually think this framing is problematic. "Warrior" is a social role, and changes in definition and scope over history and geography. Certainly there exists physical sexual dimorphism with males tending to be stronger and larger, if that's what you're asking.
> 2. Would you agree that there is a greater cultural expectation for men to be warriors than for women?
Sure, I think that's reasonable, subject to the previous caveats. Without evidence, I don't know that I'd immediately assume this will continue to be the case as physical ability has less and less to do with conflict - especially over the long term as we continue to evolve physically and socially.
To conclude, my understanding is that we have strong evidence of social structures influencing vocation choice and success. We have little to no evidence that suggests our current social organization with respect to intellectual labor is driven by primarily biological phenomena. In this context, I believe that trying to invent hypothetical scenarios which would justify (by their construction) current inequalities, in the face of evidence to the contrary, is a harmful act. Not only is it scientifically unfounded, it's part of the cultural problem. This kind of discourse creates exactly the environment which would serve to push women away from tech.
"Given an unbiased society, would I expect an equal number of male and female bricklayers?" I would not.
"Given an unbiased society, would I expect an equal number of male and female biologists?" I would not.
"Nurses?" I would not.
For almost any given profession, I would expect an unequal number of workers by gender. To the degree that the observed ratio differs from what I would predict, there lies the surprise. Computer programming is a strange activity, and shares enough in common with other male-dominated engineering fields that I wouldn't be surprised that it is equally male-dominated.
One of the reasons I think that programming is such a tilted activity is that it is a really weird activity. By what strange circumstance did monkeys descend from the trees to formalize logical constructions into software? Given how strange it is to adapt biological creatures to this task, you would expect outliers to participate in the task- it is not unusual to expect the personality differences between genders to dominate in who participates, when the outliers are the only individuals who participate to start with.
Regarding the warrior example- I would argue that even if we all fought wars with robots, such that physical stature was irrelevant, men would still self-select to become warriors (robot-pilots) more often than women. On the OCEAN model, men are less agreeable than women, and across most cultures of the world, men are more aggressive than women. This will likely remain true for millennia.
I'm presenting most of my arguments here amorally. I think the reason you moralize my arguments is that they are construed as justifying existing oppression by gender. I do my best to judge individuals as individuals. I cannot pretend to deny the existence of larger patterns while judging an individual, but I can understand that they will influence my judgement no matter how hard I try. To pretend otherwise is blinding myself. To the degree that I broadcast these opinions, I hope to do so in a way that leads people to judge other groups only in accordance with the predictive power such judgements can actually afford, to hold such judgements weakly, and to always understand that variation between individuals matters more than anything else. My manner of thinking does risk failing to fight the good fight against oppression; however, I think most injustices in the world are cases of individual conflict, and tinting the conflicts I resolve on a daily basis with overtones of wider societal struggle does more to confuse than clarify.
My main remaining question to you- if you take my last paragraph in good faith- is whether you think that my manner of thinking can yield good results.
That's because the issue of men vs women in programming really is an all-or-nothing topic. Either you believe that a female is capable of being an equivalently skilled programmer to a male, or you don't.
Referencing biological differences always cascades to a question about the innate ability of a female to program. The best example I can point to is the infamous internal Google manifesto on male vs female programmers. If you read that text, it appears reasonable enough: the author thinks that there are biological differences, and these differences might lead to differences in programming strengths. But it is a wolf in sheep's clothing; as soon as you believe that there are differences, it follows that one set of differences must be advantageous to the other.
I can understand that there are biological differences between females and males, but I absolutely and vehemently choose to believe that there is no inherent difference in ability - females are 100% as capable as males when it comes to programming. Full stop.
Yes, this is double think. But I'd rather be a hypocrite than hold a secret belief that my biological sex makes me a better engineer.
I also disagree about the quote below:
>That's because the issue of men vs women in programming really is an all-or-nothing topic. Either you believe that a female is capable of being an equivalently skilled programmer to a male, or you don't.
When populations are on bell curves, this statement is nonsensical. Imagining the statement "Either you believe that a female is capable of being an equivalently skilled competitive wrestler to a male, or you don't" would be similar.
Do those things even play a role? If yes, how do they play a role? How large of a role? If they do play a role, why? What's important to women in career choice vs what's important to men? Why is that the case? Is it nature or nurture or both (and to what degree does each influence those choices)? Is it upbringing? Is it pressure from society? Is it barriers to entry? And to what degree does all that play a role?
I see the potential for a much more nuanced conversation with this topic.
Saying women aren't capable of being a programmer or being successful in STEM is in my mind a garbage assertion.
Good, then, that nobody has said, or even implied, that.
"That's because the issue of men vs women in programming really is an all-or-nothing topic."
Was there subtext I missed?
A. The average male and the average female afford applying the same level of effort in becoming programmers, with no extraneous constraints.
B. Motherhood represents a non-trivial portion of the average female lifetime effort. Though not as large as in preindustrial times, where the norm was conceiving, feeding and raising ten children, most of them not making it to adulthood, leaving little energy for anything else.
To the extent child rearing cost remains unequally distributed between males and females, we're going to see statistical disparities in occupations, especially in occupations with high skill acquisition cost. Or we banish motherhood, and go extinct.
Almost no one thinks this way and it's definitely not a popular enough opinion to qualify as a common spirit of our times, left-leaning or otherwise.
You are arguing against a very weak straw-man.
You would hope. But don't you remember the huge backlash from Damore's memo? Many people (or at least a very, very vocal population) do think this way, unfortunately.
So no, not a strawman.
"controls against...." doesn't mean anything here - you're using the language of modeling, and attempting to discuss a concrete issue, but not connecting the two. Is your goal just to get an accurate prediction based on the status quo? Congrats, you've got a model that still requires nuance and understanding in application.
As part of a 1,000,000 year long project, 1,000,000 groups of 1,000 human babies (with different groups having different gender ratios) are installed on remote habitable planets and raised from birth by genderless robots, tabula rasa. They grow up and form languages and societies that last for 1,000,000 years. Robots are used to observe their choices and outcomes. The distribution of cultural traits is gathered as data, and cultures which create good outcomes for individuals in accordance with their preferences and for society as a whole are noted as benefit-maximizing. Additionally, the degree to which each society deviates from mean gender-bias characteristics is noted, and the degree to which these gender-bias expectations mold the choices of each individual to a degree greater than mean gender-bias is noted. This data is used to train the "sorting hat" robot which will be used in the example in my original post.
They are not biased if they represent reality based on facts.
In fact, the main issue is that while these algorithms fit reality as it actually is, the critics crying "bias" are complaining that these algorithms don't fit what they believe reality should be.
In other words, the algorithms output predictions that match reality, but critics argue that they should instead be manipulated to output results that suit their ideals.
But that has nothing to do with having a bias.
Would you mind responding to the third bullet point from the parent post, that discusses the dangers of naive and biased applications of facts that are based on reality?
No, it does not. All it does is point to possible explanations of why reality is the way it is. Yet, reality is still reality. If models are expected to predict reality then their output will match what we observe in reality. If someone expects predictive models to not predict reality and instead output results that do not match reality and instead comply with someone's personal ideals, then that's an entirely different problem: predicting someone's personal goals instead of reality.
The problem is not that the model outputs reality. The problem is that the model _incorrectly_ produces not reality, based on facts it _did_ learn from reality.
That is, there is a fact pattern in reality, but the model draws an illogical conclusion based on that fact pattern.
So, I'll ask again. Could you directly address this point from the parent post:
"A recruiting agency uses a predictive model to recruit computer programmers. Due to the way the model was trained, it excludes qualified women (not OK, a clear misapplication of an algorithm)"
Philosophically wrong or not, perhaps excluding those hypothetical women from consideration is the quote unquote optimal approach to allocate recruiting resources. Why this is the case is irrelevant; the onus is on society to 'shape up' and remove the reasons the group 'women' are disqualifying themselves from the fair model.
Of course assuming good faith in model design, training, implementation, etc.
Ditto for men as kindergarten teachers. Employment being a zero(ish)-sum game, so long as more women are teachers, some other profession will contain an outsized number of men (and vice versa). There are reasons why more women are kindergarten teachers than men, and any fair and effective algorithm would be expected to make predictions based on these reasons or proxies thereof.
Or there would be an even gender ratio.
If I can, let me explain using basic quantity supplied and demanded.
Consider the supply equation: Q_s = a + bP + cX, and the demand equation, Q_d = e - fP + gZ. Data only observes where Q_s = Q_d given P, X, and Z.
So what most ML models attempt to do is predict Q as a single fit function of P, X, and Z -- not entirely unreasonable, but misses the structure of the full system. So two things to consider:
(1) Implementing a rule policy based on "measuring as it actually is" may result in a feedback cycle (personally I think there may be one in common real estate property valuation "AVM" models). If recruiters rely on a biased model, the class imbalance may actually increase relative to doing nothing.
(2) Normative ideals, like being blind to anything but skill and opportunity or equalizing background (two common political ideals in the US) cannot be appropriately evaluated. This is Judea Pearl's claim of Pr(Y|X) versus Pr(Y|do(X)).
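The feedback cycle in point (1) can be made concrete with a toy Python simulation. Everything here is an illustrative assumption, not an empirical claim: the "model" is a hand-written function that over-weights whichever group is more common in its training data, and hiring from a balanced applicant pool by that model's scores drives the workforce further from parity each round.

```python
# Toy simulation of the feedback cycle in point (1): a recruiting model
# retrained on its own hiring outcomes can amplify an initial imbalance.
# All numbers and the scoring rule are illustrative assumptions.

def model_score(share: float, gamma: float = 2.0) -> float:
    """Toy stand-in for a trained classifier that over-weights the
    majority group: maps observed share s to s^g / (s^g + (1-s)^g),
    which exceeds s whenever s > 0.5."""
    return share**gamma / (share**gamma + (1 - share)**gamma)

def simulate(initial_share: float = 0.6, rounds: int = 10,
             hires_per_round: int = 100, workforce: int = 1000) -> list:
    """Track group A's share of the workforce over successive rounds of
    hiring from a balanced applicant pool, with hires allocated by the
    model's biased score rather than by the pool's composition."""
    group_a = initial_share * workforce
    total = float(workforce)
    shares = [group_a / total]
    for _ in range(rounds):
        # Each round, hires go to group A in proportion to the model's
        # score, which exaggerates group A's current prevalence.
        group_a += hires_per_round * model_score(group_a / total)
        total += hires_per_round
        shares.append(group_a / total)
    return shares

shares = simulate()
# shares drifts upward, away from the applicant pool's 50/50 split,
# even though nothing about the applicants themselves changed.
```

The exact curve depends on the made-up `gamma`, but the qualitative point stands: "doing nothing" would hold the imbalance fixed, while hiring through the biased score increases it each round.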
So there's at least two sorts of biases. There's both biases inside the model, which is what you're thinking of when you use the word "bias", and also biases outside the model, which inform the model's design and construction in ways that cannot even be quantified with only the model's metrics.
Considering that the frequency of intersex people in a population lies somewhere between 0.05% and 0.07%, not accounting for a secondary trait such as whether a biological M or F happens to be intersex is an irrelevant classification error that has a negligible (if any) impact on a model's predictive ability.
And by the way, any model can be regenerated if any attribute left out is found to be significant.
Never trust someone who says they deal only with facts.
I can't figure out why you would use such an algorithm, though. Most people who want to hire programmers look at resumes from people who have self-selected as programmers (possibly on a job search forum) and interview them. If you decide there are not enough programmers, you might hire people at random ("Would you be willing to learn to be a programmer if paid?" "Yes." "Hired, you start Monday."). There might be factors that exclude some people because they can never be great programmers, but I'm not aware of what such factors might be; gender doesn't seem to be one, though.
Of being a programmer, not being a good programmer. There are more men than women in programming, but the recruiters want to tell good programmers from bad ones, not programmers from nurses. The data clearly shows that a randomly chosen man is more likely to be a programmer than a randomly chosen woman, but that's irrelevant. The likelihood that a randomly chosen female programmer is good should be about the same as the likelihood that a randomly chosen male programmer is good, and that's what hiring managers care about.
In the real world, incompetence may not be a huge hurdle when selling complex systems. Also, these biases are invisible, until one thinks about them or spots them in the wild.
You're correct that there should not be any effect.
The problem is that many systems are designed such that there is an effect.
* Most programmers have 10 toes. Fact.
* A predictive model correctly identifies that more programmers have 10 toes than 9 toes.
* A recruiting agency uses the predictive model and excludes programmers with 9 toes. But why? What possible use could they have for that model?
The problem with a recruiting agency using biased predictive models arises when the data is relevant for the recruitment purpose but creates socially unacceptable conclusions, such as:
* There are more computer programmers who leave after the first year who identify themselves as women than men. (descriptive statement)
* A predictive model uses gender as a data point to predict employee retention.
* An internal recruiting agency within a company has the goal of hiring employees with high retention in order to reduce training costs, and uses the predictive model. This results in excluding women.
The attribute being measured in the latter case is relevant for the purpose of the recruiting, but it amplifies gender segregation. When discussing AI recruiting bias, it is this kind of problem we should focus on, rather than looking at attributes which have no bearing on the purpose of recruiting. Attributes which have nothing to do with the goals of recruiting should not exist in a model for recruiting.
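As a minimal sketch of that last sentence, one mechanical safeguard is to strip attributes with no bearing on the recruiting goal before the model ever sees them. The feature names and the scoring rule below are hypothetical, and note the caveat: dropping a column does not remove proxy features (names, hobbies, word choice) that still correlate with it.

```python
# Minimal sketch: filter out attributes with no bearing on the
# recruiting goal before scoring. Feature names and weights are
# hypothetical; a real model would be trained, not hand-written.
# Caveat: dropping a column does not remove correlated proxies.

PROTECTED = {"gender", "age", "name"}

def job_relevant(candidate: dict) -> dict:
    """Keep only features related to the goal of the recruiting."""
    return {k: v for k, v in candidate.items() if k not in PROTECTED}

def retention_score(features: dict) -> float:
    # Toy stand-in for a trained retention model.
    score = 0.1 * features.get("years_experience", 0)
    if features.get("completed_onboarding"):
        score += 0.5
    return score

a = {"name": "Ada", "gender": "F", "years_experience": 6,
     "completed_onboarding": True}
b = {"name": "Bob", "gender": "M", "years_experience": 6,
     "completed_onboarding": True}

# Candidates with identical job-relevant records now score identically.
assert retention_score(job_relevant(a)) == retention_score(job_relevant(b))
```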
Conceptually, reinforcement learning seems very similar to biological positive feedback mechanisms. These mechanisms have guided human development, and cognitive bias is an innate aspect of being human, so it stands to reason that non-biological systems developed in the same way would exhibit biases.
It is not, however, a problem with AI, software, etc. It's entirely the fault of fools at the agency, and perhaps shysters selling them crap software.
Garbage in, garbage out.
If the goal is to learn whether programming is associated with gender, it's unbiased. If the goal is to learn whether the definition of programming is associated with gender, it's biased. The usual goal of NLP is to learn language and (waves hands) meaning. The current tools do this by learning associations, but that's the method, not the goal. The critique is on the methods. The critiques are saying that it's led to trouble, which we'll have to correct for. We want to make it unbiased in the definition-based sense, where programming is totally unrelated to gender.
The next step is establishing just how much of the male-female difference is explained by biology. I haven't studied that part of it. But the "null hypothesis", that all gender differences in behavior are due solely to social conditioning, is wrong.
Consult https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778233/ and the studies it cites in the second paragraph.
This is interesting and important science, and you're right that clearly there are some gender differences... BUT
1. Given the history of (pseudo)science in justifying and worsening disparities between groups, incredible caution is needed - especially when making policy.
2. The existence of biological phenomena in no way discounts the strong social cues and pressures which shape young humans.
3. The existence of detectable, biologically originating average differences between genders in no way justifies the use of algorithms in such a way that individuals are punished for their group membership.
On HN, these threads always seem to go straight to "BUT THERE REALLY ARE DIFFERENCES!!", which in my mind is completely missing the point.
In terms of algorithms: What do you want your algorithm to produce? Gender parity? A distribution that matches what would be reality if biology were an influence but social conditioning weren't? Maybe what would be reality if biology were an influence, and people were encouraged to follow their passion, whatever it is, but scrupulously avoided recommending any particular activities to particular children until after learning what their passion was? ... Or do you want it to analyze reality as it is and produce decisions that are useful in this reality?
Ideally, the algorithm would be fed enough data about the people it's judging that it could make a properly fair decision. If all the algorithm gets is "Male, 22, minored in CS; female, 22, minored in CS", then the only thing it has to go on is the sex difference, which is probably a significant one, and then we can argue about what it should do. However, if it gets "Female, 22, minored in CS, majored in math and was top of her class, has spent 400 hours on personal programming projects / Project Euler / Topcoder / chatting about data structures with her friends, has been using the Unix command line on a regular basis since age 12", that looks like a much stronger candidate—and, importantly for this discussion, a male with those same qualifications would probably be an equally good candidate. I think that, if you have the same level of intelligence (perhaps specifically the math/symbol-manipulation areas), the same level of enthusiasm, and spend the same number of hours on the same activities, then you should get roughly the same results no matter what your sex is; sex is only useful to guess at the enthusiasm and hours spent when those variables are missing from the inputs.
That's the solution as far as I'm concerned: More input data so the algorithm can be fair. It'll consider all the people who've spent 1000 hours studying and practicing programming and achieved the same level, and it'll judge them the same, and that is fair, even if it happens that 80% of those people are men. Unfair would be if the algorithm only gets the superficial data and has to guess, based on group averages, on everything important, and penalizes the truly capable women because it's unable to distinguish them from the average woman.
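The point above can be sketched in a few lines of Python (the numbers and the "model" are made-up assumptions): when the informative features are missing, a model can only guess from group base rates; once they are observed, group membership contributes nothing.

```python
# Sketch of the argument above: with informative features missing, a
# model falls back on group averages; once they are observed, group
# membership adds nothing. All numbers are made-up assumptions.
from typing import Optional

GROUP_AVG_HOURS = {"M": 800, "F": 600}  # hypothetical group averages

def predicted_skill(group: str, practice_hours: Optional[float]) -> float:
    """Toy model: skill is driven entirely by practice hours, so the
    group is used only to impute hours when they are unobserved."""
    hours = (practice_hours if practice_hours is not None
             else GROUP_AVG_HOURS[group])
    return hours / 1000.0

# Hours missing: the model penalizes above-average individuals by
# falling back on their group's average.
assert predicted_skill("M", None) != predicted_skill("F", None)

# Hours observed: equally practiced candidates score identically,
# regardless of group.
assert predicted_skill("M", 1000) == predicted_skill("F", 1000)
```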
2. Your description of algorithms and how they can/should behave is completely beside the point. Thats the problem with vague marketing language like "AI". The article isn't about artificial generalized intelligence, it's about machine learning algorithms used widely today. These are problems now, not hypothetical discussions about how we want some idealized "AI" to behave.
Among other things, he visits a doctor who has to decide which gender to choose for babies who obviously need some work (sorry for the wording).
The only conditioning I've ever seen any real evidence of is _against_ what would be called "social norms": teachers, parents and media work overtime to try to get girls interested in math, science, and programming and if anything actively _discourage_ boys from it.
Society shapes children even before they can speak.
I am also around a lot of nerds, and I also see plenty of reasons why young women opt out of nerdy circles.
For her 4th birthday, we had a reptile guy show up. She was the only girl child that would touch any of them, let alone let snakes slither around on her. She loved it.
It's not like we shut her off from girl stuff - her only cousin was an older girl that liked "girl stuff", and our kid looked up to her. We just always presented them as equal options for her.
Once she got into preschool things started changing. After a year, now there are boy toys and girl toys. Colors are gendered. Bugs and reptiles are icky. Dirt is yucky. And so on.
I'm not going to make a claim that any girl I've never met has or hasn't been affected by society to like "girl" things. But it would take an act of god to convince me our daughter wasn't.
It can also be used to make predictions and decisions.
The question "Is this person likely to be a programmer?" would probably accurately identify a man as more likely to be a programmer, revealing the bias in the data and our society.
But "Is this person suited to be a programmer?" sounds similar but is a very different question and we shouldn't confuse the two, especially when letting a machine decide.
That doesn't mean accept it, it means to implement the exceptions and a way to update them or understand sensitivities
Sounds neat, until you trust it with your life.