A.I. Systems Echo Biases They’re Fed, Putting Scientists on Guard (oodaloop.com)
113 points by wglb 28 days ago | 282 comments

Am I wrong in thinking that "echo the biases they're fed" is a pretty complete, concise description for what "AI"/ML systems are supposed to do?

One big problem is there are biases we're supposed to ignore, but we're not good at (able to?) ensuring this deliberate blindness in ML.

For example, we're generally not allowed to discriminate based on gender, but even without a gender field, ML will happily imply a gender out of its firehose of other data, and discriminate based on it anyway.

"AI finds that employees are less likely to stay at the company if their names end in 'a' and they score higher on 'cultivation' rubrics" This same AI then recommends hiring applicant 2 over applicant 1. Is this allowable?
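One way to see how that happens even with no gender field, as a toy sketch (all rates invented, "name ends in 'a'" standing in as the proxy, per the example above):

```python
import random

random.seed(0)

# Hypothetical illustration: the protected attribute ("group") is dropped
# from the features, but a proxy (name ending in 'a') correlates with it.
rows = []
for _ in range(2000):
    group = random.random() < 0.5
    name_ends_a = random.random() < (0.8 if group else 0.1)   # strong proxy
    # The historical outcome is biased against the group, not the proxy itself.
    stayed = random.random() < (0.4 if group else 0.7)
    rows.append((name_ends_a, stayed))

# A naive "model": retention rate conditioned on the only feature it sees.
def rate(feature_value):
    sub = [stayed for f, stayed in rows if f == feature_value]
    return sum(sub) / len(sub)

# The model never saw "group", yet the proxy carries the discrimination.
print(f"P(stay | name ends in 'a') = {rate(True):.2f}")
print(f"P(stay | other names)      = {rate(False):.2f}")
```

The model would score applicants with 'a'-ending names lower without ever being shown a gender field.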

A bit of an aside, but humans do what you describe and have name letter preferences: https://en.m.wikipedia.org/wiki/Name-letter_effect

We have lots of inherent biological biases like that.

Why would a name letter bias be biologically caused? Or do you mean a bias about biological factors?

Bear in mind "bias" is just a pejorative for "generalisation" or alternatively "lesson learned". AI algorithms are good at detecting patterns in data and bad at being politically correct. This is not a flaw of the algorithms; it's a flaw in people who can't accept measured reality and go into denial.

So an AI trying to hire programmers discriminates against female sounding names, because it's learned that this is correlated with success? Apply it to hiring nurses or primary school teachers and it'll probably do the opposite. This is only "bias" if you start from ideologically driven blank slate assumptions. Otherwise it's just common sense.

Biological in the sense that we have affinities towards our own names or initials, as studies have shown. Just a consequence of how our brains organize information.

Instead of someone drilling into your head that your initials = good, and others = bad via your environment, it happens through natural processes.

Most biases are just pattern matching, which is necessary for efficient memory references and powers the quick on-the-spot judgements/decisions we need to make. People make too big a deal out of stereotypes, like it's evil to hold them, even though it's a basic function of the brain. Even with constant vigilance it's entirely possible to make misjudgements based on that, and it will still happen to everyone, even the most socially aware people.

Are we suggesting that humans don’t do the same thing?

Kind of. AI/ML systems are supposed to learn things from the data they are fed to accomplish some sort of more generalizable goal or task. Echoing the biases they're fed is probably more akin to overfitting to their training data.

No, it's not the same as overfitting. It's nothing to do with the algorithm at all. The problem is with the training data, which contains patterns induced by the way the data was collected rather than the reality of what you want to classify.

For example, imagine that you wanted to train an algorithm to distinguish photos of dogs from photos of humans. So you collect a bunch of photos of both dogs and humans and use them to train a classifier. You do all the proper cross-validation, bootstrapping, etc. to ensure that you are not overfitting, and you get really good results. Then, looking at the mis-classifications, you notice something: all the photos that are taken looking at an angle down toward the ground are classified as dog photos, and all the photos taken looking straight ahead are classified as human photos. It turns out that in your training set, most of the dog photos are taken at a downward angle while most of the human photos are taken facing straight ahead, because humans are taller than dogs, and your machine learning algorithm identified this feature as the most reliable way to distinguish the two groups of photos in your training set.

In this hypothetical example, no overfitting occurred. The difference in photo angles is a real difference in the training sets that you provided to the algorithm, and the algorithm did its job and correctly identified this difference between the two groups of photos as a reliable predictor. The problem is that your training set has a variable (photo angle) that is highly correlated with what you want to classify (species). This is considered an unwanted bias (and not a reliable indicator) because the correlation is caused by the means of data collection (most photos are taken from human head height) and has nothing to do with the subject of the photos.
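The photo-angle story can be reproduced with a few lines of synthetic data (a sketch; every number here is made up):

```python
import random

random.seed(1)

def collect(n, angle_corr):
    """Synthetic photos: (looking_down, is_dog). angle_corr controls how
    strongly camera angle tracks the subject during data collection."""
    data = []
    for _ in range(n):
        is_dog = random.random() < 0.5
        looking_down = random.random() < (angle_corr if is_dog else 1 - angle_corr)
        data.append((looking_down, is_dog))
    return data

def accuracy(rule, data):
    return sum(rule(x) == y for x, y in data) / len(data)

rule = lambda looking_down: looking_down  # "downward angle means dog"

train = collect(5000, angle_corr=0.95)  # photos taken from human head height
field = collect(5000, angle_corr=0.5)   # angle no longer tracks species

print(f"on data collected the same way: {accuracy(rule, train):.2f}")
print(f"on data without the confound:   {accuracy(rule, field):.2f}")
```

The rule validates beautifully on anything gathered the same way, then collapses to coin-flipping once the collection artifact is gone.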

I think you’re arguing semantics a bit. What you’re saying checks out, but one could say that overfitting was occurring but the test dataset distribution was not wide enough to catch it.

As I understand it, if it's overfitting, testing on a second random sample gathered in the same way as the training set should degrade performance. That's not the case here.

(Though maybe the term as used in industry is less strict.)
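That's my understanding too. A toy sketch of the distinction, with pure memorization standing in for overfitting (everything synthetic):

```python
import random

random.seed(2)

def sample(n):
    """x is a real-valued feature; the true rule is 'x >= 0.5', labels 20% noisy."""
    data = []
    for _ in range(n):
        x = random.random()
        y = (x >= 0.5) != (random.random() < 0.2)  # noise flips 20% of labels
        data.append((x, y))
    return data

train = sample(200)
fresh = sample(200)  # a second sample gathered exactly the same way

memorized = dict(train)                      # pure memorization: overfitting
overfit = lambda x: memorized.get(x, False)
sensible = lambda x: x >= 0.5                # the underlying pattern

def accuracy(rule, data):
    return sum(rule(x) == y for x, y in data) / len(data)

print(f"overfit  on train: {accuracy(overfit, train):.2f}")   # perfect
print(f"overfit  on fresh: {accuracy(overfit, fresh):.2f}")   # collapses
print(f"sensible on fresh: {accuracy(sensible, fresh):.2f}")  # holds up
```

The overfit model fails the second-sample test; a model exploiting a collection bias (like the camera angle above) would pass it, which is why it's a different failure mode.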

It's not semantics. Overfitting is a completely unrelated problem. Overfitting is an issue with the machine learning algorithm and how you're applying it, and can be fixed by changing how you use the algorithm, while what we're talking about here is a problem with the training data. There is nothing that can fix a biased training data set short of getting new training data that doesn't contain that bias.

It would be like if your car was driving in circles and you called a mechanic to fix your steering, and they told you that the actual problem was that both right wheels were missing. That's not a steering problem, and no repair to the steering system will fix it. The only fix is to put new wheels on.

Overfitting is about being too precise because of the sample inputs, such as if "downward angle" + "brown blob" (one specific dog breed) + "leash" + "lots of green" (grass) was required to identify a dog. GP's example wasn't that, it was just identifying the wrong thing.

Rather than overfitting, it's more related to exploitation vs. exploration. That we see more men in programming might just be that women are not given opportunities to explore programming as a career.

When AI makes a decision, right now, people only use the probability output. Hiring A has .6 probability while hiring B has .4, so we hire A instead of B. However, if we consider the confidence intervals, the decision might not be that clear. Say +/- .5 for A but +/- .2 for B. If exploration is considered too, it's very likely that we will give B a chance.

AI is in the realm of probabilistic decision making, while normal people don't follow that. The bias is not from the training side. It's the decision-making process incorporating AI that should change.
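To make the interval point concrete (numbers invented; a simple Wald/normal-approximation interval, not necessarily what a production system would use):

```python
import math

def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation 95% CI for an estimated probability."""
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - half, p_hat + half)

# Hypothetical scores: A's estimate rests on far less evidence than B's.
a_lo, a_hi = wald_interval(0.6, n=10)    # 0.6 from 10 similar past hires
b_lo, b_hi = wald_interval(0.4, n=400)   # 0.4 from 400 similar past hires

print(f"A: [{a_lo:.2f}, {a_hi:.2f}]")
print(f"B: [{b_lo:.2f}, {b_hi:.2f}]")
# The intervals overlap, so "hire A" is far less clear than 0.6 > 0.4
# suggests, and an exploration-minded policy might still give B a chance.
print("overlap:", a_lo < b_hi)
```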

>AI/ML systems are supposed to learn things from the data they are fed to accomplish some sort of more generalizable goal or task.

...which means that whether a model is "biased" depends on where and how it's applied. This is an important point that is missing from most discussions, articles and even research papers on the so-called "AI ethics".

Such a model is objectively biased regardless of where and how it's applied - it prefers X over Y, and that's bias by definition. The hard question here is deciding whether any given bias is ethical or not.

The test for overfitting is whether the model holds up for data outside the training set.

If the biases are consistent with other real-world data, then it's not overfitting.

The line between fit and overfit is arbitrary, though. What is happening is that the systems are revealing unconscious biases present in the more traditional systems the AI/ML are supposed to replicate, build upon, or speed up.

If the results are odious to us, it should be an impetus to critically analyze not only the AI/ML systems, but also the underlying assumptions that they're built on. Instead, developers become defensive and cagey about their processes.

If you don't want systems to have disparate impact, you have to be adamant about it in your design. If you think society is better off with systems that reflect preexisting biases, then fine, but be ready for the backlash.

At the end of the day, it really is up to what humans want to do with themselves. It's an opportunity to be truer to our intent, not a bug to be covered up.

> The line between fit and overfit is arbitrary, though. What is happening is that the systems are revealing unconscious biases present in the more traditional systems the AI/ML are supposed to replicate, build upon, or speed up.

When you say unconscious bias, you are kind of implying that the model learns something that is false. But more often it's the case that the model learns something true that we don't want it to learn. That's what makes the problem so hard, you are trying to hide the truth from a system you only half-understand processing data you only half-understand. There is a big risk the truth slips through the cracks if you aren't careful.

> When you say unconscious bias, you are kind of implying that the model learns something that is false.

It's more that the model learns something that is undesirable. It could be the case, for example, that the true thing that the AI learns is that your resume screening process tends to exclude women. This is true, sure, but it could lead to the undesirable outcome where the presence of a female name on a resume might be weighted heavily against the candidate.

Well, in many cases, they are indeed learning something false (or, to be more charitable, something that is not sufficiently true). For example, a facial recognition system that has trouble detecting dark-skinned faces. Black and brown people clearly have faces, but because the systems were not trained on their facial data, and the capability to adequately adjust exposure or other aspects of the sensor stack was not built into the system, it provides an objectively wrong solution given perfectly reasonable inputs.

I agree that this process is one of resolving blind spots, but I disagree that the blind spots are simply areas devoid of light. AI/ML systems are frequently employed to augment or stand in for human perception, which is known to be necessarily incomplete with respect to reality. In other words, they can learn things that seem true to us but that are false from another perspective, or undesirable once exposed. What's exciting about them is that they provide an opportunity to interrogate the flaws in our individual perception with a systematized observation and analysis, in a much more sophisticated manner than in the past. But fulfilling that potential requires humility.

Yeah, that's a good point about having sufficient examples in the training set. I think it's not as simple as that, though; several years after that made headlines, people have still not been able to close the face-recognition accuracy gap. Experts are debating different explanations for this.


That wasn't entirely my point, however. The training set was only a part of the problem, which in its totality was that the developers failed to consider that the functioning of their system could be affected by the biases they didn't even know they had, in this case of being more likely to see light-skinned faces as belonging to a human.

Their failure was not just in lacking diverse training sets, but diverse QA, or at least QA looking for those blindspots which eventually became evident.

So, correct, it's not as simple as having "sufficient" data.

Normally the developers wouldn't classify images themselves (their time is too valuable). In any case, I think failing to detect a face as human should be rare unless the photo has very poor lighting or the person is almost hidden.

The developers, or the project managers, are the ones setting the parameters: the expected outputs and the processes necessary for that output to be generated. That they don't classify the images themselves doesn't change that the system's failure to work as expected stems from their failure to fully understand the problem, which is itself reflected in their unconscious biases.

Your expectation was the same one they had, and it was wrong, which is the crux of the issue.

There's only been one case of being unable to recognise black faces that I know of, and it was shown later to be due to the lighting conditions the guy was using leading to very low contrast imagery. The same problem was replicated with white faces: there was no racism anywhere as you would expect given that unconscious bias hasn't been shown to exist at all (the studies that claim to show it have all collapsed).

If you asked the developers of the facial recognition library, "does your software have problems with very low contrast conditions" they'd surely have answered yes. Fully conscious of the issue but, that's software. It's hard to get everything right 100% of the time.

There have actually been several cases of recognition and classification issues, and it is an ongoing problem.

> Your expectation was the same one they had, and it was wrong

Do you have a source for data set mis-labelings being a problem?

I didn't say they were. You're beginning to present yourself as someone speaking in bad faith.

You're correct. A statistical model could uncover "an inconvenient truth", and even make us question our biases and become more adamant in finding better solutions.

However, ML is often sold as a solution for generating outcomes, not for finding truths, whether they be true or false.

The distinction is huge.

No matter what humans do, they will reap what they sow. Consequences and outcome matter more than "truth" (which may be in the eye and competence of the beholder).

I guess the question is how well we expect AI/ML systems to be able to generalize on a biased subset of data. If you are trying to use one of these systems to make hiring decisions but you train it on data from the hires you made, should we really be surprised if biases in the original hiring (education level, gender, race, income, personality) are reflected in the AI's decisions?

A glance over some dictionaries reveals I'm leaning on an essentially archaic and poetic sense of "bias" here to evade its exclusive meaning of an unreasonable judgement, it seems.

There is often confusion about what AI systems are, and what people think they should be. What they are is Bayesian black boxes that use pretty basic statistics and probabilities.

What people think they should be is far, far more complicated and nebulous. I'm not sure I fully understand, but people have been fed a lot of nonsense about AI systems, from Deep Blue to Watson to AlphaGo, etc, etc, showing them as being very powerful in limited domains, and extrapolating that out into overestimations of what they could do.

The other main problem is that people seem to think that these AI systems will be a complete replacement for human thought and decision-making, which, frankly, knowing even a little about how the sausage of software is made, is completely terrifying.

This is exactly what they are designed to do.

The scary part is, they can do so on a different level than your average human, poring quickly over more data than you or I have time to consume in our entire lives.

The largest issue, however, is that an AI can find and shed light on dislikable realities of the world.

Racism, sexism, culturism, opposing political opinions... Perspectives that are not "PC" still exist and permeate the digital world along with the physical one. Creating unbiased data is, imo, impossible, as I am also biased, and so are you. I don't know what unbiased data is.

I can certainly say that I have held racist, sexist, religious, and political views at various stages of my life - based on small sample set data, and biased trainers. I have grown a better understanding and no longer hold many of the naive beliefs that I held when I was younger, and will continue to realize how ignorant I am as I live.

The same process will probably happen for any AI.

> Creating unbiased data is, imo, impossible, as I am also biased, and so are you

What about just learning based on the entire web?

I think you're using the word "unbiased" to mean "heavily adjusted for US centric views on racism and sexism" which isn't what the word really means.

If you train an AI on everything written - all books, all web pages, all newspaper articles etc ... a not impossible task these days - then you can argue you're as close to bias free as possible.

However a small number of AI researchers don't like the results they get when they do this, because the AI learns the world that truly exists instead of the one they wish would exist. But that's not a bug in the software. It's a bug in the researchers.

The human race is also biased. You aren't getting away from bias that easily.

It's good to hear you are now no longer ignorant! :-D

"No longer ignorant", I certainly am not. Thank you for the positive reinforcement though.

Becoming less ignorant is a life-long process, as far as I can tell.

I'm just somewhat more aware of my capacity to over-generalize based on my individual experiences, and allow biases to settle in my subconscious in the form of racism, sexism, ageism, or what have you. I try to find where I have internalized these thoughts so I can do a little internal reforming. I also pay a lot more attention to "_______ is/are _______" statements, as they are almost always over-generalizations.

Being aware of this mental mechanism doesn't really stop me from doing it though. I know I'll tend to cluster experiences to create generalizations indefinitely, as it seems to be an evolved trait (makes sense for survival reasons, to assume the worst until you find evidence otherwise) - even though it's not perfect.

Such articles are crafted to create outrage among the uninitiated. They like to pretend that "bias" did not exist before, and AI is introducing it. Every time we, as humans, make any decision, we're "echoing biases we're fed". It's 100% intractable to rationally think about every decision, so we almost never bother, and when we do, we second guess like crazy because rational decisions often do not agree with our biases. And guess what, most of the time our biases are _right_. It wouldn't make any sense from an energy conservation standpoint to have them otherwise.

What we should be aiming for instead is not the total lack of bias (an impossibility if you are to learn anything at all from the data), but _explainability_. A system must be able to show me the set of statistics that led to a particular decision. E.g., women in the Los Angeles area are known to be much safer drivers according to this subset of data, so we offer a lower insurance rate to women in the Los Angeles area, to use just one hypothetical example. Such systems are a rarity nowadays, and research into them is relatively sparse.
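A hypothetical sketch of what such an explainable decision could look like: the system reports the subset statistics each adjustment was based on. All rates, counts, and the `quote_premium` interface are made up.

```python
def quote_premium(base, driver, stats):
    """Apply each factor that matches the driver, recording why."""
    premium, explanation = base, []
    for factor, (rate_ratio, evidence) in stats.items():
        if driver.get(factor):
            premium *= rate_ratio
            explanation.append(f"{factor}: x{rate_ratio} (based on: {evidence})")
    return premium, explanation

# Made-up subset statistics standing in for what the model learned.
stats = {
    "la_area_woman": (0.85, "claim rate 12% vs 15% baseline, n=48,211"),
    "under_25":      (1.30, "claim rate 21% vs 15% baseline, n=9,907"),
}

premium, why = quote_premium(1000.0, {"la_area_woman": True}, stats)
print(f"premium: ${premium:.0f}")
for line in why:
    print(" -", line)
```

The point isn't the arithmetic; it's that every adjustment comes with the evidence behind it, so the decision can be audited.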

It is, except that people are discovering that the programs faithfully learn from their training material, even including traits and biases present that are not consciously included by the person that selected the training data.

It seems a lot of people don't realize that there's likely a large disconnect between what people think they're training an AI to do and what they're actually training it to do.

If they're lucky, the two are close enough that the learning program will be useful for what was hoped, and if they're unlucky it will look like it's useful and correct but will include behavior learned from the training data that was not predicted and which taints the result.

I.e., if you train an AI with what you think is unbiased data but which includes a subtle bias, then the results you get from it may be biased... and the bias may be so subtle it's undetectable except by another AI, which is a problem if you assume that since your training data was unbiased, your AI must be unbiased.

Putting it another way... garbage in, garbage out.

It really depends on the specifics of the optimization problem in question.

If ML is trained on biased data where an optimality exists only at an unbiased solution (think a shaped reward function in RL to disincentivize class bias, or something similar), then no, the ML is most certainly not supposed to echo the bias it's been fed.

On the other hand, if an optimal solution to the ML optimization problem exists at a biased solution, e.g. a naive prediction of whether a nurse is a man or a woman, then yeah, we would say that it was supposed to echo the bias.

All too often I feel people forget that ML is just an optimization problem. What you're trying to optimize really matters - generalizing about all ML without talking about the optimization problem in question is pointless.

Nope. If I feed an NLP model a bunch of Shakespeare, it'll "echo the biases it's fed," but it won't do what you probably wanted it to. A less silly example is feeding the system a bunch of Wall Street Journal articles and then trying to use it on informal communications like Twitter. The goal is to learn something that solves a particular problem. It turns out that the current methodology solves problems in a biased way.

more generally, don't we expect General AI systems to be racist if they have racist parents? That's just proof that the AI works human-like.

Try and wrap your mind around the idea of an "unbiased" AI. It is not possible.

As a trainer of an AI, simply looking at a picture and saying that it's a cat, and then telling the AI that it's a cat, is biased. You are automatically assuming you are correctly identifying it as a cat, and teaching the mind of an AI to follow your presumption.

The only way to make an AI less biased (but still biased), is to diversify your labeled training data. And by diversify, you need to diversify the presumptions on the labels of the data themselves. What is the confidence among several people that this picture is of a cat? Rather than one trainer.
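A minimal sketch of that "confidence among several people" idea: pool a panel's labels into a soft target instead of one trainer's hard label (the photo and annotators are hypothetical).

```python
from collections import Counter

# Five people label the same photo; one disagrees.
annotations = ["cat", "cat", "cat", "dog", "cat"]

counts = Counter(annotations)
soft_label = {label: n / len(annotations) for label, n in counts.items()}

print("hard label (single trainer):", annotations[0])
print("soft label (panel):", soft_label)
```

Training against the soft distribution bakes the panel's disagreement into the target, instead of one person's presumption.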

This is a simplistic example. If you scale it to more complicated tasks, such as understanding natural language, body language, intent of users, or anything related to ethics, you'll likely end up with a crazy AI. And yes, I mean crazy. As in, simultaneously holding very seemingly opposing understandings of reality and spitting out data that reflects those oppositions.

If you've ever had more than one authoritative figure in your life give you contradicting advice to another - one parent holding differing beliefs to another, or one teacher to another in school, you may notice yourself starting to qualify your teacher's competence as a factor in the fidelity of their teachings.

A scary, but simultaneously cool thought: An AI that questions my capacity to teach it, while I'm teaching it.

Bias will always remain an element in the equation however.

One AI will likely have interacted with different trainers or data, and create contradicting understandings of reality to that of another AI.

"Duck or rabbit" is another simple example that would require bias in visual interpretation to reliably decide.


Hah. Thank you for this reminder. This is a great example to illustrate the point.

I would say that a less biased (wiser) AI, would learn that this duck/rabbit image is, itself, questionable, and can be perceived both ways (high confidence values for both identifiers).

Me: Is this a duck or a rabbit, AI?

Padawan AI: It's a rabbit (51% confidence as rabbit. 49% as a duck).

Master Yoda AI: Rethink your question, you must. Yes, the most correct answer is.
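A sketch of that "Master Yoda" behaviour: refuse to pick when the top two class probabilities are too close to call. The numbers and the margin threshold are made up.

```python
def interpret(probs, margin=0.10):
    """Return the top class, or flag the output as ambiguous."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    (top, p1), (runner_up, p2) = ranked[0], ranked[1]
    if p1 - p2 < margin:
        return f"ambiguous: could be {top} or {runner_up}"
    return top

print(interpret({"rabbit": 0.51, "duck": 0.49}))  # the padawan's dilemma
print(interpret({"cat": 0.97, "dog": 0.03}))
```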

"Garbage In, Garbage Out" has long been a foundational saying in computer science. I'm genuinely curious why so many in this generation of AI researchers and developers seem to think they're immune to this.

> On two occasions I have been asked, — "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

> Passages from the Life of a Philosopher (1864), ch. 5 "Difference Engine No. 1"

A very long time indeed!

I think the key source of confusion (both in 1864 and today) is that non-technical people hope that the machine has some wisdom correlated with its arithmetic prowess. Charles Babbage's mechanical calculator did not have wisdom. Even when you give a bleeding-edge algorithm Big Data and lots of processing power, it doesn't yet have wisdom. But hope springs eternal.

The oodaloop article is a summary of the NYTimes article (https://www.nytimes.com/2019/11/11/technology/artificial-int...), which is more detailed. I was also pleasantly surprised by the NYTimes' accurate description of what BERT was trained to do (masked word prediction and next sentence prediction) and the implications of that.

I have worked closely with BERT and other language models for our startup. There is a disconnect between the capabilities and state of AI research today and the public's expectations and imagination. There's also a fundamental confusion between scale and intelligence. That is, the public largely believes the efficacy of many "AI" models on large-scale problems is equivalent to intelligence. That assumption is problematic in both overestimating the capabilities of these models and misdirecting the focus of critical inquiry.

Hopefully, there is more education around the limitations and capabilities of these technologies. We should be more cautious in applying them to use cases where there is a high potential for negative consequences.

Is the problem that AI is biased? Or is the problem that AI lacks the biases that we expect people to have?

For instance, people have detected that AI models associate technical words with men more often than women and point to this as evidence of bias. I would argue that this is the opposite situation. There's social stigma attached to acknowledging differences between groups, so people have developed biases against acknowledging those differences. The AI, on the other hand, sees that 70-80% of technologists are men and associates accordingly. So the problem isn't that the AI is biased; it's that it lacks the biases we expect.

Now, there are good arguments why this bias may be a good thing. Unbiased does not automatically mean good.

I think you've very severely missed the point. Differences like those you describe generally reflect biases in our social structure, which are then (correctly, but problematically) reflected in statistical models and algorithms of all stripes. This becomes more problematic when we overinterpret the resulting findings, or uncritically incorporate the resulting predictions into real systems.

A good example is the criminal justice system in the US. Minority communities are much more intensively policed, prosecuted, and jailed. A model will, without care, use race or proxies thereof to predict "criminality" without understanding or accounting for the bias inherent in the label. If that is used in policy, it runs the risk of amplifying existing social problems and injustices.

We always have to ask "bias for what?", or the conversation will be hopelessly confused.

It's unpopular to talk about, but there is quantifiably more crime in some minority communities in the US.

There are some crimes, like drug use, that are indeed over-reported in some communities due to over-policing.

But there are other crimes, like homicide, that are nearly universally reported, regardless of where they are committed. And those crimes are much more frequent in minority-majority parts of the country.

Turning a blind eye to this problem is a disservice to those communities, because they are the same communities that are most commonly the victims of crimes.

The problem is that these systems are fundamentally unable to distinguish causation from correlation. Which is admittedly a hard problem even for humans, but at least we have some capacity to tease these out.

In this case, the increased crime in these communities is not caused by their minority status, but rather by a multitude of other factors with historical and societal origins.

> The problem is that these systems are fundamentally unable to distinguish causation from correlation. Which is admittedly a hard prob even for humans, but at least we have some capacity to tease these out.

I'm not sure why causation vs. correlation is relevant here. When, say, someone is up for parole, the judge will review their past criminal history, conduct in prison, whether or not they will have a support structure when they are released, etc. None of these factors are causal. A judge cannot point to a past crime or to conduct in prison and say "this factor will cause you to reoffend". No, the decision is made based on factors that are correlated with higher rates of recidivism.

It matters which correlating factors you choose. While the criteria you mentioned are merely correlated for a specific individual, on a macro level there are definite causative reasons why a lack of a support structure etc could lead someone to recidivism. A human is able to look at all of those and consider whether they are plausible causes and, importantly, test their assumptions.

The USA has spent 4+ centuries kicking certain communities down the Maslow hierarchy. This was done on purpose through laws introducing redlining, Jim Crow, no access to the GI Bill, slavery, etc. I'd like a world where we could judge people fairly on the same standard, but we purposely created economic underclasses, and that comes with a certain amount of desperation that leads to crime.

White Americans: 70% of the population, $100 trillion in wealth.

African Americans: 14% of the population, $2.9 trillion in wealth.

African Americans in particular owned almost 10-12% of the land in this country (true wealth) and were promised more from the government in reparations (40 acres) not long ago, but discriminatory policies stripped that land away from them over 100 years.

Being in the USA for 150-200 years results in at least $500k-$1M in wealth purely due to land and home value appreciation.

The average black person has about ~$500 in wealth. This isn't a fluke; this was designed... This is a country that criminalizes being poor more and more in many ways, so we can fall into the trap of revictimizing the underclass we created, this time using algos, if we are not careful.

Yes, African Americans are disadvantaged as compared to whites. But how is that relevant to my previous comment?

When a judge is evaluating whether or not to grant parole, is he or she looking for a causal factor that will directly cause the convict to re-offend or is the judge evaluating the convict's situation with factors that are correlated with re-offending?

And you're also confusing correlation with causation here. Yes, "governmental policies have discriminated against African Americans and prevented the accumulation of wealth" is a valid hypothesis, but the data you've presented doesn't show it.

Showing causation with whole-group statistics is very, very hard.

It's not a hypothesis. It's hard to track money flows but not as hard as the anti-reparations crowd make it seem.

The majority of the $500k-1m baseline in white middle-upper class wealth is in homes and land handed down as inheritances, this is not debatable. That property allows a certain amount of leverage to invest in education, businesses etc...

The Ben & Jerry's founder talked about this in vivid detail on the campaign trail with Bernie Sanders. If he had been black where he grew up: no GI Bill = no cheap housing = no appreciation of property/land over his childhood = no financial leverage to build his company.

I can take risks and fail without going bankrupt thanks to familial wealth, that is a tremendous luxury not afforded to the group I'm mentioning.

Unfortunately for that argument, the vast majority of white wealth is not in the hands of the middle-upper class, and I highly doubt that the wealth of the 1% (or .1%) is mostly in homes and land inheritance. And if you're counting "middle-upper" as 80-95%, I think the top 5% has ~70% of the total wealth in the US... (for reference, 95th percentile is ~2.5m, or comfortably above your 500k-1m baseline. 1m is 88th percentile, 500k is 80th)

(And in any case, my point was just that it wasn't shown as causation in the stats you cited)

>...the vast majority of white wealth is not in the hands of the middle-upper class

You are correct on that point. More than 50% of the $100 trillion is held by the top 10%.

On the other point, my point is the first $500k to 1million of wealth was due to inheritance... not the total wealth of the 1%.

Follow up: At one point, any white male could move westward and get free land (whether indigenous-owned or not). 40-100 acres after a century+ of property value appreciation is quite a bit of wealth. The $500k-1m number I use as a baseline for white wealth is very conservative.

Cynically, does a bank care why zipcode is correlated with failure to repay loans? Forcing the bank to act as if the probability distributions are different from how they really are sounds like a very awkward way to redistribute wealth. Maybe the right answer would be reparations for whatever society did to everyone in those communities and then total freedom of association for banks and others to give loans as they see fit. Another option would be to give a bad credit subsidy to people in the cases where their low estimated trustworthiness is judged to be due to someone else's error. For example if a factory gives me lead poisoning, my interest rates go up because people with high blood lead content are less likely to keep good credit, but the factory has to pay the difference because they were liable.

This reminds me of the argument for the UBI. Instead of having thousands of tiny chairty programs sprinkled all over society, why not let the economy make itself as efficient as possible and then hand out the charity in units of dollars?

> unable to distinguish causation from correlation

More specifically, the current breed of machine learning is a correlation engine; the only strength these networks have is finding correlations in the absence of context or explanation.

> there is quantifiably more crime in some minority communities

Sure... and what about the chronic lead exposure issues in many of these areas, the effects of long-term overpolicing of minor crimes on family stability, and the impact of (lack of) generational wealth and education from being used as de facto or literal slaves for generations?

The 'minority' part here is a correlative factor, not a causative one.

An AI classifier doesn't know, or care, why a particular population has higher crime and lower income, it only recognizes and reports on the pattern.

...which is exactly the kind of "treating culturally- and situationally-contextual results like inherent facts" bias the article is talking about.

And if those results change, so will the algorithm's outputs! But asking the algorithm to make the change seems to be a bit much.

Honestly, my preferred method of solving this would be to train the algorithm on a data set with all of the forbidden values included along with anything else the creator feels relevant - zip code, income, familial status, favorite sport, education - and then when running in production, against real people, don't give it the restricted information. Yes, you could theoretically extract race, gender and other protected stats from the information the algorithm actually uses in prod - but it has no incentive to, since a less-noisy signal is already provided.

For instance, suppose the optimal algorithm for your data set is some linear function of X, Y and Z - let's say X+Y+Z to keep things simple. X,Y and Z are all normally distributed variables, mean of 0 and the same standard deviation. Y has a 0.5 correlation with X, and a -0.5 correlation with Z. If not provided Y, your algorithm might come up with 1.5X+0.5Z as an approximation - extracting a bit of the signal for Y from the things it does have access to. It's suboptimal, but better than just X+Z. Unfortunately, Y is verboten - we're not allowed to discriminate on it, and this approximation ends up with results that track Y. So instead we train with X, Y and Z as inputs, so the derived model is X+Y+Z - and we can drop Y from that model in production, leading to a model that (while less accurate) shouldn't unfairly track Y.
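That X/Y/Z scenario is easy to check numerically. A quick sketch (the variable names, correlations, and the 1.5X+0.5Z result are from the comment above; the specific construction of Y is my own, chosen to produce those correlations):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# X, Z independent standard normals; Y built to have corr +0.5 with X
# and -0.5 with Z, with unit variance overall.
X = rng.standard_normal(n)
Z = rng.standard_normal(n)
Y = 0.5 * X - 0.5 * Z + np.sqrt(0.5) * rng.standard_normal(n)

T = X + Y + Z  # the target the "optimal algorithm" computes

# Train WITHOUT the forbidden variable Y: the fit leaks Y's signal
# into its correlates X and Z.
coef_no_y, *_ = np.linalg.lstsq(np.column_stack([X, Z]), T, rcond=None)
print(coef_no_y)  # ≈ [1.5, 0.5], tracking Y through X and Z

# Train WITH Y, then zero out its coefficient in production.
coef_full, *_ = np.linalg.lstsq(np.column_stack([X, Y, Z]), T, rcond=None)
print(coef_full)  # ≈ [1.0, 1.0, 1.0]
coef_prod = coef_full.copy()
coef_prod[1] = 0.0  # drop Y at inference: predictions no longer chase Y
```

Less accurate than 1.5X+0.5Z, as the comment says, but the X and Z coefficients no longer absorb Y's signal.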

> And if those results change, so will the algorithm's outputs! But asking the algorithm to make the change seems to be a bit much.

The problem is that the output of those algorithms is used to drive decision making that has the effect of maintaining the status quo, by removing opportunities to change it.

The real problem here is a deep political schizophrenia in modern society, or at least parts of it, which demands decisions be deliberately biased towards the outcomes they politically desire. These people then turn around and describe results that are not biased as "biased", which is utterly Orwellian.

I think your comment shows that you understand this. You accept that a decision may be correct, when measured in totally cold and statistical terms. But such decisions would not "change the status quo" and that would be a problem.

But that position is a deeply political one. Why should decisions at banks, tech firms, or wherever be deliberately biased to change the status quo? It's social engineering, a field with a long and terrible track record of catastrophic failure. Failure both to actually change reality, and failure in terms of the resulting human cost.

Injecting bias into otherwise unbiased decisions by manipulating ML models, or by manipulating people (threatening them if they don't toe the line), is never a good thing.

Maintaining the status quo is also a political position, though. In general, there's simply no way to interact with other people at scale without politics coming into play. It can be inadvertent, in a sense that there was no specific intent for "social engineering" - but if one's ethics prioritizes outcome over intent, it doesn't really matter.

Which loops back around to my original point, which is that the notion that we should alter or influence the algorithm because its output does not match our worldview or politics is not the removal of bias it is the deliberate injection of our personal biases.

The whole point of using the algorithm was to make sure personal biases aren't impacting the decision. If we're going to alter the algorithm because we don't like the result, then why are we bothering to use an algorithm in the first place? Just use a human to make the decision. At least in that scenario potential biases have an identifiable source, as opposed to an opaque program that may have been deliberately tuned by engineers who think any disparity in outcome is fundamentally problematic.

This is exactly the problem. An AI classifier without sufficient context is simply a very efficient discrimination machine. We must make sure that AI systems have enough context and protection from human bias before large scale deployment.


But I think you're missing the point. Deliberately altering the model to produce equal results despite unequal patterns in reality is not the elimination of bias, it's introducing bias.

Is the goal of the AI model to predict crime rates in a hypothetical world where everyone has equal rates of lead exposure? Or is the goal of the model to predict crime rates in the real world?

The answer entirely depends - "Deliberately altering the model to produce equal results despite unequal patterns in reality" is so vague as to be meaningless in this discussion.

> Is the goal of the AI model to predict crime rates in a hypothetical world where everyone has equal rates of lead exposure? Or is the goal of the model to predict crime rates in the real world?

The goal is to use the results of a model for something (pet peeve, I hate the use of "AI" to describe what are usually pretty standard statistical or ML models). The model you create, and how you apply/interpret it, depend entirely on what you're actually trying to accomplish or change with the results.

Depending on what that is, the kind of "bias reflection" we're discussing is hugely problematic.

And that "something" we want the AI to do is usually grounded in the real world in some way, be it shopping patterns, or crime. And the real world has disparities.

For example, crime rates are not equal between men and women. If we force our AI to assign equal risk of crime to men and women then we will have introduced a bias that either under-predicts the rate of male crime or over-predicts the rate of female crime.

"Truth" is irrelevant and academic.

What matters is: What are the outcomes and consequences of active systems, AI or not.

For instance: How does the algo cope with derivatives of its own output being fed into itself as input at a later stage?

If truth is irrelevant then just return random output and call it a day.

The reality is, truth is relevant and sometimes the truth is inconvenient. Tech workers may want to build an AI that measures risk of recidivism that produces uniform risks across race and gender. But the truth is, rates of recidivism is not the same across all groups. If we produce the desired outcome of equal reporting of risk, then the consequence is that men have their risk underreported to put them on parity with women, or vice versa.

It depends on what you're using it for and why. If you're concretely distributing police resources, you probably want short-term prediction of actual aggregate crime rates (but also want to consider the risk of overpolicing based on crime rates only recorded higher because of historical overpolicing). On the other hand, if you want to understand anything about the populations involved on some abstract level, merely plugging in predicted crime rates based on historical data won't help at all.

This comment chain is in reference to law enforcement and the justice system. I would hope that our law enforcement and justice systems are operating on real world data, rather than a hypothetical world's data.

Yes, law enforcement should use this model to determine where to assign police resources to minimize crime.

The justice system should absolutely not use this system for any purposes, since justice is based on the circumstances of the individual case in front of it, not the societal statistics which apply in the aggregate but may not apply to that specific case.

> Yes, law enforcement should use this model to determine where to assign police resources to minimize crime.

And if an activist engineer deliberately biased the model to avoid indicating disparities in crime, then we will have sabotaged the police's ability to allocate resources. Hence why this assumption that disparate outcomes are indicative of a biased model is a problem.

You're still not getting it.

We already know where crime is occurring. We don't need an AI model for that.

People aren't arguing about not using biased data, they're arguing that the model needs to be designed and trained so that the bias in the data doesn't affect the predictions in the model. And yes, that means deliberately de-engineering bias out of the model, which may involve introducing a counter-bias.

For example, you and others kept bringing up race earlier as a legitimate bias for criminal profiling. But socioeconomic status is far more correlated with propensity to commit criminal acts than race. A model of crime in LA based on race, for example, would assume that people in Ladera Heights are just as likely to commit crime as people in South LA because they have the same race...but Ladera Heights has a fraction of the crime as South LA (and several times the average income). Similarly, you would expect South LA to have less crime than the largely Caucasian Joshua Tree or Fontana...but both cities have higher crime rates than South LA, and for a period were some of the most dangerous cities in California. (Joshua Tree was the inspiration for, and original setting of, Breaking Bad. Fontana used to be known as the Felony Flats.)

No, I have repeatedly and explicitly stated that protected classes like race should not be used as inputs into any sort of model. The notion that I have said that we should use race as an input to a model is completely false.

What I am saying is that pointing to disparities in the outcome of these models to claim that the models are biased is not, on its own, a reasonable conclusion. As you point out, people in Joshua Tree and South LA have higher rates of crime than average. So if our model flags people from these areas as high risk more frequently than other places, is our algorithm biased? If we deliberately make the model produce uniform results across different locations because an engineer feels that it's problematic to have a model that produces different results between different geographic cohorts, then have we mitigated bias? No, that engineer intentionally introduced their own bias to make the model adhere to their worldview.

I would absolutely not want the justice system to use historical statistical information without considering factors like lead exposure and past overpolicing, and even pure boots-on-ground distributions of law enforcement resources should take into account historical over- and under-policing and its effects on the statistics being used to distribute those resources.

Okay, that's your position. Plenty of people do not want an AI that, say, gives parole to someone that is a high risk of re-offending because someone engineered the AI to be more lenient towards people who grew up in areas with greater lead exposure.

Consider if someone grew up in an area of high lead exposure but was, one way or another, protected from it as a child [1]. Should they be treated as having the same recidivism risk factor as someone who suffered the full effects of lead exposure? If not, how do you separate the effects of lead exposure from other factors to recidivism rates?

[1] See the 1999 documentary Blast from the Past.

The better approach is to define, explicitly, what inputs are going to be used. Some, like race and religion, are not going to be used, as discrimination on the basis of race or religion is off limits. But age, past offences, and the like probably are.

But I think we're diverging considerably from the original point: that forcing an AI to produce equal outcomes despite unequal behavior in the real world is not the elimination of bias, it's the deliberate introduction of bias. If we have an AI that predicts recidivism rates, and we engineer it to produce equal predicted rates across all groups despite different rates between groups in the real world, then we are deliberately introducing bias. The truth, regrettable though it may be, is that a magical AI that operates with 100% accuracy - the only people it flags would have re-offended - is going to produce disparities because recidivism rates are not equal.

I'm sure an algorithm looking at the current state of the world would surmise Europeans are geniuses when compared to the rest of the world population. Totally ignoring the $150+ trillion of wealth stolen at gunpoint over the last 2+ centuries.

The blindspots end up being extremely problematic.

Can you elaborate on how European imperialism is related to whether or not we should take lead exposure into account during parole hearings in the United States?

>Deliberately altering the model to produce equal results despite unequal patterns in reality is not the elimination of bias, it's introducing bias.

Statisticians refer to this process as "controlling for confounding factors." What really matters is what questions you're asking. Data is too often abused, not always intentionally, by people with vague questions.

You're missing the point when asking the final question. Predicting crime rates is not the goal in itself (unless you're running some kind of crime rate bet).

If you're going to use a model trained on simple, biased data, to get, say, insurance estimates for a for-profit company, the model will probably successfully increase profits, so it was a good model.

On the other hand, if you're going to use the same model to help with sentencing, where your goal is to see equality and justice, then the model will do very badly, since it will punish many people for the community/skin they happened to be born in.

The problem is, some groups do commit crimes at higher rates than others. If we are engineering the model to produce outputs with no disparities when disparities do exist in the real world, then our model is going to be biased.

For instance, men commit more crimes than women. If we are building an AI that predicts risk of committing crime (say, estimating rates of recidivism) and we forcibly make it report equal rates between men and women then we will be creating a discriminatory system because it will either under report the risk of men or over report the risk of women in order to achieve parity. Engineering parity of outcome in the model when the real world outcomes have disparities necessarily results in bias.

Is it introducing bias or is it introducing appropriate weighting? The most crime filled neighborhood in America may not be the inner city you have pictured in your mind but Lower Manhattan.

Is the goal of the AI model to predict crime rates in a hypothetical world where everyone has equal rates of lead exposure?

What a terrible example. There are many other features being omitted that would predict crime rates. If anything, this is an example of enforcing your own bias on the model by not including all relevant features.

Great then go ahead and include additional features. The problem is, recidivism rates in real life are not equal so as the model increases in accuracy it will inevitably produce disparities in its results.

If you already know what you want the model to say ahead of time and are tweaking it to fit that narrative then there is no point in creating a model in the first place.

You are correct about the distribution of certain crimes, but you've entirely missed the point.*

Problems like violence have multiple causes, but are widely understood to be linked to problems of poverty, inequality, and marginalization. Violence is also self-perpetuating through social networks. When an incautious user of statistical tools fails to investigate the causal story behind such findings, they're going to get the wrong answer. When a policymaker acts on incomplete or misleading findings, they're going to make the problem worse.

*Unless you're arguing that it's the fact of being non-white which makes people more violent, in which case I have nothing to say to you.

> Unless you're arguing that it's the fact of being non-white which makes people more violent

This is not how I interpreted OPs comment at all. I just read it as a note that data shows correlation between minority communities and crime. Minority is a flexible, relative term, to my understanding - which, notably in recent years is frequently attributed to "non-white", but over time has been attributed to many different groups of people (caucasian as well).

Interpreting it as meaning "non-white" in this context is yet another good example of our predisposed biases and relativistic understandings at play, I suppose.

Not to harp on you - just noting the differences in our perceptions.

I totally understand, and maybe I shouldn't have included that, but I do see phrases like "It's unpopular to talk about" get used as dog whistles all the time in other forums (and in real life). There's a whole subculture dedicated to dressing up racism in pseudo-scientific language (race realism etc.)

At the end of the day, there's enough real, vile bigotry out there in our society, I think it's important to be extra clear when discussing topics like this.

I'm glad you did. It's a great example of the problem we are discussing.

I agree, there are many negative biases in the real world. More so, I'd argue, on the internet, where people feel safer at a distance to state disagreeable opinions.

It's worth noting, especially when we are discussing the nature of how an AI learns.

There is more crime in poor communities everywhere, and minority communities are more likely to be poor. Moreover, the sorts of crimes that poor people commit are more likely to be prosecuted, and minorities are more likely to be prosecuted for them.

Whether or not the differences in society are the product of bias is it's own can of worms. But what you're writing is demonstrative of what I'm saying. People, unlike machines, tend to regard differences between groups with suspicion and are biased against acknowledging these differences. Machines, on the other hand, do not harbor these biases and build a model that reflects the reality we live in.

As I wrote in my original comment, there are potential justifications as to why we might want to introduce bias in our machine learning models to prevent them from identifying things we don't want. But the point remains: building a model that, for example, associates technology with men and women at equal rates despite the disparity that exists in reality is not the elimination of bias. It's the deliberate introduction of bias to make the model produce a result that is in line with what we consider a more ideal worldview.

> People, unlike machines, tend to regard differences between groups with suspicion and are biased against acknowledging these differences.

I would argue that the case is precisely the opposite. I was literally talking to someone from Colombia yesterday who was complaining to me about how all Venezuelans are lazy criminals that are stealing their jobs, when in an American context, people wouldn't even distinguish between them -- they'd both be 'hispanics' or 'latinos'.

That person took a difference caused by a temporary political and economic situation and applied it to a population of people as an inherent property. I think that is actually the way that _most_ people think, and to make people think otherwise requires a lot of education.

Sure, I suppose I should have qualified that statement to refer to left leaning Americans (which are the primary demographic in US tech companies). Among these circles it is not socially acceptable to acknowledge disparities between groups, particularly when those disparities are related to demographics considered marginalized.

This narrative is your own though..

(Not saying whether it's "true", "false" or 78% likely here)

The narrative is what I have observed and experienced working in the SF Bay area, and it's what the overwhelming majority of my colleagues report as well. You can also see it playing out in this thread, where people are not-so-subtly suggesting that people who point out disparities between rates of crime between groups are white supremacists: https://news.ycombinator.com/item?id=21527985

The intelligent response is to find ways to convey a message without alienating the PC crowd (who incidentally just undermine their own authority when claiming to own absolute truth).

Many people have been repressed their whole lives. It doesn't get better by digging heels in, but sometimes compromise, and not trying to win all wars, may yield improved outcomes.

A big problem is the whole dull-minded "right vs left" duality. Tribal mentality that replaces actual issues and solutions.

empath75 28 days ago [flagged]

I’m going to not subtly at all say that if you’re pointing that out, you’re a racist.

I'm not sure I follow. I state that there is stigma attached to acknowledging differences between groups. Calling people racist for doing so seems to reinforce my claim that it is stigmatized, no?

empath75 27 days ago [flagged]

Some stigmas are well deserved.

>and to make people think otherwise requires a lot of education.

With enough education, you can make most people think just about anything.

Education should be to make people think for themselves, come up with new ideas, inventions, better conventions or just enjoy life more.

That would be nice. Too bad the people with the power to direct the education of the masses don't benefit from that and have no loyalty to the masses either.

>People, unlike machines, tend to regard differences between groups with suspicion and are biased against acknowledging these differences.

That is true for people in the western world that have been heavily dosed with the ideas coming out of the western social sciences in the past 100 years or so, and not really anyone anywhere else, ever.

> building a model that, for example, associates technology with men and women at equal rates despite the disparity that exists in the reality is not the elimination of bias

You're assuming that that disparity is not itself a result of bias in any way.

Maybe it is, maybe it isn't. But this is not relevant to what I'm saying. The goal of the AI is not to associate words in a hypothetical world where there is no discrimination or bias. The goal is to build a model that accurately reflects the real world.

Again, there may be justifications to deliberately introduce bias in the model to produce higher rates of association between women and technology than would normally occur if the creators of the AI so desire. But this is not elimination of bias, this is deliberate introduction of bias to make the model match the creators' worldviews.

> The goal is to build a model that accurately reflects the real world.

This is incredibly reductive - we use models for MANY purposes - prediction, inference, description, evaluation. Until you outline a specific use case for the model, understand where the data it's trained on comes from and how it may or may not reflect reality, and think about whether the phenomena you're measuring are important/sufficient for your application, you CANNOT possibly justify your broad claims.

I'm well aware that models are used for a variety of applications. But regardless of the application, we usually want the model to reflect reality in some way.

Say you have an AI that predicts probability of re-offending for prospective parolees. If we observe that this system is flagging men as high risk more frequently than women, then this is not indicative of bias.

But!, one might say, men are socialized to be more violent and are more frequently recruited by gangs. These factors are being absorbed by the AI. Yes, they are, and this does not make the AI biased. The job of the AI to predict risk of re-offending in real world conditions. Not risk of re-offending in a hypothetical world where everyone has the same environment and experiences.

If we were to manually adjust the model to make it display equal risks for both men and women then that is not elimination of bias. It is deliberate introduction of bias to make the output fit our ideal worldview.

> Say you have an AI that predicts probability of re-offending for prospective parolees.

Already you've made one of the most common mistakes in this area: confusing the label for the latent truth. We don't measure "re-offending", we measure rearrest and re-incarceration, which as I've discussed elsewhere are biased (in the sense of "unjust").

> If we were to manually adjust the model to make it display equal risks for both men and women then that is not elimination of bias. It is deliberate introduction of bias to make the output fit our ideal worldview.

Again, you are using the word "bias" in a vague and meaningless way. In this context, our goals are constrained by the law - in the US we guarantee people certain rights, and prohibit certain kinds of discrimination. In this example, it would indeed be wrong (and illegal) to punish men more harshly just for being men, since gender is a protected class. In the same way, it would be wrong (and illegal) to punish black people more harshly just for being black. In my state, whites and blacks use cannabis at equivalent rates, and black people are arrested 8 times as often per capita - an algorithm looking at arrests will see that and make the problem worse.

You've staked out some sort of naive and bizarre "AI Purity" stance which completely ignores how such models work, and which misunderstands how we use models to learn things and solve problems. You're also mixing up different definitions of "bias". Some facts which you might find interesting:

* Penalization/shrinkage, which is ubiquitous and incredibly useful in many predictive models, is the deliberate introduction of bias to improve prediction performance

* Adjusting for confounding in regression/classification type models is one way of accounting for or removing bias in a statistically rigorous way - as you (imprecisely and inaccurately) say, "modifying the model to make it display equal risks". This allows us to measure effects, conditional on known risks or existing patterns.

* All statistical and machine learning algorithms suffer from sampling bias, where the data we observe doesn't match the reality in important ways - a fact which you're completely ignoring. If a model reflects the bias in a bad sample, why on earth should we accept that?
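As a toy illustration of the first bullet (my own minimal sketch, not tied to any particular library's API): a closed-form ridge fit, where the L2 penalty deliberately biases the coefficient estimates toward zero in exchange for lower variance:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 5
X = rng.standard_normal((n, p))
y = X @ np.ones(p) + rng.standard_normal(n)  # true coefficients are all 1

def ridge_fit(X, y, lam):
    """Solve the penalized normal equations (X'X + lam*I) beta = X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

beta_ols = ridge_fit(X, y, 0.0)     # unbiased estimate, higher variance
beta_ridge = ridge_fit(X, y, 50.0)  # biased toward zero, on purpose
print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols))  # True
```

The shrunk estimate is provably biased, yet it often predicts better out of sample; "biased" and "worse" are not synonyms in statistics.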

> If we observe that this system is flagging men as high risk more frequently than women, then this is not indicative of bias.

If all it's doing is associating being a man with a higher risk factor without taking into account anything else in that man's life then, yes, it's being biased. "Being a man" in this situation is correlated with, not a cause of, the actual factors making a person more likely to commit crime.

This is a real problem that has been happening for years in Florida: the automated risk assessments used for a variety of decisions, including bail and sentencing factors, labels black people as inherently being more likely to reoffend than white people, even after controlling for actual crime rates and recidivism rates. https://www.propublica.org/article/machine-bias-risk-assessm...

That propublica article is the perfect example of the many aspects of the discussion here. It argues just one side of it, but the scenario it raises is one where an ML algorithm correctly found a solution to a badly framed problem.

Briefly, (and simplifying for clarity) it worked like this: the ML algorithm scored the risk of criminals reoffending. It gave a "high risk" score if someone was 80% likely to re-offend, and a "low risk" score if they had a 20% chance. Out of 100 blacks and 100 whites, here are the scores the ML algorithm gave, and how many ultimately did re-offend:

    100 black criminals
      * 50 "high risk", 40 re-offended
      * 50 "low risk", 10 re-offended
    100 white criminals
      * 10 "high risk", 8 re-offended
      * 90 "low risk", 18 re-offended
On the one hand, the algorithm was completely correct: 80% of high risk individuals re-offended, and 20% of low risk individuals did, and this is true even looking at just the black or the white criminals. It was unbiased according to its goals.

On the other hand, the failure mode disproportionately punished the black criminals: 10 were "high risk" who never re-offended, while only 2 white criminals were. Meanwhile, 18 "low risk" white criminals did re-offend while only 10 black criminals did. So the score was more strict for some unlucky blacks and more lenient for some lucky whites.

However, a key point is that because the underlying re-offense rates were different between the populations (50/100 for the black criminals, 26/100 for the white criminals), the algorithm could not have done otherwise. That is, given some n% re-offenders, if you have to fit them into 20% and 80% buckets, your "high risk" and "low risk" counts are mathematically fixed. In other words, it wasn't the ML by itself that was the problem, but the COMPAS score that it was trying to compute that had these issues inherent in it.

I think this is a good example where ML wasn't biased (at least, not anymore than reality), but where people were too eager to turn to it for a poorly considered project. By wrapping up important questions in high tech algorithms, it's too easy to fool yourself that what you're doing is the best thing you could be doing and miss problems in the fundamental framing.

1) Using skin color (or a proxy for it) as the basis for judgement or systematic discrimination is illegal, i.e. punishable.

2) The whole point of systems is to impact the world, not replicate already recorded history or escalate problems.

Can the ML algo predict what the outcomes and consequences of its outputs are, and assess that? No?

The point is that the algorithm wasn't using skin color as an input, (directly, at least) and got those outputs anyways.

Would you prefer an outcome where half of the white men were also marked as high risk, but with 16 reoffending? (to give the low risk pools the same recidivism rate) Now instead of unjustly marking ten black men and two white men as high risk, you're doing so for thirty-four white men. Is that somehow less discriminatory?

Just because another option is worse doesn't make an unjust and repressive kind of social profiling better. Using prediction at all may be found to have dire consequences, so the entire premise is dubious. There is no binary choice in this matter, but there needs to be accountability.

And if the decision isn't based on race, but on things that happen to correlate with it? Whether your father was present, for instance - since that's a major predictive factor for all sorts of life outcomes even when controlling for race. Should that not be used just because certain groups are more likely to come from single-parent households than others?

Because in the end, the choices are "accept that there will be a correlation between the outputs and race", "use no system that produces estimates or imperfect outputs", or "explicitly discriminate based on race to remove the race-result correlation". That's it.

It depends on the crime. But for crimes like, say, sexual assault being male is indeed a huge factor in likelihood to offend. Disparities between rates of women committing sexual assault and rates of men is immense - one or two orders of magnitude depending on the exact crimes counted.

I agree that an AI shouldn't use race or gender as an input. But we should not be surprised when AIs show disparities when predicting risk of committing crimes, when there are disparities in the rates of actually committing crimes.

AI does not, in and of itself, have a goal. AI exists to be used for some human's goal. If the human's goal is to understand the level of association between women and technology without confounding factors like decades of marketing video games and computers to boys more than girls, the historical bias of such factors in the data fed to the AI becomes extremely important.

This is a common misunderstanding. When models impact the world, what matters more are their outcomes and consequences.

Right. And when you force a model to avoid disparities when disparities really do exist, the outcome is inaccurate data and the consequence is that decisions are made based on inaccurate data.

What people in this thread seem to be missing is that modeling and decision making do not have to be done simultaneously. You always want the most accurate model of reality as you can get, but what you do with that model is up to you.

Perhaps you think there are certain feedback loops that need to be broken (your model is usually a static representation of the reality), or perhaps you prefer low rate of false negatives / false positives, or perhaps if a model is uncertain you would like to defer to a human.

If a model predicts high probability of re-offence, you might decide to delay an action as an example, or re-examine the case in more detail.

I get a feeling that machine learning practitioners are not properly trained to recognize such subtleties, everything is a binary softmax output these days :)

EDIT: This seems to stem from the obsession with purely discriminative models that either directly model a binary function y = f(x) or model a function that returns some kind of score, which is later compared to a threshold y = f(x) > theta. Neither of these lend themselves nicely to this conceptual separation between modeling and decision making.
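To make the separation concrete, here's a minimal sketch; the threshold values and action names are illustrative assumptions, not from any real system:

```python
def decide(p_reoffend, low=0.3, high=0.7):
    """Turn a model probability into an action; the thresholds are policy, not modeling."""
    if p_reoffend < low:
        return "release"
    if p_reoffend > high:
        return "detailed review"
    return "defer to human"  # uncertain band: let a person decide

# The model's job ends at producing p_reoffend; what we do with it is a
# separate, explicitly chosen policy that can encode our tolerance for
# false positives vs false negatives, or a preference for human review.
print(decide(0.1), decide(0.5), decide(0.9))
```

Changing the thresholds changes the decision policy without retraining or "hacking" the model itself.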

Sure, and there is also the assumption that good decisions are made on perfect/good data, which is also just an assumption. We can't even define good decisions well enough to be sure.

Just because bad data may tend (in hindsight) to yield inaccurate decisions, or decisions based on faulty data, doesn't promote data as the main driver behind sound decisions. That would just be another belief system, and we don't have all possible data or omniscience to prove it!

No, you can test and gauge the quality of that data. At this point, you're not really even making a point about whether or not we should introduce biases in our models to make them match our worldview - you're saying the entire endeavor of trying to build a model is worthless. If what you're saying is true, that we can't measure which decisions are successful or unsuccessful, then there's no point in trying to build a model because we have no way of knowing whether the model is working or not.

It is the claim of vendors that their models work. It is irresponsible to just take their word for it, and such tests and critiques need to be ongoing.

Hypothetical question. Suppose no-one ever noticed the lead in the Flint water supply. People put test scores into an AI and the AI emotionlessly concludes that people from Flint perform worse than average. Should you hack the AI to force it to stop saying something like that, because that's obviously racist? Or should you begin to wonder why people from Flint perform worse, and what is causing that (the undiagnosed lead in the water), and ultimately how to correct the problem (fix the water)?

Incidentally, some actual number crunching [0] shows that the lead levels in Flint were never that high, being less than the average level in many states, and like everywhere else in America had been declining significantly over decades. What's more, in the era of leaded gasoline, almost all children had blood lead levels about 10x what was seen in Flint.

0: https://www.nytimes.com/2018/07/22/opinion/flint-lead-poison...

As always, it depends entirely on what you're doing with the (predictions/estimates/inferences). This is an entirely false dilemma.

When would you want to hack the AI?

Not a rhetorical question. Reading through the comment thread I'm slowly shifting more towards being on the fence, as opposed to having a "don't hack the AI" position.

But rather than hacking the AI, why not just get rid of the AI? What is the point of the AI in the first place if you're going to hack it to get the results you want anyway?

> When would you want to hack the AI?

This is not a meaningful phrase - it adds literally nothing to the conversation but confusion. To bend over backwards to give you a reasonable answer: when you're interested in conditional effects.

Let's say you're interested in the risk of cancer associated with alcohol consumption. People who drink some are often found to have lower cancer rates than people who don't drink at all. Reasonable models adjust for wealth/income - estimating the risk of cancer association with alcohol CONDITIONING ON wealth/income changes the picture; the positive association is clear.

Adjusting for confounding is "removing bias", changing the effect estimates.
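A toy illustration of that flip (all counts invented): within each wealth stratum drinkers fare worse, but the pooled, unadjusted comparison makes drinking look protective.

```python
# Toy, invented counts illustrating confounding (Simpson's paradox).
# stratum -> (drinkers, drinker_cases, nondrinkers, nondrinker_cases)
data = {
    "wealthy": (800, 40, 200, 6),
    "poor":    (200, 40, 800, 120),
}

# Within each wealth stratum, drinkers have the higher cancer rate:
for stratum, (dn, dc, nn, nc) in data.items():
    assert dc / dn > nc / nn, stratum

# But pooled (unadjusted), drinking looks protective, because in this toy
# population the wealthy both drink more and have a lower baseline risk:
pooled_drinker_rate = sum(v[1] for v in data.values()) / sum(v[0] for v in data.values())
pooled_nondrinker_rate = sum(v[3] for v in data.values()) / sum(v[2] for v in data.values())
print(pooled_drinker_rate, pooled_nondrinker_rate)  # 0.08 vs 0.126
```

Conditioning on the confounder (comparing within strata) recovers the real association that the pooled numbers hide.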

In a predictive context, using ensembles, NNs etc, the problems don't go away, they're just sometimes harder to detect (and they're dressed up in sexy marketing-speak like "AI").

Repeat after me three times:

"AI is not a magic truth telling oracle" "AI is not a magic truth telling oracle" "AI is not a magic truth telling oracle"

I meant in my example about undiagnosed lead in Flint. Once you notice that the AI detects people in Flint under-perform, why would you want to modify the AI to avoid the AI detecting that?

Your cancer example doesn't answer the question, which is a pity because I mean it when I say it's not a rhetorical question. I honestly want to understand your point of view better. In the cancer example, we wouldn't go in and filter the training set to force the AI to think that rich people have higher cancer than they actually do, etc. But that's precisely what it seems like people want to happen with AI: they don't want us to condition on wealth/income, rather, they flat-out want us to filter the training set to force the AI to think group A and group B are equal when that's not what the data says.


Isn't "biased in a way we don't expect", itself, a bias?

Unless I'm misunderstanding you, you're saying the AI is biased, and people are biased. Unbiased is an AI that will not miss an outlier.

My field is medical imaging. The most difficult thing to get the AI vendors that come calling on us to understand is that being able to recognize well known patterns in data is of limited use to us. That's what the radiologists at remote sites do all day. It's being able to recognize pathologies that 99.999% of rads would definitely have missed that has value. And the AI's always fail this test.

I've come to believe that "bias" is the primary reason these AI's perform so poorly. They are basically "biased towards the known stuff" in layman's terms. At least in my field, we need AI's that are able to make connections that humans cannot. I've been going to RSNA for decades now, and I would say we haven't seen a single AI that could be said to pass this kind of a test yet.

I've been working in data science or in academia for most of the last decade and would argue that the primary benefits of using statistical models for automation are that they are fast and cheap compared to a trained professional, which frees those trained professionals up to spend more time on the difficult judgment calls. I've seen very few examples of superhuman performance on complex, practical tasks (although it is very satisfying to train a model to that level).

> And the AI's always fail this test.

This isn't quite true as it has been successful in limited scopes since the 90's (i.e. well before the existence and rise of deep learning). See for example microcalc TP rates on CAD breast screening that was beating average radiologists back then, but not on general screening. This is somewhat specialized though, and there are important constraints to screening also, as opposed to diagnostic radiology.

But you are right in an important way - it isn't a feature for the current throw-a-ton-of-data-at-a-deep-model approach that a lot of "AI companies" are built around, and I agree they are mostly knocking on the wrong doors in medical imaging.

Importantly, if you are trying to replace people at a relatively simple task and make it cheaper your approach will look much different than the case you describe, where you are trying to learn unusual configurations and long-tail sort of things.

ML in medical imaging is complicated by a lot of factors, but people that think a little transfer learning and a few hundred case reports is going to get somewhere useful are being incredibly naive. Without exception, all the promising work I have seen has involved painful and expensive labeling and a lot of modeling and pre-processing effort.

> I've come to believe that "bias" is the primary reason these AI's perform so poorly. They are basically "biased towards the known stuff" in layman's terms. At least in my field, we need AI's that are able to make connections that humans cannot.

AI, at least current AI, cannot do this. I'm really confused at the way you say AI perform poorly because they're biased towards "known stuff". We train AI on known data. AI being biased towards "known stuff" is fundamental to the way our current AI technology functions.

You really sound like this topic of bias in AI is new to you. Here’s a real world example: a medical school used to be male-only. Recently they began admitting women. More recently, they began running “AI” trained on past student applications to figure out which applications were likely to be successful students (where successful == graduating normally). Not surprisingly at all, it turned out that being female wasn’t associated with being a good student there, because it was an uncommon attribute of previous students.

This is a poor result. Would you argue that the model should not be adjusted?

The field of AI is not new to me. Quite the opposite, I find that many people in this thread are making fundamental misunderstandings about what AI and machine learning are about. Hence the commenters like the one above which say AI is limited because it is biased towards past data, which sounds ridiculous to anyone that works with AI because using past data to train the model is fundamental to how AI works.

Yes, if people feed in a bad data set the results are going to be bad. Like how a pentagon project tried to use AI to classify Russian vs NATO vehicles. The photos of the former were taken on a sunny day and the latter on an overcast day. The result was that their model was just detecting the average brightness of the input.

But what happens when the data is valid, but people don't like the results because it doesn't match their worldview? Altering the model to match one's world view is not elimination of bias, it's deliberately introducing bias.

In fairness here, we're not talking about data that doesn't match some world view or some such nonsense. This is diagnostic imaging. The data just doesn't have any world views. It's a tumor, or not. It's malignant, or not. The stenosis calculation is what it is. None of these things are liberal or conservative. They don't have a narrative.

So when we're evaluating these systems, (again, systems that always fail due to bias), our only concern from a diagnostic viewpoint are patient-centric metrics. The most important being improving outcomes.

By way of illustration, let's suppose a woman goes for a mammogram. Now a radiologist and these AI's can all identify tumors in the resultant DX study at about the same point in said tumor's development. This leads to our current 2 and 5 year survivability rates. Consider thoughtfully here, the name of the game is not liberalism, or conservatism, the object in our game is to improve those survivability metrics.

Here's the thing though, due to the fact that the AI is biased towards things we already know, using it doesn't allow us to identify problems any earlier. Being able to spot issues earlier would improve 2 and 5 year survivability metrics, but current state of the art AI's won't do that. This is the problem in a nutshell.

The problem is not that the resultant data does not fit the "narratives" of our radiologists.

I'm not certain you understand that this is not a political argument. The AI's that vendors are attempting to apply towards problems in this domain simply don't improve healthcare outcomes. If they did, we would aid in deploying them. Without hesitation.

Now my assertion is that these AI's fail to improve healthcare outcomes due to the bias inherent in the data they use to train on. Again, that's not a political statement. It's just fact. The AI's seem mostly trained to internalize, and replicate as output, all of the well known patterns in the data. This is why these AI's are of severely limited use.

I don't think you're understanding the point I'm making. I agree, we should make our models as accurate as possible. But this view is not universal. There are some who point out that prioritizing accuracy may create disparities in the results of the model. A model that is meant to predict recidivism will flag African Americans more often than whites and men more often than women, because the former of these groups re-offend at higher rates than the latter.

Some see this fact as problematic, and advocate that we should deliberate bias the results in order to create a more equal outcome. And the arguments of whether or not we should bias our models are frequently political in nature. The complaints that AI is biased is frequently because it produces results that some people find are problematic, but the underlying pattern it's identifying is true. It's really that the AI matches patterns without our cultural sensitivities about what topics we're supposed to tiptoe around.

Well again, you're making political arguments.

The AI results are the way they are, because the data that is input tells the AI to spit out those results. In other words, the AI isn't telling you anything that a human didn't tell it to tell you. The valuable insights are insights that no human saw coming.

So let's take your industry as an example with which you can maybe better relate. Humans give the AI data that says that africans and men have higher recidivism. The AI then spits out results that says that africans and men have higher recidivism. Basically spitting back the information that we already gave it. OK, so far, so expected.

Now here's the thing, the valuable insight would be for the AI to suss out some counter intuitive result that no one would ever have gotten. The result that this particular african, or this particular man will have lower recidivism. That's the valuable insight. That insight saves the enterprise money that the enterprise would not have saved otherwise.

My point is that current AI's can't do that, which makes them not terribly useful. At the moment, they just parrot what any human would tell you anyway. So why pay 2 mil for that? Can you see what I'm saying? This problem is even more acute in diagnostic imaging, because the AI is not improving outcomes. So what, exactly, would we be paying 2 mil for? It's absolutely problematic for me to go out and put an enterprise on the hook to the tune of 2 mil for a black box that sits in a room, looks cool, and tells us things we already know. (In fact, the box would tell us things we told it to tell us.) We can save the money, buy a 30 dollar digital recorder, and have it play the doctors' diagnoses back to the doctor after every patient meeting if that enterprise is really that much into the parroting.

My problem with the bias in the AI's I've seen is not political, it's practical. This stuff, as it is, is not terribly practical because it is too biased to offer any truly unique actionable insights that add value which was not there previously.

At this point we are diverging almost entirely from questions about whether AI is biased, and more about whether AI is cost effective.

The allegations of bias in AI are frequently about inter-group differences in things like job advertisement, credit ratings, parole decisions, etc., which are not explicitly political, but the allegations are often political in nature.

Sure, there's not much controversial about predicting tumor growth. But are the allegations of bias in AI about models predicting tumor growth? No, they're usually about the topics I listed above.

That's true, a lot of social activists are very much in favour of putting biases into systems. That's also pretty much what most gov policy is: various competing special interests fighting to intervene in various systems in a favourable direction.

A danger of this view is that it can turn a descriptive view of the world into a normative view of the world.

Suppose an AI does detect that men are more technical, on average, than women. It would be a colossal mistake to then assume that our society should be a society where men are on average more technical than women based merely on the fact that historically that has been the case.

Would this sort of AI have entrenched past social evils, such as slavery, if in the 1860s it classified people according to the data available at the time? The fact that an AI classifies things accurately given the statistical data at the time should not be confused with giving a normative claim that that is the desired state of affairs of the world.

> Is the problem that AI is biased? Or is the problem that AI lacks the biases that we expect people to have?

Neither. The problem referred to is that if the data sets contain biases then the machine learning will learn them.

Sure. But what it means to "contain biases" is a very ambiguous statement. If, say, the model was fed a dataset with an underrepresentation of women in tech, that would be a biased input. But if the AI was fed an input with 20-25% women in tech, which is a representative sample, and the AI associated men with tech because men make up 75-80% of people in tech, that's not a biased sample.

Sure, some may point out that biases in other parts of society may be part of the reason why tech is 70-80% male, but the AI itself is not biased. If we were to engineer the AI to produce equal results despite unequal input we are engineering biases into the AI, not removing them.

It really isn't ambiguous. What is less clear is what to do about it, and where that even makes sense.

There are a few meanings of bias that are important to keep clear when discussing things. (1) There is the statistical sense of a biased estimator - one with a consistent trend in its error. (2) There is the notion of bias introduced by data sampling. No matter how perfect your algorithm is, if the training data is a poor sampling of the general population you are targeting, you are likely introducing systematic bias (for example, early face detection approaches had their best performance on Caucasian, male, college-aged faces - people were using the data it was easiest for them to collect[1]). Finally, (3) if the data you have access to has encoded a systematic bias, even if the first two have been avoided you are at best able to reinforce that bias.

Here we are mostly talking about the latter one, and the problems being encountered with it (and a bit of 2). This is exacerbated by a combination of machine-learning and AI people being fairly unsophisticated about data on average (as opposed to data handling), and popular techniques these days (specifically, deep learning) de-emphasizing feature design, sometimes making it harder to see what is happening.

Nobody serious I have seen is advocating engineering AI outputs to force particular outcomes.

For the sake of argument let's assume there is in fact an over representation of men (compared to women) in technology relative to capability and desire. And that we are designing an AI to make or aid hiring decisions for entry level jobs. The answer then is not to engineer in a quota for women applicants, but to remove gender entirely from the training and evaluation inputs. This would give you exactly the desired outcome, no?

You refer to the type (2) bias issue if we systematically under represent women in tech in this case, which would be a problem. However the article is focused on the type (3), which is not merely an issue of what is in the data set, but what you are trying to do with it.

The deep issue here is that deep learning approach of throw everything at the inputs and let the network sort it out will capture both (2) and (3) types of biases whether or not we are aware of them. At least in areas where there is very objective proof of potential problems in the historical data (e.g. redlining impact on mortgage decisions) we could get ahead of it and normalize the inputs. But what about areas where it is less clear cut or more contentious?
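A toy simulation (all numbers invented) of how this kind of bias survives even when the protected attribute is removed from the inputs: a correlated proxy feature leaks it back in.

```python
import random

random.seed(0)

# Invented toy population: gender correlates with a hobby feature.
people = []
for _ in range(10_000):
    gender = random.choice(["m", "f"])
    p_gaming = 0.7 if gender == "m" else 0.3
    hobby = "gaming" if random.random() < p_gaming else "other"
    people.append((gender, hobby))

def score(hobby):
    # The "model" never sees gender, only the hobby proxy.
    return 1.0 if hobby == "gaming" else 0.0

def mean_score(g):
    scores = [score(h) for (gen, h) in people if gen == g]
    return sum(scores) / len(scores)

# Despite gender being absent from the inputs, the outputs still differ
# by gender, because the proxy feature carries the same information.
print("mean score, men:  ", mean_score("m"))   # ~0.7
print("mean score, women:", mean_score("f"))   # ~0.3
```

Dropping the protected column is necessary but not sufficient; any sufficiently rich feature set will reconstruct it.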

There is the fact that humans fail in exactly the same way. If I'm making the decision to hire, it's very difficult for me to avoid my own biases. I think one of the things people are concerned about with applying ML techniques to things like this is it gives the appearance of being more likely to be unbiased, and potentially provide cover from those who would like to benefit from it.

I suspect the real answer is to start thinking about these things the same way we have learned to think about security and cryptography. In other words, for serious work a system is not considered ready for prime time until real professionals have tried to find the weak points and break it.

It is also worth noting that this is not necessarily a problem so much as an operational feature of these approaches that one should be aware of when designing and using them. And in some applications it is probably a very significant problem.

[1] one of the interesting thing about this and similar problems is how completely predictable it was, and how surprising it was for practitioners to discover it. In this way the ML community needlessly recapitulated lessons learned by other disciplines decades earlier.

Your definition of "AI" seems to be "summary statistics" if all it's doing is saying "here's the current distribution." Men vs women seems like a pretty unsophisticated and crude feature.

If AI's going to be useful it would be able to find out deeper underlying things, faster than we can ourselves. If bias exists in people today, it's going to skew outcomes, so looking at stats on current outcomes alone isn't going to tell you much. An AI that's blind to conditioning and upbringing and everything else that's part of society is not a particularly useful AI, you could replace them with any average person off the street's naive "common sense."

> you could replace them with any average person off the street's naive "common sense."

If only... An AI with common sense is an AI researcher's dream come true.

AI is biased. It is biased towards the past. We feed data from the past into our statistical models and "AIs" and then present the results as though it's what will happen in the future. But the future will not be like the past, not exactly, and thus we have bias.

(As a philosophical question, why do we expect the future to be like the past? Well, because that is how things have worked in the past. And thus we arrive at a nice circular argument. :)

People love the idea of being data-driven, right up until the point where they feed the data in, and what comes out the other side doesn't match their preconceived notions of what it ought to show. Then they start poking and prodding the model, and slicing and dicing the data until it rounds into what they wanted to see.

More broadly, if AI enforces the status quo, then it becomes an enemy of progress. Or rather, it risks becoming the last progress we make.

So far we aren't doing a very good job of coming up with ways to live in harmony with AI, instead of in an antagonistic relationship. We can do better than this.

You seem to be trying to very politely slip in the claim here that some inherent difference between genders is why there are more men 'technologists', but that percentage varies wildly between different countries and companies.

It does not vary wildly between countries. The overwhelming majority of countries fall into the range of 10-30% of tech roles filled by women and the remaining 70-90% by men. I don't think that a wide variety of countries falling into a band of ~20% is what most people would characterize as "varies wildly".

Regardless, the cause of the disparity is not relevant to the discussion at hand. The real world population of people involved in technology is disproportionately of one gender, so an unbiased machine learning model meant to associate terms with genders will associate technology terms with men at higher rates. Sure, some may claim that the reason why disproportionately more men go into technology is because society is biased. But the AI's role was not to model a hypothetical world with no biases. Its role was to model the real world.

As I wrote in my root comment, some may see value in biasing their models to achieve results that fit what they see as an ideal worldview. But the point remains: to do this is to deliberately introduce biases, not remove them.

You don't think that the difference between a 1:9 ratio and a 1:3 ratio constitutes "wide variety"?

That's a proportional measurement. A small difference in an already small value can result in large proportional differences even when the absolute difference is small. This is sort of like how small towns might see their murder rate skyrocket by 300%, but in reality what happened was that there was one murder last year and three murders the following year.

Sure, but 30% of the population is not a small value, either.

Nearly all countries have rates of women in tech in a band between 10 and 30%. No country has a majority female tech workforce. The notion that there is a "wild variation" in the rates of women's representation in tech is simply not true.

I’m not sure how this happened, but somewhere along the line the claim that there are no inherent differences between genders became less extreme than the claim that there might be some. You’re not making the former claim, but I see it all the time and it’s just as nuts as saying that 100% of observed difference are due to biology.

The difference in treatment is probably because these days the people who talk loudest about the "some" are often trying to use it as a cover to support gender-essentialist defenses of an unequal status quo, while people who talk about "none" are obviously inaccurate in some ways but are for the most part harmless.

Well, I don’t know about you but I like my claims to reflect reality and not some ideal that is touted as “probably wrong but mostly harmless!”

The problem is that the problem of bias is undefined, so there's no correct solution or even authority to turn to. We do see problems with systems taking data as gospel, but when the data itself have undesirable patterns, we're at loss at what to do.

Of course, simply saying this triggers some creativity: You could generate artificial data and train on that. However, the problem is then to generate "unbiased" artificial data, and somehow make it useful! ;-)

So whose vision should be realized then becomes the problem - or the solution, for some subset of people.

Well, a language model is supposed to model the way that language works, not the way that language is used.

It might be a subtle difference, but "My mom is a doctor" is as correct grammatically as "My dad is a doctor". If a language model is assigning a much lower probability to the former, then it's modelling language _use_ but not language _structure_.

So it's not doing the job it's supposed to be doing.
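A toy sketch of that distinction: a usage-based bigram model (trained on an invented, skewed corpus) scores two equally grammatical sentences differently.

```python
from collections import Counter

# Invented corpus skewed by usage: "dad" sentences outnumber "mom" ones 8:2.
corpus = ("my dad is a doctor . " * 8 + "my mom is a doctor . " * 2).split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(sentence):
    # Product of conditional bigram probabilities P(w_i | w_{i-1}).
    p = 1.0
    words = sentence.split()
    for a, b in zip(words, words[1:]):
        p *= bigrams[(a, b)] / unigrams[a]
    return p

# Both sentences are equally grammatical, but the usage-based model
# prefers the one that was merely more frequent in the training data.
print(bigram_prob("my dad is a doctor"))  # higher
print(bigram_prob("my mom is a doctor"))  # lower
```

The model has faithfully learned language use; whether that is the right target depends on what the model is for.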

Humans really don't like to feel like their infinite potential is less than infinite. Pair that with the inherent feedback loop of bias in human-controlled systems and voila! Never-ending cycles of soul-crushing but realistic statistical thinking, and glorious but idealistic "we can be free" thinking.

I am concerned about AI algorithms, and especially recommendation algorithms, in regards to biases. If someone is in a temporary psychological condition, the ad recommendation algorithms will keep reinforcing that biased behavior in order to maximize ad revenue, as long as the viewer is hooked on the site and keeps coming back for more content. The algorithms have little or no sense of what bias they are reinforcing. I wish there was more public debate and awareness from programmers about that. I.e., we have moral obligations regarding what algorithms recommend.

Engineers at Instagram, Facebook dot com, Youtube, and Twitter DO NOT want to think about how they are affecting the mental health of their users. All they want to do is spout some platitudes and wash their hands of all responsibility.

I'm curious if some of these are actual discriminatory biases or just harmless cultural norms. Something like associating men with football can hardly be considered bias.

It goes both ways: even though you remove the sex/gender variables, the system could correlate football to identify men, and soccer moms!

Scientists should be sophisticated computer users who understand the GIGO (garbage-in, garbage-out) principle; that's what should put them on guard with regard to any manner of massaging their data.

the new york times article that this links to:



I hope that AI can be used as a mirror for the engineers working with it. AI allows amplification of the subtle assumptions we make in design, and hopefully that amplification leads to better understanding and appropriate measures to reduce bias.

Less a mirror of the engineers working on it and more a mirror of the data it's fed, which is a really important difference. Nobody thinks "today I'll create a model that hates women", they just aim their shiny new model at e.g. every New York Times article since 1851, then only check for the results they're interested in.

Is there an opening here for some enterprising persons to try to develop systems to try to determine the biases present in training data?

This reminds me of Tay from a few years ago.

>and is more likely to associate men with computer programming

So would any reasonable human. I think the real problem these advocacy groups have with AI models is that they're not biased. They reflect the real world based on data and evidence, rather than conforming to progressive dogma. I fear that instead of using AI to overcome any wrong assumptions we may have, we're just going to get AI diversity officers to "correct" models that draw any uncomfortable conclusions.

> "So would any reasonable human."

That's the entire point of the article, our biases, as reflected in the world, are learned by models. Consider the following:

* More computer programmers are men than women (descriptive statement, no problem)

* A predictive model correctly identifies that more computer programmers are men than women (a prediction based on observed data, no problem)

* A recruiting agency uses a predictive model to recruit computer programmers. Due to the way the model was trained, it excludes qualified women (not OK, a clear misapplication of an algorithm)

Feel free to try this thought experiment in other contexts where existing biases can be amplified through algorithms.
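A toy simulation of the third bullet (all numbers invented): the historical data encodes a higher hiring bar for women, and a model fit naively to that history scores equally skilled women lower.

```python
import random
from collections import defaultdict

random.seed(1)

# Invented history: women faced a higher bar to be hired at equal skill.
def make_applicant():
    gender = random.choice(["m", "f"])
    skill = random.random()
    bar = 0.5 if gender == "m" else 0.7  # the historical, biased bar
    return gender, skill, skill > bar

history = [make_applicant() for _ in range(5000)]

# "Model": estimate P(hired | gender, skill bucket) straight from history.
counts = defaultdict(lambda: [0, 0])
for g, s, hired in history:
    counts[(g, round(s, 1))][0] += hired
    counts[(g, round(s, 1))][1] += 1

def score(gender, skill):
    hired, total = counts[(gender, round(skill, 1))]
    return hired / total if total else 0.0

# Two applicants with identical skill get very different scores.
print(score("m", 0.6), score("f", 0.6))
```

The model's predictions are "accurate" with respect to the historical data; the problem is deploying them as if they measured merit.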

Let's say that there is a population bell curve of the variable "propensity to enjoy being a programmer". One curve for men, and one curve for women. In some dystopian future, everyone takes a career aptitude test, and if an individual falls within the top 10% of the overall population, they get turned into a programmer by the government. This is assuming a perfectly-trained AI judges the variable, and that it controls against the underlying variables of societal bias in order to ensure that long-term human resources are properly allocated, and that long-term biases, informed by people's interactions with each other, trend toward actual biological differences.

It may well be the case that the gender split is 80% / 20% male to female (as is roughly the case today). It may not. However, the left-leaning zeitgeist opinion would seem to be that this outcome is impossible, and current observed differences are only due to systematic oppression. The right-leaning zeitgeist opinion would seem to be that this outcome would make sense.

I tend to think that the left-leaning opinion on this is so wrapped up in double-think that it can't even understand itself- it tends to argue too much in favor of "biological differences would mean permanent, uncorrectable injustice, therefore it is impossible that biological differences exist."

> It may well be the case that the gender split is 80% / 20% male to female (as is roughly the case today)

Ask the same question in 1965 and get the opposite result.

There’s a lot of motivated reasoning by men that their innate characteristics are selected for computer programming. There’s a lot of hostile behavior by men towards women who are in computer science. You can view this like any other resource scarcity turf protection gambit.

It’s impossible to separate any apparent belief in the natural order of male dominance of the field from this behavior, so to draw conclusions from it is extremely dangerous.

> Ask the same question in 1965

To be fair, though, a LOT has changed about the nature of computer programming since 1965.

> To be fair, though, a LOT has changed about the nature of computer programming since 1965.

Has it though? The von Neumann architecture was already a thing back then, programming languages (BASIC, Fortran, ...) already had many of the things you are still using today (such as for-loops), and any algorithm and data structure thought up back then is still perfectly usable today in most modern languages.

Sure, the whole tooling and library situation is not comparable to back then, but the fundamentals haven't really changed.

Consider these things:

- Binary search trees (1960)

- Linked lists (1955)

- Quicksort (1959)

- Hash tables (1953)

Looking back I rather have to say I am not impressed with advances in practical computer programming since then. The only major change was the introduction of type systems and OO imho, though these were technically a thing already back then too on an academic level.
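For what it's worth, most of the items above drop straight into a modern language unchanged; for example, a sketch of quicksort (Hoare, 1959) in Python:

```python
# Quicksort (1959) -- still perfectly idiomatic sixty-plus years later.
def quicksort(xs):
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    return (quicksort([x for x in rest if x < pivot])
            + [pivot]
            + quicksort([x for x in rest if x >= pivot]))

print(quicksort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```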

True, programming is much easier now that you can use words to code, and have programs do all of the actual work of turning that code into usable machine code for you.

Probably the biggest change was the introduction of personal computers, which were primarily marketed towards boys at the time. It should come at no surprise that the result of a huge marketing apparatus appealing solely to one gender would create a generation or two of predominately male programmers.

When you examine the fundamental nature of programming though, not much has changed. In fact things have become significantly easier over time.

> Ask the same question in 1965 and get the opposite result.

Really? Reference, please.

Well, any history book should do.


Women were literally employed as computers circa WWII. When the transition was made to programmable machines, they were largely the first ones to use, operate, program, and design algorithms for them. The field was regarded as uninteresting, tedious, and not real engineering by their male counterparts.

Early programming was viewed as "women's work" because of the similarity between the work and sewing [1]. I don't think it was quite 80:20 women:men in 1965, but it was certainly higher than 20:80.

[1] https://en.wikipedia.org/wiki/Core_rope_memory

> There’s a lot of motivated reasoning by men that their innate characteristics are selected for computer programming. There’s a lot of hostile behavior by men towards women who are in computer science. You can view this like any other resource scarcity turf protection gambit.

That is entirely your opinion. Framing biological differences as turf protection is laughable in my opinion.

You're looking at a societal bias, and inferring a biological difference. In other societies, this bias is much reduced, which would not likely be the case if differences were truly biological.

In particular, Eastern Europe of all places is relatively egalitarian in men-vs-women in programming.

Eastern European here. I rather agree that Eastern Europe is relatively egalitarian, but it doesn't seem to affect the ratio of men and women in IT. Less than 20% of students in IT-related fields are female, and a smaller percentage of women stay in the field after graduation compared to men.

I could just as easily say you are looking at a biological difference and inferring a societal bias. Just because it doesn't exist in the same proportion everywhere doesn't mean that the difference is not biologically based. There are endless confounding factors that would lead to variations in different locations. The fact that the overall difference stems from biological differences doesn't require universal equivalence of the effect.

For example men on average have more muscle mass than women. If you found a random population of women that had proportionally higher muscle masses compared with their male peers, that doesn't discount the biological fact.

You're citing an actual biological difference in human biology to try and support an unsupported claim of a nonbiological difference...

One that is refuted by the history of computer programming, which used to be a female-dominated field back when programming was significantly harder than it is today.

I chose a clear biological difference to illustrate the point that was being argued, about one sample population disproving the larger trend. And it's not an unsupported claim there is plenty of science and argument for why men and women choose different careers.

Please note that this doesn't discount any actual gender bias that may exist. The biological claim is only that men and women in general have a given set of personality traits. That isn't controversial science. It also isn't controversial because it doesn't claim that men are better engineers or better anything, just that more in general may naturally gravitate towards certain activities.

Blame the "left leaning zeitgeist" all you want, I've seen no such evidence. What I have seen is a Hacker News thread full of people misunderstanding statistics and machine learning, advocating for bad models, and obsessing about biological explanations for the crazy gender disparity in tech. Whatever real, fundamental differences exist, it's preposterous to assert that our current situation is entirely biological in nature - especially when the toxicity is on such full and self-assured display.

> it's preposterous to assert that our current situation is entirely biological in nature

Careful on misframing the thought experiment: no such assertion was made. The assertion through this thought experiment was that biological nature has an effect, no more no less. The degree to which the effect matters is uncertain, but it is reasonable to expect it does play a role. The reason biological explanation is mentioned at all is because many seem determined to think that no such effect is possible, full-stop.

If you agree that biological differences has a non-zero effect on preference, then there shouldn't be anything to disagree with here.

I disagree with your reading entirely:

"It may well be the case that the gender split is 80% / 20%"

That's a preposterous hypothetical. You're also incorrect in your implication - the existence of minor (and as yet unsubstantiated) fundamental differences in propensity to pursue technology in no way precludes the (well established) social biases and inequities that result in the same. Neither would it prevent these issues from being reflected in statistical models.

Thought experiments about magical perfect AIs are completely and totally irrelevant to this real problem which everyone who uses models should be aware of.

To get on my cranky, gatekeeping high horse, I find it super frustrating to see people who call themselves data scientists misunderstanding this problem. I teach this in introductory statistics to non-technical audiences. I'd hope for better from engineers and computer scientists.

It's interesting that this thread is still going today, and I'm glad that it is civil.

Thought experiments are relevant because they can tease out our moral intuitions. Calling it a preposterous hypothetical is not useful.

gbrown- I think that you have read my original post with a hostile interpretation. I left open the possibility of biological differences in preferences OR zero biological differences in preference. You act as if there is scientific consensus that biological difference in preference is impossible, and that all current differences in outcomes are based on culturally-imposed biases. This is not the case.

>You're also incorrect in your implication - the existence of minor (and as yet unsubstantiated) fundamental differences in propensity to pursue technology in no way precludes the (well established) social biases and inequities that result in the same.

I or other posters did not imply this. You are conflating my hypothetical case (where cultural bias was made irrelevant as much as possible) with real life. Real life does have bias. The reason the hypothetical was presented was for comparison.

I've got a couple (low-ball, civil) questions as a sanity check- 1. Would you agree that men have a stronger biological preference to be warriors than women? 2. Would you agree that there is a greater cultural expectation for men to be warriors than for women?

> Thought experiments are relevant because they can tease out our moral intuitions.

I understand this, but I don't agree that your thought experiment usefully does so. You're essentially begging the question: "Well, what if this is the way it's 'supposed' to be?". My understanding of the science is that there's little actual evidence of difference in fundamental propensity to enjoy certain types of intellectual labor, but lots of evidence of the impact of socialization on the development of young humans. As has been addressed elsewhere in the thread, we have a directly relevant historical example: the distribution of tech labor was quite different when computing was seen as "women's work". To beg the question as you have, in the face of evidence to the contrary, is unhelpful. One can easily imagine the same hypothetical form applied to other groups - minorities, language groups, etc. While you've couched your argument in terms of "propensity", the structure works just as well (or poorly) for "ability" - and there's a long history in science and society of laundering the latter into the former.

> I think that you have read my original post with a hostile interpretation.

You are entirely correct - both with respect to the framing of your argument, and your apparent understanding of the methods discussed in the article. As to the former, you can't expect to receive a generous response when you accuse those you disagree with of being hopeless left-wing double thinkers. As to the latter, I'm not trying to be dismissive or condescending, but this is literally my area of expertise. I'm also an educator, and it is my responsibility to fight against explicit or implicit biases which affect my students (and which affect who is likely to become my student).

> I or other posters did not imply this. You are conflating my hypothetical case (where cultural bias was made irrelevant as much as possible) with real life. Real life does have bias. The reason the hypothetical was presented was for comparison.

Drawing the analogy between your hypothetical "perfect" system (which I maintain is still under-defined) and the actual problems being discussed is itself a misleading thing to do. There is not a meaningful analogy between (AI/ML/Stat) as practiced today and "perfect" AGI systems.

> 1. Would you agree that men have a stronger biological preference to be warriors than women?

Maybe, though I actually think this framing is problematic. "Warrior" is a social role, and changes in definition and scope over history and geography. Certainly there exists physical sexual dimorphism with males tending to be stronger and larger, if that's what you're asking.

> 2. Would you agree that there is a greater cultural expectation for men to be warriors than for women?

Sure, I think that's reasonable, subject to the previous caveats. Without evidence, I don't know that I'd immediately assume this will continue to be the case as physical ability has less and less to do with conflict - especially over the long term as we continue to evolve physically and socially.

To conclude, my understanding is that we have strong evidence of social structures influencing vocation choice and success. We have little to no evidence that suggests our current social organization with respect to intellectual labor is driven by primarily biological phenomena. In this context, I believe that trying to invent hypothetical scenarios which would justify (by their construction) current inequalities, in the face of evidence to the contrary, is a harmful act. Not only is it scientifically unfounded, it's part of the cultural problem. This kind of discourse creates exactly the environment which would serve to push women away from tech.

I haven't dived into evidence much during this discussion, but I agree that it is a good place to argue from. (I would tend to echo a good few Jordan Peterson-style points, such as gender employment ratios in Scandinavian egalitarian countries, differences by gender in OCEAN personality factors, etc.) I do think there is substantial evidence that personality trait differences between women and men correlate highly across the globe. This should add up to substantial (although not conclusive) evidence that preferences would also be different between genders. Conclusive evidence is impossible without having some hypothetical cultureless test case. I also believe that social science as practiced today is poorly equipped to conclusively answer these questions. Any individual must therefore decide for themselves what their predictions would be on a number of gender-related issues.

"Given an unbiased society, would I expect an equal number of male and female bricklayers?" I would not.

"Given an unbiased society, would I expect an equal number of male and female biologists?" I would not.

"Nurses?" I would not.

For almost any given profession, I would expect an unequal number of workers by gender. To the degree that the observed ratio differs from what I would predict, there lies the surprise. Computer programming is a strange activity, and shares enough in common with other male-dominated engineering fields that I wouldn't be surprised that it is equally male-dominated.

One of the reasons I think that programming is such a tilted activity is that it is a really weird activity. By what strange circumstance did monkeys descend from the trees to formalize logical constructions into software? Given how strange it is to adapt biological creatures to this task, you would expect outliers to participate in the task- it is not unusual to expect the personality differences between genders to dominate in who participates, when the outliers are the only individuals who participate to start with.

Regarding the warrior example- I would argue that even if we all fought wars with robots, such that physical stature was irrelevant, men would still self-select to become warriors (robot-pilots) more often than women. On the OCEAN model, men are less agreeable than women, and across most cultures of the world, men are more aggressive than women. This will likely remain true for millennia.

I'm presenting most of my arguments here amorally. I think the reason you moralize my arguments is that they are construed as justifying existing oppression by gender. I do my best to judge individuals as individuals. I cannot pretend to deny the existence of larger patterns while judging an individual, but I can understand that they will influence my judgement no matter how hard I try. To pretend otherwise is blinding myself. To the degree that I broadcast these opinions, I hope to do so in a way that leads people to only judge other groups in accordance with the predictive power such judgements can actually afford, to hold such judgements weakly, and to always understand that variation between individuals matters more than anything else. My manner of thinking does risk failing to fight the good fight against oppression- however, I think most injustices in the world are cases of individual conflict, and tinting the conflicts I resolve on a daily basis with overtones of wider societal struggle does more to confuse than clarify.

My main remaining question to you- if you take my last paragraph in good faith- is whether you think that my manner of thinking can yield good results.

> in favor of "biological differences would mean permanent, uncorrectable injustice, therefore it is impossible that biological differences exist."

That's because the issue of men vs women in programming really is an all-or-nothing topic. Either you believe that a female is capable of being an equivalently skilled programmer to a male, or you don't.

Referencing biological differences always cascades to a question about the innate ability of a female to program. The best example I can point to is the infamous internal Google manifesto on male vs female programmers. If you read that text, it appears reasonable enough: the author thinks that there are biological differences, and these differences might lead to differences in programming strengths. But it is a wolf in sheep's clothing; as soon as you believe that there are differences, it follows that one set of differences must be advantageous to the other.

I can understand that there are biological differences between females and males, but I absolutely and vehemently choose to believe that there is no inherent difference in ability - females are 100% as capable as males when it comes to programming. Full stop.

Yes, this is double think. But I'd rather be a hypocrite than hold a secret belief that my biological sex makes me a better engineer.

It's really strange of you to admit this double-think. You are admitting to yourself that you hold two contradictory opinions simultaneously. I think the best thing to do (which you might be arguing for but cutting corners?) is to understand that population-level trends exist but to still judge individuals as individuals, not as members of their groups.

I also disagree about the quote below:

>That's because the issue of men vs women in programming really is an all-or-nothing topic. Either you believe that a female is capable of being an equivalently skilled programmer to a male, or you don't.

When populations are on bell curves, this statement is nonsensical. Imagining the statement "Either you believe that a female is capable of being an equivalently skilled competitive wrestler to a male, or you don't" would be similar.

What if the question isn't one of capability, but a question of self selection and preference?

Do those things even play a role? If yes, how? How large of a role? If it does play a role, why? What's important to women in career choice vs. what's important to men? Why is that the case? Is it nature or nurture or both (and to what degree does each influence those choices)? Is it upbringing? Is it pressure from society? Is it barriers to entry? And to what degree does all that play a role?

I see the potential for a much more nuanced conversation with this topic.

Saying women aren't capable of being a programmer or being successful in STEM is in my mind a garbage assertion.

> Saying women aren't capable of being a programmer or being successful in STEM is in my mind a garbage assertion.

Good, then, that nobody has said, or even implied, that.

That's literally the argument that sgslo was countering. I agree that it does not seem like the most charitable position to argue against though.

That isn't even remotely what the person that sgslo was arguing with said.

Propensity to enjoy a job != capability to do the job

...maybe I misunderstood this statement then.

"That's because the issue of men vs women in programming really is an all-or-nothing topic."

Was there subtext I missed?

Commendable that you found the strength to admit the double-think. The question of "innate ability" is ill-posed. Let's agree that both males and females can achieve similar levels of skill, given similar levels of effort applied in acquiring the skill. There is still a difference. Either:

A. The average male and the average female afford applying the same level of effort in becoming programmers, with no extraneous constraints.

B. Motherhood represents a non-trivial portion of the average female lifetime effort. Though not as large as in preindustrial times, when the norm was conceiving, feeding, and raising ten children, most of them not making it to adulthood, leaving little energy for anything else.

To the extent child rearing cost remains unequally distributed between males and females, we're going to see statistical disparities in occupations, especially in occupations with high skill acquisition cost. Or we banish motherhood, and go extinct.

Your example assumes that the AI isn't taking your 80/20 split and integrating that back into the test. Even if your ever-so-nebulous "biological differences" exist, you have already acknowledged that an objective measure of programming ability exists independent of gender. What TFA and "the left" are talking about is the distortion of that objective measure by lazily applying the correlation back into the measuring system, e.g. turning an 80/20 split into a 90/10 split, then a 95/5 split, which in your example would be a waste of society's resources and would earn whoever made the AI a death sentence for incompetence.

And if the algorithm came out with the reverse result - an 80 female/20 male split, then the right-leaning opinion would call the outcome impossible (tech does have a strong left-wing bias!). I'm not sure your theoretical example proves anything except most people won't believe reality even if it smacks them in the face.

>However, the left-leaning zeitgeist opinion would seem to be that this outcome is impossible, and current observed differences are only due to systematic oppression.

Almost no one thinks this way and it's definitely not a popular enough opinion to qualify as a common spirit of our times, left-leaning or otherwise.

You are arguing against a very weak straw-man.

> Almost no one thinks this way

You would hope. But don't you remember the huge backlash from Damore's memo? Many people (or at least a very, very vocal population) do think this way, unfortunately.

So no, not a strawman.

Your response doesn't at all address gbrown's example of how a misapplied algorithm can exacerbate a systemic inequality. Specifically, you provide an example of gbrown's points 1 and 2 while ignoring point 3, which is the crux of the issue.

The detail "This is assuming a perfectly-trained AI judges the variable, and that it controls against the underlying variables of societal bias in order to ensure that long-term human resources are properly allocated, and that long-term biases, informed by people's interactions with each other, trend toward actual biological differences" should address that concern.

"Assume a perfect and impossible thing, which is not precisely defined enough to formally reason about, but which nevertheless by construction supports my point, and you'll see that I'm clearly correct."

"controls against...." doesn't mean anything here - you're using the language of modeling, and attempting to discuss a concrete issue, but not connecting the two. Is your goal just to get an accurate prediction based on the status quo? Congrats, you've got a model that still requires nuance and understanding in application.

Well, to create a well-enough formalized method of control for the AI-

As part of a 1,000,000 year long project, 1,000,000 groups of 1,000 human babies (with different groups having different gender ratios) are installed on remote habitable planets and raised from birth by genderless robots, tabula rasa. They grow up and form languages and societies that last for 1,000,000 years. Robots are used to observe their choices and outcomes. The distribution of cultural traits is gathered as data, and cultures which create good outcomes for individuals in accordance with their preferences and for society as a whole are noted as benefit-maximizing. Additionally, the degree to which each society deviates from mean gender-bias characteristics is noted, and the degree to which these gender-bias expectations mold the choices of each individual to a degree greater than mean gender-bias is noted. This data is used to train the "sorting hat" robot which will be used in the example in my original post.

> That's the entire point of the article, our biases, as reflected in the world, are learned by models.

They are not biased if they represent reality based on facts.

In fact, the main issue is that while these algorithms fit reality as it actually is, the critics crying "bias" are complaining that these algorithms don't fit what they believe reality should be.

In other words, the algorithms output predictions that matches reality but critics argue that they should instead be manipulated to output results that suits their ideals.

But that has nothing to do with having a bias.

The post you are replying to has three clear bullet points which indicate where the problem lies. It also directly addresses the point you're making about accurate depictions of reality.

Would you mind responding to the third bullet point from the parent post, that discusses the dangers of naive and biased applications of facts that are based on reality?

> The post you are replying to has three clear bullet points which indicate where the problem lies.

No, it does not. All it does is point to possible explanations of why reality is the way it is. Yet, reality is still reality. If models are expected to predict reality then their output will match what we observe in reality. If someone expects predictive models not to predict reality, and instead to output results that comply with someone's personal ideals, then that's an entirely different problem: predicting someone's personal goals instead of reality.

The third bullet point of the above question (which you still haven't addressed) directly addresses this.

The problem is not that the model outputs reality. The problem is that the model _incorrectly_ produces something that is not reality, based on facts it _did_ learn from reality.

That is, there is a fact pattern in reality, but the model draws an illogical conclusion based on that fact pattern.

So, I'll ask again. Could you directly address this point from the parent post:

"A recruiting agency uses a predictive model to recruit computer programmers. Due to the way the model was trained, it excludes qualified women (not OK, a clear misapplication of an algorithm)"

> "A recruiting agency uses a predictive model to recruit computer programmers. Due to the way the model was trained, it excludes qualified women (not OK, a clear misapplication of an algorithm)"

Philosophically wrong or not, perhaps excluding those hypothetical women from consideration is the quote unquote optimal approach to allocate recruiting resources. Why this is the case is irrelevant; the onus is on society to 'shape up' and remove the reasons the group 'women'[1] are disqualifying themselves from the fair model.

Of course assuming good faith in model design, training, implementation, etc.

[1] Ditto for men as kindergarten teachers - employment (e.g.) being a zero(ish) sum game, so long as more women are teachers, some other profession will contain an outsized amount of men (and vice versa). There are reasons[2] why more women are kindergarten teachers than men, and any fair algorithm and effective algorithm would be expected to make prediction on these reasons or proxies thereof.

[2] Or there would be an even gender ratio.

Yes, it is explicitly statistical bias, and it comes from extrapolating a system of equations from single point observable outcomes, a situation Economics tends to solve with instrumental variables.

If I can, let me explain using basic quantity supplied and demanded.

Consider the supply equation: Q_s = a + bP + cX, and the demand equation, Q_d = e - fP + gZ. Data only observes where Q_s = Q_d given P, X, and Z.

So what most ML models attempt to do is predict Q as a single fit function of P, X, and Z -- not entirely unreasonable, but misses the structure of the full system. So two things to consider:

(1) Implementing a rule policy based on "measuring as it actually is" may result in a feedback cycle (personally I think there may be one in common real estate property valuation "AVM" models). If recruiters rely on a biased model, the class imbalance may actually increase relative to doing nothing.

(2) Normative ideals, like being blind to anything but skill and opportunity or equalizing background (two common political ideals in the US) cannot be appropriately evaluated. This is Judea Pearl's claim of Pr(Y|X) versus Pr(Y|do(X)).
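The identification problem described above can be simulated directly. This is a toy sketch with made-up coefficients and shocks: a naive fit of Q on P recovers neither the supply slope nor the demand slope, while an instrumental-variables estimate using the demand shifter Z does recover the supply slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True structural parameters (assumed for this toy example).
a, b, c = 2.0, 1.5, 1.0    # supply:  Q_s = a + b*P + c*X + u
e, f, g = 10.0, 2.0, 1.0   # demand:  Q_d = e - f*P + g*Z + v

X = rng.normal(size=n)      # observed supply shifter
Z = rng.normal(size=n)      # observed demand shifter
u = rng.normal(size=n)      # unobserved supply shock
v = rng.normal(size=n)      # unobserved demand shock

# We only ever observe the market-clearing point where Q_s = Q_d.
P = (e - a + g * Z - c * X + v - u) / (b + f)
Q = a + b * P + c * X + u

# Naive ML-style fit: predict Q from P alone, ignoring the system.
coef_P = np.polyfit(P, Q, 1)[0]
print(f"naive slope on P: {coef_P:.2f}  (supply b={b}, demand -f={-f})")

# Instrumental-variables estimate of the supply slope b, using Z
# (which shifts demand but not supply) as the instrument.
b_iv = np.cov(Z, Q)[0, 1] / np.cov(Z, P)[0, 1]
print(f"IV estimate of b: {b_iv:.2f}")
```

The naive slope lands somewhere between the supply and demand slopes, which is exactly the "single fit function" failure the parent describes.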

I wonder if a Bayesian approach couldn't be taken to try to remain free of that bias, since it's basically designed for this (a system you can only observe part of).

For example, the model under discussion, which can tell the difference between men and women and believes in differences between them, probably doesn't know that intersex people exist. It probably cannot learn such a distinction, either. Therefore this algorithm probably doesn't fit what we know reality to resemble.

So there's at least two sorts of biases. There's both biases inside the model, which is what you're thinking of when you use the word "bias", and also biases outside the model, which inform the model's design and construction in ways that cannot even be quantified with only the model's metrics.

> For example, the model under discussion, which can tell the difference between men and women and believes in differences between them, probably doesn't know that intersex people exist.

Considering that the frequency of intersex people in a population lies somewhere between 0.05% and 0.07%, not accounting for a secondary trait such as whether a biological M or F happens to be intersex is an irrelevant classification error that has a negligible (if any) impact on a model's predictive ability.

And by the way, any model can be regenerated if any attribute left out is found to be significant.

Mind giving an example of something that could either be bias or fact?

The definition of bias used in Machine Learning is quite clear, as is the definition of training, validation and test data sets.

So you do mind giving an example. Telling.

Never trust someone who says they deal only with facts.

I fail to see how step 2 leads to step 3. The fact that there are fewer women programmers overall should not have any effect on P(good|woman,programmer).

That's a central topic of the article linked, that in practice people are finding it difficult to design and train systems without having this effect.

If you are randomly selecting people from the general population, it follows. That is, if you just grab the basic info of unemployed people and call some in for a programming interview, culling females from the list before scheduling the interview appointments will save you a lot of time before you find/hire a good programmer.

I can't figure out why you would use such an algorithm though. Most people who want to hire programmers look at their resumes from people who have self-selected as programmers (possibly on a job search forum) and interview them. If you decide there are not enough programmers you might hire people at random ("would you be willing to learn to be a programmer if paid?" "yes" "hired, you start Monday") - there might be factors to exclude some people because they can never be great programmers, but I'm not aware of what such factors might be - gender doesn't seem to be one though.

2 leads to 3 because in a naively designed system, being a man is a stronger predictor of being a good programmer than being a woman, because there are more men than women in high-ranking and high-paying programmer positions.

>a stronger predictor of being a good programmer than being a woman

Of being a programmer, not being a good programmer. There are more men than women in programming, but the recruiters want to tell good programmers from bad ones, not programmers from nurses. The data clearly shows that a randomly chosen man is more likely to be a programmer than a randomly chosen woman, but that's irrelevant. The likelihood that a randomly chosen female programmer is good should be about the same as the likelihood that a randomly chosen male programmer is good, and that's what hiring managers care about.
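The distinction can be checked with a quick simulation (all base rates here are invented, purely for illustration): even if men were five times more likely to be programmers, the fraction of good programmers within each gender's programmer pool can be identical, and is identical by construction below.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical population: gender is irrelevant to skill, but base
# rates of *being* a programmer differ by gender (invented numbers).
is_woman = rng.random(n) < 0.5
p_prog = np.where(is_woman, 0.02, 0.10)
is_prog = rng.random(n) < p_prog

# Among programmers, being "good" is independent of gender.
is_good = is_prog & (rng.random(n) < 0.30)

for label, mask in [("men", ~is_woman), ("women", is_woman)]:
    progs = is_prog & mask
    print(f"P(programmer | {label}) = {progs.mean() / mask.mean():.3f}")
    print(f"P(good | programmer, {label}) = {is_good[progs].mean():.3f}")
```

The base rates differ by 5x while the conditional quality is the same for both groups, which is the quantity a hiring pipeline should care about.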

And yet this kind of problem is happening over and over and over again in real life, such as the automated risk assessments used in Florida deciding that black people are inherently more likely to be criminal than white people. https://www.propublica.org/article/machine-bias-risk-assessm...

Yes, but only in very poorly designed systems. We have those now; with AI it's no different. I still see no reason to jump from 2 to 3 other than incompetence in design.

The other poster stated "predictive model". A predictive model could easily identify males as "more fit" (higher p-value) than females, in aggregate. The programmer tries to evade this bias, by e.g. removing gender values from the system. However, the system STILL identifies males as "more fit", due to correlated values (likes Manchester United/Liverpool). Machine Learning is not much more than advanced statistics, so data biases get amplified, and thus naive application is often a poor fit. Even worse, the experts are at a loss on how to make such predictive systems "perform better" without introducing biases of their own, since the data and categorizations themselves may be full of biases.

In the real world, incompetence may not be a huge hurdle when selling complex systems. Also, these biases are invisible, until one thinks about them or spots them in the wild.
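The proxy effect described above is easy to reproduce. In this toy sketch (the proxy feature, the hire rates, and all other numbers are invented), the gender column is dropped entirely, yet a correlated feature plus historically biased labels reproduces the gendered selection almost exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Gender is dropped from the features, but a correlated proxy
# (say, "follows football") survives in the data.
is_woman = rng.random(n) < 0.5
proxy = np.where(is_woman, rng.random(n) < 0.2, rng.random(n) < 0.8)

# Historical labels encode a biased process: men were hired more often.
hired = rng.random(n) < np.where(is_woman, 0.1, 0.3)

# "Gender-blind" model: score candidates by the historical hire rate
# of their proxy group, then select above-average scores.
score = np.where(proxy, hired[proxy].mean(), hired[~proxy].mean())
selected = score >= score.mean()

print(f"selection rate, men:   {selected[~is_woman].mean():.2f}")
print(f"selection rate, women: {selected[is_woman].mean():.2f}")
```

Removing the protected attribute changed nothing: the model simply rebuilt it from the proxy.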

Lots of people think "AI" is a magic bullet that will obviate any need to actually think about this stuff.

> I fail to see how step 2 leads to step 3. The fact that there are fewer women programmers overall should not have any effect on P(good|woman,programmer).

You're correct that there should not be any effect.

The problem is that many systems are designed such that there is one.

Are there such systems? Every paper I saw used trivial models which are unlikely to be used in real life. Like this article, which checks gender associations for words like "money" - that's not how one would use AI for hiring in any practical way!

What is the predictive model predicting, and why would a recruiting agency use those attributes? Any predictive model using nonsense data can immediately be ruled out, and I can make a trivial example:

* Most programmers have 10 toes. Fact.

* A predictive model correctly identifies that more programmers have 10 toes than 9 toes.

* A recruiting agency uses the predictive model and excludes programmers with 9 toes. But why? What possible use could they have for that model?

The problem with a recruiting agency using biased predictive models arises when the data is relevant for the recruitment purpose but creates socially unacceptable conclusions, such as:

* There are more computer programmers that leave after the first year who identify themselves as women than men. (descriptive statement)

* A predictive model uses gender as a data point to predict employee retention.

* An internal recruiting agency within a company has the goal of hiring employees with high retention in order to reduce training costs, and uses the predictive model. This results in excluding women.

The attribute being measured in the latter case is relevant to the purpose of the recruiting, but it amplifies gender segregation. When discussing AI recruiting bias, it is this kind of problem we should focus on, rather than attributes which have no bearing on the purpose of recruiting. Attributes which have nothing to do with the goals of recruiting should not exist in a model for recruiting.
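One standard sanity check on such a system is the disparate-impact ratio of selection rates between groups (the "four-fifths rule" used in US employment-discrimination analysis). A minimal sketch, with invented retention scores standing in for the model's output:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Hypothetical model output: because the retention model uses gender
# as a feature, predicted retention differs by group (invented means).
is_woman = rng.random(n) < 0.4
predicted_retention = rng.normal(np.where(is_woman, 0.55, 0.65), 0.1)
selected = predicted_retention > 0.6

rate_w = selected[is_woman].mean()
rate_m = selected[~is_woman].mean()
ratio = rate_w / rate_m
print(f"selection rates: women {rate_w:.2f}, men {rate_m:.2f}")
print(f"disparate-impact ratio: {ratio:.2f} (four-fifths rule flags < 0.80)")
```

A check like this does not fix the model, but it makes the exclusion visible before the system ships.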

Not being educated in anything to do with AI or machine learning, it seems to me that "amplification algorithms" is exactly what these systems are. When I look at a DeepDream image, it appears as though it has amplified many defining characteristics of the original images.

Conceptually, reinforcement learning seems very similar to biological positive feedback mechanisms. These mechanisms have guided human development, and cognitive bias is an innate aspect of being human, so it stands to reason that non-biological systems developed in the same way would exhibit biases.

Yes, step three is an abomination of logic.

It is not, however, a problem with AI, software, etc. It's entirely the fault of fools at the agency, and perhaps shysters selling them crap software.

Garbage in, garbage out.

A reasonable person isn't going to say that the asymptotic complexity of merge sort is more masculine than feminine.

If the goal is to learn whether programming is associated with gender, it's unbiased. If the goal is to learn whether the definition of programming is associated with gender, it's biased. The usual goal of NLP is to learn language and (waves hands) meaning. The current tools do this by learning associations, but that's the method, not the goal. The critique is on the methods. The critiques are saying that it's led to trouble, which we'll have to correct for. We want to make it unbiased in the definition-based sense, where programming is totally unrelated to gender.
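The association-based measurement being critiqued can be illustrated with toy vectors (invented here purely for illustration; real analyses use trained embeddings such as word2vec or GloVe, and tests like WEAT):

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy 3-d "embeddings" (made up; real models learn hundreds of
# dimensions from corpus co-occurrence statistics).
vecs = {
    "he":         np.array([ 1.0, 0.1, 0.0]),
    "she":        np.array([-1.0, 0.1, 0.0]),
    "programmer": np.array([ 0.6, 0.8, 0.1]),
    "nurse":      np.array([-0.6, 0.8, 0.1]),
}

for word in ("programmer", "nurse"):
    d = cosine(vecs[word], vecs["he"]) - cosine(vecs[word], vecs["she"])
    print(f"{word}: association with 'he' minus 'she' = {d:+.2f}")
```

In a real embedding these gaps come from the training corpus, which is exactly the sense in which the system "echoes the biases it's fed."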

We know that boys and girls are conditioned by society to be more interested in certain fields than others. It's why my cousin asked me to get a robot for her son's birthday as opposed to a dress for her daughter. Do we want our biases embedded in AI? Or do we want a model that hopefully can eliminate some of the subjectivity that humans add (which in some cases we absolutely need)?

From what I've read[1], there are studies indicating that higher levels of androgens in the womb are correlated (within each sex) with higher levels of male-typical behavior, in humans and in other primates. They have also directly established causality in other primates by increasing or decreasing the androgens and observing the result. I don't think they've gone as far as tweaking androgens in human fetuses to see what happens—I suspect it would be hard to get the experiment approved—but I know what I expect the result to be.

The next step is establishing just how much of the male-female difference is explained by biology. I haven't studied that part of it. But the "null hypothesis", that all gender differences in behavior are due solely to social conditioning, is wrong.

[1] Consult https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778233/ and the studies it cites in the second paragraph.

> The next step is establishing just how much of the male-female difference is explained by biology. I haven't studied that part of it. But the "null hypothesis", that all gender differences in behavior are due solely to social conditioning, is wrong.

This is interesting and important science, and you're right that clearly there are some gender differences... BUT

1. Given the history of (pseudo)science in justifying and worsening disparities between groups, incredible caution is needed, especially when making policy.

2. The existence of biological phenomena in no way discounts the strong social cues and pressures which shape young humans.

3. The existence of detectable, biologically originating average differences between genders in no way justifies the use of algorithms in such a way that individuals are punished for their group membership.

On HN, these threads always seem to go straight to "BUT THERE REALLY ARE DIFFERENCES!!", which in my mind is completely missing the point.

The flip side is going straight from "The fraction of women in X group is much less than 50%" to "Therefore someone is discriminating, creating a hostile environment, socially conditioning people, and/or otherwise behaving badly—and if everyone involved appears to be behaving well, then it must be due to either someone unconsciously doing these things, or someone hiding their bad behavior, which we will proceed to expose by calling out a scapegoat, applying the worst interpretation to all the ambiguous things they do, and attempting to ruin their careers." I would say that incredible caution is needed on that side too.

In terms of algorithms: What do you want your algorithm to produce? Gender parity? A distribution that matches what would be reality if biology were an influence but social conditioning weren't? Maybe what would be reality if biology were an influence, and people were encouraged to follow their passion, whatever it is, but scrupulously avoided recommending any particular activities to particular children until after learning what their passion was? ... Or do you want it to analyze reality as it is and produce decisions that are useful in this reality?

Ideally, the algorithm would be fed enough data about the people it's judging that it could make a properly fair decision. If all the algorithm gets is "Male, 22, minored in CS; female, 22, minored in CS", then the only thing it has to go on is the sex difference, which is probably a significant one, and then we can argue about what it should do. However, if it gets "Female, 22, minored in CS, majored in math and was top of her class, has spent 400 hours on personal programming projects / Project Euler / Topcoder / chatting about data structures with her friends, has been using the Unix command line on a regular basis since age 12", that looks like a much stronger candidate—and, importantly for this discussion, a male with those same qualifications would probably be an equally good candidate. I think that, if you have the same level of intelligence (perhaps specifically the math/symbol-manipulation areas), the same level of enthusiasm, and spend the same number of hours on the same activities, then you should get roughly the same results no matter what your sex is; sex is only useful to guess at the enthusiasm and hours spent when those variables are missing from the inputs.

That's the solution as far as I'm concerned: More input data so the algorithm can be fair. It'll consider all the people who've spent 1000 hours studying and practicing programming and achieved the same level, and it'll judge them the same, and that is fair, even if it happens that 80% of those people are men. Unfair would be if the algorithm only gets the superficial data and has to guess, based on group averages, on everything important, and penalizes the truly capable women because it's unable to distinguish them from the average woman.

1. I feel like your flip side is a straw man.

2. Your description of algorithms and how they can/should behave is completely beside the point. Thats the problem with vague marketing language like "AI". The article isn't about artificial generalized intelligence, it's about machine learning algorithms used widely today. These are problems now, not hypothetical discussions about how we want some idealized "AI" to behave.

Brainwash, a Norwegian documentary does a good job : https://www.youtube.com/watch?v=tiJVJ5QRRUE

Among other things, he visits a doctor who has to decide which gender to choose for babies who obviously need some work ( sorry for the wording ).

That documentary is in fact how I originally learned about Baron-Cohen's research. It's good.

> conditioned by society

The only conditioning I've ever seen any real evidence of is _against_ what would be called "social norms": teachers, parents and media work overtime to try to get girls interested in math, science, and programming and if anything actively _discourage_ boys from it.

If the "quiet part" beneath that "loud part" is that girls are being encouraged and boys discouraged because STEM is "boy stuff" and girls need to be pushed into it, kids will pick up on that. And it only takes a tiny bit of a "boy's club" atmosphere before things start getting hostile to the women present, and that quickly snowballs.

What do you mean "conditioned"? I see no evidence of this. Kids here in Scandinavia are free to do or pursue whatever they want. And from a very early age, before they are able to talk, boys like things and girls like people. And later, they choose accordingly.

A simple example is toy stores usually having "boy toys" and "girl toys" areas, with tech-related items included in the former more than the latter.

Is that boy toys, or tech related? Girl toys area or doll area. Hard to tell the difference. Yes the dolls will have more princesses, but is that demand or conditioning?

According to the Vedas, mind gets conditioned simply by living life. It is so ingrained, we don't even notice it, but we can't live a life without it either. It's not a negative, but an awareness thing.

Here is a simple example. Everyone, including most children, agree pink is a girl colour. There are times and places in history where everyone, including children, agreed pink was a boy colour.

Society shapes children even before they can speak.

I am also in Scandinavia, have small kids and am around a lot of parents, and I see plenty of evidence for this.

I am also around a lot of nerds, and I also see plenty of reasons why young women opt out of nerdy circles.

Don’t dance around your point, make it. What are the reasons?

Yeah, and it's not necessarily an overt thing. As an example, we were very gender indifferent with our first daughter. She loved "boy" and "girl" toys. And colors. She loved snakes, reptiles, bugs, spiders, etc. She loved to get dirty and roughhouse, which we were fine with. She was mostly the opposite of the girls our local parent friends were raising, who weren't even allowed to play in the dirt.

For her 4th birthday, we had a reptile guy show up. She was the only girl child that would touch any of them, let alone let snakes slither around on her. She loved it.

It's not like we shut her off from girl stuff - her only cousin was an older girl that liked "girl stuff", and our kid looked up to her. We just always presented them as equal options for her.

Once she got into preschool things started changing. After a year, now there are boy toys and girl toys. Colors are gendered. Bugs and reptiles are icky. Dirt is yucky. And so on.

I'm not going to make a claim that any girl I've never met has or hasn't been affected by society to like "girl" things. But it would take an act of god to convince me our daughter wasn't.

AI/ML can be used to analyse the current state of things, and in that regard I think your statement is correct.

It can also be used to make predictions and decisions.

The question "Is this person likely to be a programmer?" would probably accurately identify a man as more likely to be a programmer, revealing the bias in the data and our society.

But "Is this person suited to be a programmer?" sounds similar but is a very different question and we shouldn't confuse the two, especially when letting a machine decide.

Right, it is cognitively negligent to ignore these things, at least in the context of what an exact machine will tell you.

That doesn't mean accepting it; it means implementing the exceptions, along with a way to update them or understand the sensitivities.

The whole point of Machine Learning is that you don't, and shouldn't, implement exceptions, manual weights, any of that complex parametrization, etc. The machine should figure all that out itself, even if it needs millions of internal parameters to do it!

Sounds neat, until you trust it with your life.

ah yes the cultured machine learning proponents and their reputation for picking up social cues

Almost. We have to be careful about the data-generating process and how we describe the output of AI. If there is systematic bias in data collection, it flows through.
