
On a simple mathematical basis, this is false.

Consider two groups of candidates for a scholarship, A and B. We want to select all candidates that have an 80% or better chance of graduation. Group A comes from a population where the chance of graduation is distributed uniformly from 0% to 100% and group B is from one where the chance is distributed uniformly from 10% to 90%, with the same average but less variation in group B.

Now suppose that we select without bias or inaccuracy all the applicants that have an 80% or better chance of graduation. That means we select a subset of A with a range of 80% to 100% and a subset of B with a range from 80% to 90%. The average graduation rate of scholarship winners from group A will be 90% and that from group B will be 85%.
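The arithmetic in this example can be checked with a short Monte Carlo sketch. It assumes only what the comment states (uniform priors on [0, 1] and [0.1, 0.9], one shared 80% cutoff); the sample size and seed are arbitrary choices.

```python
import random

def mean_of_selected(low, high, cutoff, n=200_000, seed=0):
    """Draw n graduation probabilities uniformly from [low, high], keep those
    at or above the cutoff, and return (share selected, mean of selected)."""
    rng = random.Random(seed)
    draws = [rng.uniform(low, high) for _ in range(n)]
    selected = [p for p in draws if p >= cutoff]
    return len(selected) / n, sum(selected) / len(selected)

# Same unbiased rule applied to both groups.
share_a, mean_a = mean_of_selected(0.0, 1.0, 0.8)  # group A: uniform 0%..100%
share_b, mean_b = mean_of_selected(0.1, 0.9, 0.8)  # group B: uniform 10%..90%
print(f"A: {share_a:.1%} selected, mean graduation rate {mean_a:.3f}")
print(f"B: {share_b:.1%} selected, mean graduation rate {mean_b:.3f}")
```

The selected members of A average about 0.90 and those of B about 0.85, even though both groups faced the identical criterion.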

But we haven't been biased against A. We've selected according to the exact same perfect evaluation process and criterion from both groups. It was just their prior distribution that was different.

The actual applicant groups for jobs or financing in the real world, when they are divided by demographic factors like age, sex, race, and educational level, will almost always manifest different variances in success levels even when the averages are the same. That makes this test useless and mathematically illiterate.

And when we use a normal distribution, as we should always expect given the central limit theorem, the mathematical problems get even more intense.
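For the normal case, a minimal sketch uses the standard truncated-normal mean, E[X | X >= c] = mu + sigma * phi(alpha) / (1 - Phi(alpha)) with alpha = (c - mu) / sigma. The mean of 0.5, the spreads, and the 0.8 cutoff are hypothetical, and since a normal distribution is unbounded while a graduation probability is not, treat this as an approximation.

```python
import math

def normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def normal_sf(z):
    """Survival function P(Z > z) for a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def mean_above_cutoff(mu, sigma, cutoff):
    """E[X | X >= cutoff] for X ~ Normal(mu, sigma): the truncated-normal mean."""
    alpha = (cutoff - mu) / sigma
    return mu + sigma * normal_pdf(alpha) / normal_sf(alpha)

# Hypothetical groups: same mean (0.5), different spreads, same cutoff (0.8).
for sigma in (0.10, 0.15, 0.20):
    share = normal_sf((0.8 - 0.5) / sigma)
    print(sigma, round(share, 4), round(mean_above_cutoff(0.5, sigma, 0.8), 4))
```

The wider the spread, the higher the average of those who clear the same bar, so equal treatment again yields unequal performance among the selected.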

This short comment is not up to pg's usual high standards for his essays.

It's true that this test assumes groups of applicants are roughly equal in (distribution of) ability. That is the default assumption in most conversations I've been involved in about bias, and particularly the example I used, but I'll add something making that explicit.

I like the idea, but how do you apply this to power law distribution outcomes and get any statistical significance? I don't know the answer.

For example, the underlying First Round analysis likely has no statistical significance. Assuming a power-law distribution of outcomes, the top 5 outcomes will account for 97% of the value, so we effectively have a study with n=5.
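How concentrated a power-law portfolio is depends heavily on its tail index. The 97% figure above is the commenter's assumption; a minimal sketch, assuming a hypothetical portfolio of 300 exits drawn from a Pareto distribution, lets you see how the top-5 share moves with the tail index alpha.

```python
import random

def top_k_share(outcomes, k):
    """Fraction of the total value contributed by the k largest outcomes."""
    ranked = sorted(outcomes, reverse=True)
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical portfolio: 300 exits drawn from a Pareto distribution.
# alpha is an assumed tail index; smaller alpha means a heavier tail.
rng = random.Random(42)
for alpha in (1.1, 1.5, 2.5):
    exits = [rng.paretovariate(alpha) for _ in range(300)]
    print(alpha, round(top_k_share(exits, 5), 3))
```

The heavier the tail, the more the result rests on a handful of companies, which is what makes the effective sample size so small.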

To make the point, let's apply this to YC's own portfolio. Assuming Dropbox, Airbnb and Stripe represent 75% of its value, we'll learn that YC is incredibly biased against:

  * MIT graduates
  * brother founders
  * founding teams that do not have female founders
  * and especially males named Drew

It's hard to believe these conclusions are correct or actionable.

> I like the idea, but how do you apply this to power law distribution outcomes and get any statistical significance? I don't know the answer.

See my post


The tests there are distribution-free, so a "power law", a Gaussian, or anything else doesn't matter.

I feel the addition of:

    "C" the applicants you're looking at have roughly 
    equal distribution of ability.

makes the reasoning more tautological/weak.

Take two dartboards as a visual, one for female founders and one for male founders, where hitting near the bull's eye counts as "startup success".

If we take "C" to be true, then the darts would be thrown at random.

Now we draw a circle around the bull's eye. Anything landing in this circle we fund. If this circle has a smaller radius on the female dartboard than on the male dartboard, then the smaller female circle will evidently contain darts that are, on average, closer to the target (better average performance) than those in the larger male circle.

But then we do not even need performance numbers: smaller-radius circles will have fewer darts in them. Using "C", we only need to know that the male-female accept ratio is not 50%-50% for us to have found a bias.

In short: If you see a roughly equal distribution of ability, and (for simplicity) a roughly equal number of female to male fundraisers, then you should always have a roughly equal distribution of female to male founders in your portfolio, performance be damned.
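The dartboard argument can be sketched as a simulation. It assumes darts land uniformly over each unit-radius board and that both boards receive the same number of darts; the radii 0.3 and 0.5 are hypothetical. For uniform darts the expected share inside radius r is r squared and the mean distance of those darts is 2r/3, so the smaller circle funds fewer darts that are closer on average.

```python
import math
import random

def throw_darts(n, rng):
    """Throw n darts uniformly at random over a unit-radius board
    (the "C" assumption: both boards receive darts from the same process)."""
    darts = []
    while len(darts) < n:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:  # rejection sampling keeps points on the disk
            darts.append(math.hypot(x, y))  # distance from the bull's eye
    return darts

def fund(distances, radius):
    """Fund every dart inside the circle; return (count, mean distance)."""
    funded = [d for d in distances if d <= radius]
    return len(funded), sum(funded) / len(funded)

rng = random.Random(1)
female_board = throw_darts(10_000, rng)
male_board = throw_darts(10_000, rng)
# Hypothetical radii: the female circle is drawn smaller than the male one.
print("female", fund(female_board, 0.3))  # fewer darts, closer on average
print("male  ", fund(male_board, 0.5))
```

Both signals of bias show up at once: the count of funded darts and their average distance from the target.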

The technique is still useful when you do not have these female vs. male accept ratios and a VC publishes only success rates, but information on ratios is often more public than success rates or estimates.

Doesn't this logic assume that there are the same number of darts thrown total at both boards?

The issue with founder funding is there are fewer female applicants than male applicants, and the applications aren't published.

I am sorry for all the posts in this thread (including this one). Imagine being PG and reading 200+ negative replies to a blog post you wrote. I could have reasoned in line with Graham and learned a lot more than I did by resisting and attacking a viewpoint different from my own.

I feel this logic can be salvaged for a different number of darts, but having thought about this blog post some more, I feel bias is inherently non-computable: our decision on how to compute it influences our results.

What PG did for me was show that there is no Pascal's wager in statistics: all outcomes/data/measurements/views are equally likely. The view that the female variable alone is able to divide skill/start-up success is weak. The assumption of non-uniform points is weak. The assumption of no variance/unequal rankings is weak. The assumption that a non-random sample is significant is weak. The assumption that VCs are unbiased in their selection procedure is weak. The assumption that nature/environment favors skilled women is weak. The assumption that decisions on whom to fund do not influence future applicants is weak. The assumption that women are still selected for capability is weak. The assumption that women ignore nature/environment and keep focusing on start-up capability is weak. It is much more likely that something else entirely happens. PG's alternative is certainly a sane one, but it is one of many.

Perhaps women perform better because, while VC offers the same chance to men and women, they are better at picking capable women than capable men. Bias in favor of capable women.

Perhaps women perform better because they are naturally better than men.

Perhaps women perform better because VC is biased against women, and only the strong survive.

Perhaps women perform better because affirmative action to remove the inequality in performance (perceived bias) actually increased our objective bias.

Perhaps women perform better because VC is bad at picking capable women, so they pick incapable women, of which there happen to be a lot more.

Perhaps women perform better because now the smart and capable women start to act like the mediocre ones (bad funding decisions influence actors looking for reward).

Perhaps women perform better because nature is "biased" against older, risk-averse (but available) men and against older, unavailable women who have children, while it favors both young males (who have to compete with the old males) and females (who compete only among themselves).

Perhaps women perform better because our sampling method was biased.

Perhaps women perform better because our measurements were 5 years old and we are seeing an old static state of a highly complex dynamic system.

Perhaps women perform better because they are more variant: the good ones are really good and the bad ones are really bad, making it easier for VCs to pick the cream of the crop.

All I know is how little I know: that (algorithmic) bias is an important subject, worth thinking about, and that we need very smart people working on it. I would never have gotten away with upvotes on my posts in this thread if the subject were cryptography. I clearly know very little about both subjects (and only now do I know that, which I hope is at least a start).

PG showed that we (I), perhaps too easily, go along with the status quo: our measurements are all correct, our conclusions are all correct. Yet, if you think about it, objectively I agree that women and men are equal in capability. If you believe this to be so and you observe that men and women perform differently, then you may have a selection bias.

I think the least people of all views could do is make sure the environment for female founders is healthy and lets them flourish in line with skill/capability. Then let nature do its thing.

P.S.: If we know that females actually perform better than males, what is the ethical thing to do? Fund even more female founders and make it harder for men? It would make you richer. Affirmative action? It would not remove a bias; it would introduce one.

Even assuming that the distributions are exactly equal, the test could still give misleading results in situations where the bias does not manifest as a different cutoff.

For example, if a VC funds all male founders but flips a coin to decide whether to fund each female founder, the test would fail to detect overwhelming bias.

Obviously that specific scenario is not realistic, but I believe something like this is plausible enough: A VC funds all male founders who are considered promising, and all female founders who are considered promising AND went to school with one of the partners.
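A minimal simulation of the coin-flip scenario shows why a performance-only test misses it. It assumes both groups' quality scores come from the same standard normal distribution and that "promising" means a score of at least 1.0; both assumptions are illustrative.

```python
import random
from statistics import mean

rng = random.Random(7)

def founder_scores(n):
    """Hypothetical quality scores; both groups share one distribution (assumption C)."""
    return [rng.gauss(0, 1) for _ in range(n)]

men, women = founder_scores(20_000), founder_scores(20_000)
cutoff = 1.0  # the same "promising" bar for everyone

funded_men = [s for s in men if s >= cutoff]
# Biased rule from the comment: promising women are funded only on a coin flip.
funded_women = [s for s in women if s >= cutoff and rng.random() < 0.5]

print(len(funded_men), round(mean(funded_men), 3))
print(len(funded_women), round(mean(funded_women), 3))
# Counts differ by roughly half, but mean performance of the funded is about
# the same, so a test that only compares performance sees no bias.
```

Because the coin flip is independent of quality, it thins the funded women without shifting their average, which is exactly the kind of bias that does not show up as a different cutoff.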

And it's not hard to imagine a plausible scenario in which the test would give false positives rather than false negatives.

A small request: When you amend an article after it is published please note the change in a footnote. It took me a long time to realise the top comment on HN was referring to an older version of the article that didn't mention the equal distribution of ability.

Correlation != causation, and even if the correlation reveals something, with no explanatory theory, we are still at step one.

This essay was based on these two lines from here: http://10years.firstround.com/#one

> That’s why were so excited to learn that our investments in companies with at least one female founder were meaningfully outperforming our investments in all-male teams. Indeed, companies with a female founder performed 63% better than our investments with all-male founding teams

The comparison is not entirely clear, but it is not women versus men. It is between companies with X male founders plus at least one female founder, and those with zero female founders and Y male founders.

If we skip a step and take this fact as having some predictive value, it could be lots of things, including off-the-top-of-my-head:

1. Bias against women, which extends to teams that include men, i.e. the bias against women exists even in the presence of male co-founders.

2. That the personality traits shared by groups where women co-found startups with men are positively correlated with success. It is quite possible that these groups have much better EQ, while still retaining enough IQ to impress the selectors.

3. That startups with at least one female founder select (and I am using this term in a very stereotyped way) "not-white-male" startup ideas. Many unicorn startups, from Atlassian to Dropbox, specialise in problems faced by, again for want of a better term, "white males". Given the mantra of solving problems we have ourselves, it is possible that mixed groups choose less male-oriented problems. As men have been the founders of the majority of startups to date, there must be a plethora of such non-male-oriented ideas left untouched. One example is Diajeng Lestari, who started https://hijup.com/, described as "A pioneering Muslim Fashion store. We sell fashion apparel especially for Muslim women ranging from clothing, hijab/headscarf, accessories, and more." There is little to no chance the archetypal "white male hacker founder" has that idea.

That's three ideas off the bat, only one of which is bias. It could still be bias, but I feel that points 2 & 3 are at least good candidates for exploration. Personally, I think there must be a lot of low-hanging fruit in ideas not aimed at men, and female founders seem ideally poised to have those ideas.

No. Under the null hypothesis we don't have to care about the distribution: the distributions of the measurements for the men and the women can be different, and the work can be distribution-free. The only assumption is that the expected values of the measurements are the same for the men and the women.

For details, see my post
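The linked post is not available here, so the following is not the commenter's method. As one standard distribution-free way to compare the mean outcomes of two groups, here is a minimal permutation-test sketch. Strictly, it treats the group labels as exchangeable under the null, which is a somewhat stronger assumption than equal expected values alone. The returns in the usage lines are hypothetical.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed difference, approximate p-value). Under the null, group
    labels are assumed exchangeable, which is slightly stronger than only
    assuming equal expected values."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (extreme + 1) / (n_resamples + 1)

# Hypothetical per-company returns; not data from the thread.
with_female_founder = [1.2, 0.0, 3.5, 0.4, 2.2, 0.9]
all_male = [0.8, 0.1, 1.9, 0.0, 1.1, 0.6, 0.3]
print(permutation_test(with_female_founder, all_male))
```

No distributional form is assumed for the returns; the reference distribution comes entirely from reshuffling the observed values.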



The problem here is language and what our actual objectives are.

When people complain about bias, they are not really talking about mathematical bias, but about something else: Their idea of fairness. They are talking about discrimination. And when we are discussing that, we can't really think about whether rules are applied fairly or not, but whether the rules produce the outcomes that we want.

Let's go for a ludicrous example: We'll accept all applicants whose IQ is higher than their weight in pounds. We'll be explicitly discriminating against heavy people, but at the same time, we have pretty clear implicit biases against men and against ethnic groups who tend to be taller. We might as well have said that we prefer children and Japanese women. There's no need for mathematical bias: The bias comes from the rule selection.

So, in your example, if our actual objective is to graduate an equal number of people from groups A and B, we have to, explicitly, make it easier for group B to get the scholarship. And many times organizations have objectives like that.

As a more realistic example, let's consider a police department. If the objective is to have a racial makeup that represents the community, and different races have different drop-out rates, the candidate selection will prefer one kind over the other, precisely to counter the drop-out differential.
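The arithmetic behind countering a drop-out differential is simple: if a group completes the program at rate p, reaching T graduates takes about T/p admissions. A minimal sketch, with hypothetical group names, targets, and completion rates:

```python
import math

def admissions_needed(target_graduates, completion_rates):
    """Admissions per group needed so each group reaches its graduate target,
    given each group's (assumed) completion rate: admits = target / rate,
    rounded up to whole people."""
    return {
        group: math.ceil(target / completion_rates[group])
        for group, target in target_graduates.items()
    }

# Hypothetical: equal graduate targets, different completion rates.
print(admissions_needed({"A": 40, "B": 40}, {"A": 0.8, "B": 0.5}))
```

With these numbers, group B needs 80 admissions against group A's 50, so equal outcomes require admitting the two groups at different rates.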

So when regular people, and not mathematicians, discuss bias, the mathematical definition is unimportant. The one important thing is our stated objectives.

Your comment makes sense; indeed, fairness in selection process does not imply that people aren't being discriminated against. For instance, the SAT is fair, but denies those with fewer opportunities a chance to get into top-tier schools. I can get on board with that.

> We can't really think about whether rules are applied fairly or not, but whether the rules produce the outcomes that we want.

This is a more explicit way of phrasing an attitude that I've noticed in my community (a liberal U.S. university). However, I don't think it's obvious that this is the right principle to uphold.

I squirm with discomfort at the idea that we will only support "fairness" and empirical data to the extent that it is applicable to the outcome that we personally desire. This seems to imply that all evaluation metrics are "biased", until we can find a measure that selects equal representation across all demographics, regardless of the size of applicant pool or ability distribution among that pool.

What outcomes, exactly, do we want? More representation of under-represented groups? How does this relate to the goal of maximizing return on the portfolio? What does this mean for people who want a "meritocracy" (if such a thing can exist)?


It's not easy to tell when bias shows up. College rankings might look like an unbiased formula, but the formula is selected so the 'top' schools end up being highly ranked instead of measuring useful things. Things like a high faculty-to-student ratio don't actually have much direct impact, but it's the kind of stat easily gamed by 'top' schools, so it has gained a sort of mythic importance even if those faculty don't actually teach undergrad classes.

You can find the same inherent bias in many walks of life. Many of the hurdles to becoming a doctor have nothing to do with being a good doctor; they're just there to ensure the right kinds of people get into and out of the program.

What surprised me a bit is that pg decided to use the word "bias" without any clarification, considering his background in computer science and AI.

Anyway, I think pg's whole argument is rather moot because the three assumptions that he states are incredibly difficult to measure (part of the reason why it is very difficult to argue for or against affirmative action without coming across as "biased").

Unfortunately I think this is a problem with many of his essays. They often present a very specific argument with reservations, which makes the argument very hard to disagree with, since you have to argue relevance, which requires a lot more insight. It's therefore taken as truth by readers, even if the original argument doesn't support their conclusion. In general I think they should be seen as opinion pieces rather than essays. I have a hard time seeing many of them meeting, e.g., a basic university standard.

Great comment. There are two types of fairness, (a) fair rules, and (b) fair outcome.

Which are essentially two of the big branches of normative ethics: deontology and consequentialism.

Favoring fair outcome over having fair rules is against everything I believe. We should not strive for a participation trophy culture.

"Favoring fair outcome over having fair rules"

Which essentially no one does? People favor fair outcome because they don't think the rules are, or can be, fair and often as a proxy for rules becoming more fair.

I disagree. Rephrasing the tradeoff a little, Westerners are not willing to accept bad outcomes as a matter of course. It is probably objectively fair for individuals to stop wearing seatbelts or to "responsibly" pursue a meth habit, but those things are still illegal.

The "fair" rules would probably be to let people do stupid things and accept their own consequences, but Western culture is not willing to let houses burn down because people didn't buy into the local fire department co-op.

This well-circulated image shows that making everyone a winner has merit in some circumstances.


The left-hand side shows fair rules; the right-hand side shows a fair outcome.

It's never a bad idea to think about how these ideas play out when taken to their logical conclusion.

Kurt Vonnegut on the subject: http://www.tnellen.com/cybereng/harrison.html

That's all right if the goal is to help individuals, as with welfare. But it's not OK if the goal is to get people to do the most extreme things, like selecting the "best" job applicant or the best player for a baseball team: a person who can be successful without needing as many boxes to stand on as others.

Don't forget that it's these "best" people who add a vastly disproportionate amount of value to the world. They're the ones who invent new technology and discover new science. We all benefit greatly from their success.

The link is down (shows a jpeg with a single white pixel), but I guess that one is the same:


It's really more like this version of the image: http://i.imgur.com/DqKXPF3.png

The baseball game in the image wouldn't be worth watching if equality of outcome were the rule for baseball team tryouts. The entire game is based on fair competition under the rules pushing participants toward excellence.

Inequality of outcome is the entire reason we see baseball played at a high level. When you demand equality of outcome regardless of talent or effort, you're asking for society to stagnate. You're asking for pervasive mediocrity. You're asking for us to kill effort and motivation. No thanks.

You're beating a straw man. He said in some circumstances.

Yet he didn't list any or describe any criteria for evaluating them. The "in some circumstances" bit was just a way to weasel out of potential objections.

I thought the example was pretty obvious.

Indeed. The problem is that people frequently infer unfair rules from unequal outcomes, without taking into account the possibility of systematic group differences.

    Alan: I believe in equality of opportunity, not equality of outcome.

    Bob:  How do you know there isn't equality of opportunity?

    Alan: Well, just look at how unequal the outcomes are!
At this point, Bob would be wise to change the subject, because if he pressed on, he might get this:

    Bob:  Can you give me an example?

    Alan: Group X is underrepresented in Field Y.

    Bob:  Maybe Group X isn't as good at Field Y.

    Alan: What? That's racist and/or sexist!
And if Bob were to make a comment to this effect on Hacker News, he'd probably get downvoted. This is because most people agree with Alan, and many of them abuse their downvote privileges to punish ideas they disagree with rather than those that don't further the discussion. This degrades the quality of discourse, but at least it helps reassure the downvoters that they aren't racist and/or sexist.

Bob: Can you give me an example?

nl: Sure.

To overcome possible biases in hiring, most orchestras revised their audition policies in the 1970s and 1980s. A major change involved the use of 'blind' auditions with a 'screen' to conceal the identity of the candidate from the jury. Female musicians in the top five symphony orchestras in the United States were less than 5% of all players in 1970 but are 25% today. We ask whether women were more likely to be advanced and/or hired with the use of 'blind' auditions. Using data from actual auditions in an individual fixed-effects framework, we find that the screen increases by 50% the probability a woman will be advanced out of certain preliminary rounds.[1]

Bob: What? But that doesn't count because...

[1] http://gap.hks.harvard.edu/orchestrating-impartiality-impact...

I didn't say there are no valid examples of bias, just that many people assume unequal outcomes must result from unequal opportunity, ignoring the possibility of real group differences. Surely you know many real-life Alans who see bias every time a particular Group X is underrepresented in Field Y. Moreover, the values of X aren't random; you'll almost never hear complaints of bias regarding, say, trash haulers, or NFL cornerbacks. (But NFL quarterbacks—ah, plenty of bias there!)

Speculative explanations (i.e. assumptions) are not uncommonly offered in support of the proposition that there is no underlying bias. nl provides a rare example of the assumptions being systematically investigated.

Right, and unwillingness to consider the possibility of group differences comes from a quasi-religious devotion to the blank slate model of human nature. The way radical egalitarians see it, we're not only equal in dignity, but in potential.

That's a pretty view, but it's inconsistent with reality, and radical egalitarians need to come up with increasingly implausible explanations to explain everyday circumstances that make perfect sense once you drop the blank slate model.

There is a simple explanation for differences in abilities between groups that has nothing to do with their genetics or so-called natural ability: the fact that groups often grow up around other members of their group. Both nature and nurture are largely in common for many groups, so it could easily be either that causes the observed differences in ability.

could easily be either

Or both. They're not mutually exclusive. Do Jamaican sprinters excel because they grow up around other sprinters or because they are blessed with natural ability? Yes. Simply put, or != xor.

Hey rewqfdsa, maybe shoot me an email some time. Address is in profile.

Apply Occam's Razor to these supposed group differences. Which do you think is a more plausible reality?

A. Interviewers prefer candidates who are like themselves, interviewers are mostly white men, therefore most hires are white men.

B. The uterus and melanin both inhibit programming ability, interviewers are perfect judges of programming ability, therefore most hires are white men.

To look at the present (incomplete) evidence and decide that B is the more likely story, is racism/sexism.

B. The uterus and melanin both inhibit programming ability, interviewers are perfect judges of programming ability, therefore most hires are white men.

Serious question: can you at least steel man this point of view rather than making it a ridiculous straw man? If you cannot steel man it, what makes you so sure you really understand the argument?

For bonus points, you can also point out the glaringly obvious complication to this chain of logic: A. Interviewers prefer candidates who are like themselves, interviewers are mostly white men, therefore most hires are white men.

Which is more plausible?

a) the action of natural selection, sexual selection, and the hormone environment magically stop at the blood-brain barrier, or

b) there are real group differences between human populations?

We've already eliminated all overt discrimination. If you continue to cry discrimination, you're essentially postulating a giant unconscious conspiracy. I find the idea wildly implausible. It's much simpler to just accept that not everyone is equal in aptitude and ability.

Women musicians started getting orchestra positions in much greater numbers after auditions were made blind.

If biases affect how a professional musician hears music, is it so shocking to think unconscious bias might affect someone's judgment of a candidate based on multiple fuzzy factors like ability, culture, and personality?

And that's just for job applications. You really think the criminal justice system has removed unconscious bias?

Criminal justice and orchestra employment are non-market phenomena. The people in charge do not benefit if the orchestra is great and are not accountable if innocents go to jail and murderers spree freely and the tubas clank.

So of course the bosses pick out their friends and cronies. And a decent polity should restrain their corruption with blind auditions and accountable audits of prosecutions.

But investors should be looking for a good return on their money. They should be looking for the best investments they can find. If they're not, that is the source of bias right there.

Of course, the Wall Street industry is located in New York because you can use big city lights, strippers, and steaks to scam small town municipal pension fund managers who aren't investing their own money. Sand Hill Road is supposed to operate on different principles.

The people in charge do not benefit if the orchestra is great

Have you ever actually worked for an orchestra? I have. The Chicago Symphony, Boston Symphony and other top orchestras take quality very seriously.

Do you think, say, Georg Solti or Daniel Barenboim were happy with "just pretty good" musicians? Their reputations (and fortunes, for top conductors are very well paid) depend on consistently outstanding performances.

And I don't know how you call it non-market. When your income depends on millionaires donating vast sums of money, you damn well better care about quality.

It's like saying a football coach doesn't benefit if his team drafts the best players.

> you're essentially postulating a giant unconscious conspiracy

Let me introduce you to the extensive scientific literature on implicit bias: http://www.aas.org/cswa/unconsciousbias.html

If the bias reflects a real Bayesian prior, it isn't the kind of bias that's unjust.

Not so. Priors/posteriors are only as good as the model they're based on. For instance, if you choose parental income as the feature, it can be a stronger signal than skin color, even though both may be good predictors. But the correlation between income and skin color can account for the predictive power of one feature when the other feature is the true cause.
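The point about correlated features can be made concrete with a small simulation. Everything here is hypothetical: a binary group label shifts the chance of high income, and the outcome depends only on income. The group label still looks predictive overall, but the gap vanishes once you compare people within the same income level.

```python
import random

rng = random.Random(3)
rows = []
for _ in range(200_000):
    group = rng.random() < 0.5  # hypothetical group label
    # Assumption: group membership shifts the chance of high income...
    high_income = rng.random() < (0.7 if group else 0.3)
    # ...but the outcome depends ONLY on income, never directly on group.
    outcome = rng.random() < (0.2 if high_income else 0.6)
    rows.append((group, high_income, outcome))

def rate(pred):
    """Outcome rate among rows whose (group, high_income) satisfy pred."""
    hits = [o for g, h, o in rows if pred(g, h)]
    return sum(hits) / len(hits)

naive_gap = rate(lambda g, h: g) - rate(lambda g, h: not g)
gap_high = rate(lambda g, h: g and h) - rate(lambda g, h: not g and h)
gap_low = rate(lambda g, h: g and not h) - rate(lambda g, h: not g and not h)
print(round(naive_gap, 3), round(gap_high, 3), round(gap_low, 3))
```

The naive gap is about -0.16 by construction (0.32 vs 0.48), while the within-income gaps are near zero: the proxy borrows its predictive power from the true cause.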

So if Americans from race A commit ten times more violent crime than others and the police consequently accuse and manhandle vast numbers of innocent Americans of race A, there's nothing unjust about that? The vast majority of citizens of race A are innocent of all offenses but deserve constant suspicion and low level official humiliation and violence in a just world for no reason other than being the same color as some crooks.

I don't agree.

accuse and manhandle vast numbers of innocent Americans of race A

One doesn't need to advocate accusing and manhandling to think the police should use statistically valid inferences in the name of justice. I myself am a member of a minority group—men—that is responsible for a vastly disproportionate share of crime, especially violent crime. You could mandate that cops ignore this reality and treat men and women with equal suspicion, but the result would be worse policing. For example, if you look at the statistics for New York's supposedly racist "stop-and-frisk" policy, you'll find that the disparity between whites and blacks is smaller than the disparity between men and women—indeed, smaller even than the disparity between white men and black women. Why have you never heard stop-and-frisk described as "sexist"?

This inconsistency is best explained politically: complaining about racial injustice against blacks is an effective route to power; complaining about gender injustice against men is not. It's the same reason you hear constant complaints about how white tech is, but not about how black sports are. Jesse Jackson can effectively shake down Apple and Intel [1], but there is no white equivalent shaking down the NFL. (Can you imagine if "increasing diversity in the NFL" meant "increasing the relative proportion of white players"? It would be a different world—not, incidentally, one I would particularly want to live in.)

Being male means people will infer based on a superficial assessment that I'm more likely to be a criminal than, say, my sister. But that inference is correct. Being a member of such a group is my lot in life, and complaining doesn't change what is.

[1]: See, e.g., http://www.mercurynews.com/census/ci_29048321/q-jesse-jackso...

A straightforward application of evolutionary biology to Homo sapiens yields group differences as the null hypothesis. You've done nothing but construct a ridiculous strawman to refute this. Moreover, discrimination and group differences aren't mutually exclusive: it's possible that Group X's underrepresentation in Field Y is the result of both discrimination and group differences. The only way to know for sure that it's pure discrimination is to show that group differences are negligible. This requires actually measuring them (which in fact has been done in exhaustive detail [1]), but even suggesting the possibility of group differences frequently leads to accusations of racism and sexism, as you've just so ably demonstrated.

[1]: See, for example, The Blank Slate by Steven Pinker. Then, once you get over your knee-jerk "That's racist!!!" reflex, take a look at (I mean actually read for comprehension) The Bell Curve by Herrnstein and Murray. Maybe add a little Cavalli-Sforza (via Steve Sailer) to the mix (http://www.vdare.com/articles/052400-cavalli-sforzas-ink-clo...). You can then graduate to basically anything by Arthur Jensen. As a topper, read "Rational"Wiki's entry on Human Biodiversity (http://rationalwiki.org/wiki/Human_biodiversity) and cringe at the smug, supercilious tone, endless strawmanning and distortion, and at the realization that you, too, were once taken in by the ridiculous "mainstream" views. (I certainly was.)

Thanks for the literature and links. I hope I'll find the time to read them. The article from vdare.com seems really emotional to me, and not very reputable.

"Don`t believe any of this. It`s merely a politically-correct smoke screen that Cavalli-Sforza regularly pumps out to keep his life`s work — distinguishing the races of mankind and compiling their genealogies — from being defunded by the leftist mystagogues at Stanford."

"As you can imagine, this finding could get him in a bit of hot water if the campus thought police ever found out about it."

This may be a better place to start -- https://jaymans.wordpress.com/jaymans-race-inheritance-and-i... and https://jaymans.wordpress.com/about/. VDare tends to preach and agitate to the already converted; that is the peril of having to survive on donations. That said, while Steve Sailer is snarky, he is also reputable. He takes good care to not get things wrong. I've been following him for a while, and when some bit of news comes out, or some new policy gets announced, and the NY Times says one thing, and Sailer says another, Sailer almost always ends up getting proved right.

If you want a book length treatment, Michael Hart's Understanding Human History is the complete opposite of the typical, Jared Diamond, environmentalist accounts of human society. It is worth perusing - https://lesacreduprintemps19.files.wordpress.com/2012/11/har...

Understanding Human History is one of my favorite books. I finished it and immediately reread it. This was especially instructive given that a decade ago I read Jared Diamond's Guns, Germs, and Steel twice as well, quite innocent of the political subtext.

This is a smart insight, although in fairness the article suggests, in its female founder example, that there was discrimination against women -- that given men and women of equal ability, men were more likely to be chosen.

Condemning that inequality is different from affirming that selection should be altered to produce the outcomes people view as fair. One is saying, "Don't discriminate against Xs." The other is saying, "Not only can't you discriminate against them, you need to ensure that Xs have outcome Y. That is, you may be required to discriminate in their favor."

The latter is a value, and your point about mathematics being irrelevant stands. But the former is a mathematical claim, and pg was making a mathematical claim, so the mathematical argument you replied to is relevant.

I think the grandparent example is pointing out that it will be hard to use this for sexism claims precisely because studies have shown that the variance in ability in male and female populations is, in fact, different. The studies I'm aware of show more men at both extremes of the bell curve. So more men at the very top and bottom in IQ measurements[1].
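To see why a small difference in spread matters most at the extremes, here is a minimal sketch assuming two normal distributions with identical means and hypothetical standard deviations of 15 and 16.5 (illustrative numbers, not measured figures). It prints how over-represented the wider group becomes above increasingly high thresholds; the same happens symmetrically in the lower tail.

```python
from statistics import NormalDist

# Hypothetical groups: same mean, the second has a 10% larger standard deviation.
narrow = NormalDist(mu=100, sigma=15)
wide = NormalDist(mu=100, sigma=16.5)

thresholds = [115, 130, 145, 160]
ratios = []
for t in thresholds:
    # Ratio of upper-tail probabilities P(X > t): how over-represented the wider group is.
    ratio = (1 - wide.cdf(t)) / (1 - narrow.cdf(t))
    ratios.append(ratio)
    print(f"above {t}: wider group over-represented {ratio:.2f}x")
```

The ratio grows with the threshold, so a modest variance gap turns into a large imbalance among the very top (or very bottom) performers.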

There's a genetic basis for this, as well: women have two copies of the X chromosome, whereas men have one X and one Y, so for genes on the X there's no second copy to take over in men, leading to more extreme outcomes, whether good or bad.

Now of course we ought to treat every group of people fairly, but we do need to examine our priors when doing so, especially when proposing ways to detect and punish people who may be thinking bad things, consciously or otherwise.

[1] We may not know just what 'IQ' is, but we do know that tests of mental ability all correlate with each other, suggesting an underlying factor. This, in turn, can be correlated with many other things, like success (or lack thereof).

Sorry but your reasoning does not let pg off the hook. In his article he says that the way to determine bias is by measuring the performance of those that got through. With your excuse, all you have to do is measure the number of people that got through in a particular group relative to the other group.

But the root article failed at its stated objective. It concluded that First Round Capital is biased against females, and there is insufficient data to support that claim! It might actually be that FRC is biased FOR females, but the process leading up to FRC was so biased against females that they are still at a net loss.

The article defines bias as follows:

> Want to know if the selection process was biased against some type of applicant? Check whether they outperform the others. This is not just a heuristic for detecting bias. It's what bias means.

Under that definition, you have been biased against A. [edit: on reflection I see this as a weakness of his definition. I missed that your selection process does in fact select the best candidates.]

Yes, but that's not the common usage of the word, or how most people understand it. With that usage you could say Ivy League schools are NOT biased against Asians since Asian graduates aren't more successful than non-Asian ones, except nobody does.

Hypothetical logic is flawed anyway. Higher ability Asian graduates could be less / only equally successful in the workplace due to pervasive external bias too. A lot of this is exacerbated by the fact that "soft skills" are more important for high status careers, and your "soft skills" are pretty much defined by tribal associations. It's the core of how we interact socially, and it causes problems that are really only fixed by alleviating scarcity.

Unless you know what exactly caused A to outperform others, you won't really know if the process is biased or what made it biased.

When asserting biases, you must first distinguish them from random noise. Using pg's logic, every selection process that isn't perfect is biased.
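One standard way to check whether a gap in accepted-group performance could be mere noise is a permutation test. This is a technique swapped in here for illustration (nobody in the thread proposed it), and the outcome scores below are made up. It shuffles group labels and counts how often a gap at least as large as the observed one appears by chance.

```python
import random
import statistics

def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means between samples a and b.

    Under the null hypothesis (no real difference), group labels are exchangeable,
    so we shuffle them and count how often a gap at least as large appears by chance.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Made-up outcome scores for accepted applicants from two groups.
group_a = [3.1, 2.4, 5.0, 1.2, 4.4, 2.9]
group_b = [2.8, 3.3, 1.9, 2.2, 3.0, 2.5, 1.7]
gap = statistics.fmean(group_a) - statistics.fmean(group_b)
print(f"gap = {gap:.2f}, p = {permutation_p_value(group_a, group_b):.3f}")
```

With samples this small, even a sizeable gap in means can easily be consistent with chance.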

>> This is not just a heuristic for detecting bias. It's what bias means.

> Under that definition

That's not a definition. It's a claim about what the term "bias" means.

Graham's intuition is assuming equality of the two distributions.

As I noted in a different comment here, you can pretty easily fix Graham's test. Compute min(accepted A) and min(accepted B) instead of the means. In your example, the mins of the accepted distributions would both work out to be 80%.
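A quick simulation of the scholarship example (graduation chances uniform on [0, 1] for group A and on [0.1, 0.9] for group B, with a perfectly accurate 80% cutoff) shows the idea. The accepted means differ, about 0.90 versus 0.85, while the accepted minimums both sit at the cutoff. The sample size and seeds are arbitrary choices for this sketch.

```python
import random

def accepted_stats(low, high, cutoff, n=200_000, seed=0):
    """Draw n graduation chances uniformly from [low, high], accept those at or
    above `cutoff`, and return the (mean, min) of the accepted subset."""
    rng = random.Random(seed)
    accepted = [x for x in (rng.uniform(low, high) for _ in range(n)) if x >= cutoff]
    return sum(accepted) / len(accepted), min(accepted)

# Group A: chances uniform on [0, 1]; group B: uniform on [0.1, 0.9]; same 80% cutoff.
mean_a, min_a = accepted_stats(0.0, 1.0, 0.8, seed=1)
mean_b, min_b = accepted_stats(0.1, 0.9, 0.8, seed=2)
print(f"A: mean={mean_a:.3f}  min={min_a:.3f}")
print(f"B: mean={mean_b:.3f}  min={min_b:.3f}")
```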

This assumes that the populations of A and B are of the same size. A larger sample will tend to have a lower minimum under many real world distributions - a sample of one will have its minimum equal to its maximum.
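To make the sample-size point concrete: if accepted scores were uniform on [C, C + w] just above the cutoff (a hypothetical model), the expected minimum of N of them is C + w/(N+1), so a bigger group's minimum lands closer to the cutoff purely by chance. With N = 1 the "minimum" is just the single score. This sketch checks the closed form by simulation, with C = 0.8 and w = 0.2 chosen arbitrarily.

```python
import random
import statistics

def mean_sample_min(n, cutoff=0.8, width=0.2, trials=5_000, seed=0):
    """Average, over many trials, of the minimum score among n accepted candidates
    whose scores are uniform on [cutoff, cutoff + width] (a hypothetical model)."""
    rng = random.Random(seed)
    return statistics.fmean(
        min(rng.uniform(cutoff, cutoff + width) for _ in range(n))
        for _ in range(trials)
    )

for n in (1, 5, 50):
    exact = 0.8 + 0.2 / (n + 1)  # closed-form E[min] for the uniform case
    print(f"n={n:3d}  simulated={mean_sample_min(n):.4f}  exact={exact:.4f}")
```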

Another reason the use of mins here is not helpful, is that adding one equally awful accepted candidate to group A and B would then remove whatever bias there was according to the test, which is not what we want the test to indicate.

The idea by PG is a rough rule of thumb and breaks down trivially - suppose VC fund X were to accept all candidates, but group A was worse than B, the test would falsely imply that the fund was biased.
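The accept-everyone case is easy to check numerically. This sketch assumes normally distributed outcomes with made-up means of 0.4 and 0.6 for the two groups. No selection happens at all, yet the gap in accepted means is about 0.2, which the mean-performance test would read as bias against B.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical applicant pools: group A's outcomes are worse on average than group B's.
group_a = [rng.gauss(0.4, 0.1) for _ in range(10_000)]
group_b = [rng.gauss(0.6, 0.1) for _ in range(10_000)]

# A fund that accepts every applicant applies no selective filter at all...
accepted_a, accepted_b = group_a, group_b

# ...yet comparing mean outcomes of the accepted still shows B ahead.
gap = statistics.fmean(accepted_b) - statistics.fmean(accepted_a)
print(f"B outperforms A by {gap:.3f}")
```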

It's unfortunate the idea was dressed up in statistical persiflage because it isn't rigorous -- it's a rough guideline. To make it rigorous would be very hard: either the abilities of the candidate populations would have to be measured very closely (unrealistic), or a more scientific experiment conducted (an A/B test where candidates from each group are included or excluded opposite to the prior decision, which would need big groups).

No, it doesn't assume or require equality. The p-value you compute for this test will depend on the smaller of the sample sizes.

Mins will fail if you conspire to cheat the test, it's true. Very few statistical tests stand up to conspiracy theories.

> Compute min(accepted A) and min(accepted B) instead of the means.

Dude, your comments are normally smarter than this. Yeah, you can easily fix Graham's test -- all you need are some numbers that do not exist and that we cannot measure.

We're talking about VC's evaluating founders. That does not, and cannot, get reduced to a numerical score. And even if VC's did use some sort of scoring rubric, then we would still not know if there was unfairness in the way they made the scores, or unfairness in the selection process. It would just be punting the problem down a layer. PG's central claim -- that a third-party can detect the bias/unfairness in the funding process just using math -- is false.

You can only know if the process is biased/unfair if you have deep qualitative understanding of the process.

A charitable interpretation of what he or she said is this: don't evaluate bias by looking at outcomes of the average applicant, look at the outcomes of the borderline applicants. Even if there is no perfect way to define or measure the minimum acceptable applicant, I think it is reasonable to identify whether applicants were borderline or not.
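Here is a sketch of the borderline comparison under assumed numbers: the same uniform priors as the scholarship example, but a hypothetically biased process that makes group B clear 0.85 while A only needs 0.80. Comparing all accepted applicants points the wrong way (A's mean is higher). Comparing the lowest 5% of each accepted group, used here as a stand-in for "borderline", shows that B's marginal applicants are stronger, which is what bias against B looks like. Using the graduation probability itself as the outcome is a simplification.

```python
import random
import statistics

def accepted(low, high, cutoff, n=200_000, seed=0):
    """Graduation chances uniform on [low, high]; keep those at or above cutoff, sorted ascending."""
    rng = random.Random(seed)
    return sorted(x for x in (rng.uniform(low, high) for _ in range(n)) if x >= cutoff)

def borderline_mean(scores, frac=0.05):
    """Mean of the lowest `frac` of accepted applicants: the ones who barely made it."""
    k = max(1, int(len(scores) * frac))
    return statistics.fmean(scores[:k])

# Hypothetical biased process: B must clear 0.85 while A only needs 0.80.
acc_a = accepted(0.0, 1.0, 0.80, seed=1)
acc_b = accepted(0.1, 0.9, 0.85, seed=2)

print(f"all accepted: A={statistics.fmean(acc_a):.3f}  B={statistics.fmean(acc_b):.3f}")
print(f"borderline:   A={borderline_mean(acc_a):.3f}  B={borderline_mean(acc_b):.3f}")
```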

Isn't that, by the way, what YC has been saying for years in their rejection letters? "We're always surprised by how many of the last companies to make it wind up being the most successful"? Something like that.

> A charitable interpretation of what he or she said is this: don't evaluate bias by looking at outcomes of the average applicant, look at the outcomes of the borderline applicants.

That is fine, that is what he was saying. The point is that his solution is completely impractical for the original goal of finding an objective, statistically valid way of measuring whether bias exists. "Borderline" cannot be measured objectively, only by subjective rubric scoring. And when you only measure the borderline candidates, you have reduced an already way-too-small sample even further.

PG and I are assuming a measurable outcome, which the selection process is explicitly supposed to predict.

I made no claims about practicality - right now all I have is a little bit of measure theory showing that pg's algo is, in principle, fixable. I fully agree that the First Round Capital data he cites is inadequate (and also wrong, due to the unjustified exclusion of Uber, which they explicitly note would alter the results).

My concrete claim: PG's idea for a statistical test is solid, I can (and shortly will) prove a toy version works, and given enough work one can probably cook up a practical version for some problems.

"Your idea isn't 100% perfect right out of the gate" is a very unfair criticism. Are we supposed to nurture every idea in complete secrecy until it is perfect?

OK I missed that you meant "easily fixed" in the strictly mathematical sense, not in the practical, real-world application sense.

With statistics on human affairs, 99% of the hard part is not the math; it is applying that math to complicated, heterogeneous, and difficult-to-measure underlying phenomena. And in most cases, statistics alone will never give you a straight answer; the best they can do is supplement and confirm qualitative observations. Failing to recognize this is how you get all those unending media reports about how X is bad for your health. PG's post was at the level of one of those junk health news articles.

And because human affairs are hard, we should criticize anyone who dares to voice an idea they haven't fully figured out yet.

This idea that statistics can only confirm and supplement "qualitative observations" (i.e., my priors) is completely unscientific and anti-intellectual. If that's true, forget stats - let's just write down the one permitted belief on a piece of paper and not waste resources on science. Science is really boring when only one answer is possible.

> This idea that statistics can only confirm and supplement "qualitative observations" (i.e., my priors) is completely unscientific and anti-intellectual.

Since when is investing in startups a science? What is anti-intellectual, what is anti-science, is to use the wrong tool for the job. Human affairs are not a science in the way that physics is a science. Statistics are far, far more fraught because there are so many variables in play, phenomena are hard to quantify, each case is so heterogeneous, etc. You cannot use statistics in human affairs without also having a very good observational understanding of what is actually going on, otherwise you will end up in all sorts of trouble.

So the PG estimator is clearly problematic. I agree that the yummfajitas (YM) estimator looks to be consistent. In this case though, we're dealing with (small) finite sample sizes, so we need to come up with some sort of test statistic. What would the YM test be here? It seems tricky since you are dealing with a conditional distribution based on left-censored data. I'm also not aware of any difference-of-minimums test, though I am happy to be educated if there is one!

I don't know of a reference for this, but I don't think the statistics are too hard. The test statistic would be exactly min(sample1) and min(sample2).

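Suppose the cutoff sample is distributed according to f(x)H(x-C), where H is the Heaviside step function, so no accepted candidate falls below the cutoff C. Then, assuming the null hypothesis, the probability that the minimum of a sample of size N exceeds C+e by random chance is p = (1 - \int_C^{C+e} f(x) dx)^N.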

So now you have a frequentist hypothesis test. If you make reasonable assumptions on f(x) (non-vanishing near C, quantified somehow), it's even nice and non-parametric.
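A minimal sketch of that test, assuming (hypothetically) that under the null the accepted scores are uniform on [0.80, 1.00]. The helper name min_test_p_value is made up for this example; it evaluates p = (1 - \int_C^{C+e} f(x) dx)^N through a CDF, and a Monte Carlo run checks the formula. With N = 10 and an observed minimum of 0.86, p = 0.7^10, about 0.028.

```python
import random

def min_test_p_value(observed_min, cutoff, cdf, n):
    """P(minimum of n accepted draws >= observed_min) under the null hypothesis
    that the group faced cutoff `cutoff`, i.e. p = (1 - int_C^{C+e} f(x) dx)^N
    with e = observed_min - cutoff. `cdf` is the CDF of accepted scores."""
    mass_near_cutoff = cdf(observed_min) - cdf(cutoff)
    return (1.0 - mass_near_cutoff) ** n

# Hypothetical null: accepted scores uniform on [C, C + W].
C, W = 0.80, 0.20

def uniform_cdf(x):
    return min(max((x - C) / W, 0.0), 1.0)

# Suppose all 10 accepted members of one group scored at least 0.86.
N, OBSERVED_MIN = 10, 0.86
p = min_test_p_value(OBSERVED_MIN, C, uniform_cdf, N)

# Monte Carlo check: how often does the min of N null draws reach OBSERVED_MIN?
rng = random.Random(0)
trials = 50_000
hits = sum(min(rng.uniform(C, C + W) for _ in range(N)) >= OBSERVED_MIN for _ in range(trials))
print(f"analytic p = {p:.4f}, simulated p = {hits / trials:.4f}")
```

The only place f enters is the probability mass just above the cutoff, which is why a lower bound on that mass is all the test needs.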

Does that assume both samples are identically distributed and the only difference is the cutoff? If it does, then couldn't we just continue to do a difference of means test and still be consistent? If it doesn't, how do you handle identifying the cutoff minima and the two different distributions in a frequentist way?

The only assumption I need is that P_{f,g}([C,C+d]) >= h(d) > 0 for some arbitrary monotonic function h(d). This comes directly from the p-value formula.

I.e., for any d, there is a finite probability of finding an A or a B in [C,C+d]. I don't actually care what the shapes of f or g are at all beyond this - as long as this probability exists and is bounded below (in whatever class of functions f and g might be drawn from), it's all fine.

Sorry, I'm confused here. A p-value makes an implicit assumption that your null hypothesis is a known N(0,1). That may be throwing me off a bit. I get that you want to look at the likelihood, which is just one minus the CDF over the given interval. I'm just not clear on how you can get around f and g being arbitrarily parameterized functions of a given class. Are you assuming we know the class and something about f?

A null hypothesis is just a specific thing you are trying to disprove. In this case, it's simply that the minima of both distributions are identical.

I am assuming we know exactly one thing about the class the measures f and g come from: for every function in that class, \int_C^{C+d} f(x) dx >= h(d) for some monotonic function h(d).

The p-value is then bounded in terms of h(d): since \int_C^{C+d} f(x) dx >= h(d), we get p = (1 - \int_C^{C+d} f(x) dx)^N <= (1 - h(d))^N.
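Spelled out, with X_1, ..., X_N the accepted scores of one group, assumed drawn independently from a density f supported on [C, ∞) under the null, the bound follows from independence plus the assumption on the class:

```latex
\begin{aligned}
p &= P\left(\min_i X_i \ge C + d\right)
   = \prod_{i=1}^{N} P\left(X_i \ge C + d\right)
   = \left(1 - \int_C^{C+d} f(x)\,dx\right)^{N} \\
  &\le \left(1 - h(d)\right)^{N}.
\end{aligned}
```

So for any fixed d > 0 the p-value shrinks geometrically as N grows, whichever f in the class is the true one.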

Okay, I'll have to wait for your full write-up, because I am not seeing the path of thought here.

I'll have you know that this particular subthread was way too civil for the Internet.

You can only rarely calculate min(accepted A) in the real world. In this example, the college learns no distributions; it only knows whether each student passed or failed.

So in this particular case, assuming that First Round's sample size is significant, it may just be that the female founders who seek them out are just on average better than the male ones? I suppose that if women think that the selection process is biased against them, and most do (and it may be) perhaps the less than excellent ones just don't apply, whereas that isn't true for males?

The problem is different than that. It's that the measurement of performance is susceptible to differences in the conditional distribution of disparate groups, given that they were selected.

We can illustrate it by modifying the GP example to not include time at all, and make the metric perfect.

Suppose we are selecting for the next qualifying round for the Olympic 400m team, and select candidates if their 400m is under 50 seconds. Then, we measure performance -- immediately -- by a 400m run. We have two candidate pools: people who compete in the professional circuit, and everybody else. 100% of the former group who apply qualify, while only 10% of the others do.

Okay, so now we immediately measure the average 400m time of all professionals versus the average time of all amateurs who can beat 50 seconds. It's pretty reasonable to expect that the professionals might average closer to 45 seconds, while the other group might average around 48 (I'm not a 400m expert, so the actual numbers might be off; the WR is 43 seconds).

According to the article, we now conclude that our selection process is actually biased _against_ professionals! This is at the least very counter-intuitive. Maybe we provide both groups with coaching, and re-test after 6 months, a year, whatever. The professionals will certainly still outperform the amateurs. However, suppose one of the amateurs goes on to greatness, and wins. Wouldn't this obviously be biased against amateurs, according to our intuition?
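The 400m numbers can be sanity-checked with a truncated-normal calculation. This sketch assumes (hypothetically) normally distributed times, N(45, 1.5) for professionals and N(55, 3.9) for amateurs, chosen so that nearly all pros and about 10% of amateurs break 50 seconds. The qualified amateurs then average roughly 48 seconds against about 45 for the pros, so the pros "outperform" even though a perfect stopwatch did the selecting.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # standard normal, used in the truncated-normal formula

def qualified_mean(dist, cutoff):
    """Mean time among runners faster than `cutoff` when times follow `dist`.

    Truncated-normal formula: E[X | X < c] = mu - sigma * pdf(a) / cdf(a),
    where a = (c - mu) / sigma and pdf/cdf are for the standard normal.
    """
    a = (cutoff - dist.mean) / dist.stdev
    return dist.mean - dist.stdev * STANDARD.pdf(a) / STANDARD.cdf(a)

CUTOFF = 50.0  # seconds
# Hypothetical time distributions picked to match the story.
pros = NormalDist(mu=45.0, sigma=1.5)
amateurs = NormalDist(mu=55.0, sigma=3.9)

for name, dist in (("pros", pros), ("amateurs", amateurs)):
    share = dist.cdf(CUTOFF)
    print(f"{name:9s} qualify={share:.3f}  mean qualified time={qualified_mean(dist, CUTOFF):.2f}s")
```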

> So in this particular case, assuming that First Round's sample size is significant, it may just be that the female founders who seek them out are just on average better than the male ones? I suppose that if women think that the selection process is biased against them, and most do (and it may be) perhaps the less than excellent ones just don't apply, whereas that isn't true for males?

First, the sample size is not significant. Adding back one data point, Uber, which was a real data point that was intentionally removed, likely reverses the effect.
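With heavy-tailed startup returns, a single outlier can flip a comparison of means. The numbers below are toy return multiples invented for illustration, not First Round's data; the 500x entry stands in for an Uber-like outcome.

```python
import statistics

# Toy return multiples (made up, NOT First Round's data).
group_f = [0.0, 0.5, 1.0, 3.0, 8.0]                 # e.g. female-led companies
group_m = [0.0, 0.0, 0.2, 0.5, 1.0, 1.5, 2.0, 4.0]  # e.g. male-led, outlier excluded

print("outlier excluded:", statistics.fmean(group_f), statistics.fmean(group_m))

# Adding back one extreme winner to the second group reverses the comparison.
group_m_with_outlier = group_m + [500.0]
print("outlier included:", statistics.fmean(group_f), statistics.fmean(group_m_with_outlier))
```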

But imagine we had a real sample of thousands of companies, and it did show the result.

A typical scenario is that different demographics might connect with First Round via different deal flow channels. For instance, one channel might be longstanding personal connections, another channel might be outreach to companies in the news.

Now imagine female founders are much more likely to be found via outreach rather than personal connections. Perhaps this is due to a negative personal bias -- the VC's are less likely to be chummy with females because of their sex. So they only find female founders when their company is in the news.

It is typical in all businesses that different deal-flow channels have different average returns. So:

* If both channels perform equally well, no bias will be seen in the statistics, even though the VC's are in fact biased.

* If the outbound channel generally performs worse, then women founders in the sample will perform worse than average, even though the VC's are actively biased against them (they are ignoring all the women who would have done well, if only they had personally known them. Since the VC's never invest in them, their results are not measured). This is the opposite of the statistical relationship that PG claims should exist.

* If the outbound channel generally performs better, then women in the sample will be better than the average.

I should also add that the differences in the channels might be due to a positive bias on the part of the VC -- perhaps they do more aggressive outbound outreach in order to get more female founders in the pipeline. Or the difference might be due to something completely neutral.
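The three cases above reduce to simple arithmetic. This sketch assumes (hypothetically) that funded men come half from personal connections and half from outbound outreach, that funded women come only from outreach, and that each channel has a fixed average return multiple; all numbers are invented.

```python
def funded_means(personal_return, outbound_return, male_personal_share=0.5):
    """Average return of funded male-led and female-led companies.

    Hypothetical model: the VC funds men through both channels but finds women
    only through outbound outreach, so every funded woman is an outbound deal.
    """
    male = male_personal_share * personal_return + (1 - male_personal_share) * outbound_return
    female = outbound_return
    return male, female

for label, outbound in (("equal channels", 2.0), ("outbound worse", 1.0), ("outbound better", 3.0)):
    male, female = funded_means(personal_return=2.0, outbound_return=outbound)
    print(f"{label:15s}  men={male:.2f}  women={female:.2f}")
```

In every case the VC ignores women it would have met personally, yet the measured gap can be zero, negative, or positive.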

The lesson here is that using statistics is a perilous endeavor. If you want to detect something like bias, you cannot use numbers alone, you need to combine any numbers with a deeper understanding of the selection process. There is no way that a third party can run a simple correlation and determine with any degree of certainty that the field is in fact biased or not.

This seems plausible to me (though no less problematic, of course).

Great insight. I had the great fortune to take a course with Gary Becker where we went into some of the mathematics of college admissions. He made this precise point -- the variances of the particular populations you are looking at matter a great deal. He managed to build some pretty convincing models which provided a compelling narrative for "biases" that we seemed to observe in the real world, all with simple changes to the distributions of populations. Great comment.

> I had the great fortune to take a course with Gary Becker

Lucky you. He was great. I never had the chance to take a class from him but it would have been worth a Chicago winter to have the chance.

Second-best: https://www.youtube.com/watch?v=QajILZ3S2RE&list=PL9334868E7...

(I took this very class the year prior)

I think there's a big problem with this counterexample: you're selecting for one thing and judging whether or not you were biased based on another.

> Now suppose that we select without bias or inaccuracy all the applicants that have an 80% or better chance of graduation.

This is subtly different from selecting for the highest graduation rate possible, because it's binary: you want a group with >80% chances, not a group with the best chances. Imagine if, instead of the distributions, group A were composed of people with a 100% chance of graduation and group B of people with an 80% chance of graduation. Our process does nothing to distinguish between those people, because that extra 20% chance of graduation doesn't matter.

This brings me to what I think is the fundamental problem with your criticism: it's not clear to me what it means for a group in your example to overperform. If you select a group with the goal of 80% of them graduating, it doesn't make sense to call 90% of them graduating an overperformance. That only makes sense if your goal up front is to maximize the graduation rate.

I think if you rerun your example but instead assume an unbiased strategy that selects for the highest graduation rate possible you'll find that pg's essay makes a lot more sense.

I find it slightly amazing he posted this. Did he not bother to ask someone with a probabilistic background before posting?

Very interesting! Seems like I don't understand probability as well as I think I do, since I bought the argument till I ran into this comment.

So, what are the effects of variance in different evaluation contexts, and do we have a meaningful way to measure bias if we take variance into account?

My initial reactions:

- It seems the higher the variance, the better the examples you can trot out to say you are not biased against that particular group, since you'll always find a member of that group who does amazingly.

- If distributions of performance are multimodal, it's even harder to conclude anything, because different institutions might cut off different modes when setting the bar.

- Modeling the sources of variance may lead to insight into any actual bias.

This logic is similar to that of Larry Summers in his remarks about diversity in science and engineering, http://www.harvard.edu/president/speeches/summers_2005/nber.....

> But we haven't been biased against A. [...] It was just their prior distribution that was different.

This is why that isn't a counter example. Right at the beginning, the article clearly states that this method is only applicable if the prior distribution is equal:

| You can use this technique whenever (a) [...], (b) [...], and (c) the groups of applicants you're looking at have roughly equal distribution of ability.

AFAIK pg added that to the article after WildUtah made his comment, and pg acknowledges this in a reply to WildUtah

Your premise discriminates between two groups.

It's not clear how you can assign a candidate a 90% chance of graduation. That probability must be a subjective assessment that has come from some (biased) source. In truth, an individual will either graduate or not.

In your example, you can assign 0% and 100% probabilities in group A, but you can't in group B. The most plausible mathematical explanation for that is that you collected insufficient relevant information about candidates in group B.

> This short comment is not up to pg's usual high standards for his essays.

I can almost hear him thinking in response, "if I throw a dog a bone, I don't want to know if it tastes good or not."

I suppose PG really should have said:

>the groups of applicants you're looking at have exactly equal distribution of ability.

Rather than "roughly equal".

But obviously that makes the whole thing infeasible.

Absolutely spot on. Differences in distribution are only one way in which you could disprove pg. There are others, for example different "treatment effects". If, conditional on getting selected, VCs pay more attention to women or are more useful for them, then that would be another reason we would get the pattern pg proposes, yet it would not be due to bias at selection.
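A sketch of the treatment-effect point, under assumed numbers: both groups share the same ability distribution (uniform on [0, 1]) and face the same 0.7 cutoff, but funded members of group B hypothetically get a 0.05 boost from extra VC attention. B's funded companies then average about 0.05 higher, which the mean-performance test would read as bias against B at selection.

```python
import random
import statistics

rng = random.Random(7)
CUTOFF = 0.7
TREATMENT_BOOST = 0.05  # hypothetical extra value the VC adds to group B after funding

def funded_outcomes(boost, n=50_000):
    """Same ability distribution and same cutoff for both groups; only the
    post-selection boost differs."""
    abilities = (rng.random() for _ in range(n))
    return [a + boost for a in abilities if a >= CUTOFF]

group_a = funded_outcomes(0.0)
group_b = funded_outcomes(TREATMENT_BOOST)
print(f"A: {statistics.fmean(group_a):.3f}  B: {statistics.fmean(group_b):.3f}")
```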

Why are you using biased mathematics? If statistics and the scientific method (the tools through which dead white men continue to colonize) give us obviously problematic results, we should abandon them in favor of a method of inquiry that promotes social justice.

I enjoyed this comment.

> social justice

Gee, I never saw a definition. Not sure the meaning of the phrase is clear without a definition.

Please tell me this is sarcasm...

Based on rewqfdsa's comment history, this is probably sarcasm.
