EDIT: Really pleased with the largely constructive conversation in this thread. Was worried that this was going to be coopted as an ideological flame thread. Thanks for the insightful answers and good faith engagement. Keep up the good work!
As a metaphor, let's say we create two categories in the world for all people: tallers (people above six feet in height) and shorters (people below six feet in height). Human height objectively exists, but these categories are social constructs. Likewise, human variation in genes based on ancestry clearly exists, but the discrete racial categories we define (black, white, asian, etc.) are social constructs, since we could create other discrete categories (Irish, Slavic, etc.).
So saying race is a social construct does not mean your genetic make up does not matter or correlate with anything, but that grouping people into the set of commonly agreed upon races is not the inherent way it has to be. At the same time, these groupings do represent distinct genetic make up and so correlate with physical attributes. It's just that different groupings with different correlations are also possible.
This video explains it pretty well IMO: https://youtu.be/koud7hgGyQ8
It seems like the definition from your comment would be "Could we take this labeling system, and define different labels or the labels differently?"
I watched some of the YouTube video, and in the thought experiment she proposes, she takes a continuous trait (height) and arbitrarily splits it into two buckets, then talks about how this is kind of silly. But this is something we do all the time, with hypertension, diabetes, disabled, Alzheimer's, the 1%, capitalism, Canadian, etc.
And many of these categories are far more continuous than something like sex and gender which as far as categories go are pretty discrete.
Maybe these things are social constructs, but if they are then we surely must come to the conclusion that almost everything we care about in the world is a social construct.
That's the point, really. And it's not inherently a value judgement to say something is a social construct. Using it as a value judgement is usually meant to convey that some secondary attributes are not inherent, or even more so, are just historical by-products that might need rethinking or recontextualisation. Being a woman and being of the female sex might seem to be identical, but womanhood is not the same as having a certain genomic makeup, and therefore womanhood is obviously a social construct. The video above even outlines how femaleness might be considered a social construct, but I'm not smart enough to explain that to you, whereas Abigail will do it much better.
That the hard racial boundaries are unscientific (despite attempts to make it so), and that it's artificial and depends on consensus, and therefore is subject to change, and is typically subjective to the involved social group: the "one drop rule" is an example of how arbitrary this can get.
As sibling pointed out, in the United States, Italians and Jewish folk used to be not white, but are now widely considered to be white, despite not having any genetic or cultural changes in the interval between the shift in categorization.
Something can be continuous but still very clustered, and the extent to which it's "silly" depends on the uniformity of the distribution. It can still be useful to label the clusters in the distribution.
Another reason to subdivide a continuous dimension is that there could be a threshold after which some other dependent variable begins its inflection point (think of a hockeystick distribution). For example, there's probably some blood pressure value beyond which we begin to rapidly see serious adverse health effects--it's useful to call this "hypertension" or something. For another example, there's a threshold for the average temperature of the planet beyond which global warming "runs away". These are useful thresholds even though the dimensions are continuous.
It's interesting to think about the various ways to model this, and in many of your cases, the most reasonable model probably depends a lot on context or what it's used for/what statement you're trying to make.
Passing was (and to a lesser extent remains) crucial in the US and many places where variations in appearance were crucial to racial assumptions. If you look white and you act white then, most of the time to a first approximation you are white. But of course this opportunity is much more open to a relatively pale-skinned person than to a dark-skinned poor person which is a further problem on top of the problem that now people are lying about who they are.
there are "Larger Genetic Differences Within Africans Than Between Africans and Eurasians" "and the genetic diversity in Eurasians is largely a subset of that in Africans, supporting the out of Africa model of human evolution"
The highest genetic diversity is actually among people who are small subgroups of "Africans" - Khoisan not Bantu.
So, genetically, just as you can't draw a line around "bony fishes" without also including land animals, which are also descendants of bony fishes, you can't call "Africans" a well-defined sub-group of humans that excludes others.
But with race, you never have two white people produce a black person, or vice versa (and no, a black person with albinism is not a white person).
I had a personal friend who was white with two black parents, not albino, and genetic tests proved paternity.
However, can I ask what the point of your hypothetical is? I'm not sure what message or conclusion I'm supposed to get from it, in context of the discussion.
Think about what this means. People who have, say, three black great-grandparents and five white ones will vary widely in how black they look, and this likely affects how they identify. But there are also invisible markers of ancestry, of the sort you would expect to be visible only on an X-ray, if you looked carefully. Those could easily break the other way. You could easily get someone who looked white (and thus probably identified as white), but was "black on the inside" in terms of their less visible characteristics.
Yet the algorithm sees through all that, and manages to see what you feel like? Correctly classifying Shaun King as black and Tom Jones as white? From an 8x8 X-ray picture?
The people who insist that "race is real" should be the most confused by these results, since we know how fuzzily identity is coupled with ancestry, especially in the main groups studied here, black and white Americans.
I'm much more prepared to believe, for instance, that there is something catastrophically bad going on in medical image dataset collection, than I am that self-ID race is nebulously predictable from almost nothing.
Why don't you answer the argument instead of trying to convince me it's risky to disagree with the crowd?
If two doberman pinschers had a puppy that looked like an English bulldog, it would be strange and newsworthy. But if the two dobermans actually had grandparents that were English bulldogs, the mystery would be fairly easy to solve.
At least part of this confusion is associated with the culture of race (at least in the US) as opposed to the genetics of race. For example, we consider Barack Obama black, but he is equally as white as he is black. There's no genetic basis for making that kind of determination.
Africa is very, very, very genetically diverse compared to the rest of the world. I don't think there exists a population which doesn't contain genes for lighter skin.
I think race labels make some sense in a social/cultural context: In America we can call someone "black" when they are 75+% white, because for their entire childhood/life, socially, society at large treated them similarly as they treat people who are "100% black".
But race doesn't make sense in a genetic context. It's probably far more absurd than defining the difference between an accent vs dialect vs language. Even though there are clear differences between individuals/families, the boundaries are absolutely arbitrary.
Which do you think is more common, based on your quoted stats?
a) To see white parents who have a genetic child who has black skin, or black parents who have a genetic child who has white skin (disregarding albinism)
b) two tall parents have a genetic child who is short as an adult, or two short parents to have a genetic child who is tall as an adult
Literally, news articles are written and go viral about "black couple has white baby", whereas there has never been an article "Two 6'3" people have an adult child who is only 5'5""
If you had a series of 100,000 parents+genetic children paraded in front of you, do you think it would be more common to see two parents who are both rather taller or both rather shorter than their child, or would it be more common to see two parents who appear to be of one race, but their child appears to be of a different race?
If your answer isn't "I'd expect those things to happen at about the same rate", then you should question what exactly the sources are saying for the person who posted about the heritability of height vs skin tone.
The feeling I get is that many people really, really don't want race to be real. Because if it isn't real, then you can't say there are differences among the races. So they will argue against common sense and try to say things like height is just as heritable as skin tone, as a rebuttal to the fact that, pretty much always, two white parents don't give birth to black babies, and two black parents don't give birth to white babies.
I see the same kind of mental logic at play with LGBT-supporters, where there is a strong insistence that being gay is genetic, and not a choice. That way, you can't chalk it up to lifestyle choices that you can just change. Personally, I don't really see why it matters whether being gay is genetic or a choice, because there is literally nothing wrong with being gay, whether you are born that way or whether you "just" choose to be that way.
I suspect you are making a category error comparing "two parents who are both rather taller or both rather shorter than their child" and "two parents who appear to be of one race, but their child appears to be of a different race".
To put the two into the same category, you could compare "two parents...taller or shorter than their child" and "two parents...lighter skinned or darker skinned than their child"---two aspects that are single traits. And no, I wouldn't be crazy surprised in either case, unless you were specifically thinking about Robert Wadlow and Zeng Jinlian giving birth to Chandra Bahadur Dangi or two parents of Danish descent giving birth to a child with a Bantu skin color. (And no, I don't know the relative frequencies of such events.)
The other way to resolve the category error, if you prefer to compare bundles of traits, would be two people of Italian extraction giving birth to a stereotypical Irish child. That sounds pretty unlikely, especially prior to modern travel and migration patterns.
But the real, underlying question that is the base for "many people really, really don't want race to be real" is, "So what?"
Yes, human beings tend to share traits with other closely related people. So what? Individual variation is pretty large, too.
Historically, Italians and Irish were considered to be different races. Not so much today, because "race" is a social construct and the difference between the Irish, southern Europeans, and the canonical northern European isn't a big deal today.
Races are defined to be a way of applying a group of conclusions, which may be difficult to perceive directly, to an individual who has an easily perceived marker for the race. That can be more-or-less neutral to somewhat pernicious. ("You are Asian, therefore you must be lactose intolerant!" Well, maybe.) Or it can be straight up evil, especially if the conclusions you are making are simply made up to enforce your superiority to the individual.
As a result, race is either not real, or real but completely uninteresting. Any other option is intellectual laziness at best and at worst....well, it leads to poor outcomes.
Now, neither you nor I particularly care whether homosexuality is genetic or a choice, but I hope you can see how someone who has to respond to "So just don't be gay!" might prefer one over the other.
With all due respect, I don't believe you're being honest here. And that goes to my point about people wanting race to not be a social construct. I believe you're lying to yourself (or just lying to me) that you wouldn't be crazy surprised in either case.
> But the real, underlying question that is the base for "many people really, really don't want race to be real" is, "So what?"
Exactly. You don't want race to be real, you think it might be misused if it was real, and so you argue that it isn't real. That is not a compelling basis for an argument. A good argument for the earth being round is not that you're scared of people punching you in the nose if you say it is flat.
> Now, neither you nor I particularly care whether homosexuality is genetic or a choice, but I hope you can see how someone who has to respond to "So just don't be gay!" might prefer one over the other.
Yes, I can see how they would prefer that. But it has no bearing on the reality of whether being gay is genetic, or a choice, or possibly both, and really has no place in scientific analysis of sexuality.
What races are there? Is Irish a race? How, in fact, do you meaningfully define a race such that there aren't 20 races in sub-Saharan Africa for every one everywhere else?
And then, what do you do with that information? I'm Irish, she is Southeast Asian, Ted over there is Mayan-German. Does it improve life in any significant way?
If you wish to examine a granfalloon, just remove the skin of a toy balloon. — Bokonon
Maybe if diseases are distributed differently across different races, it can make testing and treating them more cost effective, leading to an overall improvement in health outcomes. I'm not going to waste my sickle cell anemia test kits on testing Icelandic people.
Do you apply this same logic to people who study esoteric branches of mathematics with no real possibility of improving life in any significant way?
This all goes to my point that you don't want race to be real, so you argue that race is not real. Maybe it is convincing for you, but it is absolutely meaningless to me. It would be great if the universe worked that way though, all we would need to do is close our eyes and pretend cancer doesn't exist, and it would just go away!
Humans have a myriad of visible characteristics: height, weight, skin color, eye color and shape, hair color and type, shapes of facial features, and so on.
All these characteristics vary continuously.
If you see that data as points filling a high-dimensional cube, there's not going to be an empty space there.
Some areas are going to be denser than the others, but there are no gaps there.
What you try to do with "race" is you're trying to cluster this data.
But there really is only one cluster. Might as well call rand() a million times to get a bunch of points in 0..1, and cluster that.
Oh, but you've seen black people! And white people!
Well yeah, but it's all those people filling the spaces in between any two points that make it impossible to draw the line.
The only way to draw the line is to make a call on where to draw it — that is, to make an arbitrary choice. Without it, your clustering algorithms would fail.
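To make the `rand()` point concrete, here is a minimal sketch, assuming uniformly distributed points in 0..1 and a plain 1-D k-means (the helper names `kmeans_1d` and `boundaries` are made up for illustration). Strictly speaking the algorithm does not crash on such data. It returns however many clusters you ask for, and the cut points move whenever you change `k`, which is one way to read "fail": the lines come from your choice, not from the data.

```python
import random

def kmeans_1d(values, k, iters=30):
    """Plain Lloyd's algorithm on 1-D data; returns the sorted cluster centers."""
    data = sorted(values)
    # Deterministic start: place the initial centers at evenly spaced quantiles.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            buckets[nearest].append(v)
        # Keep the old center if a bucket happens to be empty.
        new_centers = [sum(b) / len(b) if b else c
                       for b, c in zip(buckets, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)

def boundaries(centers):
    """Cut points between neighbouring centers (midpoints)."""
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]

rng = random.Random(0)  # fixed seed so the run is repeatable
points = [rng.random() for _ in range(2000)]

# Ask for 2, 3, and 4 "clusters" from data that has no structure at all.
cuts_by_k = {k: boundaries(kmeans_1d(points, k)) for k in (2, 3, 4)}
for k, cuts in cuts_by_k.items():
    print(k, [round(c, 2) for c in cuts])

# The largest empty stretch between neighbouring points is tiny: no natural gap.
ordered = sorted(points)
largest_gap = max(b - a for a, b in zip(ordered, ordered[1:]))
print("largest gap:", largest_gap)
```

Every `k` yields a tidy-looking partition, with the cuts landing roughly at even splits of the interval, so nothing in the output tells you which `k` is "right".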
Yes, there are high-density peaks on this data, especially if you look at any single characteristic.
Yes, you can separate the peaks. But deciding where to put the threshold is a choice (a social construct) that can leave a lot of points without a "race" label (which race is Irish - Mexican?) and/or change which peaks make the cutoff (are Armenians a race, or noise in the dataset?).
>Maybe if diseases are distributed differently across different races, it can make testing and treating them more cost effective
The scientists have two options:
A) Look at the original data which you used to assign the race label (skin color, hair type, etc), and see if there's any correlation of that data with diseases
B) Look at the data, cluster it using an arbitrary choice to be able to get more than one cluster, ignore a lot of people below the threshold, assign the labels, IGNORE THE RAW DATA, and then look for correlations between labels and diseases.
Which approach do you think is more scientific?
Or is it somewhere in between? Do you feel comfortable marking the line where life possibly begins? If you mark it before birth, aren't you just giving ammunition to anti-abortion advocates to take away freedom from women? Should we just say that life begins at birth, and shut down anyone who asks otherwise, because anything else is dangerous and possibly arbitrary and not 100% accurate in rare cases?
What's the continuous variable here that you are measuring to assign the discrete label (alive/not alive) to?
If you want to do that based solely on one variable, time from conception, then you get bad science. People can become dead at any point in time.
Yes, it's not possible to assign the "alive" label based on time from conception alone.
We do have plenty of discrete characteristics based on which to assign the alive/not alive label.
The Bible, for example, assigns the discrete label "alive" based on discrete label "breathing" in at least one case.
The definition I use is "does not need to be within a human body to sustain activity in brain cells".
Hope that makes things clear.
You on your own definition of life would apparently want to restrict the rights of women relative to what they have currently. This is a dangerous idea and should not be allowed.
I chose my definition from the rights-of-women perspective. My understanding is that a fetus is generally non-viable outside of a host body, and thus not "alive" by my definition.
On the other hand, we have C-sections and incubators. If a baby can be safely extracted with a C-section, placed in an incubator, and survive, my understanding is that the choice to abort is no longer available.
Of course, I can be wrong here.
My point was that at least I can make a definition here that does not depend on a choice of an arbitrary value of a continuous variable. My definition of "alive" depends on the choice of which variables to look at, not on arbitrary thresholds. And I don't insist on it being The Truth.
"Heart rate" is a continuous variable, but the distribution of its values has a large gap between 0 (no heart rate) and nonzero values (the lowest observed was 27 bpm). So you don't need to make a choice for a clustering algorithm to work. You can run K-means on "heart rate" and get these 2 clusters: 0 and everything else.
This allows one to make a definition of "alive" based on heart rate. That's not my definition, but it's a usable one.
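A rough sketch of that heart-rate argument, on made-up data: 200 readings of exactly 0 and 800 nonzero readings drawn uniformly from an assumed 45 to 110 bpm range. The function `two_means_1d` is a hypothetical helper, just 2-cluster k-means in one dimension, and none of the numbers are real measurements.

```python
import random

def two_means_1d(values, iters=100):
    """2-cluster k-means in 1-D; returns (low_cluster, high_cluster)."""
    lo_center, hi_center = min(values), max(values)  # start at the extremes
    low, high = [], []
    for _ in range(iters):
        cut = (lo_center + hi_center) / 2  # points go to the nearer center
        low = [v for v in values if v <= cut]
        high = [v for v in values if v > cut]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if (new_lo, new_hi) == (lo_center, hi_center):
            break  # assignments stopped changing
        lo_center, hi_center = new_lo, new_hi
    return low, high

rng = random.Random(1)  # fixed seed, synthetic data only
no_pulse = [0.0] * 200
with_pulse = [rng.uniform(45, 110) for _ in range(800)]  # assumed range
readings = no_pulse + with_pulse
rng.shuffle(readings)

low, high = two_means_1d(readings)
print(len(low), "readings in the low cluster, max value", max(low))
print(len(high), "readings in the high cluster, min value", min(high))
```

One caveat on the design: k-means cuts at the midpoint between the two centers, so with a nonzero mean around 70 bpm an extreme reading like the 27 bpm mentioned above would actually sit closer to 0 than to the other center. Splitting at the largest gap in the sorted values is probably the more robust way to express the same idea.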
This is not feasible with race if we use the variables commonly understood to be associated with race: skin color, height, nose shape, etc. There are no gaps in those variables.
Even eye color varies continuously, and it's not clear how to assign labels.
Color in general is a good analogy for race. You see the colors vary in a rainbow. You can tell the difference between red, green, and blue.
But you can't run K-means on a rainbow to get 7 colors out. Or any number but 1, for that matter.
You need to make a call yourself where to draw those lines.
This is reflected in languages. Russian has distinct words for what we'd call "blue" in English, but English also has words like cyan, turquoise, navy, etc., which other languages may not have.
Color, in the end, is a social construct.
And I notice you cannot answer any of my questions as to how you would define races.
I have no idea how you would define race. I don't know why you would expect me to. I also have no idea how you specifically define certain other aspects of biology that you probably accept as real and accurate, because I'm not a biologist or a geneticist.
People are not 'race x' or 'race y' biologically, as if race is some discrete set of features common to a whole population. Every individual has a set of biological features inherited from their ancestors which, theoretically, could include any or all so-called 'races'. Human beings have a continuum of features that is heavily interlaced amongst all the 'races'.
Putting it another way, if we were alien visitors, and had in front of us a representative sample of dead bodies from the entire world, we would be hard pressed to sort those bodies into 'races' based on biological features.
For example, we currently use melanin levels as a key indicator of 'race' today, but an alien, lacking the social context of the significance of, say, high levels of melanin, may well consider it a secondary feature, since it's shared with otherwise unrelated people.
Clearly people are biologically different based on race and the AI here is picking up on that. My kid's orthodontist even told me they align teeth in part based on race. The Asian arch is flatter across the front, for example. I asked about this because an engineer I worked with had a father in dentistry and told me my kid had "German teeth in an Irish mouth", which matched her ancestry, which he didn't know; he just said that in response to my description of the crowding.
So YES, races have biological differences. If not, we wouldn't be able to tell where people are from. I get that it's not cool to discriminate based on race, but it's not OK or even practical to deny that it exists (see dentistry example above).
No one is going to argue that you get your traits from your ancestors and that regional groups have similar traits due to shared ancestry, it's the classification itself that doesn't match up well with reality.
Seems like you have a pretty good grasp on my assertion here.
Right, but genetic correlates are real. These superficial characteristics evolved along with a thousand other non-superficial characteristics, in mostly (with the exceptions of conquest, trade, and border settlements) isolated regions.
Your skin tone and eye shape are indeed cosmetic trivialities, but they correlate strongly with muscle fiber density, susceptibility to certain diseases, endocrine profiles, and a host of other things that very much do matter.
Where people err is in assuming that these sorts of trivial observations are sufficient to show that race has a biological basis. Racial groupings are not biologically natural. They are completely arbitrary from a genetic point of view. That is, there are no sensible biological criteria according to which humans can be grouped neatly into a handful of 'races'.
See for example the following comment regarding sickle cell anemia: https://news.ycombinator.com/item?id=28525697#:~:text=azalem... Yes, 'black' people are statistically more susceptible, but that's not because of any property of 'black' people as a group. It's just because regions with Malaria happen to have dark skinned populations.
There are lots of "exceptions" to this, like sickle cell anaemia, for example, which is used as a teaching example of a Mendelian autosomal recessive disease. But note that it goes hand-in-hand with a historical pattern of malaria, covering a fairly large and inhomogeneous blob of Africa, the Middle East, Italy/Turkey/Greece and India. Our social construct of race varies quite substantially over those places.
The utility of clusters is nebulous, too.
I don't think this is true, logically or socially. We have dense clusters that are slowly merging as the world enjoys this extremely new concept of travel and intermingling world populations. I've seen many people describe themselves as "mixed race", directly or indirectly by describing their ancestry. Of course, the number of basis vectors required to accurately describe a person is increasing with time, but it seems that medical science has chosen to ignore this "easy" way to describe and treat people for some reason. But, is it all that surprising, considering not even women were represented fairly in medical trials, even 15 years ago?
The same is true of "species". Last I checked, there were at least 24 different definitions of "species", all of which have some overlap and none of which are perfectly precise. You don't see people going around saying "species is a social construct". Then again, maybe they will soon, I suppose anything is possible these days.
That said, you are correct that "race" is merely a rough statistical correlation for some cohort, not some precise measure. If we can categorize more precisely, then we should, and we only fall back to "race" as a last resort (if it's applicable).
> A species is often defined as the largest group of organisms in which any two individuals of the appropriate sexes or mating types can produce fertile offspring, typically by sexual reproduction.
This is the definition I’ve always heard, and it’s certainly more rigorous than any definition of “race” that I’ve encountered, and makes no reference to any arbitrary social constructs.
Conversely, the definition for “race” is explicitly arbitrary and social:
> A race is a grouping of humans based on shared physical or social qualities into categories generally viewed as distinct by society.
It's not as precise as you think, because fertility isn't transitive. Consider members of a species, M1, M2, F1, F2. M1 might be fertile with F1, M2 with F2, but M1 may not be fertile with F2. Are they all really members of the same species?
Read up on the species problem for more information (there are now 26, not 24):
That being said, it seems to me that it makes sense that a species would include the entire connected graph whose nodes are members and whose edges represent the ability to have fertile offspring.
I found  which is an interesting example of a very large graph, but nevertheless, all the examples seem to be within altogether very similar groups.
I think a reasonable definition for casual use doesn’t need to require the graph of a species to be fully connected, only fully reachable.
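The "fully reachable" idea can be sketched as connected components of a graph. This is only an illustration: the individuals M1, M2, F1, F2 come from the comment above, `components` is a made-up helper, and the extra M2/F1 pairing in the second case is an assumption added to show how a linking edge changes the answer.

```python
from collections import deque

def components(nodes, fertile_pairs):
    """Group nodes into sets connected by chains of fertile pairings (BFS)."""
    adjacency = {n: set() for n in nodes}
    for a, b in fertile_pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        groups.append(group)
    return groups

individuals = ["M1", "M2", "F1", "F2"]

# Only the pairings stated in the thread: M1 with F1, M2 with F2.
stated = [("M1", "F1"), ("M2", "F2")]
print(components(individuals, stated))

# Assumed extra pairing M2 with F1. M1 and F2 are still not directly
# fertile, but they are now reachable from each other through the graph.
linked = stated + [("M2", "F1")]
print(components(individuals, linked))
```

Under this reading, M1 and F2 count as the same species in the second case because a chain of pairings links them, even though that particular pair is not fertile. That is the difference between requiring every pair to be connected and requiring only reachability.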
There are certainly leaks to any abstraction of species, but they are empirical cases of exceptions. They don’t inject arbitrary social categories into the definition. Definitions of “race” have no empirical basis on which to be proved or disproved in the first place.
That's simply not correct. The literature abounds with all sorts of correlates with race, like propensity for sickle cell anemia, vitamin D deficiencies, susceptibility to alcohol, and morphological differences. These are just as empirically justified as any classification of species, and just as with species, not all of those properties need apply to every single member in that category.
So to the extent that we find "species" a meaningful category when applied correctly, then we should also find "race" a meaningful category when applied correctly. The key in both scenarios is to apply them correctly, and we should abandon them when we find more precise metrics.
That said, you are correct that there are also numerous cultural and social properties that are sometimes lumped in with race in a manner that we don't see with species, mainly because "species" hasn't been politicized. That doesn't imply that there's nothing "there" once you tune out that baggage.
I think this is the thing people misunderstand about race being a social construct, is that race is a bucket, by nature there need to be correlations between people in that bucket in order to actually place people into that bucket in the first place.
There will likely be other correlations between people in said bucket, who are in this case usually more closely related and share more recent common ancestors.
The size of that bucket and how we put people in it is arbitrary, but the fact that correlations exist when you put people in buckets isn't.
This is another problem with species - where is the transition where one species evolves into another?
Is there any definition of race where, given two people, you can apply some criteria to determine whether or not they are the same race? The criteria can't be "look at the definitions of each race and categorize the two people" because that's circular, how were those particular categories chosen? With species, you can determine that a bird and a fish are different species without knowing what birds and fish are.
Edit: but even if I did agree with the definition, if "race" is as useful a category as "species" despite being socially constructed, all of the people calling race a social construct have utterly failed to make the case that we should stop using it.
Approximation it may be, imperfect it may be, construct it may be, but its utility is very much real. Attempts to handwave it away do not change the fact that race and genetics are very much linked as it is used today.
I would dispute that claim.
Due to population bottlenecks among the first out-of-Africa groups, a black-passing South Indian is genetically a lot closer to a white-as-milk Finnish person than various African subpopulations are to each other. (Africans are orders of magnitude more diverse than the rest of the world, in a genetically quantifiable way.)
Race markers like Latin American and Hispanic betray the fact that some countries (Argentina, Chile) are almost entirely white, while others have Denisovan DNA (natives) or are racial Frankenstein's monsters due to the slave trade (Brazil). It makes no sense to use these umbrella race denominations.
Race as an overloaded term for sociological, anthropological, genetic and medical use is stupid. It just becomes a terrible tool for each. Genetics has smartly stopped using race much, but the others still continue to do so, despite the inconveniences it brings.
There is utility to race only because we refuse to cut out the middleman and identify clusters directly from genetic data. No one needs a cockerel to wake them up when alarm clocks have been invented. Honestly, typing this comment has just made me want to invest in these 23andMe-like companies.
There's a reason for this, and that reason is cost. Even the relatively cheap microarray-based tests that 23andMe uses are expensive at scale. Race, imperfect as it is, serves as a low-cost, reasonably effective proxy in many (but not all) cases.
Remember, for medical logistics, it's not about getting perfect care (are you sequenced yet?). It's about getting cost effective care, lest you bloat costs to high heaven.
Where's the utility in that?
What's the utility of the concept of race?
But this is true for everything. For example "night and day" - these are just buckets, but nobody would argue that there are no differences between night and day because of that.
Could be genetic, I suppose, but I wasn't aware of this as a thing, e.g. Asian-fit sunglasses.
The point is that just because two categories can mix doesn’t automatically mean that those categories don’t meaningfully exist.
No, I don't know what the point of this comment is. But I'm not sure I understand the point of the parent comment either.
Africa is going to be similarly diverse as Asia.
I think the parent is saying that it's possible (likely even) that the AI isn't picking up on biological features, but some other artifact. For example, perhaps the quality of x-ray machines or technicians correlate with race (race and "access to higher quality radiology" both correlate with wealth) and the AI is really picking up on the quality of the imaging. The fact that the AI still worked when the imaging quality was reduced across the board (pixelated into 8x8 squares) suggests that this particular hypothesis is unlikely, but this is the kind of error we're discussing.
Off topic, but this sounds very engineery, indeed. Was the conversation polite?
White cat, tabby cat, grey cat, etc? We don't try to say one sort of cat is better than other, but we can tell them apart very well.
Maybe what you're saying is that the aliens would not have the same prejudices associated with that marker as we have.
For example, lining up those dead bodies by skin tone alone would have Central Africans mixed in with South Americans, Austronesians, and South Asians; Northern Europeans mixed in with East Asians and Inuit; Southern Europeans mixed in with Indians, Central Asians, Native Americans, and Arabs; and so on.
This made me chuckle. But it is a good example. I bet there is a lot we can infer from people based on their common go-to colors for clothing.
"Nonsense," replied the thistleglorb. "You both have exactly the same capacitance."
Exactly. One of the people I'll always point to as evidence that race is a social construct is Barack Obama. He was the "first black president". In reality he is, genetically, as "white" as he is "black". We still call him black because of the color of his skin.
If people insist for long enough that racial categories are inherently biological you'll eventually end up in one-drop judgement territory. Not a great place to have a discussion.
'Aliens' visiting Earth would immediately categorize us into groups very crudely resembling the groupings we use today, because our visible characteristics are the most immediately obvious artifacts of our existence.
(Edit: when I say 'immediately' I'm indicating this would be an obvious, first order thing to do from the first pictures they have of us, not an 'Enlightened Alien' scientific form of categorization).
If all they had were 'pictures' of us, the race categories we use would be the obvious grouping, or something resembling that.
They would see that most of the people in Sub Saharan Africa looked quite different from those in East Asia. (And difference between Sub-Saharan Africans and East Asians is more than 'melanin').
There's a 'continuum' between every biological grouping, that doesn't mean those categories don't exist. It just means we're going to argue a lot about where and how to draw the lines.
Race as a 'Social Construct' relates to all of the other attributes that we associate with race, and individual lived experiences due to how they are perceived etc..
To your point, Aliens wouldn't immediately pick up on the 'Social Construct' bit, at least not right away and so they wouldn't have the prejudices that we do, but if they could only observe from afar, they would see exactly what we see, and visual distinctions would be the 'first order of separation' even if it was, after further understanding (i.e. genetics) a less important distinction as you hint.
Edit: someone provided this link, which I'd also like to include; it illustrates some of the current debate over the notion of race, and that it's clearly politicized.
Further features that seem very important to us like the shape of our eyes or skin tone may seem irrelevant to a creature which doesn’t have a face and sees the world in a different color spectrum than we do. They might group based on smell or habitat marking groups of urban, suburban, and country dwellers.
From a social PoV, in Cuba, someone as you describe, 3/4 one and 1/4 the other, would likely be put in the category of the 3/4. In some places an albino is just someone with a different genetic characteristic; in other places they are seen as a creature that brings bad luck, and they suffer violence.
Dog and cat breeds are distinguished throughout the world. It's genetic and socially constructed as well.
Race is definitely drawn along genetics lines - you've just demonstrated that in your example.
The 1/4 black - 3/4 white person was not identified as 'Asian' in your example, but rather, in the mixed physical scenario was crudely categorized as one or the other, in the example you gave, Black.
(FYI I tend to disagree a bit about the category though: I believe people will be categorized mostly for how they actually look, not so much the 'ratio' of anything. There's a lot of 1/2 Filipino 1/2 White people on TikTok who 'look' 100% White and make funny videos about the fact nobody believes them about their heritage).
Your example shows that race is definitely a social construct, but that it also has underlying genetic realities.
An alien taxonomist, perhaps.
We humans go "bug! whale! snake! cow!" most of the time even for species found here on our own planet.
We would even name the sub-groups first by the artifacts that differentiate them physically.
Blue Whale, Grey Whale
Black Bear, Brown Bear.
As we develop a better understanding, we'd also probably later determine that the genetic differences may not map very well to the physical attributes ... but due to historical groupings based on visual cues, we'd continue to overstate/understate the differences in the textbooks and in pop culture.
They'll presumably have had things like https://en.wikipedia.org/wiki/Carcinisation happen on their own world.
I wonder about that... I can't tell the difference between, say, a male octopus and a female octopus, but octopi can. Maybe the differences that seem so obvious to us are actually almost imperceptible outside of our species.
Whatever categorization they came up with would most certainly not resemble what we consider racial categories today.
That doesn't mean race doesn't exist any more than the fact that height is a continuum doesn't mean that short people and tall people don't exist.
Exactly, although not in the traditional sense. There are many, many overlapping genetic aspects in humans; we subdivide for political or social reasons, not strictly biological ones. To make the height comparison more fair, it would be as if we divided people into "bigs" or "littles" arbitrarily and formed political parties around it, etc. Height is one biological aspect and even then what is "tall" is subjective.
For example, the US viewpoint of white or black(~=african) is a relatively recent way of looking at race. People don't slot neatly into X or Y buckets.
I recommend reading the Wikipedia article on Race.
 A humorous reference. Sorry.
For example, US documents usually include Latino as a possible race, even though Latin Americans are white, black, indigenous Americans, or a mix - and Spaniards are what usually would be classified as white. If you check older forms you'd see Italians and Irish people categorized as a different (non white) category, etc.
It gets confusing with countries like China, Japan, and India, which are more racially homogeneous, and where the country name is the same as the (common) name of the predominate racial groups.
It really isn’t. Where are you getting this idea from?
The other replies here are mostly good, but I'd also like to note that "race is a social construct" refers to how "races" aren't really objective categories (What defines if somebody is "white"?) and more of a subjective thing, particularly at the margins. We can build classifiers that can match most people's (in our current cultural context) perceptions most of the time, but that doesn't make it a rigid natural phenomenon.
For example, I could build a classifier that looks at household finances and decides if people are lower, middle, or upper-class. I'd bet that I could get it good enough that most people off the street would agree with the results most of the time. However, that doesn't make "social class" some sort of objective, unchanging, universal truth. Somebody from 100 years ago would probably find us all to be upper-class. Somebody from the far-flung future would (hopefully) find us mostly to be near-destitute.
Agreed, and upvotes too! It seems like I've struck upon something people have been interested in talking about.
> The other replies here are mostly good, but I'd also like to note that "race is a social construct" refers to how "races" aren't really objective categories (What defines if somebody is "white"?) and more of a subjective thing, particularly at the margins. We can build classifiers that can match most people's (in our current cultural context) perceptions most of the time, but that doesn't make it a rigid natural phenomenon.
I certainly believe this to be the case, but when I hear "race is a social construct" it's almost always in the context of denying biological differences between the races in the same way that some extreme (though mainstream and influential) people take "gender is a social construct" to mean that literally all differences between the sexes are socially constructed including height, weight, strength, etc (otherwise known as "blank slatism").
That said, unlike biological sex, there are fewer valid social implications that we can draw from race (e.g., there are a bunch of social implications which fall out from women's unique ability to bear children, but no analogues which fall out from race) and we have drawn many false implications from race which have been tremendously harmful to individuals of different races, so if we have to reduce everything to a slogan or a binary (as our simplistic society increasingly demands), then "race is a social construct" isn't a bad one.
Racial segregationists in the Americas literally had to write laws delineating races based on factors external to that individual (e.g. their parentage). That is: even people who believed in a racial hierarchy also believed it was not possible to objectively identify a person's race without knowing e.g. the races of that person's parents.
Race has never been considered a generally observable fact about a person.
If an arbitrarily large grouping of genotype combinations is necessary to categorize people, perhaps that categorization scheme is not useful? I would imagine that the number of "races" generated via the mechanism you posit would measure in the hundreds or thousands, rendering it unrelated in practice to the word "race" as it is used.
Could you give a reference to someone who actually says this?
I love thinking about future historians view of today, I think it's better and more useful than futurism.
This isn’t to say a lot of people who are into race science don’t wildly overstate their claims, but there isn’t literally nothing to it.
People need to realise the following:
- Race is a social construct
- It's also a proxy for ancestry
- Ancestry is a proxy for genetic history
- None of the above contradict each other.
It is possible that we could sufficiently redefine race and ethnicity such that the above isn't true, but as it is right now, race is at least moderately coupled to a biological signature.
What should also be emphasized is that race isn't an end-all. The within race variation is far greater than the between-race variation.
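For anyone who wants to see what "within vs. between" means arithmetically, here's a minimal sketch in Python. The two groups and their measurements are invented for illustration (an assumption, not real data); the point is only how the total spread splits into a within-group part and a between-group part.

```python
from statistics import fmean

# Hypothetical measurements for two groups (made-up numbers for illustration).
groups = {
    "group_X": [90, 95, 100, 105, 110],
    "group_Y": [93, 98, 103, 108, 113],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = fmean(all_values)

# Sum of squared deviations of each value from its own group's mean.
within_ss = sum(
    (v - fmean(values)) ** 2 for values in groups.values() for v in values
)
# Sum of squared deviations of each group mean from the grand mean,
# weighted by group size.
between_ss = sum(
    len(values) * (fmean(values) - grand_mean) ** 2 for values in groups.values()
)
total_ss = sum((v - grand_mean) ** 2 for v in all_values)

between_share = between_ss / total_ss
print(f"between-group share of variance: {between_share:.3f}")
```

With these made-up numbers the group means differ by 3 units, yet only about 4% of the total variation sits between the groups; the rest is spread inside each group. Whether real traits look like this is an empirical question this toy can't answer.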
But it's just because of the metrics you've chosen. If you started defining race by bone density, and ignored ahistorical half-biblical half-mystical 19th-century human taxonomies, you might find that their classifications aren't interesting or useful for most things. If you controlled for effects that are affected by differential social treatment (like diet and upbringing), you might find most of your metrics are ghosts.
The process of sorting people into boxes for differential treatment based on their qualities affects their qualities.
I’m all for “variation within groups is larger than variation between groups”, both as a literal reality and also as a practical answer to the question of how do I go through the world without being a jerk to people.
Racial categories are squishy and permeable. So are ideas of species, sub species. They’re useful tools with sharp limitations. Again, some people certainly oversell the utility of the tools, often for malign ends. But it doesn’t mean it’s an entirely bankrupt view
Huh? If the distributions of bone densities among “people you would visually identify as ‘race X’ ” and “people you would visually identify as ‘race Y’ ” differ, then knowing that an individual would be someone you would visually identify as ‘race X’, gives you some amount of statistical information about their bone density.
I guess you just mean that you can’t make any high-confidence statements that are substantially different than if you didn’t have that information?
Small note on second paragraph: I don’t think the racial categories in question are even half-biblical? I mean, I know in the Old Testament there are lots of things referring to e.g. edomites, or Amalekites, etc. , but I don’t think these are really like, “races”, and I can’t think of anything that really supports the idea of a fixed set of racial categories. But perhaps I’m forgetting/missing something.
Only if they differ substantially. If they look like https://evergreenleadership.com/wp-content/uploads/2014/01/B..., you can't conclude much from an individual's bone density measurement, even if "people of race X have 3% more bone density than people of race Y".
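A rough way to put numbers on that, as a sketch in Python using only the standard library. The means follow the "3% more" figure above; the standard deviation of 10 is my own assumption, picked just to have something to compute with.

```python
from statistics import NormalDist

# Hypothetical numbers: group Y averages 100 units, group X is 3% higher,
# and both have a standard deviation of 10. The spread is an assumption.
group_y = NormalDist(mu=100, sigma=10)
group_x = NormalDist(mu=103, sigma=10)

# Overlapping coefficient: shared area under the two density curves (0 to 1).
overlap = group_x.overlap(group_y)

# With equal priors and equal spread, the best guess of group from a single
# measurement is "X if above the midpoint, else Y".
midpoint = (group_x.mean + group_y.mean) / 2
best_guess_accuracy = 0.5 * (1 - group_x.cdf(midpoint)) + 0.5 * group_y.cdf(midpoint)

print(f"overlap: {overlap:.2f}, best-guess accuracy: {best_guess_accuracy:.2f}")
```

Under those assumptions the curves share roughly 88% of their area, and the best possible guess of group from one measurement is right about 56% of the time, barely better than a coin flip. A tighter spread or a bigger gap would change that, which is the "only if they differ substantially" part.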
I interpreted "conclude anything from" as "have a different posterior probability distribution" rather than "have a significantly different posterior probability distribution". This was an error on my part.
Thanks for the elaboration.
There are biological correlations with inherited genetic lineage. That only has weak correlation with assigned race. Lots of different lineages of people are "black", Africa is big and diverse. Lots of different lineages of people are "white". But for example, if your lineage is from a malarial region, your chances of being a genetic sickle cell carrier are higher. Most people from those areas are dark skinned, so it correlates with being "black".
Also there's the impact of racism, which affects everything from nutrition to poverty to pollution exposure, and does so on an individual, a regional, and a national scale. And this has biological consequences.
My question is: using medical imaging, can it identify, say, people of African origin but not tell apart East vs. West Africans? Can it uniquely identify Asians but not tell apart an Indian person from a Korean? Or, given proper training, can it discern between North and South Koreans, or between a French person and a Greek person?
Race is a social construct not because groups of people are all identical but because both science and the major religions have concluded that humans share a common human (Homo sapiens) ancestry; therefore there is one human race and multiple ethnicities and geographical super-ethnicities (South East Asian and North European, for example).
Edit: This is also why race on id cards is silly. Not because you don't want to identify people based on appearance but because ethnicity is more granular and leads to less confusion. Would it be more identifying to say indian or asian? African or north-african?
I strongly believe the modern black/white/asian "race" is a darwinian invention to try and understand and classify nature better, based on intuition instead of science.
In reality, there is broad overlap and if you look up close, the whole concept becomes hairy. Someone with a father of Scandinavian descent and a mother with African lineage, what is that person, black, white, 50/50?
It's the same with gender/sex. While the biological substrate is clearly variable, the categories are social.
Races are mostly clusters in variation within one mechanism - eg. skin color is largely a gradient of more or less melanin, and what gets selected for depends on the environment. It's not intrinsically linked to the whole cluster of race-typical traits, those just are in the same genetic bundle that gets inherited from generation to generation, and most of those traits can be mixed.
Sex itself has a sharp divide from the mechanisms themselves being completely different.
Nature doesn't care about categories and will happily produce all kinds of distributions. It's us humans who try to bin them cleanly.
Women with more male-like personality goes into that "variation in a shared system that has sex-linked distributions": There are women that have more typically male personality configurations and the reverse, just as there are tall and short people. Nothing surprising about that, and it doesn't relate to sex itself being a binary dictated by two different reproductive mechanisms.
Northern Europeans, Southern Europeans, and Eastern Europeans (or rather people descending from those areas) differ in a number of physical aspects. Are they different races? For those who think race matters today, it seems not---they are all "white". Back in the heyday of scientific racism in the first half of the 20th century, they absolutely were---that is why the US had different immigration limits for different parts of Europe. Are native Australian people the same race as Africans?---there's no especially close genetic relationship as far as I know.
Physical differences exist. How you use those differences to divide people into groups and, more importantly, how you treat people of those groups is a social construct.
Also, due to omission of many other variables (such as culture), those variables are being conflated with race. I personally think a lot of what is commonly accused of being racism, sexism etc. is really just "culturalism" or "preferentialism"... let me think of an example... Given two bars, one filled with rap music and the other filled with techno music, and I pick the techno one every time... am I racist (assuming the predominant race in these 2 bars differs)? Or just preferentialist, culturalist or (frankly) "techno-ist" (if that were a thing)?
Because I think it's much harder to get angry about preferences than it is to get angry about racism, I think that given a choice, we need to consider the less-triggery inputs to a perceived problem
if there are all of these biological correlations with race, what does it mean that “race is a social construct”?
Race is probably somewhere in-between. There are people all over the spectrum but there are pretty clear large groups with sparsely populated gaps in-between them.
See: Caster Semenya, who is disqualified from competing as a woman because she has XY chromosomes and naturally elevated testosterone. She's intersex. Her only supposed recourse is to take medication to force her testosterone levels lower to be qualified as a woman. Of course, an organization that defines who counts as a woman for the purposes of women's sports is likely making socially constructed categories.
The words "bed" and "sofa" aren't arbitrary social constructs just because sofabeds exist.
What race is Obama? And how is your answer to that question not a social construct?
If the defining physical differences between races are both melanin and sternum width, that doesn't seem to be more relevant than just skin melanin.
* In many cases. It's more error prone than most people want to admit.
It's more that humans have been able to define, with significant disagreements, what race someone is with their stupid human eyes. Race is nothing but a collection of these judgments. I'm not brown because I'm black, I'm black because I'm brown.
Sorry, that's also true.
My asterisk was talking more about the inability to visually tell the difference in many cases. The black people in America who could pass as white in the 1950's (or now). Or how Latino and Middle-Eastern people often get confused. There are numerous cases, both specific and general.
But yes, it's not an objective truth being measured, and I'm sorry if I implied otherwise.
Perhaps the best example is the American view on what defines blackness. Because I grew up in a community where mixed race white/black was seen as a distinct race from white or black, I have a very hard time interpreting race the way Americans do sometimes -- Barack Obama and Kamala Harris's mixed parentage is obvious to me, which makes them obviously not black to me (think like in the same way Obama is obviously not white) and I have a hard time wrapping my head around Americans seeing them as such, but apparently they do, since even a small number of physical traits that suggest recent African descent categorizes you as black there.
This is not true. I had a college professor explain the "race is a social construct" idea, and her position was staunchly that there was _no_ biological basis for race. See also this article by Scientific American:
> Today, the mainstream belief among scientists is that race is a social construct without biological meaning.
This is the idea that GP is responding to - clearly there must be some biological basis for race, if an AI can determine race from an x-ray.
Not necessarily; they could easily reflect societal differences. Bone density could be affected by diet, or medical care received, or environmental factors in poor neighborhoods, which might easily vary by race without there being a genetic cause for that variance.
If it were a simple "all black people have a bone density of 3; all white people have a bone density of 2" you could pretty solidly conclude a biological component, but we're in the realm of small variances and probabilities that confound things quite a bit.
I think the most likely explanation for these results is something like https://techcrunch.com/2018/12/31/this-clever-ai-hid-data-fr..., personally. Especially with the 8x8 pixelation example.
I'm pointing out that's not necessarily true. First, we have to prove that the AI is determining race from an x-ray. Its effectiveness at doing it via 64 pixels makes me skeptical that it is.
After that, we get into sticky territory determining whether any biological differences we can identify between races are caused by genetics, or environmental/societal factors like disparities in healthcare, diet, neighborhoods built over old SuperFund sites (https://en.wikipedia.org/wiki/Love_Canal), etc.
"Black" represents a wide range of genotypes, many which differ more from each other than from "white" individuals and populations, even if there may be other tendencies like bone density and novel genetic features that appear more commonly or exclusively in some subsets of the "black" population. The skin colour phenotype being (usually) darker just happens to be very easy to notice and have acquired a lot of socially constructed meaning. Except in the narrow context of skin tones, it isn't biologically meaningful to consider "black" people as a particular group though, particularly not compared to more specific genetic markers that don't have the same socially constructed meaning...
This does not entail that no biological traits are correlated with race – biological traits are correlated to varying degrees with all kinds of subgroups of people. It does, however, mean that racial categories have no scientific basis.
To expand on this, imagine a Martian scientist studying human biology in isolation from human culture. Such a scientist would not subdivide humans into groups that match common racial categories, as these groupings are arbitrary from a biological point of view.
That said, assuming the paper is just wrong is simply bias.
If there is an identity associated with race, there is no reason to think it doesn’t impact biology. There is no reason to assume it’s genetic.
False. Diet and behavior affect biology amongst other things.
> Also there is a more than hundred year history of scientific research and no genetic (or other) grounding for race has been found.
We didn’t have machine learning or big data 100 years ago.
Yeah, so if a South-Asian orphan is adopted into a Swedish family, he magically ceases to be of whatever race were his parents and becomes white. That's... not how the concept of human race works.
And it’s entirely possible that if he was adopted at a young enough age, whatever this AI is detecting would read him as white too, assuming his habits and diet affected his development, as they might.
Indeed that would be consistent with your claim that there is nothing genetic about race.
In general, arguments of variation within a group are not arguments against considering between-group differences, or those between-group differences being real. The variation within a category cannot be dismissed, though, it's hugely important for understanding the world properly and for guiding personal conduct.
This is often repeated, but the point of the question is that the OP calls this into question.
Slavery, Race and Ideology in the United States of America: https://u.pcloud.link/publink/show?code=XZ3bwqXZT2m8MI2egSRA...
Side note: If you've seen Ken Burns's docs then you've probably seen her before: https://en.wikipedia.org/wiki/Barbara_J._Fields
I'd say that that only occurs in the odd case of 'Latino'. For some reason Spanish-speaking got amalgamated into a thing it didn't belong.
I'd be surprised if anyone in Europe even thought of themselves as 'white' until (perhaps) the post-WWII era. Wogs begin at Calais.
>Africa is also very diverse when you take into account Northern Africa.
I suppose that you could consider North Africa as a separate continent given the bordering desert. My understanding is that sub-Saharan Africa has more real-deal genetic diversity in human populations than the rest of the world put together (which makes sense given its age).
Obviously race is a real thing. Either a giant tree of relatedness with clusters of appearance + small construction differences or simply a way to form self-interested groups (go to any prison for 10 minutes). The fascination with it as of late is unfortunate, but I'm not sure if it's a reflection of resource depletion/overpopulation, a wave of quasi-religion, or people simply forming up teams for a big fight.
Biologically, forget about bone density etc. Skin color, facial features, face shape, etc are discernable directly.
Race as a social construct is the idea (originally rooted in imperialism and colonialism) that certain races are inferior mentally and in terms of societal development. A few Brits saw Africans residing in huts and living on farming using primitive tools and concluded that they could not develop any further than that and that their brain development was limited. (Of course, this was perpetrated to enable guilt-free enslaving and "civilizing" them and exploiting them for labour. I am sure that if African societies were simply introduced to western civilization and allowed to trade and travel, the ideas from the west would have been adopted and assimilated much more quickly.)
So, biologically, races are distinct, identifiable and have evolved to meet the needs of their local environment. But socially, races as inferior or superior was perpetrated with ulterior motives and have been shown as false time and again.
I saw a visualization of the clustering on twitter.
To quote Ruth Wilson Gilmore: “The racial in racial capitalism isn’t secondary, nor did it originate in color or intercontinental conflict, but rather always group-differentiation to premature death. Capitalism requires inequality and racism enshrines it.”
Cedric J. Robinson (among others) has discussed how capitalism and racialization are continually co-created.
1. Abolition Geography and the problem of innocence, in Futures of Black Radicalism.
Cultural relativism is not the norm and should not become such since there are differences between cultural traits where it is possible to state that some are objectively better than others. As an example, the cultural trait of genital mutilation is objectively worse than that of leaving girls' bits alone - and I'm open to stating the same about boys even though that would raise up a storm of protest. The cultural trait of parents marrying off their offspring without said offspring having a say in the matter is objectively worse than that of having the offspring decide for themselves who they want to share their life with. The cultural trait of having people who achieved success within the bounds of the law as role models - whether those be inventors, writers, athletes, successful farmers, builders or architects or anything else - is objectively better than that of having successful criminals and hoodlums as role models - yes, "street culture" with gang bangers as role models is objectively worse than whatever name can be given to cultures which have/had those inventors (etc) as role models.
X-rays can not be used to detect whether you might mutilate your newborn's genitals, marry off your 5yo daughter to your 20yo nephew or leave your children to be raised by the local street gang leaders since these traits do not depend on the colour of your skin even though there is often a correlation; correlation does not imply causation. Take for example Michael Skråmo, a Swedish-Norwegian man who very much looked the part of such but ended up as a recruiter for Islamic State in the Nordic countries. Contrast him to e.g. Luai Ahmed, a Yemeni refugee who lives in Sweden and is a vocal critic of everything Skråmo stood for. It was not Skråmo's white skin and blonde hair which made him ready to pick up a Kalashnikov, it is not Ahmed's brown skin and black hair which made him averse to the negative cultural traits related to Islam.
MLK was right when he longed for a society where people would be judged on the content of their character and we were well on our way to achieving that goal. Unfortunately there are those who derive their identity - and income - from their purported position as fighters against racism (without scare quotes), a fight which was nearing its conclusion. While most old soldiers fade away, some have taken it upon themselves to revive their old enemy so as to keep their purpose - and income - alive. Their culture is not mine and I consider it to be objectively worse than, e.g. MLK's. If you then consider that MLK was a "black" man while I am of north-west European descent and as such have "white" skin the truth becomes clear, it is not the colour of our skin which makes us alike - it is the content of our character.
Race is not a social construct. Culture is. Nature is not a social construct, Nurture is.
You would think so, but the proliferation of affirmative action among tech companies and prestigious universities says otherwise.
"Because the variation of physical traits is clinal and nonconcordant, anthropologists of the late 19th and early 20th centuries discovered that the more traits and the more human groups they measured, the fewer discrete differences they observed among races and the more categories they had to create to classify human beings. The number of races observed expanded to the 1930s and 1950s, and eventually anthropologists concluded that there were no discrete races. Twentieth and 21st century biomedical researchers have discovered this same feature when evaluating human variation at the level of alleles and allele frequencies. Nature has not created four or five distinct, nonoverlapping genetic groups of people."
It's a social construct, but it's not fully arbitrary, being historically (and currently) used as a proxy for ancestry, and thus genetics.
>That we divide not based on actual genetics but visible markers makes it a social construct
This is going too far towards the other end. Physical characteristics stem from genetic variations, and consistent patterns in appearance are often linked to some shared ancestry. It's not the end-all, but it's hardly without cause either.
The social construct argument just says that the specific categories and lines we draw are fairly arbitrary. Why is an Afghan Middle Eastern (white?) but a Pakistani South Asian? Is a Russian from Vladivostok really more closely related to a Brit than to a Mongolian? Idk - but, to me, the social construct argument just says “who cares? The specific groupings are pretty arbitrary anyway”
It's location based. Humans only recently gained the ability to travel vast distances, and in the past lived (and bred) within a small, localized region. The "arbitrary" location based ethnicities actually do reflect that genetic ancestry
Of course, really wide ones like black/white/asian lose some of their meanings.
The question is what about the look of a chest X-ray is connected to race. I agree with the research here: it's non-obvious what is being extracted by the AI.
If I had to guess, maybe something about the quality of the scan itself. Perhaps one race was scanned at one particular hospital, vs a different hospital scanning a different race. Then it's just picking out the different scanner.
With that said, the simple explanation is that the AI picks up on these small patterns in a way humans don't. The brain and neural networks are fundamentally pattern-recognition engines. The AI is just seeing something we either don't notice or can't see.
They do not, actually. Incapable of it; cheetahs are the only big cats that can do it.
> Purring ability, rather than size or behavior, is one of two chief distinctions between the two main genera of cat, Felis and Panthera.
We end up segregating ourselves for a variety of reasons, in which case groups that are physically very distinct end up forming almost an ethnic basis.
For example, two groups with varying genetic makeup and maybe a number of non-obvious biological differences, but who otherwise looked identical, would have a similar life experience in terms of their social treatment by other groups.
But irrespective of how a person is socialized - if you're Black, people are going to treat you one way, and if you're White, people are going to treat you a little differently. That 'lived experience' differential is somewhat unavoidable.
The degree of that variability is obviously debatable, but surely it exists to some degree.
I suppose you could make a parallel in ethnicity: a century ago, the difference between a Scottish-American and an English-American would have been apparent by lineage, accent, and Church affiliation, and that might have affected relationships, status, etc.
Whereas after a few generations of integration, there is definitely 'no' (or not much) difference between those two groups, and no vector for differentiation/discrimination. The historical ethnic situation was a 'social construct'.
That said, some of the arguments used to promote the idea that there is no genetic basis for race are a little odd. 'Africans have more genetic variation than other groups combined' is often used, but frankly I do not understand how that doesn't mean there are material differences between them and other groups.
And of course there is no 'hard line' between groups, but there is also no 'hard line' between the Scottish and English, there are many people who have attributes of both cultures, but that doesn't negate the existence of either group.
I think we're a bit oversensitive these days to these issues. Systematic racism exists and we should think about it, but that doesn't mean there's a boogeyman behind every door.
I think in this case it's also worth examining what exactly the AI is finding out, because it may not be just 'bone marrow'.
or the paper itself
I see a lot of "oh it's probably just picking up on x y z" when x, y, and z are things they explicitly checked for:
1) "It's probably just the names or other metadata" – they only gave it pixel data to train on. To control for things like metadata overlaid on the image (e.g., a name written on the image) they divided the images into 3x3 sections and trained classifiers on each section separately.
2) "It's probably some artifact of how the hospital marked up the images" – they used something like 7 different datasets from different hospitals and different modalities (X-Ray and CT).
If it is cheating somehow, it's not doing it in an obvious way that you can think of in a minute or two. Also note that they had more than just medical folks working on the paper; the author list includes plenty of computer scientists. It's unlikely they're making an elementary ML mistake here.
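To make the 3x3 control in point 1 concrete, here is a minimal sketch of cutting an image into a grid of tiles so that a separate classifier could be trained on each one. It is not the paper's code; `split_into_grid` and the toy 6x6 image are assumptions made up for illustration. The idea is that an overlay such as a name burned into one corner can only help the classifier for that one tile.

```python
def split_into_grid(image, rows=3, cols=3):
    """Split a 2D list-of-lists image into rows x cols rectangular tiles."""
    height, width = len(image), len(image[0])
    tiles = []
    for r in range(rows):
        # Integer boundaries so the tiles cover the image without overlap.
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            tiles.append([row[left:right] for row in image[top:bottom]])
    return tiles


# Toy 6x6 "image" whose pixel value encodes its position (hypothetical data).
image = [[y * 6 + x for x in range(6)] for y in range(6)]
tiles = split_into_grid(image)

# Each of the 9 tiles would get its own classifier in this setup.
for index, tile in enumerate(tiles):
    print(index, tile)
```

If every tile's classifier still does well, a single localized marker is an unlikely explanation.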
This gives the CNN more information on one race than another, which can create a classifier that performs very well on the training and test data it has access to but then flakes spectacularly on data outside the training set (because the source isn't representative of the total variance in the global population).
The fact that it works on an 8x8 massively pixelated version of the x-ray points to the possibility that it's not actually working, which would be bad if you based patient treatment decisions on a training set that was actually teaching the AI something else entirely.
What do you mean, not working? That the AI was randomly choosing the correct race 82% of the time by luck?
I'm confused by what you're implying, because it seems to me that the authors went through many steps to try to pinpoint how the AI was doing this identification, and it was baffling to everyone that even with a lot of x-ray information removed (8x8 pixels compared to, say, 4k), it somehow was still correctly picking the race.
What would this "something else entirely" that you are implying actually be?
No; as with the article I linked elsewhere in the thread (https://techcrunch.com/2018/12/31/this-clever-ai-hid-data-fr...), that the AI might have found some other indicator, like filenames in the data set, or metadata in the images that included patient name, or differences in the length of patient name (often redacted by black rectangles in x-rays in training data), or any number of other factors.
This happens all the time in science. As another recent example of "whoops, turned out we were measuring the wrong thing", https://en.wikipedia.org/wiki/Faster-than-light_neutrino_ano...
Another example around AI: https://www.vox.com/recode/2019/12/12/20993665/artificial-in...
> One such résumé-screening tool identified being named Jared and having played lacrosse in high school as the best predictors of job performance, as Quartz reported.
Are lacrosse players naturally better workers? Probably not. Are they probably whiter, wealthier, better networked, etc. than the average population? Probably. These sorts of things - as with the 8x8 pixel example - start to point to confounding variables that need to be worked out and accounted for.
The paper quite explicitly goes into testing and dissecting what exactly the AI detects. Two observations:
- the classification clearly was primarily based on the visual content rather than spurious metadata, because various transformations of the visual content had the expected impact on classification correctness
- the classification clearly wasn't based on one specific feature of the visual content but rather on multiple factors in the visuals, because various transformations to features (including masking out specific features like bone density) produced results matching expectations (usually gradual decrease in accuracy, with some thresholds).
Conversely, if the classification was primarily based on factors other than the visual content, the visual transformations would have had negligible effect - possibly up to a threshold, and then would throw the AI completely off.
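The reasoning above can be illustrated with a toy experiment. This is only a sketch on synthetic data, not anything from the paper: the two classifiers, the `tag` field, and the noise levels are all hypothetical. One classifier reads the pixels, the other reads a metadata tag and ignores the pixels, and we perturb only the pixels.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def make_case(label):
    """Synthetic case: 16 pixels whose brightness depends on the label, plus a tag."""
    base = 0.6 if label else 0.4
    return {"pixels": [base] * 16, "tag": label, "label": label}


def content_classifier(case):
    """Uses only the visual content (mean brightness)."""
    return int(sum(case["pixels"]) / len(case["pixels"]) >= 0.5)


def metadata_classifier(case):
    """Uses only the tag and never looks at the pixels."""
    return case["tag"]


def add_noise(case, sigma):
    """Return a copy of the case with Gaussian noise added to every pixel."""
    noisy = dict(case)
    noisy["pixels"] = [p + rng.gauss(0, sigma) for p in case["pixels"]]
    return noisy


def accuracy(classifier, cases):
    return sum(classifier(c) == c["label"] for c in cases) / len(cases)


cases = [make_case(i % 2) for i in range(200)]
results = []
for sigma in (0.0, 0.2, 0.5, 1.0, 2.0):
    noisy = [add_noise(c, sigma) for c in cases]
    results.append((sigma,
                    accuracy(content_classifier, noisy),
                    accuracy(metadata_classifier, noisy)))
    print(results[-1])
```

In this toy setup the content-based accuracy should drift toward chance as the noise grows, while the metadata-based accuracy does not move at all, which is the kind of signature the visual-transformation tests are looking for.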
The same may be true here, and I think it's the most likely explanation.
I'd be interested in whether the same model can be trained to predict patient wealth, hair color, style of clothing, religion, etc. from the same x-ray data sets.
While the "faster than light neutrino" result was highly unexpected and rather suspect from the start, the claim that "bone geometry differs slightly between ethnic groups" is well established among anthropologists. There are also parallels in the wider biology of animals; I mention that to underscore that it's scientifically expected, and not something construed for humans alone.
The question here was how exactly is AI detecting it this well from chest X-rays; the question centered around AI and possibly if it would unexpectedly influence the medical processes - rather than around the bone geometry itself.
For sake of example, a random link from google search: https://www.researchgate.net/publication/24427702_Ethnic_dif...
This specific model's ability to do it from a 64 pixel version of said x-ray makes me skeptical it's doing so successfully.
That's actually a great example of this problem, though.
> It’s a startling image that illustrates the deep-rooted biases of AI research. Input a low-resolution picture of Barack Obama, the first black president of the United States, into an algorithm designed to generate depixelated faces, and the output is a white man.
> It’s not just Obama, either. Get the same algorithm to generate high-resolution images of actress Lucy Liu or congresswoman Alexandria Ocasio-Cortez from low-resolution inputs, and the resulting faces look distinctly white. As one popular tweet quoting the Obama example put it: “This image speaks volumes about the dangers of bias in AI.”
The notion that a classifier can reliably identify race based on an 8x8 grayscale is risible.
The same is... not remotely true for humans, or even two chest x-rays of the same human.
Seeing as this would be easy to do, I imagine that if it were at all plausible, from what they know, that it is getting information from anything other than the x-ray scan, they would have already tried this?
I do wonder how good of a predictor something would be if it just went off the average brightness of the image. Probably very bad, but maybe better than chance?
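A brightness-only baseline like this is cheap to try. Below is a rough sketch on synthetic data; the image generator, the 5-unit brightness shift between groups, and all function names are assumptions for illustration, not anything measured from real x-rays. It fits a single cutoff on mean brightness using training images and then scores it on held-out images.

```python
import random


def mean_brightness(image):
    """Average pixel value of a 2D list-of-lists image."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def fit_threshold(images, labels):
    """Pick the cutoff and direction that best separate the 0/1 training labels."""
    values = [mean_brightness(img) for img in images]
    best = (0.0, None, 1)  # (training accuracy, threshold, polarity)
    for t in sorted(set(values)):
        for polarity in (1, -1):
            preds = [int(polarity * v >= polarity * t) for v in values]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if acc > best[0]:
                best = (acc, t, polarity)
    return best[1], best[2]


def accuracy(images, labels, threshold, polarity):
    preds = [int(polarity * mean_brightness(img) >= polarity * threshold)
             for img in images]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def make_image(label, rng):
    """Hypothetical 8x8 image; label 1 is drawn slightly brighter on average."""
    return [[rng.gauss(100 + 5 * label, 20) for _ in range(8)] for _ in range(8)]


rng = random.Random(0)
labels = [i % 2 for i in range(400)]
images = [make_image(y, rng) for y in labels]

train_x, train_y = images[:300], labels[:300]
test_x, test_y = images[300:], labels[300:]

threshold, polarity = fit_threshold(train_x, train_y)
train_acc = accuracy(train_x, train_y, threshold, polarity)
test_acc = accuracy(test_x, test_y, threshold, polarity)
print("train accuracy:", train_acc)
print("held-out accuracy:", test_acc)
```

How far above chance the held-out number lands depends entirely on the assumed brightness shift, so on real data the interesting comparison is this baseline against the full model.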
Well, better than chance on the training set is to be expected; the question, I guess, is whether it would be better than chance on the test or validation set. (I’m not confident in my understanding of the distinction between testing set and validation set. Is the idea that you use the score on the testing set to decide when to stop training, and maybe which hyperparameters to use, and other things that determine the model, and you only try the model on the validation set once you have decided on your final version?)
It's confusing, not least because people refer to "testing" when they mean
So, suppose you have a dataset, let's call it D, and it doesn't matter what's in
it other than "instances". To train a classifier you start by creating two
partitions of D: a training partition (the "training set"), and a testing
partition (the "testing set"). We'll denote them by T₁ for the training set and
T₂ for the testing set.
It's typical to use most of D as a training set, for example you may choose 80%
of D to be T₁ and 20% to be T₂. Obviously T₁ ∩ T₂ = ∅ and T₁ ∪ T₂ = D.
Now, because T₁ is four times the size of T₂ it's very likely that when you test
your classifier on T₂, it will appear much better than it is, just because most
of the instances in T₁ aren't represented (by similar instances) in T₂. This is
called overfitting to the training set. One way to mitigate it is to perform
cross-validation, the most common type of which is k-fold cross-validation.
In k-fold cross-validation, you further partition T₁ to k partitions, or
"folds", and then hold out each i'th partition, for i ∈ [1,k], use the remaining
k-1 partitions as a training set and test on the i'th held-out partition _during
training_. So you train your classifier on partitions 1 ... k minus i, test it
on partition i, and repeat this process for all i, recording the performance
(accuracy, F1, ROC etc, whatever your metric is). Then you choose the model that
performed the best on your chosen metric.
And then you test it on T₂.
To avoid confusion between the k folds of T₁ that you use for testing your
training models during cross-validation, on the one hand, and T₂, that you use
for testing the model that performed best on cross-validation, on the other
hand, we call the testing process performed on the k folds "validation" and each
i'th subset of T₁ used for validation a "validation set". And we just call T₂
the "testing set".
The confusion arises because we do actually _test_ on sub-sets of T₁. But T₂ is
always the "testing set" and it's never "seen" during training.
As to hyperparameter tuning, this is done _on the testing set_, i.e. T₂. This is
A Very Bad Thing™ but there you go. Once you train a classifier and find out
that it sucks on T₂, what do you do? Well, you tune the classifier's
hyperparameters. Or do a grid search to automate the process. So eventually you
overfit your classifier to the test set, because you now essentially have no
"unseen" data instances in T₂ - the classifier didn't see the instances in T₂
during training but the trainer did, or, worse, the grid search did, and the
classifier's hyperparameters were tuned according to that knowledge. How to
avoid that, is a big question, but anyway that's what is done in practice, and
the reason for that is that when you do Big Data, you end up needing so much
data that despite having terabytes of it, you never have enough.
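Here is a minimal sketch of the partitioning described above: split D into T₁ and T₂, run k-fold validation inside T₁ only, and touch T₂ once at the end. The toy dataset, the one-feature threshold "model", and the names `fit` and `score` are hypothetical stand-ins for whatever classifier and metric you actually use.

```python
import random


def train_test_split(instances, test_fraction=0.2, seed=0):
    """Shuffle D and partition it into T1 (training) and T2 (testing)."""
    shuffled = instances[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # T1, T2


def k_folds(t1, k):
    """Yield (training part, validation fold) pairs; each instance is validated once."""
    for i in range(k):
        validation = t1[i::k]  # every k-th instance, starting at offset i
        training = [x for j, x in enumerate(t1) if j % k != i]
        yield training, validation


def fit(train):
    """Toy model: a cutoff halfway between the two class means of a 1-D feature."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def score(cutoff, data):
    """Accuracy of the cutoff rule on (feature, label) pairs."""
    return sum((x >= cutoff) == bool(y) for x, y in data) / len(data)


# Hypothetical dataset D: feature x in 0..99, label 1 when x >= 50.
D = [(x, int(x >= 50)) for x in range(100)]
T1, T2 = train_test_split(D)

# Validation: every score here comes from a fold of T1, never from T2.
val_scores = [score(fit(tr), va) for tr, va in k_folds(T1, k=5)]
print("validation scores:", val_scores)

# Testing: fit on all of T1, then look at T2 once.
final_model = fit(T1)
test_score = score(final_model, T2)
print("test score:", test_score)
```

Anything you change after looking at `test_score` (hyperparameters, model choice) leaks T₂ into the process, which is the problem described in the last paragraph.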
That... that doesn't influence one of the presumed ways the NN categorizes images: the trend in bone geometry. The "blobs", while fuzzy, still largely retain the relative proportions to each other. Or, in other words, proportions of image elements are invariant for operations of scaling and of blurring.
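As a small illustration of that invariance, here is a sketch that pixelates a toy 64x64 image down to 8x8 by block averaging and checks that a crude proportion (the bright band's width relative to the image width) survives. The image and both helper functions are made up for this example, and the band is deliberately aligned with the block boundaries; with misaligned edges the ratio would only be approximately preserved.

```python
def block_average(image, factor):
    """Downsample a square 2D image by averaging non-overlapping factor x factor blocks."""
    size = len(image) // factor
    return [
        [
            sum(image[by * factor + dy][bx * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for bx in range(size)
        ]
        for by in range(size)
    ]


def bright_fraction_of_width(image, cutoff=0.5):
    """Fraction of columns whose mean exceeds cutoff: a crude blob-width / image-width ratio."""
    width = len(image[0])
    col_means = [sum(row[x] for row in image) / len(image) for x in range(width)]
    return sum(m > cutoff for m in col_means) / width


# Toy 64x64 image: a bright vertical band over columns 16..47 (half the width).
full = [[1.0 if 16 <= x < 48 else 0.0 for x in range(64)] for _ in range(64)]
small = block_average(full, 8)  # the 8x8 "pixelated" version

print(bright_fraction_of_width(full))
print(bright_fraction_of_width(small))
```

Both calls report the same proportion even though the small image has 64 times fewer pixels.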
The fact that trained neural networks cannot tell us why they give an answer and the best tool we have to explore that is to wiggle the inputs and see how the black box responds is a major concern for the whole space. Figuring out how to tag data with enough information to generate a "why" was an active area of research ten years ago and still is.
One of the points of building these systems is to do better than human-driven.
The practical one is that errors in a machine system scale, as do most things with machines. If I have a single bad X-ray tech who is applying the wrong medical process because I have a different race, for some reason, the damage that tech is doing is limited to whatever specific set of patients they are seeing. If a similar error occurs in a popular machine classification tool used widely by a hospital network, the damage is widespread. It is a plus that the machine can be corrected and the correction also scales, but with the (relatively speaking) stone tools we use to understand why a CNN makes its decisions these days, every fix risks breaking something else we're not testing for.
The first psychological reason is that machine learning systems break in "alien" ways. They don't make the kind of mistakes humans make... They make mistakes as a product of their machinery, which means it's much much harder to predict what those mistakes will look like for an average operator. As a frequent example, it's pretty rare for humans to misclassify human beings in photographs as apes, or to fail to recognize a face in an image because the skin is too dark. That's a failure mode that happens over and over again with image recognition systems.
And the second psychological reason is that humans don't trust machines to make human decisions yet. And that mistrust doesn't extend to other humans, even though we're incapable of cracking open another human's mind and understanding their thought process at the mechanistic level. It doesn't matter... we are the same organism and have a shared experience and empathy with them that we lack with machine recognition systems. It's semi-irrational, but it can't be wished away. A system for understanding why a machine makes decisions would be a step in the direction of addressing those concerns.
Perhaps hospitals that treat a disproportionate share of poor people (who themselves are disproportionately not white) tend to use a different brand of X-ray film, and that brand has different contrast ratios than the brand preferred by rich hospitals. Thus, they'd be detecting the different brand of X-ray film rather than anything about the patients themselves.
Of course, at this level it's still hard to imagine generating that 82% hit rate. But maybe there are multiple factors along these lines.
Most of us radiology folk abandoned film 20 years ago and went to digital systems (CR or DR). This doesn’t negate your query though, as vendors do have different technologies and their images do not look the same.