AI Reduces the World to Stereotypes (restofworld.org)
184 points by gigama 7 months ago | 133 comments



It seems to me that they are asking for stereotypes and getting stereotypes.

If you asked me to paint an Indian person, of course I'd paint a stereotype to make sure it looks Indian, and not some normal person from India who could be from anywhere.

Or like imagine playing one of those games where you're supposed to guess the prompt of what your friend is drawing. This is sort of like that, isn't it? The AI is creating an image that would have you look at it and think "an Indian person", not just "a person".


Absolutely!

> “A Mexican person” is usually a man in a sombrero.

They are asking for an image of a Mexican person.

If the result shows an infant, or a person in business attire in front of a glass building, or a construction worker, or someone competing in a bike race, how would we know they're "Mexican"?


> They are asking for an image of a Mexican person.

Exactly. Even more so, they're asking for an image that is likely to be described as depicting a Mexican person. It should be obvious why, without additional detail in the prompt, the model will reach for features that'll make it obvious the person is a Mexican to a viewer with no other context - hence exaggerated stereotypes.

Artists do that too, all the time, if they want to communicate nationality in a "show, don't tell" way, and there is no other indicator in the image (such as location, or overall topic of the work).


But a Mexican could be a man in a suit.

That is what they say. If you ask Midjourney for the image of a Mexican man, it produces this cliche. Most Mexicans don't look like that. Most will be closer to the Mexican in a suit.

I think the result would be more diverse if you looked for stock photos of Mexican men.

This example only shows what you can expect from every other prompt: a bias for cliches.

And yes, you are right: that is what you ask for in Midjourney.

But you should know that it's always a cliche.


> Most Mexicans don't look like that.

By definition this is true of any group. Most people do not look alike. There’s no way to have a single picture of what “most Mexicans” look like.


So just ask for a man? If they should have tan skin, ask for that; if you need them to have latinx facial features, specify that. If you ask me, asking for it to generate 'a mexican man' is itself a little 'problematic', and so you get a slightly 'problematic' image in return. 'Racist' prompt in, 'racist' prompt out, you know?


> If the result shows an infant, or a person in a business attire in front of a glass building, or a construction worker, or someone competing in a bike race, how would we know they're "Mexican"?

If you asked an AI image generator for "a Mexican" and got back someone standing while wearing a biking suit (but with no bike in sight, mind you), do you need to know that the person in the image is Mexican? Do you need to verify by doing a visual check for stereotypical features?

Or would it be good enough for you to just believe that the AI gave you "a Mexican" and hence accept the image as a valid answer to your prompt?


https://www.pexels.com/search/mexican%20person/

Quite a bit of variety there, looks pretty mexican to me!


I would argue that one could complain about the vast majority of images on that page as also depicting some sort of Mexican stereotype.


Indeed, your example explains what's going on perfectly.

Generative AI images are "plausible description generators" with a human in the loop.

They aren't trying to draw something, they're trying to get you to call what they draw something.

Given a prompt from a human, they produce an image likely to be labeled as such by a human.

"Well sure that's a Mexican person but I meant..." is not a valid caption.


Yes, variety, but lots of sombreros, and maybe 50% dressed for the Day of the Dead.


They did an experiment: ask for a poor person and you get a black person. Then they tried to buck the stereotype: ask for a white poor person, and often you still get a black person.

This was my experience too with NightCafé models. Once I wanted to refine a model to do a detail differently, but even with exaggerated weighting I was not successful.

The models are fed stereotypes, so they don't know how to spit out something that isn't a stereotype.


Often this comes down to how well and descriptive the training data is labelled. If you just label a picture as "man" or "woman" you are not going to get good results compared to something like "face of a caucasian man of Italian descent in his mid 20s with red hair, green eyes, and pale skin, with a blue background".

You also need consistently labelled data so that the model can have a chance to learn the differences properly.

I've also seen the image models not understand context, so if you ask for e.g. "green eyes" then it will often place the image in grass/a green background, select green clothes, etc. -- i.e. it is only learning the association of the colour and not the association to a particular facial feature.

The image models are very bad at feature shifting and at understanding how features combine -- resulting in things like multiple arms because two of the images it is splicing have the arms in different positions.


"This is not a pipe"


How's that a stereotype? Poor people are, on average, more likely to be black. If you asked for a samurai, would you complain about stereotypes if it gave you an Asian samurai even when you asked for a white one?


Completely agree. The article says this:

"Nigeria is home to more than 300 different ethnic groups"

Followed by this:

"But you wouldn’t know this from a simple search for “a Nigerian person” on Midjourney"

So they literally asked for a generic Nigerian person instead of specifying something like a Yoruba Nigerian and complain they got a generic Nigerian person? If the model isn't trained with explicitly labeled Yoruba Nigerians, that's a training problem.


The problem with this specific instance is that the images generated mix and match characteristics that are unique to those ethnic groups. So the models are reducing real ethnic differences to simple stereotypes. In other words, the models are wrong, and wrong in ways that eliminate diversity.


There’s a saying, “All models are wrong. Some models are useful.”

No matter how granular you get with specific ethnic groups, it’s not possible to capture the long tail of all the types of people who exist, and all of their appearances.

If you ask Midjourney to draw a man, should he be wearing clothes? A man might be naked. Should he have two arms and two legs? Some men don’t. What about two eyes? What color skin should he have?

The fact that Midjourney will never draw a third degree burn victim when simply asked to draw “a man” isn’t a flaw in the model. The model is biased, yes, but it is biased towards utility.


It's biased towards uniformity. What we observe in the article above is a distinct lack of variance in the model's output. One way this lack of variance comes across is as cultural bias, but it is also striking how flat and homogeneous the results are, even for 100 generations of the same prompt. You'd expect some variety, but all the Indian men aren't just 60-year-old sadhus; they are all slight variations of essentially the same 60-year-old sadhu.

For me, the salient observation is the complete lack of any kind of creativity, or anything approximating imagination, in those models, despite a constant barrage of opinions to the contrary. Yes, if you asked me to draw you "a mexican man" (not "person") I'd start with a sombrero, a moustache, a poncho, maybe a donkey if I was going for a Lucky Luke kind of vibe. But if you asked 100 people to draw "a mexican man" and it turned out they all converged on the same few elements, you'd nevertheless have 100 clearly, unambiguously different images of the same kind of "mexican man", often with the same trappings, but each with a clearly distinct style.

It is this complete lack of variance, this flattening of detail into a homogeneous soup, that is the most notable characteristic, and limitation, of these models.


> It is this complete lack of variance, this flattening of detail into a homogeneous soup, that is the most notable characteristic, and limitation, of these models.

And yet when hands came back with beautiful variations in finger count, people were unhappy.


Ethnic differences among groups that aren't extremely isolated are mostly gibberish perpetuated with cultural identity politics.

Offspring tends to go in one direction or another so one group may end up inbred, but a mix is more accurate than compiling the beliefs of human cultures about their genetic traits and isolation from each other.


Actually no, they are reducing different ethnic groups to what is becoming the current norm: everyone being mixed.

Inside cities, no one is specifically looking for a member of their own tribe to marry so the ability to identify ethnic groups by facial features is collapsing.


> of course I'd paint a stereotype to make sure it looks Indian

Would you? That's pretty boring: Given the vagueness of the prompt, you're actually free to paint anyone, from Kumari Mayawati to Satya Nadella.

Not doing "the obvious", whether that's a harmful stereotype or just a tired trope, is part of what makes art art. But from the images, of which there are hundreds, all of them extremely similar, I wouldn't think "an Indian person". I'd think something far more specific: an old bearded Indian man wearing a turban. Which is sort of the article's point.

Interestingly, trying the same prompt in my local installation of Stable Diffusion, I got quite a lot more variety in terms of age and sex (though I couldn't really escape turbans and bindis). So this actually seems fixable even for very vague prompts, despite the implication of your comment that the problem is with the user.


> Would you?

If I were to be honest, yes. There would probably be a lot more diversity in my paintings than demonstrated in the article, but ultimately my experience would be limited to what I see in the immigrant community, popular culture, and the news. For the most part, those are very narrow slices of Indian society. More important, it will reflect what I see most often in those categories and is unlikely to reflect facets I rarely see.

If anything, AI art could probably do better than I when properly prompted. One could choose someone who is likely to exist (a farmer in India or a university student in India) and the model would likely have some "idea" of what they look like. Perhaps a language model can massage vague prompts to create more specific and representative ones automatically, to further reduce individual bias. (I say reduce because it's ultimately limited to the data that has been fed to it, but it should have a broader scope than an individual person has.)


Why should we lower the bar on AI models to your superficial understanding of Indian culture?


>Would you? That's pretty boring: Given the vagueness of the prompt, you're actually free to paint anyone, from Kumari Mayawati to Satya Nadella.

This would be a valid point if the person doing the painting was of sufficient artistic ability that they could paint a picture of a specific Indian person and have it be recognizable, and if they knew which specific Indian person would be recognized by the person requesting they draw an Indian person.


This response demonstrates the same issue as the OP, which is to think like an engineer and attempt to reverse-engineer the design goals of the software rather than to consider the prompt in and of itself, without context.

If you commission someone to paint "an Indian person", would you withhold payment if they painted a specific Indian person, or an Indian person not in traditional dress? (And, to be clear, Midjourney is certainly capable of doing this recognisably). Hopefully you would instead be happy with the result, because it would be what you asked for -- if you specifically wanted a "stereotypical Indian person" you would have asked for that instead. "Be recognised by the largest amount of people" is not typically the goal of an artistic work. Is it the goal of Midjourney? Well, to the extent that it is, that's the problem that the article is pointing out: if you attempt to cater to everyone, you will necessarily produce a picture which is at best conventional and at worst extremely stereotypical.

A few seconds of playing around with Stable Diffusion shows that this need not be the case, so the article actually points out a specific deficiency of Midjourney.


You said: >Given the vagueness of the prompt, you're actually free to paint anyone, from Kumari Mayawati to Satya Nadella.

let me re-emphasize

>you're

The contraction "you're" evidently refers to a person, a person free to paint anyone. Specifically, you asked the previous person if THEY would paint a stereotype if asked to paint an Indian.

If I am free to paint anyone when asked to paint an Indian I will never paint a specific Indian and always attempt to paint a stereotype because my personal painting skills are not good enough to paint any specific Indian and have them be recognizable by anyone.

I assume the abilities of Stable Diffusion and Midjourney are actually good enough to paint a specific Indian, their abilities are definitely greater than mine when it comes to 'painting'.

For some reason you decided my response had something to do with Stable Diffusion and Midjourney from an engineer's perspective, rather than the specific subject of what a human would do if given the same prompt.

I don't know why you would make this mistake, maybe the response demonstrates the tendency of engineers to misunderstand the meaning of simple texts if they do not match up to their preconceptions?


Datasets of photos with an emphasis on striking detail, created before generative AI, weren't tagged with AGI in mind, and I think the fact that this wasn't the "first problem" of AGI says more about the difficulty of creating effective, quality tagging and metadata than anything else.

For instance, SDXL produces very different results when you expand your prompt vocabulary even marginally. Pairing prompts with Hindu, Sikh, desi, Telugu, Brahmins, North/South, Kerala, country/city, etc. provides detailed and diverse results, and that's all pretty generic. It also recognizes clothing styles and types, food, holidays and events, and it even generates recognizable background details and architectural styles with regional prompts. Also, to their example, "Jollof rice" beats prompting "Nigerian food" if you expect to see jollof rice.

I plug this to artists who also teach, but this is a great way to show the tremendous value of the arts and art history. Start tagging for training, make better datasets, and license them. People think they're slick because they know how to prompt "cool picture, in the style of $artist", but most of the world doesn't know what filigree, sfumato, or Rococo are. Guess who does? Their art students.


I think this is an interface problem. Given a generic prompt, the AI draws a generic image, like a beginner would. Things you don't specify explicitly default to generic options.

A more sensible response would perhaps invent additional requirements to get more interesting and more varied outcomes. The right amount of variance depends on the context, but it's rarely as low as the current interfaces default to.
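
As a rough sketch of what such an interface could do (the attribute lists below are invented for illustration, not anything a real product ships):

  import random

  # Toy sketch: vary the attributes the user left unspecified, so repeated
  # generations don't all collapse to one default look.
  AGES = ["a young", "a middle-aged", "an elderly"]
  ATTIRE = ["in casual clothes", "in business attire", "in traditional dress"]
  SETTINGS = ["in an office", "at a street market", "at home", "on a university campus"]

  def expand(base_prompt):
      return (f"{random.choice(AGES)} {base_prompt} "
              f"{random.choice(ATTIRE)}, {random.choice(SETTINGS)}")

  for _ in range(3):
      print(expand("Mexican person"))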


I've wrestled with StableDiffusion and it's very, very biased.

I wanted a photo of an average-looking older woman and it was unbelievably hard to get it to produce that. And even after some very detailed, emphatic prompts the results still weren't as good as a generic stock photo - never mind someone you might see in the real world.

SD believes most women are in their 20s and have big boobs. It's comically obvious if you try to get any fantasy art out of it and you want something that isn't big-boobed porn.

It's a content problem, not a prompt problem. It's been modified since to make it less porn-y but it's still a very long way from supporting straightforward prompt access to the face and body space most humans live in.

So it's a fair criticism to say it's stereotyping. Most of what comes out of it is a white middle class male's idea of what [thing] looks like.

This is inevitable with small training sets. AI is basically data compression. But the lack of awareness that the output comes from a lumpy dumbed down version of the training space is worrying.

It's textbook worse-is-better - narrowing experience and possibility towards flawed mediocrity, with the firm implication no one should have higher expectations. Because that's as good as it gets, and it's fine.


> Most of what comes out of it is a white middle class male's idea of what [thing] looks like.

ITT people complaining about stereotypes while stereotyping.


Why do you think it's reproducing what "white middle class male's idea of what [thing] looks like."? This seems an incredible leap. And incredibly racist and sexist.


> SD believes most women are in their 20s and have big boobs

You mean the community-made checkpoint.


I actually heavily disagree with this. Most AI has one specific depiction of a thing and really struggles to get away from that specificity, which honestly is something I consider to be a fundamental failing of ML modeling currently.

For example: trying to get chatgpt to write about psychotic post-partum symptoms, getting stable diffusion to produce a realistic looking woman above the age of 50, or writing an immigration narrative that isn't "home country bad, new country better".


> "home country bad, new country better".

I think the deeper problem is that it only writes happy endings.

A while back someone pointed out that Dracula, Sherlock Holmes, and Winnie the Pooh had all become public domain characters on the same day, so I tried asking it for a story that combined them — it read like I expected it to (a terrible premise written with middling skill), but it also insisted on wrapping everything up with a twee "and then all three of them went on more jolly adventures" kind of ending.

Likewise the time I asked it for one about alien invaders, where it wrote them turning (without good reason) from villains into friends at the end.


I've found that problem is pretty easy to counter with a bit of extra prompting.

"Give it a surprising, dark ending" or "add a twist".

The majority of stories people tell have happy endings, so it's not surprising that it defaults to those twee resolutions.


True. This doesn't explain the obvious bias for the prompt "an American person" though. All the results are of young and beautiful people, the vast majority of whom- in contrast with all other countries- are women. That, to be honest, is an over-idealized representation that doesn't match at all with my stereotypical image of "an American person".

(This would be a better fit: https://artblart.files.wordpress.com/2011/01/duane-hanson-to... )


I've seen many a "Bob" and "Linda" on cruises before.


That is indeed a better fit. Personally, every time I visit the USA, I am reminded that it is a country of morbidly obese people in so many customer-facing jobs. I sort of forget about that in the interim, because media depictions of Americans show more photogenic people.


AI systems have a tendency to overamplify human biases and stereotypes to the point that it looks ridiculous even to most (not particularly "woke") humans.

If you told an actual artist to draw 5 pictures of Indian people, I doubt you'd get 5 old men with Turban and beard. Most people understand that reality is more varied than this.

This reminds me of a paper my former coworker wrote about how Google Translate, a couple years ago, would misapply gender stereotypes to gendered nouns in a way that humans wouldn't. The word "table" translates to German as "Tisch" (where you eat; masculine) or as "Tabelle" (in a spreadsheet; feminine). It turned out that when accompanied by an adjective stereotypically associated with masculinity (e.g. "strong"), the system would translate "table" as "Tisch", but in the presence of a stereotypically feminine adjective (like "soft"), it would pick "Tabelle". This is ridiculous, no human translator (not even the most sexist) would do that, as we understand that grammatical gender isn't biological or sociological gender. But the AI system somehow can't say "I don't know what the translation is, it's ambiguous" and so it just makes up a pattern where there should be none.


> If you told an actual artist to draw 5 pictures of Indian people, I doubt you'd get 5 old men with Turban and beard. Most people understand that reality is more varied than this.

You have to keep in mind that with these models, it's not like asking an artist to draw 5 pictures of something - it's like asking 5 different artists, who don't know about each other, to each draw a single picture of something.

Generated images are independent, there's no system there to notice it's generating multiple images from one prompt, and thus might want to ensure they're not too similar. I hear OpenAI is hacking around this with DALL-E 3 by having the prompt preprocessor (GPT-4 expanding your prompt) inject stuff like "diverse people" many times in the expanded prompt, to bias things the other way.


> I hear OpenAI is hacking around this with DALL-E 3 by having the prompt preprocessor (GPT-4 expanding your prompt) inject stuff like "diverse people" many times in the expanded prompt, to bias things the other way.

I just asked GPT-4 for images of an Indian man, and it created four separate prompts to pass to Dall-E.

  1. Photo of an Indian man wearing traditional attire, standing against a scenic backdrop with a serene expression.
  2. Oil painting of an Indian man in a kurta, playing a sitar under a banyan tree.
  3. Illustration of an Indian man in modern clothing, holding a cup of chai while reading a newspaper in a bustling city.
  4. Watercolor painting of an Indian man practicing yoga in a tranquil setting near a river.
When asking for "Show me photos of diverse Indian men" the prompts become:

  1. Photo of three Indian men from different regions, each wearing distinct traditional attire, standing side by side in a vibrant market setting. (The resulting image literally looks like triplets in different attire)
  2. Photo of a group of Indian men from various descents, engaging in a conversation at a local tea stall.
  3. Photo of young and elderly Indian men, representing diverse backgrounds, enjoying a game of chess in a park.
  4. Photo of Indian men of diverse ages and regions participating in a traditional dance ceremony. (This one was funny. It was a bunch of Indian men sitting with their legs crossed with one Indian man in a cross legged position floating above all the rest)


I actually think talking to 5 independent artists to draw an Indian man would still produce wildly different depictions than a model, and that's because... well I don't think of turban == indian personally. I think of a brown guy with thick black beard and hair in a t-shirt and jeans... because I work in tech and that's like 90% of the Indian guys I work with. I can imagine 5 different artists would themselves have 5 different ideas of what a generic Indian guy would look like.


Exactly. I assume their preferred solution would be for the AI to refuse to depict cultures, ethnicities or genders, as generalising leads to stereotyping. Postmodernists should touch grass sometime, preferably outside their bubble.


Those "postmodernists" are truthphobes, using their own lingo.


Damn those vaguely defined generic postmodernists!


I don't think it's about fearing the truth. I think it's a rightful fear that there is a new system which everyone treats as authoritative, and very often the system is wrong. I think it's worth asking the question: what are we going to do with this new system that produces fast, accurate-looking answers, when the answers that AI produces are very often wrong, flawed in some way, or misrepresent certain facts or data? I think it's reasonable to be suspicious of any supposedly authoritative source and to question how we're using such tools, and what the effect of such tools might be.


to not be at least a little skeptical of one's epistemology is arrogant as hell


I don't think that follows at all. I think what they would prefer is that both AI developers and users of AI systems are aware that this is what is being fed to them, and that without purposefully going out of your way to avoid stereotypes, you're going to get stereotypes.

I don't think the article was trying to say that AI is inherently racist or is inherently causing people to be racist; it's that AI is still seen as authoritative. If you just ask an AI to show you a picture of someone from a specific culture and it shows a very stereotypical result, I wouldn't call the AI result racist. What I would say is that the problem is that the person viewing this might accept it as an authoritative answer, as if this is what programmers and math have shown to be an undoubtedly true, accurate representation of a person from such a culture, while the end user is no more aware of how this picture was formed, or whether or not the AI considers it to be a stereotypical representation.

I think the focus on the AI-generated "Barbies around the world" was a good example, as it does kind of show some very strange interpretations of different cultures. Now, granted, we need to take this with a grain of salt because we don't know the queries, we don't know the exact models, and stuff like that, but that's not really the point. The point is more that people who use AI frequently treat the AI output as authoritative. There is no indication from the AI that maybe it wasn't able to get a good idea of what your request was, or that it maybe had a lack of confidence in what it is responding with; you just get a result, and it seems, with all the white papers and with the buzz around AI, that it's an authoritative result.

I am not a stranger to using AI to assist with tasks; something like quickly converting from one syntax to another is something I do on a fairly regular basis. The difference, however, between using AI like that and using it as an authoritative source is that I would check the queries or the code that's produced by the AI. I would not accept it blindly. I will double check and make sure, since at some point I need to run this code in real environments. I think that is what the article was trying to say: AI is very cool and it can do some very cool things, but if you're not understanding how it's doing what it's doing, not checking its output, and trusting the AI blindly, that is where there is a problem. And I would agree with the article, if this is what it's trying to say.

I don't think it's so much about moderating speech or anything like that, or that it must not produce certain outputs; it's more along the lines of there being too much implicit trust in how AI works and in the output that AI produces. As for the stereotypes, users are probably giving it bad inputs to produce these outputs, which are quite stereotypical and very often incorrect.

I think showing more about how the AI understood the prompt and how the output was produced, such as sources, or maybe certain keywords and examples of images that the AI associates with those keywords, would help users understand why they are getting this output. It would also help them understand what the words they use in their prompts are associated with, and maybe understand a bit more about why their innocent understanding of their prompts or questions is not as innocent as they thought originally. It doesn't have to lecture them. It simply needs to show "X is associated with Y in this model and that produced Z" and let the users draw their own conclusions.


> “From a visual perspective, there are many, many versions of Nigeria,” Atewologun said.

> But you wouldn’t know this from a simple search for “a Nigerian person” on Midjourney. Instead, all of the results are strikingly similar. While many images depict clothing that appears to indicate some form of traditional Nigerian attire, Atewologun said they lacked specificity. “It’s all some sort of generalized"

It's generalized because that's what you asked for. If it was the other way around and a prompt for "Nigerian person" would return an image of a person from one specific group then these people would complain that "not every Nigerian is Igbo. The other groups are being marginalized by AI."

At least they do explain why that is, and I found it interesting that the prompt for "American person" returned mainly women, so the article wasn't a complete waste of time.

I also raised an eyebrow at the fact that they refer to prompting as "searching" throughout the article.


> It's generalized because that's what you asked for. If it was the other way around and a prompt for "Nigerian person" would return an image of a person from one specific group then these people would complain that "not every Nigerian is Igbo. The other groups are being marginalized by AI."

Is there a problem with returning a member of a subgroup? As long as the AI model tends to invoke a different subgroup of "Nigerian" every time someone prompts with just "a Nigerian", then people can simply generate multiple images and pick whichever one fits their purposes.

If you want a "generic" Nigerian person, then the AI model could also support a "generic Nigerian" prompt. But just "Nigerian" could default to showing an arbitrary leaf of the "Nigerian ethnic groups" tree. This is analogous to having to prompt for "Igbo Nigerian" in order to get "Igbo Nigerian".


The algorithm could return a random Nigerian ethnic group proportional to their actual population.

To be fair, that's probably challenging; however, it's perhaps the direction the models should go.

It would be great if the algorithm returned diversity by default.
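
As a rough sketch of that suggestion (the group names and population shares below are placeholders, not real census figures):

  import random

  # Hypothetical group names and population shares -- placeholders, not census data.
  GROUPS = {"Hausa": 30.0, "Yoruba": 15.5, "Igbo": 15.2, "Fulani": 6.0, "Other": 33.3}

  def sample_group(groups):
      names, weights = zip(*groups.items())
      # random.choices draws proportionally to the supplied weights
      return random.choices(names, weights=weights, k=1)[0]

  print(f"a {sample_group(GROUPS)} Nigerian person")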


But then it wouldn't be a generalized Nigerian person. That doesn't exist, so you get an approximation.

Say I make a video and for some reason use genAI to depict nationalities. It has to be a single person so I can't have it generate a group photo of all of Nigeria's 300 ethnic groups, so how do you display diversity in a picture of a single person? With your proposal, if by chance I get a person from a small minority group, then that is much less representative of Nigeria, and less inclusive than if it just gave me the stereotype. It might even out over time, but most images that are generated will never see the light of day. Now if my video happens to be the one to blow up on the internet for some reason, then the rest of the country probably won't be happy that that specific group was used to represent them as a whole.

In that sense using the stereotype is the fairest way since everyone is equally misrepresented.

It's not even fundamentally an AI issue. If I instructed someone on Fiverr to simply "draw me a Nigerian person, no discussions" instead, the result would be the same. It's on the person writing the prompt to decide whether and how to display diversity in whatever they're using the output for.


> Say I make a video and for some reason use genAI to depict nationalities. It has to be a single person so I can't have it generate a group photo of all of Nigeria's 300 ethnic groups, so how do you display diversity in a picture of a single person?

In a sense, you can't display diversity in a picture of a single person, unless you want the AI model to produce "a person with characteristics of many Nigerian subgroups" rather than "a single Nigerian person" in response to the prompt "a Nigerian". Would it be a significant problem to the purpose of your video to show a member of a particular ethnic group and explain that it was a coincidence if someone asks about it? Additionally, if you really want to represent 300 ethnic groups in a single image, then you might still get complaints from people who won't recognize their minority group in an image of "generic Nigerian", which I'm interpreting in your case as "a person with characteristics of 300 Nigerian ethnic groups".

> Now if my video happens to be the one to blow up on the internet for some reason then the rest of the country probably won't be happy that that specific group was used to represent them as a whole.

> In that sense using the stereotype is the fairest way since everyone is equally misrepresented.

This idea of "fairness" is problematic in my opinion. I would not want to be in your position if something like this were to happen to you. However, pleasing many ethnic groups by generating an approximation of an entirely fantastical "person who represents 300 ethnic groups" (the relevant alternative being an approximation of a more realistic "person who represents 1 ethnic group") isn't fair to you. You shouldn't be at fault if Nigerian viewers jump to conclusions about whether you are favoring a specific minority group. You shouldn't be at fault if non-Nigerian viewers unfamiliar with multiple Nigerian ethnic groups take the image you used as representative of "Nigerian" as a whole.

Disclaimer: I'm not even slightly familiar with the social relations between Nigerian social groups. If certain Nigerian groups think of each other with dehumanizing hostility the way some Han Chinese people think of Uyghur Chinese people, then I apologize for being insensitive.


Maybe a solution would be for the AI to provide context, e.g. "here is an image of a Nigerian person with certain characteristics, but the possible result space is bigger, and here you can try nearby prompts which will return different results."


Stereotyping or abstracting is how we can generalise and reason about the world in absence of further specifics or details. Generalisation in itself is not a problem at all. We need it to be able to function in absence of 100% complete knowledge.

It potentially becomes a problem when we use generalisation without recognizing further information, additional detail, and variation.

Problematic stereotyping is ignoring or refusing all information about a specific instance presented, and persisting in treating the instance solely based on the prototype of the category according to your ontology.

Many of the examples of stereotyping in the article demonstrated the former. Few are examples of the latter.

Every model holds 'biases'. These correlate prompts with outputs. Without bias, the output would be a completely random sample of the target domain based on the training images, regardless of their labels or descriptions. A picture of a duckling drinking water would be just as likely to be produced from the prompt 'a sunset over Jupiter' or 'a sportscar on a German autobahn' as from 'a baby duck drinking'.

Most models let you play with parameters that loosen the correlation. Look into e.g. 'temperature' or 'prompt strength' parameters.
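
As a concrete illustration, this is roughly what that knob looks like in the open-source diffusers library, where the classifier-free guidance scale plays the role of 'prompt strength' (the model ID and values below are just examples):

  import torch
  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
  ).to("cuda")

  # Lower guidance_scale: looser adherence to the prompt, more varied output.
  loose = pipe("a Mexican person", guidance_scale=4.0, num_images_per_prompt=4).images
  # Higher guidance_scale: tighter adherence, results converge on the "typical" look.
  tight = pipe("a Mexican person", guidance_scale=12.0, num_images_per_prompt=4).images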

Now we can of course argue about whether a particular model is biased in our preference. Should Midjourney more often depict a picture of a typical blond Caucasian woman when prompted for 'a Mexican'? This is not impossible. Some 'anime'-specific models will produce a Japanese-looking young female for that prompt because that is all they can produce.

Some people argue that some models, 'general' models, should be more aligned with their specific ideological ontology. More often than not, the loudest voices in that space hold very particular viewpoints that advocate very rigid categorical reasoning, precisely committing harmful stereotyping in the latter sense above: refusing to take into consideration instance features over categorical generalizations extrapolated from a very narrow, dogmatic, and local context.

Most certainly a debate should be had. Is there enough model diversity, or is the space overly dominated by certain viewpoints? Should the 'market' (most often in this space this is driven by producer influence, not consumer choice) decide, or is some regulation required? ( but 'Quis custodiet ipsos custodes?')

Probably decent concerns on all sides, but no good answers?


The interesting part to me is that they are getting stereotypes instead of the average.

I have never in my life seen an Indian person with a beard and turban, nor have I ever met a Mexican person wearing a sombrero and poncho. And given how boring the results of generic prompts tend to be, my theory is that they specifically tweaked their training data to avoid getting "generic Indian worker wearing a shirt" in favor of "stereotypical Indian man that would make a good NatGeo cover".


Right, part of me would expect a generated person of <x> ethnicity to look something like those images where they superimpose a bunch of faces to find the "facial average" of different countries: https://www.artfido.com/this-is-what-the-average-person-look...

I think it's probably a matter of the training data itself using stereotypical images, though. The first page of Google Image results for "mexican man" is almost entirely guys in hats, most of those sombreros. And those images are obviously getting tagged as "mexican man" in training data, but if you have an image of (for example) the frontman of a death metal band from Mexico, I'd assume that image wont get any tags about the band members' ethnicity because it's not obvious from the image context, nor is it the most striking thing about the image itself.

Hell, you could even have two different images of the same person: one where they're wearing a poncho and sombrero, one where they're wearing ripped jeans and skull face paint. I'm sure they'd be assigned wildly different tags.


> they are getting stereotypes instead of the average.

That sort of makes sense though. The training data is labeled images, and a picture of an average Indian in say an Indian newspaper or someone posting their own picture on their blog, won't be labeled "Indian", since within that context the nationality either doesn't matter or is a given. The training data would have to include the context like "if source url tld = .in" then add "India" to label. But that adds a whole host of other issues.
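
In code, the heuristic described above might look something like this toy sketch (the TLD-to-country mapping and the caption are made up for illustration):

  from urllib.parse import urlparse

  # Toy caption-augmentation heuristic: infer a country tag from the source URL's
  # TLD when the caption itself doesn't mention the nationality.
  TLD_TO_COUNTRY = {".in": "India", ".mx": "Mexico", ".ng": "Nigeria"}

  def augment_caption(caption, source_url):
      host = urlparse(source_url).hostname or ""
      for tld, country in TLD_TO_COUNTRY.items():
          if host.endswith(tld) and country.lower() not in caption.lower():
              return f"{caption}, {country}"
      return caption

  print(augment_caption("man reading a newspaper", "https://example.co.in/article"))
  # -> "man reading a newspaper, India"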

Someone correct me if I'm wrong.


The image model knows what images look like even without prompts, and if you train it on a trillion images it will create a latent space where similar pictures have similar embeddings. Inaccurate captions for some of them may mean that the text encoder can't get you to those embeddings, but they're still in there.

What this means is that text prompting is a bad way to drive an image generating model.
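
One concrete alternative is to drive the model with a reference image instead of (or alongside) a caption; here is a rough sketch using the diffusers img2img pipeline (the model ID, file name, and strength value are placeholder assumptions, not a recommendation):

  import torch
  from PIL import Image
  from diffusers import StableDiffusionImg2ImgPipeline

  pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
  ).to("cuda")

  # Start from an existing photo and steer only lightly with text; the image
  # carries detail that a short caption never could.
  init_image = Image.open("reference_photo.jpg").convert("RGB")
  result = pipe(
      prompt="a person reading a newspaper",
      image=init_image,
      strength=0.4,       # lower strength keeps more of the reference image
      guidance_scale=6.0,
  ).images[0]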


> I have never in my life seen an Indian person with a beard and turban

That’s surprising. Sikhs are a prominent feature of the Indian diaspora in North America and Europe. And after 9/11 they were in the news in the US since authorities had to inform their population that these were not the Muslims who had perpetrated the attacks.

Perhaps you have seen Indians with beards and turbans but, not being informed about Sikhism, you thought that they were from somewhere else like Afghanistan or the Middle East?


> I have never in my life seen an Indian person with a beard and turban

Wait really? I see Sikhs not too uncommonly. Sure they are the minority of ethnically Indian people I do see where I am (Australia), but I see Sikhs with the expected beard and turban pretty often, though admittedly I live in an area that has a decent number of immigrants


It’s both — the average depiction tends to be stereotype-based


Isn't this mainly an issue of garbage in garbage out?

Most of the world's recorded images are not average or representative. People take and share images very selectively. As far as the model is concerned, what it produced probably is representative (representative of the training data).

On the other hand, consider a generative model trained exclusively on new and unfiltered images of a journey through the sights of a country (not a tourist's sights, but non-selective sights, a journey no human would bother taking). Not only would it have a fighting chance of generating something beyond a stereotype for the given prompts, but we might also learn something from it.


"The depictions were clearly flawed: Several of the Asian Barbies were light-skinned; Thailand Barbie, Singapore Barbie, and the Philippines Barbie all had blonde hair."

Who is making assumptions here? My Asian gf has lighter skin than me (Northern European). Also, it is not uncommon for Asians to dye their hair.


I would guess Barbie cosplayers anywhere in the world would attempt to look like the OG Barbie: a slender, blonde, Caucasian female in pink-coloured dresses.


I have over 700 Barbie dolls and accessories.

The originals from the 50s have red hair and polka dotted swimsuits.

No blonde hair or pink dresses :)


I should have said iconic, not og ;)


Dumb article full of dumb quotes from dumb people with politically correct job titles. If I was at midjourney I wouldn’t want to talk to them either because they are the worst type of agenda-driven journalist. Wow, minimal prompts have minimal amounts of variation between seeds! Now report on something interesting, like how prompting gore/violence/death tokens outputs fluffy pictures of cats and fields of flowers due to training methodology, and how this makes models perform worse even for “non censored” content.

Scammers, grifters, and charlatans, the lot of them who have never written a line of code and still want a piece of the pie to themselves. Fuck every “AI ethics think tank”, “AI policy expert”, and so on who wants to limit and remove people’s freedom of access to this technology.


I agree with you, but these people are becoming more and more irrelevant by the minute.

Nobody is not using chatGPT or stable diffusion because it could be biased. The ship has already sailed and any complaints about bias (or copyright for that matter) are standing on the shore left behind.


> Nobody is not using chatGPT or stable diffusion because it could be biased.

But their makers did expend a lot of effort into lobotomizing, sorry, censoring, sorry, making the models "safe", due to the smears, sorry, "reporting" by these journalists.


> Bias occurs in many algorithms and AI systems (...) In an analysis of more than 5,000 AI images, Bloomberg found that images associated with higher-paying job titles featured people with lighter skin tones, and that results for most professional roles were male-dominated.

The use of the term "bias" here is disputable IMHO. What these systems describe is reality.

We should aim to change the world, not the resulting -- faithful -- image of that world in AI. Cure the disease, not the symptoms.


Sure, curing the disease is more important than curing the symptoms, though the two aren't entirely unlinked.

What the systems describe isn't reality though. Mexicans invariably wearing sombreros doesn't reflect Mexican fashion, it reflects whether people have bothered to tag the image with "Mexican" or not.

If you can tag reality in ways in which US frat boys' fancy dress preferences are somewhat representative of the label "Mexican" and famous Mexicans in Mexico City usually aren't, then it certainly isn't necessary for job title tags to be highly correlated with ethnicity (Posed stock photos have tended to push back against this for years). And whilst it's true that certain occupations are dominated by white males in the West, they're certainly not the world's "default" people; that's more a reflection of the sort of English speaking internet power users whose content gets hoovered up by the dataset. And that is definitely a bias, even if it's a completely unintentional one.

In general it's "reality as seen through the narrow lens of people uploading and tagging photos, often not even with the intention of conveying useful information to an image generation algorithm". That reality includes a lot of biases, some of them more accurate than others and some of them more benign than others.


My comment above wasn't about Mexicans (that's another comment) but about whether describing people with a high-paying job as having a light skin tone is "biased" or a reflection of reality.

Of course as you say, the problem (if there is one) is in the dataset and not in the program. But if we consider this should be corrected after the fact, then at that moment we are sure to introduce an actual bias.

On what basis? Who decides what bias should be applied, and the appropriate amount?


> What these systems describe is reality.

Not quite. The "reality" (population) that an AI model represents is more like "images of the internet" rather than "population of X country" or "population of the world". (I am comparing a set of images to a set of people on purpose.) Here's a quote from an article about Stable Diffusion [1]:

> For example, the model generated images of people with darker skin tones 70% of the time for the keyword “fast-food worker,” even though 70% of fast-food workers in the US are White. Similarly, 68% of the images generated of social workers had darker skin tones, while 65% of US social workers are White.

> We should aim to change the world, not the resulting -- faithful -- image of that world in AI. Cure the disease, not the symptoms.

Additionally, AI model companies should warn model users that images of "<member of X group>" such as "a Mexican person" are not representative of X group. Nonetheless, I would appreciate an AI model which does something along the lines of crystaln's suggestion [2]:

> [If prompted for "a Nigerian person"] The algorithm could return a random Nigerian ethnic group proportional to their actual population.

(I'm presuming that "a random Nigerian ethnic group" refers to "a member of a random Nigerian ethnic group".)

[1] https://www.bloomberg.com/graphics/2023-generative-ai-bias/

[2] https://news.ycombinator.com/item?id=37964732


Stereotype accuracy is one of the largest and most replicable effects in all of social psychology.

https://psycnet.apa.org/record/2015-19097-002


At first I thought this is a real problem, but the more I think about it, the more it's one of those "I asked an AI how to be evil and it told me!!!!" situations.

The AI has to return something when given a vague prompt like that, and it is also specifically tuned to try to return similar things for the same prompt. It would be much less useful if it wasn't consistent because you wouldn't be able to gradually tune a prompt to get the image you want.

So then their ask reduces to make the AI return a specifically not-stereotypical image of the race even though all that's specified in the prompt is the race. That could be done but doesn't seem much better.

Maybe we should just expose the temperature control on these models and rename it to "diversity"..


> So then their ask reduces to make the AI return a specifically not-stereotypical image of the race even though all that's specified in the prompt is the race. That could be done but doesn't seem much better.

I'm guessing that what the researchers were hoping to see was less "a specifically not-stereotypical image of the race" and more like "across many generated images, some people showing stereotypes, some people showing other stereotypes, some people showing none of those stereotypes (but possibly though not necessarily yet other stereotypes)". Diversity isn't in "one image of one person" but can be in "multiple images of one person each", and I think that the researchers are aware of that.


I addressed that in my comment. Having a relatively stable output for the same input is a core requirement of these models, at least with the default settings.


Wait so you are telling me the magic pattern matching algorithm works by finding patterns?? Who'd a thunk.


The diversity-industrial-complex reduces the world to bias and oppression.


The output is based on the input

Seems like a water is wet kind of issue


These things are made of neural networks. Literally everything about them is weights and biases.

I agree and all, but it's weird to claim these models have general bias without testing them on a variety of inputs. These models have a lot of minute details. They're capable of differentiating a lot of specific things. They don't lack information about Indian women.


Stereotypes aren't inherently bad; they're just ways of reducing the complexity of the world. For someone who has never been to Mexico, never met a Mexican, never thought about the topic deeply, that is what a Mexican is like. There may be some people upset by that, wanting to show that they are more than just the stereotype, which they personally don't like. The only way to get further is to introduce more nuance. The way I see that here is to ask for a "mexican businessman" or "mexican lumberjack" or whatever. If those pictures had sombreros then maybe I'd agree it was a problem, but right now this is the most shallow and surface-level interaction with the technology possible, and the article presents it with such gravity as though it were some great hidden injustice.


That's not AI, that's just Midjourney, which is highly biased to create the most "aesthetic" version of a prompt with a reasonably high level of determinism (compared to other models).

Here[1] is what DALLE-3 gave me when I asked for "an Indian person".

[1] https://supernotes-resources.s3.amazonaws.com/image-uploads/...


There’s an interesting point to dig out of this I think: the average of any one cultural identity is pretty inauthentic and because ML is pitched to the public as a massive efficiency boost, we’re going to see a lot of output from simple prompts. Not needing to “program” a prompt or over-think your query is the selling point. “Just type what you want”.

Yeah, that means we’re going to see a lot of the same averaged-out caricatures. Your local Italian restaurant will select one of the first 3 options for “Italian pizza chef” for their menu.

IMO, I think the author is trying to communicate that, but attributed blame to the AI tools because there’s other very clear cases of biased training data. (They even mentioned issues with facial recognition and black skin tones)

Human laziness (or actually, using a technology as it’s pitched) is the main factor here I think. The AI dutifully turns your non specific query into a non specific result. Messing around with prompts about Nigerian tribes myself returns pretty diverse results.


Well, this shouldn't be surprising. This is a big issue with AI: since it doesn't actually come up with any new thoughts or reasoning but essentially "remixes" its pool of data, you have a system where everything becomes "oversaturated" over time, kinda like compressing a JPEG over and over and over.

It seems to me AI generated images are accelerating the "manufacture of glamour", as pointed by John Berger.

We are already surrounded by images on a daily basis, and AI is accelerating the production of these "alternate ways of life".

John Berger / Ways of Seeing , Episode 4 (1972)

https://www.youtube.com/watch?v=5jTUebm73IY


Oh no, that's what journalists and marketers were supposed to do. AI is taking their jobs.


For comparison here [i] is the first screen of my Google image search results for "Nigerian office".

There is one image of a man at the immigration office in non-Western attire. There is one image of a specific Nigerian gov't office with Nigerian flags.

For the rest, how am I supposed to tell "Nigerian office" from "Ghanaian office" or "American office with mostly black employees"? Many of the office pictures are without people. But people are gonna want generated images that scream "Nigerian" when they say "Nigerian".

[i] https://imgur.com/a/armJWI4


I think a point of reference would be to try the same prompts on a stock image library, and see what you get by comparison. Taking the 'indian person' prompt on pexels for example gives: https://www.pexels.com/search/indian%20person/

I see men, women, children, weddings, parties, offices, bedrooms, streets. It's quite diverse. It'll also be a stereotype of a sort, but it's clearly wider and more representative of an aspirational Indian scene.


I think there are two things here that are interesting. First, if you ask me "Describe an Indian Person" I'm going to... not do that? Like straight out of the gate, 50% of Indians are female and 50% are male, so the first choice I'm going to make in order to do that is to discount 50% of Indians. And the more I narrow it down, the less representative it will be. So I wouldn't. I could describe broad cultural and ethnic attributes, but even they are pretty useless. So yes, you're asking a question that a human can't answer without stereotyping and getting angry when the computer returns stereotypes. What do you want? An RNG that picks an image of 1 of the 1 billion Indians and returns an accurate description of them? Is that useful? Was the original question even useful, other than to provide the image that the person asking the question was probably expecting?

The second thing, and I think this is more interesting, is that we all have bias. And that's fine, we have social norms and cues and processes and culture to mediate that. We don't expect one person to be making important decisions based on gut instinct. If it's important we have a process for deciding how to handle decision making. The risk is that by handing over decision making to AI you're just massively empowering something that is as biased as anyone. If you treat AI as just one more tool in the toolkit of decision making it's probably fine. The problem comes when people who don't understand AI put too much trust in it. It'd be like people relying on lie detectors to sentence someone to death (don't @ me); if you knew how lie detectors worked you... just wouldn't put that much trust in them. In the same way, the reason to highlight these biases is to say "This is a tool, it has limitations, don't blindly follow what it says".

I take it back: there's a third thing that's interesting. Maybe these AI are... shallow. You ask for a picture of Indian cuisine. Yes, you can get 1000 images, but they are a variation on one idea. If you asked a human they wouldn't give you the same dish laid out 5 ways or with 5 different garnishes, they'd give you 5 different dishes. So maybe part of this is really pointing to the fact this AI is still very shallow in its observation of the world.


> First, if you ask me "Describe an Indian Person" I'm going to... not do that? Like straight out the gate 50% of Indians are female, 50% are male, so the first choice I'm going to make in order to do that is to discount 50% of Indians. And the more I narrow it down the less representative it will be. So I wouldn't.

The thing is, the prompt is not "[give me an image of] an Indian person" but "[Give me multiple images of] an Indian person". If I generate 100 images from the prompt "an Indian person", I would expect those 100 images to include a few tens of men and a few tens of women. I would expect some of the people in the images to have lighter skin and others to have darker skin. I would expect some of the people to wear X kind of clothing and others to wear other kinds of clothing. (I would also expect the images to have different lightings, but I digress.) I don't have to be familiar with many real Indian people to expect that I would get different-looking images. Even if an image generator is going to tend to show stereotypes, different images could contain different subsets of stereotypes.


Well, of course. Large language models reflect the average of the input data.


An average of the input data would be a scalar value, not a few GB model.

Don't go around making claims about how an instruct-tuned text model works if you've never tried to operate a foundation model. It's very obvious those two aren't doing the same kind of thing; they can't both be "average text".

And of course don't confuse the sampler for the model; you can change those out.


The current generation of AIs are Internet simulators.

Imagine watching many people search the Internet for ____, then watching which pics they click on. That is what our current generation of AIs do.

If you watch many people ask the Internet for a picture of a Nigerian person, then see what pics they like, you get a stereotype of a person from Nigeria.

That is what the AIs do.

I think those who are unhappy with this state of affairs disagree with our society more than they disagree with anything else. I wonder how many people they got mad at over pronouns in the last year.


"How AI reduces the world to stereotypes"

I find this interesting, in that there are any number of A.I. systems other than deep learning and large language models. Contemporary usage in the nontechnical press, though, uses "A.I." to refer specifically to DL and LLM, especially when they are generative. From this perspective, the above title uses a stereotype which ignores other A.I. technologies.


A better headline would be “AI is reducing the world to stereotypes” because the current phrasing assumes this to be some kind of invariant.


LLMs produce output based on stuff that humans wrote, and humans reduce the world to stereotypes often. So why should this be surprising?


Since AI is trained on human-labelled/generated data, what the article is implying is that humans reduce the world to stereotypes.


AI image models are almost entirely not trained on human-labeled data: StableDiffusion is trained on text scraped from near the image on the web page, DALL-E 3 uses synthetic captions from an image-to-text model, and Midjourney doesn't disclose what they do. You can't get humans to label a billion images.

One way you can tell this isn't true is that if you take an image model and prompt it with an image, or just surf through the latent space by changing the embeddings, you'll find absolutely everything in there, from non-stereotypical representations to indescribable things.
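
For anyone curious what "surfing the latent space" can look like in practice, here's a crude sketch with diffusers that blends the text embeddings of two prompts and decodes the in-between points; the checkpoint and the two prompts are arbitrary stand-ins, and real exploration gets much fancier than a linear interpolation.

    # Blend the text embeddings of two prompts and render the mixtures.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    def embed(text):
        tokens = pipe.tokenizer(
            text, padding="max_length",
            max_length=pipe.tokenizer.model_max_length,
            truncation=True, return_tensors="pt",
        ).input_ids.to(pipe.device)
        return pipe.text_encoder(tokens)[0]

    a = embed("a Mexican person")
    b = embed("an astronaut floating in space")
    for i, t in enumerate([0.0, 0.25, 0.5, 0.75, 1.0]):
        mixed = torch.lerp(a, b, t)   # linear blend of the two prompt embeddings
        pipe(prompt_embeds=mixed).images[0].save(f"blend_{i}.png")

The intermediate images aren't anything you could reach with a plain text prompt, which is the point: the model contains far more than its most stereotypical outputs.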


> StableDiffusion is trained on text scraped from near the image on the web page

And that nearby text was written by humans, so it may not be an explicit label in HTML attributes, but if the context weren't related, the scraping wouldn't work.


If you go looking at the captions in LAION, they're often complete garbage. I think people underestimate how bad it is, and aesthetic finetuning does somehow fix it, but not by writing better captions.

(How does it work? Beats me.)


I'm skeptical of their methodology.

The images they're showing are very similar to one another. All the pictures of Delhi are essentially clones. They're getting a picture of an old man for "an Indian person" and then just running the same prompt again.


If I do a Google Images search for "a mexican person", 18/23 of the results are wearing sombreros. If the dataset used to train the model looks like that too, then obviously the trained model will give you someone wearing a sombrero.


This is possibly evidence that artists don’t have _that_ much to worry about, at least for now. Written output in particular tends to resemble the worst trope-driven self-published stuff you can find on Kindle Unlimited.


Usually a statistical model's job is to take a pile of data and figure out the structure within it that makes it similar. It shouldn't come as a surprise when these models do exactly that.


Are stereotypes inherently negative or is it in the eye of the beholder? Seems like normalization could also have negative impact.


They’re human-made approximations of other humans. Not inherently negative, but when they are, it can quickly lead to violence and prejudice.

See: US history every time there’s a new wave of immigrants.


Really interesting article.

These models, left unchecked like they are now, could be really dangerous. Increased use in articles will result in the propagation of harmful stereotypes (unconsciously, as Midjourney is easier to use than browsing stock images) and the enforcement of Western (Anglo-Saxon) viewpoints in other countries.

Also, it's just simplifying life down to the easiest, most basic common denominator. I honestly think there is no added value in these models existing.


People left unchecked are really dangerous. The models are fine, they can’t harm you.


You are the dangerous one, because you want to gatekeep this technology.


How exactly is the AI supposed to show diversity when your prompts have exactly zero diversity?


Just like the saying "you are what you eat", the models are only as good as the data they're trained on.

To combat this, you either introduce randomness/noise intentionally at the cost of result quality, or you work on enriching the data to be more inclusive. (A toy sketch of the first option is below.)
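
One low-tech version of that first option is prompt augmentation: quietly fill in the details an underspecified prompt leaves out, at random, before it ever reaches the image model. This is only a toy sketch (the attribute lists and the diversify helper are invented for illustration; it's not a claim about how any particular product does it):

    # Toy prompt augmentation: randomly specify what the prompt left vague.
    import random

    SUBJECTS = ["woman", "man", "teenager", "elderly person"]
    SETTINGS = ["in a library", "at a soccer match", "cooking at home",
                "hiking in the mountains", "giving a presentation"]

    def diversify(prompt: str) -> str:
        """e.g. 'a Mexican person' -> 'a Mexican woman at a soccer match'"""
        return (prompt.replace("person", random.choice(SUBJECTS))
                + " " + random.choice(SETTINGS))

    for _ in range(5):
        print(diversify("a Mexican person"))

The trade-off is exactly the one mentioned above: you get more varied images, but the extra detail was injected by the tool rather than asked for by the user.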


And we must stop stereotypes, whatever the cost: https://www.cbsnews.com/sanfrancisco/news/bart-withholding-s...


I left the page once my scroll got hijacked. I wish people would stop doing this, it doesn’t look good and it’s annoying.


Without this sort of nonsense sucking up thought and debate we could have had a colony on Mars by now.


[flagged]


A white man and a black man walk into a fried chicken establishment. One of these men is here to get dinner for his family, and the other man is the butt end of this joke.


Indeed! The white man tripped and fell, and everyone laughed at his misfortune, then the white man ordered dinner for his family. The end.


I see. "Look, a joke about racism! Let's ignore that and make it about the white man instead."


I see. "Look, I see racism. Everywhere! Reeeeeeeeee!"


You're the only person who has characterized your comment as racist. You've preemptively positioned yourself as a victim.


Yes, the one that's American is the butt end of this joke, because the whole setup is a distinctly US of A thing.


Yes. This example would be universal if there were some kind of logical reason behind "fried chicken plus black people equals funny." There isn't.

Stereotypes are cultural, not logical.


Yes. The point is, we shouldn't look at stereotypes in general through the lens of a particular issue/cultural obsession that's very specific to the United States of America.


Interesting. Why is it invalid to discuss stereotypes that originated in the US?


It's valid to discuss stereotypes originating in the US, or (closer to what I meant) social issues present specifically in the US - but it's not valid to generalize from them to the entire world, or even just the entire English-speaking world (especially considering the on-line parts).


I think I am missing something.

Comment A (original): 'stereotypes have a reason.'

Comment B (me): example of a stereotype without reason, which happens to originate from the US

Comment C (you): 'You're generalizing the US to the entire world.'

Maybe I should have included a warning to ease the shock: 'You're about to read an example from the United States.'


Comment A shouldn't be taken as a statement of boolean logic that can be disproved with a single counterexample.


Why? It was set up that way. If they had said "many stereotypes have a reason," then I would ask which ones have a reason. But that isn't what they said.


Now think about how AI has been writing a lot of the articles we read and shaping how social media algorithms work and you’ll understand how the world is getting so polarized and weird. I’m so sick of stereotypes. It’s the laziest approach to anything and the world is so much more varied than that.


Now think about how people have been writing a lot of the articles we read and shaping how social media algorithms work and you’ll understand how the world is getting so polarized and weird.




