Ask HN: Why are AI generated images so shiny/glossy?
67 points by arduinomancer 29 days ago | 62 comments
I’ve noticed a lot of the time you can tell an image is AI generated because it has a shiny/glossy lighting look to it.

Has anyone figured out why this is the case?




It's just the typical aesthetic of the models people use, not something inherent to the tech itself. It's very easy to make AI images in specific art styles, to the point that you can't tell they're not real.

This is actually something of a pet peeve of mine - people sharing AI images never use styles other than the generic shiny one, and so places like Reddit.com/r/midjourney are filled with the same exact style of images.

Edit: if you’re looking for other style inspiration ideas, this website is a great resource for Midjourney keywords: https://midlibrary.io/styles


But why is this the prevalent, default AI style across many models? Is it in the training data or is it some bias from the algorithm?

This must have some concrete answer and not just "it is the way it is"


That is a good question. My guess is that their training data was a mix of photorealistic images and historical paintings, and thus the outcome is the cartoony style that looks somewhat realistic but not overly so.

There's also the possibility that they deliberately avoid being too realistic, for fear of deepfake scares around the underlying tech. And so if you want images that look realistic without being believably "real", you get something like Midjourney's default aesthetic.


Because it's the way it is.

"It" isn't AI here. It is human mind. As an old aritst said, monkey likes shiny things.


The same reason that after Jobs died, it was decided that iPhones should be large enough to fit in people's pockets. Because: many engineers in tech wear cargo shorts.


A big part of that must be that most people don't know how to describe style. That is something I bump up against often with generative AI - it can be obvious that two images are different but hard to describe why.


That's definitely been one of my hurdles. Though amusingly, one thing that's helped is to give an example to ChatGPT and ask it to describe the art style.


Agreed, aesthetic literacy is very low and most people probably couldn’t name more than a handful of art styles.


Is it easy? Are you aware of any good tutorial content on this? I have libraries of images I'd like to feed into generative models. Think gritty, noisy stuff. I'm just not aware of how to do it.


I just edited my post and added a link to https://midlibrary.io/styles, which might give you some ideas.


Many AI-generated images you encounter are low-effort creations without much prompt tuning, created using something like DALL-E or Llama 3.1. For whatever reason, the default style of DALL-E, Llama 3.1, and base Stable Diffusion seems to lean towards a glossy "photorealism" that people can instantly tell isn't real. By contrast, Midjourney's style is a bit more painted, like the cover of a fantasy novel.

All that being said, it's very possible to prompt these generators to create images in a particular style. I usually include "flat vector art" in image generation prompts to get something less photorealistic that I've found is closer to the style I want when generating images.
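
For what it's worth, the style steering really is just prompt text. A rough sketch with the OpenAI Python SDK (the subject, the style keywords, and their exact effect are just my example, not anything these models document):

    # Minimal sketch: the only difference from a "default" prompt is the
    # style modifier appended at the end.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    subject = "a lighthouse on a rocky coast at dusk"
    style = "flat vector art, muted colors, no gloss"  # swap in any style keywords

    result = client.images.generate(
        model="dall-e-3",
        prompt=f"{subject}, {style}",
        size="1024x1024",
        n=1,
    )
    print(result.data[0].url)  # link to the generated image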

If you really want to go down the rabbit hole, click through the styles on this Stable Diffusion model to see the range that's possible with finetuning (the tags like "Watercolor Anime" above the images): https://civitai.com/models/264290/styles-for-pony-diffusion-...


DALL-E's style is intentional: it's meant to prevent misuse of highly realistic, near-undetectable fake images.


What's a good prompt if I want a schematic/engineering/wireframe look for all my objects?

Most models seem very reluctant to do that. (Historically, rendering full 3D was also easier than rendering wireframes. Art imitating life.)


Uhm, Llama 3.1 is an LLM.


Ah, my mistake. "Meta AI" can generate both text and images, but apparently text prompts are handled by Llama 3.1 while image prompts are handled by Emu. I initially struggled to find the name of the image generation model.


Oh, I didn't even realize Facebook had a text to image model.


Maybe because diffusion models generate the image from Gaussian noise, while the pixel statistics of real photos aren't distributed like that.


What happens if you feed non-Gaussian (e.g. pink or brown, or sufficiently correlated) noise into the model instead?
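
If someone wants to actually try this, diffusers lets you pass your own starting latents to the pipeline. A rough sketch (the model ID, the 1/f "pink" shaping, and the re-normalization are my assumptions -- the scheduler expects roughly unit-variance Gaussian input, so this may just degrade the output rather than change its character):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    def pink_noise_latents(channels=4, height=64, width=64):
        # Shape white noise with a 1/f amplitude falloff in frequency space.
        white = torch.randn(channels, height, width)
        fy = torch.fft.fftfreq(height).view(-1, 1)
        fx = torch.fft.fftfreq(width).view(1, -1)
        freq = torch.sqrt(fy**2 + fx**2)
        freq[0, 0] = 1.0  # avoid division by zero at the DC component
        pink = torch.fft.ifft2(torch.fft.fft2(white) / freq).real
        # Re-normalize so the scheduler still sees roughly unit variance.
        pink = (pink - pink.mean()) / pink.std()
        return pink.unsqueeze(0).half().to("cuda")

    image = pipe(
        "portrait photo of a person, natural light",
        latents=pink_noise_latents(),
    ).images[0]
    image.save("pink_noise_init.png")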


A lot of (non-AI) photos of humans tend to be airbrushed by (human) photo editors -- this removes natural imperfections -- like patchy skin, acne, discolouration etc.

In AI models, I think the pictures the model generates are also a form of "airbrushing", except that the model cranks up the reflectivity of the image -- simply to hide the fact that there _aren't_ any imperfections that would make the photo more realistic.

In other words, gloss is just a form of airbrushing -- AI does it to hide the fact that there are no more details available.

I would guess that AI models could make their airbrushing look more like what human photo editors do by changing some hyper-parameters of their models.


Dall-E at least seems to have adopted the cartoonish style just to avoid lawsuits.

You can get realistic images with Midjourney and Flux with minimal prompt tuning. Adding "posted on snapchat" or "security camera footage" to the prompt will often produce mostly realistic-looking images.


there is an "aesthetics" model

https://github.com/LAION-AI/laion-datasets/blob/main/laion-a...

obviously, it reflects the mass preference for glosslop

secondarily it is likely due to a desire to ensure that ai images have a distinct look
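
For the curious, that aesthetics model is (as far as I understand it) just a small learned head on top of CLIP image embeddings, trained on human 1-10 ratings and then used to filter/weight the training set. A sketch of the idea -- the checkpoint filename below is a placeholder; the real weights live in the LAION aesthetic-predictor repo:

    import torch
    import open_clip
    from PIL import Image

    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-L-14", pretrained="openai"
    )
    head = torch.nn.Linear(768, 1)  # the v1 predictor is a linear probe
    head.load_state_dict(torch.load("aesthetic_head.pt"))  # placeholder path

    image = preprocess(Image.open("sample.jpg")).unsqueeze(0)
    with torch.no_grad():
        emb = model.encode_image(image)
        emb = emb / emb.norm(dim=-1, keepdim=True)  # predictor expects normalized embeddings
        score = head(emb).item()
    print(f"aesthetic score ~ {score:.2f}")  # higher = "prettier" per the raters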


> glosslop

What a marvelous word. Yoink!


> secondarily it is likely due to a desire to ensure that ai images have a distinct look

This doesn't make sense and seems like conjecture; do you have a source?


Once the herd starts stampeding in one direction, we get runaway processes - https://www.racked.com/2017/2/1/14441128/local-news-anchor-i...


Your link has nothing to do with genAI


There is no source; it's pure conjecture. But there are many, many fine-tunes available for various image generation models, so it's clearly possible to produce many styles. Thus it must be a conscious choice on the part of an API provider to render a distinct style by default, and there are many plausible reasons they would want to do that. But I don't have actual evidence from the internal management processes of these organizations that they were doing it for one reason or another.


It's industry standard procedure to tune your model to output a consistent distinct style, to prevent malicious actors from abusing it and presenting fabricated (but very convincing) images as real.


Credible source?


I don't know, but I've noticed another pattern: they don't like leaving any empty space. Every area has to be busy, filled with objects. They can never leave any empty grass, or walls, or anything.


Easy to overcome by adding "minimal", "white space", "empty space", etc. to the prompts.


Q: Is AI incapable of achieving a sense of reality? A: AI can definitely achieve a sense of reality at the current stage, but not a universal one that applies to every scenario.

Q: From a technical perspective, why do shiny/glossy effects occur? A: Shiny and glossy effects are prominent as signals. Modeling general realistic detail is challenging, but current AI has found an approximation. For example, modeling a mountain with every texture and bump is complex, but if the mountain is represented as a combination of simple shapes, the raised parts can be approximated as areas of high glossiness.

Q: If we reduce the complexity, can we generate realistic effects? A: Some models trained on specific scenes at https://chatgptimage.org/ demonstrate very detailed, photorealistic results. However, this is limited to those fixed scenarios.

How do you expand to more general scenarios? With even larger models and datasets. This is how Flux achieves its realistic results: https://www.reddit.com/r/FluxAI/comments/1erpct2/boundary_be...


This is an interesting question, though I think it needs to be qualified a bit since there are many AI images and AI image generators that don't match this pattern.

First, AI images != OpenAI/ChatGPT images. OpenAI has done a great job making a product that is accessible, and thus their product decisions get a lot more exposure than other options. A few people have commented that there are several Stable Diffusion fine-tunes that produce very different styles.

Second, AI images in general and AI images of people are different. I think the high-gloss style is most pronounced on people, partly because that's where it is most notable and out of place.

If you take the previous two points as true, the question becomes: why does ChatGPT's image model skew toward generating shiny people? I would venture that it's a conscious product decision, driven by what someone thought looked most reliably good given their model's capabilities.

Some wild speculation as to why this would be the case:

* Might have to do with fashion photos having unusually bright lights and various cosmetics to give a sheen.

* It might have something to do with training the model on synthetic data (i.e. 3D renders), which will have trouble reproducing the complicated subsurface scattering of human skin.

* Might have something to do with image statistics and glossy finishes creeping in where they don't belong.

* Might have to do with the efficiency of representing white spots.


I suppose that because a large part of these models is recognition probability, the shine is sort of an approximation of the most likely lighting. It isn't just the lighting you'd expect, but the accumulation of thousands of similar yet slightly different lightings. If you were to take a thousand photos of someone under all manner of light angles and blend them together, maybe it would look like this. Just a wild guess though.


People have started training LoRAs for Flux that look pretty real. This was a good recent example: https://www.reddit.com/r/StableDiffusion/comments/1ero4ts/fi...


Those photos look impressively realistic, but the skin has exactly the wax-like sheen that I think OP is talking about.


One of the most interesting things about Midjourney is that it always returns multiple images, and asks the user to select which of those they would like to view at full resolution.

This is pretty clearly training for a preference model - so they now have MILLIONS of votes showing which images are more "pleasing" to their users.
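
Pure speculation as to what they do with those votes (nothing about Midjourney's internals is public), but the standard recipe would be a reward model trained with a pairwise Bradley-Terry loss, roughly like this (all names, dims, and data here are made up):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RewardHead(nn.Module):
        """Maps an image embedding (e.g. from CLIP) to a scalar 'pleasingness' score."""
        def __init__(self, dim=768):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

        def forward(self, emb):
            return self.net(emb).squeeze(-1)

    reward = RewardHead()
    opt = torch.optim.Adam(reward.parameters(), lr=1e-4)

    # Stand-in batch: embeddings of the picked vs. skipped images from one grid.
    chosen = torch.randn(32, 768)
    rejected = torch.randn(32, 768)

    # Bradley-Terry loss: push the chosen image's score above the rejected one's.
    loss = -F.logsigmoid(reward(chosen) - reward(rejected)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

A model like that could then be used to rank candidates, filter training data, or steer the generator toward whatever the crowd keeps clicking on -- which is exactly how you'd end up with one dominant "pleasing" look.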


I naively assumed the "airbrushed" effect AI photos have was just a way of blending components of the training data so they look seamless -- the opposite of how a collage of magazine clippings would appear.


I wonder if this is getting close to the actual reason. That is, there's a "latent space" of composable units that, say, have blurred edges (or something) for easy combination, and that combination ends up looking like an airbrush/glossy effect.

I think this idea is quantifiable and testable. It'd be interesting to see if this, or something like it, is the actual reason.


Intentional choices during data set collation (to some degree 'emergent intention' due to aggregate preference). Search for 'boring realism' to find people working in other regions of latent space, e.g. this LoRA: https://civitai.com/models/310571/boring-reality . Most of the example pictures there don't have the shiny/glossy look you're talking about.


ML-generated pseudo-photos look 3D-rendered because noise is information, and more information is both more expensive (a noisy photo can be 3x the size at the same resolution) and more likely to contain self-inconsistencies (e.g., with real camera sensor noise) that make fakes easier to identify automatically.


Because that's the kind of image that AI trainers like the most? Would they rather train them on old newspapers?

That would be the "oiled bodybuilder" look applied to image training. Maybe similar, clearly defined lighting also allows AIs to match features much better, especially volumes.


I tend to agree. However, I kept prompting ChatGPT to make the picture less "AI-like", and it actually ended up doing a really good job after 5 or 6 attempts. I'm not sure why it took so much prompting. Further prompting just made it worse.


Be it said that ChatGPT is not ideally suited for this kind of inquisitive work, because it will rewrite your prompt as it sees fit. Without checking what the actual prompt was (e.g. on the Bing Image Creator page), you can never be sure what it added, removed, or modified from your input before it hit image generation proper.


Are you sure? I know from using the OpenAI API that the entire chat history is part of the request. I suspect the ChatGPT UI does some processing for efficiency, but the core model is stateless, so the entire list of messages needs to be passed.


I'm not sure whether we're on the same page -- when you ask the chat interface to generate an image, the language model may rewrite your wording, making it even more difficult to link cause and effect with that step in the pipeline. IOW, the verbiage you type into the chat interface is not the input the image generator actually works on. That's as far as I know the system; please correct me if I'm wrong.


Because models are trained on images that are usually edited in post-production with this aesthetic.

Mostly highlights down, shadows and clarity up. I often need to edit it back to get realistic-looking light in the scene.

Also, "--s 0" helps with generating more realistic images.


Sorry to ask a dumb question, but… is this in the prompt for a larger provider like ChatGPT (DALL-E) or are you using your own?


No, DALL-E doesn't take flags like this. This looks like the stylize parameter used in Midjourney.

https://docs.midjourney.com/docs/stylize


Thanks. I assume this is being run as a private service then and not a publicly available tool in a marketplace.


AI is compression, and lossy compression usually works by removing the high-frequency information first. That applies to both audio and imagery. It's obviously not the only factor, but I bet it's an important one.


Is there any technique to reapply high frequency info to make stuff look realistic?


With less lossy compression, sure. Meaning bigger models.
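
A cheaper cheat than bigger models is to fake the missing high-frequency detail in post. A crude grain-adding sketch (filenames and noise strength are arbitrary) -- not a real fix, but it makes the plastic-smooth areas read a bit more like a photo:

    import numpy as np
    from PIL import Image

    img = np.asarray(Image.open("ai_render.png").convert("RGB")).astype(np.float32)

    # Luminance-dependent Gaussian grain: darker areas get slightly more noise,
    # loosely mimicking how camera sensor noise behaves.
    rng = np.random.default_rng(0)
    luma = img.mean(axis=2, keepdims=True) / 255.0
    grain = rng.normal(0.0, 1.0, img.shape) * (6.0 * (1.2 - luma))

    out = np.clip(img + grain, 0, 255).astype(np.uint8)
    Image.fromarray(out).save("ai_render_grain.png")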


The noisy pattern of skin, combined with noise from light hitting the camera sensor and the noise created by image compression, produces an effect that is too subtle and random to "learn".


I believe it's due to AI's limited ability to generate localized texture detail effectively; it often resorts to highlights as a concealment strategy.


Models have Goodharted themselves into oblivion. That's the result of endless cycles of aesthetic preference optimization, training on synthetic data, repeat ad nauseam.


There's an added objective in some of these models to make the output more aesthetically pleasing, based on subjective crowdsourced data, which very likely contributes to this.


I think the generators were trained mostly on ArtStation and this style is quite common in concept art.


I don't know what you mean by "shiny/glossy lighting look to it." Could you give some examples?

I haven't noticed that; a lot of AI images are generated with a "realistic cartoon" style, and I assume that's to smooth over some uncanniness.



Also, people in them look like Caucasian anime characters. Why is that?


It has to do with the training data. Most of it is in a photorealistic style. I'm sure more real-world styles will come out soon.



