
With Stable Diffusion & co. I've had the opposite sensation. I was completely floored by how it blew past all of my expectations.



Don't get me wrong, Stable Diffusion & co are incredibly impressive. I'm using NovelAI image generation for a project I'm working on, so it's already useful to me as more than just a toy, even. It is absolutely a massive technological step change.

But NovelAI and Stable Diffusion both have limitations. It's nearly impossible to generate two different specified characters, much less two characters interacting in a particular way. NovelAI offers common/popular art styles, but you can't use the style of a niche artist with only ~200 pictures to their name. (Understandable, given how the AI works technically, but still a shortcoming from a user's perspective.) Both are awful at anything that requires precision, like a website design or charts (as shown in the article). And, as most people know by now, human hands and feet are more miss than hit.

People are extrapolating the initial, enormous step change into a consistent rate of improvement, just like what was done with self-driving cars. People handwave SD's current limitations away: "it just needs more training data" or "it just needs different training data." That's what people said about autonomous vehicles; they just needed more training data, and then they'd be able to drive in snow and rain, or navigate construction zones. Except, $100 billion later, these issues still haven't been resolved.

It'd be awesome if I were wrong and these issues were resolved. Maybe a version of SD or similar that lets me describe multiple characters in a scene performing different actions is right around the corner. But until I actually see it, I'm not assuming that its capabilities are going to move by leaps and bounds.


I think you're wrong here.

My partner works in design, and her design teams have gone all in on using Stable Diffusion in their workflows, something that is effectively at "version 1." For concept art especially it is incredibly useful. They can easily generate hundreds to thousands of images per hour, and yes, while SD is not great at hands and faces, if you generate hundreds or thousands of images, you get MANY that have perfect hands and faces. Additionally, it's possible to chain Stable Diffusion together with other models like GFPGAN and ESRGAN for upscaling, fixing faces, etc.
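
A rough sketch of that kind of chain in Python, assuming the Hugging Face diffusers StableDiffusionPipeline and GFPGAN's GFPGANer interface; the model ids and file names are placeholders, not a prescription:

    # Generate a batch with Stable Diffusion, then run each image through
    # GFPGAN for face restoration. Assumes the `diffusers` and `gfpgan`
    # packages; model ids and file paths here are illustrative.
    import numpy as np
    import torch
    from diffusers import StableDiffusionPipeline
    from gfpgan import GFPGANer

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    images = pipe("concept art of an alien techbase",
                  num_images_per_prompt=8).images  # list of PIL images

    restorer = GFPGANer(model_path="GFPGANv1.4.pth", upscale=2)
    for i, img in enumerate(images):
        bgr = np.array(img)[:, :, ::-1]  # PIL RGB -> BGR array for GFPGAN
        _, _, restored = restorer.enhance(bgr, paste_back=True)
        # `restored` is a BGR array with cleaned-up faces; save it, or feed
        # it to an ESRGAN-family upscaler as the next stage of the chain.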

Self-driving cars are completely different: no one was using "version 1" of self-driving cars within weeks of the software existing. Stable Diffusion and similar models are commercially viable right now, and they're only getting better in combination with other models and improved training sets.

I think you're shifting the goalposts for what counts as success here, to be quite frank. "The model needs to let me specify multiple characters in a scene, all performing different actions."

The truth is, if I asked art professionals on Fiverr for "beautiful art photography of multiple characters doing different actions," it would be difficult and expensive for them too! And worse, you'd get one set of pictures for your money, and if you weren't satisfied, you'd be shit out of luck! On my PC, Stable Diffusion can crank out >1000 unique pictures per hour until I'm satisfied.
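
For what it's worth, that "crank out pictures until satisfied" loop is mostly a seed sweep. A minimal sketch, reusing the diffusers pipeline from the snippet above (the prompt and step count are arbitrary, and actual throughput depends on your GPU and resolution):

    # Sweep seeds to get distinct samples of one prompt; keep everything
    # and triage afterwards, rerunning promising seeds at higher quality.
    import torch

    prompt = "beautiful art photography of multiple characters"
    keepers = []
    for seed in range(200):
        g = torch.Generator(device="cuda").manual_seed(seed)
        img = pipe(prompt, generator=g, num_inference_steps=25).images[0]
        keepers.append((seed, img))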


> My partner works in design and her design teams have jumped all in on using Stable Diffusion in their workflows, something that is effectively in "version 1." For concept art especially it is incredibly useful.

I do agree if you are coming from the angle of "I need concept art of a surreal alien techbase for a sci-fi movie[0]" then SD&co are super useful. I'm not saying they don't have their uses. But those uses are a lot more limited than people seem to appreciate.

> I think you're shifting the goalposts to what success is here to be quite frank. "The model needs me to be able to specify multiple characters in a scene all performing different actions."

Having multiple, different characters in a picture/scene interacting in some way is not an uncommon, unrealistic requirement.

[0] high res, 4k, 8k frostbite engine, by greg rutkowski, by artgerm, incredibly detailed, masterpiece.


As far as I can tell, it is possible to draw such a scene by adding in the pieces and using the tools to paper over the boundaries and integrate those elements. It takes much more work than plain generation, but maybe one-fiftieth to one-hundredth of the work necessary for classic illustration.
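
Concretely, that "paper over the boundaries" step is what inpainting does. A minimal sketch, assuming diffusers' StableDiffusionInpaintPipeline; the file names and prompt are placeholders:

    # Regenerate only the white region of the mask to match the prompt,
    # leaving the rest of the scene untouched. Assumes the `diffusers`
    # inpainting pipeline; file names are illustrative.
    import torch
    from diffusers import StableDiffusionInpaintPipeline
    from PIL import Image

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    scene = Image.open("scene.png").convert("RGB")
    mask = Image.open("second_character_mask.png")  # white = area to redraw

    result = pipe(prompt="a knight drawing a sword, matching the scene",
                  image=scene, mask_image=mask).images[0]
    result.save("scene_with_second_character.png")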


It reminds me of a scene in I, Robot (2004):

https://www.youtube.com/watch?v=KfAHbm7G2R0


I have also been floored by their output, but that's exactly why the comparison to self-driving vehicles is so relevant. Even if we saw impressive growth over 5 years, it doesn't mean that growth will continue for another 5.

It's possible that Stable Diffusion, or minor improvements on it, is our peak for the next few decades.


I think the future will involve “layering” different AIs for art. One for backgrounds, one for human poses, one for facial expressions, one that can combine them. That sort of thing.
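
Purely as an illustration of that layering idea, a hypothetical pipeline in Python; every "model" below is a made-up placeholder (blank layers and an identity function), standing in for specialized networks that don't exist under these names:

    # Hypothetical layered pipeline: background, per-character poses, then a
    # face pass. The placeholder models just return stub layers.
    from PIL import Image

    def background_model(prompt: str) -> Image.Image:
        return Image.new("RGBA", (512, 512), (120, 180, 230, 255))  # stub

    def pose_model(prompt: str) -> Image.Image:
        return Image.new("RGBA", (512, 512), (0, 0, 0, 0))  # transparent stub

    def face_model(img: Image.Image) -> Image.Image:
        return img  # identity stub

    def layered_scene(bg_prompt: str, character_prompts: list[str]) -> Image.Image:
        scene = background_model(bg_prompt)
        for p in character_prompts:
            scene = Image.alpha_composite(scene, pose_model(p))
        return face_model(scene.convert("RGB"))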



