Don't get me wrong, Stable Diffusion & co are incredibly impressive. I'm using NovelAI image generation for a project I'm working on, so it's already more than just a toy to me. It is absolutely a massive technological step change.
But NovelAI and Stable Diffusion both have limitations. It's nearly impossible to generate two different specified characters, much less specify two characters interacting in a particular way. For NovelAI, common/popular art styles are available, but you can't use the style of an artist who only has ~200 pictures. (Understandable, given how the AI works technically, but still a shortcoming from a user's perspective.) Both are awful at anything that requires precision, like a website design or charts (as shown in the article). And, as most people know by now, human hands and feet are more miss than hit.
People are extrapolating from the initial, enormous step change to a sustained rate of improvement, just as they did with self-driving cars. They handwave SD's current limitations away: "it just needs more training data" or "it just needs different training data." That's what people said about autonomous vehicles, too; with more training data, they would be able to drive in snow and rain, or navigate construction zones. Except $100 billion of training data later, those issues still haven't been resolved.
It'd be awesome if I were wrong and these issues were resolved. Maybe a version of SD or similar that lets me describe multiple characters in a scene performing different actions is right around the corner. But until I actually see it, I'm not assuming that its capabilities are going to move by leaps and bounds.
My partner works in design, and her design teams have gone all in on using Stable Diffusion in their workflows, even though it's effectively at "version 1." For concept art especially, it is incredibly useful. They can easily generate hundreds to thousands of images per hour, and yes, while SD is not great at hands and faces, if you generate that many images, you get MANY with perfect hands and faces. It's also possible to chain Stable Diffusion with other models like GFPGAN and ESRGAN for upscaling, fixing faces, etc.
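For anyone curious what that chaining looks like in practice, here's a minimal sketch of the "generate a batch, then fix faces" step, assuming the Hugging Face diffusers library and the gfpgan package (the model IDs, prompt, batch size, and weights path are illustrative, not their team's actual setup):

    # Sketch: batch-generate with Stable Diffusion, then restore faces with GFPGAN.
    # Assumes `diffusers`, `gfpgan`, and a downloaded GFPGANv1.4.pth weights file.
    import numpy as np
    import torch
    from diffusers import StableDiffusionPipeline
    from gfpgan import GFPGANer

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Generate a small batch; in practice you'd loop this to produce hundreds of
    # candidates and keep the ones where hands and faces came out right.
    images = pipe(
        "concept art of a surreal alien techbase, highly detailed",
        num_images_per_prompt=4,
        num_inference_steps=30,
    ).images

    # Clean up faces with GFPGAN (Real-ESRGAN could be chained in similarly for upscaling).
    restorer = GFPGANer(model_path="GFPGANv1.4.pth", upscale=2)
    for img in images:
        bgr = np.array(img)[:, :, ::-1].copy()  # PIL RGB -> OpenCV-style BGR
        _, _, restored = restorer.enhance(bgr, paste_back=True)
        # `restored` is a BGR numpy array; convert back to RGB and save as needed.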
Self-driving cars are a completely different situation; no one was using "version 1" of self-driving software within weeks of it existing. Stable Diffusion and similar models are commercially viable right now, and they are only getting better in combination with other models and improved training sets.
To be quite frank, I think you're shifting the goalposts on what counts as success here: "The model needs to let me specify multiple characters in a scene, all performing different actions."
The truth is, if I had to ask art professionals on Fiverr for "beautiful art photography of multiple characters doing different actions", it would be difficult and expensive for them too! And worse, you'd get one set of pictures for your money, and if you weren't satisfied, you'd be shit out of luck. On my PC, Stable Diffusion can crank out >1000 unique pictures per hour until I'm satisfied.
> My partner works in design, and her design teams have gone all in on using Stable Diffusion in their workflows, even though it's effectively at "version 1." For concept art especially, it is incredibly useful.
I do agree that if you are coming from the angle of "I need concept art of a surreal alien techbase for a sci-fi movie[0]," then SD & co are super useful. I'm not saying they don't have their uses. But those uses are a lot more limited than people seem to appreciate.
> To be quite frank, I think you're shifting the goalposts on what counts as success here: "The model needs to let me specify multiple characters in a scene, all performing different actions."
Having multiple, different characters in a picture/scene interacting in some way is not an uncommon or unrealistic requirement.
[0] high res, 4k, 8k frostbite engine, by greg rutkowski, by artgerm, incredibly detailed, masterpiece.
As far as I can tell, it is possible to build such a scene by adding in the pieces separately and using the tools to paper over the boundaries and integrate those elements. It takes much more work than a single generation, but maybe one fiftieth to one hundredth of the work of a classic illustration.
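As a rough (and simplified) sketch of that piecewise approach: composite the elements approximately, mask the seams, and let an inpainting model regenerate just the masked regions so everything blends. This assumes diffusers' Stable Diffusion inpainting pipeline; the model ID, file names, and prompt are placeholders:

    # Sketch: inpaint over the seams of a rough composite so the pasted-in
    # elements integrate with the background. Assumes `diffusers` and `Pillow`.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    scene = Image.open("rough_composite.png").convert("RGB")  # characters pasted in roughly
    mask = Image.open("seam_mask.png").convert("RGB")         # white where the model should repaint

    result = pipe(
        prompt="two characters shaking hands, consistent lighting, detailed illustration",
        image=scene,
        mask_image=mask,
    ).images[0]
    result.save("blended_scene.png")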