> Because that's not true at all. The AI can't even draw hands yet. To say nothing of its ability to handle multiple people and objects interacting in complex scenes.
This seems to be purely an issue of the size of the network. Parti (https://parti.research.google/) demonstrates that as the number of parameters increases, with no change to the underlying architecture, a lot of these problems simply go away. Basically just throw more compute and memory at the problem and everything gets fixed.
People who buy art do not buy it because of the technical execution. You may need to execute a piece in a certain way to get a desired effect, but the technique is the means, not the goal.
This is not to take away from the achievements of AI. It's that creating pictures adhering to a prompt with some degree of creativity is very little of what art is. Maybe it will replace some part of commissioned illustrations where the artist's name does not matter (e.g. some avatar pic?).
We still value, financially, some material goods for much more than they cost to produce. Or for much more than their almost identical mass-produced counterparts.
> It's that creating pictures adhering to a prompt with some degree of creativity is very little of what art is.
I mean, it's the majority of commercial art - you get a prompt from the client, you maybe flesh it out in a few different directions with sketches, then you refine a final piece. And AI is incredibly good at this process - instant results, infinite patience, and it's free. A very hard combination to beat.
Calendars, book covers, video game assets, green screen backgrounds....
Even in a case like video game animations, where the AI can't build every frame, it can still give you a good reference image. From there you just need a cheap artist to fill in the in-between frames - a huge cost savings, and a big blow to the artistic community.
Where do you get started as an artist, without any of those? Obviously, Fine Arts isn't nearly as affected, but how do you get your start when you can't build a name from your cool book covers, or get famous off Magic: The Gathering card illustrations?
Well, we’ll see how it performs, if it’s ever made public.
The 20B images don’t look that much more impressive than what SD is already doing (aside from the ability to render text), and in some cases they look worse. It’s hard to tell because the resolution is so small, but even in the 20B “astronaut riding a horse through a pond” image, it looks like his hands are still nonsensical.
This nitpick about hands sounds desperate. Here we are, with a tech so powerful that it overshadows the default hype surrounding it (no small feat; most technologies fail to live up to their hype, as you know)... and the critics merely move the goalposts a tiny bit further, even though the tech scales so well as to make their new goalposts irrelevant within a year.
It's not a nitpick. It might be a nitpick if hands were the only thing it couldn't do. But it struggles with a lot more than just hands.
>the tech scales so well as to make their new goalpost irrelevant in a year.
This just brings me back to my original question. Self-driving cars have been "a year away" for many years now, and now companies are starting to hint that human assistance may be required for the foreseeable future [1]. So, why the confidence that art will be an easy problem to solve with just more scaling, when that approach hasn't eliminated the need for humans in any other domain?
I have a suspicion that generative art is going to hit a data wall, also. All of these models are constrained in what patterns they can learn because image captions are not very precise. They can rehash common motifs associated with keywords, but they’re not good at following specific instructions. (“The chair is at the corner of the rug, turned 15 degrees to the left, with the leg nearest the camera aligned with the edge of the fireplace.”) For them to meaningfully improve in this regard, I have to imagine someone will need to locate a trove of a few billion images with exceptionally high quality captions, and well distributed throughout the space of possible image types, subjects, themes, and styles.
I think that details like angle and position will be resolved by using basic sketches as a starting point (we can already make images that roughly conform to layouts as well as prompts), by subdividing the image into assets that are then stitched together in subsequent steps, and by adjusting lighting/contrast/style as a set of filters in post-processing. The wall is lowered quite a bit when you don't insist on doing everything from a single magic prompt.
(This will be great from the point of view of art creation; not so great from the point of view of supposedly rendering humans obsolete)
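For concreteness, here's a minimal sketch of that first step (conditioning generation on a rough layout) using the open-source diffusers library's img2img pipeline. The model ID, file names, and parameter values are illustrative assumptions, not a tested recipe:

```python
# Sketch-to-image refinement: start from a rough hand-drawn layout
# and let the model fill in the detail. Illustrative, not a recipe.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A crude sketch fixing the composition (hypothetical file name).
rough_layout = Image.open("layout_sketch.png").convert("RGB")

result = pipe(
    prompt="a chair at the corner of a rug, beside a fireplace, warm lighting",
    image=rough_layout,
    strength=0.6,        # lower = stick closer to the sketch's geometry
    guidance_scale=7.5,  # how strongly to follow the text prompt
).images[0]
result.save("refined.png")
```

The key design point is the `strength` knob: it decides how much of the original composition survives, which is exactly the control a single magic prompt can't give you.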
That makes sense. I don’t think that will render humans obsolete; I think it will just increase their productivity and ultimately raise the standard of quality expected. It means artists can explore and iterate on ideas faster than if they had to lay down preliminary artifacts manually. But it doesn’t eliminate the need for authorship: someone still needs to decide what to communicate visually and how to communicate it.
For now I can run Stable Diffusion on a vintage laptop from a decade ago, on CPU (!), and it doesn't even use most of my RAM. And training this model was still cheap compared to, say, a Google senior engineer's yearly salary. The limits of scaling are further away than laymen may want to believe.
With an order of magnitude more parameters it won't just do hands, it will do quite a bit more.
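To illustrate how low the inference bar is, here's a rough sketch of CPU-only generation with the diffusers library (the model ID is just an example; steps and file name are assumptions):

```python
# CPU-only Stable Diffusion inference: works on modest hardware, just slowly.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cpu")                 # no GPU required, just patience
pipe.enable_attention_slicing()       # lowers peak RAM at some speed cost

image = pipe("an astronaut riding a horse", num_inference_steps=25).images[0]
image.save("astronaut.png")
```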
Her position doesn’t invalidate her arguments. How good is a product that can’t stand up to criticism?
Edit: to tie it back to the original criticism: what's the maximum training cost we're willing to accept for the model? How can we guarantee a return greater than the increased training cost?
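A back-of-envelope way to frame that question (every figure below is an assumption for illustration, not a real number):

```python
# Illustrative break-even arithmetic; all figures are hypothetical.
training_cost = 600_000       # assumed one-time training bill, in dollars
price_per_image = 0.01        # assumed revenue per generated image

images_to_break_even = training_cost / price_per_image
print(f"{images_to_break_even:,.0f} images to recoup training")  # 60,000,000
```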
You don't have to retrain an AI to spin up another instance - just download a 4 GB weight file via a magnet link floating around the internet and run some Python code in a terminal on your old PC. This kills the comparison.
And training a real, living, breathing person in a rich OECD country is going to be costly - no offense meant, I'm actually not from an OECD country.
But then again, when it comes to commercial needs, a human doesn't need "retraining" every time you ask them to draw something they weren't familiar with when they went through art school...
Doesn't "the AI" train on art produced by people? "Just expand the dataset, just increase the parameters" seems like it should hit a wall fairly quickly... and still not be very good, because deep learning systems have no insight.
But not as weak as the case that the route to production grade commercial art is reached via biasing the training dataset more towards sloppy social media images...
There is no problem here; to anyone moderately "in" the field it's obvious you can bias the model towards aesthetics by concentrating highly liked images in the dataset. And if that isn't enough, just ask your audience to occasionally rate the aesthetics of the output and use that as a signal for dataset curation (a toy version of this is sketched below).
Artistic styles are often just thin semantic filters over the base 3D geometry that can be learned from photos, and learning these shouldn't require many examples.
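A toy sketch of that curation signal (the data schema and thresholds here are hypothetical; reportedly, aesthetic-filtered subsets like LAION-Aesthetics were built along similar lines, with a learned scorer standing in for human ratings):

```python
# Toy dataset curation by audience rating; schema and thresholds are made up.
def curate(dataset, min_score=4.0, min_votes=10):
    """Keep only images whose average audience rating clears a threshold."""
    keep = []
    for item in dataset:  # item: {"image": ..., "caption": ..., "ratings": [4, 5, 3]}
        ratings = item["ratings"]
        if len(ratings) >= min_votes and sum(ratings) / len(ratings) >= min_score:
            keep.append(item)
    return keep
```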