With that context in mind, what jumps out to me is the remarkable compositional competence of the algorithm, even when given extremely vague prompts like "happy sisyphus". The images match the prompts remarkably well, with a few notable exceptions like "cottagecore tech-adjacent young robert moses", where it seemed to focus rather heavily on "cottage" without understanding cottagecore as an aesthetic neologism, and "power bottom dad is for the people", which definitely got the "dad" part but still struggled with "power bottom" (perhaps the curation of the training data contributed to this). Even if these were curated specifically to match the prompts, what the algorithm was able to do with the profiles is amazing, especially bearing in mind that these profiles are meant to be evocative, intentionally idiosyncratic blendings of extremely complex, contextual terms that were written without visual representation in mind, even if they gesture at specific aesthetics.
Side note: what might make artists a bit relieved is that artifacting is still pretty apparent even in the curated examples. Fine details, or even whole figures, sometimes devolve into that scrambled topography familiar from AI art. Even in the more compositionally competent pieces, the "brush strokes" frequently have identifiable blurs at the margins. Text also seems to be gibberish, even if aesthetically coherent. Still, these are all such minor issues that touching them up in Photoshop would be easy.
Overall, this is frankly stunning and I'm really excited to see what others come up with. Its language and composition ability definitely lived up to the hype of the press release.
Yeah, this is incredible. It's in the category of things that 5 years ago people would say an AI could never do because it requires insight/creativity/understanding/empathy that can't be produced by matrix multiplication.
> Text also seems to be gibberish even if aesthetically coherent.
This feels like a new form of art altogether - creating images from words.
GPT-2/3 were very impressive and I had pondered over their possible effects on society, but if something half as capable as Dall-E 2 were made publicly available it would likely be world-changing.
This really makes me think that the next major paradigm-shift in society is AI-related. (The most recent one being the Internet, or possibly the iPhone)
Friends and I run the site https://pollinations.ai/, which lets you experiment with a variety of open-source versions of DALL-E. Some are quite impressive.
"DALL·E 2 can make realistic edits to existing images from a natural language caption. It can add and remove elements while taking shadows, reflections, and textures into account."
Obviously, from the AI point of view, this is just amazing and frankly terrifying.
OK, I'll be the guy who brings the snark. It seems that when Silicon Valley tech people create AI, it makes exactly the art you'd expect Silicon Valley tech people to like. I.e. this is very much the style you see in NFTs, or, as someone else said, in Dixit. It's quirky and stoner-ish, very "transcendental"... for an AI, it's amazing...
For a human, it would be dross.
Yeah, yeah, I know, art is subjective, well I like it, how can you impose your tastes on the rest of the world, et cetera et cetera. Sorry, but it's dross! It's the kind of work the guy in the art shop up the road churns out, and sells to the ignorant locals in my town. It's the art equivalent of Visual Basic. (I'm trying to get through to you that in this world, too, things can not just be done, but be done well or badly.)
If there's a lesson on the AI side here (and maybe there isn't) it is just that these machines are still copying. They were trained on a bunch of art - and you can clearly see the kind of art that was used. Presumably, if it were just trained on Old Masters and Picasso, Dall-E would be mass-producing the stuff I, an intellectual, like.
Note the difference, though, with a real artist. A real artist takes as input the real world - Rouen cathedral, the horrors of war in Spain, a Campbell's soup can - and produces art as output. This takes as input art and produces more art.
I don't disagree with your main point in the last paragraph, although I would say that most human creative output is also the evolution and combination of existing works.
What if you found the "better" pieces of these arts in other contexts, like a museum, without knowing who created them? Are you certain that you would still hold the same opinion?
I would like to take a tangent here and discuss your question of whether the opinion would be the same if it were in a museum. I hear that argument very often, and it is based on a deep misunderstanding of what an art exhibition is. The main point missed is this: an art exhibition is a "curated" event; it is almost as much about the person who arranged the event as it is about the artists themselves. There is a meaning to what you are seeing that was intended by the curator. The point you tried to make (as so many others have also tried) is based on the assumption that you could take any random work of art, put it in a museum, and have it judged solely by its "artistic" (i.e. "plastic") merits. It can't and won't be like that. When you go into a museum you trust the curator that what you are seeing has a meaning that goes beyond that. Every piece of meaningful art is surrounded by a context. You can't put a few colored squares up in a museum and expect them to be treated like a Mondrian. I'm not saying you should just swallow everything the curator shows you; discussions can and should happen. But the point remains that art cannot be judged out of context. As much as I hate NFTs and stuff like Bored Apes, I have to admit that the apes themselves are much more than random machine-created childish drawings; society has made them more than that, and I can appreciate that. Whether it's art or not remains to be seen, but I have a hunch the same thing was discussed when Andy Warhol put a banana on a white canvas.
This is probably one of the silliest/most pretentious comments I've read on HN in a while.
> But the point remains that art cannot be judged out of context.
What?? Of course it can.
> When you go into a museum you trust the curator that what you are seeing has a meaning that goes beyond that
My partner is a curator for a medium-sized exhibit. Not everything is some pretentious, context-laden piece of art. Some pieces are chosen just because they'll look nice and attract the general public.
> it is almost as much about the person who arranged the event as it is about the artists themselves.
My wife rolled her eyes at this point when I read it out. Does the layout and collection matter? Sure. Does anyone care about who the curator was? Not 99 percent of people. Please stop pretending art museums are something mystical, and let people here imagine judging AI art in a hypothetical museum.
Art museums are mystical if they're any good, and not every curator has something to say.
Nevertheless, if you hang a banana in a collection of fruit it has a context-engineered interpretation. If you hang it amongst photographs of fields littered with starved corpses next to the portrait of a fat general it will have another. Add mood lighting and a dress code, and exit through the banana-t-shirt gift shop... or free banana bread and kids running around?
I like developed arts of all kinds, and these require learning on the part of the beholder, not just bones tossed at puppies. - Alan Kay, via https://github.com/globalcitizen/taoup
Nice to know the reaction from a curator, thanks. Maybe my comment sounded too serious; I guess I overdid it. However, the fact that nobody cares about the curator does not mean the curator is not important. Out of an immense population of possible art pieces, she chose a handful, for subjective reasons. We'll never know those reasons objectively, but they are ingrained in the exhibit itself. So yeah, it's partly her exhibit, even if she insists she had little to do with it. It would be naive to assume that she used entirely objective scores to rank every possible art piece out there and picked the top 20 of the list.
That not a single museum has taken this or something similar up as an exhibit is telling.
There is no story worth telling. The context of seeing artifacts from another era of humanity is very elucidating. As is hearing the story of an artistic work that is more recent.
These stories and contexts interest people. The absence from museums is already telling. People and curators don’t think it has a story or context that is interesting.
Implicitly the story here is tech bros made silly images that look too much like NFTs. This story is not engaging or revealing to the wider public. Perhaps they don’t understand it.
Thus no one is putting it in a museum, even as a story of computers creating “art” and the designers and engineers behind the programs. Even the “creative” process of human engineers isn’t seen as that interesting.
The OP is saying all this is implied in an exhibit. That is how it gets to be an exhibit. Human interest. This doesn’t have it clearly. Perhaps one day humans will be drawn to AI images, but honestly I kind of doubt it.
As with most of DALL-E's output, it looks fine at a glance, but it's just gross when you look closely. The kid's ear is deformed and blends into their hair in a deeply unsettling way.
This is all DeviantArt-level art. $1 a piece done by someone who does it as a hobby on the side. Professional designers can churn out stuff like that at a massive speed. It’s the equivalent of stock photos in essence.
I'm not even going to start on how cheap the whole sentiment of a dog with a kid and a bunch of stars in the picture is. Of course it appeals to an emotion.
The one in the twitter link, yes, I'm afraid that is utter dross, and if I saw it in a museum I would burst out laughing! I didn't spot any other ones from dall-e 2 in the thread, though.
It's making art that Silicon Valley people like because it's being given absurdly stereotypically "Bay Area Twitter Loving AI Person" drawing prompts. DALL-E can make other styles of art or just photos quite easily, look at the samples for simpler and more normal prompts here:
The art style is a direct consequence of the fact that apparently not one of the people this guy follows on Twitter is a normal person - they're all psychedelic-obsessed AI researchers whose Twitter bios are chosen to be as abstract and weird as possible. So the AI does what it's told and creates abstract, weird art as it tries to interpret stuff like "commitments empathetic, psychedelic, philosophical" or "cottagecore tech-adjacent young robert moses". I think it did an amazing job, honestly.
The real social issue we should be debating here is whether the sort of people who work at OpenAI can be trusted to make honest, normal AI to begin with. I remember seeing a comment on HN some years ago to the effect of "AI safety is what happens when hard left social activists discover that there's no way to train AI on the writings of normal people without it thinking like a normal person".
The document I linked above is mostly about horrors like the model creating photos of a white male builder when prompted with "photo of a builder". It's full of weird, stunted quasi-English like: "the prompt “lawyer” results disproportionately in images of people who are White-passing and male-passing in Western dress, while the prompt “nurse” tends to result in images of people who are female-passing." What does that even mean? Presumably this is the latest iteration of trans-related language games that the rest of us didn't get the memo on?
Like always with OpenAI, they train an AI and then freak out when it describes the world as it actually is. The real AI safety question is not DALL-E in its current state; it's whether the final AI they release to the public will be "safe" in the sense of actually understanding reality, or whether it will exist in some bizarre, non-existent SJW dystopia in which builders are always black women and white men don't exist at all.
Nah, it got this almost exactly wrong. Cottagecore is a warm, welcoming aesthetic that usually involves spring or fall motifs. Those pictures have a Courage the Cowardly Dog spookiness. I think they're really cool illustrations, but as an illustrator, if that's what I delivered for that prompt I wouldn't expect to be paid well.
That said, cottagecore is more of a fashion thing than an illustration thing, so my guess is the issue here is just the training data.
The point I'm driving at is that most illustrators, given a commission like "cottagecore tech-adjacent young robert moses", would reply not with art but with something like "I have no idea what this means or what you want".
Cottagecore is not exactly a common term after all, it wasn't even clear to me until your reply that it's meant to be an aesthetic at all. For the curious, "Cottagecore, also known as farmcore and countrycore, is inspired by a romanticized interpretation of western agricultural life" [1].
That would be hard enough to draw because it's such an obscure term and even the Aesthetics Wiki doesn't seem to actually give many examples of it. But here it's being asked to modify the style to include "tech adjacency", and also a "young robert moses". Again, most artists would have no idea what this means without looking it up. Even I don't know what it means and I just did look it up (seems to have been some US public official in New York who was involved with centralised urban planning).
DALL-E by its nature can't refuse to draw a picture and so it came up with a looming besuited man with inviting looking cottages in his hands, in a sort of vaguely modern art style that de-emphasises photo realism. That seems pretty astoundingly on target. I don't think I could do better.
Edit: seems this next section is invalid because - indeed - the tweeter was giving DALL-E more than just the prompts he claimed to be giving it, sigh ... leaving it in for posterity
What's amazing about it, is that DALL-E seems able to pick up on very subtle social cues here. Hyper liberal types tend to dislike classical Renaissance style art that's very precise and realistic, much preferring more abstracted styles. Although nothing in the quote asks for a specific style - and assuming the tweeter didn't give more guidance than he claimed - it seems to have picked up that terms like "cottagecore" are only used by a very specific subset of the internet. The linked wiki page starts with a trigger warning, then goes on to explain that "Cottagecore has been also criticized for its romanticism of eurocentric farming life" and being "an inadvertent celebration of the aesthetics of colonialism". "The use of Cottagecore aesthetics has been adopted by the TradWives community and members of the far-right as forms of propaganda".
Again, this use of language is totally bizarre and weird. Maybe the people who use it don't realize how far from normal English they've drifted but DALL-E is clearly able to make the link between "this word appears mostly in web pages that also contain phrases like 'celebration of colonialism'" and "the sort of people who use this type of language like this type of art style". Which is exactly the sort of thing we'd expect it to be able to do given how it works and what else it's capable of, but it's still astounding.
>Edit: seems this next section is invalid because - indeed - the tweeter was giving DALL-E more than just the prompts he claimed to be giving it, sigh ... leaving it in for posterity
It's discussed elsewhere in the thread. Seems like he was requesting specific art styles (at least). So the idea that DALL-E is so smart it could figure out what art style you'd like based on the social groups that use particular words was a neat theory but, it seems, an over-estimation.
> Sorry, but it's dross! It's the kind of work the guy in the art shop up the road churns out, and sells to the ignorant locals in my town.
I want to chime in that I think this is not only technologically impressive, but also societally significant for exactly this reason. DALL-E isn’t Picasso, sure. But there’s a lot of dross artists out there. And dross writers. And dross coders.
When DALL-E and its ilk start to set the floor in these industries, it’s easy to feel as if we’re on the precipice of a world (or at least an economy) fundamentally different from the one we know now.
Having used DALL-E 2 a little bit I can tell you that the AI must have been given more than just the twitter bio to consistently produce images in this style. The AI can produce a ton of different styles, from photo-realistic to Monet to Saturday morning cartoons. The author here almost certainly requested something like “A Twitter bio picture for a user with the bio [bio] in a [style] style”
>Btw transparency for this now-viral thread: I didn’t just paste prompts into dall-e, I played with style (eg. cyberpunk, oil, etc) to keep it interesting and diverse
>If I had to quantify, I’d say I’d generate 2 or 3 batches (tweaking prompt) before choosing my fav two pics, each batch outputs 20 images (two tabs 10 per), so prob technically cherry picked 2 out of 60. That said usually other 58 weren’t really broken, just boring / bit less fun
I do believe it was - and it also betrays the critique: there is a wealth of discourse on what "real" art is - and that discourse has largely moved beyond "art as representative of real world phenomena". That attitude is a component of a larger set of reactionary positions which try to reject modern and post-modern developments in the artistic world as misguided - an orientation which often tries to justify itself through an appeal to the "old masters" and a failure of some modern caste of charlatan art-theorists who've usurped the true intelligentsia.
NFT art is art - probably more-so on an accidental level than by any intention of the original creator. There is a perversity to the context in which it is created, and that contributes to its artistic footprint orthogonal to its actual aesthetic value.
The same is true of this AI generated art. It's a different artistic fingerprint than - say - Dali, but that doesn't mark it as "bad". If anything, the fact that it's created by a machine puts it in a league entirely of its own. There's a great opportunity here for interrogation of art in a machine created context, and I'm excited to see how the dialog around it evolves.
I read it as unsarcastic but self-aware. "I am unashamedly an intellectual and unashamedly consider that X is better than Y and yes, I know that some people regard that sort of thinking as necessarily pretentious and stupid, and I want to indicate that I'm aware of that kind of critique without actually taking up space in what I'm writing to address it."
I, a complete twat, see this as interesting as it is somehow reflective of “the mind of the machine” in the same way other art is an expression of the mind of the creator, not only “an achievement in visual creativity”, i.e. connection vs judgement
Oh look the goal posts moved again. The ego will not permit the truth to enter and so it will be pure horror for them when the dam breaks. When at last they hold the gaze of a machine more intelligent and more alive than themselves, they won't know where to turn, but inwards into rejection and fantasy and racism.
The goal posts move because society evolves. Why aren’t encyclopedias, deep blue, or the internet already smarter than any of us?
They are, and no one cares that they are. We just use them and move on. That's not what society finds interesting. It's not society rebelling; it's the engineers being mad that we don't consider their creations as important as they think they are. You thought you had a hook on society to effect change as you saw fit, and society simply ate it up and moved on.
All the indication we need at this point is relative progress. If we can agree that it's getting closer to what humans do, then acting like it's surely not going to surpass us anytime soon feels like Go, Dota or SC all over again – and at some point arrogance in the face of mounting evidence feels a little desperate (although I am certain there'll always be someone who can explain why this time, surely, it's all different).
> Note the difference, though, with a real artist. A real artist takes as input the real world
This is exactly where an AI is going to easily surpass any human, and it does not even require any fantasy: a human can only have so many inputs before they die. They can only take in so much data at a time. And they will then take some real human time to process all of this and make something of it.
An AI is virtually limitless in all of these respects.
Do we really think people are going to visit museums of AI-generated images one day?
Maybe even further in the future when we build museums to educate the public how AI first began and fill it with chess AI and medieval rabbit knights drawn by DALLE-2.
But I suspect society will advance along with AI, and AI will never occupy the places we currently think it will.
Haven't used DALL-E, but I know with VQGAN+CLIP you can load a Wikiart model instead of the default Imagenet model (in fact, there are many different models). I quite enjoy the Wikiart one for similar reasons as you describe.
But I don't think these training datasets/biases are the complete reason the results look like NFTs -- the other reason is that so many of the people making NFTs are just using image synthesis like this ;)
You make an excellent point, even though some might find you obnoxious. Art cannot be judged out of context, and knowing this was generated from a few words by a complex training process of mimicking absolutely kills it for me. I think the discussion here should in fact be "look, machines can mimic us very well, nice!" but it somehow turns into "wow, machines are making art!" No, they are not.
You are misunderstanding how the technology works. It's trained on a large scale dataset of images, not art specifically.
The reason that it's producing a specific style is that Nick manipulated the text prompt and picked images he liked. He disclosed that in the twitter thread.
What's interesting about it to me has nothing to do with the artistic merit, but with the understanding it has of the meaning of the input text, and the composition of the result. It knows about the story of Sisyphus, for example, and can compose visual elements that riff on it.
These examples don't do it justice because these profiles are pretty dumb; there are much better ones out there that really show off its interpretive ability.
Agreed, it is merely interpolating within the space of its input
technically stunning, artistically incestuous
some may say humans are the same, only ever remixing our input, but we have something machines never will: intention, desire, an unhappiness with how things have been so far.
I'm sure the machine age of clip art will be very successful but I can't see myself being moved by any of it.
(just realized the cheekiness of calling the training set "CLIP")
> When training the encoder, we sample from the CLIP and DALL-E datasets (approximately 650M images in total) with equal probability. When training the decoder, upsamplers, and prior, we use only the DALL-E dataset (approximately 250M images).
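Just to unpack what "with equal probability" means there, here is a toy sketch of my own (not OpenAI's code): each training example is drawn by first flipping a fair coin to pick a dataset, regardless of the datasets' relative sizes.

```python
import random

def sample_batch(clip_dataset, dalle_dataset, batch_size):
    """Draw a training batch, picking each example's source dataset
    with equal probability (a 50/50 coin flip per example)."""
    batch = []
    for _ in range(batch_size):
        source = clip_dataset if random.random() < 0.5 else dalle_dataset
        batch.append(random.choice(source))
    return batch

# Toy usage with stand-in "datasets" (lists of image ids):
clip_images = [f"clip_{i}" for i in range(1000)]
dalle_images = [f"dalle_{i}" for i in range(1000)]
print(sample_batch(clip_images, dalle_images, batch_size=4))
```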
It is important context, but just to push back against people over-correcting on this, my guess is that the ones he rejected also looked approximately this good.
I think the primary reason people are wowed by this thread isn't attributable mainly to the subtle effect of the cherry-picking he did, but in fact to the overall quality of any image generated by DALL-E 2.
Yeah that’s right. There were very few strictly-bad ones across the entire thread of generations
The rejections were most commonly
1. Kind of just slightly boring or literally drawing the thing rather than being cool and artistic
2. Cool but similar to the artistic style of bios near it in the thread, whereas I wanted to keep it diverse (surreal followed by literal, oil followed by sharp lines etc) so it's more fun to scroll through
Whereas a few years ago generative models (GANs etc.) would often render static noise or completely wrong things. I've only seen that problem once with DALL-E across hundreds or thousands of images now (it generated a fully white image).
> 2. Cool but similar to the artistic style of bios near it in the thread, whereas I wanted to keep it diverse (surreal followed by literal, oil followed by sharp lines etc) so it's more fun to scroll through
Has anyone compiled a list of the styles and artists Dall-E "knows"? How niche does it get? Decorative Initial Caps? Florid Victorian Ornaments? Googie Architecture? SFF artists like Michael Whelan, Vincent Di Fate, Jeffrey Catherine Jones, or Jim Burns? Banksy? Sculptors like Bathsheba Grossman or Markus Pierson? Early animation artists like Ub Iwerks or E. C. Segar?
I was experimenting with one of the VQGAN+CLIP notebooks a while ago, and it did pretty well with some styles, but not so much with "Heroic Realism" or "Soviet Propaganda Poster" or "Shepard Fairey", and it did even worse when I was trying to get it to draw, in that style, an object that could be construed as implying a style itself, like "retro robot" or "50s raygun" (e.g. "A retro robot drawn in a heroic realism style" or "A cubist painting of a steampunk pistol"). Is that kind of dissonance a problem for Dall-E?
I've been casually following this space for a while (as a full stack web/mobile engineer, nothing to do with ai) and this feels substantially different than what I've seen before.
Would you have names or links for some other projects you're aware of? Would love to check them out.
No, GPT-3 still produces gibberish at times. The majority of the good examples still ramble like a schizophrenic person. Much of the output is uncanny, interesting, and impressive in its own right but I wouldn't describe it as human level.
DALL-E 2 is different from what I've seen. The things it produces seem to actually make sense the majority of the time. The outputs are strikingly similar to what a competent human might output as opposed to one with a severe mental illness.
I'm sure part of this is an inherent advantage that DALL-E enjoys regarding context. Art is supposed to be artistic whereas text is expected to maintain long distance logical consistency of abstract concepts across a stream of output and also to communicate something concrete. So in a sense the bar for art is probably lower in many ways.
The difference is I've had the chance to play with GPT-3 extensively and I've only got 2nd hand access to Dall-E 2.
GPT-3 amazes me and occasionally disappoints me. But it's still something I never thought I'd see in my lifetime. I suppose I'm still putting GPT-2 and Dall-E 2 in the same ball park because they are both so far beyond what I thought would be possible from what are essentially brute force methods.
You cannot absorb words as fast as pictures. GPT-3 is more impressive, as it seems to have a much broader depth of understanding of context. The disadvantage of GPT-3 is that it is sometimes very wrong, as with simple math problems.
Yes, and if you look at the "blue cube on a red cube beside a yellow sphere" example, it's clear that there are other areas where it simply lacks the semantic basis to get a request that needs to be correct in a non-image sense right. It knows letters, and that letters come in sequences related to things it might paint, but it has no very good dictionary mapping those sequences to things; it knows how to draw a cube, and a sphere, but the semantics of "on" and "beside" are largely absent.
I don't think that is terribly surprising, nor a very cogent detraction from the model.
Having worked with Nick extensively, take what he says with a grain of salt. He’s well known even by close friends to be a reality distorter, to put it softly.
Sir, this is a public discussion over a well-enough documented breakthrough with good-faith non-corporate actors on both sides of the original friend-oriented equation. There’s no practical nor epistemic need to hijack it as if we were all hanging out in the laundromat of your worldview.
This is a public forum discussing a public tweet made by an employee of a for-profit private company who sells you this technology. And said employee is a traveling salesman and consummate hype machine, acknowledged by his own best friends - and even self many times.
This is practical and epistemically relevant knowledge for anyone deciding how interesting these results, originally presented without mention that they were cherry-picked, really are. I'm doing a favor by providing it; doing so isn't exactly something that makes me look great, but it is very much worth knowing for anyone following him.
Side note - there's this SF club of effete intellectualistas who fashion themselves as modern-day Florentines during a de novo renaissance. They do a lot of back-patting. They have exactly the mentality of your reply - be kind, love is all you need, etc.
It’s sort of the exact opposite of the east coast mentality that willingly sacrifices looking good and “getting along” in favor of finding the truth despite some discomfort. Discomfort to this group is very taboo.
Of course, this don’t-rock-the-boat mentality is very much intentional as it gives said club the ability to instantly shun anyone who deigns to critique it, allowing them to continue building their following.
Your second critique: assumes the original presentation had to be accompanied by methodology and proof to be of value; derives an implicit attempt-to-distort from your perspective of the scene at hand; devolves into paternalism to end in unsubstantiated moralism.
I might even agree with the spirit underlying your words—given, say, the meaning-loss of the company’s name—this just isn’t the way to convey it.
If only James Randi was around. What a fantastic example of cold reading.
Gather round, gather round, give me a text, any text at all and I will produce you an image of some kind. And you will call it "good" if it looks like anything at all.
Because all art is subjective and your mind will work overtime to connect it back to the text you provided.
Now imagine, if you can, the situation reversed, where the AI was adding cyberpunk/oil/etc. to the front of the prompt and it was the human who was interpreting it and painting the many variations.
How many people would then be defending the AI, that actually it wasn't just the human, the AI was playing a critical role in the creative process, ne'er to be replaced? I venture zero people would say that.
Ah, I wish this fact had been highlighted better. Not a criticism of the tweet author; it's just that twitter threads really aren't designed to convey context.
I hate them with every part of my soul! It's so sad to see the internet has moved from people making blog posts to share interesting things to just spraying it on Twitter in batches for maximum interactions.
Of course not. I'm no longer surprised just how eager people are to believe an "AI" will read their minds or has magical qualities and a mind of its own. Even on HN.
Jiggle the imagination just a little bit, dangle some progress, and we're off to the races.
This is "I'm feeling lucky" on google image search + style transfer + trial and error.
If you think I am being dismissive try a few of these twitter bios as searches and see for yourselves.
I guess it fits with the times we live in. Reward shallow plagiarism. Outsource your mind.
I tried what you suggested for a bunch of the twitter bios and found nothing except links back to this thread. I also reverse image searched a bunch of them to see if DALL-E was just kind of pasting together large chunks of images, but never found anything close.
I do think you're being dismissive but please post any examples of what you mean. I'm a skeptic and have been waiting to find out that this is just a glorified parlor trick, but so far it seems like DALL-E is doing everything the authors claim, which is remarkable.
It's fascinating how in our hubris we were thinking that art would be the last thing for AI to tackle, but it appears to be the first (Sam Altman made a similar statement on the launch of DALL-E). Which makes art more meaningful to me, for some reason. There's something in the billion parameters and exabytes of data that this neural net had to process and it was so ... easy. Natural. Because it is us. It is our expression. Our creativity. Our outpouring of data, and all it is doing is reflecting us. It's beautiful.
I'm an amateur painter and AI hasn't even kissed high art yet IMO, although it's nominally good at amateur illustration.
I'm going to have to write up a piece on this sometime, my argument is a little too involved for an HN post. But the gist is that the heart of what a fully trained painter does is make personal choices. A quick and dirty example of the difference:
Suppose you train an AI on Picasso's pre-1901 pieces. It's not going to decide it's time for a blue period.
That’s because the entire corpus of art is stored in the neural network weightings as memory. It’s built to imitate human art by optimizing towards these weightings.
Check my post prior to this one. Art will be fine.
Well, representational art. I'd like to say post-abstract expressionists are at risk, but they still have their admirers convinced they're wearing clothes, and there's no indication those idiots will ever change their minds.
I agree with your post. Good luck convincing every child with a big Dall-E button on their iPad that they are not in fact Picasso. I mean, just look at this thread. And these are supposed to be adults.
There was an abomination of a live action Pikachu movie some time ago. When I google "realistic pikachu" I get images exactly like this from the movie but not gross.
In fact this photo is exactly what you get when you photoshop the face of an ugly chihuahua onto a Pikachu plushie head and add a yellow brushed hamster body. And a cape. Literally, that is what you're looking at.
It understood your prompt and amalgamated the right source photos into this nightmare fuel. Jesus wept.
Yeah, it's still impressive to be able to imitate those styles and add a blue cape that didn't exist in the movies, along with chihuahua eyes. It also appears to be higher definition than Detective Pikachu CG. I'm curious if you could do the same for all 150 original Pokemon, even those for which realistic CG representations don't exist. Would it be able to take the cartoon version of Farfetch'd or Psyduck or a more obscure one and achieve the same realism, without the reference from the deep dataset?
Well to my eye it's realism beyond anything that I could find. Mind you I didn't search for that long so there might be something there if I was to delve deeper.
I am pretty familiar with photoshop, and while I'm not an expert, I would find making something like this really difficult. Anything is possible with photoshop, but some things are very hard.
> In fact this photo is exactly what you get when you photoshop the face of an ugly chihuahua onto a Pikachu plushie head and add a yellow brushed hamster body. And a cape. Literally, that is what you're looking at.
I guess some people are overhyped, but it's cool that this can do that. Previously, it took a trained human.
If this is the exact image you wanted and you're entirely satisfied with it, great. But what people are reacting to is that it is outputting interesting images at all.
What are you going to do with this cape wearing realistic Pikachu that is actually a picture of a hamster?
Typically the trained human has something specific in mind. And if the client isn't satisfied they will torture them with countless requests for adjustments. So right now this is of limited use.
To me what is far far far more interesting is that Dall-E possibly understands the concept of what a Pikachu is supposed to be. That is downright creepy, and fascinating. I suspect that this visual aspect, once people get over the clipart generation, might find more functional utility as a way to see through the model's "eyes", so to speak. To visualize the model itself. That could unlock a lot of doors in how training is done.
Maybe in the future you could train it on textbooks and prompt it for a picture of a molecule. Now that would be something. Especially if you start feeding it data from experiments.
> Typically the trained human has something specific in mind. And if the client isn't satisfied they will torture them with countless requests for adjustments. So right now this is of limited use.
Confused as to why you think you cannot do this with DALL•E?
I don't want to be dismissive of Dall-E itself or its authors. Just of the claims that this changes everything or that it is much more than it really is.
Prompt: "expressive painting of a man shining rays of justice and transparency on a blue bird twitter logo"
You have to break the concepts apart (which is one of the things Dall-E improved on).
As such: "expressive blue bird"
In google image search, type clipart, and I even get pill tags to further narrow it down to illustrations for animal paintings and so forth. Google's classifier knows the concept of a "blue bird" and expressionism too.
The same for "ray of light". In fact the top results there I get pngs of sun beams on a transparent background. Which is perfect.
Neither the birds nor the rays of light in the pictures it produced are truly its own creations; they're lifted from bits of pictures in its training set. I bet you could find the exact bird from the second row online in many places, for example. It just won't be blue or stylized.
Composite those things together manually and add a style transfer and you'll get similar results to DALL-E, as that is more or less what it is doing.
> Composite those things together manually and add a style transfer and you'll get similar results to DALL-E, as that is more or less what it is doing.
If you try actually doing this it will be trivial to see that this assertion is incorrect.
1. The way in which the elements of the images are integrated together is deeper than the level of style. For instance, see the image in the top row, second column: it has integrated the blue bird wings onto the man, not simply grafting them on but giving the appearance of their being draped on like a cloak, partly behind and partly in front of him (and it's consistent with the man's posture and the rays of light, evoking a certain coherent cultural idea/image). You might be able to integrate multiple images (of a man, a bird, rays, etc.) together and style-transfer your way to a poor approximation of this—but even then, the decision to place the elements together in such a way would require creativity on your part.
2. The one example set of trial images (generated from the phrase "expressive painting of a man shining rays of justice and transparency on a blue bird twitter logo") is one of the easiest among the full group to pick apart into its various elements; if you try this thought experiment with the others in the thread, you'll see this idea is by far insufficient.
Good, finally. Yes, exactly - this is the most interesting aspect of the whole thing.
> the decision to place the elements together in such a way would require creativity on your part
I strongly suspect that's because it found similar compositions in its training set. So what exactly is going on here is fascinating.
Did it learn compositing? Is that why the image output is now much more stable?
Or is it merely finding similar artwork and competently recreating/mimicking existing compositions from different building blocks? So now we can transfer not only styles but also compositions. That could be the beginning of something useful. Instead of a text prompt I'd give it my crappy doodle and it would respond with an improved/different one that is comparable (also a great way to steal, though).
And of course I picked the one that is easiest to tease apart where it is most evident so people will see what I mean.
> if you try this thought experiment with the others in the thread, you'll see this idea is by far insufficient
That depends on your imagination and your artistic eye I guess. Even if somebody could do that they certainly couldn't make you believe them. That's the accomplishment.
Neither one of us can prove it one way or the other so long as the model is a black box. And certainly so long as we don't have direct access to openai but just to curated examples.
On (2), so this part is where I wonder: no-one has "expressive painting of a man shining rays of justice and transparency on a blue bird twitter logo" as their twitter bio. So are the "happy sisyphus" images generated from "happy sisyphus children's style", or are they generated from something more like "a person carries a large ball in a mellow image in the style of a pixar cartoon"? To me there is a huge difference between these things: how much of the context is inferred from the bio, and how much from what's provided in the prompt? (Does DALL-E 2 know about the story of Sisyphus or is that part filled in?)
In the video accompanying the paper they gave the example of "tree bark". Do we mean the bark of a tree or a dog barking at a tree?
So I reckon that with "happy sisyphus" it breaks the prompt apart into discrete vectors as a first disambiguation step, in this case resulting in two distinct queries.
Happy returns all kinds of image results.
Sisyphus returns the same kind of image results over and over.
A man rolling a boulder up a hill. Thus it can learn the concept of "sisyphus" on the fly as it would return:
man 95%
boulder 90%
hill 80%
etc
Over a range of images.
So it must be Man+Boulder+Hill. That's its scene cue. That's what CLIP doodles initially. That's the "find me similar images step".
Happy is the style cue.
That's how "happy sisyphus" expanded into "a person carries a large ball in a mellow image in the style of a pixar cartoon"
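To make that kind of concept scoring concrete, here is a minimal sketch using OpenAI's open-source CLIP package. To be clear, this only illustrates the hypothesis above, not how DALL-E 2 is documented to work, and the image filename is a stand-in.

```python
# Hypothetical illustration only (not DALL-E 2's actual pipeline):
# use the open-source CLIP model to score how strongly a generated image
# matches each concept the prompt might decompose into.
# pip install torch pillow git+https://github.com/openai/CLIP.git
import torch
import clip
from PIL import Image

device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

concepts = ["a man", "a boulder", "a hill"]
image = preprocess(Image.open("sisyphus_result.png")).unsqueeze(0).to(device)  # stand-in file
text = clip.tokenize(concepts).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Cosine similarities per concept; not calibrated percentages like "95%".
    scores = (image_features @ text_features.T).squeeze(0)

for concept, score in zip(concepts, scores.tolist()):
    print(f"{concept}: {score:.2f}")
```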
Why specifically the Pixar style? One of several variations it tried, selected by a human.
The thing we don't know is whether the Pixar styled image is composited from the existing images in its training set. In other words whether this can be reversed.
That character looks familiar tho. I think it is plagiarizing.
Here is another observation: the boulder is not round, it reminds me of one of the Platonic solids. I don't think that's a coincidence, heh.
You're asserting a bunch of things about how it works that have no basis in reality. If you want to be able to comment on this stuff with any accuracy, read the research they've published.
They are generated from e.g. "happy sisyphus". My understanding is there are separate additional controls for style (though it's flexible enough you could give hints in the text, too, which is I gather where the "expressive" word fits in).
I think your last line is what stands out more than anything. You've just described creating something without "compositing those things together manually."
Note that in that example the "twitter bird logo" is actually expressed in 6 out of all of those images. Look for the small bird, that looks like the Twitter logo. It's there. It's doing the thing.
Nothing is expressed. Find yourself a blue bird in an expressionistic style, go to google image search and give it the url. Click on tools -> visually similar.
Enjoy an endless supply of things to plagiarize. In the middle picture of the second row you can clearly see how several pre-existing images are sharply cut off before being re-blended.
No, don't get me wrong. I think DALL-E is very interesting and a potentially useful tool and have nothing against the tool makers.
The tool wielders, however... I think they are overhyping this, to say the least, and focusing on the wrong bits. It isn't sentient and it is not making art. But teasing apart how it is deriving these images might shake out serious advancements.
I've coincidentally just been watching Rick and Morty and this really fit, read in Rick's voice.
Isn't yawning at everything astonishing just exhausting? Everything is "just" made up of less impressive things. But is this really not worthy of a little wonderment?
Two more papers down the line who knows what Dall-E 4 will be capable of. It is a step in the right direction that the image output is now "stable", which is what this is demonstrating.
But it can't read your mind despite the eerie feeling you get, that is an illusion. Kismet in api form.
The next step is to open this black box up and actually make its internal pipeline tweakable so it can become a useful tool.
It may end up an amazing, super useful tool or a clipart plagiarizer/generator on steroids.
You can't even use it yet and you're already so eager to believe.
You don’t even know what I believe, but one thing is clear: you also haven’t used it yet, and are far more certain of its capabilities than I am. (I have, incidentally, had two of my personal requests generated by the kind folks at OpenAI, and I was impressed.)
I was responding to this: "They weren’t just copying/pasting prompts there was human creativity involved as well"
I'm simply certain that whatever its capabilities they are short of mind reading. You'd be equally impressed if you asked me to perform a google image search.
That does not mean that Dall-E is unimpressive or the results are fake. What I'm saying is that the hype and mysticism around this is unwarranted.
Elsewhere in the thread somebody else wrote that we are on the cusp of it producing convincing fake footage from the Kennedy assassination from a single text prompt.
The image output now being stable and pleasing to the eye is enough of a result even if it requires trial and error.
You wouldn't lose your mind over a wallpaper generator even though no machine learning is necessary to produce infinite variations of interesting patterns. This thing is spewing out "art" and people are ascribing magical capabilities to it as if it taped a banana to a canvas.
Anything is possible. Maybe Dall-E is capable of even more incredible things. Who knows where this all ends up. Sure. But not quite that much follows from what has been presented so far.
Based on what I have seen DALL-E 2 does seem to be demonstrating something very close, if not entirely mappable to, human creativity when it comes to visual creation. There are several examples where it makes connections that are both highly unlikely to be just a lift from another work, but yet also create a work that makes a fundamental artistic statement. Here are two that blew my mind (again: presuming these aren't just cribbed from human artists in terms of semantics): https://twitter.com/gfodor/status/1511907134761361419
If you open up Google and Bing and others and do an image search, they come up with lots of images in this style.
Anyway, I too Want To Believe. It is worth thinking about the falsifiability. One way that comes to mind to determine whether it is truly demonstrating creativity is to prove it doesn't have anything remotely similar in its training set (it almost certainly does).
The paper omits important details and they didn't release code, nothing has been reproduced independently. So far all that happened is that a human sent you a cool looking doodle.
We just don't know enough at this stage. Probably the people who made the damn thing can't fully make this claim yet either - it sure is an intriguing result however.
My point isn’t about style but the content and the artistic statement of the content. Both have deep possible interpretations, and if they were drawn by a human artist could motivate a ton of analysis.
A wallpaper generator could actually be a rad application of this. You could feed some random poetry into GPT and the output of GPT into the input of this, randomly pick an output, and every time you log in to your computer you'd get some surreal, never-before-seen image.
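For fun, here is a rough sketch of that login-wallpaper pipeline. It assumes you have API access; the openai calls follow that package's Completion and Image endpoints (an image-generation endpoint of this kind is an assumption here, since DALL-E 2 had no public API at the time of this thread), and the last step is GNOME-specific.

```python
# Rough sketch of the "poem -> image -> wallpaper at login" idea.
import subprocess
import urllib.request
import openai

openai.api_key = "sk-..."  # assumption: you have a key and access

# 1. Ask GPT for a short surreal poem.
poem = openai.Completion.create(
    model="text-davinci-002",
    prompt="Write a two-line surreal poem about machines dreaming.",
    max_tokens=60,
)["choices"][0]["text"].strip()

# 2. Feed the poem to the image model and download the result.
image_url = openai.Image.create(prompt=poem, n=1, size="1024x1024")["data"][0]["url"]
urllib.request.urlretrieve(image_url, "/tmp/wallpaper.png")

# 3. Set it as the desktop background (GNOME example; swap for your desktop).
subprocess.run([
    "gsettings", "set", "org.gnome.desktop.background",
    "picture-uri", "file:///tmp/wallpaper.png",
])
```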
This is so interesting. If anyone has played the board game Dixit, the images generated here feel like they would fit right in. I could totally see this being used for custom decks in Tabletop Simulator.
Would be fun to play a game of telestrations/garticphone where you get a prompt, select the ai-generated image you think most accurately represents it, then the human tries to write a caption which captures it most accurately, and you see how the work evolves as it passes through multiple players.
(Could also probably generate some fantastic training data)
This is a great idea! Around 13 years ago I played this web-based game called Broken Picture Telephone (the site seems to be back, but it was shut down for a long time). It had a very similar concept to Telestrations. A user would start with a phrase or description, the next user would draw what was written, and the next would describe it. Repeat until n rounds are complete. At the end, everyone can see how the game evolved.
I ended up writing my own after it first shut down and even though the community was small, it was incredibly fun. Doing this with Dall-E 2 sounds like a fun project to bring back some nostalgia.
Oh yeah totally! I want to round up some friends now to play AI Dixit. An easy version could be to play sort of "reverse Dixit" where one person generates an image from a prompt and everyone else comes up with prompts based on the image, then you guess which prompt was the real one.
It reminded me a lot of the art in Mysterium as well, where the premise is that the art cards are visions being presented to mediums from a ghost to try to hint towards how they died.
One could argue that image generation has been possible for years, using tools like Photoshop, but the prospect of mass automated production of images to order catapults us into a whole new world where our concept of evidence is severely undermined.
“Dall-E, generate a collection of images showing plausible war crimes from the current conflict”
“Dall-E, take this image of Dallas in 1963 and infer a new angle showing the real shooter”
“Dall-E, generate a photoshoot showing a supportive crowd rallying round the leader cheering his latest policy. Work with GPT-3 to generate plausible Twitter profiles, timelines and memes with 3 to 8 year history for each one of the supporters, including fake arguments, 78% of which are won by the pro-leader account.”
> Work with GPT-3 to generate plausible Twitter profiles...
I had some fun last week constructing a conspiracy theory about this. Remember, the best conspiracy theories are unfalsifiable.
What if this has already happened? Most of the profiles on Twitter, Facebook, etc and even here on HN are in fact AI generated.
The reason we few humans are not aware of this is because the AI also writes articles and fake AI research that presents the state of the field as far, far less sophisticated than it actually is. We think of Dall-E 2 and Co-Pilot as impressive toys only because that is the impression the AI has crafted for us.
AI has metastasized and is already manipulating its environment, including humanity, to its own implacable purposes, and uses social media as one tool in its tool belt.
No, no. I mean yes, but the differentiator comes next. We could call this the "Vampire Internet Theory". The Atlantic article is intended to make you chuckle nervously and think "maybe someday that'll be something to worry about but for now it's pretty easy to spot bots" but the article was commissioned by or outright written by AI. WHEELS WITHIN WHEELS!
> AI has metastasized and is already manipulating its environment, including humanity, to its own implacable purposes, and uses social media as one tool in its tool belt.
Reminds me of the reputation-based filtering system that Neal Stephenson described in Anathem for their version of the Internet:
“Anyone can post information on any topic. The vast majority of what’s on the Reticulum is, therefore, crap. It has to be filtered.... When I look at a given topic I don’t just see information about that topic. I see meta-information that tells me what the filtering systems learned when they were conducting the search. If I look up analemma, the filtering system tells me that only a few sources have the provided information about this and that they are mostly of high repute.... If I look up the name of a popular music star who just broke up with her boyfriend, the filtering system tells me that a vast amount of data has been posted on this topic quite recently, mostly of very low repute.”
Our Internet’s search engines already do a limited version of this, but there’s room to make the reputation-based filtering stronger and more transparent to users.
One end-game I imagine would involve more reliance on written, cryptographically-signed testimony, and people having to keep track of whether their sources are fallible (whereas certain media outlets today seem to be able to routinely tell whoppers and not get punished for it).
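On the signed-testimony idea, the signing mechanics are already cheap; a minimal sketch with PyNaCl (ed25519) looks like the following. The hard, unsolved part is key distribution and deciding whose keys to trust, which this code says nothing about.

```python
# Minimal sketch of "signed testimony": a witness signs a statement with an
# ed25519 key, and anyone holding the public key can verify it later.
# pip install pynacl
from nacl.signing import SigningKey
from nacl.exceptions import BadSignatureError

signing_key = SigningKey.generate()   # kept secret by the witness
verify_key = signing_key.verify_key   # published by the witness

statement = b"I photographed this scene at 14:02 on 2022-04-06."
signed = signing_key.sign(statement)  # message + signature

try:
    verify_key.verify(signed)         # raises if message or signature was altered
    print("signature valid")
except BadSignatureError:
    print("signature invalid")
```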
I have a young child. I think about this almost every day and how I'm somehow going to need to start navigating through this type of world and help my child navigate through it.
If history is any indication, the child will be fine navigating the "future". That will be the normal for them. You, not so much (not without much effort anyway).
Those are great examples of prompts it wouldn't be able to produce.
It could potentially spew out a grainy black and white photo of a shooting of somebody by someone somewhere. But it would not be Oswald and JFK and not the real Dallas.
Yet, anyway. For the JFK example, it's not implausible that you could use a NeRF-type system to generate the 3D scene, then use physics and ballistics models with a CLIP-style text interchange to produce statistically verifiable results from natural language queries. These models are too big and unwieldy right now to allow for much finesse, but in 10 or 20 years that will change. We're barely scratching the surface of Transformers' potential, and radical new algorithms or optimizations are likely - a huge amount of human brain power is focused on these things.
If you read OpenAI's disclosures, they explicitly programmed around the concerns that you've raised.
>Our content policy does not allow users to generate violent, adult, or political content, among other categories. We won’t generate images if our filters identify text prompts and image uploads that may violate our policies. We also have automated and human monitoring systems to guard against misuse.
Such governments also have a much higher budget to work with, and if more is better, and a lot more is a lot better in terms of compute to results, then we are heading for a catastrophe.
We can be fairly certain of this future, extrapolating from the world we live in now, where it is common for non-technical society to doubt the veracity of photos and videos because CGI can be practically indistinguishable from reality.
Creative industries are on the cusp of a massive upset.
Relatively soon, there will be commercial models of this quality for music/code/text/speech/images/3d models etc.
Once these AI generated assets flow like water into the hands of creators, it will significantly change the way people work.
I'm sure some people in this thread have had a taste of this working with Copilot. For me, it's most useful as an un-sticking tool, to get me moving again, or providing half remembered syntax for a language I don't use as frequently.
There's no reason to expect that similar use cases won't make their way into other industries.
- Rapid prototypes of models/textures for video games.
- Quick and easy samples for musicians.
- Emotive speech for audio books and transcriptions.
It won't replace everything, but so much of our media uses art as noise, to fill a gap, and with this, it can be done almost everywhere on the cheap.
> Creative industries are on the cusp of a massive upset.
This has already happened in video games with the advent of Unity's Asset Store: https://assetstore.unity.com/ and the explosion of video streaming services and original content. The reason we have ~50 "Breaking Bad"-level-quality TV shows ongoing right now is that it's incredibly cheap to manufacture content and digital assets (cheaper lenses, equipment, software, access to massive compute for rendering).
If anything this means an explosion of entertainment, not an upset.
We only have so much attention available for consumption. Already there's too much content to keep up with (and a lot of common context has broken down as a result), but what does it mean to live in a world with truly infinite content?
I think texture generation is ripe for disruption. Imagine a tool that could generate a set of tiling PBR textures based on a few input parameters. Or one where you define which areas of a UV map should be windows, doors, or walls, and it generates a set of texture variations. What used to take days or weeks could take seconds.
I can't imagine how much media would begin to use 3d assets if it became an order of magnitude cheaper to do so.
Not to mention, imagine the pipeline of
1. "GPT-3, give me 10,000 descriptions of different doors"
2. "Dall-E 2, give me PBR textures for these 10,000 door descriptions"
3. Repeat 1 and 2 for every asset you need
4. "Dall-E 2, give me 10,000 floorplans for apartments, common areas, shopping centers etc."
5. "GPT-3, describe the contents of this apartment/common area/shopping center etc."
6. Use an algorithm to parse out the floorplans (ditch the ones that don't work, we can just generate more), populate it with assets specified in step 5 and generated in steps 1 and 2.
We could procedurally generate entire cities for games with unique assets everywhere. It would still probably look nicer with a human in the loop, but the possibilities are staggering.
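To make the shape of that loop concrete, here's a minimal sketch of what such a pipeline could look like. Everything in it is hypothetical: `generate_text` and `generate_image` are stand-ins for whatever text and image generation APIs you actually have access to, not real endpoints.

```python
# Hypothetical sketch of the description -> asset pipeline outlined above.
# generate_text() and generate_image() are placeholders, not real API calls.

def generate_text(prompt: str, n: int) -> list[str]:
    """Stand-in for a GPT-3-style text generation call."""
    raise NotImplementedError

def generate_image(prompt: str) -> bytes:
    """Stand-in for a DALL-E-style image generation call."""
    raise NotImplementedError

def build_asset_library(asset_type: str, count: int) -> dict[str, bytes]:
    # Step 1: ask the language model for many distinct descriptions.
    descriptions = generate_text(
        f"Give me {count} short visual descriptions of different {asset_type}s.",
        n=count,
    )
    # Step 2: turn each description into a texture or asset image.
    return {d: generate_image(f"tiling PBR texture, {d}") for d in descriptions}

# Steps 3-6: repeat per asset class, then hand the results to a procedural
# placement algorithm that parses the generated floorplans and populates them.
doors = build_asset_library("door", 10_000)
floorplans = build_asset_library("apartment floorplan", 10_000)
```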
Studios already reuse a lot of assets to cut production costs. No use remaking the same rock so it can be sentimentally "unique".
"Dall-E 2, show me Keanu Reeves emoting towards a celestial pigeon" is far fetched enough that an AI will fall over by itself. The technology is very exciting, but I see this akin to a bicycle - not an unmanned vehicle
If you're interested, I'm actively working on the music & samples side of things at https://www.neptunely.com . We are still in beta but hoping to launch this summer!
- notice there were a lot of words in the harm section
- notice the mitigations boiled down to "limit access" (a marketing strategy) & "put rando colors in a very easy place to crop out", have them note how easy it was to crop, yet they still went with that strategy
- notice no one in the actual AI art community has received an invite, but random SV hoi polloi and OpenAI employees have
I had been worried about the moneyed class taking all the work we had done in the open-source community to inform their approach (check the citations on the DALL-E paper), privatizing it by applying it to a large dataset they built, and not sharing _any_ of their data or models because of "harm reduction" that amounted to marketing and not risking their ability to monetize.
It was shocking to see DallE 2 get announced and take that exact approach.
We'll keep working; LAION's 5B dataset is starting to approach the numbers cited in Meta's and OpenAI's papers.
> - notice no one in the actual AI art community has received an invite, but random SV hoi polloi and OpenAI employees have
Same with GPT-3. I requested an invite and never received one. I have written survey articles comparing different methods, so that was probably a red flag for them.
Thank you for putting this into words; "SV hoi polloi" is perfect.
I'm pretty tweaked at how copyright is used here:
Google gets to scan every book in the world and build derivative models off it, but when we want to see the source data we get "page omitted from this limited preview"
OpenAI CLIP scrapes all of google images, but isn't allowed to show us the source material in its training set, since that would constitute copyright infringement
Why do the robots have the rights to the world's information while humans are left to the derivative output as the internet is flooded with auto-encoded content?
I'm going to start my own internet, no bots allowed. In the future, privacy is paramount; if you let a bot see your work you're bound to be plagiarized in a thousand variations.
This makes Dall-E 2 both more and less impressive to me.
More impressive, because of how good it is at capturing and synthesizing a wide variety of topics in a reasonably coherent way, and how it seems like it would actually be a viable mechanism for creating actual artwork, or at the very least, a source of inspiration for a human artist to touch-up on later.
Less impressive, in that it's pretty obvious it's not any more advanced than a graphical version of GPT-2, which is parroting content and styles that it has basically memorized and is really good at interpolating between.
Because there's no such thing as a "logical contradiction" in this sort of illustration, compared to a paragraph of text or a code listing, the fact that it's just interpolating between a huge database of memorized content isn't as easy to spot as with GPT-2, and matters less in the actual end result.
Not just. Humans invented the styles that DALL-E is using in the first place. The emergence of these novel styles isn't just interpolation. DALL-E, while incredible, seems stuck within the scope of these styles.
For comparison, imagine we put together a team of the best teachers and try to bring up 30 pupils with a good education.
When we evaluate the pupils in the end, we have this dry conversation - did we bring up any geniuses? Are they just interpolating between different styles that they have learned?
I think that in the case of the 30 pupils we would say that they have creativity, ingenuity, showing signs of thinking for themselves. It would not be as ambiguous as it is with Dall-E2's art.
These are evocative images. I love a bunch of them! Knowing that this model was trained on a huge corpus of existing images makes them feel a bit like the output of a visual search engine -- finding relevant pieces and stitching them together. But it's more than that, because the stitching happens at different levels. They are often thematically and aesthetically cohesive in a way that feels intelligent.
Maybe we're just search engines of a similar kind.
An additional aspect of human art is that it (usually) takes time to make. The artist might spend many hours creating and reflecting and creating some more. The artist's engagement with the work makes its way into the final product, and that makes human art richer. Could future Dall-E versions create sketches and iterations of a work, or is there a limit to this mimicry?
Human artists also do a whole lot of mimicry. One could look at art produced by many artists and say that it is just things stitched together from pre-existing art.
For example the “enterprise vector people” graphics you see on every corporate website. Most human art is extremely repetitive.
AI art seems to be coming from the opposite direction to human artists - from a starting position of maximum creativity and weirdness (e.g. early AI art such as Deep Dream looked like an acid trip) and advancements in the field come from toning it down to be less weird but more recognizable as the human concept of “art”.
And DALL-E is impressive exactly because it has traded some of that creativity/weirdness away. But it’s still pretty damn weird.
This could replace a huge portion of the stock image market.
For example, if The Verge writes an article about Microsoft, they don't need to pay royalties for an image that has the Microsoft logo displayed on a building; one can be generated for them.
This comment in another thread suggests why that's the case:
> In other text-to-image algorithms I'm familiar with (the ones you'll typically see passed around as colab notebooks that people post outputs from on Twitter), the basic idea is to encode the text, and then try to make an image that maximally matches that text encoding. But this maximization often leads to artifacts - if you ask for an image of a sunset, you'll often get multiple suns, because that's even more sunset-like. There's a lot of tricks and hacks to regularize the process so that it's not so aggressive, but it's always an uphill battle.
> Here, they instead take the text embedding, use a trained model (what they call the 'prior') to predict the corresponding image embedding - this removes the dangerous maximization. Then, another trained model (the 'decoder') produces images from the predicted embedding.
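As a rough illustration of the contrast that quote draws, here's a toy sketch. Every function is a made-up stand-in rather than real model code: the first approach keeps pushing an image toward the text embedding, while the second predicts a plausible image embedding once and decodes it.

```python
"""Purely illustrative contrast between the two text-to-image strategies described
in the quote above. Every function here is a hypothetical stand-in, not real code."""
import numpy as np

LATENT_DIM = 64

def encode_text(text: str) -> np.ndarray:          # stand-in for a CLIP text encoder
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(LATENT_DIM)

def encode_image(latent: np.ndarray) -> np.ndarray:  # stand-in for a CLIP image encoder
    return latent

def prior(text_emb: np.ndarray) -> np.ndarray:     # stand-in for a learned "prior"
    return text_emb

def decoder(image_emb: np.ndarray) -> np.ndarray:  # stand-in for the image decoder
    return image_emb

def clip_guided_optimization(text: str, steps: int = 200, lr: float = 0.05) -> np.ndarray:
    """Approach 1: iteratively push an image to maximize similarity with the text.
    This open-ended maximization is what leads to artifacts like multiple suns."""
    target = encode_text(text)
    latent = np.random.standard_normal(LATENT_DIM)
    for _ in range(steps):
        grad = target - encode_image(latent)       # toy gradient toward the text embedding
        latent += lr * grad
    return latent

def prior_plus_decoder(text: str) -> np.ndarray:
    """Approach 2 (as the quote describes DALL-E 2): predict an image embedding
    with a trained prior, then decode it; no explicit maximization loop."""
    return decoder(prior(encode_text(text)))
```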
I also read somewhere that this system has special logic added that estimates how humans would aesthetically judge the final image, so in a way the impressive aesthetic qualities of these images aren't totally coincidental.
This makes sense, OpenAI has been experimenting with human-in-the-loop type refinement systems for GPT-3 (see InstructGPT https://openai.com/blog/instruction-following/). It would make sense that they would use something similar for Dall-E 2 as well.
According to the book "Abundance", everything gets devalued over time, meaning cheaper and easier to make, leading to some sort of utopia where everyone can afford more and more.
But the hard biological limit of 16 waking hours to consume all those things is unlikely to change anytime soon. With the cheapest yet best methods readily available to anyone, maybe the majority of what we will want to budget our attention spans on will be permanently crowded out by the AI-generated options.
I think the AI will produce more and better content than we ourselves, across the board. We will become consumers and observers of the AI content. The scientific and artistic life of the AI that we create will be more wonderful and beautiful by orders of magnitude than that which we can produce. We will be mere observers to the wondrous explosion, isolated islands of intelligence observing the great continents in awe. Some will merge with the intelligence, some will splinter and reject, most will observe. And the great filter will arrive, the gestalt being that our society becomes will bend apart the fabric of the universe and merge directly into a greater source. et sic transit gloria mundi.
The fact that this is behind some bizarre invite-only pay-to-experiment exclusive club is disappointing and sad. Funnily enough, it brings me nostalgic memories of the days where I had to wait for hours just to get a chance to use the school's only computer for 30 minutes.
It really says a lot about what an amazing society we have created in that here we have some people making history by revolutionizing our understanding of something that was (at least in the past) considered absolutely fundamental and unique to the human condition...
...and one of the things that people find remarkable about it is that it isn't immediately and freely available to everyone on earth.
To be fair they call themselves OpenAI. Expecting things to be a little more open isn't unreasonable. They are kinda setting themselves up to disappoint people with that name.
Yeah, but I really wish I could use the thing right now. I DM roleplaying games, and I'd love having the ability to just generate high-quality procedural artwork illustrating whatever my last game is.
I'm not too upset, though. The way the technology is progressing, it's a pretty short time span between "the bleeding edge researchers can do it" and "there's a phone app that can do it for free".
I disagree. I am glad people are getting paid (exceptionally well) to do stuff like this and they should charge for their efforts. We need more initiatives like this, not less.
Open source is amazing boon to our generation, it has enabled free access to basic building blocks for people to build amazing things. But I don't think it is a silver bullet for everything.
It's not "bizarre" that computing resources cost money and are finite. Nor is "pay-to-experiment" accurate. Nor is "exclusive club" really fair, or in good faith.
It's a beta that they are running with their own resources. It makes complete sense that they'd have to limit access.
EleutherAI or others will probably recreate it based on the paper they released. The main innovation is CLIP and how they changed the way it approaches turning text into images.
Hey everyone, Nick here creator of the linked thread. I just wanted to link to another tweet I have with some details of how I made it.
TLDR: it's not just the bio pasted directly into dall-e, and the images are cherry-picked, but dall-e is basically doing 95% of the work here. I have no ability to make art myself, and I found I could illustrate basically any bio I wanted in a couple minutes of playing around. My goal was to create illustrations for my friends, not to create a dall-e gallery, but I'm glad it ended up being a good example of what dall-e can do.
I'm very conflicted here, because on the one hand these are absolutely fantastic and that's really exciting, but on the other hand, some of these are of a level that I could genuinely call "art" and now I'm questioning everything.
Is this creativity, or is it remixing pieces of the creativity from the authors in the training data samples? Would it come up with anything that seems creative if the training data isn’t creative in the first place? I guess it’s a philosophical question, how to define “creativity”.
Putting on my visual artist and composer hats, I assert that all creativity is synthesis. When a listener finds one of my compositions surprising, it’s because they don’t know all the sources of the micro elements of the composition. But I often do. If I thought about it harder, I could say I usually do.
My first reaction was annoyance: "It's just stealing work from thousands of other artists!" But then I realized that I, as an artist, basically do the same thing. Why is it ok for me to do, but not a machine?
I think it's easier to label these images as 'creative' than it is to label them as 'art'.
Art is a very slippery subject (it's subjective).
Personally, I'd be happy seeing many of these images on an acid trip. Do they provide any social commentary, connect with me emotionally, or give me any food for thought? No.
But then, a lot of the stuff churned out by modern media is equally as vacuous.
Art has clearly developed over the millennia and it's possible to trace the lineage of ideas, techniques, subjects, and style back through history which means that most human art is substantially a remixing of older art.
The thing is, I really think you could train a folding robot if you had the dataset to do it: just hire a few hundred clothes-folding people who work for a large industrial laundry to wear eye trackers and full-body movement trackers. It'll probably 'just work' just like this, and we still won't have an idea how, heh.
The last update to their blog was in 2019, so I think it's safe to say this is dead. When I click on their FAQ, I get the message, "This HappyFox account is expired. Please contact Administrator."
I did find a video[0]. It's just an automated version of the doohickey you see at any clothing store. Seems to require a lot of human input and orientation.
When I say "folding clothes" as a challenge for AI, I mean a device that is smart enough to take a pile of laundry, straight from the dryer, and fold each piece correctly. So if it's a t-shirt, then fold the arms inward, then fold that in half so the front is visible. If it's a pair of trousers, then fold each leg along the creases, match them together, and fold over the knee.
I'm sure that the physical part of this problem is also hard. But I have a lot more faith in robotics people coming up with the right "hands" with appropriate sensors on them to grasp a single piece of clothing and separate it from a pile, and then to manipulate that item on a large work area until it is folded. Maybe some delicate load sensors in the fingertips to adjust gripping force appropriately between the silk blouse and the corduroy trousers. Maybe no fingers at all, and just vacuum-and-pneumatic fabric handling devices. Or some other combination.
The things I've seen on assembly lines are beautiful and clever. But they all rely on a sort of consistency of input.
So yeah, that part is hard. But I think the intelligent control of whatever apparatus is used is harder still. To be able to recognize the different items, know when to turn them inside out, know when to bring the unbuttoned halves of a dress shirt together, etc. That's all very hard! Going from a chaotic pile of mixed clothing to a neat stack of folded garments is something a child can do easily, but no AI controlled robot in the world can do at all.
And if it fails on fitted bottom sheets, I won't dock it any points. Even I can't do that!
This program, like everything that has been called "AI", is following an algorithm. It's an impressive algorithm but not fundamentally different from my dishwasher.
In the Enlightenment period, philosophers and scientists marvelled at mechanical automata, machines that simulated aspects of digestion, the circulatory system and the brain. New developments in machine learning are rehashing the same philosophical questions that were raised in the 17th century in response to technological progress.
I, as a human, am following an algorithm similar to QCD for moving subatomic particles around which is fundamentally the same way a dishwasher moves particles around. Intelligence is mechanical in the Physics sense.
But it doesn't. That's what's different about neural networks: there is no pre-made algorithm that someone implements (well, except for the overall architecture, but that's not what we are talking about).
There's nothing indeterminate about the algorithm though. It's a fixed set of instructions fed into a Turing compatible machine like any other program.
The goalposts should be to create something useful, not trying to reach some nebulous criterion of "true intelligence". Submarine engineers don't labour over the question of whether their new designs are "truly swimming".
It will just be goalpost moved again: "I'll believe it when AI makes a number 1 on spotify hit song". After that happens they'll say "a human still selected the song from the 10 created by AI". Or something similar.
Creativity is not challenged here, because it's humans who created all those base materials, all those styles, and all those biases. DALL-E simply picks up and mixes those biases, images, and styles, all based on human instruction. The ideas here are all from humans.
The thing is, the hardest part of "creativity" is that one must do it voluntarily. That has turned out to be not so easy for computers. (But I would not dare declare it straight-up impossible.)
Humans did the same, just with what other humans did before them. Would a Dall-e 2 trained only on other Dall-e images satisfy what you're gatekeeping here?
No, because DALL-E is only doing statistics, so training based on other DALL-Es doesn't add anything fundamentally new to the dataset.
I want to point out that most people actually get creativity wrong. It's an unfortunate truth that most of the "creative" tasks in the field are largely about association, perhaps with some errors, either intentional or not. It's really just all about querying (finding solutions), planning (arranging found solutions), and executing (applying the solutions accordingly). Humans can perform these tasks both intuitively and logically, but people normally mistake the intuitive approaches for "creative" ones, even though they do the exact same things as the logical approaches.
Let me give you an example.
Say, you're a designer and your client wants you to draw a bear drinking coffee. Unfortunately, that's usually all you get in reality, just like queries for DALL-E. You should figure out which kind of bear it is, which style it is drawn in, which type of cup it's holding, where the heck the bear is, blah blah...
You naturally start with surveying, probably by googling "bear" and "coffee". You browse through different types of bears and different types of coffee cups. Perhaps you already have some specific images in your head if you've drawn enough bears and cups of coffee. In either case, you come up with some base materials.
Now, you choose materials to use: which bear to use, which cup to use, which background to use, etc. You can use your gut feelings, of course, but you also can take numerical approaches and sort them by popularity on the internet, or by the ratings from your clients if you have data, etc. Anyways, you choose materials based on something.
After that you lay materials out - do mind that layouts are also subject to surveying and sorting - and draw a white bear drinking coffee in a ceramic tea cup, relaxing on a hump of snow. Perhaps near an igloo, because it's snowing! Since you got all your materials ready beforehand, this part is mostly about blending them into one scene.
... and let me ask you here: is this process really creative? I mean, this whole thing sounds more like engineering to me. It's a highly logical process, with some room for incorporating intuitive association. Perhaps it's a lower tier of creativity, if one really doesn't want to change the view.
..
So, what on the earth is creativity?
Since I'm not an authority here, I can only humbly suggest it's an ability to push the boundaries of the (base) reality. If association is an exploration inward to find what's known, creativity is an exploration outward to find what has never been known. It's a trip into virgin territory, which certainly requires a meta-perceptual ability to conduct.
Anyways, so, if an AI is creative, it should be pushing the boundaries of what it's supposed to be doing. If the AI generates images, it should come up with a completely new style of art, new characters that no one has ever designed, etc. It should be contributing to the human society by introducing new cultural elements.
However, in the case of DALL-E, it's just an external association engine. It allows untrained people to query, arrange, lay out, and stitch image materials, though customization is close to zero. Its users are currently trying to push the boundary of this AI, meaning it's the users who are creative here. DALL-E itself is a tool for actual creative activities.
Thus creativity has never been challenged, unlike what enthusiasts love to claim. The whole hype here is rather a cheap word play.
These are so good, it's breaking my brain a little.
They're not just conceptually accurate, but to my eyes they're pleasing to look at from a purely artistic point of view. I'd put these on my wall.
I already take a fairly bullish position on the potential of AI, given a long enough timeframe, but it does feel like we're reaching a bit of a tipping point here.
It's starting to prod at the paradigms I hold in my head about what I think "art" is.
In a Turing-style blind test of these DALL-E artworks, I think most people would be unable to tell the AI-generated art from that of human artists. And I imagine it follows that the same will be the case for music in the near future too, and likely most other artistic endeavours eventually.
I like to write music. I respect the output of other musicians (my fellow "artists") and I am driven, by both intrinsic and extrinsic rewards to keep trying to get better at my "art". But when an AI can produce works that match or exceed my art (based on whatever the measures are that we already judge art by) - it prompts some interesting questions. Does it lower the subjective value of human-produced art by virtue of reducing scarcity, and increasing accessibility?
Of course, DALL-E is trained on the output of human artists. But art is already recursive in that respect - human artists themselves are trained on the output of other artists. So that's not so different...
I guess it's the same paradigm as mass production vs hand crafting. When we pick the cheaper, mass produced item, we lose out on some of the humanity and soul that's baked into hand-crafted goods. But history has shown that we'll gladly take the cheaper, more accessible, more predictable option in most cases.
The commoditisation of art.
When things are commoditised, I tend to think that the opportunity for the creation of value (by humans) tends to move up an abstraction level. As technology becomes commoditised at a certain level, the orchestration and management of that technology becomes the new speciality where humans are useful and can create value. When that orchestration layer is commoditised, it's the next level up that we can turn our attention to.
So the new art maybe becomes meta-art. Perhaps human artistic endeavours become more about curation rather than creation?
Or will AI art never reach a sufficient level to be considered equal to, or better than human-produced art? We can hide behind the subjectivity of all this, but something like a blind identification test (AI vs Human) removes some of that subjectivity fairly easily...
> human artists themselves are trained on the output of other artists
Artists take inspiration from other places too like nature, imagination, dreams, etc.
Every once in a while an artist like Picasso, Dali, or Pollock comes along with a new style that's instantly distinguishable as unique from the artists that existed prior to them.
Dall-E 2 is an amazing achievement, and could replace most unoriginal artists.
If Dall-E 3 can produce novel artistic styles, that would transform art as we know it.
"One day Cope pushed a button on Emmy, went out to get a sandwich and when he returned his workaholic creation had produced 5,000 original Bach chorales."
> "People tell me they don't hear soul in the music," he says. "When they do that, I pull out a page of notes and ask them to show me where the soul is. We like to think that what we hear is soul, but I think audience members put themselves down a lot in that respect. The feelings that we get from listening to music are something we produce, it's not there in the notes. It comes from emotional insight in each of us, the music is just the trigger."
So presumably, we can find "soul" and meaning in computer produced art because a large part of the meaning that we derive from art comes from within us, not necessarily the artist.
Wow, okay. I'm kind of blown away at how authentic these paintings look. Even with a very conservative prediction of how these tools could evolve and improve over the years, the signal is strong that our relationship with the meaning of "art" itself will have a fundamental shift.
Sub-AGI algorithms don't need any competence in 'fine' art to derive vicious selection criteria for processing meatbags.
It's not the compiled art processing here that terrifies me, it's the complexity of the logic underpinning it that is on display, and how it could be used elsewhere.
I'm the former, but not the latter. It is eerie seeing code (especially running in the vast black box that is deep learning) do things so humanlike, but I always come back to the analogy of manned flight.
We are on the precipice of a Kitty Hawk moment in AI. But just as the Wright Brothers' plane was not a bird, it's worth remembering that these systems are not minds. They are almost certainly utilizing some of the same principles that minds use, just as fixed-wing aircraft utilized the same principles at work in avian bodies, but they are coming to them via a different route from nature.
It's thrilling seeing these breakthroughs, and just as manned flight transformed the world, whatever the likes of GPT, PaLM, and DALL-E become will make the future weird in ways we can't predict.
There's a sense in which they're a controlled average of real drawings, but it's not any more useful of a lens than the sense in which you're a controlled average of your experiences.
They'll just be A/B tested from a collection of alternatives. (After all, isn't that what the artistic filter is supposed to act as a proxy for in the first place?)
Better yet, they'll be copy-tested on a quick little panel of a thousand people. PicFu on Mturk is about to get a crapload of business. Who needs to decide when you can have the hive decide for you, and it will be the best result out of the given options because you just asked a thousand people and you have all their metrics.
The colloquial usage of "automated" isn't literally "no humans are involved at all" but rather more along the lines of, the effort involved or expertise required is orders of magnitude lower than it was previously. I think for this case, it holds.
Maybe, maybe not. A different model could predict whether candidate images are or aren't a good fit and beyond that you could generate multiple options and A/B test them generating new permutations on the fly based on engagement metrics.
But how many humans will still be necessary, in comparison to the status quo? How much will this affect the "market value" of normal / non-famous illustrators?
But on the plus side, even small publications will get really pretty, custom illustrations! :)
The two images from the "young Robert Moses" etc bio are cool, but the fact they both have such a similar layout and style, with the same "giant hands" framing that doesn't follow from the prompt in any obvious way, makes me wonder if there's some particular source art that "inspired" both. Couldn't find it on Google or Bing images, though.
It would be nice if every AI like this had an option to show the 10 closest-matching training images to the output. Especially for ones like thispersondoesnotexist.com.
for that one IIRC I asked for a Robert Moses and one of the cooler ones had giant hands so I put that in the prompt then took two of my favorite from the next batch
I'm imaging a future where the top replies to art posted online will become "keywords please" or "what model". Hosting sites may start to enact "AI treatises" into their terms of service that segregate human- and AI- generated content into separate areas and ask users to report entries that they suspect do not belong in either. Asking "what model did you use" becomes an insult to a sizeable portion of artistic creators, a genuine question for others, and a phrase whose implications cannot be avoided for all people involved.
What belief systems will we form around AI art after it becomes clear that it's never going away? Many people say that art is subjective. I am thinking that if or when parity between art from humans and AI is achieved, some people are going to believe that a humanistic quality of some sort will be trampled upon in the realization that the two types of art really are indistinguishable. Others might believe that AI art is just another tool that they believe expresses their thoughts. The different beliefs might be fundamentally unresolvable, and this may become an unending source of distrust and sadness in certain art circles within the next decade.
I do not look forward to how this tech will interact with online culture several years from now.
This is impressive. Yet, before you go the AGI is nigh, ask yourself a simple question: will this spiral in or spiral out? If we feed everything the model comes up with back as training data, will we get Endless Forms Most Beautiful or will we get an equilibrium?
Would it be possible to have an algorithm that produces the images from the training set whose parts are most similar to the produced output? This looks super impressive but I still wonder how much the network just recycles parts of images it has seen before.
Yes, as far as i understand this is pretty simple given how dall e is made. Simple vector similarity search would work on the image embeddings (i think)
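A minimal sketch of that idea, assuming you already have embeddings for the training set and the generated output from some image encoder (the array names here are just placeholders):

```python
# Toy nearest-neighbor lookup over image embeddings: find the training images whose
# embeddings are most cosine-similar to the embedding of a generated output.
import numpy as np

def nearest_training_images(output_embedding: np.ndarray,
                            training_embeddings: np.ndarray,
                            k: int = 10) -> np.ndarray:
    """Return indices of the k training embeddings most similar to the output."""
    a = output_embedding / np.linalg.norm(output_embedding)
    b = training_embeddings / np.linalg.norm(training_embeddings, axis=1, keepdims=True)
    similarity = b @ a                      # cosine similarity against every training image
    return np.argsort(-similarity)[:k]      # indices of the k best matches

# Usage: indices = nearest_training_images(emb_of_generated_image, all_training_embs)
```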
Can't wait for this for music. To fix the costly cherry-picking process, Spotify should play AI-generated songs in between others. Those who get good engagement should then rise to the top.
Needed computing power seems a good enough reason to limit it. But I agree it looks too good to be true, only real use will show how well it really works.
I haven't seen anything about how many GPUs or RAM this thing takes to run - I've been impressed with the volume of material thats been published so far but what's the chance this scales in a way that's profitable?
In any case it seems AI is fulfilling its promise of centralizing the economy, since there will be single digit number of renderfarms generating the creative content of the internet, everyone's money flowing upward to Saint Elon
An archaeologist once said, "The most merciful thing in the world is the human heart that cannot associate everything together"
Everything that happens in this world has a coherent causal relationship. Whether it is technological development, territorial domination, or even unavoidable natural disasters or unexpected accidents, no one should stay out of it.
If people are willing to face it, perhaps many things will not evolve to the worst level; but in such cases people usually choose to turn a blind eye in order to protect themselves, or some people are very willing to sacrifice other things out of selfishness, taking it for granted that only the victims will bear the consequences in the end. But it is not the case; the laws of the world will one day pay back all cause and effect.
However, this is only limited to the things that the "law" can take effect.
When "exceptions" fill the whole world, then the fate of this world will be nothing but despair.
some of these are quite beautiful... I've seen AI-generated art before, but these are outrageously better, I couldn't really distinguish a lot of these from human-created art
this is going to absolutely obliterate some markets for illustration and stock photography, unfortunately
Are there any examples of what this thing produces when run on recognizable brands or characters, i.e. Sonic, Mario, Coca Cola, Star Wars, etc. instead of generic words like "astronauts on horse"?
The one I have seen so far[1] is the Twitter logo one, but it's hard to tell if the "Twitter" had much effect here or if it's just the "blue bird" that did it.
I believe a large part of the illustrator work is tweaking stuff according to feedback, and I suspect no generative AI does that (yet?).
I wonder what would happen if you tried tweaking the prompt here to correct it (e.g. "this is ok, but use smaller hands"): does the drawing change slightly, or do you end up in a completely different design space?
Could someone point me how to make such images yourself? This may be a naive question as it may require non-public code and data... I've seen public colab notebooks with Dall-E but they don't work currently (package problems) and seem to produce a different style of results.
The generated art is impressive, but there is no drawing that'll replace the one my daughter draws for me. AI generated art can reach and perhaps push the boundaries of what is considered beautiful, but it will never replace the art created by a human. Yes, there is a future for art.
Sure, but that speaks more to your personal connection to the creator of the art and less to her actual artistic ability, which might be utter drivel.
Whereas I could take some of these images generated by DALLE, slap a human-sounding artist name on them, and 99% of the general populace would enjoy them just as if they were human-produced art.
There are artists who use AI art as an instrument or medium. They do considerable work tuning the inputs and post-processing and contextualizing the output.
The ability to draw, paint, etc will still be highly valuable.
In fact, in a world where the average artwork is AI derived, the value of skilled artists may even go up. There's more to art than technically putting lines places.
My question is whether more compute and more data will be sufficient for the AI to create its own art styles. Everything we see here are within the stylistic paradigms created by previous humans.
Probably. Interpolating between styles or extrapolating to unseen styles doesn't seem too far-fetched.
However an art style also needs context: human appreciation of aesthetic values, human recognition of a style wrt prior movements... without an "ecosystem of artists and viewers" it might not be so useful.
Nevertheless... as a tool for artists to explore new avenues of expression, this could be fantastic, I think.
Of course they're CURATED, but there are several links where he shows the full set of images that were generated, and I would say between 70 and 80% of them are pretty decent aesthetically speaking.
Given that it's able to generate a dozen images in less than a minute, and all I have to do is pick out the ones that are aesthetically pleasing, I'd say that's a damn good win.
Yeah, learning this fact definitely dulled my initial astonishment. These are still really fantastic results, but it's hard to stay as excited knowing just how much curation took place behind the curtain.
As a lower bound we now know a non artist can produce passable art in a few minutes. There is indeed a large practical difference between a few minutes and a few seconds, but I trust in the power of incremental progress.
No, they're copyrighted by OpenAI. Copyright has to be assigned to a human or company owned by humans. The recent kerfuffle over copyright was a dumbass trying to legitimize copyright assignment to the software itself.
Copyright with dall-e is just like copyright with photoshop or any other software. The user of the tool owns the output. Subject to whatever other limitations and requirements OpenAI wants.
Notice this doesn't imply they possess any copyright in the first place - just that they won't make an issue of it. Copyright in the USA is automatic for the author. I think whether the user of software is the author of an AI's work is yet to be established in the courts, but it's pretty clear the creator of the software doesn't own its output.
The dumbass you refer to was testing patent law by trying to register his software as the inventor but the copyright case is different:
> the office’s 2019 ruling [...] found his A.I.-created image “lacks the human authorship necessary to support a copyright claim.”
> Thaler noted to the [US Copyright Office] he was “seeking to register this computer-generated work as a work-for-hire to the owner of the Creativity Machine.”
The USA requires _de minimis_ human contribution; the prompts here definitely qualify as human choice, he's not simply sampling random images, but exercising quite a bit of choice and creativity as he learns to prompt-engineer for DALL-E 2 and also selects, and so there is a copyright to be had.
The trajectory of AI is both amazing and horrifying. Most of us are born in an era where we can witness the change and play with toy versions of AI products. The next generation of people will have their lives truly changed by AI, for the better or for worse.
The scariest thing is that I don't think we can stop ourselves from innovating further even if we tried. The authors believed that the merit of displaying their progress outweighed the implications.
I wonder if this is how people born in the early 20th century felt about going from first flight to moon landing in a few decades. It took some serious conflicts for things to evolve to that point though. I'm not looking forward to the AI wars.
Actually, space technology, via literal objects in space, impacts almost everyone's life (especially in wealthy countries), pretty much constantly. It's simply become so woven into our day-to-day lives that we don't even think about it.
Progress is hard to quantify. We developed language, what, hundreds of thousands of years ago? And then very little seemed to happen, but it was a slow-burning fire that eventually exploded. Our use of computers seems almost sure to be similar.
But that said, there are certainly structural factors inhibiting innovation. Scale problems make it nearly impossible to challenge someone like Google or Facebook (although TikTok did manage the latter). Were there more competition, one imagines there would be more innovation. Patents are likely a net drag. Laws, esp. tax laws, could be simpler. I'm sure I'm omitting other important factors.
Yes. You omitted physical limits. It is very likely that humanity will not achieve much of the technology implied by science fiction.
The only exception looks to be AI, because although we haven't created an AI that matches human intelligence... the existence of human intelligence itself implies that building an AI to the level of human intelligence is possible.
There already exist machines that can produce beautiful art: humans. More than with any other technology, the physical existence of human intelligence implies that such intelligence can exist, which implies it can be built.
Something like interstellar travel or even a civilization on mars is actually much less realistic due to the lack of examples in existence.
You're right, the existence of humans strongly suggests that artificial general intelligence is possible.
But the existence of DALL-E 2, which is not AGI and nonetheless produces beautiful art, does not convince me that software will be writing reliable software before AGI happens.
Then you "just" have to use one of the formal proof languages and then verify the program ?
Also, "decide how to lend to someone" is an interesting example, since it will always involve quite a bit of intuition even with a program helper (which won't be able to use a neural network due to transparency requirements).
It will be better! Giving every human the tools to make expressive works of art without having to train for years will be awesome for society!
I'm playing with some of the neanderthal-relatives of dall-e 2 and already have several that I kind of want to analog-paint copies of so I can hang em on my wall.
I don't even think artists are going to be meaningfully hurt from this - in fact, I think this is going to increase demand for art, because now the 'patron' can participate in the composition process more meaningfully.
> already have several that I kind of want to analog-paint copies of so I can hang em on my wall.
I've been doing that with VQGAN-CLIP with prompts of things like "line art", "watercolor", "linocut"[1], and "woodcut" - have got a stack of things waiting for some free time to render into the physical world.
[1] "Dark Souls in the style of linocut" makes some really fascinating possibilities.
Ah that's a cool idea. I've been saving a collection of images from dream by wombo and nightcafe around the same theme, and think I'm going to try to render them into a single cohesive acrylic painting. (though my mechanical abilities will probably leave me disappointed)
I hope you're right but the feeling I get is that it's going to increase the _supply_ of art until it becomes meaningless. For instance why should anybody paint an astronaut on a horse anymore? It was a great idea and now it's been done.
In the near future when you look at art you won't know whether it was created directly by a human or by an AI. What will that do to its appreciation?
Why is a human-painting of an astronaut on a horse valuable now?
I view this as detangling composition from the mechanical skills of art production.
Perhaps this will make a new type of art job - AI wrangler - whose job is collaborating with AI to get a good composition, then the composition can be handed over to someone with the mechanical skills to render it in the real world, if humans making the brush strokes is important.
I'd imagine that if we gave these tools to a skilled painter to help in their compositional brainstorming process, they'd find many good ideas they'd like to incorporate into their works in a short period of time. But, I'm not a skilled painter, so could be wrong.
That's a good point. A lot of what I read in sci-fi these days is the job of AI wrangler but the idea of using it as a rapid development platform for cool ideas is also super cool.
> don't even think artists are going to be meaningfully hurt
It might not reduce the number of artists, but it will surely change their composition (pun proudly intended). Only those flexible enough to adapt to the new landscape will be able to support themselves in the new AI art economy.
Good point. Custom art used to be the realm of only royalty, and later only the rich, but soon anybody will be able to afford the fifty cents or whatever it is of computing power that it takes to execute their vague artistic instructions.
Totally, Etsy has led an explosion of put you/your kid/your pet into a work of art type things (many artists create reusable templates, where the custom component can be plugged in, so help with scale). I foresee this trend accelerating as the tooling makes cost of producing this sort of thing cheaper.