A few of my favorite results: https://m.imgur.com/a/bPETAG6 https://m.imgur.com/r9YdYRE https://youtu.be/tp2IuT-cgHc
Oh man, this video is crazy... awesome... crazy? Not sure.
The images are creepy, creepy as hell.
It just needs to be trained
Compare 2:33 in the above fictional video to this real one:
These are previews of future wars, much like the wars in the decades before WW1 previewed the utter slaughter that was to come thanks to the new technological era of mechanized warfare.
Much more accessible to regular people.
Ability to be anonymous and hide in a crowd of drones.
Really? Their name says OpenAI.
Anyway, how do we know their examples were not cherry-picked? Do they have an online demo?
They have everything to lose by lying. If they say that these examples are not cherry-picked, then we have no reason (a priori) to doubt them.
On a side note, the fact that you could doubt the results are real is telling: each of their compute-heavy experiments strains our belief, and each time the results hold up, their reputation is further reinforced.
The singularity approaches.
This bothers me. More than it should.
ImageMagick used to do the same thing - but they judiciously renamed their `convert` executable to `magick`. Still not perfect, but an improvement.
If your package introduces one command-line executable, it should always be called the same as your package.
# Who owns a binary?
pacman -Qo /bin/ls
# What files come with a package?
pacman -Ql coreutils | grep ......
ImageMagick is also far more than the command-line utility it provides to interface with its library. Say it had been written by a third party but did the same thing, using the ImageMagick library as a dependency; would it then be fine for it to have a different name?
I've written a few follow-ups to this, with some public notebooks as well that produce qualitatively different results. If anyone is interested, it should be pretty easy to find the BigSleep method, which steers BigGAN in a very similar way to this, as well as the Aleph notebooks, which use the DALL-E decoder or Taming Transformers VQGAN to generate images with CLIP, depending on the version of the notebook.
Aleph2Image notebook https://colab.research.google.com/drive/1Q-TbYvASMPRMXCOQjkx...
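If anyone wants the gist of the steering itself, here is a minimal sketch of the idea, assuming the openai/CLIP and pytorch-pretrained-BigGAN packages: optimise BigGAN's latent inputs so CLIP scores the render highly against the prompt. The real Big Sleep adds latent clamping, random crops, and other refinements.

# Rough sketch of CLIP-guided BigGAN optimisation; not the actual Big Sleep code.
import torch
import torch.nn.functional as F
import clip
from pytorch_pretrained_biggan import BigGAN

device = "cuda" if torch.cuda.is_available() else "cpu"
perceptor, _ = clip.load("ViT-B/32", device=device, jit=False)
generator = BigGAN.from_pretrained("biggan-deep-256").to(device).eval()

text = clip.tokenize(["a painting of a dog playing fetch"]).to(device)
with torch.no_grad():
    text_features = perceptor.encode_text(text)

# Optimise the noise vector and a (soft) class vector jointly.
noise = torch.randn(1, 128, device=device, requires_grad=True)
classes = torch.randn(1, 1000, device=device, requires_grad=True)
opt = torch.optim.Adam([noise, classes], lr=0.05)

for step in range(300):
    image = generator(noise, torch.softmax(classes, dim=-1), truncation=1.0)
    # CLIP wants 224x224 input; BigGAN outputs 256x256 in [-1, 1].
    image = F.interpolate((image + 1) / 2, size=224, mode="bilinear", align_corners=False)
    loss = -torch.cosine_similarity(perceptor.encode_image(image), text_features).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()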
What I mean is we have style transfer in the domain of text, and we can style transfer with images. And we can generate images from text. Can we style transfer from image to text or vice versa? Can prose be rewritten in a manner that, in some sense, adheres to aesthetic principles of impressionist painting?
Presumably there would be some kind of informational representation of text style discernible to an image generation system. And just like an artistic style can be extracted from a painting and transposed onto a photograph, perhaps an interpretation of textual style could be applied to a photograph despite them being different mediums.
What would that even look like? I don't know, but I find the possibilities fascinating.
The temptation, I think, is to make a first pass at answering this question in a frustrating, cartoonishly shallow way. And I think systems will possibly be developed that just go ahead and do it before people are culturally ready to understand it in a non-frivolous way. Everyone needs to get those reactions out of their system, I guess, but there's a more nuanced possibility here that might allow clashing of dissimilar categories in ways never previously contemplated.
When you make this please name it SynesthesAI
Each of these links has about 9 images and the prompts that made them. Sometimes the image does not look like an animal right off the bat; then it seems less like you asked for something and more like the network had something it wanted to say.
That Syd Mead style ... mwa
Were you able to complete the 1050 iterations, and how long did it take?
- the model usually locks in within 200-300 iterations, so if you don’t like the result by then, retry
- in fact, you can tell if the model is off to a good start within 25-50 iterations and I encourage you to cherry-pick runs early and often; don’t be afraid to restart
- time to render depends on which GPU you get from colab, but I usually run the renders for 10 minutes a pop. About 1-2 minutes if I run them on a 3090 locally
- the prompt plays a big role in the quality of the result; “A painting of a dog playing fetch” will usually turn out better than “dog playing fetch”
- lucidrains/big-sleep produces better results generally than lucidrains/deep-daze (this is my subjective preference)
- the colabs linked to from the big-sleep GitHub repo produce poorer results than running them as a python package locally (this one might honestly be placebo); see the sketch at the end of this comment
However, the "A painting of..." phrasing can get taken very literally, in that you might get a picture that features a frame around the painting.
Another thing that comes to mind as a corollary is that the AI seems to like being constrained in its outputs. So adding something like “in the style of Monet” to the end of the prompt will return much more coherent results.
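For reference, running it locally as a Python package looks roughly like this, per the big-sleep README at the time of writing (argument names may differ between versions):

# Minimal big-sleep usage, adapted from its README.
from big_sleep import Imagine

dream = Imagine(
    text = "a painting of a dog playing fetch",  # descriptive prompts fare better, per above
    lr = 5e-2,
    save_every = 25,       # write out an intermediate image every 25 iterations
    save_progress = True,  # keep the intermediates, handy for early cherry-picking
)
dream()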
The Deep-Daze author's most popular T2I mashup: https://github.com/lucidrains/big-sleep
One of the models is lucidrains’ implementation of the excellent Big Sleep model by Ryan Murdock. The other models are mostly based on the work of Federico Galatolo.
The queue is temporarily paused as I’m upgrading the hardware to a better GPU, but I encourage you to browse the existing renders or submit your own for when the server is back online.
Honestly, I probably wouldn't; I don't want my vague mental image of what a Grue looks like to be pinned down by some random generated picture.
.club was on sale for $1.17
An interesting thing about CLIP is that when it doesn't know what something looks like, it instead generates pictures with the search text in them. That's why it confuses "an iPod" with "a piece of paper with iPod written on it".
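The classification side of that effect is easy to probe with the openai/CLIP package; the filename here is a hypothetical stand-in for the famous apple-with-an-iPod-label photo:

# Score one image against both captions with openai/CLIP.
# "apple_with_ipod_label.jpg" is a hypothetical stand-in image.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("apple_with_ipod_label.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["an iPod", "a piece of paper with iPod written on it"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    print(logits_per_image.softmax(dim=-1))  # relative match for each caption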
The results were quite something - https://m.imgur.com/tfWLsSR
And here I was hoping to generate extensive phallic imagery on this most auspicious of nights.
This “CUDA” is Nvidia-only, if memory serves, correct?
Disclosure: I am the author of this list.
I'm familiar with SIRENs and CLIP, but it's not immediately obvious how the two are utilised here.
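Roughly: the SIREN is the image (a coordinate MLP mapping (x, y) to RGB) and CLIP supplies the loss; you optimise the SIREN's weights so its render scores highly against the prompt. A stripped-down sketch of the loop, the same pattern as steering BigGAN above but with the network's weights as the canvas (CLIP's input normalisation and deep-daze's other details omitted):

# Hypothetical sketch of the deep-daze idea: optimise a SIREN's weights under a CLIP loss.
import torch
import torch.nn as nn
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
perceptor, _ = clip.load("ViT-B/32", device=device, jit=False)

class Siren(nn.Module):
    # Coordinate MLP with sine activations: (x, y) -> (r, g, b).
    def __init__(self, hidden=256, depth=5, w0=30.0):
        super().__init__()
        dims = [2] + [hidden] * depth + [3]
        self.layers = nn.ModuleList(nn.Linear(a, b) for a, b in zip(dims[:-1], dims[1:]))
        self.w0 = w0

    def forward(self, coords):
        x = coords
        for layer in self.layers[:-1]:
            x = torch.sin(self.w0 * layer(x))
        return torch.sigmoid(self.layers[-1](x))

size = 224  # render straight at CLIP's input resolution
axis = torch.linspace(-1, 1, size)
ys, xs = torch.meshgrid(axis, axis, indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2).to(device)

siren = Siren().to(device)
opt = torch.optim.Adam(siren.parameters(), lr=1e-4)

text = clip.tokenize(["a painting of a dog playing fetch"]).to(device)
with torch.no_grad():
    text_features = perceptor.encode_text(text)

for step in range(300):
    image = siren(coords).reshape(size, size, 3).permute(2, 0, 1).unsqueeze(0)
    loss = -torch.cosine_similarity(perceptor.encode_image(image), text_features).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()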
I find that Big Sleep is generally a better model than the one linked here (Deep Daze).
I’ve generated several hundred images myself and found a few real treasures. Here are a few of my personal favourites:
“A painting of a murder in the style of Monet” 
“A photo of fellas in Paris” 
“A painting of Thanos wearing the Infinity Gauntlet in the style of Rembrandt” 
I definitely agree that in the general case the examples are underwhelming, but I believe there is a lot of potential here. Personally I’m super excited to unlock the potential of human-guided, AI-assisted creative tooling. Some Colab notebooks let you actively explore the latent space of a model to direct the results where you want them to go. As the generate-adjust feedback loop gets tighter we’re gonna see some crazy things.
In fact, you could probably train it with existing visual descriptions from movies.
If only! I just spent an hour on fresh installs of Debian 10 and Ubuntu 20.04, with Python 2.7 and 3.6 alternately, and it's not having it.
I understand it has a lot of required packages, but please, write a bloody install guide.
Does anyone have any similar resources for other forms of media generated via natural language inputs?
Definitely going to update with this!
This will then cause another wave of copycats selling generated text-to-image results as NFTs.
This is why I like HN on a Monday morning.
I wouldn’t have thought that would be possible; very interesting.
What those are, I have no idea.
Is there a way to perform a similar translation with music? For example, if you play in D Minor (the saddest of all keys), is there a way to map the key or some other musical characteristic to a word and have the images be generated with the intermediate being the primary source? Or would the approach be to map images to certain characteristics of music directly?
Even something based on Spotify's music labeling API would be super interesting!
Indeed, what I see is an album cover generator.
“A man painting a completely red image” is very much a dadaist collage. The only complaint is that the ‘man’ could be rather more recognizable as such.