Hacker News
I recreated famous album covers with DALL-E (lucytalksdata.com)
230 points by lucytalksdata on Aug 20, 2022 | 128 comments



DALL-E is still highly probabilistic in its judgement. For instance, in this article, it keeps putting the fire in the background, on something that is likely to be on fire, rather than setting the person alight.

I have a similar experience. In my own experiment, I can't get DALL-E to turn off the street lamp at a bus stop in the darkness. I've tried "no light", "broken street lamp", etc.; no use. With any mention of "street lamp", the scene will include a working street lamp.

It's just more probable that a scene with a lamp in the darkness must have that lamp providing light, and this is something that DALL-E will not break out of.


In my experience, DALL-E avoids violent or harmful settings, e.g. setting a person on fire. Same with drowning; it seems impossible, or at least hard, to generate.


Violent images likely have not been part of the training data for obvious reasons.


And this likely also explains why the OP had a hard time generating pictures of naked babies.


I asked for "The Walking Dead directed by x" and got a content violation, I guess because my prompt included "dead".


I got a content violation warning when I tried to get it to generate a facade… too close to “facial” I guess.


It didn't get "swimming to a dollar" right either. I think it doesn't "understand" spatial descriptions unless it happens to find an image matching the description.


It definitely struggles with relationships between objects, often confusing them instead (e.g. printing the baby on the dollar bill instead of swimming to it in this case)


I gave a prompt about a kid reading the Harry Potter book in the bed. It generated a kid wearing Harry Potter glasses reading a book. Pretty close, but also quite different from what I meant.


Hmm, I haven't tried DALL-E yet, but the Midjourney docs mention that negative prompting doesn't tend to work well. See here: https://midjourney.gitbook.io/docs/resource-links/guide-to-p...

They have a solution for that, which is their --no argument. https://midjourney.gitbook.io/docs/imagine-parameters#prompt...
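
If I'm reading those docs right, usage would look something like this (my own illustrative example of the --no syntax from the linked guide; I haven't verified it myself):

    /imagine prompt: bus stop on a dark street at night --no lamp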

I haven’t checked if DALL-E has that option too.

Otherwise, you could try other variations like:

street light, light pole, lamp pole, lamppost, street lamp, light standard, or lamp standard

I copied that from Wikipedia :)

Best of luck!


I think DALL-E might be programmed to not depict violence, which is why it doesn't "like" rendering humans on fire.


How long until an unrestricted DALL-E-level model emerges? Seems only a matter of time.


Stable Diffusion is what you're looking for. It should be released to the public quite soon actually.


Yep, here's an interview with the founder of Stability AI, which includes their intent to release Stable Diffusion and be more open than OpenAI:

https://m.youtube.com/watch?v=YQ2QtKcK2dA


And based on the creator's tweets, it's likely to be released tomorrow: https://twitter.com/EMostaque/status/1561314736742060033


The person is on fire. On top of the fire in the image.


I agree, the picture of the man is literally on top of the picture of the fire. They might have said one man is engulfed in flames, immolated, or alight, or tried other phrasings.


Clearly it will have trouble making Rene Magritte paintings along those lines, too :)


Is this "confirmed fixed"/different in DALL-E 2?


Worth noting that DALL-E automatically "rejects attempts to create the likeness of any public figures, including celebrities". So, you wouldn't be able to get an image that included the 4 Liverpudlians. It does allow you to create fake faces. Might be fun to try to recreate Miles Davis' Tutu, Aladdin Sane, or Piano Man.


It will yell at you and threaten your account with termination if you try to create anything based on a living person's face, from what I can tell.


Yep. I once tried to create a cartoon dinosaur with hair in the “style” of an ex President (yellow and combed forward), and was warned with a potential ban.


My experience was that if you name a celebrity (and the request isn't blocked) it quite often generated something that has the same general vibe of the target, while also looking entirely unlike them.

It reminds me of how TV shows often have a president that resembles the current president in superficial ways, while being distinct enough that they won't get sued.

I'd be interested to know why this happens.


There’s a high standard of harm for libel of public figures — knowingly false plus actual malice, iirc. Seems more likely it’s just bad at these details. How would you test it, since it’s not going to have sufficient or reliable source data for ordinary people?

ETA: OK, well, based on the following comments, it has a prohibition on living people, but you can’t libel the dead. So it either is bad at faces or it has a prohibition there too. The article would have mentioned if DALL-E said it wouldn’t render Lennon or Harrison. QED, bad at faces?


Man, after seeing Stable Diffusion's output, DALL-E's looks just janky. Like watching a propeller plane after seeing a jet.

Crazy how fast the tech is moving.


DALL-E is capable of very high quality photorealistic images with the right prompts.

Here's one I made: https://imgur.com/yAzKkHb

“High detail, macro portrait photo, a handsome Australian man with a strong jaw line, blue eyes and brown hair, smiles at the camera, set in an outdoor pub at golden hour, shot using a ZEISS Supreme Prime lens”


This particular prompt form, including the lens type, is from this reddit thread:

https://reddit.com/r/dalle2/comments/wsi97q/_/ikyjqhh/?conte...

> High detail, macro portrait photo, a [physical descriptor, regional identity, etc] man/woman with [eye color] eyes and [hair color] hair, smiles at the camera, set in a [field/dimly lit room/whatever] at golden hour, shot using a ZEISS Supreme Prime lens

The suggestions in that thread are quite effective, particularly the notion of reducing synthetic ‘beauty’ for a more human appearance.


> shot using a ZEISS Supreme Prime lens

that seems too specific. You wouldn't even request that in real life, unless you were trying to be ironically pretentious.


DALL-E is trained on labelled data; if a photographer uses these terms to describe their work, using them in the prompts gently herds the image generation towards that kind of media.


"shot using a ZEISS lens, by a hipster with an ironic mustache"


Supreme Primes are modern, $20,000+ PL mount cine lenses, so probably not popular even among trust fund hipsters for shooting stills.

To be fair, if I had the funds, I'd certainly consider it, though I consider myself slightly mental in my choice of photography equipment.


After getting access to the beta, combined with all these HN posts, I've determined DALL-E 2 is neat but nowhere near as great as the initial samples made me believe.


It is actually incredibly capable but if you're looking for photorealistic images of people, it needs very specific directions. I learned a lot from this person creating AI portrait photography: https://old.reddit.com/r/dalle2/comments/wsi97q/some_of_my_p...


An upvote for whoever can give me a prompt to generate an image of someone who's been massaged so much their body has been flattened, as if they were made of dough or jelly or something.

I spent ages on this earlier getting nowhere. I'm starting to think DALL-E is better if you don't really know what you want and you're just fishing for ideas.


Ok you owe me $3! This is a really hard prompt, and I only got close-ish with inpainting. I got the base figure with "massaged relaxed flattened person, flat, flat, flat, flat, claymation", then finally got it to add a not-too-terrifying face with "photograph of smiling white woman laying on the ground, promotional photography". The final tweaks to erase some artifacts (it really didn't want to believe the figure on the left was the referenced woman) were "photograph of a wooden floor with a white mat and small plants, overhead shot".

DALLE is hard! Curious to see if I can be beat.

https://imgur.com/a/tuyGjxp


Ha ha that's much closer than I got, well done! Yeah it's difficult. I didn't think of flat, flat, flat.


What are the results when you describe "a person without bones receiving a massage"?


> an image of someone who's been massaged so much their body has been flattened, as if they were made of dough or jelly

do you want a realistic looking one? 3d rendered? what do you have in mind exactly?


Anything really. I tried cartoons, digital art, watercolours, pixar style. None worked.


also, what is the PC's budget?


I didn't know that I needed this.


Has anyone given DALL-E a prompt to design a company website and included "make it pop!"?

Maybe the AI will finally get what designers have always complained about with annoying clients.


Prompt:

> Create a website design for a company that sells propane and propane accessories. The name of the company is Strickland Propane, a local propane dealership. Make it pop.

Results:

* https://i.imgur.com/Jv7NJEN.png

* https://i.imgur.com/5Uiyg1R.png

* https://i.imgur.com/LL1DC11.png

* https://i.imgur.com/buv5BvS.png

So there you have it :p


Another one.

Prompt:

> Create a website design for ACME Corporation, a company which produces a wide array of products that are dangerous, unreliable or preposterous. Include customer quotes from a dissatisfied Wile E. Coyote prominently on the page. Make it pop.

Results:

* https://i.imgur.com/WK3QBj9.png

* https://i.imgur.com/Bghgzjt.png

* https://i.imgur.com/XLyYx76.png

* https://i.imgur.com/QTSyFTc.png


More still.

Prompt:

> A website design concept for Apple Inc, in Neumorphism design style, showcasing the next generation iPhone Pro Max. Make it pop.

Results:

* https://i.imgur.com/lU4Mf0X.png

* https://i.imgur.com/ROMJEjT.png

* https://i.imgur.com/U4sDheM.png

* https://i.imgur.com/3bRXCs1.png

Tbh this is probably the worst one yet.. By that I mean, the results for this prompt are the least reflective of the text in the prompt. Sure it got the iPhones, but it doesn't really feel like a website design, and it doesn't feel like Neumorphism design style.


More.

Prompt:

> Website design for Weyland-Yutani Corporation. The Company was founded in 2099 by the merger of Weyland Corp and Yutani Corporation. Weyland-Yutani is primarily a technology supplier, manufacturing synthetics, starships and computers for a wide range of industrial and commercial clients, making them a household name. The website design for The Company is mobile first. Make it pop.

Results:

* https://i.imgur.com/JyhYK5b.png

* https://i.imgur.com/J5aPXCH.png

* https://i.imgur.com/ksqrW09.png

* https://i.imgur.com/uXBcGa5.png


Thanks a lot!

Hopefully I am not asking too much, but can you run the same prompts without the “make it pop”? So we can learn what “pop” is.


DALL-E 2 will give varying results even for the same prompt. If we want to learn what "pop" is, we would have to ask it very many times.

But I will give it a try.

First I do three more runs where I give it the same prompt that I gave it initially:

> Create a website design for a company that sells propane and propane accessories. The name of the company is Strickland Propane, a local propane dealership. Make it pop.

Results.

Run 1:

* https://i.imgur.com/qqOJHXj.png

* https://i.imgur.com/fao6NMs.png

* https://i.imgur.com/yH2Lned.png

* https://i.imgur.com/SiH3hmU.png

Run 2:

* https://i.imgur.com/1rppjpR.png

* https://i.imgur.com/6KiITZn.png

* https://i.imgur.com/uQTei8M.png

* https://i.imgur.com/ITacN1P.png

Run 3:

* https://i.imgur.com/KAm6Bc9.png

* https://i.imgur.com/LBBdotk.png

* https://i.imgur.com/ZPD32oV.png

* https://i.imgur.com/UkdSiZM.png

And then I do three runs where I give it the same prompt, but this time without "make it pop":

> Create a website design for a company that sells propane and propane accessories. The name of the company is Strickland Propane, a local propane dealership.

Results.

Run 1:

* https://i.imgur.com/MYIGCcu.png

* https://i.imgur.com/Iw9JjVf.png

* https://i.imgur.com/aRaD4ew.png

* https://i.imgur.com/B7lOV3u.png

Run 2:

* https://i.imgur.com/9J2T4Mq.png

* https://i.imgur.com/058wHEH.png

* https://i.imgur.com/KVdhYVL.png

* https://i.imgur.com/sT2U3DM.png

Run 3:

* https://i.imgur.com/ub1EpvY.png

* https://i.imgur.com/6TkAI1U.png

* https://i.imgur.com/XnWGaUm.png

* https://i.imgur.com/IL03niN.png

Tbh I don't think we can draw much of a conclusion from this..


I agree, DALL-E doesn't seem to understand what "pop" is either. Thanks for the effort in indulging me, though! I appreciate it!


STONIA PRONGAND! I would not buy a propane container which looked like that.


There are some quite intriguing answers to your query. We now have a thing that generates "art" in response to an input.

I suggest learning how to use a few graphics packages instead when you have a need. E.g., this evening my wife wanted to print a birthday card for a dog that she is boarding. I fired up Inkscape and discovered a bug wrt handling an imported photo, scaling it and cropping it with a mask. lol! I was able to export it to a .png and print from something else.

Anyway, I was a little unfortunate and IT is still rather shaky. However, discussions about DALL-E are way more interesting than actually using it yourself. You see some seriously intelligent solutions to getting a desired image, which is surely a form of programming. DALL-E does not even have a real natural-language pre-processor because it is hamstrung to dump pejorative terms etc. This means that I can draw the classic, dreadful plan of the cargo hold of a slaver ship from the 17/18C and its cargo, but I doubt DALL-E can.

The Q&A about this thing is way more complex and interesting than its actual output! The A is right; the I isn't. DALL-E is quite interesting, but it really isn't intelligent. The intelligent bit is getting the input right for the desired output. Perhaps another model could be developed for that.


This isn't really how DALL-E works AFAIK, but it's a very fun idea nonetheless. Here's a quick, more simplified experiment than the great work above: "website design mockup.", with and without "Make it pop!"

https://imgur.com/a/fy2Uq4x


I love seeing people experiment with this technology. You can feel we’re on the cusp of something great - whatever it is, we’re just not quite there yet.


> The question is, when the music blows up and the artwork becomes a signature, like the Rolling Stones' Tongue & Lips, who will own the copyright?

That’s what trademarks are for.


If they had bothered to read the license agreement they would know whoever generates the art owns the copyright. Since it's a pay-for action, the copyright is owned by the paying user.

So really, they generated these and never bothered to research their own question.


I have, like the article author presumably also does, a profound doubt as to whether generated works of this kind can be free of any copyright as long as the tool used is itself created using myriads of copyrighted works (as training data). I certainly do not trust the claims of the tool creators; they have all the incentive to ignore any copyright problems in order to get a tool which is usable.

And, as the article states:

> But seriously, how creative and original can you be with something that is trained on the works of millions of other creators?

> To me, it is unclear whether you can actually call these works your 'own' at all, because there's always someone else's touch on it.

> […] users of DALL-E will also never be sure whether they are generating something that is 'theirs' or just a knockoff of someone else's work.


OK, so I give you license to use this URL I just generated to generate your own stuff.

It's a pay-for action (send me a penny if you find anything worthwhile), and the copyright is owned by the paying user:

https://images.google.com/


The URL has not been "generated" in the same sense; you are retrieving an existing string. The images from Google are not "generated" in the same sense either; they are indexed by Google's search algorithm.

The generative models, specifically DALL-E here, compute images that are unique at the pixel level. You might say these models index a subset of an extremely high-dimensional space (pixel count × RGB color values) using a query. Traditional search engines, by contrast, build an index of existing images and then use a search query to find the best matches in a much more discrete space.
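
For a rough sense of scale (the numbers are my own back-of-the-envelope illustration): a 1024 × 1024 RGB image with 8-bit channels has 1024 × 1024 × 3 = 3,145,728 dimensions, each taking one of 256 values, so that space holds on the order of 256^3145728 possible images. Any search index covers only a vanishingly small, discrete subset of it.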


If I made a website where you could type text and get images, and I said that your held the copyright to any images you got, could you safely act on that assumption? What if my web site was simply a proxy for Google image search?


How do you know that the album covers are not part of the corpus of images that DALL-E was trained on in the first place?


Interestingly related: I just used AI image generation to create my EP cover. First I tried running lucidrains' DALL-E 2 PyTorch implementation with the prompt "death by Tetris EP album cover 2022"; unfortunately I am using a Mac Pro, so the GPU was not able to work. Then I tried the Imagen PyTorch implementation with the same prompt. This time it was working on the CPU, but unfortunately two days in we had a power outage, so I had something but nothing complete. So I fed the generated image into the Google Deep Dream generator and got my album cover!

https://willsimpson.hearnow.com/


There are a lot of articles focusing on how closely DALL-E matches some prompt, but I wonder if this is a suboptimal way to explore the medium.

What if you can get a lot more out of it by embracing the unexpected responses? Can it be a tool for exploring lateral thinking? You provide a prompt, the computer responds with images that are a prompt to the human. A baby swimming next to a dollar bill outputs a distorted person's face inside a dollar bill with some baby features; that could be the start of a rabbit hole of prompts and images where you end up with something completely different from your initial expectations.


It's interesting that the prompts that would do badly in a Google image search also seem to be the ones that make poor prompts. Basically, it seems that rather than describing a scene, you have to try to give an analogy for some image(s) that it might have in its training set - which is why, I believe, "banana in the style of Andy Warhol" produces a much higher quality result than "Outline of prism on a black background in the middle of scene splits a beam of light coming from the left side into rainbow on the right side".


Although AI artists will destroy a lot of jobs, they will also create demand for new jobs for people who specialize in “paint-overs”: taking a high-concept output created by an AI artist and touching it up to perfection.

Or perhaps even beyond just a paint over, and into the realm of recreating an entire AI artwork but with a human touch to get details just right.

Looking forward to it.


I am already doing exactly that, and am getting paid for it.

I am a logo artist and I sell pre-made logo designs. Before the current AI services I had to come up with visual ideas by myself, like a caveman. Now I use the AI to generate a bunch of sketches and blurry ideas, and then use my graphic design experience to polish them up to a usable level. Here's how it looks. https://imgur.com/a/DKTsKdC

I am absolutely sure that a lot of people are doing the same right now, just keeping quiet about it.


Thank you for that perspective. The linked work is clearly the work of a skilled professional.

I am intrigued by the use of AI as a form of creativity assist. As someone without any talent for this, the left pictures are useless to me, as I don't know how to turn them into something like the pictures on the right. The point of a sketch is to show it to a customer, but if you showed these sketches to me, I wouldn't know which one would turn out great and which one wouldn't.

Given that, do you feel that the generated sketches are useful as a base sketch? I mean, you could probably have used any of the existing NFL teams' logos as inspiration, instead of letting the software remix them for you.


Using existing logos for inspiration (like the NFL ones) is a tightrope, because when you get inspired too much you get into legal problems. No one wants a plagiarised or copied logo. Every small borrowed idea, color, composition or detail is a potential problem. And if your design shares too much with them, you have an unusable logo. No such problem with AI logos. You can be as inspired by them as you want, up to a straight copy if you find it good enough.


Just tested it after looking through your album, and DALL-E seems pretty good for getting a logo concept generated.

This is what I got for the prompt "honey badger logo for an NFL sports team"

https://imgur.com/a/i2nnii2

Dunno what's going on with that last guy though, perhaps he's had one too many concussions on the field...


Yes, DALL-E 2 is definitely much better at it than other current AI services. But most people still don't have access to it. Like me, for example. I've been on the waitlist from the start, got a cool portfolio, and still didn't get in. Maybe I'm just unlucky, or maybe the problem is that I'm in Ukraine. For some reason the OpenAI GPT-3 Playground is restricted in our country, so I expect that other OpenAI products might be closed to me for the same reason.

I've seen many examples of DALL-E 2 logos similar to yours. It seems to have reached at least the level of "good enough to be usable, but with quality on the cheaper side". Which is super impressive and already puts a lot of designers out of work. I sell cheap stock logos on the side, so I would feel that loss of income. But right at this moment there's definitely still a quality ceiling for it, and the AI hasn't put me out of the job completely. If you want a good and expensive-looking logo, you still need a professional human.

I won't be surprised when that changes too and the AIs get to my level. They are already so much closer than I could ever imagine. But at this moment they aren't there.


Here's another spin on this:

Imagine you're a software developer. In the near future, your manager wants a feature implemented in your company's app. He throws together a short mail with requirements and sends it to the prompt engineering department.

The prompt engineers fix a few typos, clean up the grammar and pepper a few secret-sauce keywords around like "in the style of Firefox", "in the style of KDE". This gets thrown into Microsoft Copilot 3.0, which barfs out a bunch of code.

Copilot's code has inconsistent indentation, three different method naming conventions and some variables named in a foreign language. It runs, but crashes if you tap on the lower left corner of the screen and allows you to drag the order quantity below 0. But that's ok, it's why we employ software engineers like you in our company. You will use your years of coding experience to touch up AI code to perfection. Better get the details just right until Monday all-hands!

Still looking forward to it?


Honestly, kind of yeah.

Something I've come to realise is that I'm much better at fixing other people's code than coding from scratch.

I could definitely work with that kind of tool.


That sounds better than my current job where I try to clean up other people's shitty code with atrocious dynamic typing, until the type checker is happy with it.


sounds to me like I still get paid handsomely


> taking a high concept output created by AI artists and touching it up to perfection.

When you put it like that, it sounds like a nightmarish upside-down world. It's not AI that's the tool for enhancing human creativity; it's humans that are the AI's tool, cleaning up the edge cases the AI artist can't handle (yet). It destroys creative jobs that give joy to people and creates assembly-line jobs for them to slog in.


The humans are also the people that come up with the concept in the first place...

Also sounds like an upside down use of the tool, since AI generated art really isn't that great at composition once you've got over the magic of it being able to respond to the prompt at all (but is much better at texture and filling in boring details), and the current state of the art AI tools can produce images which conform to a human guideline sketch...


The AI is enabling the creativity of laypeople. That's still an enhancement, even if they don't have the technique to make a polished product


The purpose of jobs has never been to give people joy, but to extract value from their labor as quickly and efficiently as possible. Getting any kind of emotional satisfaction from one's work is a privilege which arguably points to an inefficiency in the market, as that energy is wasted which could better be put to productivity.

Artists, programmers and everyone else will have to find their joy somewhere other than selling themselves to a corporation, once AI-driven markets optimize away any room for "joy" and the like, and that's going to be one of the few good things about automation. The sooner we break people from the Puritan delusion that work defines a person's meaning and the value of their expression, the sooner we can once again decouple culture from the machinery of capitalism.


The new copyright washing industry is nearly upon us.


This is the most underrated/prescient comment I've seen on HN. Once prompt engineering becomes a mature field, this is going to be a serious issue.

Finally, the crossover of creative writing x cs... for graphic design? I can't wait to watch the lawyers recoil. ::Prepares Popcorn::


As a professional artist, this sounds like hell.


DALL-E works really well if you are specific enough. When you don't get the intended result, it helps to identify the element which wasn't generated properly and then improve your description of it.

"Two men, one of whom is on fire, shaking hands on an industrial lot." can be rewritten as, "Two men, shaking hands, standing on an industrial lot. Person on the right is on fire. Camera is 30 metres away."

You can go into more specifics of the framing and the angle from which you want the picture to be taken. By default, DALL-E will give you the most realistic generations for your prompts unless you mention "digital art" or a particular art style. I have gotten the best results when generating art instead of photos.


I haven't gotten to try it for myself, but I've read a few of these blogs that take you through generating examples or even look-alikes to older art pieces.

It surprisingly reminds me a lot of when I traveled to Japan without knowing really any Japanese. I needed to communicate not only with friends who don't know much English either, but also other people (like restaurant wait staff, train station staff, etc.).

I used Google Translate often, but many times I or my friend(s) (or the other people) would need to re-write our statements a few times until the translation result clicked well enough in each other's languages to be understandable.


Is the issue with faces a deliberate choice by the devs?


In the case of celebrities yep. It can generate original photorealistic (or whatever style) faces, but they won’t let you generate the faces of real public figures AFAIK.


I've been recreating the 50 worst heavy metal album covers using AI as well; currently at 30. Recently I've found Stable Diffusion plus DALL-E inpainting to be a good combination.

https://twitter.com/P_Galbraith/status/1560469019605344256


Very cool. But it just goes to show the impact of human creativity. The conceptual aspect.


No Smell the Glove cover, this is a black day for rock and roll!


If you gave those same instructions to humans I’m sure the output would be just as varied. I’d be interested to see a comparison between dall-e and humans.


I'd love to see what it would have come up with if simply prompted for "Album cover for Nevermind by Nirvana".


I wonder, how much energy is being burned for these kinds of experiments.


probably less than the amount of energy being burned by people browsing hn.


It's going to leave all those artists without a job, you just wait!!


Went to use my invite, and OpenAI demands your PHONE NUMBER.

No excuse for it. Screw that.


Look, it's trained on these images.

It's really great and cool and all - but it's retrieving things that it was trained on.

Show me something original it did.


None of the AI generators retrieve things they were trained on; they don't work that way. Everything is original. However, our definition of "original" might vary a bit, but so it will for any work of art a human artist does as well, as they are also trained on the same images. In the end, a lawsuit and a courtroom might have to decide if, by chance, someone or some AI creates a picture used commercially that seems similar to someone else's trademark or copyright.

Most of the images I've generated using DALL-E 2 feel completely original. Just have a look at the subreddit r/dalle2 and I'm pretty sure you'll also decide they're "original works".


Here's what I got with the prompt "something original":

https://imgur.com/a/RsQ2q1d

So I guess "something original" means "everything you've seen before on Instagram"


It's hard when it was trained on pretty much everything. That's the same problem as with GPT-3. In my mind it's still brute-forcing a solution, but instead of endless computation it's endless examples.


Is this "bruteforcing" really different from what our brains do? We see thousands, millions of little things (examples) every day. Then we combine what we've seen into something new. Probably the only difference is that DALLE's training was done once while our brains are trained every day for 80+ years.


I would say yes, it is. For humans there is plenty in the world that is completely novel, and we can (and have to) reason from abstractions or first principles. Being exposed to millions of everyday things doesn't give us intrinsic knowledge of writing fiction, algebra, artistic expression etc.; instead, we have to apply abstract knowledge and reason. This may well be possible for an AI system, but I haven't seen GPT-3 or DALL-E do this.

It's hard to test this on DALL-E or GPT-3, as the internet is in effect a summation of all known knowledge. True unbounded original thought is hard, and given they have seen everything before, it's impossible to know if an output is original or just something seen previously. It would be interesting, in a decade's time, to see how DALL-E or GPT-3 deals with novel ideas that it was never exposed to.


Is "I did something with DALL-E" an automatic top Hacker News post? I've seen like 20 posts like that in the past 2 weeks.


DALL-E still seems very useless. Reminds me of the hype of Cardano.


I wonder how long the novelty of DALL-E will persist. HN seems to upvote anything titled "I did X with DALL-E". This is a fun post, but it's not that interesting or surprising. Still worth a look, don't get me wrong, but personally I didn't learn anything new from it. (E.g., recreating the famous Pink Floyd cover with "Outline of prism on a black background in the middle of scene splits a beam of light coming from the left side into rainbow on the right side" unsurprisingly worked somewhat well.)


"DALL-E" in titles is about to be replaced with "Stable Diffusion." The beta website is already live but the interesting part will be specialized fine-tuned models based on public weights. There should be more technical experimentation since Stable Diffusion weights are only a few GBs and inference can be run on any recent GPU. There might also be more controversies because it can create more uncensored images.

Somebody posted a nice comparison between DALL-E 2, Midjourney, and Stable Diffusion: https://twitter.com/fabianstelzer/status/1561019187451011074.


I dunno, I don't see a lot of examples on that page where Stable Diffusion outperformed DALL-E 2.


I'm not saying that it does, just that like you definitely hear more about Blender than about Maya on Hacker News, you will hear more about an open source model than a commercial one.


I actually thought there were some interesting things revealed here. Particularly interesting to me were the Nevermind and Abbey Road results.

Nevermind because it showed this weakness in the model in understanding what a dollar bill is. The most novel result being the image where the baby's visage appears inside the dollar.

Regarding Abbey Road, I found it interesting that the model's concept of a public person spans their lifetime, as evidenced by the images where contemporary images of them are used. Also interesting to me is the model's weakness in understanding specific people.

Then again though, I haven't been clicking on every DALL-E post so maybe this is old news.


For me, what was interesting to see was that it didn't manage very well with the Abbey Road question.

I had imagined that some parts of the training data would consist of the actual image, and that it would find a good match for it somewhere deep in its artificial consciousness.


It has no concept of The Beatles or Abbey Road or zebra crossings or perspective or creative composition in photography, so effectively it just mashes up some results from an image search.

If it created something very close to the original it would be overfitting. So it flails around rather randomly in photos-of-Beatles+Zebra+Abbey Road space. The results are novel, but they're also monstrous, distorted, and unartistic.


> unartistic

Says who? Maybe you don’t like the style, maybe it is derivative, but it is definitely artistic, as you allude to by saying it is novel. Monstrous and distorted can be a style.

“mathematical art, 1924, litography, abstract generative art” generates https://pbs.twimg.com/media/FanUkREXEAEH6su.jpg — Can you pick the influences? I can’t.

"low poly game asset, Cthulhu monster, 2000 video game, isometric view" generates https://pbs.twimg.com/media/FanTf3IXgAETece.jpg

I thought https://pbs.twimg.com/media/FanNq6sXkAENyU5.jpg was really interesting, because the LEGO logo is not a 100% faithful reproduction even though the LEGO logo is so ubiquitous - showing that the images are generated.

From a longer tweet thread by @fabianstelzer comparing DALL-E 2 vs Midjourney vs StableDiffusion: https://threadreaderapp.com/thread/1561019187451011074.html


Maybe it would have done better if the prompt had been "Four men - George Harrison, Paul McCartney, Ringo Starr and John Lennon - striding across a zebra crossing situated on Abbey Road, outside EMI studios in London."


It will probably get boring around the same time HNers stop mistaking "hmm, this isn't that interesting, guys" for notable commentary that needs to be shared.


Well then it will never get boring, will it... Fair enough, it's just been striking to see how many of these sort of posts have been popping up. Like I said, it's still fun and worth a look.

Perhaps as a blogger I am extra salty when relatively low effort stuff gets upvoted over things that take a lot of work to write (in some cases, those being things I have written). But hey, that's life.


Low-effort stuff can be (and often is) better and/or more entertaining than high-effort stuff.


Can't argue with that!


DALL-E generate us some pics of unhappy HN curmudgeons


Here ya go:

https://imgur.com/a/kgQtUMf

I don’t know why it’s got the over-18 warning. It’s 100% SFW.


The top left one is clearly an ape in a hoodie, right on point.


To me the most interesting thing about the article was actually just how bad the results are. It's interesting that you picked the Dark Side of the Moon one, as to me that seemed by far the worst one: none of the pictures really resembled the simple geometry of the original. While I understand why recreating the Nevermind cover was difficult, it was surprising that recreating such a simple geometric pattern failed so spectacularly.

The only cover that really worked from my point of view was the Velvet Underground one, and perhaps the Rolling Stones one. Abbey Road came closer than I thought it would, but was pretty bad ultimately, and the other three really had nothing usable.


Just like images after text, pretty sure once the novelty of images ends we'll move to animations, music, movies, computer games, and so on.


For the past few years, I've considered failed Hollywood adaptations to be a non-issue, because as long as industrial civilization exists, media companies are going to squeeze what they can out of IPs. So while Game of Thrones might have ended quite poorly, I figure we're only a few decades away from a different rights-holder giving it another go.

Thus, I look forward to the AI-generated versions of famous works that deepfake the original cast into speaking (hopefully) better-written dialogue. Imagine when this technology is widespread, fanfic authors rendering their interpretation of works with the descendants of DALL-E. Everyone gets the dream adaptations and sequels and finales they want.


The trouble with this, for me, is: wouldn't such a world necessarily mean the end of new actors (not to mention new worlds and stories)? After all, why hire unknowns when you can stack the cast with the A-listers of all time?

And if everyone stacks the cast thusly, there are no opportunities for new actors to work with veterans and be mentored, or simply to work at all, since every bit of work is what adds up to make a journeyperson.


Unless copyright changes a lot, the A-listers are going to expect a lot of money for a film with their name on the poster and their carefully curated likeness throughout, whether they have to turn up to shoot it or not.


Totally, but their estates won't be as precious (as we're seeing with holograms already). And once a certain critical mass is built up, it's built up, and it'd be foolish not to use it.


Still gotta populate the training data


I get that, but I find it hard to see why one would look for NEW training data from actors who will never have the opportunity to become as seasoned as their forebears, when you have 100+ years of filmed historical data (among other formats) from which to pull your players?


Generative art and by extension DALL-E has a very fast attention decay imo. We can't avoid noticing patterns and getting bored by them. This makes art and music fun because the things that stick and stay interesting JUST DO for some unknown reason.


I love this kind of bad humor of HN.


Then you’re gonna loooove these computer-generated “jokes”…

http://joking.abdn.ac.uk/jokebook.shtml



