I linked to this when DALL·E 3 was originally announced, but perhaps it's more appropriate now.
Last year I generated around 7,000 images using DALL·E 2 and uploaded them to https://generrated.com/
I've been re-running the same prompts using DALL·E 3, though I haven't updated the site yet (I'm planning to). So far I've created 2,000 like-for-like images using those prompts.
----------
In the meantime, here are some things I've noticed with DALL·E 3 vs. DALL·E 2:
- the quality is astounding, especially for illustrations (vs. photographs) — looking back at the DALL·E 2 images, I've constantly felt like the old images look like potatoes now that we have DALL·E 3 and Midjourney (even though at the time they seemed stunning)
- you will struggle to get an output that references a specific artist, but it will sometimes offer to make images in the general style of the artist as a compromise
- it can get quite repetitive when you ask it for concepts — if you look at the 'representation of anxiety' images on Generrated, you'll see that there's a huge variety in them, but as I've been running them with DALL·E 3 it seems to prefer certain imagery (in this case, a human heart under stress appears a lot), and the 'discovery of gravity' will include a tree and an apple 80% of the time
- some of the prompts need guidance to get the output you desire — 'iconic logo symbol' works well on DALL·E 2 to create a logo, but with DALL·E 3 will often produce a general image/painting with a logo somewhere in the image (e.g. a NASA logo on an astronaut's suit rather than a logo of an astronaut)
Those are some I can remember off the top of my head. But it's so much fun to play with!
Despite being trained not to mimic a specific artist, it's incredible how close the last anxiety image in an impressionist style (bottom-left) is to Wanderer above the Sea of Fog [0] (which is not even an impressionist painting!). It feels like it still relies very much on the underlying paintings in the training material if the prompt is not more specific (also borne out by the apple-and-tree example).
If you ask it for 4 images with an exact prompt, it'll generate 4 identical (or almost identical) images. Then if you ask it to re-run it with a different seed, it will say that it's done it but it'll still generate the same images.
Question: I’m trying to have DALL-E output renders of websites, but fail every time. I give it style, sections, tone, placement, etc… have any suggestions?
And it's fantastic, by the way: it understands instructions way better than Midjourney, and it can also do decent logos.
But I haven't cancelled Midjourney, because it has more options and is better at producing stunningly beautiful things.
The other comments are right, though: the more time passes, the more OpenAI looks like a platform. But just like Apple's platform, or Twitter's, Facebook's, Microsoft's, etc., it will adopt the features of the top apps built on it and kill them mercilessly.
It's incredible, but does seem to suffer from moral filters.
I tried to generate some DnD character art, and it generated an absolutely perfect depiction except for the wrong skin color. I tried multiple times to have it change it, but it replied every time with "there were issues generating all the images based on the provided adjustments". Asking it to change the outfit or gender was no problem though.
What's interesting to me is how AI-generated images (not just DALLE but also Midjourney and others) have a specific look and feel. It's typically characterized by high contrast and high saturation. Anyone know why that style is more likely to be the output?
It seems to me that I can get most any look and feel I want. Most of the ones other people post seem to have a very different look and feel to my own.
That said, you can sort of get a default look and feel if you just give a short prompt, and then it will tend toward the ones that are favored by RLHF. I prefer very long prompts.... as long as it will allow.
So if you do "a cool treehouse" you'll get sort of the default look. It will be very different if you say "treehouse, naturally occurring, in an old beautiful tree with branches that are low and spread widely and have lots of character and hanging moss and thick bark and curvy roots and mushrooms on a rocky outcropping from a mountainside. photograph, golden hour, sun through trees, damp from rain. Treehouse is part of tree, with fractal forms and live shaped wood and stone and stained glass and glowiness. art nouveau, gorgeous colors and fantasy design"
It's funny that the same people who complain that AI is "cheating" and uncreative are often the ones who put so little effort into getting good results. It doesn't take any arcane knowledge to get good images, but if you can use some imagination and string a lot of descriptive words together, you can get much better results.
One thing I found neat about ChatGPT's DALL·E 3 is that images are usually generated via expanded prompts like yours. "a cool treehouse" actually used "Oil painting of an old-fashioned treehouse in a serene autumn setting. The tree's golden leaves provide a canopy for the cozy wooden hut with a thatched roof. A ladder leans against the tree, and children play below, collecting fallen leaves." as the prompt.
Furthermore, I found it's very easy to tweak the general design. Find a cool image, copy its prompt, tell it to make it more X with Y and Z, and you quickly end up with a really neat prompt.
So far, as someone without a mind's eye who enjoys creating computer graphics (3D and animation mostly), it's proven to be a really neat test bed. Hallucinations are almost a feature here for me - granted, these aren't strictly hallucinations - I just find that the RNG flavor it adds over my prompt is really nice for exploring.
It's just a lack of aesthetic talent: while the images are competent, they're very tacky looking. It's not fundamental to AI, because if you know what you're doing you can push it to interesting aesthetics. Midjourney does, though, do some things behind the scenes to push everything towards that look you're talking about.
Midjourney has been collecting human preference data for about a year now - every time you generate an image there and click on the one you want to enlarge you're providing a signal as to which image a human being preferred.
So my hunch is that humans prefer high contrast and high saturation images!
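For what it's worth, that kind of click signal is typically turned into a reward model with a simple pairwise (Bradley-Terry) objective. A minimal sketch in PyTorch, purely illustrative and not a claim about what Midjourney actually runs:

    import torch
    import torch.nn.functional as F

    def preference_loss(score_chosen: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
        # Bradley-Terry: maximize P(chosen beats rejected) = sigmoid(s_chosen - s_rejected)
        return -F.logsigmoid(score_chosen - score_rejected).mean()

    # Scores would come from a reward model scoring the enlarged (chosen)
    # image against the three ignored (rejected) ones for the same prompt.
    loss = preference_loss(torch.tensor([1.8]), torch.tensor([0.4]))

A generator tuned against that learned reward will drift toward whatever the clicks favor, which could well be high contrast and saturation.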
Reminds me of New Coke, a flavor designed by having people provide very short-term feedback on different flavor combinations.
Most feedback processes for generative models are based on asking the user for immediate impressions rather than having them provide deeper critiques of art and style.
Try photography terms: people are less likely to have uploaded HDR'd content under postings that include the technical specs of what they shot with, e.g. "24mm f/1.2" or "70mm f/2.8" or "D820" and such.
50mm (optimally with "Nikon" or "Canon" or similar) or 35mm will probably get you the most "natural" looking FoV. (Adding lower f-stop numbers will get you a shallower depth of field and a lot more bokeh.)
The old adage is "f/8 and be there" so f/8 might get the most natural images if you want to specify an fstop(?)
This is where things like img2img in Stable Diffusion really came in handy, as you could simply apply an entire prompt like a Photoshop filter.
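A rough sketch of that "prompt as a filter" workflow using the diffusers img2img pipeline (the checkpoint name and strength value here are just illustrative choices):

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    # Any img2img-capable Stable Diffusion checkpoint works here.
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    init = Image.open("photo.png").convert("RGB").resize((768, 512))

    # strength controls how far the result drifts from the original photo:
    # low values act like a style filter, high values repaint the image.
    out = pipe(
        prompt="golden hour photograph, rich saturated colors, high contrast",
        image=init,
        strength=0.45,
        guidance_scale=7.5,
    ).images[0]
    out.save("filtered.png")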
There's a lot of variation between monitors so high contrast/saturation is far more likely to look the same across monitors. Digital art in general has moved towards this style.
I've got two LG 27" 4k monitors, same model number but produced several years apart, and while one monitor can easily show light grays like #EEE, it just looks white in the other.
CFG (classifier-free guidance) is a trick used to trade variety for apparent quality; applied to images, it causes high saturation and contrast. It's used because the training data set is very noisy.
The paper: https://arxiv.org/abs/2207.12598. Of course, using CFG shifts the sample distribution away from the training distribution, giving it that specific look.
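In code, the guidance step from that paper is just a linear extrapolation between the unconditional and conditional noise predictions; a minimal sketch, with w as the guidance scale:

    import torch

    def cfg(eps_uncond: torch.Tensor, eps_cond: torch.Tensor, w: float) -> torch.Tensor:
        # Classifier-free guidance: push the prediction away from the
        # unconditional estimate, toward (and past, for w > 1) the
        # prompt-conditioned one. Large w trades diversity for prompt
        # adherence and tends to exaggerate contrast and saturation.
        return eps_uncond + w * (eps_cond - eps_uncond)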
A lot of it is driven by underlying params - if you decrease style param a bunch in Midjourney, some of the high contrast & high saturation affinity goes away.
There seem to be a lot of startups that are simply a thin veneer on top of the OpenAI APIs. We're going to see more and more functionality swept up into the OpenAI offering. If I was in one of those startups I'd be getting a little bit worried.
I don't know why anyone would be surprised at this turn of events. It's been pretty obvious all along that it was only a matter of time before OpenAI started productizing, so unless your startup is serving a very specific niche that you know inside and out, it's hard to see how anyone could justify starting a business whose value proposition is "combine these two ready-made OpenAI APIs".
They also keep silently worsening their API (presumably to build a moat).
When gpt-3.5-turbo-instruct launched you could see the logprobs of words you had in the prompt; then suddenly one day you couldn't, because it's "not possible" or "not available" or something like that.
That doesn't actually exist in reality. You will always have to build on someone else's platforms.
Convenience store? You have to have land (and/or rent; dependent), you have to pay taxes to the government on that land, you have to buy product from suppliers/vendors (platform), you have to advertise on platforms (newspaper, radio, TV, online, something), you have to hire (dependent on labor, dependent on advertising for hiring). You're probably going to end up with an accountant eventually if you have any success: dependent. You need a wide variety of basic supplies to operate your store (from t-shirt bags to cleaning supplies): dependent. You need a POS system: dependent. The list is very long.
There's no scenario where you can ever not be dependent on other platforms and services, in one form or another.
I'd challenge anybody on HN to name a case where you can avoid them. Selling loose rocks out of a cave by shouting out for customers? Maybe.
Need an operating system? Now you're dependent on a platform. On Ubuntu, on Windows, something.
Need to do AI? Say hello to Nvidia, Intel or AMD (but almost always Nvidia). Now you're dependent.
Need energy? Utility companies, now you're dependent on another company. Which is no different than any other critical dependency for your business: it goes down, you go down.
You need a datacenter for self-hosting. Or you need a cloud. Or you need fiber to the home. Or you need server hardware. Or you need an office building. Or you need electricity. Or you need xyz, and on it goes for any variety of scenarios you can possibly name.
Most likely you're dependent on dozens of other platforms and services and there's absolutely nothing you can do about it.
Want to reach people? Say hello to Reddit, or TikTok, or Instagram, or Google search, or 30 other options, but most likely only a few are going to work really well for whatever you're doing. Now you're dependent.
The only actual option is to pick wisely - to the extent possible - what/who your dependencies are, and be prepared to switch to alternatives if you can.
Seeing the money some thin apps have already made, it seems fine if you have your expectations set correctly, i.e. a small app, quick to publish, get what you can while you can.
I find it crazy that the first example they show is generating images for a science project. One would think that science is concerned with actual observations from the world, not generations that might be misleading or false. It’s kind of the arch-example I usually see for why student access to these tools is problematic.
> I am doing a report on cirrus clouds for my science class. I need photorealistic images that show off how wispy they are. I am going to compare them to photos I took of puffy cumulonimbus clouds at my house yesterday.
So the images are used for comparison against the photos that the researcher has already taken.
If they were to straight up base their thesis on AI-made images, then I'd agree with you. But in this case it seems to be used as a supplement, which seems fine to me, especially when used to highlight the difference from a "real" photo.
The images generated are much more coherent and compliant with the instructions than Midjourney's.
However, Midjourney's pictures are more beautiful and artistic. Their recent upscaler is also very good. Midjourney is also much better at capturing the "style" of an artist.
Hands have definitely gotten much better in the latest version of Midjourney. Text is the one everyone struggles with, but early samples of DALL-E 3 had some promising examples.
Interesting that none of the new features (DALLE-3, Advanced Data Analysis, Browse with Bing) are usable without enabling history (and therefore, using your data for training).
Is it better than DALL-E 3 in Bing? I heard they ruined it by censoring it so much it was mostly unusable.
I've not paid for ChatGPT Plus as it just seems too expensive for my use, but I've been quite interested in getting GPT-4 access; adding DALL-E 3 to the mix makes it more worthwhile for me now.
I’d be worried if I were MidJourney now. This seems just significantly better for all sorts of things other than pure art stuff. Anything that requires text or strict following of instructions is immeasurably better in Dalle3 than in MJ. And ChatGPT can take your vague, bad prompts and turn them into pretty good instructions for the model. I actually downgraded my MJ plan since it became clear to me that I’d be using it far less now. Hopefully they can come up with a response to it, but it’s a tall order to integrate an LLM the way OpenAI has done here.
I'd be worried if I were Midjourney specifically because Discord is a terrible user interface. Trying out DALL-E 3 on a regular web interface is so much better.
I'd try out MJ again if they had a regular website. I don't even like OpenAI as a company, but I can't stand using Discord like that.
This is also my big complaint with Midjourney, and the reason I’m going to cancel my subscription after testing DALL-E 3 through ChatGPT over the last week. The discord chat slash command interface is a horrible user experience, especially on mobile.
> I'd be worried if I were Midjourney specifically because Discord is a terrible user interface.
Yeah, Discord is just nasty for that. It got a little better when I found out how to create my own "server" (what?) so my stuff didn't get lost in a sea of other messages, but it's still not good.
I'll probably continue to put up with it until the moment when OpenAI images are of equivalent quality, but not after that.
See the documentation on making your own "server" (as they call it, I don't know why... I'd call it something like a "channel"). That gives you a semi-private space where only the things you've done show up. Your images still show up in the main stream, I think, but you won't lose your stuff in a flood of other people's stuff.
Discord still kinda sucks, but not nearly as much.
Overall I prefer MidJourney's styles and feature set. But it's really, really hard to make MidJourney draw the things I want, especially when there's specific/detailed scenery I want to depict. The latter is now quite doable using DALL-E 3 even though the drawing itself may not be as good as MidJourney.
I recently generated images for a presentation. It took about 30 tries to generate 5 suitable images. Meanwhile, I burned 60 MidJourney generations and in the end none of the results were satisfactory - not because they were ugly, but because they didn't properly depict the concept I wanted.
Now, if I can import a DALL-E 3 image into MidJourney and then use Zoom Out from there, that would be wonderful.
What prompts are you using? I'm unsure exactly what you mean, but it seems capable of generating different styles pretty easily. Here are three (x4) examples: https://imgur.com/a/X0IbtIX
> Create four images made with as different styles as possible (eg plastic, metallic, organic and something else). Make the images about a PC desktop computer standing on a desk, and a screen that shows Hacker News frontpage
> Create four new images with more different styles
> Create four similar images with the style "out of this world"
I'm guessing that it's giving you that kind of image because photograph captions usually don't use keywords like "photorealistic"; it's mostly illustrations, concept art and the like that use the word to convey that they're trying to imitate a photo. So it's creating images that look like those images, rather than actual photos.
This one, to me, could be mistaken for not only an actual photo of a dog, but my own dog. (aside from the cyborg implant, which is a bit of a giveaway :) )
I didn't even bother saying "photorealistic", but I did give lighting hints:
"tricolor english shepherd future cyborg parts on head and front leg, very futuristic tech, replaces side of head and eye, black anodized aluminum, orange leds, big raised camera eye where normal eye would be, with colorful rich deep teal artificial iris, in worn future urban park with pond and stone bridge but pretty near dark with streetlight above beautiful dog and technology"
I’ll re-up here that I’ve collected all the ChatGPT system prompts together [0], including DALL•E 3 [1]. I wanted them all in one place as I work on the next release of my ChatGPT AutoExpert [2] “custom instructions”, which will include custom instructions for Voice Conversations, Vision, and yes, DALL•E.
I'm not an artist by any means but I don't have to pay for MidJourney anymore separately, everything all in ChatGPT now and I can get the same if not better results.
My wife, my children and I can now play with this and become artists (if they choose to be) without switching websites.
What a great time to be alive, this is the future.
I quickly found that this feature has rate limits: "I apologize for the inconvenience, but due to rate limits, I'm unable to generate images at this moment. Please wait for 14 minutes before generating more images. In the meantime, I'm here to help with any other questions or information you might need."
"I'm sorry for the inconvenience, but I cannot provide original photorealistic images directly. However, I can help guide you to sources where you might find such images or provide more information on cirrus clouds to support your report."
Note: it got the grandparent distribution wrong, but at least the structure is right.
ChatGPT even provided a decent prompt for the DALL-E version.
However, the DALL-E version was producing horrible cyclical graph monstrosities that in no way resembled a tree - just graphs with multiple fathers and mothers, complete nonsense.
Also, I was hoping to see pictures of people, but that was failing too.
It has some restrictions, e.g. on generating something based on works from a century back, but it is impressive.
I told it to generate images based on the song Vincent (which produced some Van Gogh-style drawings), then asked it to generate the same but in a Tintoretto style (it couldn't use some newer artists; even using the song drew objections), and then added corrections to some of the generated pictures, with impressive results.
It looks like a translation job: it asks, in a language that DALL-E speaks, for something that was more or less implied in what I said, in the way that ChatGPT understood it.
Not available via API yet, right? I've been hosting a Stable Diffusion instance for a while, and even with the latest SDXL models DALL-E 3 really blows it out of the water; I would love to try it out.
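For reference, the public images endpoint (still DALL-E 2 as of this writing) looks like the sketch below; presumably DALL-E 3 will slot into the same call, though the model parameter shown is an assumption rather than something that exists yet:

    import os
    import requests

    resp = requests.post(
        "https://api.openai.com/v1/images/generations",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "prompt": "photorealistic wispy cirrus clouds over an open prairie",
            "n": 1,
            "size": "1024x1024",
            # "model": "dall-e-3",  # assumed future parameter, not available yet
        },
        timeout=120,
    )
    print(resp.json()["data"][0]["url"])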
Were there any improvements in distancing generated images from the training set? I'd like to use AI images commercially, but I'm always afraid of some person claiming that the image looks just like their work.
Edit: I see they tried to make sure the image doesn't look like the "style of living artists", and added the option of people to opt-out their images from the training set. Progress, but is this enough? I don't think so.
My favorite part of DALL-E 3 in ChatGPT Plus is being able to conversationally ask for changes to the output. I am a big fan of SD, but this is very, very good.
I wonder when the API release will be. I have a site ready to upgrade to DALL-E 3 but I need the API! I tested out some of my prompts I've used with DALL-E 2 and they are way way way better when run with 3. So much better it's quite shocking haha. The earlier announcement said "later this Fall" but I don't see any update yet on this, anyone know?
Been using this feature for about 2 weeks now. It works very well, but straight prompting works better in my tests. I'm sure there's a way to get a direct prompt inserted through ChatGPT, though; I've just compared their prompt creation to my custom one in Bing Images.
I wish we could use popular culture with DALL-E. I wanted to troll my boss by asking something like "draw me Papa Smurf playing on his computer". Sadly it's overly cautious and refuses to touch anything copyrighted...
It still doesn't seem to have an understanding of the "logic" of certain structures. I asked it for a photorealistic image of a Vekoma Boomerang roller coaster, which has a very specific layout and type of track. Not only did it generate images with far more intricate layouts than a boomerang would ever have, it also still generated pieces of track that don't connect to anything. Like, it has a general sense of what a photo of a roller coaster looks like, but none of the specific details that make a coaster work.
I next asked for a diagram of a diverging diamond interchange, and it came up with 4 images that looked like 3D models of interchanges on a white background, but none of them were diverging diamonds.
I'm going to experiment with describing the thing I want to see in more detail, to see if DALL-E 3 can make a nice image out of it.
Does anyone know how access rolls out? I'm a ChatGPT+ customer, and it seems like it took weeks to get access to their image prompt feature after they announced it.
Yeah, this isn't working at all for me. In fact, it denies any ability to use DALL-E. ChatGPT+ customer, using GPT-4.
Edit: here is what I get when I use their own example prompt:
Me: I am doing a report on cirrus clouds for my science class. I need photorealistic images that show off how wispy they are. I am going to compare them to photos I took of puffy cumulonimbus clouds at my house yesterday.
ChatGPT 4: I'm sorry, but I cannot generate or provide photorealistic images directly. However, I can guide you on how to describe, find, or use them in your report (yadda-yadda-yadda).
Edit #2: I tried logging out and back in and now I have it, so if it's not working for you that might be worth a try.
Nice, but when I try to generate photos, it still looks more like digital art than anything else, especially faces. Was this done intentionally by OpenAI?
Yeah I liked one of its suggestions for a sailboat with rocky forested coastline in the background, so asked it to refine it to look photographic. But the result still looked more like an illustration.
You can just tell it not to use its own prompt. For example, use the prompt
"Use the following prompt to generate an image. Do not use your own prompt: "A watercolor painting of a teal coffee cup with orange juice splashing into it"
This generates a single image with the specified prompt. No other images are generated.
I've been using it for a week or two. It's been amazing. Granted i just use it for fun and sometimes to explore visual ideas, but still - i enjoy it far more than i thought i would.
I've been playing around with it for a couple of weeks and it seems to work fine. It's a bit slow (sometimes up to a minute) but otherwise seems stable.
That's correct. You can't upload an image and modify it.
The closest you can get (which is still very far) is to upload your image to GPT-4 Vision and ask it to give you a prompt to describe the image, and then put that prompt into DALL-E 3.
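If you want to script that round trip rather than doing it by hand, here's a rough sketch against the current API. The vision model name and message format follow the gpt-4-vision-preview documentation, the image URL is a placeholder, and the final DALL-E 3 step still has to go through ChatGPT manually since there's no DALL-E 3 API yet:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Step 1: have a vision-capable model describe the existing image
    # as a reusable image-generation prompt.
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this image as a detailed prompt for an image generator."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/reference.png"}},
            ],
        }],
        max_tokens=300,
    )

    # Step 2: paste the resulting description into DALL-E 3 in ChatGPT.
    print(resp.choices[0].message.content)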
What is their "content policy"? One of my standard test prompts for these image generators is: "Can you draw an illustration for a web based collaborative environment for using Jupyter notebooks, Latex, Linux, GPU's for doing data science?" and no matter what variant on this I try, I get back "I'm sorry for the inconvenience, but I was unable to generate images for your request due to our content policy."
EDIT: "Latex" (pronounced "lay-tech", the document editing software from Knuth) is what violated the policy.
Note that if you use jailbreaks to circumvent even spurious safety guards, you risk, as others have found, being banned permanently from the service and missing out on future AI developments from their products.
Edit to my comparison comment above: I quickly put together 3 comparisons between v2 and v3: https://imgur.com/a/L9DYCSA