Is the latest Midjourney now considered better than Stable Diffusion? If so, will Stable Diffusion catch up? I strongly prefer Stable Diffusion's open source nature and ability to run locally.
SD can be better than MJ if you dig really deep into the open source nature of SD. Open source UIs and plugins, community-made models of many types and uses, tribal knowledge of techniques... It's very complicated and requires a lot of searching, reading, installing and experimentation.
The base tech and models straight from Stability AI give pretty crap results if you just plainly describe a scene.
MJ, in contrast, provides great results out of the box. Say anything, get a beautiful picture of something. From there you need to figure out specifically what you actually want.
However, if you really want to iterate on just specific details of a scene, with a specific layout, with specific characters in specific poses, with specific style elements, MJ is too chaotic to control at that fine a detail. So too is SD out of the box. But if you take the time to learn how to install and use ControlNet, highly specialized models/LoRAs/textual inversions from wildly varying sources, in-painting, latent upscaling, hook up Photoshop/Krita/Blender integrations, etc., etc... you can eventually get very precise control of SD's results. And then new, better tech releases next week! :D
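To give a flavor of that stack outside a UI, here's a minimal sketch using Hugging Face's diffusers library (Automatic1111 and ComfyUI wrap the same pieces); the LoRA filename, input files and prompt are placeholders:

    import torch
    from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
    from diffusers.utils import load_image

    # An OpenPose ControlNet steers composition from a stick-figure pose map.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
        torch_dtype=torch.float16).to("cuda")

    # Hypothetical community LoRA layered on top for style.
    pipe.load_lora_weights("my_style_lora.safetensors")

    pose = load_image("pose.png")  # placeholder pose map
    image = pipe("a knight resting in a mossy forest, oil painting",
                 image=pose, num_inference_steps=30).images[0]
    image.save("knight.png")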
> Is the latest Midjourney now considered better than Stable Diffusion?
It's better in terms of having a no-user-configuration service available that gets you from zero to decent results with nothing more than prompting.
SD is better in available specialized, customized models (finetuned checkpoints and the various kinds of mix-and-match auxiliary models and embeddings that can be used with them); in not having banned topics, because it is self-hostable; and in available tooling and UIs that expose tuning parameters and incorporate support for techniques like guided generation with the various types of ControlNet models, animation, inpainting, outpainting, prompting-by-region, etc.
Midjourney provides much better images by default. It's really impressive.
Stable Diffusion's advantage is in the huge amount of open source activity around it. Most recently that resulted in ControlNet, which is far more powerful than anything Midjourney can currently do - if you know how to use it.
Look around a bit for info on ControlNet. You can use depth maps, scribble in where you want objects to be, or place human poses in a scene and SD will use them to generate an image. You can combine multiple ControlNet models and control how much each contributes to the scene. The level of control available is pretty awesome. I say that as someone who was in the DALL-E beta and used Midjourney for a few months (though I guess I don't know what advancements they've made in the last few months).
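For the combining part specifically, diffusers accepts a list of ControlNets plus per-model weights; a rough sketch (the depth and pose maps are placeholders):

    import torch
    from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
    from diffusers.utils import load_image

    controlnets = [
        ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth",
                                        torch_dtype=torch.float16),
        ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose",
                                        torch_dtype=torch.float16),
    ]
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnets,
        torch_dtype=torch.float16).to("cuda")

    image = pipe(
        "two dancers on a stage",
        image=[load_image("depth.png"), load_image("pose.png")],
        controlnet_conditioning_scale=[0.6, 1.0],  # per-net contribution
    ).images[0]
    image.save("dancers.png")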
Both StabilityAI and the open source community are working on improvements to Stable Diffusion.
Keep in mind StabilityAI is also pursuing LLMs and a host of other model types, whereas text-to-image is Midjourney's single core competency and value prop. Midjourney is very focused on staying ahead.
edit: I wanted to add that the extensive training costs can be prohibitive for the OSS community to fully participate. Coordination via groups such as LAION can help, but gone are the days of individual OSS participants contributing directly to core foundational model training.
In fact here's a list of painters whose style is immune to direct mimicry in Midjourney because their name is banned:
Ambreen Butt
Jan Cox
Constance Gordon-Cumming
Dai Xi
Jessie Alexandra Dick
Dong Qichang
Dong Yuan
Willy Finch
Spencer Gore
Ernő Grünbaum
Guo Xi
Elena Guro
Adolf Hitler
Prince Hoare
William Hoare
Fanny McIan
Willy Bo Richardson
Shang Xi
Wang Duo
Wang E
Wang Fu
Wang Guxiang
Wang Hui
Wang Jian
Wang Lü
Wang Meng
Wang Mian
Wang Shimin
Wang Shishen
Victor Wang
Wang Wei
Wang Wu
Wang Ximeng
Wang Yi
Wang Yuan
Wang Yuanqi
Wang Zhenpeng
Wang Zhongyu
Xi Gang
Xie Shichen
Xu Xi
I have this list because I recently made a site[1] that displays the 4 images from a prompt of "Lotus, in the style of <paintername> <birth-death dates> [nation of origin]" for every painter listed on Wikipedia's "List of painters" -- except for those in the above list.
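The prompt side of that is just a template looped over the painter list; a toy sketch (not the site's actual code, and the painter tuples are stand-ins for the scraped Wikipedia data):

    # Stand-in data; the real list came from Wikipedia's "List of painters".
    painters = [
        ("Claude Monet", "1840-1926", "France"),
        ("Wang Wei", "699-759", "China"),
    ]
    banned = {"Wang Wei", "Dong Yuan"}  # names Midjourney rejects outright

    prompts = [
        f"Lotus, in the style of {name} {dates} [{nation}]"
        for name, dates, nation in painters
        if name not in banned
    ]
    print(prompts)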
The fact that they banned both Xi and Jinping separately to prevent Xi Jinping was surprising to me. Twice as banned as Adolf Hitler.
[1] https://lotuslotuslotus.com - small chance you get an NSFW image if you hit upon Fernando Botero or John Armstrong, perhaps there's more.
So if you're an artist and don't want your style to be used in an AI product, all you have to do is change your name to include a variation of a controversial leader's name (Xi, Adolf) or a slightly offensive name (Dick, Gore)?
> So if you're an artist and don't want your style to be used in an AI product, all you have to do is change your name to include a variation of a controversial leader's name (Xi, Adolf) or a slightly offensive name (Dick, Gore)?
Nope, that doesn't stop models from being trained on your art. It makes it somewhat more difficult for people to prompt specifically for your style, but your art still influences output, and there may be other ways (e.g., titles of specific works) to deliberately and specifically evoke it in particular.
ChatGPT made me a Python script for automating pasting results, as well as most other tasks related to this project.
I already had a Discord bot I'd written by hand before, for downloading the images.
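For the curious, the download half of that is only a few lines with discord.py (a sketch, not the actual bot; the save directory and file filter are placeholders):

    import os
    import discord

    intents = discord.Intents.default()
    intents.message_content = True
    client = discord.Client(intents=intents)

    @client.event
    async def on_message(message: discord.Message):
        # Midjourney posts finished grids as message attachments.
        os.makedirs("downloads", exist_ok=True)
        for attachment in message.attachments:
            if attachment.filename.lower().endswith((".png", ".jpg", ".webp")):
                await attachment.save(os.path.join("downloads", attachment.filename))

    client.run(os.environ["DISCORD_TOKEN"])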
I thought of the project in the morning while my kid was getting ready for school and had it running jobs before we were out the door, worked a little before I left for work and a little more after work, and it was done before dinner time.
I listened in on their weekly chat today and it sounded like they'd be happy if they could just ban all political mimicry. I think the AI images of Trump being arrested (which looked like Midjourney output to me) were a disappointment to them.
Maybe that just means banning images that are meant to fool people, and obvious satire would be ok, but they might be ok erring on the side of caution.
(Nothing in the talk was this explicit, but this was my read of the subtext)
It's been noticeably better than Stable Diffusion since v3 at least (I wasn't paying attention before that). It's on v5 now, and I think MJ has continued to get better faster than SD through this time period.
ControlNet for Stable Diffusion may be an exception to this.
There are lots of versions of Stable Diffusion, so I've had a hard time knowing exactly what to compare. But from what I've seen, none of them come close to Midjourney.
Stable Diffusion does more things, though, like in-painting, where you can erase part of an image and then have it recreated. I've seen videos of people doing impressive things with in-painting, extensively regenerating each portion of an image until it's just right. Seems like a ton of work, though. Still, I've had some fun using it to modify or extend images.
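For reference, that erase-and-recreate loop looks roughly like this in diffusers (one of several front ends; filenames are placeholders, and white pixels in the mask mark the region to regenerate):

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    init = Image.open("photo.png").convert("RGB").resize((512, 512))
    mask = Image.open("mask.png").convert("RGB").resize((512, 512))

    # Only the masked region is regenerated; iterate with new masks/prompts
    # until each portion of the image is just right.
    out = pipe(prompt="a wooden bench in a sunlit park",
               image=init, mask_image=mask).images[0]
    out.save("photo_fixed.png")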
Midjourney v4 was already better than Stable Diffusion. The new DALL-E (which you can use on Bing) I also find better...
The main difference with Stable Diffusion is that you can fine-tune it with your own dataset. There's img2img and a bunch of other tools. But the base model is really worse than the competitors right now.
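img2img, for instance, takes an existing image plus a prompt and a strength knob; a minimal diffusers sketch (filenames and prompt are placeholders):

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    init = Image.open("sketch.png").convert("RGB").resize((512, 512))
    # strength controls how far the output may drift from the input image.
    out = pipe(prompt="an oil painting of a harbor at dusk",
               image=init, strength=0.6).images[0]
    out.save("harbor.png")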
Also, SD can do porn, which Midjourney forbids for some reason. They're leaving an assload of money on the table and somebody will nab it sooner or later.
The DALL-E API sucks so much right now. I've been experimenting with it the past few days and it produces a lot of horrors. I even used the DALL-E prompt book as a guide, but still so many more misses than hits. Even when it gives a non-horrifying image, it's just decent. 5/10 rating
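(For reference, DALL-E API calls at the time looked roughly like this with the openai Python package's 0.x-era API; the prompt is illustrative:

    import openai

    openai.api_key = "sk-..."  # your API key

    resp = openai.Image.create(
        prompt="a watercolor of a lighthouse at dawn, soft light",
        n=4,                 # number of candidates, like MJ's 4-image grid
        size="1024x1024",
    )
    for i, item in enumerate(resp["data"]):
        print(i, item["url"])

)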
I started testing out the official Stable Diffusion API and it already gives you way more control than the DALL-E API, and it seems to produce less horrifying, better quality images, but I feel like DALL-E understands the prompts better. 7/10
I would love to try Midjourney, but I uninstalled Discord years ago and have no plans to ever reinstall it. So I'll wait for API access, if they ever offer it. 0/10 (only for being Discord-only)
Standard DALL-E 2 is worse than Stable Diffusion... The experimental DALL-E available on Bing is, I guess, DALL-E 3... A similar approach to what they did with GPT-4 on Bing, I guess...
From what I understand, SD doesn't handle color space correctly (or at all), hence all the weird saturated blue-magenta-orange-beige gradients in a lot of its example outputs, and why its output often feels more like a bad Photoshop collage than a proper blend. It's probably trained on unmanaged sRGB. In that case the SD model is fundamentally flawed, since doing math in gamma-encoded sRGB space is nonsense and causes bias (those saturated gradients are a sign of exactly that). Although I don't know for sure: I didn't find any color management code in their scripts when I looked for it, so I'm assuming this is the case.
I'd be happy to be corrected if anyone knows the details on this in SD.
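To illustrate the underlying point: blending gamma-encoded sRGB values as if they were linear light skews results. A tiny self-contained example, mixing black and white:

    # Naive sRGB averaging vs. converting to linear light first.
    def srgb_to_linear(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    def linear_to_srgb(c):
        return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

    a, b = 0.0, 1.0  # black and white as sRGB-encoded values
    naive = (a + b) / 2  # 0.5 in sRGB: physically too dark a blend
    correct = linear_to_srgb((srgb_to_linear(a) + srgb_to_linear(b)) / 2)
    print(naive, round(correct, 3))  # 0.5 vs ~0.735

Per-channel errors like this differ between R, G and B depending on the pixel values involved, which is the kind of bias that can show up as hue shifts in blended regions.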