They trialled it as an explicitly optional model for a while a couple of years ago (or only a year? Time moves so fast; somewhere in the v2/v3 timeframe, around when SD came out). I'm fairly sure that's no longer the case.
DALL-E shares the same autoencoders as SD v1.x. It is probably similar to how Meta's Emu-class models work: they tweaked the architecture quite a bit, trained on their own dataset, and reused some components (or, in Emu's case, trained all the components from scratch but reused the same architecture).
I pay for both MJ and DALL-E (though OpenAI mostly gets my money for GPT) and don't find them to produce significantly better images than popular checkpoints on CivitAI. What I do find is that they are significantly easier to work with. (In fact, my experience across hundreds of DALL-E generations is that it's quite poor in quality. I'm in several IRC channels where it's the image generator of choice for some IRC bots, and I'm never particularly impressed with the visual quality.)
For MJ in particular, knowing that they at least used to use Stable Diffusion under the hood, it would not surprise me if the majority of the secret sauce is actually a middle layer that processes the prompt and rewrites it into one that works better with SD. Prompting SD to get output at MJ's quality level takes significantly more tokens, lots of refinement, heavy tweaking of negative prompts, etc. Also a stack of embeddings and LoRAs, though I would place those more in the category of finetuning, as you mentioned.
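To make the idea concrete, here's a toy sketch of what such a prompt-rewriting middle layer might look like. This is purely hypothetical: the quality tokens and the default negative prompt are illustrative conventions from the SD community, not anything MJ is known to actually do.

```python
# Hypothetical prompt "middleware": expands a terse user prompt into the kind
# of verbose, quality-token-laden prompt that Stable Diffusion tends to reward.
QUALITY_TOKENS = "highly detailed, sharp focus, dramatic lighting, 8k"
DEFAULT_NEGATIVE = "blurry, lowres, bad anatomy, extra fingers, watermark"

def rewrite_for_sd(user_prompt: str) -> dict:
    """Turn a short user prompt into a (prompt, negative_prompt) pair
    suitable for feeding to an SD pipeline."""
    prompt = f"{user_prompt.strip()}, {QUALITY_TOKENS}"
    return {"prompt": prompt, "negative_prompt": DEFAULT_NEGATIVE}

print(rewrite_for_sd("a castle at sunset"))
```

A real version would presumably use an LLM rather than fixed string templates, which is roughly what DALL-E 3 is known to do with its prompt rewriting.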
That looks very impressive, unless the demo is cherry-picked. It would be great if this could be implemented in a frontend like Fooocus: https://github.com/lllyasviel/Fooocus
What do you use it for? I haven't found a great use for it myself (outside of generating assets for landing pages / apps, where it's really really good). But I have seen endless subreddits / instagram pages dedicated to various forms of AI content, so it seems lots of people are using it for fun?
Nothing professional. I run a variety of tabletop RPGs for friends, so I mostly use it for making visual aids there. I've also got a large-format printer that I was no longer using for its original purpose, so I bought a few front-loading art frames that I generate art for and rotate through periodically.
Whose frames do you use? Do you like them? I print my photos to frame and hang, and wouldn't at all mind being able to rotate them more conveniently and inexpensively than dedicating a frame to each allows.
Perfectly suited to go alongside the style of frame I already have lots of, and very reasonably priced off the shelf for the 13x19 my printer tops out at. Thanks so much! It'll be easier to fill that one blank wall now.
I use ComfyUI/SD and MJ, and I have never seen anything on the level of what I get out of MJ. Nothing on CivitAI is impressive to me next to what I get from MJ.
Of course, art is so subjective that none of this has any real meaning. MJ routinely blows my mind, though, and it is very rare that something from SD does. The secret MJ sauce is obviously all the human feedback that has gone into the model at this point.
I think AI video will be a different story, though. That is when ComfyUI/SD will destroy MJ, because MJ simply isn't going to have a viable economic model given the amount of compute needed.
Largely some old channels from the 90s/00s that really only exist as vestiges of their former selves - not really related to their original purpose, just rooms for hanging out with friends made there back when they had a point besides being a group chat.
Midjourney has absolutely nothing to offer compared to proper finetunes. DALL-E does: it generalizes well (it can make objects interact properly, for example) and has great prompt adherence. But it can also be unpredictable as hell, because it rewrites the prompts. DALL-E's quality is meh: it has terrible artifacts on all pixel-sized details, hallucinations on small details, and limited resolution. ControlNets, finetuning/zero-shot reference transfer, and open tooling would have made a beast of a model out of it, but they aren't available.
I'm actually a person making technical decisions (art decisions in the past) in a VFX/art studio, and I'm talking about production use. No generative AI currently passes any reasonable production quality bar, but everyone is trying it for work that can't be done, or is cost-prohibitive, otherwise: animation, long series with style transfer, filler asset creation, etc. Anything that only has a text prompt can be discarded instantly. You have to be able to finetune it on your own material for consistency (of course I'm not talking about dubious third-party models), and you need higher-order guidance (e.g. ControlNets, especially custom ones) and many other things. In the hands of a skilled person, a trivial Krita/Photoshop plugin (Firefly, SD, SD realtime) blows anything MJ can offer out of the water, simply because it has all of that. You can't do much with text alone; it doesn't have enough semantic capacity to express artistic intent. I'm not even starting on animation.
In fact, anything that involves non-explicitly-guided one-shot generation of anything with light/shadow/colors/perspective is entirely out of the question with the current crop, because all models hallucinate hard and aren't controllable within a single generation. There are attempts at fixing perspective without explicit guidance, but it's going to be a long road, and it's not super relevant to how things are done anyway.
And for fine art, nothing beats a human painter; doing it by throwing prompts at AI mostly misses the point. I'm not even sure what you mean by fine art in this context, actually. Surely not generating artsy-looking images from a prompt for fun?
I think it'd be interesting to have a non-profit "model sharing" platform, where people can buy/sell compute. When you run someone's model, they get royalties on the compute you buy.
The net flow of knowledge about text-to-image generation from OpenAI has definitely been outward. The early open-source methods used CLIP, which OpenAI came up with. DALL-E (1) was also the first demonstration that we could do text-to-image at all. (There were some earlier papers, years before, that could give you a red splotch if you said "stop sign" or something.)
The GPL was intended for computer code that gets compiled to a binary form. You can share the binary, but you also have to share the code that the binary is compiled from. Pre-trained model weights might be thought of as analogous to compiled code, and the training data as analogous to program code, but they're not the same thing.
The model weights are shared openly, but the training data used to create these models isn't. This is at least partly because all these models, including OpenAI's, are trained on copyrighted data, so the copyright status of the models themselves is somewhat murky.
In the future we may see models that are 100% trained in the open, but foundational models are currently very expensive to train from scratch. Either prices would need to come down, or enthusiasts will need some way to share radically distributed GPU resources.
Tbh I think these models will largely be trained on synthetic datasets in the future. They are mostly trained on garbage now. We have been doing opt-outs on these, and it has been interesting to see the quality differential (or lack thereof), e.g. removing books3 from StableLM 3B Zephyr: https://stability.wandb.io/stability-llm/stable-lm/reports/S...
Why aren’t the big models trained on synthetic datasets now? What’s the bottleneck? And how do you avoid amplifying the weaknesses of LLMs when you train on LLM output vs. novel material from the comparatively very intelligent members of the human species? Would be interesting to see your take on this.
There are approaches to getting the right type of augmented and generated data to feed these models; check out the QDAIF paper we worked on, for example.
I’ve wondered whether books3 makes a difference, and how much. If you ever train a model with a proper books3 ablation I’d be curious to know how it does. Books are an important data source, but if users find the model useful without them then that’s a good datapoint.
What I mean is, it’s important to train a model with and without books3. That’s the only way to know whether it was books3 itself causing the issue, or some artifact of the training process.
One thing that’s hard to measure is the knowledge contained in books3. If someone asks about certain books, it won’t be able to give an answer unless the knowledge is there in some form. I’ve often wondered whether scraping the internet is enough rather than training on books directly.
But be careful about relying too much on evals. Ultimately the only benchmark that matters is whether users find the model useful. The clearest test of this would be to train two models side by side, with and without books3, and then ask some people which they prefer.
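A minimal sketch of how such a side-by-side test could be scored: collect blind pairwise votes between the two models and compute the win rate for the with-books3 model. The vote labels and helper name here are invented for illustration.

```python
def preference_winrate(votes: list) -> float:
    """Given blind pairwise votes ('with' = model trained with books3,
    'without' = model trained without), return the win rate of 'with'."""
    if not votes:
        raise ValueError("no votes collected")
    wins = sum(1 for v in votes if v == "with")
    return wins / len(votes)

votes = ["with", "with", "without", "with", "without"]
print(preference_winrate(votes))  # 0.6
```

With only a handful of votes a 0.6 win rate is indistinguishable from noise, so a real test would also need enough comparisons for a significance check.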
It’s really tricky to get all of this right. But if there are more details on the pes2o ablations, I’d be curious to see them.
The main reason is probably Midjourney and OpenAI using their tech without any kind of contribution back. AI desperately needs a GPL equivalent…