Midjourney is still so far ahead it's no competition. I did a lot of testing today and Firefly generated so many errors with fingers and the like, something I haven't seen since the original Stable Diffusion release. Does anyone know whether the web Firefly and the Photoshop version are the same model?
It's worth noting the difference in how the training material is sourced, though: Midjourney is using indiscriminate web scrapes, while Firefly is taking the conservative approach of only using images that Adobe holds a license for. Midjourney has the Sword of Damocles hanging over its head in that, depending on how legal precedent shakes out, its output might end up being too tainted for commercial purposes, and Adobe is betting on being the safe alternative both during this period of uncertainty and in case the hammer does come down on web-scraping models.
I don't think it really matters whether or not Midjourney themselves are liable; the output of their model being legally radioactive would break their business model either way. They make money by charging users for commercial-use rights to their generations, but a judgement that the generations are uncopyrightable or outright infringing on others' copyright would make the service effectively useless for the kinds of users who want commercial-use rights.
I wouldn't lose sleep over this if I were working for Midjourney. The copyright lobby is powerful, and when bad actors like Microsoft, Disney etc. jump onto the AI bandwagon and put their legal weight on their side of the lever, everything will turn out well (for them).
I'm presuming you're not including Stable Diffusion when you say this; the fact that SD and its variants are de facto extremely "free and open source" presently puts them way ahead of anything else, and is likely to do so for some time.
As far as I can tell, anyone who's creating images is using Midjourney. This is likely the same "Linux is open so it's way better" argument; tell that to the trillion-dollar companies that bet against it.
To be honest, most of the AI-generated images I find online are generated by Stable Diffusion; the fact that you can't generate NSFW images with MJ also makes a big difference.
This comment is breaking my brain. If you're not trolling, like, you do know what operating system the overwhelming vast majority of the "cloud" runs on, yes?
I’m perfectly aware of that. But you know what operating systems the overwhelming vast majority of PEOPLE use, yes?
Sure, more machines likely run Linux on servers, but that's like saying your body has more bacteria than your own cells. Technically correct but actually bullshit.
I share the same opinion, but I also dislike these tests because each system benefits from a different approach to prompting. What I use to get a good result in MidJourney won't work in Stable Diffusion, for example. Instead, when making these comparisons, one needs to set an objective and have people who are familiar with each system produce their nicest images, since this is a better reflection of real-world usage. For example, ask each participant to read a chapter/page from a book with a lot of specific imagery and then use AI to create what they think that looks like.
Regarding image generation in Photoshop I can confirm two things:
- It is excellent for inpainting and outpainting, with a few exceptions*
- It remains poor for generating a brand new image
*Photoshop's generative fill is very good at extending landscapes; it will match lighting and, according to the release video, can be smart enough to work out what a reflection should contain even if that is not explicitly included in the image (in their launch demo they showed how a reflecting pool captured the underside of a vehicle).
Where generative fill falls apart: inserting new objects that are not well defined produces problems. Choosing something like a VW Beetle will produce a good result because it is well defined; choosing something like "boat", "dragon", or even "pirate's chest" will produce a range of images that do not necessarily fit the scene. This is likely because the source imagery for such objects is vague and prone to different representations.
A note about Firefly: anything that is likely to produce a spherical-looking shape tends to be blocked, presumably because it resembles certain human anatomy. This is problematic when doing small touch-ups such as fixing fingers.
A special note about Photoshop versus other systems: Photoshop has the added problem of needing to match the resolution of the source material. Currently it achieves this by combining upscaling with resizing, which means that if one is extending an area of high detail, that detail cannot be maintained and instead comes out softer/blurrier than the original sections. It also means that if one extends directly from the border of an image, a feathered edge becomes visible and must be corrected by hand.
I currently test the following AI generators; feel free to ask me about any of these: Stable Diffusion (Automatic and InvokeAI), OpenAI's DALL-E 2, MidJourney, Stability AI's DreamStudio, and Adobe Firefly.
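If anyone wants to script the Stable Diffusion side of such a comparison instead of going through the Automatic or InvokeAI UIs, a minimal sketch using Hugging Face's diffusers library looks roughly like this (the model id, prompt, and sampler settings below are just illustrative defaults, not tuned values):

    import torch
    from diffusers import StableDiffusionPipeline

    # Load a Stable Diffusion checkpoint (the common v1.5 weights, as an example)
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Use the same prompt across every system being compared
    prompt = "a lighthouse on a rocky coast at dusk, oil painting"
    image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
    image.save("lighthouse.png")

Running the identical prompt through each service at least keeps the input constant, even if, as noted above, each system really wants its own prompting style to look its best.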
None of these can do text well. There's a model that does do text and composition well, but the name escapes me. And the general quality is much lower overall, so it's a pretty heavy tradeoff.
I believe this is at least one solution, and one that the folks at Stability themselves were pushing hard as a next step forward in the development of LLMs.