I've also been using this one (I wasn't sure at first, since they just migrated from the /hlky/ namespace on GitHub), but at a glance I have no idea what the differences are.
I will say that this one has had REALLY active development as new features have come out, and it's pretty polished at this point (although I'm using it more as a toy than anything, it's awesome to have a quick way to try the new features that have been shipping).
I tried using a Docker container and it took 3 min to generate an image from a prompt. However, it seems that 2:45 of that is somehow spent before the GPU does anything, and only in the remaining 15 seconds does the GPU actually get utilized.
I haven't had the time to look into this yet, but it does seem to work.
I've gotten it running with a Radeon RX 6800 on Ubuntu Linux 22.04 (after replacing PyTorch with a ROCm-enabled build), and on Windows 10 (in a very barebones way using ONNX), but are there better, more full-featured ways to get it running on Windows? Would love to know.
You mean a VM on a machine with a GPU? Or does it have to be a bare metal machine? What is a good provider of suitable VMs/machines?
And what do you do after you've SSHed in? The installation instructions seem to be written for Windows users (click here, then click there...). Is there a Linux script that does the installation automatically?
Just a VM with a GPU, doesn't need to be bare metal. AWS/GCP/Azure have them, but for GPU cloud instances, boutique vendors like CoreWeave, runpod, lambdalabs.com, vast.ai, and paperspace may be more competitive.
I like this one but had some trouble with using img2img. Maybe my image was too small (it was smaller than 512x512). Failed with the same signature as an issue that was closed with a fix.
So, and this is an ELI5 kind of question I suppose. There must be something going on like "processing a kazillion images" and I'm trying to wrap my head around how (or what part of) that work is "offloaded" to your home computer/graphics card? I just can't seem to make sense of how you can do it at home if you're not somehow in direct contact with "all the data." E.g., must you be connected to the internet, or to "Stable Diffusion's servers," for this to work?
You can think of it more like this:
If I do 100 experiments of dropping stones from variable heights and measuring the time it takes for each stone to hit the ground, I have enough data points to estimate gravity with a regression (the relationship t^2 = 2h/g is linear in h). So based on my data I create a model: the time it takes for a stone to fall from height h is sqrt(2h/9.81). Now if you want to figure out how long it takes for your stones to fall, you don't need to redo all the experiments; you can instead rely on the parameter I give you (9.81 in this case) to calculate it yourself.
With these models it works exactly the same way. Someone dropped millions of rocks and created a formula of unbelievable complexity and what they now did is they released that formula with all their calculated parameters into the world. What you do when you ultimately use Stable Diffusion is you just calculate the result of this formula and that is your image. You never have to process those images.
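(For the curious, here's the falling-stone analogy as a toy Python sketch: fit the parameter once from noisy "experiments," then anyone can reuse it without the data. The numbers are made up; only the 9.81 is real physics.)

```python
import numpy as np

# Made-up "experiments": drop heights (m) and measured fall times (s), with noise.
rng = np.random.default_rng(0)
h = rng.uniform(1, 50, size=100)                      # 100 drop heights
t = np.sqrt(2 * h / 9.81) + rng.normal(0, 0.02, 100)  # noisy time measurements

# t^2 = (2/g) * h is linear in h, so fit the slope with least squares.
slope, _ = np.polyfit(h, t**2, 1)
g_est = 2 / slope
print(f"estimated g: {g_est:.2f} m/s^2")

# Anyone who receives just the fitted parameter can now predict fall times
# without redoing the 100 experiments.
def fall_time(height, g=g_est):
    return np.sqrt(2 * height / g)

print(f"predicted time from 20 m: {fall_time(20.0):.2f} s")
```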
Yes, and another reason for the small model size, and the novelty of the underlying paper [1], is that the diffusion model is not acting on the pixel space but rather on a latent space. This means that this 'latent diffusion model' not only learns the task at hand (image synthesis) but in parallel also learns a powerful lossy compression model via an outer autoencoder structure. Now the number of weights (model size) can be reduced drastically, as the inner neural network layers act on a lower-dimensional latent space rather than a high-dimensional pixel space. It's fascinating because it shows that deep learning at its core comes down to compression/decompression (encoding/decoding), with a close relation to Shannon's information theory (e.g. source coding/channel coding/data processing inequality).
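To put rough numbers on that (the 512x512 RGB image size and 64x64x4 latent shape are the commonly cited figures for Stable Diffusion; treat them as assumptions):

```python
# Why diffusing in latent space is so much cheaper than in pixel space.
pixel_values = 512 * 512 * 3    # ~786k numbers per image in pixel space
latent_values = 64 * 64 * 4     # ~16k numbers per image in latent space

print(f"pixel: {pixel_values}, latent: {latent_values}, "
      f"ratio: {pixel_values / latent_values:.0f}x")   # ~48x fewer values
```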
Oh, wow. Now that you mention how it's similar to lossy (if not the same as) compression it all makes a LOT of sense. This is great. I teach IT and I already do a bit on how lossy compression works, (e.g. hey, if you see a blue pixel and then another slightly darker one next to it, what's the NEXT likely to be?) and this is something of an extension of that.
Then maybe we should remind about this 25,000:1 ratio when an artist complains about his copyrights being abused. The model doesn't have space to actually copy his works inside, it can only memorise the equivalent of a thumbnail from each input. A very small thumbnail, scaled down 150:1 per width and height (square root of 25000). That's like a grain of rice on the screen.
That's not how it works though. Instead of applying arbitrary content detail reduction, the model is an attempt to distill the core of what makes a particular artist (or phrase, face, object etc) unique.
When programming, it will often take a long time and a lot of code to get to a few final lines that do what you want. You cannot say the final result is a "thumbnail" of all previous efforts. Rather, it is the apotheosis of it.
Some artists spend decades developing a style that looks like a kid could do it as well. Still, there is something unique in there, that a trained eye will recognize. Converting that particular style to a formula and making that freely available is at least somewhat morally ambiguous.
It is not the same as trying to mimic a style. It is cloning the essence of a style and making it readily available to anyone who asks for it.
Sure, it's not copyright infringement, but you could argue that this takes away from the hardship the original artist had to go through to perfect their style.
Fair use might work, but maybe not? If I were to argue against it, I'd probably compare it to something like a recording of music vs. a MIDI file: a similar ratio of raw data reduction.
That’s the interesting part: all the images generated are derived from a less than 4gb model (the trained weights of the neural network).
So in a way, hundreds of billions of possible images are all stored in the model (each a vector in a multidimensional latent space) and turned into pixels on demand (driven by the language model that knows how to turn words into a vector in this space).
As it’s deterministic (given the exact same request parameters, random seed included, you get the exact same image) it’s a form of compression (or at least encoding decoding) too: I could send you the parameters for 1 million images that you would be able to recreate on your side, just as a relatively small text file.
Not exactly. There’s no real logic per se, just data. It’s made up of tons of floating point numbers that define relationships to other floating point numbers.
> As it’s deterministic (given the exact same request parameters, random seed included, you get the exact same image) it’s a form of compression (or at least encoding decoding) too: I could send you the parameters for 1 million images that you would be able to recreate on your side, just as a relatively small text file.
For any input image? Or do you mean an image generated by the model?
I meant images generated by the model. Now that I think of it I could just send you the sampled vectors and you could feed that to the vector to image part.
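(As a toy illustration of that "compression" framing, the record you'd send is just something like the following, with field names purely illustrative rather than any particular UI's schema. Anyone with the same model weights, sampler code, and compatible hardware could expand it back into the full image.)

```python
import json

# Hypothetical description of one generated image: a handful of bytes
# that deterministically expands into a full 512x512 image on the
# receiver's machine (given identical model, code, and hardware).
params = {
    "prompt": "a castle on a hill at sunset, oil painting",
    "seed": 42,
    "steps": 50,
    "cfg_scale": 7.5,
    "sampler": "k_lms",
    "width": 512,
    "height": 512,
}

record = json.dumps(params)
print(f"{len(record)} bytes vs {512 * 512 * 3} bytes of raw pixels")
```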
My understanding is that images will not be bit-identical due to GPU parallelism and floating-point precision. Images from the same seed may be for all practical intents and purposes indistinguishable, but there are some flipped bits involved.
That's not my understanding. The same seed value to the device's random number generator should result in the exact same outputs. There's a bug being chased down in the MPS (macOS) backend where the fixed random seed doesn't output the same image on different computers.
I've heard something a bit in between what you're both saying. For the same machine with the same seed / parameters [0], your output is deterministic. But once you change hardware or OS you will probably get bit-level differences that won't make a macro-level difference.
No idea how true that is, but on my windows machine, same params/seed is definitely deterministic.
[0] a help string in the SD source code recommends that the ddim_eta parameter (which isn't exposed in most web UIs or GUIs, including the OP's GitHub repo) stay at the default 0.8 for deterministic sampling. I have no idea if this means changing the value from 0.8 produces non-deterministic results with the same hardware/OS/params/seed, or if they just mean changing it from 0.8 will make your SD not match the online model but still be deterministic itself. But in my testing, changing this value gives no useful changes to the image generation, so I keep it at 0.8.
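(For reference, seeding in plain PyTorch looks roughly like this; it's a generic sketch, not the OP repo's actual code, and bit-exact results are only expected on the same hardware/software stack.)

```python
import torch

seed = 42
device = "cpu"  # or "cuda"/"mps"; different backends may diverge at the bit level

# A generator seeded the same way produces the same initial latent noise
# every run, which is what makes the final image repeatable.
generator = torch.Generator(device=device).manual_seed(seed)
latents = torch.randn((1, 4, 64, 64), generator=generator, device=device)
```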
This is a fascinating idea. Have StableDiffusion generate an image from the image you'd like to "compress" + a random seed. Feed that output to an adversarial network that compares source image to output and scores it. Try again with new seed.
After running for a while, the adversarial network outputs a seed, and you now have a few characters representing a reasonable approximation of your image.
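(A toy sketch of that seed-search loop, with a hypothetical generate() standing in for the image model and plain mean-squared error standing in for the scoring network:)

```python
import numpy as np

def generate(seed: int) -> np.ndarray:
    """Hypothetical stand-in for running the image model at a fixed prompt."""
    return np.random.default_rng(seed).random((64, 64, 3))

def score(candidate: np.ndarray, target: np.ndarray) -> float:
    # Stand-in for the scoring network: lower means a closer match.
    return float(np.mean((candidate - target) ** 2))

target = generate(123_456)  # pretend this is the image you want to "compress"

# Brute-force search over seeds; the winning seed is your "compressed" image.
best_seed = min(range(10_000), key=lambda s: score(generate(s), target))
print("best seed:", best_seed)
```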
I expect something after JPEG XL will be a neural-network-based compression scheme, where the client has an n-GB neural net attached. There have been several that already show promising results (it's likely to be more of a standards issue than a technical issue).
In the '80s there was a man (I forget his name) who claimed that one day you could store an entire high-res movie on a floppy disc. One day he might be right, once AI can regenerate sequences of images/video from seeds. You just need a petabyte of models stored somewhere.
This is the main reason why attempts to say that these sorts of AI are just glorified lookup tables, or even that they are simply tools that mash a kazillion images together, are very misleading.
A kazillion images are used in training, but training consists of using those images to tune on the order of ~5 GB of weights and that is the entire size of the final model. Those images are never stored anywhere else and are discarded immediately after being used to tune the model. Those 5 GB generate all the images we see.
All those 'kazillion' images are processed into a single 'model'. Similar to how our brain cannot remember 100% of all our experiences, this model will not store precise copies of all images it is trained off of. However, it will understand concepts, such as what a unicorn looks like.
For StableDiffusion, the current model is ~4GB, which is downloaded the first time you run the model. These 4GB encode all the information that the model requires to derive your images.
SD has 860M weights for the main workhorse part. At 16-bit precision that is only 1.6 GB of data, which in some very real sense has condensed the world's total knowledge of art and photography and styles and objects.
It's not a search engine, it's self-contained and the closest analogy is that it's a very very knowledgable and skilled artist.
What you interact with as the user is the model and its weights.
The model (presumably some kind of convolutional neural network) has many layers, every layer has some set of nodes, and every node has a set of weights, which are just coefficients. The weights are 'learned' during model training, where the model takes in the data you mention and evaluates the output. This typically happens on a super beefy computer and can take a long time for a model like this. As images are evaluated, the weights get adjusted accordingly and the output gets better.
Now we as the user just need the model and the weights!
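(If it helps, here's what "learning the weights" means in miniature: a single-weight model trained with gradient descent, obviously nothing like the real diffusion model.)

```python
import numpy as np

# Toy training data generated from y = 3x plus noise; "3" is what we hope to learn.
x = np.linspace(0, 1, 100)
y = 3.0 * x + np.random.normal(0, 0.01, 100)

w = 0.0                                  # the model's single weight
for _ in range(500):                     # the training loop
    grad = np.mean(2 * (w * x - y) * x)  # gradient of the mean squared error
    w -= 0.1 * grad                      # nudge the weight to reduce the error

# The training data can now be thrown away; the learned weight alone is
# what gets shipped and reused ("just the model and the weights").
print(f"learned w = {w:.2f}")
```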
It’s all offline in 4gb file on your local computer. It’s like mini brain trained to do just one/few specific tasks. Just like your own brain doesn’t need Wi-Fi to connect to global memory storage of everything you experienced since birth, same way this 4gb file doesn’t need anything extra.
A kazillion images are used to create/optimize a neural network (basically). What you're working with is the result of that training. These are the "weights"
As someone with ~0 knowledge in this field, I think this has to do with a concept called "transfer learning", in which you train once with that kazillion of images, then reuse those same "coefficients" for further runs of the NN.
Nah, transfer learning is when you take a trained model, and train it a little more to better fit your (potentially very different) problem domain. Such as training a cat/dog/etc recognition model on MRI scans.
The goal is usually to have the more fundamental parts of your model already working and you thus need way less domain specific data.
Here, you're not training anything, you're running the models (both the CLIP language model and the unet) in feedforward. That's just deploying your model, not transfer learning.
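(A minimal PyTorch sketch of the distinction, with generic stand-in models rather than SD itself: transfer learning keeps updating some weights on new data, while deployment just runs the frozen model forward.)

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained model: a frozen "backbone" plus a replaceable head.
backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
head = nn.Linear(64, 10)

# Transfer learning: freeze the backbone, train only the new head on your data.
for p in backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(head(backbone(x)), y)
loss.backward()
optimizer.step()

# Plain inference (what you do when running Stable Diffusion): no gradients, no updates.
with torch.no_grad():
    preds = head(backbone(torch.randn(1, 128)))
```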
Looks great, but I use Linux and the README is fairly Windows-centric without warning. It'd be nice if there were clearly delineated sections for Windows vs *nix.
There's a (very ironically named) "Manual installation" section which might seem to be the answer for Linux, but then it's not immediately obvious which preceding sections are Linux or Windows without doing critical thinking.
I've been looking into this for the last 2 days. Unless you're running an M1 Mac or newer, you're SOL.
Stable Diffusion is built on PyTorch. PyTorch has mainly been designed to work with Nvidia cards. However, PyTorch added support for something called ROCm about a year ago that adds compatibility with newer AMD cards.
Unfortunately ROCm doesn't support slightly older AMD cards in conjunction with Intel processors.
So my 32gb pretty powerful 2020 16in MacBook Pro isn't capable of running Stable Diffusion.
Any native app will likely have to rely on a remote cloud gpu. And boy, those are fucking expensive. Been researching what I need to stand up a service the last few days and it isn't cost friendly.
> I've been looking into this for the last 2 days. Unless you're running an M1 Mac or newer, you're SOL.
And not just any old M1 Mac. Last week I got it running on my 2021 8GB M1 MacBook Air and it's slow. Images at 512x512 with 10 steps take between 7 and 10 minutes to generate.
It's the only thing I do that hits performance limitations on the 8GB machine so there's no regrets on that score, but with the way this stuff is progressing 16GB+ is a realistic minimum for comfortable use.
That's what I initially followed, too. But there does seem to be a better way - I've just installed CHARL-E [0] (mentioned elsewhere in this thread) and it was trivial to set up. Literally download the dmg, drag into Applications, and run.
It's an electron app and you can either download it with the weights, or without and add them separately.
Using that it just took slightly over 5 minutes on my 8GB so it's a little bit quicker for me. Maybe the code has improved since I cloned stuff over a week ago, or maybe it's just different system resources when it is run. Either way, it looks like the easy way I've been waiting for.
I've been running the Intel CPU version [0] for a while now on a 2013 MacMini. Works fine; it takes several minutes per image but I can live with that.
Great stuff, and works on an 8GB M1 Air taking between 5 and 10 minutes for between 5 and 15 steps. As a suggestion, perhaps add the option for setting the CFG too (I know, it's open source etc, but it's just a suggestion).
Funny, it probably does a whole lotta things, but it can't create the `~/Desktop/charl-e/samples/` directory? That seems like it should be relatively trivial...
Same! I wrote a public web app so that I could access the model from my phone [0]. This is how I found Replicate [1]. Their SD model is very cheap to use. While we wait for a native Mac app, I recommend accessing the model straight from their web UI.
People recently figured out how to export stable diffusion to onnx so it’ll be exciting to see some actual web UIs for it soon (via quantized models and tfjs/onnxruntime for web)
Very cool! Can you link to where this is taking place?
A commenter mentioned today that it might be possible to pre-download the model and load it into the browser from the local filesystem, rather than include such a gigantic blob as an accompanying dependency and fight the various caching RFCs, security/usage restrictions, and anything else that might inadvertently trigger a re-download.
MidJourney needs a lot of prompt engineering too. And Dall-E also. If you look at the prompt as an opportunity to describe what you want to see, the results are often disappointing. It works better to think backwards about how the model was trained, and what sorts of web caption words it likely saw in training examples that used the sorts of features you’re hoping it will generate. This is more of a process of learning to ask the model to produce things it’s able to produce, using its special image language.
The metadata and file names of the images in the source data set are also inputs for the model training. These keywords are common tags across images that have these characteristics, so in the same way it knows what a unicorn looks like, it also knows what a 4k unicorn looks like compared to a hyper rez unicorn.
The results in Midjourney are significantly better than SD. I find it much easier to get to a good result in MJ and I've been trying to understand why. Any more insight you could share?
Good engineering. Midjourney likely has a lot going on under the hood before your prompt actually gets to Stable Diffusion. As an example you can check out this research paper [0], which seeks to add prompt chaining to GPT-3 so you can "correct" its outputs before they reach the user. There's also no rule that says you can only make one call to SD; MJ likely bounces a picture through a pipeline they've tuned to ensure your generated image looks more reasonable.
Midjourney takes their base models and does further training/guidance on them to bring out intentional aesthetic qualities. One of their main goals is to ensure that their "default" style is beautiful no matter how simple the user's prompt is.
Midjourney is doing "secret sauce" post-processing to enhance the image returned from the model. SD just gives you back what the model spits out. That's how I understand it at least
I've been having a lot of fun with Stable Diffusion and Midjourney.
One thing that is very powerful with Stable Diffusion is using textual inversion ( https://textual-inversion.github.io/ ) - you can add additional input samples to further extend the possibilities beyond what is included in the original model.
You can, though you might run into memory limitations running it on a GPU. There can be tuning done to lower the VRAM utilization, but I have been lucky enough to not need this - I do some CG work and ran into VRAM limitations there, so I'm on a 3090 with 24GB.
You can always run it on a CPU and utilize your RAM instead if needed, though the training might extend to 24+ hours that way.
For some genuinely incredible results, try this prompt pattern:
Portrait of {Name of some type of identity such as "Faerie Princess" or "Dragon Queen"} {Name of a celebrity such as "Scarlett Johansson"}, beautiful face, symmetrical face, tone mapped, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and Greg Rutkowski and Alphonse Mucha and Boris Vallejo and Johannes Voss and Aleksi Briclot and Michael Komarck.
Run several iterations of the same query as some results will have anomalies.
I have a 6gb 1660ti, barely holding on. Is a new 12gb card good enough for now, or should I go even higher to be safe for a few years of sd innovation?
I'm using it with a 2070 (4 year old card with 8gb vram) and it takes about 5 seconds for a 512x512 image. It's been plenty fast to have some fun, but I think I'd want faster if it was part of a professional work flow.
It was the defaults for the webui I used. Faster than I expected too, but the results were all legit.
Edit: Got home and was able to double check. It's actually a solid 10 seconds per image with the following settings: seed:466520488 width:512 height:512 steps:50 cfg_scale:7.5 sampler:k_lms. Still quick enough for some fun, but could be annoying if you need to do multiple iterations a minute.
It sounds like there are forks that are able to work with <=8GB cards. And I'm not sure, but I think the weights are stored as f32, so switching to half precision might make it easier still to get this to work with less memory (rough arithmetic below).
But yeah the next generation of models would probably capitalize on more memory somehow.
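(Rough memory arithmetic for the half-precision idea, using the ~860M parameter figure mentioned upthread and ignoring activations and the other sub-models:)

```python
params = 860_000_000        # approximate weight count of SD's main UNet

fp32_gb = params * 4 / 1e9  # 4 bytes per float32 weight -> ~3.4 GB
fp16_gb = params * 2 / 1e9  # 2 bytes per float16 weight -> ~1.7 GB

print(f"fp32: {fp32_gb:.1f} GB, fp16: {fp16_gb:.1f} GB")
```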
How is M1/M2 support for SD? Is there a significant performance drop? Presumably you would be able to buy a 32GB M2 and be future proof because of the shared memory between CPU/GPU.
The original txt2img and img2img scripts are a bit wonky and not all of the samplers work, but as long as you stick to dream.py and use a working sampler (I have had good luck with k_lms), it works great and runs way faster than the CPU version.
Works great on 32gb ram but I'm honestly tempted to sell this one and get a 64gb model once the m2 pros come around. This is capable of eating up all the ram you can throw at it to do multiple pictures simultaneously.
In my setup at least it runs essentially in CPU mode since there is no CUDA acceleration available and metal support is really messy right now. So while quite slow I don't run into memory issues at least. It runs much faster on my desktop GPU but that has more constraints (until I upgrade my personal 1080 to a 3090 one of these days).
Yeah, I followed the instructions on a M1 Macbook Pro (Monterey 12.5.1) and it worked without extra effort. 30-40 seconds per image. I have 32GB but image generation doesn’t even use half of it.
The hard part has been to generate prompts that do what I want.
> Regarding the opening image: if it can't correctly put the marks on dice, how can it put eyes, nose and mouth correctly on a human face?
It helps if you consider it all as effectively advanced compression. Everything the model can do is limited by its architecture, the number of parameters in the model, and the accuracy and size of the training data.
The underlying architecture is a transformer (e.g. GPT3) wired to a (denoising) diffusion model.
Current flaws with this approach:
- Transformers seem to approach a "bag-of-words" model, often ignoring the ordering of the words. Among other things, this means that text-to-image models are very bad at "binding attributes" [0]. This is why "a boy wearing a red shirt and a girl wearing a black jacket" may fail (putting the colors on the wrong items, for instance).
- Autoregressive transformers have no means to correct early mistakes.
- Training data is captioned images and the captions are likely noisy and under-specified. Every time it sees a face labeled as "face" - it tries to generate a face from the distribution of _all_ faces in the data. The same goes for the dice. If the dice are just labeled "dice", but don't have a description of how they landed - the model has to guess which angle you're referring to. As a sibling comment points out, this is exacerbated by the relative frequencies of examples of the data in the dataset.
Well, I kid a bit. I’ve seen it produce some amazing results, but, generally, it has a hard time with that. Often faces end up looking blurry or having these creepy, dead white eyes. Hands likewise often end up malformed (seven fingers anyone?) and twisty. But, it seems to have a much easier time generating passable faces in close ups with the right key words. Especially if you give it an input image that already has a clear one. It also seems to have an easier time doing faces it already knows like a celebrity, presumably because it’s using a strong existing influence instead of inventing/hallucinating it.
Supposedly this is improved in their new 1.5 version which is in beta. The software is so compelling that I suspect this will be improved quite quickly. Also, I think either way workarounds will emerge, either by composing with other networks/software (some UIs have GANs for face correction) or the old fashioned way by photoshopping over the blemishes.
It’s worth noting you also get MUCH better faces and hands if you’re willing to run it for more steps (100-150). It takes a lot longer to run it at higher step counts so a lot of people don’t do it.
Yep, it gets faces mostly right, but as they say, the devil's in the details. Eyes in particular don't seem to have clearly delineated concentric circles for irises and pupils, instead they are often rendered as a "swirl".
Interesting to note how specialised GFPGAN is, as some of the other details (flowers, hair) seem to be worse in the processed image. I plan to finish this image by manually blending the best of both pictures.
How does copyright work with output images?
If someone runs the model on their own hardware, do they "own" the images generated?
If 2 people generate the same image using the same prompt/seed, who "owns" the image?
In particular:
>“Even though you argue that there is some human creative input present in the work that is distinct from RAGHAV’s contribution, this human authorship cannot be distinguished or separated from the final work produced by the computer program,” the office stated.
The US does seem to be a bit of an outlier here. The above work was granted copyright in Canada and India.
That’s one case, not a ruling about all AI generated art. It won’t be the same for every image involving AI in some way. What if you use AI to fill in a portion of an image, as with Adobe’s content aware fill? What if you use a series of SD steps but with a human selecting outputs and feeding them back in as inputs to get something else the AI could not have come up with on its own? The copyright conversation is only just beginning.
>That’s one case, not a ruling about all AI generated art
"Because copyright law as codified in the 1976 Act requires human authorship, the Work cannot be registered."
The actual ruling (and a similar USPTO discussions) are about AI generated art and talk extensively about it in the broad case. The stance of these organizations is that AI generated art is not copyrightable. I don't disagree that the line is blurred when you discuss content aware fill, where the AI is working on a portion of it, but the current use of SD, even img2img and multiple prompts, etc., quite clearly falls outside of human authorship as recognized by the US Copyright and Patent offices.
Might this change in the future? Possibly. But as it stands today, I would not make any plans that assume you can secure the copyright (in the US) to anything made with SD.
Edit: Going through and noting that I'm not a lawyer and this isn't legal advice, don't listen to some random on the internet for legal advice, get a lawyer if you need it.
> quite clearly falls outside of human authorship as recognized by the US Copyright and Patent offices.
I think these are answering a slightly different question, as they are asking if the AI itself can hold the copyright on the output. A bit like if someone tried to copyright an image and assign “Photoshop” as the author.
The question above is maybe closer to asking if the person using an ML model can get copyright on the output, in that case there is a person trying to own the copyright, so I suspect it would not be rejected so easily.
>as they are asking if the AI itself can hold the copyright on the output.
Who is?
The original question I replied to:
>If someone runs the model on their own hardware, do they "own" the images generated?
This seems to be straightforward - Thaler tried to receive the copyright for the artwork generated by his Creativity Machine. He was denied, because the copyright office does not believe that a neural network generated image has human authorship.
From the Copyright Office paper: "he [Thaler] was 'seeking to register this computer-generated work as a work-for-hire to the owner of the Creativity Machine.'"
>A bit like if someone tried to copyright an image and assign “Photoshop” as the author.
This is also clearly outside of the scope of copyrightable work per the reasoning given by the copyright office.
Both questions are thoroughly answered at this moment unless Thaler wins his appeal.
Edit: Going through and noting that I'm not a lawyer and this isn't legal advice, don't listen to some random on the internet for legal advice, get a lawyer if you need it.
> > as they are asking if the AI itself can hold the copyright on the output.
> Who is?
Thaler is. I’ve only read the intro sections of the documents you linked to, so I may have missed something more fundamental later, but the key points seem to be:
> The author of the Work was identified as the “Creativity Machine,” ... the Work “was autonomously created by a computer algorithm running on a machine”
and:
> Thaler must either provide evidence that the Work is the product of human authorship or convince the Office to depart from a century of copyright jurisprudence. He has done neither.
So in this case they are asking if the AI can be the author.
Whereas the question in this thread was:
> If someone runs the model on their own hardware, do they "own" the images generated?
In that case, a human is providing a prompt to the model (providing creative input), and asking if they themselves count as the author (a human rather than a neural net), so it seems like a significantly different case.
Thaler specifically asks for himself to be given the copyright assignment in the filing, claiming that the AI is essentially creating it in a work-for-hire. He does not ask for the Creativity Machine to be assigned the copyright.
>In that case, a human is providing a prompt to the model (providing creative input), and asking if they themselves count as the author (a human rather than a neural net), so it seems like a significantly different case.
I don't know that I specifically agree with this, but this is probably due to me having read additional articles on similar filings, including one where someone took a photograph, applied a style transfer AI to it, and then tried to copyright the resulting image, and was denied, because the copyright office found that there was not evidence that the work was a product of human authorship.
Andres Guadamuz (a lawyer specializing in IP law, senior lecturer at Sussex university, and a proponent of AI generated work being copyrightable) discusses a lot of this in https://www.technollama.co.uk/dall%c2%b7e-goes-commercial-bu... - but the most relevant part to this discussion is "For the most part, the legal consensus appears to be that the images do not have any copyright whatsoever, and that they’re all in the public domain."
The user experience for DALL-E, StableDiffusion, Midjourney, etc. are all essentially the same - craft a prompt, fine-tune it, get artwork out, so his discussion should be broadly applicable to all of these similar tools.
Thanks for the link, interesting reading. I can totally appreciate the angle that some generations may be too trivial to be worthy of protection.
I happen to be in the UK, and this happens to match my expectations, but it does strongly imply more regional variation than I’d have guessed:
> The situation may be different in the UK, where copyright law allows copyright on a computer-generated work, the author of which is the person who made the arrangements necessary for the work to be created. This, in my opinion, is the user, as we come up with the prompt and initiate the creation of the specific work. I think that there may be a good case to be made that I own the images I create in the UK.
This was a Style Transfer AI - it takes a source image and recreates it in the style of a painter.
In this case, the person both took the photo that the style was transferred to, and selected the style and a variety of variables. The US Copyright office still felt that his contribution was not distinguishable from the work that the AI did.
I'll note that this is very US specific - there are a lot of counter-examples of other countries allowing for the copyright of work like this, including the EU, UK, Canada, India, etc.
What if I create a fully automated image site where people can purchase images. All the images are generated by SD based on keywords I scrape from competitor websites. Would be quite easy to create a website like that.
Is it any different than Photoshop content aware fill? Or using a camera?
Nobody would ever think about Adobe or Nikon having copyright claims over your pictures. For me it's just a tool, the artistic part is providing a good description/base image, refining and choosing the best output.
Anyway, I'm not a lawyer and we probably live in different countries, so it'll be interesting to wait for the first lawsuit.
> Nobody would ever think about Adobe or Nikon having copyright claims over your pictures
But it is illegal to share pictures of the Eiffel Tower at night, for example.
People do it, but they shouldn't.
If I put a picture of Eiffel Tower at night in a book or any other kind of commercial product, I have to pay to use it. Doesn't matter that it's there for my eyes to see it.
The question is: are images generated by a hyper-accelerated learning machine, trained on copyrighted material without the authors' consent, legal?
I think they shouldn't be and the data included in the training should be free or licensed.
Yes, what I meant was mixing Stable Diffusion into the UI styling without the fine atomic details. Like, say I want a dashboard with color themes from Cyberpunk, complete with graphics/art/logo. It would be in the "good enough" category if something could literally be told to produce a complete frontend.