Hahahahaha you sweet summer child. Training code? For an art generator?!
Yeah, no. Nobody in the AI community actually provides training code. If you want to train from scratch you'll need to understand what their model architecture is, collect your own dataset, and write your own training loop.
The closest I've come across is code for training an unconditional U-Net; those just take an image and denoise/draw it. CLIP also has its own training code, though everyone just seems to use OpenAI CLIP[0]. You'll need to figure out how to write a Diffusers pipeline that combines CLIP and a U-Net, and then alter the U-Net training code to feed CLIP vectors into the model, etc. Stable Diffusion also puts a Variational Autoencoder in front of the U-Net, so the diffusion runs in a compressed latent space rather than on raw pixels; that's what buys the higher resolution and training performance, and it's the part I've yet to figure out how to train.
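To make that concrete, here is a rough sketch of what that glue looks like with the diffusers and transformers libraries. The pretrained checkpoint names, the toy U-Net config, and training_step itself are my own illustrative assumptions, not anyone's published training code:

    import torch
    import torch.nn.functional as F
    from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
    from transformers import CLIPTextModel, CLIPTokenizer

    # Frozen components: the VAE and CLIP text encoder are trained separately
    # and just loaded here (illustrative checkpoint names).
    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
    noise_scheduler = DDPMScheduler(num_train_timesteps=1000)

    # The one network actually being trained; cross_attention_dim must match
    # the text encoder's hidden size (768 for CLIP ViT-L/14).
    unet = UNet2DConditionModel(sample_size=32, in_channels=4, out_channels=4,
                                cross_attention_dim=768)

    def training_step(pixel_values, captions):
        # 1. The VAE compresses 256x256 images into 32x32x4 latents; 0.18215
        #    is the latent scaling constant Stable Diffusion uses.
        with torch.no_grad():
            latents = vae.encode(pixel_values).latent_dist.sample() * 0.18215
        # 2. Corrupt the latents with noise at a random diffusion timestep.
        noise = torch.randn_like(latents)
        timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                                  (latents.shape[0],), device=latents.device)
        noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
        # 3. CLIP embeddings of the captions are fed into the U-Net's
        #    cross-attention layers via encoder_hidden_states.
        tokens = tokenizer(captions, padding="max_length", truncation=True,
                           max_length=77, return_tensors="pt").input_ids
        with torch.no_grad():
            text_embeds = text_encoder(tokens).last_hidden_state
        # 4. The U-Net predicts the noise; plain MSE is the training loss.
        pred = unet(noisy_latents, timesteps,
                    encoder_hidden_states=text_embeds).sample
        return F.mse_loss(pred, noise)

Note that only the U-Net gets gradients here; the VAE and text encoder have to come from somewhere already trained, which is exactly the problem.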
The blob you are looking at is the actual model weights. For you see, AI is proprietary software's final form. Software so proprietary that not even the creators are allowed to see the source code. Because there is no source code. Just piles and piles of linear algebra, nonlinear activation functions, and calculus.
For the record, I am trying to train an image generator from scratch using public domain data sources[1]. It is not going well: after adding more images it seems to have gotten significantly dumber, with or without a from-scratch-trained CLIP.
[0] Though I think Google Imagen actually uses a T5 text encoder rather than CLIP.
[1] Specifically, the PD-Art-old-100 category on Wikimedia Commons; the sketch below shows how to enumerate it.
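Since footnote [1] names the Commons category, here is a minimal sketch of listing its files through the public MediaWiki API; requests and the 500-per-page limit are the only assumptions beyond what the footnote says:

    import requests

    API = "https://commons.wikimedia.org/w/api.php"
    params = {
        "action": "query", "list": "categorymembers",
        "cmtitle": "Category:PD-Art-old-100",  # category title per footnote [1]
        "cmtype": "file", "cmlimit": "500", "format": "json",
    }
    titles = []
    while True:
        resp = requests.get(API, params=params, timeout=30).json()
        titles += [m["title"] for m in resp["query"]["categorymembers"]]
        if "continue" not in resp:  # pagination token absent on the last page
            break
        params.update(resp["continue"])
    print(len(titles), "files in category")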
The SD training set is available, and the exact settings are described in reasonable detail:
> The model is trained from scratch 550k steps at resolution 256x256 on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier with punsafe=0.1 and an aesthetic score >= 4.5. Then it is further trained for 850k steps at resolution 512x512 on the same dataset on images with resolution >= 512x512.
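Those thresholds are mechanical enough to sketch. Assuming LAION-style metadata shards with punsafe and aesthetic columns (the actual column names in the LAION parquet releases may be spelled differently), the first-stage selection is just:

    import pandas as pd

    # Illustrative shard path and column names, not the real LAION schema.
    meta = pd.read_parquet("laion5b-metadata-shard-0000.parquet")
    # Keep rows the NSFW classifier scores below punsafe=0.1 and the
    # aesthetic model scores at 4.5 or above, per the quoted settings.
    kept = meta[(meta["punsafe"] < 0.1) & (meta["aesthetic"] >= 4.5)]
    kept.to_parquet("laion5b-filtered-shard-0000.parquet")

The 512x512 stage described above adds an image-resolution >= 512x512 condition on the same data.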
To my eye, kmeisthax's comment appears to be entirely accurate.
Well, that is to say: assuming the facts listed are accurate, I agree with the conclusion that it's not "open source" at all. (And certainly not Libre.)
The things you said do not describe an open source project.
The point here is that the title of this thing is incorrect. If the ML community doesn't agree, it's because they are (apparently) walking around with incorrect definitions of "open source" and "Free Software" and "Libre Software".