EDIT 2: After playing around with this repo, I've found:
- It offers better UX for interacting with Stable Diffusion, and seems to be a promising project.
- Running txt2img.py from lstein's repo seems to be about 30% faster than OP's. Not sure if that's a coincidence, or if they've included extra optimisations.
- I couldn't get the web UI to work. It kept throwing the "leaked semaphore objects" error someone else reported (even when rendering at 64x64).
- Sometimes it rendered images just as a black canvas, other times it worked. This is apparently a known issue and a fix is being tested.
I've reached the limits of my knowledge on this, but will be following closely as new PRs are merged in over the coming days. Exciting!
I had to manually install pytorch for the preload_models.py step to work, because ReduceOp wasn't found. Why even use anaconda if all the dependencies aren't included? Every time I touch an ML project, there's always a python dependency issue. How can people use a tool that's impossible to provide a consistent environment for?
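For anyone hitting the same ReduceOp error, the manual install amounts to reinstalling PyTorch inside the conda env by hand. The exact wheel changes over time (pytorch.org has the authoritative command), but at the time it was roughly:

    conda activate ldm
    # macOS nightly wheel (includes MPS support); check pytorch.org for the current command
    pip install --pre torch torchvision torchaudio \
        --extra-index-url https://download.pytorch.org/whl/nightly/cpu
    # quick sanity check
    python -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"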
You are completely correct that there are a lot of dependency bugs here, I would just like to pedantically complain that the issue in question is PyTorch supporting MPS, which is basically entirely a C++ dependency issue rather than a Python one. (PyTorch being mostly written in C++ despite having "py" in the name.) And yeah the state of C++ dependency management is pretty bad.
FYI: black images are not just from the safety checker.
Yes, the safety checker will zero out images, but you can just turn it off with an “if False:”. Mostly, though, black images are due to a bug, which is especially frustrating because it turns up at high step counts and means you’ve wasted time running it.
In my experience, roughly 2-4 out of every 32 images in a batch come back black at the default settings, regardless of the prompt.
Just stamp out images in batches and discard the black ones.
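If you don't want to eyeball them, a mean-pixel check can do the discarding for you. A rough sketch, assuming ImageMagick is installed and your images land in an outputs/ directory (adjust the path and threshold to taste):

    # drop any PNG whose mean pixel value is essentially zero, i.e. all black
    # (use `identify` instead of `magick identify` on older ImageMagick)
    for f in outputs/*.png; do
      mean=$(magick identify -format "%[fx:mean]" "$f")
      awk "BEGIN { exit !($mean < 0.01) }" && { echo "discarding $f"; rm "$f"; }
    done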
To get past `pip install -r requirements` I had to muck around with CFLAGS/LDFLAGS because I guess maybe on your system /opt/homebrew/opt/openssl is a symlink to something? On mine it doesn't exist, I just have /opt/homebrew/opt/openssl@1.1 symlinked to /opt/Cellar/somewhere.
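For reference, the flag mucking amounts to something like this (assuming it's the OpenSSL headers/libs the build is choking on; asking `brew --prefix` avoids guessing the path):

    export LDFLAGS="-L$(brew --prefix openssl@1.1)/lib"
    export CPPFLAGS="-I$(brew --prefix openssl@1.1)/include"   # CFLAGS also works for some builds
    pip install -r requirements.txt   # or whichever requirements file the repo ships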
You have to download the model from the huggingface[0] site first (requires a free account). The exact steps on how to link the file are then detailed here[1].
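In short, the linking step just puts the downloaded checkpoint where the repo expects it. For the CompVis-style layout that's roughly the following (the filenames and paths here are illustrative; [1] has the exact ones):

    # from the repo root; point the symlink at wherever you saved the .ckpt
    mkdir -p models/ldm/stable-diffusion-v1
    ln -s /path/to/sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt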
I did this but then moved the directory. When re-linking and checking with ls for the path I thought "oh, alright, it's already there". Oh well, better check with ls -l earlier next time.
Check my comment alongside yours, I got Conda to work but it did require the pre-requisite Homebrew packages you originally recommended before it would cooperate :)
I couldn't get the setup process working until I switched the python distro to 3.10, as the scripts were relying on typing features that were added in 3.10 even though the yml file specified 3.9. Was strange.
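My guess (unverified) is the 3.10-only `X | Y` union syntax in annotations, which blows up at function-definition time on 3.9. Easy to reproduce:

    # fine on 3.10, but on 3.9 this raises "TypeError: unsupported operand type(s) for |"
    python3.9 -c 'def f(x: int | None = None): pass'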
I was following the GitHub issue: the CPU-bound one was at 4-5 minutes, the MPS one was at 30 seconds, then 18 seconds, and people were still calling that slow.
What is it at now?
And I don't know what "fast" is, to compare against.
What are the Windows 10 machines with nice Nvidia cards w/ CUDA getting? Just curious what a comprehensive comparison looks like.
I’m referring to there being a community effort to normalize performance metrics and results at all, with the M1 devices being in that list as well, so that we don't have to ask these questions to begin with.
I only run 1 sample at a time (batch size 1), forgot to mention that, and that affects the step time.
It looks like each additional image in a batch is cheaper than the 1st image. For example, if I reduce my resolution so I can generate more in a single batch:
1 image, 50 steps, 320x320: 5s
2 images, 50 steps, 320x320: 8s
3 images, 50 steps, 320x320: 11s
4 images, 50 steps, 320x320: 14s
And the trend continues (roughly 5s for the first image plus ~3s for each additional one), and my reported iterations/sec go down as well. That number isn't accounting for the fact that with steps=50 and batch size=4 it's actually running 200 steps, just in 4 parallel parts.
I just commented on another sibling comment (too late to edit the first one), but I forgot to mention my batch size is only 1. I think most people use batch size 4, so basically multiply my time by your batch size for a real comparison.
It was my bad, my script was still running a different fork. Seeing <10 second times with those parameters now. 13.6 seconds for a 3072 × 2048 upscaled image, which I'm particularly happy about.
It's falling back to CPU. Follow the instructions to use a GPU version - sometimes it's even a completely different repo, depending on whose instructions you're following.
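A quick way to check which device you're actually getting (needs PyTorch 1.12+; if this prints cpu, the build you have doesn't support MPS and the scripts will fall back):

    python -c "import torch; print('mps' if torch.backends.mps.is_available() else 'cpu')"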
I changed the dependency to 3.10.4 (tried 3.10.6 as well), installed Python 3.10.4, deactivated and reactivated the ldm environment, but it still uses Python 3.9.
Since I don't know how to use conda, I had to struggle a bit to learn how to recreate the environment.
Here are the commands that worked for me, for future reference (the env name is ldm; swap in whatever environment file your fork ships):
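    conda deactivate
    conda env remove -n ldm
    # environment-mac.yaml is an assumption; use the env file from your fork,
    # with the python pin edited to 3.10 first
    conda env create -f environment-mac.yaml
    conda activate ldm
    python --version   # should now report 3.10.x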
You can now run the Lstein fork[1] on an M1 as of a few hours ago.
This adds a ton of functionality: GUI, upscaling & facial improvements, weighted subprompts, etc.
This has been a big undertaking over the last few days, and I highly recommend checking it out. See the Mac M1 README [2].
[0] https://github.com/magnusviri/stable-diffusion
[1] https://github.com/lstein/stable-diffusion
[2] https://github.com/lstein/stable-diffusion/blob/main/README-...