
Magnusviri[0], the original author of the SD M1 repo credited in this article, has merged his fork into the Lstein Stable Diffusion fork.

You can now run the Lstein fork[1] with M1 as of a few hours ago.

This adds a ton of functionality: a GUI, upscaling and face restoration, weighted subprompts, etc.

This has been a big undertaking over the last few days, and I highly recommend checking it out. See the Mac M1 README[2].

[0] https://github.com/magnusviri/stable-diffusion

[1] https://github.com/lstein/stable-diffusion

[2] https://github.com/lstein/stable-diffusion/blob/main/README-...




Brilliant, thank you! I just got OP's setup working, but this seems much more user-friendly. Giving it a try now...

EDIT: Got it working, with a few prerequisite steps:

0. `rm -rf` the existing `stable-diffusion` repo directory (assuming you followed OP's original setup)

1. Install `conda`, if you don't already have it:

    brew install --cask miniconda
2. Install the other build requirements referenced in OP's setup:

    brew install cmake protobuf rust
3. Follow the main installation instructions here: https://github.com/lstein/stable-diffusion/blob/main/README-...

Then you should be good to go!
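
For reference, the core of those instructions looked roughly like this for me (treat the README as authoritative, and note you also need to download and link the model checkpoint, which comes up further down the thread):

  git clone https://github.com/lstein/stable-diffusion.git
  cd stable-diffusion
  CONDA_SUBDIR=osx-arm64 conda env create -f environment-mac.yaml
  conda activate ldm
  python scripts/preload_models.py
  python scripts/dream.py --full_precision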

EDIT 2: After playing around with this repo, I've found:

- It offers better UX for interacting with Stable Diffusion, and seems to be a promising project.

- Running txt2img.py from lstein's repo seems to run about 30% faster than OP's. Not sure if that's a coincidence, or if they've included extra optimisations.

- I couldn't get the web UI to work. It kept throwing the "leaked semaphore objects" error someone else reported (even when rendering at 64x64).

- Sometimes it rendered images just as a black canvas, other times it worked. This is apparently a known issue and a fix is being tested.

I've reached the limits of my knowledge on this, but will be following closely as new PRs are merged in over the coming days. Exciting!


I followed all these steps, but I got this error:

> User specified autocast device_type must be 'cuda' or 'cpu'

> Are you sure your system has an adequate NVIDIA GPU?

I found the solution here: https://github.com/lstein/stable-diffusion/issues/293#issuec...


I had to manually install pytorch for the preload_models.py step to work, because ReduceOp wasn't found. Why even use anaconda if all the dependencies aren't included? Every time I touch an ML project, there's always a python dependency issue. How can people use a tool that's impossible to provide a consistent environment for?
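
For reference, "manually install" in my case meant grabbing the PyTorch nightly with MPS support inside the conda env, something like this (the exact index URL may differ by the time you read this):

  pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cpu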


You are completely correct that there are a lot of dependency bugs here; I would just like to pedantically complain that the issue in question is PyTorch supporting MPS, which is basically entirely a C++ dependency issue rather than a Python one. (PyTorch being mostly written in C++ despite having "py" in the name.) And yeah, the state of C++ dependency management is pretty bad.


FYI: black images are not just from the safety checker.

Yes, the safety checker will zero out images, but you can just turn it off with an "if False:". Mostly, though, black images are due to a bug, which is especially frustrating because it turns up on high step counts and means you've wasted time running them.

My experience has been that roughly 2-4 out of a 32-image batch come back black at the default settings, regardless of the prompt.

Just stamp out images in batches and discard the black ones.
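
For the "if False:" trick mentioned above, it's roughly this in the txt2img script (function names vary between forks, so adjust to whatever your copy calls the checker):

  # skip the NSFW check that replaces flagged outputs with black images
  if False:
      x_checked_image, has_nsfw_concept = check_safety(x_samples_ddim)
  else:
      x_checked_image = x_samples_ddim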


I was able to avoid black images by using a different sampler:

--sampler k_euler

full command:

"photography of a cat on the moon" -s 20 -n 3 --sampler k_euler -W 384 -H 384


I tried that as well but resulted in an error:

AttributeError: module 'torch._C' has no attribute '_cuda_resetPeakMemoryStats'

https://gist.github.com/JAStanton/73673d249927588c93ee530d08...


Hi jastanton. I'm on an Intel Mac running into the same problem. Did you find a workaround?


To get past `pip install -r requirements.txt` I had to muck around with CFLAGS/LDFLAGS, because I guess maybe on your system /opt/homebrew/opt/openssl is a symlink to something? On mine it doesn't exist; I just have /opt/homebrew/opt/openssl@1.1 symlinked to /opt/Cellar/somewhere.

The command that finally worked for me:

  python3 -m venv venv
  . venv/bin/activate
  CFLAGS="-I /opt/homebrew/opt/openssl@1.1/include" LDFLAGS="-L /opt/homebrew/opt/openssl@1.1/lib -L/opt/homebrew/Cellar/openssl@1.1/1.1.1q/lib -lssl -lcrypto" PKG_CONFIG_PATH="/usr/local/opt/openssl@1.1/lib/pkgconfig" GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1 pip install -r requirements.txt


Thank you; with those extra steps I got it working myself. At least I think I should say thank you. My work productivity for the next few days might not agree.


Instructions don't work here; it dead-ends at

  FileNotFoundError: [Errno 2] No such file or directory: 'models/ldm/stable-diffusion-v1/model.ckpt'
Looks like there's a step missing or broken when it comes to downloading the actual weights.

Going up to the parent repo points at a bunch of dead links or Hugging Face pages.


You have to download the model from the huggingface[0] site first (requires a free account). The exact steps on how to link the file are then detailed here[1].

[0] https://huggingface.co/CompVis/stable-diffusion-v-1-4-origin... [1] https://github.com/lstein/stable-diffusion/blob/main/README-...
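
In short, it's something like this (assuming the v1.4 checkpoint file name and the default repo layout; check the README if yours differs):

  mkdir -p models/ldm/stable-diffusion-v1
  ln -s /full/path/to/sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt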


I did this but then moved the directory. When re-linking, I checked the path with ls and thought "oh, alright, it's already there", when in fact the old symlink was just dangling. Oh well, better to check with ls -l earlier next time.


Can you describe how you did (/ are doing) this? Do you now need to use conda (as opposed to OP's pip-only version)?


See my edit for more info. (Just ironing out a couple of other issues I've found, so might update it again shortly)


I only get black images.


You have to disable the safety checker after creating the pipe
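
If you mean the Hugging Face diffusers pipeline, a minimal sketch (API details vary by version) is to swap the checker for a no-op right after creating the pipe:

  # return images unmodified instead of blacking out flagged ones
  pipe.safety_checker = lambda images, **kwargs: (images, [False] * len(images))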


Nice. We'll get this guide updated for this fork. Everything's moving so fast it's hard to keep track!

We struggled to get Conda working reliably for people, and it looks like lstein's fork recommends it. I'll see if we can get it working with plain pip.


I really appreciate the use of pip > conda. Looking forward to the update for the repo!


Running lstein's fork with these requirements[0] but seeing this output[1]. Same steps as original guide otherwise.

Anyone got any ideas?

[0] https://github.com/bfirsh/stable-diffusion/blob/392cda328a69...

[1] https://gist.github.com/bfirsh/594c50fd9b2e6b173e31de753a842...


Same output for me also.

EDIT: https://github.com/lstein/stable-diffusion/issues/293#issuec... fixed it for me.


Boom - nice. Here's a fork with that: https://github.com/bfirsh/stable-diffusion/tree/lstein

Requirements are "requirements-mac.txt", which'll need subbing in to the guide.

We're testing this out with a few people in Discord before shipping to the blog post.


Thank you for these guides!


Which Discord?


Check my comment alongside yours: I got Conda to work, but it did require the prerequisite Homebrew packages you originally recommended before it would cooperate :)


I couldn't get the setup process working until I switched the Python version to 3.10, as the scripts were relying on typing features that were added in 3.10 even though the yml file specified 3.9. Was strange.


Conda is recommended because it starts from a clean environment so you're not debugging 13 other experiments the user has going on.


Are there benchmarks?

I was following the GitHub issue: the CPU-bound one was at 4-5 minutes, the MPS one was at 30 seconds, then 18 seconds, and people were still calling that slow.

What is it at now?

And I don't know what "fast" is, to compare against.

What are Windows 10 machines with nice Nvidia GPUs w/ CUDA getting? Just curious what a comprehensive comparison looks like.


> What are Windows 10 machines with nice Nvidia GPUs w/ CUDA getting?

Are you referring to single iteration step times, or whole images? Because obviously it depends on the number of iteration steps used.

Windows 10, RTX 2070 (laptop model), lstein repo. I get about 3.2 iter/sec. A 50 step 512x512 image takes me 15 seconds.


I'm referring to there being a community effort to normalize performance metrics and results at all, with the M1 devices being in that list as well, so that we don't have to ask these questions to begin with.

Are you aware of any wiki or table like that?


Huh, that's the same speed I get on Colab. Pretty good.


I only run 1 sample at a time (batch size 1), forgot to mention that, and that affects the step time.

It looks like each additional image in a batch is cheaper than the 1st image. For example, if I reduce my resolution so I can generate more in a single batch:

1 image, 50 steps, 320x320: 5s

2 images, 50 steps, 320x320: 8s

3 images, 50 steps, 320x320: 11s

4 images, 50 steps, 320x320: 14s

And the trend continues, and my reported iterations/sec goes down as well. That number doesn't account for the fact that with steps=50 and batch size=4 it's actually running 200 steps, just in 4 parallel parts.
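
Rough arithmetic from my numbers above (throwaway snippet, using the timings as given):

  # batch size -> wall-clock seconds for 50 steps at 320x320 (from above)
  timings = {1: 5, 2: 8, 3: 11, 4: 14}
  for n, t in timings.items():
      print(f"batch {n}: {50 * n / t:.1f} image-steps/s overall, {t / n:.2f} s per image")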


Wow, that is over twice as fast as my Windows 11, RTX 3080ti


I just commented on another sibling comment (too late to edit the first one), but I forgot to mention my batch size is only 1. I think most people use batch size 4, so basically multiply my time by your batch size for a real comparison.


It was my bad; my script was still running a different fork. Seeing <10 second times with those parameters now. 13.6 seconds for a 3072 × 2048 upscaled image, which I'm particularly happy about.


Wait, what? On my M1 iMac I'm getting about 25 minutes. What am I doing wrong?


It's falling back to CPU. Follow the instructions to use a GPU version - sometimes it's even a completely different repo, depending on whose instructions you're following.



Around 6 seconds.


I ran into:

ImportError: cannot import name 'TypeAlias' from 'typing' (/opt/homebrew/Caskroom/miniconda/base/envs/ldm/lib/python3.9/typing.py)


I followed the conda instructions, which use Python 3.9, and ran into the same issue. The workaround is to import TypeAlias from typing_extensions:

stable-diffusion/src/k-diffusion/k_diffusion/sampling.py

(before)

  from typing import Optional, Callable, TypeAlias
(after)

  from typing import Optional, Callable
  from typing_extensions import TypeAlias
This issue is tracked in https://github.com/lstein/stable-diffusion/issues/302


You can also just change the Python version in the yml file to 3.10.4 and it'll work.


I ran into this. You need Python 3.10. I had to edit environment-mac.yaml and set python==3.10.6 ...


I changed the dependency to 3.10.4 (tried 3.10.6 as well), installed Python 3.10.4, and deactivated and reactivated the ldm environment, but it still uses Python 3.9.


Can you delete your environment and try again?


Since I don't know how to use conda, I had to struggle a bit to learn how to recreate the environment. Here are the commands that worked for me, for future reference:

  conda deactivate
  conda env remove -n ldm
Then, again:

  CONDA_SUBDIR=osx-arm64 conda env create -f environment-mac.yaml
  conda activate ldm


Thanks, it worked


This worked for me too.


TypeAlias is only used once; you can open sampling.py and remove the import on line 10 and the usage on line 14:

  from typing import Optional, Callable

  from . import utils

  TensorOperator = Callable[[Tensor], Tensor]


What do I need for inpainting? Is there a source for the models/ldm/inpainting_big/last.ckpt file?


I used this:

  wget -O models/ldm/inpainting_big/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=...

Found it here: https://huggingface.co/spaces/multimodalart/latentdiffusion/...

This worked afterwards:

  python scripts/inpaint.py --indir data/inpainting_examples/ --outdir outputs/inpainting_results


Everything works except it only generates black images.

Did you run these?

  python scripts/preload_models.py
  python scripts/dream.py --full_precision


Disable the safety check.


What's the performance of these models, and what PC specs are required for sane operation?


Cool



