Run Stable Diffusion on Intel CPUs (github.com/bes-dev)
236 points by amrrs on Aug 29, 2022 | hide | past | favorite | 106 comments


For those who want to know before installing: on my 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, it consumes 13 GB of RAM and takes 4 minutes to generate an image with 32 steps (~2 min for 16 steps).


On my Intel i5-4590T (8G RAM) it takes around 5-6 minutes to generate with 32 steps, swapping to disk as it does consume around 13G memory total. You don't get real-time feedback but it's very usable and fun to play with. I wish there was an option to force a manual seed though.


FYI they just added the seed option to the repo today. Now waiting for img2img...
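For anyone wondering what a seed option buys you: it typically just seeds the RNG that draws the initial latent noise, so the same seed reproduces the same image for a given prompt. A minimal numpy sketch (the function name and latent shape here are illustrative, not the repo's actual code):

```python
import numpy as np

def init_latent(seed, shape=(1, 4, 64, 64)):
    """Same seed -> same starting noise -> same image for a given prompt."""
    rng = np.random.RandomState(seed)
    return rng.standard_normal(shape).astype(np.float32)

# Two runs with the same seed produce identical latents
print(np.array_equal(init_latent(42), init_latent(42)))  # True
```
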


On Linux/i5-1135G7 - takes 3min very consistently for 32 steps. Memory use: ~13.5gb VIRT, 9.3gb RES.


Working in WSL (Windows 10) Ubuntu on Ryzen 5600X; uses ~11GB of RAM and takes 2m04s with the default settings.

This is the first time I've played with a text-to-image model. I was aware that so-called "prompt engineering" can be tricky, but it's wild to see it for myself. A single space character can be the difference between garbage (or nightmare-fuel) output and output that captures the spirit of the prompt pretty well.


> A single space character can be the difference between garbage (or nightmare-fuel) output and output that captures the spirit of the prompt pretty well.

It shouldn't, really. Have you tried generating a few images with each prompt, both with and without the space?

Even with the same prompt, you can get a wide variety of quality.


> Ryzen 5600X

Ooh, I've got one of those! I've been getting by trying to run it on my PC, which for various reasons currently has a 5800 and a GTX 1050 4GB, which can just barely handle optimizedSD at 90s/image but runs out of memory if I try to use the popular webui repo. Swapping to the 5600X might be worth it!


That's very surprising and shouldn't be the case in general (the exception being things like compound words or spelling errors maybe).

Do you have some examples?

Are you fixing the random seed? If not the variation is more likely to be that than a single space.


What OS does this need? Using Ubuntu 20.04, I'm getting stuck on openvino:

    $ pip install -r requirements.txt
    Could not find a version that satisfies the requirement openvino==2022.1.0 (from -r requirements.txt (line 6)) (from versions: 2021.4.0, 2021.4.1, 2021.4.2)

I even upgraded to python3.9, which, inexplicably, is required but not available in the "supported" OS.

EDIT: apparently it requires a version of pip that's newer than the one bundled with Ubuntu.


For anyone else who runs into this issue, run this: pip install --upgrade pip

Then run this again: pip install -r requirements.txt


also, make sure you're using python3.9 or lower

https://stackoverflow.com/a/70501550/21539
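A quick pre-flight check along those lines; the version range is an assumption based on this thread and the linked answer (openvino==2022.1.0 seems to ship wheels only for CPython 3.7-3.9):

```python
import sys

def openvino_supported(version_info):
    """openvino==2022.1.0 appears to ship wheels only for CPython 3.7-3.9
    (an assumption based on this thread and the linked answer)."""
    return (3, 7) <= tuple(version_info[:2]) <= (3, 9)

print(openvino_supported(sys.version_info))
```
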


I've set up a Discord Bot that turns your text prompt into images using Stable Diffusion.

You can invite the bot to your server via https://discord.com/api/oauth2/authorize?client_id=101337304...

Talk to it using the /draw Slash Command.

It's very much a quick weekend hack, so no guarantees whatsoever. Not sure how long I can afford the AWS g4dn instance, so get it while it's hot.

PS: Does anyone know where to host reliable NVIDIA-equipped VMs at a reasonable price?


Oh and get your prompt ideas from https://lexica.art if you want good results.


So - how far away are we from having it on M1/M2 Macs, at least with regular processing? openvino may be one path, I suppose: https://github.com/openvinotoolkit/openvino/issues/11554


I found this repo early on and have been using it to run inference on my M1 Pro MBP. https://github.com/ModeratePrawn/stable-diffusion-cpu

For me it runs at about 3.5 seconds per iteration per picture at 512x512.

There is also a fork that uses metal here and is much faster: https://github.com/magnusviri/stable-diffusion/tree/apple-si... but it doesn't support seeding the rng and will occasionally produce completely black output. Useful if you want to spit out a whole bunch of images for one prompt but you lose the ability to re-run a specific seed with a tweaked prompt or increased iterations.


> For me it runs at about 3.5 seconds per iteration per picture at 512x512.

Wow that's impressively fast, I have a relatively recent Nvidia GPU that still takes 10 seconds. And the GPU is already almost as big as the entire macbook


I think that's per iteration, so the total time for the image is 32 times that


Oh yeah, I may have used confusing terms there. What I meant was 3.5s per 'step'. A full image takes quite a bit longer.


Ah my fault for not reading carefully. But I don't feel as bad about my big GPU anymore now


I'm using the fork here: https://github.com/magnusviri/stable-diffusion.git (apple-silicon-mps-support branch).

Pretty easy to set up, though I had to take all the Homebrew stuff out of my environment before setting up the Conda environment (can also just export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1, at least in my case).

Otherwise, I followed the normal steps to set things up, and I'm now here generating 1 image every 30 seconds at default settings. This is on a M1 Max MacBook Pro at 64GB RAM.



I've been using this on my M1 Max and it works pretty well, 1.65 iterations per second (full precision, whereas my PC's 3080 can only do half-precision due to limited memory)... a 50-iteration image in about 40 seconds or so.


Your 3080 should be able to do full precision. Are you sure you don’t have the batch size set greater than 1, or another issue along those lines?


Thank you and smoldesu for letting me know it should work, I'll have a better look into what's going on - it didn't immediately work on Windows in full precision (probably a batch size issue as you suggested) and I gave up...

I shouldn't have given up so easily, but my tolerance for annoyances on Windows is pretty low (that Windows machine is kept for gaming, the last time I used a Windows machine for anything but launching Steam was when Windows 2000 was the hot new thing...)


> full precision, whereas my PC's 3080 can only do half-precision due to limited memory

What model are you using? I've been running full-precision SD1.4 on my 3070, albeit with less than 10% VRAM headroom.


This worked fine for me, and running side by side with an Intel CPU + nVidia 2070 it actually does not take much longer (and, as a sibling said, it seems to be working at full precision). It is one of the first things I've done that has properly made my M1 Max's fan spin up hard, though!


PyTorch for m1 (https://pytorch.org/blog/introducing-accelerated-pytorch-tra... ) will not work: https://github.com/CompVis/stable-diffusion/issues/25 says "StableDiffusion is CPU-only on M1 Macs because not all the pytorch ops are implemented for Metal. Generating one image with 50 steps takes 4-5 minutes."


Yeah, you can. Using the mps backend, just set PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU for unimplemented ops. It takes a minute, but it's mostly GPU-accelerated.
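For reference, the flag has to be in the environment before torch is imported. A minimal sketch (the torch lines are typical usage and left commented, since they need torch installed on an M1 Mac):

```python
import os

# The flag must be set before torch is imported; ops without an MPS (Metal)
# kernel are then routed to the CPU instead of raising an error.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

# Typical usage from here:
# import torch
# device = "mps" if torch.backends.mps.is_available() else "cpu"
print(os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])  # 1
```
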


By comparison I can generate 512x512 images every 15 seconds on an RTX 3080 (although there's an initial 30 second setup penalty for each run)


those guys are also working on it atm :-) https://github.com/lstein/stable-diffusion/pull/179


I got it working in about an hour on M1 ultra, mostly compiling things and having to tweak some model code to be compatible with metal. It works pretty well, about 1/10 to 1/20 of performance I can get on a 3080.


Great work! Is there a similar project for (local) text generation (NLP) on a CPU + lots of RAM? I mean something transformers-based and of similar quality to GPT-3 (i.e. better than GPT-2). I understand that each prompt would take almost forever to complete, but I'm still curious whether something like that exists.


Yes. Fabrice Bellard wrote a highly optimised library (libnc) [1] for training and inference of neural networks on CPU (x86 with AVX-2), and implemented GPT-2 inference (gpt2tc) with it [2]. Later he added a CUDA backend to libnc. You can try it out at his website TextSynth [3] and I see it now runs various newer GPT-based models too, but it seems he hasn't released the code for that. Doesn't surprise me as he didn't release the code for libnc either, just the parts of gpt2tc excluding libnc (libnc is released as a free binary) so someone could reimplement GPT-J and the other models themselves.

Incidentally, he's currently leading the Large Text Compression Benchmark with a neural-network-based compressor called nncp [4], which builds on this work. It learns a transformer-based model as it goes, and the earlier versions didn't use a GPU.

[1] https://bellard.org/libnc/

[2] https://bellard.org/libnc/gpt2tc.html

[3] https://textsynth.com/

[4] http://www.mattmahoney.net/dc/text.html#1085


Yet another gem by the genius Fabrice B!

I kinda understand why he would not release the source code. Perhaps he's finally decided to monetize some of his coding skills. Maybe in the future he'll start releasing some of those newer and bigger models to the public, given that other big corps like FB have already started doing so (GPT-NeoX and OPT - as mentioned in the sibling comment by infinityio).


Yes, TextSynth.com is a commercial service, see pricing [1]. If his code is faster than others' (I'd certainly believe it) then it's quite valuable, and he deserves to be able to monetise it. Edit: Also, OpenAI is slashing price for GPT-3 by 2-3x tomorrow because of "progress in making our models more efficient to run" [2].

Also, he was/is competing for the Hutter Prize with nncp; however, he falls outside the requirements for the prize: CPU time, RAM, but most especially that submissions shouldn't require a modern CPU (with AVX-2) or a GPU. Otherwise he could have won it. I suspect that's actually the biggest reason he implemented libnc without GPU support initially. He has asked for the rules to be changed to allow AVX2, and I believe they eventually will be. So he won't give away the source for nncp yet, but he will have to open source it to receive the prize.

[1] https://textsynth.com/pricing.html

[2] https://help.openai.com/en/articles/6485334-openai-api-prici...


I've had success with GPT-J (6B) [0] and GPT-NeoX (20B) [1], but they probably aren't quite the quality level you'll want to have

On the other hand, Facebook has recently released the weights for a few sizes of their OPT model [2]. I haven't tried it, but that might be worth looking into, because they claim that their model is comparable to Davinci

Note that for CPU inference you will need to avoid float16 datatypes, otherwise it might error out.

[0] https://huggingface.co/EleutherAI/gpt-j-6B [1] https://huggingface.co/EleutherAI/gpt-neox-20b [2] https://huggingface.co/facebook/opt-66b
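The float16 limitation is easy to see with plain numpy: half precision tops out at ~65504, so activations overflow easily, which is one reason CPU inference sticks to float32 (the threshold demo below is just an illustration):

```python
import numpy as np

# float16's largest finite value is ~65504, so activations overflow easily
print(np.finfo(np.float16).max)        # 65504.0
print(np.isinf(np.float16(70000.0)))   # True: overflowed to inf
print(np.isinf(np.float32(70000.0)))   # False: fits comfortably in float32
```
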


What's the status of running SD on AMD GPUs?


https://rentry.org/tqizb explains how to install ROCm and then pytorch for ROCm

ROCm does not support APUs; here is the list of supported GPUs: https://docs.amd.com/bundle/Hardware_and_Software_Reference_...


Tried it, could not get it to run on my RX570. Read conflicting information whether that series is still supported or not.

rocm-smi detects the card and shows me the temps etc., but rocminfo throws an error.

I found this repo which should apparently fix it, didn't change my situation though: https://github.com/xuhuisheng/rocm-gfx803/


what does APU mean here?


CPU + GPU on the same die/chiplet. An integrated GPU, in AMD marketing speak, is an APU.


AMD's integrated GPU together with the processor.


Where can I get up to speed on what's coming down the pipeline in this AI/ML image-making scene?

(And learn the agreed upon terms)


It’s mostly all on discord. These are the two most active ones with devs on board https://discord.gg/QNxzjUfu https://discord.gg/nZ3hkXRV


Twitter too


any profiles I should follow? just need a couple so that the algorithm suggests the other ones and the rabbit hole forms itself


I suggest starting with the companies and key public employees.

https://twitter.com/EMostaque https://twitter.com/RiversHaveWings

https://twitter.com/laion_ai https://twitter.com/StableDiffusion https://twitter.com/openart_ai

Hashtags:

#stablediffusion #dalle #aiart #generativeart

For a taste of what's coming further down the pike, I recommend senior ML researchers.

https://twitter.com/boredbengio https://twitter.com/jure


The best way I think is to try and run these models yourself. Depending on technical ability, you may want to run them on your own hardware, or use a service like dreamstudio.ai (which is run by the team behind Stable Diffusion afaik)


No one can tell.

Pandora's box has been opened.

Nothing is true, everything is permitted.


For a very high level view on what new technologies are coming, you can check out the Two Minute Papers youtube channel: https://www.youtube.com/c/K%C3%A1rolyZsolnai


I am on a team working on alternative YouTube recommendations and I discovered Yannic Kilcher and Machine Learning Street Talk off the list of channel recommendations for Two Minute Papers. The whole list has a ton of AI channels:

https://channelgalaxy.com/id%3DUCbfYPyITQ-7l4upoX8nvctg/


7' 12" on an ancient Intel Core i5-3350P CPU @ 3.10GHz (!) using BERT BasicTokenizer, default arguments


On reddit I found that some older GPUs take about 5 minutes, and this video [1] says 5 minutes for CPU using this OpenVINO library. Not sure if OpenVINO makes CPU chips compete with GPUs. Has anyone heard of OpenVINO before?

[1] https://youtu.be/5iXhhf7ILME


OpenVINO is developed by Intel themselves, and is one of many methods to freeze models to make CPU inference possible and performant.

https://en.wikipedia.org/wiki/OpenVINO



I'm curious about what makes this project special, since there are a lot of similar implementations of diffusion models based on pytorch/tf. Is it because it uses the CPU itself to run the diffusion process?


Yeah. For something like this, you ideally would want a powerful GPU with 12-24gb VRAM. If you have something like an RTX 2070 at the bare minimum, you probably don't need this and could do a lot more steps a lot faster on a GPU, but it's great for those who don't have that option.


A $500 RTX 3070 with 8GB of VRAM can generate 512x512 images with 50 steps in 7 seconds.


The RTX 2070 also shipped with 8GB of VRAM, just fyi.


Yep, 8GB works fine. The 2070 is where I started. I wouldn't consider it ideal, though. There will be cases where you'll wish you could increase the resolution a little more, or could do just a few more per batch, but you're getting CUDA out-of-memory errors


I didn't see any requirements on the page beyond a CPU on that list. Do you need a certain amount of RAM? Will more speed things up to a degree?


It used ~8GB of ram on my machine with similar generation time to the low vram fork of stable diffusion [1] running on my 4GB GTX1650.

[1] https://github.com/basujindal/stable-diffusion


Love this. OpenAI are livid. :^)


Why?


Because they no longer control the narrative.


It does ruin the illusion that you need crazy million-dollar servers to run the model and that the world would fall into chaos if the public got their hands on these models.


To be fair, the world only just got its hands on this (and by "world" I mean people with decent hardware), so it's too soon to say what the ramifications will be.


Also to be fair, the job "AI ethicist" probably didn't exist as a real thing until a few years ago. So the people in those roles over at OpenAI likely have no idea what they're doing.


We wouldn't be able to run it ourselves if they hadn't trained it on 4000 GPUs for a month.


The cost of training is actually quite a bit less. Emad, the creator of SD, stated this on Twitter:

"We actually used 256 A100s for this per the model card, 150k hours in total so at market price $600k"


Even if it was hard to train, you could make your own by fine-tuning a larger model for much cheaper.

Those larger models are called "base models" (or "foundation models", if you're Stanford trying to co-opt the term).


Suppose one has an idea for a different architecture / functional form etc. Assuming the receiving model is substantially smaller, so that the dominant computational cost is in the SD model, how long would effective knowledge distillation take on, say, a CPU?


That's called teacher-student learning. It could easily still take weeks on a single machine, but renting more GPU time or getting free credits from somewhere is perfectly plausible.
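The core of teacher-student distillation is just a loss that pulls the student's softened output distribution toward the teacher's. A toy numpy sketch (temperature, logits, and names are all illustrative):

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=np.float64) / T   # T > 1 softens the distribution
    z = z - z.max()                           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student's softened distribution vs the teacher's."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(-np.sum(p_t * np.log(p_s + 1e-12)))

# The loss is smallest when the student matches the teacher exactly
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
print(matched < mismatched)  # True
```
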


Christ, so what happens when google throws a cheeky 10 million at a model?


The most powerful device I have is an ipad pro M1 16gb ram. Can I run this on that thing at all?


It is noticeably faster than the original model (~30-40%) on my machine.


openvino is an unsung hero.


Can't get it to install requirements on Windows with Python 3.10 and MS Build Tools 2022. Any tips?


I found a pretty good Docker container for it, though that's only really switching you from solving Python problems to Docker ones. Worth trying out if you have a Linux box or WSL installed though: https://github.com/AbdBarho/stable-diffusion-webui-docker


python 3.10 will fail on openvino. I used these steps, from an Anaconda prompt cd'd to the destination folder:

    conda create --name py38 python=3.8
    conda activate py38
    conda update --all
    conda install openvino-ie4py -c intel
    pip install -r requirements.txt

I also had to edit stable_diffusion.py: in the #decoder area, change vae.xml and vae.bin to vae_decoder.xml and vae_decoder.bin respectively.

From there I could run:

    python stable_diffusion.py --prompt "Street-art painting of Emma Stone dancing, in picasso style"

For img2img, use this (note: a DIFFERENT program):

    python demo.py --prompt "astronaut with jetpack floating in space with earth below" --init-image ./data/jomar.jpg --strength 0.5


It needs python 3.9.


Anything for M1 GPU?


It works just fine.



Why do you keep posting the same comment under every SD post? It doesn't contribute to the discussion, and it's not very relevant to the OP. Some of the links don't even work anymore.


Or if you don’t want to tweak anything, just use the hosted version? https://beta.dreamstudio.ai/


It's pretty expensive. They give you 100 free credits, but I burned through that in about 10 minutes just trying to figure out how things worked. Didn't get any nice images.

After that, it's $1 per 100 credits, so about $6 an hour maybe.


One credit is one regular-sized picture generation. If you want to make it bigger or run more steps (in my experience, this is less useful than it would seem), then it can run up to 3 or more credits per generation. That's still 3 cents a picture. Compared to Dall-E and even Midjourney, that's crazy cheap.
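The arithmetic, for anyone pricing it out (credits per image are as described above, not official figures):

```python
price_per_credit = 1.0 / 100           # $1 buys 100 credits
cost_per_image = 3 * price_per_credit  # a bigger/high-step image can run ~3 credits
print(round(cost_per_image, 2))        # 0.03 -> "3 cents a picture"
```
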


I put $10 in and it’s lasted a long time, but I don’t generate images every day. Cheaper than getting a good graphics card, anyway.


Because I’ve got a 3080 already and don’t want to spend money?


Yep. Not everyone does, though, and the paid service is quite convenient.


This link appears to be dead:

> can also run it in colab (includes img2img): https://colab.research.google.com/drive/1NfgqublyT_MWtR5Csmr



https://laion.ai/faq/ Based on the FAQ of the dataset that was used for training of https://huggingface.co/spaces/stabilityai/stable-diffusion

   LAION datasets are simply indexes to the internet, i.e. lists of URLs to the original images together with the ALT texts found linked to those images. While we downloaded and calculated CLIP embeddings of the pictures to compute similarity scores between pictures and texts, we subsequently discarded all the photos. Any researcher using the datasets must reconstruct the images data by downloading the subset they are interested in. For this purpose, we suggest the img2dataset tool.
I love the "*simply*", but doesn't it mean that (depending on country, laws etc., but generally):

1. The LAION group committed possible copyright infringements and even left undeniable evidence that they did - on top of their written testimony (dumping the "stolen goods into the river" does not make the infringement undone, does it?)

2. Any model trained on the "linked" data may commit copyright infringement.

3. As consequence, you using generated images may be liable.

I always wonder how this can possibly be legal at all - considering that, as a human artist, if I were to copy and remix material without proper permission I would be liable (again, depending on the situation) - yet suddenly ML is around the corner and it's all great, and now you can keep remixing the potentially problematic output further, no questions asked!?

I guess there are no precedents yet, but why should an automaton/software (and its creators) be judged differently from persons? I don't want to spoil the fun, but what am I missing?

Also disappointed that this dataset did not make sure to only collect unproblematic content, like Creative Commons works that allow remixing. It would be a hell of an attribution list, but definitely better than what is presented here.

EDIT: Formatting

EDIT2: I actually followed one of the projects mentioned not the linked repository. Clarified above.


If these AIs were actually just "remixing" and creating collages, then perhaps I would agree with you... but there is no exact pixel data stored here. This is fairly obvious when you consider that Stable Diffusion was trained on 100 terabytes of images yet the actual model file is 4gb.

Now I'm not saying that nothing created by these AIs should be considered copyright infringement. As a human artist, you are not judged on your process; you are judged on the end results. The same should be done for the works created by these AIs.


Bad cases make bad law - if you argue too hard in the direction of "any copyrighted material in the AI's training set makes it copyrighted" this could lead to, say, "Disney owns any animated movie made by someone who watched a Disney movie".

You can make an AI that doesn't memorize a specific training input; similarly you could probably make one that intentionally memorizes them. Both of these seem useful.


It's not simply a given that using copyright material to train a model is copyright violation.

In my view it isn't. No one image contributes a significant amount, and the process the machine performs is analogous to what a human does when learning.


It is likely legal, but is it ethical? If it is not ethical, should it be legal?

We do tend to treat humans differently based on them being sentient beings with a limited lifespan, not machines.


I strongly (actually very strongly) feel that it is ethical.

I also feel that the act of producing plagiarized content is unethical, immoral and I'd be supportive of new concepts in intellectual property law that make it illegal too.


I'm all for having these models scrutinized for copyright violations (and possibly amending copyright laws), but this comment is nothing but low-effort FUD.


Is feeding copyrighted material into an AI really copyright infringement?


If it is, then every human brain is guilty of copyright infringement.


I don't know, but it's quite entertaining when the output occasionally has a corrupted, but recognizable, Getty Images watermark: https://imgur.com/SmibVME

(Prompt: "A horse delivering mail in New York City, 1870")


Those are the scariest horses I have ever seen.


Is training the model infringement or is distributing the model infringement?

What if you trained the model and only distributed generated images?

Is a human making art "in the style of" also infringement?


Legally, it is uncharted territory on many levels. I think there are good arguments to be made that these systems violate the intent behind copyright and trademarks, but not necessarily the laws.



