One thing I noticed is that on GCP, if you create an a2-ultragpu instance (NVIDIA A100 80 GB) and select a spot instance, the price estimate drops to $0.33/hour (~$240/month), which sounds really good if it's not a mistake. I was wondering if you could then turn a single A100 into seven GPUs using Multi-Instance GPU (MIG). On an 80 GB card you get seven 10 GB GPU instances (you can't have eight due to yield issues on those dies). I'm pretty sure each slice will run much slower than the full card, but not 7x slower, so if you're running a larger service at scale this could be a way to parallelize things. If someone is able to get that running, please let me know how it performs.
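If anyone wants to try it, MIG partitioning is done through nvidia-smi. A sketch for an A100 80 GB — I believe 19 is the 1g.10gb profile ID on that card, but verify with `-lgip` before trusting it:

```shell
# Enable MIG mode on GPU 0 (requires root; may need a GPU reset to take effect)
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles actually available (IDs vary by card)
sudo nvidia-smi mig -lgip

# Create seven 1g.10gb GPU instances plus their compute instances
# (19 assumed to be the 1g.10gb profile ID -- double-check against -lgip output)
sudo nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C

# Each slice now shows up with its own UUID you can pin a worker to
nvidia-smi -L
```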
The next thing I considered was just buying a bunch of RTX 3060 12 GB cards (I saw a few new ones for $330) and hosting a server from my house. This might be a good option if you don't care about latency but do care about throughput.
RTX 3090s are also decent in price per Stable Diffusion iteration. If you want to build a fast service like DreamStudio, I think they're the only way to do it at a reasonable price. If you want to host consumer RTX cards in the cloud, you'll have to go with less reputable hosts, since Nvidia's driver license doesn't allow consumer cards in data centers. I don't want to name any since I can't vouch for them, but there are some if you search. The cheapest option is to buy the cards and host them yourself.
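What actually matters for comparing these options is dollars per image, not dollars per hour. A quick back-of-the-envelope — the throughput numbers here are placeholders I made up, not benchmarks, so plug in your own measurements:

```python
# Rough $/image comparison. The images/hr figures are hypothetical
# placeholders -- benchmark your own hardware before trusting this.

def cost_per_image(dollars_per_hour, images_per_hour):
    return dollars_per_hour / images_per_hour

def amortized_hourly(purchase_price, years=2):
    # Amortize an owned card over e.g. 2 years of 24/7 use, ignoring power.
    return purchase_price / (years * 365 * 24)

options = {
    # name: (effective $/hr, images/hr -- made-up throughput)
    "RTX 3060 12GB ($330, owned)": (amortized_hourly(330), 60),
    "RTX 3090 (owned, ~$1100)":    (amortized_hourly(1100), 180),
    "A100 80GB spot ($0.33/hr)":   (0.33, 400),
}

for name, (hourly, throughput) in options.items():
    print(f"{name}: ${cost_per_image(hourly, throughput):.5f}/image")
```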
I'm still researching what the best price/performance is for hosting this so if you have any findings please share.
I'm experimenting with your Discord bot right now. It would be great to have a command that shows where your requests currently are in the queue, or maybe the bot could post updates as your queue position changes.
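For what it's worth, queue position is cheap to expose if the bot keeps pending jobs in an ordered structure. A minimal sketch in plain Python, not tied to any Discord library — the job IDs and prompts are made up:

```python
from collections import OrderedDict

class DrawQueue:
    """Pending jobs in submission order; lets a queue-position command
    answer 'where am I?' without touching the worker."""

    def __init__(self):
        self._jobs = OrderedDict()  # job_id -> prompt

    def submit(self, job_id, prompt):
        self._jobs[job_id] = prompt

    def pop_next(self):
        # Worker takes the oldest pending job.
        job_id, prompt = next(iter(self._jobs.items()))
        del self._jobs[job_id]
        return job_id, prompt

    def position(self, job_id):
        # 1-based position in line, or None if finished/unknown.
        for i, jid in enumerate(self._jobs, start=1):
            if jid == job_id:
                return i
        return None

q = DrawQueue()
q.submit("a1", "a castle at dusk")
q.submit("b2", "an astronaut riding a horse")
print(q.position("b2"))  # 2
q.pop_next()
print(q.position("b2"))  # 1
```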
BTW thanks for putting this bot together! I've been playing with it. Very comparable results to Midjourney, no surprise. Would love the ability to do variations and upscaling, but I know that's very GPU resource intensive and you're doing this out of your own pocket (how are you affording this? How much is it costing?)
Right now it's $0.65 per hour for an AWS g4dn.xlarge instance, which features an NVIDIA T4.
I'm not really affording this, to be honest. I'm planning to switch to a Spot instance tomorrow, which could bring costs down to about $0.20 per hour, but even then I'll have to switch it off in a couple of days.
I'm working on a significant speed improvement. If that works out, and users get a result in under a minute when they're first in line, then maybe it's possible to make the bot finance itself through a credits system.
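To put rough numbers on that: at these hourly rates, a result that occupies the worker for one minute costs about a cent, so a credits system only needs to cover pennies per image. A sketch — the one-minute figure is the target above, not a measurement:

```python
ON_DEMAND = 0.65        # $/hr, g4dn.xlarge on-demand
SPOT = 0.20             # $/hr, rough spot estimate
SECONDS_PER_IMAGE = 60  # target: first-in-line result in under a minute

def cost_per_image(hourly_rate, seconds=SECONDS_PER_IMAGE):
    return hourly_rate * seconds / 3600

print(f"on-demand: ${cost_per_image(ON_DEMAND):.4f}/image")  # ~$0.0108
print(f"spot:      ${cost_per_image(SPOT):.4f}/image")       # ~$0.0033
```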
I assume all the code is open source on GitHub if I'd like to run this myself at my own expense? An installation/configuration guide, if there isn't one already, would be helpful.
I submitted two /draw requests, was quoted 15-30 minutes for the first and 17-34 for the second (submitted about five minutes apart), but it's now past the upper limit of the quoted times without any results. I'm assuming the image generation failed or the bot got stuck. Having some way of knowing would be helpful.
Just got one of the images back. Looks like you might want to double your time estimates. I also got a Rick Roll meme image as one of the results — I assume that's some sort of failure-mode response?
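On the estimates: rather than a fixed guess, the bot could quote a range derived from a running average of actual generation times with a safety factor. A sketch (the 2x factor echoes the "double your estimates" observation; the timing numbers are invented):

```python
class EtaEstimator:
    """Quote a wait range from observed per-job times and queue depth."""

    def __init__(self, safety_factor=2.0):
        self.safety_factor = safety_factor
        self.total_seconds = 0.0
        self.completed = 0

    def record(self, seconds):
        # Call after each job finishes, with its actual duration.
        self.total_seconds += seconds
        self.completed += 1

    def quote(self, queue_position):
        # (low, high) estimate in seconds for a job at this position.
        avg = self.total_seconds / max(self.completed, 1)
        low = avg * queue_position
        return low, low * self.safety_factor

est = EtaEstimator()
for t in (55, 70, 85):  # observed generation times, seconds (invented)
    est.record(t)
low, high = est.quote(queue_position=5)
print(f"{low / 60:.0f}-{high / 60:.0f} min")  # avg 70s -> "6-12 min"
```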
You can invite the bot to your server via https://discord.com/api/oauth2/authorize?client_id=101337304...
Talk to it using the /draw Slash Command.
It's very much a quick weekend hack, so no guarantees whatsoever. Not sure how long I can afford the AWS g4dn instance, so get it while it's hot.
Oh and get your prompt ideas from https://lexica.art if you want good results.
PS: Does anyone know where to host reliable NVIDIA-equipped VMs at a reasonable price?