I clicked through the links in the article, since they sounded technically interesting. They led to AI-generated porn. Those, in turn, led to pages about training SD to generate porn. Now, two disclaimers:
1) I am not interested in AI-generating porn
2) I haven't followed SD in maybe 6-9 months
With those out of the way, the out-of-the-box tools for fine-tuning SD are impressive, well beyond anything I've seen in the non-porn space, and the progress seems to be entirely driven by the anime porn community:
https://aituts.com/stable-diffusion-lora
10 images is enough to fine-tune. 30-150 is preferred. This takes 15-240 minutes, depending on GPU. I do occasionally use SD for work. If this works for images other than naked and cartoon women, and for normal business graphics, this may dramatically increase the utility of SD in my workflows (at least if I get around to setting it up).
I want my images to have a consistent style. If I'm making icons, I'd like to fine-tune on my baseline icon set. If I'm making slides for a deck, I'd like those to have a consistent color scheme and visual language. Now I can.
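For what it's worth, the inference side of that consistent-style workflow is short once a style LoRA exists. A minimal sketch with diffusers, assuming a LoRA already trained on an icon set (the base model ID is real; the LoRA path and prompt are placeholders):

    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    # Placeholder path: a LoRA fine-tuned on your baseline icon set
    pipe.load_lora_weights("./loras/my-icon-style")
    image = pipe("a flat gear icon, solid background", num_inference_steps=30).images[0]
    image.save("icon.png")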
Thanks creepy porn dudes!
The other piece: Anyone trying to keep the cat in the bag? It's too late.
> the progress seems to be entirely driven by the anime porn community:
It's not entirely driven by porn communities, and the porn communities driving it aren't entirely anime porn communities (and the anime communities driving it aren't entirely porn communities).
But, yeah, the anime + porn/fetish art + furry + RPG art + sci-fi/fantasy art communities, and particularly the niches in the overlap of two or more of those, are pretty significant.
> If this works for images other than naked and cartoon women
It does, and while it may not be large proportionally compared to the anime-porn stuff, there are a lot of publicly distributed fine-tuned checkpoints, LoRAs, etc., demonstrating that it does.
It absolutely works for things other than naked and cartoon women. Here are some generations of my daughter and dog (together!). I believe most of these are from a model fine-tuned on them and not an extracted LoRA, though I use that sometimes too: https://imgur.com/a/naHgnel
The space one without headphones is particularly cool.
I use it for D&D art generation. I can have a piece of art that somewhat matches every location/scene I have planned. If things don't match my plans I can generate 8 images and pick the best in about 2 minutes. I talk to a lot of other DMs who use it in a similar way.
It's not great with specific details, so I plan to commission someone to draw the party when the campaign is over. But for things like a fantasy magic shop with potions, or a fantasy dungeon exterior, or a forest of mushroom trees, it's more than good enough for concept art to throw into Roll20. I couldn't afford 5-10 pieces of custom concept art per game, nor could I come up with the ideas for them 2 hours beforehand and have them ready for the session.
I haven't tried it yet, but I will soon now that it's out in an official release. But I doubt it can meet the level of detail my players have put into their own characters. I also fully expect to need to pay for revisions from whatever artist I commission. But I really want a piece of art that every player feels captures their character perfectly.
The ones you can apply for access to are the 0.9 weights, which have been available for a couple of weeks. Unless the SDXL 1.0 weights are also available by application somewhere I'm unaware of.
It sounds like after the previous 0.9 version there was some refining done:
> The refining process has produced a model that generates more vibrant and accurate colors, with better contrast, lighting, and shadows than its predecessor. The imaging process is also streamlined to deliver quicker results, yielding full 1-megapixel (1024x1024) resolution images in seconds in multiple aspect ratios.
Sounds pretty impressive, and the sample results at the bottom of the page are visually excellent.
They have bots in their Discord for generating images based on user prompts. Those randomize some settings, compare candidate models, and are used for RLHF fine-tuning; that's the main source of refining, which will continue even after release.
I always wondered why the vision models don't seem to be following the whole "scale up as much as possible" mantra that has defined the language models of the past few years (to the same extent). Even 3.5 billion parameters is absolutely nothing compared to the likes of GPT-3, 3.5, 4, or even the larger open-source language models (e.g. LLaMA-65B). Is it just an engineering challenge that no one has stepped up for yet? Is it a matter of finding enough training data for the scaling up to make sense?
Diffusion is more parameter-efficient and you quickly saturate the target fidelity, especially with some refiner cascade. It's a solved problem. You do not need more than maybe 4B total. Images are far more redundant than text.
In fact, most interesting papers since Imagen show that you get more mileage out of scaling the text encoder part, which is, of course, a Transformer. This is what drives accuracy, text rendering, compositionality, parsing edge cases. In SD 1.5 the text encoder part (CLIP ViT-L/14) takes a measly 123M parameters.[1] In Imagen, it was T5-XXL with 4.6B [2]. I am interested in someone trying to use a really strong encoder baseline – maybe from a UL2-20B – to push this tactic further.
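Those parameter counts are easy to sanity-check; a quick sketch with transformers (downloads the CLIP checkpoint from the Hub; the figure is approximate):

    from transformers import CLIPTextModel

    # SD 1.5's text encoder: CLIP ViT-L/14
    clip = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
    print(f"{sum(p.numel() for p in clip.parameters()) / 1e6:.0f}M parameters")  # ~123M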
Seeing as you can throw out diffusion altogether and synthesize images with transformers [3], there is no reason to prioritize the diffusion part as such.
> Seeing as you can throw out diffusion altogether and synthesize images with transformers [3]
That's actually how this whole party got started. DALL-E (the first one) was a transformer model trained on image tokens from an early VAE (and text tokens, of course). Researchers from CompVis developed VQGAN in response. OpenAI showed improved fidelity with guided diffusion over ImageNet (classes) and subsequently DALL-E 2, using pixel-space diffusion and cascaded upsampling. CompVis responded with Latent Diffusion, which used diffusion in the latent space of some new VQGANs.
The paper you mention is interesting! They go back to the DALL-E 1 method but train two VQGANs for upsampling and increase the parameter count. This is faster, but only faster than the originally reported benchmarks, which used inferior sampling methods for their diffusion. I would be curious if they can beat some of the more recent ones which require as few as 10-20 steps.
They also improve on FID/CLIP scores, likely by using more parameters. This might be a memory/time trade-off, though. I would be curious how much more VRAM their model requires compared to SD, MJ, Kandinsky.
The same goes for using T5-XXL. You’ll win FID score contests but no one will be able to run it without an A100 or TPU pod.
Is this still true in 2023? Sure, back in the dark ages it seemed like an 860M model was just about the limit for a regular consumer, but I don't see why we wouldn't be able to use quantized encoders; and even 30B LLMs run okay on MacBooks now.
They often reference this paper as the motivation for that (https://arxiv.org/pdf/2203.15556.pdf): training with 10x the data for 10x longer can yield models as good as GPT-3 but with fewer weights (according to the paper), and the same principle applies in vision.
Diffusion is relatively compute-intensive compared to transformer LLMs, and (in current implementations) doesn't quantize as well.
A 70B-parameter model would be very slow and VRAM-hungry, hence very expensive to run.
Also, image generation is more reliant on tooling surrounding the models than pure text prompting. I don't think even a 300B model would get things quite right through text prompting alone.
Hmm, this is a good point: diffusion requires several (many?) inference passes as you refine the noise into an image, right? Makes sense that this is more expensive to scale up. Thanks for the explanation!
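Right: the denoiser (a large UNet in SD) runs once per sampling step, typically 20-50 times per image, whereas an LLM does one forward pass per generated token. A toy sketch of the loop shape (the update rule below is a stand-in, not a real scheduler):

    import torch

    def denoiser(x, t):                  # stand-in for the UNet noise predictor
        return torch.zeros_like(x)

    x = torch.randn(1, 4, 64, 64)        # start from pure noise in latent space
    steps = 30
    for t in reversed(range(steps)):
        noise_pred = denoiser(x, t)      # one full forward pass per step
        x = x - noise_pred / steps       # schematic update only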
Do we know how many parameters DALL-E has these days, or Firefly, Midjourney, etc.?
If we are talking about Stable Diffusion, the reality is that... more parameters mean it will be harder to run locally. And let me tell you something, the community around Stable Diffusion only cares about NSFW... And wants local for that...
Stable Diffusion 2 was totally boycotted by the community because they... banned NSFW from it. They have now had to allow it again in SDXL.
Also, more parameters mean it will be more expensive for community fine-tuners to train as well.
I'm out of date on the image-generating side of AI, but I'd like to check things out. What's the best tool for image generation that's available on a website right now? Ie, not a model that I have to run locally.
I just tried this and the UI is very nice (better than dreamstudio), with nice tool integration, and image quality is definitely going up with each new release. You can see a few results at fb.com/onlyrolydog (along with a lot of other canine nonsense).
Not affiliated in any way and not very involved in the space. I just wanted to generate some images a few weeks ago and was looking for somewhere I could do that for free. The link above lets you do that, but I suggest you look up prompts because it's a lot more involved than I expected.
Hey! Creator of ArtBot here. Thanks for plugging the site!
For those not aware, here's an interesting fact about ArtBot (and the AI Horde in general) -- we've been running an A/B test with Stability.ai for the last 3 weeks or so related to SDXL [1].
Any time a user generates an image using SDXL_beta on the AI Horde, they get two images back. They pick which image they think is best for the given prompt. This data is sent back to Stability.ai in order to help improve their image models.
In a similar vein, LAION partnered with the AI Horde earlier this year in order to gather aesthetics ratings for improving various image datasets. [2]
It's a cool little open source community and there's just a ton of stuff going on.
I just took the ones I liked and then deleted the words that were specific to that image and left the ones that were providing the style of the image. So for example, on the first one I would delete "an cute kitsune in florest" but would keep "colorfully fantast concept art". Then I just added a comma-separated list of the features I wanted in my picture. It took a lot more trial and error than I thought, and adding sentences seemed to be worse than just individual words. I am sure I barely scratched the surface of interfacing with the tool correctly, but the space is moving so fast it's not the kind of thing I want to spend my time learning right now just to have that knowledge deprecate in 6 months.
I've found https://firefly.adobe.com/ pretty good at composing images with multiple subjects. [disclaimer - I work at Adobe, but not in the Creative Cloud]
But I wouldn't say it's the "best." Just trained on images that weren't taken from unconsenting artists.
I was quite disappointed that the Photoshop generative fill stuff insists on running on Adobe's servers rather than locally. So however good it is, there are many of us who will never use it.
Yeah-- I can only assume it's to ensure a consistent experience and to not disperse the model openly. If you have the model running locally on people's computers, it limits who can use the generative AI and opens up a ton of headache around customer support. Again, I don't work on this, but I'm familiar with generative AI and what it takes to run.
There are toy AI things, but there is nothing quite like Stable Diffusion running on Colab. Lots of people recommended Midjourney but that is like playing with MS Paint. If you can get Stable Diffusion going with Automatic1111, it's AAA tier. Especially with ControlNet and Dreambooth, but that is part 2.
Google "The Last Ben Stable Diffusion Colab" for a way to not run it locally but still get all the features.
Is there anything like this for the vector landscape?
This may just be due to the iterative denoising approach a lot of these models take, but they only seem to work well when creating raster-style images.
In my experience, when you ask them to create logos, shirt designs, or illustrations, they tend to not work as well and introduce a lot of artifacts, distortions, incorrect spellings, etc.
If you mean raster images that look like vector and contain arbitrary text and shapes, ControlNets/T2I adapters do work for this. You could train your own custom ControlNet for this, too (though it requires some understanding of the training process).
As for directly generating vector images, there's nothing yet. Your best bet is generating vector-looking raster and tracing it.
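A minimal sketch of that trace step, assuming potrace is installed and the generated raster is high-contrast (potrace only traces black-and-white; color work needs a different tracer or Inkscape's trace tool):

    import subprocess
    from PIL import Image

    # Threshold the generated raster to a 1-bit bitmap, then trace it to SVG.
    Image.open("generated.png").convert("1").save("generated.pbm")
    subprocess.run(["potrace", "-s", "generated.pbm", "-o", "generated.svg"], check=True)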
There are SD models tuned for vector-like raster output. And XL has specifically focused on this use case as one of the improvements. Try SDXL 1.0 on Clipdrop or Dreamstudio.
A lot of people are having success by adding extra networks (LoRA is the most common) which are trained on the type of image you're looking for. It's still a raster image, of course, but you can produce images which look very much like rasterizations of vector images, which you can then translate back into SVGs in Inkscape or similar.
Midjourney is still going to be hard to beat imo. Comparing SD to MJ is a little unfair considering their applications and flexibility, but I do really enjoy the "out of the box" experience that comes with MJ.
I use both, but Stable Diffusion has better control over the workflow. With Automatic1111 I can generate a matrix of outputs based on prompt variations or parameter changes. I can also do bigger batches. And I can open multiple tabs and queue up several prompt-variation matrices at once, then leave for an hour. I have a laptop RTX 2070 and a 512x768 takes about 20 seconds[0] or so. Automatic1111 also includes some AI upscaling once you've found the base image you want.
StableDiffusion needs you to be way more specific than Midjourney. MJ will fill in the gaps of your prompt to get a better image. SD usually won't.
MJ photos are higher quality with easier prompting IMO, but with a distinctive style. Even if you ask it to mimic some other style, it has that midjourney feel.
I mainly use it for generating setting or character images for a D&D game. I use Midjourney more for characters.
I haven't done it in a while but I was cranking images out at 11s/output on a 3080. But it depends on your workflow, too. I started low res/low samples (32-64) and scaled up or used recursion until I got a desirable result or found a nice seed. I think I was doing 512x916 or something close to that.
With SD you have a lot of control over not just basics like image size and prompt complexity, but also things like how many iterations of which different sampler(s) get used.
So speed can vary wildly depending on how you're choosing to use it. And that's without even getting into the wide variance of hardware.
But generally speaking, it will usually be significantly faster than one image per minute.
Got any way to get individual images in relax mode? It gives 4 images combined, and upscale is only available in fast mode, which kind of makes relax mode useless.
MJ quality is significantly worse. Everything has the Pixar look and barely follows the prompt. It's nice as a toy, but SD with Automatic1111 is miles ahead of MJ.
I tried it in dreamstudio. Like all the other image generators I've tried, it's rubbish at drawing a piano keyboard or an accordion. (Those are my tests to see if it understands the geometry of machines.)
A couple of accordion pictures do look passable at a distance.
Another test: how well does it do at drawing a woman waving a flag?
One thing that strikes me is that it generates four images at a time, but there is little variety. It's a similar looking woman wearing a similar color and style of clothing, a similar street, and a large American flag. (In one case drawn wrong.) I guess if you want variety you have to specify it yourself?
AI models seem to be getting ever better in resolution and at portraits.
I hope someday there’s a version of this or something comparable to it that can run on <8gb consumer hardware. The main selling point of Stable Diffusion was its ability to run in that environment.
Yeah, Draw Things. It will be submitted as soon as the SDXL v1.0 weights are available. A quantized model should run on iPhones (4GiB / 6GiB models), but we haven't done that yet. So no, these are just typical FP16 weights on iPad.
Thanks! I guess I'll stick to running it on my Macbook for the time being until the quantized model gets uploaded. What kind of performance are you seeing with the FP16 weights on the iPad? I've run a few SD2.0-based (unquantized) models on my 2020 iPad Pro but it seems like it gets thermally throttled after a while.
There will be more info upon release. SDXL v0.9 performs generally the same as SD v1 / v2 at the same resolution. But because you tend to run it at a larger resolution, you might find it slower.
An NVIDIA-based graphics card with 4 GB or more VRAM memory. 6-8 GB of VRAM is highly recommended for rendering using the Stable Diffusion XL models
An Apple computer with an M1 chip.
An AMD-based graphics card with 4GB or more VRAM memory (Linux only), 6-8 GB for XL rendering.
"You must have Python 3.9 or 3.10 installed on your machine. Earlier or later versions are not supported. Node.js also needs to be installed along with yarn"
I don't like having to install npm when an existing dev stack (python) is already present.
Let's see whether derived models will suffer less from the 'same face actor' response to every portrait prompt. It's not trivial to get photoreal models to avoid look-alike faces without resorting to specific, typically celeb-based, fine-tunes.
SDXL is in roughly the same ballpark as MJ 5 quality-wise, but the main value is in the array of tooling immediately available for it, and the license. You can fine-tune it on your own pictures, use higher order input (not just text), and daisy-chain various non-imagegen models and algorithms (object/feature segmentation, depth detection, processing, subject control etc) to produce complex images, either procedural or one-off. It's all experimental and very improvised, but is starting to look like a very technical CGI field separate from the classic 3D CGI.
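As one concrete link in such a chain, here's a hedged sketch of depth-conditioned generation with a ControlNet (shown with an SD 1.5 depth ControlNet, since those were the mature ones at the time; the depth map would normally come from a depth-estimation model such as MiDaS, but here it's just loaded from disk):

    import torch
    from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
    from diffusers.utils import load_image

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")
    depth_map = load_image("depth.png")   # placeholder: output of a depth estimator
    image = pipe("a cozy reading nook, warm light", image=depth_map).images[0]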
For bland stock photos and other "general-purpose" image generation, DALLE-2/Bing/Adobe etc are... the okayest. SD (with just standard model weights) is particularly weak here because of the small model size.
If you want to get arty, then state of the art for out-of-the-box typing in a prompt and clicking "generate" is probably MidJourney.
But if you're willing to spend some more time playing around with the open-source tooling, community finetunes, model augmentations (LyCORIS, etc), SD is probably going to get you the farthest.
> Also what are the most common use cases for image generation?
By sheer number of image generations? Take a guess...
SDXL 0.9 should be the state-of-the-art image generation model (in the open). It generates at a large 1024x1024 resolution, with high coherency and a good selection of styles out of the box. It also has reasonable text understanding compared to other models.
That said, based on the configurations of these models, we are far from saturating what the best model can do. The problem is, FID is a terrible metric for evaluating these models, so like with LLMs, we are a bit clueless about how to evaluate them now.
I overspoke. FID is a fine metric for observing the training progress of your own model. And it correlates well with some coherency issues of generative models. But for cross-model comparisons, especially for models that generally do well under FID, it is not discriminative enough to separate better from merely good.
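For reference, FID is just the Frechet distance between Gaussian fits of Inception features from real and generated image sets; a rough sketch, assuming the features are N x 2048 Inception-v3 pool activations:

    import numpy as np
    from scipy.linalg import sqrtm

    def fid(feats_real, feats_fake):
        mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
        cov_r = np.cov(feats_real, rowvar=False)
        cov_f = np.cov(feats_fake, rowvar=False)
        covmean = sqrtm(cov_r @ cov_f)
        if np.iscomplexobj(covmean):   # numerical noise can add tiny imaginary parts
            covmean = covmean.real
        return float(((mu_r - mu_f) ** 2).sum() + np.trace(cov_r + cov_f - 2 * covmean))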
I don't know what the use case is for other people is, but I've been playing around with book covers. This one took about two weeks, but it was my first real try and I was still learning how. Composition is a little off. The one I'm working on now is going faster (and better).
I've found that I rarely get a usable image completely as-is. It might take 5 or 10 generations to find something sort of ok, and even then I end up erasing the bad parts and letting it in-paint (which again takes multiple attempts). The T-rex had like 7 legs and two jaws, but was otherwise close to what I wanted... just keep erasing extra body parts until the in-painter finally takes a hint.
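That erase-and-inpaint loop maps pretty directly onto the diffusers inpainting pipeline, if you'd rather script it than click; a minimal sketch (paths and prompt are placeholders, mask is white where the extra parts were):

    import torch
    from diffusers import StableDiffusionInpaintPipeline
    from diffusers.utils import load_image

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")
    image = load_image("cover_draft.png")
    mask = load_image("extra_parts_mask.png")   # white = regenerate, black = keep
    candidates = pipe(
        "tyrannosaurus rex, two legs, painted book cover style",
        image=image, mask_image=mask, num_images_per_prompt=4
    ).images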
I was also going to do a few book covers for some Babylon 5 books, but it does so badly on celebrity faces. Looked like Koenig's mutant love child with Ernest Borgnine. Dunno what to do about that. I keep wondering if I shouldn't spend the next 10 years putting together my own training set of fantasy and science fiction art.
It's already supported in automatic1111 (see recent updates), and someone in the community will convert it to the automatic1111 format within minutes/hours after it's released on huggingface.
Sort of. IIRC (which may be unlikely) Auto1111 has the base model in the text-to-image tab, but if you want to use the refiner, that is a separate img2img step/tab. Which would be a pain in the ass imo.
The "Comfy" tool is node based and you can string both together which is nice. Although if you aren't confident in your images you don't need the refiner for a bit.
I think the diffusers UIs (like Invoke and VoltaML) are going to implement the refiner soon since HF already has a pipeline for it.
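For reference, the diffusers version of the base-plus-refiner handoff looks roughly like this (following the documented ensemble-of-experts pattern; exact argument names may shift between diffusers versions):

    import torch
    from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

    base = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2, vae=base.vae,  # share weights to save VRAM
        torch_dtype=torch.float16,
    ).to("cuda")

    prompt = "a lighthouse on a cliff at dusk, cinematic lighting"
    latents = base(prompt, denoising_end=0.8, output_type="latent").images
    image = refiner(prompt, image=latents, denoising_start=0.8).images[0]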
Comfy and A1111 are based around the original SD StabilityAI code, but the implementation must be pretty similar if they could add the base model so quickly.
Auto1111 (latest on git) OOMed my 3080 running the base XL at 1024x1024, unfortunately. Granted, my xorg takes up almost 950 MB of the precious VRAM. Did you get it to run using A1111 on the FTW3 without OOMing it?
TBH I was hoping the community would take the opportunity to move to the diffusers format...
You get deduplication, easy swapping of stuff like VAEs, faster loading, and less ambiguity about what exactly is inside a monolithic .safetensors file. And this all seems more important since SDXL is so big, and split between two models anyway.
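One example of the "easy swapping" point: with the diffusers layout you can drop in a different VAE at load time without re-saving the whole checkpoint (the fp16-fix VAE below is one commonly used community repo, named here only as an example):

    import torch
    from diffusers import StableDiffusionXLPipeline, AutoencoderKL

    vae = AutoencoderKL.from_pretrained(
        "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
    )
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", vae=vae, torch_dtype=torch.float16
    ).to("cuda")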
No. From what I've gathered, it was trained on human anatomy, but not straight-up porn. What they tried for 2.0/2.1 was way too overdone, to the point where if I prompted "princess Zelda," the generation would only look mildly like her. Presumably they just didn't have many images of people in the training. 1.5 and SDXL both work fine on that front.
Fine tuners will quickly take it further, if that’s what you’re after.
I seem to remember the issue with 2.x is that they removed all the commercial art from top-notch illustrators from the training data due to the backlash, so it was just way worse at generating great-looking things, which is all the user cares about. So the community stayed on their custom-trained models derived from SD 1.5 (which, yes, often included porn).
2.0 included a filter on the training data that removed all nudity. It went way too far and removed a lot of humans. They tried to rectify it a bit with 2.1 but even that was still hampered.
What you said also happened, but the main thing was the base model didn't have a great concept of human anatomy. Apparently it was really hard to train for anything else as well.
That sounds so weird, it took me a minute to understand.
Go to nitter.net, which has no login requirement and no ads, but all the same content that X (formerly known as Twitter) has.
Some of them look surprisingly correct, so it looks like there's been at least some progress on that front. I would assume these are among the best examples of many, many attempts so it still seems to be a ways off.
Hands being bad is a result of people one-shotting images; you need to go repaint them afterwards, I've found. But it'll do them great if you inpaint well.
> Probably will be even better with negative embedding.
And/or a hand-specific LoRA and/or a workflow using something like the ADetailer extension in A1111, which applies a model to recognize hands [0] and then inpaints them.
[0] Recognition models are also provided for people, faces, and eyes, and it can use additional custom models for other things.
Eh, it's not a huge deal with Stable Diffusion because you can inpaint. So you mask out the bad hands and generate a few dozen iterations that merge perfectly with the rest of the image. You're bound to get something that looks good.
With SD 2.1 I'd generate 100 or so images using inpainting, and the "good fingers" hit rate was about 1%. If it's up to 10%, that'd be great, because generating 10 images takes just a few seconds on an A100.
Hands are generally a non-issue at this point. You can just inpaint them and use a negative prompt LoRA to get good hands in just a few attempts. That is, if you don't just get good enough hands right away. That happens surprisingly often.
I can't say for 1.0, but in 0.9 hands fairly often get rendered perfectly. It's not always right, but it's way better than any earlier release (where it was usually consistently wrong).
This explosion of AI-generated imagery will result in an explosion of millions of fake images, obviously. Perhaps in the short term this is fun, but in the long term we will lose a bit more scarcity, which is not that great in my opinion.
Isn't the best part of a meal eating after you've not had anything to eat for a while? The best part about a kiss that you've quenched the pain of missing your partner?
The best part of art is that you haven't seen anything good in a while?
Scarcity is an underappreciated gift to us, and the relative scarcity per capita is in a sense what drives us to connect with other people, so that we may be privileged to witness the occasional spark of creativity from a person, which in turn tells us about that person.
Although that sort of viewpoint has been declining for some time due to the intensely capitalistic squeezing of every sort of human endeavor, AI brings this to a whole new level.
I think if those making this software thought a bit about this, they might second-guess whether it is truly right to release it. Just a thought.
Enforcing artificial scarcity is idiotic and counter-progressive. There will be other things that will continue to be uncommon that humans will continue to appreciate. This is what human progress looks like. Imagine someone said this when agriculture started up: "The great thing about fruits and vegetables is that they taste so sweet the few times we find them. We shouldn't grow them in bulk."
AI Art models are completely dependent on human labor to function. "Out-competing" human generated images will damage the commons and make it harder to train these models over time as they push human creative labor out of the market, if we believe it's even competitive.
Personally I think this guy has a point, and he's pointing to something that I don't believe a lot of AI art advocates have considered: the attention economy. There isn't actually any scarcity in the current market for artistic content. There are literally millions of people producing art every day, and the market is very winner-take-all. There are few artists who are able to support themselves on their work, and their skills are exceptional and specifically in demand. My theory is that the supply of AI-generated content will be so vast, and the perception around it will be that it's low effort and low quality, that so-called AI artists are going to have trouble distinguishing themselves in a market where they're saturating it and all using the same models. I think your perception of this market is flawed. Art is not a fungible good like food or clothing.
So what if art is devalued? We are hardwired to appreciate beauty so art of some form will always be sought. Obviously there is the matter of artists losing their livelihoods but that is also an inevitable outcome of progress and always has been.
Lol. I hate this industry sometimes. It's not obvious, to me at least, that our society should accept the automation of the production of culture. Maybe I'm a luddite or whatever lazy quip you'd like to use, but I prefer the story of a human mastering a skill and producing something beyond contextless aesthetic sludge.
Yeah, that's why I always laugh when I see people use cars instead of walking 100 miles. It's an inspiring story for people to do marathons, and cars devalue that.
I disagree. The problem is that AI will flood the market to an extent that humans will barely be able to keep up, moving from one AI-generated thing to the next, barely having time any more to spend any real effort on enjoying life.
There is a great chasm between "don't grow vegetables" and "grow them the way we do them today".
I wouldn't advocate not growing vegetables. But today we grow them in a monoculture for instant availability everywhere, and those monocultures are susceptible to disease and also are not terribly ecologically friendly. AI is like an ultimate monoculture of diseased fruits.
Also, I believe being counter-progressive can be a good thing. Human beings should not progress in certain ways, as we don't have the wisdom to use the technology we have developed.
Humans in general cannot appreciate things very well, and computers and AI will only make it worse.
The same could have been said when Photoshop or CGI tools like Blender replaced hand sculpting and hand painting, but I think it hasn't been a net negative across the board (I think rather the opposite).
I believe it has. CGI at the beginning was okay, but like all technologies, humans could not resist bringing it to a high level of efficiency. Now all CGI movies are pretty bland and barely any effort is put into storytelling.
A lot of downvotes. I can relate to it a little bit. During the beginning of COVID I was in SE Asia at an Airbnb that didn't have a laundry machine - you generally don't need one in SE Asia because there are so many cheap per-kg laundry services around. After a month of hand-washing my clothes, I really appreciated having a laundry machine when I moved to another Airbnb that had one - you take some things for granted.
But no, I wouldn't want to hand wash my laundry more often. For the same reason probably I still prefer using a lighter when having a BBQ than a flint.
Yes, humans take many things for granted. That's why it's better to experience a lack of things sometimes because we are just wired to move onto the next thing.
As for the downvotes, I don't mind. I try and present a critical view of technology, but I expect the downvotes because almost everyone here has the perspective that technology is a tool and progress is generally a good thing, which are two statements that I wholeheartedly reject.
I didn't say starve. I just meant to take a break from eating (you know, like between meals?). AI and computer technology has removed the breaks between meals.
I think you have this backwards, Capitalism loves scarcity. Scarcity is what allows for supply and demand curves and profit-making opportunities, even better if you can control the scarcity. Capitalist entities are constantly attempting to use laws, technology, and market power to add scarcity to places where it didn't previously exist.
That is not the whole story. Capitalism likes the following process (if you can say capitalism "loves" anything, which isn't quite right. It's more like the abusers of capitalism love this):
1. Create scarcity,
2. Flood the market to reap short term gains with market-disrupting technology
3. Create new scarcity by creating new products