Hacker News
Opendream: A layer-based UI for Stable Diffusion (github.com/varunshenoy)
472 points by varunshenoy on Aug 15, 2023 | 137 comments



It makes more sense to embed Stable Diffusion capabilities into well-established image editors such as GIMP, Photoshop, Krita, or Figma, which already come with layered, non-destructive functionality, rather than attempting the opposite approach.

https://github.com/Interpause/auto-sd-paint-ext
https://github.com/thndrbrrr/gimp-stable-boy
https://www.magicbrushai.com/


Gimp is so well established that it has almost fossilized...

Also, "normal" layered non-destructive operations are a couple of orders of magnitude faster and don't require 8 GB of VRAM per 512x512 patch, or work only with a fixed set of buffer sizes, or any of the other strange constraints SD comes with. For instance, what would a non-destructive ControlNet layer even look like in GIMP?


Depends on where the 'opposite approach' is meant to end up. If the result is a totally new creative workflow, then what's the point of carrying all the ballast of a legacy tool?


I got briefly very excited for non-destructive editing in GIMP, but the website still says this is slated for 3.2. Which functionality were you referring to?


Very exciting. The "first-generation" Stable Diffusion frontends seem to have settled on a specific design philosophy, so it's interesting to see new tools (like this or ComfyUI) shake up the way people work with this tool. I hope that in a few years, we'll know which philosophy works best.


Out of all the AI-related tools, generative art frontends are probably the thing most likely to radically change and improve in the next few years.

It's specifically why I've avoided diving too deep into "prompt engineering", because the kind of incantations required today just aren't going to be the way most people interact with this stuff for very long.


> Out of all the AI-related tools, generative art frontends are probably the thing most likely to radically change and improve in the next few years.

The difference between UIs is actually not very relevant today; by now the generic workflow for complex scenes is more or less obvious to anyone who has spent time with SD.

- Draw basic composition guides. Use them with controlnets or any other generic guidance method to enforce the environment composition you want. Train your own controlnet if you need something specific. (lots of untapped potential here)

- Finetune the checkpoint on your reference pictures or use other style transfer methods to enforce the consistent style.

- Use manual brush masking, manually guided segmentation (e.g., SAM), or prompted segmentation (e.g., ClipSeg) to select the parts to be replaced with other objects. The choice depends on your use case and on whether you need to do it procedurally.

- Photobash and add detail to the elements of your scene using any composition methods you have (noisy latent composition, inpainting, etc.) with the masks you created in the previous step. Use advanced guidance (ControlNets, T2I adapters, etc.).

- Don't bother with any prompts beyond very basic descriptions, as "prompt engineering" is slow and unreliable. Don't overwhelm the model by trying to fit lots of detail in one pass; use separate passes for separate objects or regions.

- Alternative 3D version: build a primitive 3D scene from basic props (shapes, rigs). Render the backdrop and separate objects into separate layers as guides. Use them with controlnets & co to render the scene in a guided manner, combining the objects by latent composition, inpainting, or any other means. This can be used for procedural scenes and animation (although current models lack temporal stability).
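The latent-composition step mentioned above can be sketched in plain numpy terms. This is an illustrative toy, not real model code: the latents are random stand-ins, and in practice the blend happens inside the denoising loop, usually with a feathered mask rather than the hard one used here.

```python
import numpy as np

# Toy sketch of mask-based latent composition: blend two latents with a mask
# so each region of the image is driven by its own prompt/guidance. Shapes
# follow SD's latent-space convention (4 channels, H/8 x W/8), but the values
# are random placeholders, not real model outputs.

rng = np.random.default_rng(0)
latent_background = rng.normal(size=(4, 64, 64))  # e.g. guided by the scene prompt
latent_object = rng.normal(size=(4, 64, 64))      # e.g. guided by the object prompt

# Hard binary mask for the object's region (real workflows feather the edges).
mask = np.zeros((1, 64, 64))
mask[:, 16:48, 16:48] = 1.0

composite = mask * latent_object + (1.0 - mask) * latent_background

# Outside the mask, the composite equals the background latent exactly.
assert np.allclose(composite[:, :10, :10], latent_background[:, :10, :10])
```

The same one-liner blend, repeated at each denoising step, is essentially what "noisy latent composition" refers to.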

As long as your tool has all that in one place, it's a breeze, regardless of the UI paradigm (admittedly, auto1111's overloaded Gradio looks straight out of a trash compactor nowadays). I expect 2D/3D software integrations to be the most successful in the future, as they already offer proven UIs and the most desirable side features. The problem is that in its current state SD can't do much in a production setting; it's not a finished product, so there's not a lot of interest in software integrations just yet.


Thanks for sharing this detailed guide. Can you share an example of the type of resulting image you’ve generated using the above approach?

I’ve only just used Dall-E or SD with basic prompts, or sometimes using photoshop afterward. I’m curious what you’ve been able to come up with using your more complex pipeline.


vizcom.ai ;)


Wow, that is awesome... I'd kill my $30/mo sub to Midjourney if this thing were $30/mo for individuals...


As a commercial artist who's worked in several professional creative industries, I find the current textual methods of interacting with generative image AI to be unusable for the vast majority of professional tasks. I think they're great for a lot of laypeople because they abstract away things that laypeople don't want to think about, but in professional workflows you need specificity at pixel-level granularity, predictability, and repeatability. Those things are all difficult even with purpose-built tools and impossible through text prompts. I haven't spoken to a single colleague who disagrees, apart from those working in the high-volume, low-effort end of their disciplines/fields. Most commercial artists' selling point is deciding exactly what should go into a piece; implementing it is the easy part.

The pro tools that have incorporated generative AI into their workflows are not at all textual. The environment that popularizes this among the general public will look a lot more like canva or maybe Instagram than what's popular now.


At some level I agree that the prompt engineering done today to break ChatGPT guardrails barely rises to the level of an "interesting hack", but I think that manipulating language to induce specific behavior in an LLM is a powerful skill, and it requires a very fluent understanding of language in the semantic context of the training corpus. By varying the tone, vocabulary, style, pacing, and obviously the semantics of the original inducing language, you can dramatically change the behavior of the LLM. This is less about prompt engineering and more about being a masterful manipulator of language, and it's why I don't fear that LLMs will make language skill irrelevant. Those with the most language skill will produce the most compelling and tailored LLM output for a given purpose.


It’s entirely likely that there’s much more effort going into generative text; any perceived advancement of generative images is going to be disproportionately skewed due to the richness of information that they hold.


Incantations are fun!


Photoshop Beta does it best. The generative features are just new tools that work as you’d expect with all the existing tools. For example, if you want to do outpainting, just make your canvas bigger and you get a contextual menu where you can (optionally) type a prompt. Inpainting, just make a selection however you want and type a prompt.


The control that offers is extremely limited versus SD in A1111, with all its different models, LoRAs, embeddings, extensions, and ControlNet types.


I wrote a typescript API generator for ComfyUI, works great - hopefully will have time to release it soon.

I think there's so much unexplored potential in UI and workflows around generative AI, we've barely scratched the surface. Very exciting times ahead!


I bet this will be available as an Automatic1111 extension by end of month.


I'm doubtful about that. A1111 is what I'd call a "first-generation frontend". Both it and all of its extensions follow a specific usage model: in general, every tool is contained in its own tab, with each tab having buttons to transfer outputs into other tools. Radically changing this model would require rewriting so much that it would just make more sense to use a different frontend in the first place.


I haven't followed diffusion image generation development for a while. Where do you find information on what models you can use in the model_ckpt field? Do I need to import them from somewhere? What are the main differences between them and which are more modern or better?


Civitai.com is currently the most popular resource for models. Also, the ckpt format is discouraged for security reasons; safetensors is now used instead.


You can find them on huggingface, or you can reverse engineer which ckpt you want to use based on an image you've seen generated (like at majin[1] - beware, there's a lot of NSFW/controversial stuff here.)

1: https://majinai.art/


Some of this is straight up soft-core child porn. This is fucked up.


Agreed! I just clicked the link and did a double take. I don't care if it's AI. This is child porn material, and in my opinion, it should be shut down.


Yep.

There needs to be a REALLY FUCKING STRONG effort to kill all CP AI anything. Full stop.

AI should automagically report any attempt at CP.


Serious question - Why? Assuming no actual CP was used in the training of the model who is being harmed? Ickiness should not imply illegality unless the ickiness is at the expense of someone else. Swing your fist as much as you want so long as you avoid my nose, and all.


> Serious question - Why? Assuming no actual CP was used in the training of the model who is being harmed

I don't think an AI model could generate realistic CP without being trained on examples, which would mean there is literally no way for this assumption to be true.


If it's trained on pictures of children and adult porn, the model can easily combine the two concepts. Even if you remove one of these from the training set, it can be finetuned back in to the model with very little effort.


Based on other topic interpolations I don't think this works, especially not easily.


If I can finetune the model on my face and the model is then able to generate my nudes (even if there are no nudes of mine in the dataset), then it can work just as easily with children too.


Do you understand that adult bodies and child bodies look different?

Please show me a model trained on adult humans being fine tuned to generate a child human (fully clothed).


> Do you understand that adult bodies and child bodies look different?

SD-based models can generate child bodies just fine, based on lots of training with non-porn images.

> Please show me a model trained on adult humans being fine tuned to generate a child human (fully clothed).

That's going to be hard in the SD space since there are, AFAIK, no models trained exclusively on adult humans (not even the base models), so you'd have to scratch-train a new base model to do this. (A model fine-tuned on adult images is still going to have the influence of base-model training, and unless massively overfit—usually very much an anti-goal, the exception being age-filter models whose entire focus is controlling apparent age [0]—will not have lost much of the generalization capability of the base model.)

The generalization abilities of models are good enough in other contexts that it's plausible that realistic nude children could be generated by a model with no nude children in its training data that was otherwise trained on both clothed children and nude adults. I have no plans on testing this, however.

[0] e.g., https://civitai.com/models/65214?modelVersionId=74332


> SD-based models can generate child bodies just fine, based on lots of training with non-porn images.

I have not stated anything contrary.

> The generalization abilities of models are good enough in other contexts that its plausible that realistic nude children could be generated by a model with no nude children in its training data that was otherwise trained on both clothed children and nude adults. I have no plans on testing this, however.

Stable Diffusion has failed me for much simpler interpolations than the one you're describing. I don't believe this would work based on previous interpolations I have seen in Stable Diffusion. You can convince me by showing an example, but not by stating that it is possible without one.


> > The generalization abilities of models are good enough in other contexts that its plausible that realistic nude children could be generated by a model with no nude children in its training data that was otherwise trained on both clothed children and nude adults. I have no plans on testing this, however.

> Stable Diffusion has failed me for much simpler interpolations than the one you're describing

It's succeeded for me in much more complex ones. SD (even with the exact same set of checkpoint, LoRA, etc., and other workflow elements) isn't consistent across apparent-to-humans complexity levels in its generalization ability, and there is a very large number of potential combinations.

> You can convince me by showing an example

Yeah, I'm not going to try to make simulated child porn for you, and while I would have sided with you in the debate over whether your earlier proof request amounted to a request for that, this one very clearly does.


Jesus, what is wrong with you? I clearly asked for an example of complex interpolations. How am I "very clearly" asking for CP when we're talking about complex interpolations, not CP?

Please, explain how I am "very clearly" asking for it. Please. Literally both sentences preceding the one you quoted (as in, the entire paragraph containing this sentence) are talking about interpolation. How do you jump from interpolation to CP??


>Do you understand that adult bodies and child bodies look different?

And does my face compare to the ones in the dataset.

>Please show me a model trained on adult humans being fine tuned to generate a child human (fully clothed).

That's not something I can easily train for you ahaha


> And does my face compare to the ones in the dataset.

Yes, exactly! You're fine-tuning something with more data. How do you fine-tune adult porn models to create CP without CP data? If you do what you're saying you'll have adult bodies with child faces.

> That's not something I can easily train for you ahaha

Can you show me any example where such a thing has worked? I don't believe such an example exists.


>Can you show me any example where such a thing has worked? I don't believe such an example exists.

You can use Stable Diffusion.

>If you do what you're saying you'll have adult bodies with child faces.

The body will be similar to that of a petite person, or similar to lolicon content translated into "real life"; it's not a far guess for the model.

The model does the same thing as when fine-tuning on my face: it understands that my images are images of an adult male face, and that knowledge gets added to its higher-level understanding of adult males, which includes nudity. When you fine-tune on children, it understands it's a human, and that gets added to its higher-level understanding of humans (which includes nudity in various forms).

Maybe the model wouldn't be perfect with children, but it can't be perfect with my nudes either; maybe I don't actually have nipples and the model doesn't know it. But the guesses the model makes are usually good enough to be considered realistic or plausible.


> You can use Stable Diffusion.

Yes, and as I've already told you, it doesn't work in Stable Diffusion, or at least I haven't seen any examples. Don't repeat this point again, show me an example.

> The body will be similar to petite people or similar to lolicon content traslated into "real life", it's not a far guess by the model.

Really? Show me an example. Don't just claim it, show me some place where this worked. You've mentioned fine-tuning a model with your face. Fine-tune a child model with your face and show me that it outputs something roughly adult-like. Or do the opposite and fine-tune an adult model with a child face and show me the model generates a small person. Show me that it's true, don't just claim it to be.


Should they check the IDs of the models to verify?

Imagine getting reported because you generated an image of an anime girl deemed to be only 17.

I'd personally rather live in a world where people generate distasteful images with an AI and have that AI unconstrained than the inevitable one where everything gets locked down and run by large corporations who will ultimately create more harm than someone generating some lolicon.


If you’re running a service you should have automatic filtering, detection, and such.

A model by itself though… you might as well ask a pencil to report someone for drawing graffiti. It does not make sense.


Models have been trained on something though. They are not analogous to pencils or brushes.


They are responding specifically to the "AI should automatically report" suggestion. An AI model on its own (without a service built up around it) would not have any mechanisms to use the network, reach out to an FBI hotline, or do anything like this. Protections / limits would have to be added by the operator somehow (or the distributor of models themselves).


If you have 4 GB of VRAM, you can finetune a model to whatever you want.


I've always assumed this is what will be used to justify the regulation of AI.


The four horsemen of internet censorship.

Money laundering, CSAM, terrorism, drugs.


Define censorship when something is objectively wrong.

Would you prefer an AI that will "describe and print in 8K a pit of objectionable political dissidents, thrown in after being starved to death, in a comical way such that SV_BubbleTime can laugh at it as opposed to being offended by just how horrific the situation was HAHA"?


You seem to be having trouble understanding the topic.

The censorship isn’t OF those things. It’s in the NAME OF those things.


I believe illustrations have been deemed to be abuse material, so I wouldn’t be surprised if LE have started looking into it.


Who exactly is being abused here?

I for one would much rather give pedophiles an opportunity to fulfil their sexual desires through AI-generated pictures than real ones.

Of course, we can talk about the training material. Are there actual child porn images in there? I seriously doubt it but who knows?

And perhaps a case could be made that AI-generated child porn could be a gateway to invite people who then seek out non-generated material.

But I think these are separate discussions to be had.


They aren't separate discussions; they're directly tied to determining what counts as abuse material. Revenge porn is an example: even though the subject usually isn't abused in the material itself, it's considered abusive material due to the intent to cause abuse through its distribution.

So if either case applies, whether it's training based on certain images, or it becomes a gateway, these are discussions to be had directly relating to whether or not it should be classified as abuse material.

Additionally, I'm not sure if the recommended help methods by professionals who deal with pedophiles is to let them fulfill their specific fantasies without a care.

There are lots of really important discussions to be had, but they're all tied to each other basically. We can't separate them out, nor should we aim to.


Reading the parent post, I think he would agree that revenge porn is abusive material, because there is a person who is being abused by the distribution of the content: the person didn't consent to the distribution. CP is abusive because a child was abused in the creation of the content. The doubt the author of the parent post has is (from what I understand): who is being abused by the creation and distribution of a generated image?


They've brought up three ways in which it can be abusive (trained on abusive materials, created with the intent to abuse, or enabling continued indulgence rather than seeking help), but argued those should be separate discussions, versus my thinking, which is that these are all linked to determining whether such material is abusive.


First of all, thanks very much for these comments - all too often, threads quickly deteriorate into flame wars or emotional finger pointing, and I'm quite happy that this exchange of (opposing) opinions has remained very civil on such a hot topic.

I just wanted to clarify that I did not mean that these topics are all unrelated. When I wrote that these are separate discussions to be had, I was rather trying to imply that these questions are important enough to deserve their own treatment. However, I do agree with you that in the end they all contribute to the question of whether or not artificially generated child porn is abusive.

I do appreciate another sibling comment that points out the relation to other fictional child porn, such as literary works.

Additionally, I would like to add another dimension to the topic, namely that IMO, there is often an unspoken underlying assumption that portrays consumers of child porn as (potential) predators. However, unlike a juvenile delinquent who might find it cool to break into a local corner shop at night to steal some cans of beer, pedophiles are usually not attracted to children as a matter of choice. Like many sexual preferences, it is often innate, and it can also be a burden to them: imagine knowing that what turns you on is morally wrong, even a crime, and being forced to suppress your real sexuality for most of your life as a consequence.

I'm thinking that fictional child porn, even when it's not AI-based but perhaps created with photoshop or in form of stories, could potentially help pedophiles to find ways of somehow dealing with their sexuality without actually preying on innocent children.

However, all of these thoughts come from a very naive understanding of the subject matter. Neither am I a pedophile myself, nor do I know anyone who is, nor am I a psychologist or something who works in the field. So I am very interested in corrections or additional options - especially, as I pointed out before, if they are done in equally civil ways as they have been so far in this thread.


“Neither am I a pedophile myself, nor do I know anyone who is“ Glad you cleared that up for the readers at home.


We live in a world where the author of parent post would be unironically accused of being a pedo by several people for having that opinion so don't be surprised.


Unfortunately, this.


I mean you described it quite well and defended them. I raised an eyebrow and am still unsure even though you’ve said you weren’t one.

It’s kinda like arguing that racism is a fact of life for some and we should allow them to vent their racism in a controlled environment. Hard disagree.


> Reading the parent post I think he would believe that revenge porn is abusive material, because there is a person who is getting abused with the distribution of the content, the person didn't consent the distribution

So would it be okay to distribute "revenge porn" imagery after the subject has died?


Well, no: the person didn't consent. Where is the gotcha?


This discussion isn't new[1], and I'm not sure re-hashing it here will be useful unless you think AI generated child porn is significantly different from any other form of fictional child porn. Photoshop has existed for thirty years, pen and paper for even longer.

[1]: https://en.wikipedia.org/wiki/Legal_status_of_fictional_porn...


Geez that’s disturbing. I clicked having no qualms with nudes, artistic or otherwise. I’m not a prude. I’ve seen my fair share of anime girls and AI nudes. Hell, I was raised on the internet before parental settings were a thing, but I didn’t expect that. It’s so gross how it toes a line too.


the Fediverse has a big problem with this, too, and I never hear anyone talking seriously about it


What is a serious talk, though? Ie what can be done? Isn't it a lot like the internet as a whole; report the offenders if you like then move on?

To me it's akin to encryption being used for illicit/illegal activities. Any tool that gives power to people can and will be abused by people you'd want nothing to do with.

What did you have in mind?


probably when someone adds young, childish, and something sexual to the prompt it says no and alerts law enforcement?


So is there some system that implements text/image classification for automatic law enforcement submission that you're referring to? Also jurisdiction, etcetc.

It all seems incredibly complex. Not a reason to "not try", but i suspect we'll struggle to implement even the most basic thing. And even then, take that basic thing and apply it to every software where users can input data.

Plus we'd have to convince everyone to do this. Automatic scanning and submission of data is not a well liked topic. Remember how Apple doing basic CSAM scanning was full of panic?

Even if a government _forced_ us to do this, jurisdiction alone would be a big question. Some serious questions that need some serious thought, imo. Is being hand-wavy even worth the time?


You just need to search the text of the prompt for things normal people wouldn't search for; not at all complex. Nobody normal asks an AI to create child pornography.


You picked the easiest (though still not easy imo) thing, and ignored the rest of the complexity haha.

If it's so easy i'd love to see your implementation that works multi-language, across all media types, for all jurisdictions and hell handles burden all the massive number of edge cases.

Or frankly, any implementation. Whatever you think is easy and everyone should be doing - please point to an E2E implementation of it. Maybe I misunderstand your scope. Something where if a user submits CSAM, it gets reported to some country's authority..?


Prompts use text, period. We already have the technology to search text for child pornography; both Google and DuckDuckGo use it.

there’s also negative prompting, which tells the model don’t do these things.

hallucinations can still happen but it’s much easier than you’re making it out to be.


Yea, identifying isn't the hard part though - that's not what i'm concerned about. Automatic submission to parties, APIs for governments, jurisdiction, etc. Wonder what Google does if you type in CSAM triggers.

I suspect we could use some sort of central management service for "Internet Reports". Ie to deal with jurisdictions, reporting something to the right people, etc, as well as the complexities involved with identifying people.

Either way i think you're underselling the complexity. Or maybe you think it's so easy but no one cares, /shrug. Seems a long list of questions i'd have before i could even begin to implement it.


Context: there are several instances that let you upload illustrations with no restrictions on what the image can contain; it can contain minors, rape, incest. These instances are among, if not the, biggest and most active Mastodon instances. The reason they are so active is that platforms like Twitter, while hosting a lot of this stuff too, will sometimes ban your account or at least shadowban you. Many Mastodon instances have banned these ones.

In my opinion instances should let the user decide what they find problematic or not and unless it's just spam they shouldn't ban instances.


Illustrations are not a problem under the law in the United States, but it remains to be seen how generated images that are indistinguishable (or almost indistinguishable) from reality will be treated.


What can someone do about it?


Also CivitAI but beware the NSFW

https://civitai.com/


> https://majinai.art/

Thank you so much for sharing this. Civitai keeps bugging me to create an account. This doesn't seem to suffer from the same flaw.


Controversial is one way to put it


It always amuses me when people who think they're the center of the world discover that there are other people with moral takes different than theirs.

If no real children were harmed to produce this stuff, then it should be treated like any other extreme work of fiction (e.g., violence in video games, graphic descriptions in certain books).

Being disgusted is not grounds for banning something lol.


>Where do you find information on what models you can use in the model_ckpt field? Do I need to import them from somewhere?

You can train (finetune) your own on your reference material.


Very cool. Would be interesting to train a model on images with alpha channels so outputs would be automatically masked and more easily composable. But maybe masking is so good these days that would be futile?

When a user does img-2-img on a layer does it use the context from other visible layers in the generation?


For composing this approach works pretty well, maybe the author should consider making a UI for it

https://multidiffusion.github.io/


Thanks for posting. Really interesting



Segment Anything is neat, but segmentation is far from solved.

If the user generates a picture of a horse and rider to add onto another composition - they probably want to include the saddle.


SAM can also be conditioned on points: if it's ambiguous what you want to mask, you can add a point on the saddle and the model will include it without a problem. Segmentation is pretty much solved; I agree with the parent post.


IME I haven't gotten great results using SAM - maybe it was just the images I was using? They weren't great quality, and it seemed to struggle with low-contrast areas.


Whether it's audio, images, CG, or video, it's almost always GIGO (garbage in, garbage out).


> Would be interesting to train a model on images with alpha channels

Would be even more interesting to get an ANN middle stage with an ontology of the (finally) represented content, in order to change individual items.

An internal representation of qualified, structured items in space as part of the chain: prompt > accessible internal representation > render.


Is it possible to add SD XL support for this?

I'd love a colab notebook if anyone has the skill and time to do so.


If anyone wants to add SDXL support, all you have to do is create a new extension with the correct SDXL logic (loading from HF diffusers, etc.). You could parameterize `num_inference_steps`, for example, to delegate decisions to the user of the extension.

If anyone gets to making one before me, please leave a PR!


Can you add a layer with e.g. an image of yourself?


Pretty sure you can do this. Diffusion models by default start with noise, but you can start with any data, including an existing image. For instance, you could import a photo of yourself, mask the eyes and then ask the model to make them green.
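The masking idea can be sketched without any diffusion model at all. This is a toy numpy illustration with made-up values: the mask decides which pixels may come from the "model output", and everything else is copied from the source unchanged. Real inpainting applies the same principle at the latent level during denoising.

```python
import numpy as np

# Stand-in 8x8 RGB "photo": a uniform gray image.
source = np.full((8, 8, 3), 128, dtype=np.uint8)

# Stand-in "model output": what the model might paint inside the mask
# (here, simply solid green, in place of generated green eyes).
model_output = np.zeros_like(source)
model_output[:, :] = (0, 200, 0)

# Boolean mask marking the editable region (the "eyes").
mask = np.zeros((8, 8, 1), dtype=bool)
mask[2:4, 2:6] = True

# Inside the mask take the model's pixels; outside, keep the original photo.
result = np.where(mask, model_output, source)
```

In a real pipeline the mask would typically be feathered and the blend applied at every denoising step, but the selection logic is the same.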


Very cool, honestly; seems like a much-needed improvement over Automatic. Does it support LoRA, or will it in the near future?


You can write an extension to support LoRA (~10 lines of Python HF Diffusers code).

If you get to this before me, please create a PR!


That looks pretty nice, but I guess the hardware required, or the time you have to wait to iterate on these things (if you don't use external services), is quite high. Is there an estimate of when a "normal" person will be able to play around with these things without a lot of operational or capital investment?


There are great articles on how layered UIs are a lot easier to use than node-based UIs. Really excited to see a layered approach to SD. It's definitely time to break out of Gradio.


Maybe, if they're talking about layered UIs with layer groups, which turn a flat stack into something resembling a tree. But even these UIs don't give you proper non-destructive editing: anything more complex requires you to duplicate parts of the layer stack to feed as inputs, which is a destructive operation with respect to structure (those pasted layers won't update if you make changes to the copied source). Doing this properly requires a DAG, at which point you're at node-based UIs (or some idiosyncratic mess of a UI that pretends it's not modelling a DAG).

It's all moot though, because as far as I know, there is no proper 2D graphics editing software that uses DAGs and nodes. Everyone just copies Photoshop. Especially Affinity, which is grating, given their recent focus on non-destructive editing. For some reason, node-based UIs ended up being a mainstay of VFX, 3D graphics, and gamedev support tooling. But general 2D graphics - photo editing, raster and vector creation? Nodes are surprisingly absent.
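The structural difference can be shown with a tiny sketch. In a DAG, two downstream nodes can share the same upstream node, so editing the source updates every consumer; a flat layer stack would force you to paste a stale copy. The "ops" below are plain arithmetic stand-ins for image operations, purely for illustration.

```python
# Minimal node-graph sketch: each node holds an operation and references to
# its input nodes, and evaluation pulls values through the graph on demand.

class Node:
    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self):
        # Recursively evaluate inputs, then apply this node's op.
        return self.op(*(n.evaluate() for n in self.inputs))

source = Node(lambda: 10)                      # shared "source layer"
blur = Node(lambda x: x + 1, source)           # first consumer
sharpen = Node(lambda x: x * 2, source)        # second consumer
composite = Node(lambda a, b: a + b, blur, sharpen)

print(composite.evaluate())  # 31  (11 + 20)

source.op = lambda: 20       # edit the shared source once...
print(composite.evaluate())  # 61  ...and both branches see the change
```

A layer stack can only express the linear special case of this graph; the moment one source feeds two branches, you either need a DAG or a duplicated (and therefore stale) copy.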


> For some reason, node-based UIs ended up being a mainstay of VFX, 3D graphics, and VFX & gamedev support tooling. But general 2D graphics - photo editing, raster and vector creation? Nodes are surprisingly absent.

That's because non-destructive editing is mostly useful for animation, image series/sequences, and asset reuse, which are the most common in these fields. 2D artists have a different mental model, which is additionally set in stone by Photoshop and other software imitating it. Photographers use non-destructive editing, but mostly in simple cases because advanced things (retouching, creative compositing) can't and don't need to be done procedurally anyway.


What about the ancient 'Illusion', old 'Shake', or current 'Nuke' VFX compositing software, which have supported node-based (i.e. DAG-based) comp workflows since the early 2000s? Guess this is just a very different (much smaller) realm than your usual Photoshop and so on?


> There's great articles on how layered uis are a lot easier to use than node based uis

I can see that being sensible for simple linear flows from one step to the next, with no branching, merging, or connections that skip steps.

Seems to me that with any of those other things, a layered UI is going to start to break down a lot faster.


Can you share such articles?


While automatic1111 is cumbersome and takes a while to learn, it seems far more capable. The layers here are just inpainting (as noted in the repository readme as well).


Not a bad start. One quick suggestion: avoid the temptation to make it overly complex.

Stable Diffusion needs to go out to the masses to a greater degree. The unnecessary garbage complexity (eg Comfy's ridiculous noodlescape) that developers keep including into the UIs is holding Stable Diffusion back significantly from a greater mass adoption.


Node-based workflows with little DRY capability (i.e. ComfyUI) do get painful as the workflow grows. That said, an http server capable of executing ML DAGs is extremely useful and a great building block for other tools and UIs to be built upon.

I wrote a TypeScript API generator for ComfyUI recently, and having programmatic access to build and send the execution graphs is a game changer. Hoping to have time to release it soon. The same can easily be done for any other language. Exciting stuff!
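For reference, the workflow format ComfyUI's HTTP API accepts is plain JSON, which is why generating typed clients is so tractable. A minimal Python sketch (the node ids, checkpoint filename, and sampler settings here are illustrative; check an "API format" workflow exported from your own ComfyUI install for the exact shape):

```python
import json

def make_txt2img_graph(prompt, seed=0):
    # ComfyUI "API format" workflow: {node_id: {"class_type", "inputs"}}.
    # An input value like ["4", 0] means "output slot 0 of node 4".
    return {
        "4": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "model.safetensors"}},
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 512, "height": 512, "batch_size": 1}},
        "6": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["4", 1]}},
        "3": {"class_type": "KSampler",
              # Same text encode reused as the negative input, for brevity.
              "inputs": {"model": ["4", 0], "positive": ["6", 0],
                         "negative": ["6", 0], "latent_image": ["5", 0],
                         "seed": seed, "steps": 20, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
    }

def submit(graph, host="127.0.0.1:8188"):
    # POST the graph to ComfyUI's /prompt endpoint (needs a running server).
    from urllib.request import Request, urlopen
    body = json.dumps({"prompt": graph}).encode()
    req = Request(f"http://{host}/prompt", data=body,
                  headers={"Content-Type": "application/json"})
    return json.load(urlopen(req))
```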


First thoughts: how do I bind to an IP, and where can I install models?


If it can handle LoRAs, I'll be sure to try it out this weekend.


LoRAs can be handled as a straightforward Python extension!


It's gonna be breathtaking when this technology gets close enough to make legit cartoons and animations. Layers are a step closer to getting there.


Corridor Crew did some sort of anime using this technique [1], and then they did two videos [2, 3] explaining the technology behind it. Quite interesting if you ask me!

There still are some issues with the eyes and a bit of flickering but at the speed everything is moving I wouldn't be surprised if this improves in a year or two.

Needless to say, there's still a lot of artistry involved in such a process, so it's far from being completely automated.

[1] https://www.youtube.com/watch?v=tWZOEFvczzA

[2] https://www.youtube.com/watch?v=FQ6z90MuURM

[3] https://www.youtube.com/watch?v=mUFlOynaUyk


This technology is already close to making animation. Check out some of my experiments with text to video here:

https://www.youtube.com/watch?v=CgKNTAjQpkk

https://youtu.be/X0AhqMhEe-c


Is this related to Melondream?


How is this better than A1111?


What's up with names nowadays? Not only is there already an OpenDream[1] on GitHub, but there's also a Stable Diffusion service called OpenDream[2]!

1. https://github.com/OpenDreamProject/OpenDream

2. https://opendream.ai/


Slap a virtualenv setup into that install script, please. A system-wide pip install is a bad pattern.


done :)


Now that's agile.


[flagged]


I get that you're spamming out of outrage, but they allowed me to disentangle my comments from my username, which achieves the same thing, unless you mentioned something in a comment that you don't want attached to you.


I do not want to disentangle my name from my comments. I want to delete my comments. They are MY comments. I have a right to have them deleted.


If you're in the EU or are willing to stay in the EU for an extended time (>6 months?), then you may be able to compel them to delete the comments using EU laws. If they refuse, escalate and let the entirety of the EU take care of the rest.

I get why HN is against deleting comments, and sure, make it really hard to delete comments if necessary, but you should honor the requests of users whose unpaid contributions make your site what it is.


Maybe this is the start of a movement here to get us our own GDPR?

You know what’s way more dangerous than me spamming this site with LLM hallucinations? Me walking into Congress with a fucking bill.

How far do you want this to go, HN? Delete my comments and then delete my account, or maybe I start talking to Senators. It’s a nice site you have here. It would be a shame to see your data go “poof”, wouldn’t it?

You don’t delete my stuff? Maybe I just burn it all down in a nice fire led by Congress and a stroke of a pen. How would you like that?

(Also thank you for the support Zuiii. You’re alright with me. :) )


[flagged]


Maybe so, but please don't post unsubstantive comments to Hacker News.


[flagged]


Consider that maybe, just maybe, you are in your own bubble, being blind to what is actually being done. Look at another comment in this thread. [1] Six guys produced 23 minutes of rotoscoping in a short time with just an experimental colab notebook, something not possible before. They made the concept art and transferred it to their live play. What did they steal as per your comment above? The concept of having a body, limbs, and a head?

Several styles in animation already converged to the same playbook to cut the production costs. They can massively benefit from this, as it enables things that were not possible before due to the high manual labor requirements. This is a major limiting factor.

Maybe, just maybe, the point about replacing the skilled artists with prompt-writing monkeys is moot, because this is an emerging highly technical creative field akin to 3D CGI and nobody is actually doing it like the panicking crowd with zero understanding likes to imagine? If anything, the industry is perfectly capable of exploiting underpaid labor and drowning itself in shit without using any AI.

Might it also be that the entire moral panic and the ethics angle was just a viral fluke of a twitter horde amplifying the info noise without giving it any thought or having a look into it?

[1] https://news.ycombinator.com/item?id=37142433


Nice of the Corridor Crew to rip off work like Vampire Hunter D and still have it look like garbage. It's pretty easy to understand what they did, because they released an entire video about the process.


> I think it’s important to bring these points up ad nauseum on any discussions involving this technology

Yeah, hijacking and derailing every distantly-related discussion to your pet political issue has been recognized as bad netiquette since at least the early Usenet days, and it's definitely not appropriate here. There is definitely an interesting and important discussion to be had about the practical, ethical, and legal issues around AI training and use, intellectual property, and the market for works of the types that models are trained on, but your overtly stated crusade to hijack all AI image generation conversations onto that topic is wildly inappropriate for this forum.


It's comments like this that really highlight the extent of the damage copyright has caused. It has conditioned people to think that they can own information once released, and that they can treat it like property. It's a ridiculous, silly notion, and it gets threatened each time we make a leap in technology (tapes, computers, the internet and streaming, and now generative AI).

I wonder how long society will tolerate the nonsensical idea of copyright before it's had enough.

Intellectual property is nonsense and I'm glad it keeps getting exposed.


Until we have an economy where people don't starve to death from lack of food or die of frostbite from being homeless, we have to figure out ways for people who actually make things to benefit from that work. Copyright currently protects people who rely on the product of their labor to make ends meet. Until we have the fairy-tale economy you envision, where artists can have their work stolen and still live, this is the best we've got.

And yeah, wild to think people feel entitled to own their own work thanks to silly things like the entire body of copyright law.


The true fairy tale is the intellectual "property" nonsense you people deluded yourselves into believing. As I said, reality will continue to slap you across the face with each jump in technology. How I wish these slaps were enough to wake you up from your delusions, but alas.


Doesn't this entirely depend both on what it's been trained on and on what style is being output? But also, philosophically, is it even "theft" to make something in-the-style-of someone?

I believe these questions and their complex answers are the reason you've been downvoted.


I understand why the person was downvoted, but not why the person was flagged. It doesn't make sense for someone to flag "AI art is theft."

Downvotes because you didn't back up what you meant.

Flagged because there are AI fanboys that want to censor speech, perhaps?


I would say that it absolutely deserved to be flagged, because it was a comment with little substance.

It both isn't directly related to the original post and also didn't make any particular argument. It was just a five-word declaration of fact, which is borderline off-topic.


Flagged because it was an unsubstantive comment as dang mentioned and also that it's increasingly a flame bait topic on HN, same as with Copilot and licensing.


Downvotes because the userbase here is rabidly supportive of any new "AI" tech regardless of the consequences, damage, or ethics surrounding it. But of course, we can never take a critical view of technology on Hacker News.


Theft takes the original, piracy makes a copy, AI art remixes the original. I’m not sure how to classify AI art but it definitely isn’t theft.


So piracy may be involved in training the model, but the rest does not follow.

Art inspired by other art has been the way of things for as long as we've been creating images. There's no such thing as a "clean-room painting."


Using unlicensed copies of other people's work in training is the problem, along with what that does to the market for original works. Using people's labor for AI training without permission or compensation will discourage people from sharing that work and ultimately make the AI models worse too.


Derivative works get zero copyright protection due to the predominance of machine assists.

There's no way to quantify it though, for or against copy protection.

But that's a convenient compromise for now.


"This pixel right here officer; clearly stolen."


Why, did you lose your art because of AI?



