I used Stable Diffusion and Dreambooth to create an art portrait of my dog (shruggingface.com)
663 points by jakedahn on April 16, 2023 | 236 comments



Some feedback on workflow:

  - Automatic1111 outpainting works well but you need to enable the outpainting script. I would recommend Outpainting MK2. What the author did was just resize with fill, which doesn't do any diffusion on the outpainted sections.
  - There are much better resizing workflows; at a minimum I would recommend using the "SD Upscale" script. However you can get great results by resizing the image to high-res (4-8k) using lanczos then using inpainting to manually diffuse the image at a much higher resolution with prompt control. In this case "SD Upscale" is fine but the inpaint-based upscale works well with complex compositions.
  - When training I would typically recommend keeping the background. This allows for a more versatile finetuned model.
  - You can get a lot more control over the final output by using ControlNet. This is especially great if you have illustration skills. But it is also great for generating variations in a different style while keeping the composition and details. In this case you could have taken a portrait photo of the subject and used ControlNet to adjust the style (without any finetuning required; rough sketch below).
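
In case it helps, here's a minimal sketch of that ControlNet idea using the diffusers library; the model IDs, prompt, and thresholds are illustrative assumptions, not the author's actual workflow:

  # Rough sketch: restyle an existing portrait photo while keeping its composition,
  # using a Canny-edge ControlNet. Model IDs and settings are examples only.
  import cv2
  import numpy as np
  import torch
  from PIL import Image
  from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

  # Edge map from the reference photo; this is what locks in the composition.
  photo = np.array(Image.open("dog_portrait.jpg").convert("RGB"))
  edges = cv2.Canny(photo, 100, 200)
  control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

  controlnet = ControlNetModel.from_pretrained(
      "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
  pipe = StableDiffusionControlNetPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
      torch_dtype=torch.float16).to("cuda")

  # The prompt carries the style; the ControlNet input carries the composition.
  result = pipe("oil painting portrait of a dog, dramatic lighting",
                image=control_image, num_inference_steps=30).images[0]
  result.save("restyled_portrait.png")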


> However you can get great results by resizing the image to high-res (4-8k) using lanczos then using inpainting to manually diffuse the image at a much higher resolution with prompt control.

Diffuse an 8k image? Isn't it going to take much, much more VRAM tho?


For what it's worth, if you actually want help on the state of the art on this stuff, the best place to ask is the 4chan /g/ /sdg/ threads, and you can absolutely diffuse images that large using Tiled VAE and Mixture of Diffusers or MultiDiffusion, both of which are part of the Tiled Diffusion plugin for auto1111.

https://i.imgur.com/zOMarKc.jpg

Here's an example using various techniques I've gathered from those 4chan threads. (Yes, I know it's 4chan, but just ignore the idiots and ask for catboxes; you'll learn much faster than anywhere else, at least that was the case for me after exhausting the resources on github/reddit/various discords.)


Haha, I've been lurking sdg for the same reason and seen your efforts there which really stand out (in a good way!)

I haven't dared delurk, but if I had it would surely have been to ask for the scoop on how you accomplished these renders.

In this less troublesome venue, care to provide any details?


I'm not the guy who came up with this style! I just vacuum up catboxes and do a lot of tests/repros. I'm not at my main computer so I can't give you exact gen parameters, but the cliff notes version is:

  - DPM++ 2M K or DPM++ SDE K w/ a high eta are the best samplers in general; the former was used here
  - search seed space at whatever res is reasonable (512x704) before upscaling
  - once you have a good seed, hiresfix with ultrasharp or remacri @ 0.5-0.6 denoise (i prefer the former, ymmv)
  - do a second diffusion pass in i2i on top of the hires, but with a simpler prompt focusing on high quality, e.g. "gorgeous anime wallpaper, beautiful colors, impeccable lineart"
  - for the second pass (this will be taking you from ~1k x 1.5k to 2.5k x 4k) you're gonna run out of VRAM if you're trying to diffuse traditionally, so use the Tiled Diffusion addon with Tiled VAE enabled as well to conserve VRAM. DDIM seems to work best here though I've gotten good results with the two samplers above as well.
  - using a style finetune helps a lot, Cardos Anime was used here
  - when in doubt, search Civitai or huggingface, there are tons of great models/LoRAs and textual inversions out there, and if you have anything specific in mind having some form of finetune helps a ton
Obviously you're going to need to know how to prompt as well, which is definitely not something I can explain in a single post; just like any kind of art, you just have to practice a bunch to gain an intuition for it. (Rough sketch of the two upscale passes below.)
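
If it helps to see the shape of it outside auto1111, here's a rough two-pass sketch with plain diffusers; the model, prompt, resolutions, and denoise strength are placeholders, not my exact parameters:

  # Pass 1: search seed space at low res. Pass 2: upscale the winner and re-diffuse
  # it at low strength so detail is added without changing the composition.
  import torch
  from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

  model_id = "runwayml/stable-diffusion-v1-5"  # swap in a style finetune here
  prompt = "gorgeous anime wallpaper, beautiful colors, impeccable lineart"

  txt2img = StableDiffusionPipeline.from_pretrained(
      model_id, torch_dtype=torch.float16).to("cuda")
  seed = 1234  # whichever seed won the low-res search
  gen = torch.Generator("cuda").manual_seed(seed)
  base = txt2img(prompt, width=512, height=704, generator=gen).images[0]

  # A real workflow would upscale with an ESRGAN model (ultrasharp/remacri);
  # a plain resize keeps the sketch short.
  upscaled = base.resize((1024, 1408))

  img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
      model_id, torch_dtype=torch.float16).to("cuda")
  final = img2img(prompt, image=upscaled, strength=0.55,
                  generator=torch.Generator("cuda").manual_seed(seed)).images[0]
  final.save("final.png")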

P.S. I've recently started a patreon, if any of you'd like to support my work on this stuff. I'm a big believer in sharing all my knowledge, so most of it will come out for free eventually, but I gotta eat. [0]

[0] https://www.patreon.com/thot_experiment


> gotta eat

Yeah :) Thanks for your infodump! I'm mostly curious how you achieve the insane amount of "clutter" in these images - is it done by referencing a specific artist or style in the prompt, or just by some difficult-to-find key phrase? I haven't been able to get anything near.


I think it's the tiled diffusion second/third pass that does it. Because you're essentially doing many gens on individual tiles of the image, there's a natural visual density that SD tends toward, and since this is composed of many gens the density is increased in the overall picture. That being said, it's not something I've tested super extensively, and mostly only with this sort of style.


> but I gotta eat.

So do the people who made the art used to train these things.


Do not start with me on this shit. Of course artists need to eat. I'm on the fucking internet essentially begging because I struggle with capitalism and the idea that almost all forms of employment force me to restrict my intellectual property, or assign it to someone else. Every time I'm forced to do it because the bank account is low I can feel it grinding down my soul. It is antisocial to prevent people from building upon my ideas (or any ideas). We should take every step we can to strengthen the commons. (see "Everything is a Remix"[0])

I dropped out of fucking high school and was homeless couch surfing in LA for years trying to break into steady VFX work, I don't think for a lack of skill/blood/sweat/tears[1]. I'm well aware artists need to eat.

The problem isn't technological progress, (jfc this is all open source!!! stability is doing god's work by undermining the value of ghouls who try to restrict access to these models for personal profit). It's certainly not that copyright is too weak in preventing people from building off the aggregate styles and work of artists. I learned style by fucking copying the masters[2]!! This is how art is supposed to work. The problem is the disgusting economic/political system that has twisted copyrights and patents into a device to enable rent seeking for those who can afford to purchase/commission them rather than protecting the innovation and creativity that they were meant to.

[0] https://www.youtube.com/watch?v=nJPERZDfyWc

[1] https://www.reddit.com/gallery/unqmux

[2] https://imgur.com/a/NDRWVj8


What about the artists whose art said artists used to train themselves and so on and so forth? How much do we owe the people we learn from?

So many artists copy/adapt Disney's style totally anonymously for example. Should they be paying royalties to Disney?

As a human being, if I browse DA and reference other people's work/styles while I muck about on a graphics tablet am I in the wrong?


While I'm strongly on team "train the AI on everything and reduce IP protections to strengthen the commons" as much as such a team exists, I think it's important to point out that this is an argument that misses the forest for the trees.

It's only relevant when taking a very logical/semantic view of the situation, when you take a more holistic view it breaks down. The scale and implications of the two things you're comparing are completely different. While I probably agree with you in general I think these sorts of arguments don't really help people understand our point of view because they sidestep the economic and social implications which are at the heart of this.


This is amazing. Do you have some place where you've put up your work?


I appreciate it, but I'm not trying to pass myself off as a great artist here; this isn't really my work, it's 90% stolen from anons on 4chan. ;) I post on twitter[0] a reasonable bit but SD art isn't a big focus for me. I also recently started a patreon, which is linked in another post in this thread if you're interested in supporting.

[0] https://twitter.com/thot_exper1ment


> ask for catboxes

What's that?


https://catbox.moe/ ... It's a filehost popular among 4chan users. Basically OP is asking folks to "show me the code".


auto1111 saves generation parameters as text in EXIF/PNG text chunks, so you can re-create generations done by other people/yourself in the past. 4chan strips metadata from images, hence the need for catbox.
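
If you want to pull those settings back out yourself, it's roughly this with Pillow (the "parameters" key is what current auto1111 builds use, as far as I know; the filename is just an example):

  # Read the generation parameters auto1111 embeds as a PNG text chunk.
  from PIL import Image

  img = Image.open("00001-1234567890.png")
  print(img.text.get("parameters", "no embedded parameters found"))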


That confused me at first too. You aren't diffusing the 8K image.

You are upsampling, then inpainting sections that need it. So if you took your 8K image and inpainted a 1024x1024 section, that works well with normal VRAM usage. In Auto1111, you need to select “inpaint masked area” to do that.


Yeah exactly this. It doesn't need more VRAM since you inpaint small sections (~512x512) of the image manually. It takes more time but gives the best results.

The automated equivalent is SD Upscale or TiledVAE


To clarify, when things are upscaled like that it typically means running img2img on sections in a grid pattern to make up the full picture, so it doesn't overuse VRAM.
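
A toy sketch of that grid idea with the diffusers library (no tile overlap or seam blending, which the real SD Upscale / Tiled Diffusion tools handle; filenames and settings are placeholders):

  # Toy tiled img2img: re-diffuse a big upscaled image one 512x512 tile at a time
  # so VRAM use stays constant. Assumes width/height are multiples of 512 and skips
  # the overlap/blending that real implementations use to hide seams.
  import torch
  from PIL import Image
  from diffusers import StableDiffusionImg2ImgPipeline

  pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

  big = Image.open("lanczos_upscaled.png").convert("RGB")
  tile = 512
  out = big.copy()
  for top in range(0, big.height, tile):
      for left in range(0, big.width, tile):
          box = (left, top, left + tile, top + tile)
          patch = big.crop(box)
          refined = pipe("highly detailed illustration",  # simple quality prompt
                         image=patch, strength=0.3).images[0]
          out.paste(refined, box)
  out.save("tiled_refined.png")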


For outpainting there are these two amazing tools which give you a canvas to do stuff

https://github.com/zero01101/openOutpaint

https://github.com/BlinkDL/Hua

Both use the Automatic1111 API for the work.


Thank you for these recommendations! I'll definitely be trying them next time 'round.


Good luck! I have some workflow videos on YouTube: https://youtube.com/pjgalbraith. But I haven't had a chance to show off all the latest techniques yet.


I love how much work went into this.

There's a great deal of pushback against AI art from the wider online art community at the moment, a lot of which is motivated by a sense of unfairness: if you're not going to put in the time and effort, why do you deserve to create such high quality imagery?

(I do not share this opinion myself, but it's something I've seen a lot)

This is another great counter-example showing how much work it takes to get the best, deliberate results out of these tools.


> a lot of which is motivated by a sense of unfairness

This is not something I've seen once in any sort of criticism of "AI art", and elsewhere on the internet I'm largely in an anti-AI-art bubble.

Most legitimate pushback I've seen has been more on the non-consensual training of models. Many artists don't want their work to be sucked up into the "AI Borg Model" and then regurgitated by someone else, removing the artist's consent, credit, and compensation.


I've found it rare that those dead-set against AI art actually concede that it has value once you take copyright out of the equation; bringing up Adobe Firefly instead pivots the conversation to other, considerably weaker arguments.

"Using stock art is just further appropriation", which is silly considering that stock artwork is clearly intended and licensed by all parties to be turned into commodities for commercial exploitation.

"The old ways are best, the new ways are bad and take away the soul from the creation process and resulting works." Also unconvincing, considering that most of the people saying that are using radically different, digitized, heavily time-optimized art workflows compared to the norm of the industry even 30 years ago.

Not that I don't see the problems: the potential for job losses, due to workflow optimizations requiring less work and therefore fewer workers, is an actual risk, but one that happens regardless of copyright enforcement against AI models. The problems commercialized AI art workflows cause may even be exacerbated by enforcement of copyright on training data, by handing a monopoly on all higher-quality generative AI models to already entrenched multinational intellectual-property rightsholders. I think a lot of artists forget that copyright isn't as much for them as it is for the Disneys of the world.


I don't think there's much wrong with that though. I think the whole copyright/licensing/fair-use thing is the one reasonably objective problem with "AI art" at the moment. People might have other concerns, but once you solve the copyright issues, it starts to be a whole lot more about personal, subjective preferences.


I absolutely have seen it. A lot. It's dressed up as Luddism, more often expressed as "you shouldn't be able to have those results because I spent years honing my craft" which may or may not be followed by "...and if we allow this, those years were wasted and I'm out of a job, along with millions of others".


> It's dressed up as Luddism

> "...and if we allow this, those years were wasted and I'm out of a job, along with millions of others"

Given that the quoted part is highly likely to happen, and not just in art, I hope we're no longer considering "luddism" to be a pejorative.


Luddites were a real group, who really did lose their jobs to technical progress. So it seems fair.

Technical progress really does require adaptation sometimes. We don't criticize luddites because we think they were wrong about change being a real thing.


Yep. They were, broadly speaking, right in their predictions, and catastrophically terrible in aligning their tactics to their stated aims.


[flagged]


Why would anyone have to articulate that? They are programs that allow people to make the pictures they have in their minds into a form that others can now look at. People who otherwise wouldn't be able to make these pictures before (because they were bad at drawing or whatever) now can. That's not "necessary" but then again nothing about art really is. It's just fun.


If the generators are genuinely creative, then they are as necessary as any human artist.

If the generators are not creative, then what's the problem?


Almost all technology is unnecessary. Who needs farming when you can hunt and gather?


Our population is way too high for hunting & gathering to be sustainable. Modern farming tech is absolutely necessary to avoid widespread famine.


If the preservation of industries is really that important, where are all the tears for horseback couriers? Hell, the entire horse industry, really.

I wouldn't consider it a pejorative either because I don't see artists mobbing datacenters to smash A100 clusters with pitchforks.


You are absolutely correct. The reaction from artists is sheer terror disguised as a dialogue about whether a machine can learn to make art by looking at other people's works just as another human would learn.

I see it as a slow transition though; there's still plenty a human artist can offer over any current model even with a carefully curated prompt. But yes, eventually the whole industry will die down. Especially seeing as models can now generate sounds, 3d models, textures, natural placement of objects on a map, etc. Like everything we've invented a tool to help us do things faster and it will displace people. Tough to say whether it's right or wrong but it's what we've done all through history; move on from one technology to the next. I wonder if traditional artists complained/fretted when digital art/tablets were getting big, hmm.


SD base models can't really be used to imitate style of other artists reliably, because the datasets that they were trained on are a huge mess. Caption accuracy is all over the place. For example - Joan Cornella's work and Cyanide & Happiness comics are in LAION5B, but if you prompt SD to make art in their style you'll get something completely different. Try prompting for a "minigun" - you will also get something weird.

In order to copy another artist's style reliably, you have to make a LoRA yourself. That involves a lot of manual work and it can't really be automated if you want good results.

Artists can opt out of future SD base models (which doesn't matter), but they can't opt out of someone making a LoRA of their work (which actually works).


>> a lot of which is motivated by a sense of unfairness

> This is not something I've seen once in any sort of criticism of "AI art"

I've actually seen this a lot.

In my view, it's not coming from professional artists working in the field. Their concern is more that people are ripping off their style, or that AI is making their efforts unnecessary (e.g. lots of people who made a living by copying the style of particular anime & cartoons for fans, no longer have a purpose since AI can do that given enough source material).

Non-professional artists, on the other hand, are still learning and have put a lot of time into their craft and it hasn't paid off yet. They seem to be annoyed that other people are getting results (via AI), without actually having to learn the mechanics of art.

AI basically lets your generic art history major produce lots and lots of pieces, because they can describe artwork well enough and know where to find good samples for the AI. The only thing stopping them was mere mechanical inability, not knowledge of the art space.


> and compensation.

Is this part actually coming from artists? What’s the suggested amount (be it upper quadrillion dollars per second or $0.25/use)?

I think compensation as a condition implicitly assumes that financial gain is the artists' motive and that they actually live off that income. Rather, I see a lot of vocal opposition to AI image generators from people who aren't drawing for profit at all.

So, is the money going to solve it, or is it a wrong assumption, or is it that it will have to be settled by lump sums?


> Is this part actually coming from artists

Yes. The group of artists that are suing Stability AI and Midjourney are calling for consent, credit, and compensation. https://stablediffusionlitigation.com/

> Since then, we’ve heard from peo­ple all over the world—espe­cially writ­ers, artists, pro­gram­mers, and other cre­ators—who are con­cerned about AI sys­tems being trained on vast amounts of copy­righted work with no con­sent, no credit, and no com­pen­sa­tion.

I think the details of credit and compensation aren't as important, because once you require consent, artists can decide whether they're happy with the compensation model and choose to give consent (or not) based on that.


>Most legitimate pushback I've seen has been more on the non-consensual training of models

Look at the pushback to Adobe’s model.

“Non-consent of model input” is just a tool they’re using in the hopes of destroying the tech. Plenty of companies have datasets of these same people’s work where the T&Cs permit training.

The narrative will switch once you can no longer use the “stealing/consent” argument. They won’t suddenly become fine with this tech just because the dataset consented.


If their own work isn't hoovered up en masse, I'm sure that artists still won't be happy that they can no longer make a living from their profession. But it's a bit rich to think they're being disingenuous in objecting to their own work being co-opted to enable the process of putting them out of work.


I don’t think it’s rich at all. You saw the argument switch within minutes when Adobe Firefly launched. It’s not about the process; it’s about what legal and social levers can be pulled to stop the tech.

I work in the creative fields so my work will be impacted by this but I realize it’s pointless to fight it.


Unfortunately it's become a meme among AI art haters that AI art is "just inputting text into a text box", despite the fact that this is far from the truth, particularly if you want to get specific results, as this blog post demonstrates.

Some modern AI art workflows often require more effort than actually illustrating using conventional media. And this blog post doesn't even get into ControlNet.


Only if you exclude the countless hours an illustrator has spent developing their craft.


Using AI as a tool to create art takes nothing away from anyone who spent time learning a skill or craft that they use in their own pursuit of expression.

People will be arguing about whether or not art made with AI is art, and artists will just be using it or not. I remember an interview about electronic music where Bjork addressed concerns that if you use a computer to make music, it has no soul, and she said if the person using the machine to make the music puts soul into it, it will have a soul.

I remember David Bowie in the mid 90s saying if he was young in that decade he might not have been a musician, because in the 60s being a musician seemed subversive and at the time of the interview the internet was carrying the flag of subversion.

Anyway, it's interesting to watch these conversations. I'd never claim to know what art is or try to tell someone, but it seems to me that already because of the controversy artists are drawn to AI and further exciting the conversation. Commercial artists seem the most threatened; animators, designers, etc. I understand why, but I don't think arguing that AI isn't "art" is going to help their cause any more than protesting digital painting wasn't art, electronic music wasn't art, and much earlier that photography wasn't art.

All the time these conversations are happening, the art's getting made and we're barreling towards the next 'not art' movement.


> Using AI as a tool to create art takes nothing away from anyone who spent time learning a skill or craft that they use in their own pursuit of expression.

Except all those artists’ art being used without their consent to train these models that subvert the exclusivity of their style, or obviate their work altogether. It ingests their literal effort and eliminates other humans’ need to put forth a similar level of effort.

I’m all for cool new tools, but this is very much like the invention of digital sampling, and models should be required to “clear” all the works that they “sample” for training.


How precisely were those artists trained in the first place? Did they get consent from the artist of every painting they ever studied?


There's a difference between studying a work of art, and literally ingesting that work of art for training. A study is always an interpretation.


> Except all those artists’ art being used without their consent to train these models

As someone just brought up previously, though, there are other AI art models that don't use other people's work without permission.

For example, the new adobe ai stuff is done this way. And yet people will refuse to concede the argument at that point, and still think it is bad.


being sympathetic to that requires pretending that the user would have ever commissioned an artist for that idea at all. both the transaction and the idea would have simply never happened. it was never valuable enough or important enough to commission a human, hope you got the correct human, wait week after week for revision after revision.

people that want to hone a niche discipline for themselves still can do that. just be honest about doing it for yourself.


It's a meme because 99% of AI art creators don't go that deep; they only prompt.

Even if they did have a more complex workflow most of them are still based on copyrighted training data, so there will be many lawsuits.


> Some modern AI art workflows often require more effort than actually illustrating using conventional media.

Then why don’t they illustrate it instead, and save themselves some time?


Why don't you buy your cake and cookies at a bakery instead of making them yourself at home?


I’m not sure I understand your argument. If you’re suggesting that illustrating by hand requires more effort than automatically generating an image with AI, then we agree with each other.


> Some modern AI art workflows often require more effort than actually illustrating using conventional media. And this blog post doesn't even get into ControlNet.

Indeed. Another criticism that I can definitely somewhat see the idea behind, is that the barrier to entry is very different from for example drawing. To draw, you need a pen and a paper, and you can basically start. To start with Stable Diffusion et al, you need either A) paid access to a service, B) money to purchase moderately powerful hardware or C) money to rent moderately powerful hardware. One way or another, if you want to practice AI generated art, you need more money than what a pen and paper cost.


The cost has gone way down in the last couple months.

With a super-cheap T4 GPU (free in Google Colab), PyTorch 2.0, and the latest diffusers package, you can now generate batches of 9-10 images in about the same time it took to generate 4 images when Stable Diffusion was first released. This drastically speeds up the cherry-picking and iteration processes: https://pytorch.org/blog/accelerated-diffusers-pt-20/
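
For anyone who hasn't tried it, batching with diffusers looks roughly like this (model ID, prompt, and batch size are just examples; throughput on a Colab T4 will vary):

  # Generate a batch of candidates in one call; PyTorch 2.0's built-in
  # scaled-dot-product attention speeds this up without extra code.
  import torch
  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

  images = pipe("portrait of a dog as an oil painting",
                num_images_per_prompt=8,      # batch for cherry-picking
                num_inference_steps=25).images
  for i, im in enumerate(images):
      im.save(f"candidate_{i}.png")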

Google Cloud Platform also now has preview access to L4 GPUs, which are 1.5x the cost of a T4 GPU but 3x throughput for Stable diffusion workflows (maybe more given the PyTorch 2.0 improvements for newer architectures), although I haven't tested it: https://cloud.google.com/blog/products/compute/introducing-g...


We're minmaxing those costs, thanks for the data.


> is that the barrier to entry is very different from for example drawing

That got me thinking. I agree, but from another perspective: the skillset is different. Traditionally, the approach to art was very bottom-up. Start with a pen and basic contouring techniques. Understanding more advanced techniques (perspective, shadows, etc.) requires a lot of work.

"AI" art generally does away with basic techniques. The emphasis is more on composing, styling. A top-down approach. "AI" artists may be able to iterate quicker by seeing "almost-finished" versions quickly (though a skilled artist can most likely imagine their work pretty well).

But most of all, the tools and required skills are very different. You don't need to know a lot about machine learning, but it certainly helps. Probably pretty far from the skillset of most current artists. And people generally fear what they don't understand. And if I were an artist, I'd be at least a bit concerned about (i) it undercutting the value of my art, and (ii) having to learn this alien way of doing things to remain competitive (by way of selection, artists probably enjoy their current tools).

Anyway, I imagine photography was similarly upsetting in a lot of ways. It also didn't happen overnight. I also suspect we are going to see similar improvements to output quality as in early days of photography.

Another similarity is with digital music (and recording/remixing before that). I wonder if we're going to see new genres emerge as a result (the equivalent of techno/electro).


Your comment in particular captures it, but I can imagine a lot of the same sort of comments on this post being made about film cameras when they came out, then again about digital cameras.


Digital cameras made burst photos go from $.25+ a frame at 5 FPS to effectively free, with rates at 30+ FPS now. That was transformative but also led to all sorts of lamentations about lack of skill.


I remember my university photography club trying to get digital cameras banned from campus because "art only happens in the darkroom".


Every new technology that can be applied where existing art or artisanship exists causes similar lamentations; LLMs and image (which includes video) generation (especially because it spans visual styles from “photograph-like” to “cartoonish” to “3d-rendered” to…) are just doing it for an unusually wide swath at the same time.


Before AI art there was 3D art. You need to get good at 3D sculpting, but the renderer handles a lot of the hard stuff for you: perspective, shadows, global illumination, light refraction, caustics, realistic materials, etc. Some people also didn't consider it to be "real" art.


And yet 3D artists are also reckoning with the fallout of generated images: https://news.ycombinator.com/item?id=35308498

This complaint in particular strikes me as coming from someone who enjoyed the process more than the result. Very specialized crafting skills are now, not useless, but no longer required to obtain a similar result. And, if reducing things to a market, they are competing with a very orthogonal set of skills.


3D artists can also generate concept art easier and there are many models that can generate tileable textures.

"3D Stable Diffusion" doesn't exist yet, but some projects look promising.


There are plenty of traditional art mediums that require significant financial outlays to get started: oil painting, ceramics, glass blowing etc.

There are plenty of free online tools for using all kinds of AI image generation techniques, and they don't require powerful hardware, just something that can browse websites or run Discord.


Plus training, lessons and inspiration. And talent.

It’s like with dreams. They can be terribly intricate and detailed, but ask me to draw something creative and I’m out.


I've probably spent at least an hour a day working in Stable Diffusion and Automatic1111 since January or so. At this point I'd call it my hobby, as instead of playing video games I'm plowing time into this. And I'm definitely seeing marked improvements in my style and what I'm looking for. I'll often start with a shotgun approach and make 64-128 pictures with a basic prompt of what I'm looking for. Sometimes there's a shape and basic composition that makes me gasp it's so much better than the others. So I'll feed that into img2img or inpainting, iterate on it, tweak settings that I just sort of have an intuitive feel for what turning the knobs does, and while away for an hour or two to make it exactly what I want. There's definitely a "dreamy fantasy art style" I'm a big fan of that would have cost me an absolute fortune to commission even one image a year ago. I can't match what is coming from the top artists on Artstation (I tried a few weeks back which was humbling, some astounding work). But it's good enough that my D&D buddies are entertained and amazed that I went through and made each one of their characters.

Our DM, being someone who has released creative works, was reticent and less gung-ho on AI for awhile until he decided to start playing with the AI tools (Midjourney in his case) for a new custom campaign he's running. He's suddenly able to develop novel NPC tokens for every important character we meet, the monsters are high resolution and the convenience of using Midjourney in Discord (which he already uses to coordinate our online D&D games) has been a huge boon and enhancement to how much fun and immersive our games are. A year ago this would have cost literally tens of thousands of dollars. He's a published fantasy author so prompting aka describing a scene comes naturally to him. It's been a lot of fun seeing what he's coming up with.

I'm really loving the spark of creativity I've been finding in myself where the turnaround on the old tools was too long for me to not get frustrated and give up, and to see it amongst my friends, even the ones who were initially skeptical.


Yeah, well then, please draw an image of my dog in the style of van Gogh, using pen and paper. I would say that for most of us, the more cost-effective way to get high quality artwork will still be Stable Diffusion...


> To draw, you need a pen and a paper, and you can basically start. To start with Stable Diffusion et al, you need either A) paid access to a service, B) money to purchase moderately powerful hardware or C) money to rent moderately powerful hardware

A 4GB NVidia GPU (sufficient to run Stable Diffusion with the A1111 UI) is hardly “moderately powerful hardware”, and, beyond that, Stable Horde (AI Horde) exists.

OTOH, a computer and internet connection are more expensive than a pencil, even if nearly ubiquitous.


Stable Diffusion works fine on a CPU - on an AMD Ryzen 5700, approx 90s per image (and I believe comparable or faster on my old i7-6700). If you want to kick off a batch in the background while you work on something else, that's plenty fast. (I use: https://github.com/brycedrennan/imaginAIry).
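
(For reference, a bare-bones CPU-only run with plain diffusers looks like this; it's not the imaginAIry API linked above, just a generic sketch to show the same pipeline works on CPU, only slower.)

  # CPU-only sketch: no .to("cuda"), so everything stays on the CPU in float32.
  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
  image = pipe("a watercolor painting of a corgi",
               num_inference_steps=20).images[0]
  image.save("cpu_generated.png")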


Kaggle and Google colab both provide free access to GPUs powerful enough to run stable diffusion. Alternatively, you can use other people's GPUs through Stable Horde or one of the free services online, though this comes with more limitations regarding image size and implemented extras like Lora and textual inversion and so on.


> you need either A) paid access to a service, B) money to purchase moderately powerful hardware or C) money to rent moderately powerful hardware.

None of this is true; you can easily use a Colab and load the models on a Drive. I have always done it this way and it works perfectly.

Also, if you just need to start, there are plenty of free interfaces online to use, and Colab if you actually want to dive in.


Stable Diffusion doesn't really need powerful hardware, any graphics card will do, it will just take a bit longer. There are even ports to smartphones nowadays.


From what I read on the internet, people assume AI-generated art is a difficult question legally speaking. Some literally assume artists complain only because they are outcompeted.

I disagree - I think that AI generative art is an easy case of copyright infringement and an easy win for a bunch of good lawyers.

That's because you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak. I really don't see what's difficult with that case. I think the internet assumes a bit too quickly that it's a difficult question and a grey area when maybe it just isn't.

It's noteworthy that Adobe did things differently than the others, and the way they did things goes in the direction I'm describing here. Maybe it's just confirmation bias.


> I disagree - I think that AI generative art is an easy case of copyright infringement and an easy win for a bunch of good lawyers.

> That’s because you can’t find an artist for a generated picture other than the ones in the training set.

First, that’s clearly not true when you are using ControlNet with the input being human generated, or even img2img with a human generated image, but second and more importantly…

> If you can’t find a new artist, then the picture belongs to the old ones, so to speak.

That’s not how copyright law works. The clearest example (not particularly germane to the computer generation case, but clearly illustrative of the fact that “can’t find another artist” is far from dispositive) is Fair Use noncommercial timeshifting of an existing work: it is extremely clear there is no artist but that of the original work, and yet it is not copyright infringement.

> I really dont see what’s difficult with that case.

You’ve basically invented a rule of thumb out of thin air, and observed that it would not be a difficult case if your rule of thumb was how copyright law works.

Your observation seems correct to that extent, the problem is that it has nothing to do with copyright law.

> I think the internet assumes a bit too quickly that it’s a difficult question and a grey area when maybe it just isn’t.

IP law experts have said that the Fair Use argument is hard to resolve.

Assuming the lawsuits currently ongoing aren’t settled, we’ll know when they are resolved what the answer is.


It’s not as simple as that though, because the algorithm does learn by itself and mostly just uses the training data to score itself against; it doesn’t directly copy it, as some people seem to think. It can end up learning to copy things if it sees them enough times though.

“you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak”

I don’t think that’s valid on its own as a way to completely discount considering how directly it’s using the data. As an extreme example, what if I averaged all the colours in the training data together and used the resulting colour as the seed for some randomly generated fractal or something? You could apply the same arguments - there is no artist except the original ones in the training set - and yet I don’t think any reasonable person would say that the result obviously belongs to every single copyright owner from the training set


> an artist for a generated picture

Normally - outside the specific context of AI-generated art - there is not a relation "work¹ → past author", but "work → large amount of past experience". (¹"work": in the sense of product, output etc.)

If the generative AI is badly programmed, it will copy the style of Smith. If properly programmed, it will "take into account" the style of Smith. There is a difference between learning and copying. Your tool can copy - if you do it properly, it can learn.

All artists work in a way "post consideration of a finite number of past artists in their training set".


But this person’s dog isn’t in the training set, so why should some artist be credited for a picture they never drew? Not a single person has drawn his dog before, now there is a drawing of his dog, and you want to credit someone who had no input to the creative process here?


"Input into the creative process" is surely broader than simply "painted the portrait". Artists most certainly never consented to have their works used as training data. To this extent, they might be justifiably pissed off.

Artists and designers have furthered their careers (and gained notoriety) by 'ripping off' others since the dawn of time. This used to require technical artistic ability; now less so. The barrier to entry is.... not necessarily lower now, but different.


If you can find a new artist then I think the picture belongs to him.


That’s not (either positively or negatively) how copyright law works. You can have no new artist and still not be infringing, and you can have a new artist (as is the case for derivative works by someone other than the original artist) and still be infringing if the work is neither licensed nor Fair Use.


> That's because you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak.

It doesn't belong to the "old ones"; it is at best a derivative work. And even writing a prompt, as trivial as it might seem, makes you an artist. There are modern artists exhibiting random shit as art, and you may or may not like it, but they are legally artists, and it is their work.

The question is about fair use. That is, are you allowed to use pictures in the dataset without permission? It is a tricky question. On one extreme, you won't be able to do anything without infringing some kind of copyright. Used the same color as I did? I will sue you. On the other extreme, you essentially abolish intellectual property. Copying another artist's style in your own work is usually fair use, and that's essentially what generative AI does, so I guess that's how it will go, but it will most likely depend on how judges and legislators see the thing, and different countries will probably have different ideas.


I don't believe it's a tricky question at all: "Did you train your model on an artist's work that has copyright protection without permission?" is the very simple, straightforward question. The fact of its being an AI model is irrelevant.


Did you train _yourself_ on another artist's work that has copyright protection without their permission? Yes? That's ok, because copyright law doesn't care.


If you have an argument, spell it out. Machine learning is not human learning. They are not the same. Arguing "because human beings learn" does not lead to the conclusion "therefore training models on an artist's work and thereby producing work that looks like theirs is okay". Or, if that is your argument, you have to do more work than just gesturing vaguely at the word "learning".


>That's because you can't find an artist for a generated picture other than the ones in the training set. If you can't find a new artist, then the picture belongs to the old ones, so to speak

We have some countries where it is explicitly legal to train AI models on copyrighted data without consent, and precedent in the US that makes this a plausible outcome there as well.

Could you explain what portion of copyright law you believe would cover this argument? I'm not a lawyer, but have a passing familiarity with US copyright law, and in it, at least, I do not know of anything that would support the idea you're proposing here. How would you even assign copyright to the "old" artists? How are you going to determine what percentage of any given generation was influenced by artists X, Y, Z?


> AI generative art is an easy case of copyright infrigement...

Agreed. An AI model trained on an artist's work without permission is IP infringement and this should be widely understood. Unfortunately, because the technology is new people do not understand this. When Photoshop was new, there was a similar misunderstanding. People could take an artist's work, run it through Photoshop, and then not compensate the artist. It took some time for that to sort out.


I agree. This is a clear-cut case of copyright infringement, as is all art. After all, people painting images have only seen paintings other people painted.


Your snarky argument is against the concept of copyright itself. Such a radical point of view deserves better exposition.


There's copying and there's being inspired by. We don't know where AI generators fall yet, legally. The GP's argument of "they've only been trained on images, so everything is infringement" is a fallacy. Humans have been trained on images too.


The fallacy is thinking that machine learning is like human learning. It is not the same at all, and yet people continuously conflate the two whenever discussions like this come up. "Because humans learn, there is no problem with AIs being trained on an illustrator's entire corpus so that it outputs works convincing enough to be a forgery" is the argument you are making, and it's a bad one.


Why does it matter how well it works? If a human produces a forgery, does that constitute copyright infringement? If it's inspired but sufficiently changed, is that a new work?

There's already a line, what does it matter who produced the work? Just judge everyone using the same criteria.


"convincing enough to be a forgery" was the deliberate choice of phrase not "forgery". An illustrator who spent many unpaid hours to hone their craft should get to enjoy the fruits of their hard work. A kid with a GPU and a model who trains it on that artist's corpus is stealing, not "learning". Every illustration from that model is theft.

Even if you don't find the moral argument convincing (and you should), legally protecting creators will encourage creativity, which is the whole point of copyright, while allowing anyone with a laptop to coopt their work will discourage it.


If your argument is that everything produced by an AI is theft, I don't think we'll agree. I don't even know how we'd begin to tell who each Midjourney image stole from.


You have twice now misread something I have written and strawmanned a weaker argument.

But, yes, if Midjourney trained on copyrighted works, anything produced by it is in my opinion IP theft.


The only problem with that, and it's a big one, is that there’s no way to trace back to the images in the dataset from a final output of the AI.

It’s a static mapping, so surely it should be possible, you’d think, but NN frameworks aren’t designed that way. That is blocking it from happening (and also enabling the “AI is just learning, humans do the same” fallacy).


The shruggingface submission is very interesting and very instructive.

Nonetheless, it would be odd and a weak argument to point criticism towards not spending adequate «time and effort» (as if it made sense to renounce tools and work through unnecessary fatigue and wasted time). More proper criticism could be in the direction of "you can produce pleasing graphics but you may not know what you are doing".

This said, I'd say that Stable Diffusion is a milestone of a tool, incredible to have (though difficult to control). I'd also say that the results of the latest Midjourney (though quite resistant to control) are at "speechless" level. (Noting in case some had not yet checked.)


> More proper criticism could be in the direction of "you can produce pleasing graphics but you may not know what you are doing".

I don't get this. If one "can produce pleasing graphics," how does that not equal knowing what they're doing? I only see this as being true in the sense of "Sure, you can get places quickly in a car, but you don't really know how it works."


> how does that not equal knowing what they're doing

The goal may not be to produce something pleasant. The artist will want some degree of artistic value; the communicator will want a high degree of effectiveness etc. The professional will implicitly decide a large number of details, in a supposedly consistent idea of the full aim. The non professional armed with some generative AI tool may on the contrary leave a lot to randomness - and obtain a "pleasant" result, but without real involvement, without being the real author nor, largely, the actual director.


That seems untrue. In this case, the author set out with a specific goal, then tried to do it, and then succeeded.

What are you suggesting, that the author lied in the blog post and actually worked backwards, post hoc? Seems incredibly unlikely based on the details they wrote.


You must have misunderstood what was intended to be expressed. I will reformulate.

"Pleasant" is not necessarily the goal. The goal may be "artistic", which implies some inner logic that the artist intends to express; the goal may be "communicative", which implies that the creator will focus on the effectiveness of the thought-structure presentation.

There is an amount of knowledge in the professional that may not be codified for explicit use in the machine. This may for example suggest that the knowledgeable author will want a very high degree of control over the output, as the details are involved in his intention - so, he will not leave much space for randomness.

The Z-movie director will shout "cut, good, keep this, next" very frequently; Kubrick would shoot scenes in dozens of takes, even over a hundred. Apprentices in music will explore; Hollis directed authentic professional musicians for very long sessions looking for specific qualities in the results, and could delete entire parts for being imperfect or alien, even when exceedingly nice.

Perfectionist authors are demanding in all sides of creation. (And delegation, as said above, not just detail, will not be left to randomness.)


> if you're not going to put in the time and effort, why do you deserve to create such high quality imagery?

This isn’t high quality imagery. Don’t get me wrong, the tech is cool and I love the work that’s gone into making this picture. But this isn’t something I would ever hang on my wall. There’s probably a market for it, but I get the strong impression it’s the “live, laugh, love” market. The people that buy pictures for their wall in the supermarket. The kind of people who pay individual artists money to paint bespoke images of their pet are not going to frame AI art. I don’t think the artists need to worry.


It’s completely what you make it, though. If what’s in the OP isn’t your style you could literally type in anything you want.

I’ve done pictures of my wife in the style of other photographers, Soviet-style propaganda posters, 50s pinups, Alphonse Mucha, and much more.

I’m a professional photographer and have tons of great pictures of our dog - the kind of stuff people pay for. My wife’s lock screen on her phone is something I generated instead.


I would expect it’s only a matter of time till those “traditional” artists also adopt these tools into their workflows. Similar to the initial pushback against the “digital darkroom” which is now the mainstay of photography.

Non-AI-aided art, like manually developed film, will trend towards a niche.


> This isn’t high quality imagery. Don’t get me wrong, the tech is cool and I love the work that’s gone into making this picture. But this isn’t something I would ever hang on my wall.

Well yeah, but that doesn't change the OP commenter's point that it still takes a lot of work to get high quality art.

> I don’t think the artists need to worry.

I disagree here, but only on the basis of what type of art it is. Stock art/photography and a lot of media design work are likely at risk because we can now create "good enough" art at the click of a button for almost no cost. I agree that the "hang on the wall level good" artists aren't at risk just yet, but between the more filler-art and the uh

Well "anime/furry" commissioners are definitely at risk right now for anything except the highest quality artists, and there is a MASSIVE community behind this - in fact they have done a lot of the innovation for StableDiffusion including optimizations/A1111 webui, and have trained many custom models for their art, already had pretagged datasets of 10k's of images....


Simpler cartooning styles like classical Disney/Warner are actually some of the weakest ones in AI models right now. Prompted diffusion models are poorly suited to the task and tend to capture excess detail in the contour, because they don't have a mechanism for calculating out the geometry of a constructed form, so they don't arrive at the same clarity and simplicity.

What the model can do best is to convert a basic contour sketch into an elaborate rendering, which means that it's the top end of the market that was doing those renderings that's most at risk.


Eh, there might be a market for AI art. As long as the artist is guaranteed to have made only a single one of every piece.


aishit is a reverse turing test. if you find its output exciting or impressive you can no longer qualify as human.


most of the criticism I've seen is that it's all trained on uncompensated stolen artwork. Much like how copilot is trained on GPL code, disregarding its license terms.


The trained on stolen artwork critique is reasonable - I helped with one of the first big investigations into how that training data worked when Stable Diffusion first came out: https://simonwillison.net/2022/Sep/5/laion-aesthetics-weekno...

It's interesting to ask people who are concerned about the training data what they think of Adobe Firefly, which is strictly trained on correctly licensed data.

I'm under the impression that DALL-E itself used licensed data as well.

I find some people are comfortable with that, but others will switch to different concerns - which indicates to me that they're actually more offended by the idea of AI-generated art than the specific implementation details of how it was trained.


When I did Photography at college, a lot of the work was looking at other works of art. I spent a lot of time in Google Images, diving through books from the Art section and going to galleries. Lots of photocopying was involved!

I then did works in the style of what I’d researched. I trained myself on works I didn’t own, and then produced my own.

I kind of see the AI training as similar work, just done programmatically vs physically.

Certainly a very interesting topic.

I can’t get my head around how far we’ve come on this in the last 6-12 months. From pretty awful outputs to works winning Photography awards. And prints of a dog called Queso you’d have paid a lot of money to an illustrator for.


I think it's more analogous to if you had tweaked one of those famous works directly in photoshop then turned it in. The model training likely results in near replicas of some of the training data encoded in the model. You might have a near replica of a famous photograph encoded in your head, but to make a similar photograph you would recreate it with your own tools and it would probably come out pretty different. The AI can just output the same pixels.

That's not to say there aren't other ways you might use the direct image (e.g. collage or sampling in music) but you'll likely be careful with how it's used, how much you tweak it, and with attribution. I think the weird problem we're butting up against is that AFAIK you can't figure out post-facto what the "influence" is from the model output aside from looking at the input (which does commonly use names of artists).

I work on an AI image generator, so I really do think the tech is useful and cool, but I also think it's disingenuous (or more generously misinformed) to compare it to an artist studying great works or taking inspiration from others. These are computers inputting and outputting bits. Another human analog would be memorizing a politician's speech and using chunks of it in your own speech. We'd easily call that plagiarism, but if instead every 3 words were exactly the same? Hard to say... it's both more and less plagiarism.

Just how much do you need to process a sampled work before you need to get permission from the original artist? It seems to be the case in music that if the copyright holder can prove you sampled them, even if it's unrecognizable, then you're going to be on the hook for some royalties.


"The model training likely results in near replicas of some of the training data encoded in the model."

I don't think that's true.

My understanding is that any image generated by Stable Diffusion has been influenced by every single parameter of the model - so literally EVERY image in the training data has an impact on the final image.

How much of an impact is the thing that's influenced by the prompt.

One way to think about it: the Stable Diffusion model can be as small as 1.9GB (Web Stable Diffusion). It's trained on 2.3 billion images. That works out as 6.6 bits of data per image in the training set.
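
The back-of-envelope arithmetic, for anyone who wants to check it:

  # Bits of model weight per training image.
  model_bytes = 1.9e9          # ~1.9 GB model
  training_images = 2.3e9      # ~2.3 billion training images
  print(model_bytes * 8 / training_images)  # ~6.6 bits per image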


Right. Apart from some (extremely famous) pieces of art that have been heavily repeated in the dataset you’re not going to be able to come close to recreating something directly.


Don't you think one of the images could be perfectly or perfectly enough encoded in that 1.9GB though? A funny example is Malevich's Red Square. Highly compressible! [0] Line drawings also can often be compressed to a polynomial.

> My understanding is that any image generated by Stable Diffusion has been influenced by every single parameter of the model - so literally EVERY image in the training data has an impact on the final image.

That's pretty interesting. Need to dig into the math more (lazy applications dev).

[0]: https://en.wikipedia.org/wiki/Red_Square_(painting)


Even if true for a small number of edge cases, I don't think that says anything meaningful about the model in general.


>It's interesting to ask people who are concerned about the training data what they think of Adobe Firefly, which is strictly trained on correctly licensed data.

If they truly got an appropriate license agreement for every image in the training set then I have no issues with that.

>I'm under the impression that DALL-E itself used licensed data as well.

DALL-E clearly used images they did not have a license for. Early on it was able to output convincing images of Pikachu and Homer Simpson. OpenAI certainly didn’t get licensing rights for those characters.


There's an argument to be made that drawing Pikachu should not be allowed, certainly. I think it's harder to make the argument that humans should be allowed to, but AI not.

What ongoing litigation I'm aware of seeks to close that loophole and make fanart illegal, which would be a first step towards also preventing AI art.


In terms of copyright, I don't believe there's any issue with drawing Pikachu, unless it's an exact replica of someone else's drawing.

Not sure if there would be trademark issues. But that would be the case regardless of how the image was created.


> I don't believe there's any issue with drawing Pikachu

In practice? No, not really. But Pikachu is a copyrighted character and only those with license to do so are actually legally allowed to reproduce Pikachu in media.

Trademarks can come into play like you said, but even just base copyright allows for the ownership of characters such as Pikachu or Batman or whatever.


What about drawing Pikachu for personal use? Putting it on a wall in my apartment, for example?


I think the more correct argument is that Stable Diffusion effectively did a Napster to force artists into shit licensing deals with large players who can handle the rights management. It’s unlikely that artists would’ve ever agreed to them otherwise, but since the alternative now is to have your work duplicated by a pirate model or legally gray service, what are you going to do? This seems borne out by the fact that Stability AI themselves are now retreating behind Amazon for protection.


I think you can both think that Adobe's model is ethical, but also personally just not like the trend or tools.


The general argument (IANAL) is that it's Fair Use, in the same vein as Google Images or Internet Archive scraping and storing text/images. Especially since the outputs of generated images are not 1:1 to their source inputs, so it could be argued that it's a unique derivative work. The current lawsuits against Stability AI are testing that, although I am skeptical they'll succeed (one of the lawsuits argues that Stable Diffusion is just "lossy compression" which is factually and technically wrong).

There is an irony, however, that many of the AI art haters tend to draw fanart of IP they don't own. And if Fair Use protections are weakened, their livelihood would be hurt far more than those of AI artists.

The Copilot case/lawsuit IMO is stronger because the associated code output is a) provably verbatim and b) often has explicit licensing and therefore intent on its usage.


>it could be argued that it's a unique derivative work

Creating a derivative work of a copyrighted image requires permission from the copyright holder (i.e., a license) which many of these services do not have. So the real question is whether AI-generated "art" counts as a derivative work of the inputs, and we just don't know yet.

>b) often has explicit licensing and therefore intent on its usage

It doesn't matter. In the absence of a license, the default is "you can't use this." It's not "do whatever you want with it." Licenses grant (limited) permission to use; without one you have no permission (except fair use, etc. which are very specifically defined.)


"Creating a derivative work of a copyrighted image requires permission from the copyright holder"

That's why "fair use" is the key concept here. Under US copyright law "fair use" does not require a license. The argument is that AI generated imagery qualifies as "fair use" - that's what's about to be tested in the courts.

https://arstechnica.com/tech-policy/2023/04/stable-diffusion... is the best explanation I've seen of the legal situation as it stands.


If a person trained themselves on the same resources, and picked up a brush or a camera and created some stunning art in a similar vein, would we look at that as a derivative work? Very interesting discussion. Art of all forms is inspired by those who came before.

Inspired/trained… I think these could be seen as the same.


I don't think we should hold technology to the same standards as humans. I'm also allowed to memorize what someone said, but that doesn't mean I'm allowed to record someone without their knowledge (depending on the location).


Training a human and training a model may use the same verb but are very different.

If the person directly copied another work, that's a derivative work and requires a license. But if a person learned an abstract concept by studying art and later created art, it's not derivative.

Computers can't learn abstract concepts. What they can do is break down existing images and then numerically combine them to produce something else. The inputs are directly used in the outputs. It's literally derivative, whether or not the courts decide it's legally so.


> Computers can't learn abstract concepts

Goalposts can be moved on whether it has "truly learned" the abstract concept, but at the very least neural networks have the ability to work with concepts to the extent that you can ask to make an image more "chaotic", "mysterious", "peaceful", "stylized", etc. and get meaningfully different results.

When a model like Stable Diffusion has 4.1GB of weights and was trained on 5 billion images, the primary impact of one particular training image may be very slightly adjusting what the model associates with "dramatic".

> If the person directly copied another work, that's a derivative work and requires a license

Not if it falls under Fair Use. Here's a fairly extreme example for just how much you can get away with while still (eventually) being ruled Fair Use: https://www.artnews.com/art-in-america/features/landmark-cop... - though I wouldn't recommend copying as much as Richard Prince did.

> The inputs are directly used in the outputs

Not "directly" - during generation, normal prompt to image models don't have access to existing images and cannot search the Internet.


> Computers can't learn abstract concepts

I would say that abstract concepts are the only thing computers can learn at the moment, at least until they are successfully embodied.

> It's literally derivative, whether or not the courts decide it's legally so.

To be a derivative work you should be able to at least identify the work it is a derivative of. While SD and friends can indeed generate obviously copyright infringing works (then again so can photoshop or a camera or even a paintbrush), for the vast majority of the output you can at best point out to the general direction of an author or a style.


> Creating a derivative work of a copyrighted image requires permission from the copyright holder

It does not (in US law) if it falls within Fair Use, which is an exception to what would otherwise be the exclusive rights of copyright holders.


> Especially since the outputs of generated images are not 1:1 to their source inputs, so it could be argued that it’s a unique derivative work.

I think what you mean to say is that the argument is that both the models themselves and (in many cases) the output from the models, to the extent it might otherwise be a derivative work of one or more of the input images, are transformative uses. [0]

[0] https://www.nolo.com/legal-encyclopedia/fair-use-what-transf...


AI is just showing us a fact that many are unwilling to admit: everything is a derivative work. Much like humans will memorise and regurgitate what they've seen.


In a colloquial way, that's indeed the case.

But the term "derivative work" has specific legal meaning. For example wikipedia states: "[derivative work] includes major copyrightable elements of an original, previously created first work (the underlying work)".

I would say that while that might be the case for some AI creations, it is certainly not true for all (or even a majority) of them.


TBH it would be much easier with more streamlined tooling, especially if doing it locally with lora/lycoris.

It's kinda like using ffmpeg or VapourSynth for video editing instead of a video editing GUI.

That being said the training parameter/data tuning is definitely an art, as is the prompting.


I love the detailed workflow that OP posted. Dogs seem to be particularly good subject material for this.

I turned my dog into a robot a while back using the img2img feature of Stable Diffusion and the results were pretty amazing![1]

[1] https://twitter.com/davely/status/1583233180177297408


a lot of which is motivated by a sense of unfairness

Say you generate a picture with midjourney - who is/are the closest artist(s) you can find for that picture?

Not the AI, not the prompter, so the closest artists you can find for that picture are the ones who made the pictures in the training set. So generating a picture is outright copyright infringement. Nothing to do with unfairness in the sense of "artists get outcompeted". Artists don't get outcompeted - they get stolen from.


Typical Midjourney workflow involves constantly reprompting and fine tuning based on examples and input images. When you arrive at a given image in Midjourney, it’s often impossible to recreate it even with the same seed. You’ll need the input image as well, and the input image is often the result of a long creative process.

Why is it you discount the creative input of the user? Are they not doing work by guiding the agent? Don’t their choices of prompt, input image, and the refinement of subsequent generated images represent a creative process?


I agree with you on the technicality - if we say the prompter is an artist, then the picture belongs to him.


I've done so much with a fine-tuned model of my dog.

I previously made coloring pages for my daughter of our dog as an astronaut, wild west sheriff, etc. They're the first pages she ever "colored," which was pretty special for us. Currently I'm working on making her into every type of Pokemon, just for fun.


I uploaded a couple of the Pokemon generations really quick as examples. I still need to go through and do quick fixes for double tails (the tails on Pokemon are not where they are on regular animals, apparently), watermarks, etc. and do a quick Img2Img on them.

https://imgur.com/a/11OxoSA


For generating Pokemon, I recommend using this model along with a textual inversion of your pet: https://huggingface.co/lambdalabs/sd-pokemon-diffusers
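
For what it's worth, a rough sketch of that combination using the diffusers library (this assumes you already have a textual inversion embedding trained on your pet; the file path and token name below are made up):

  import torch
  from diffusers import StableDiffusionPipeline

  # Load the Pokemon-finetuned model linked above
  pipe = StableDiffusionPipeline.from_pretrained(
      "lambdalabs/sd-pokemon-diffusers", torch_dtype=torch.float16
  ).to("cuda")

  # Bind the embedding learned from your pet's photos to a placeholder token
  # (needs a reasonably recent diffusers version)
  pipe.load_textual_inversion("./my-pet-embedding.bin", token="<my-pet>")

  image = pipe("<my-pet> as a water type pokemon, official art").images[0]
  image.save("pet-pokemon.png")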


Looking at the data set, it's a shame each Pokemon isn't at least named.

Or, sometimes even worse, here's the caption for the Jolteon image: "a drawing of a yellow and white pikachu"

Still, might be worth a try. Thanks.


Textual inversion? Not a LoRA / LyCORIS?


LoRA requires input images roughly in the same domain, and the Pokemon model is bespoke.


These are great!


Thanks. They aren’t necessarily the best ones - I just uploaded some quickly. Like I said, they still need final touches too. I probably should have worked on the prompt a bit more before I went all in too.

For anyone else doing it, the ability to do something like [vaporeon:cinderdog:.5] so it starts with a specific Pokémon and transitions into the dog later was great for some types.

One of the fun things about this sort of thing is the happy accidents. One of the fire types generated as two side by side - a puppy and an evolution.


Using which tools, specifically?


Stable Diffusion, generically.

StableTuner to fine-tune the model - I can't recall the name of the model I trained on top of, but it was one of the top "broad" 1.5-based models on Civitai. Automatic1111 to do the actual generating. I used an anime line art LoRA (at a low weight) along with an offset noise LoRA for the coloring book pages, as otherwise SD tends to make images perfectly exposed. For something like that you obviously want a lot more white than black.

EveryDream2 would be another good tuning solution. Unfortunately that end of things is far from easy. There are a lot of parameters to change and it's all a bit of a mess. I had an almost impossible time doing it with pictures of my niece, my wife is hit or miss, her sister worked really well for some reason, and our dog was also pretty easy.


Do you need an M1 MacBook to do this? I have a 2015 MacBook Pro.


Stable Diffusion can run on Intel CPUs through OpenVINO if you don't have a GPU or the funds to rent one online (Google Colab is often used). You still need a decent amount of RAM (running SD takes about 8GB, training seems to run at 6-8GB) so I'd consider 12 or 16GiB of RAM to be a requirement.

There's a huge difference in performance (generating an image takes 10 minutes rather than 10 seconds and training a model would take forever) but with some Python knowledge and a lot of patience it can be done.

Apple's Intel Macbooks are infamous for their insufficient cooling design for the CPUs they chose, which won't help maintaining a high clock speed for extended durations of time; you may want to find a way to help cool the laptop down to give the chip a chance to boost more, and to prevent prolonged high temperatures from wearing down the hardware quicker.
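
For reference, a minimal sketch of CPU-only generation through OpenVINO, assuming the optimum-intel package (pip install "optimum[openvino]"); the model ID and prompt are just examples, and on an older Intel MacBook expect minutes per image:

  from optimum.intel import OVStableDiffusionPipeline

  # export=True converts the original PyTorch weights to OpenVINO format on first load
  pipe = OVStableDiffusionPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", export=True
  )
  image = pipe("a colorful vector art portrait of a dog", num_inference_steps=25).images[0]
  image.save("dog.png")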


I’m on Windows, sorry. There are some colabs where you can do both the training and generation though.


There are lots of online services too if you don't have the hardware - e.g. https://www.astria.ai/


I liked the original more than the final version. The vector style drawing was much more futuristic and more interesting.

Seems like lots of work went into that and I hope the author enjoyed the process and enjoys the final result.


I did too, and I even liked the aggressive cropping. Totally subjective though. The final result was beautiful as well, and this was a joy to read.


I did something loosely related. As a present for my girlfriend's birthday, I made her a "90s website" with AI portraits of her dog: https://simoninman.github.io/

It wasn't actually particularly hard - I used a Colab notebook on the free tier to fine-tune the model, and even got chatGPT to write some of the prompts.


In my (limited) experience, dogs seem to be easier than people for fine-tuning - especially if your end result is going to be artsy. Faces of people you know well being off in slight ways really throws you off, but with dogs there's a bit more leeway.


hah, these are pretty cool! Well done!


He mentions the Colab for Dreambooth; that only takes ten minutes or so to train using an A100 (the premium GPU), you can have it turn off after it finishes, and it saves to Google Drive. Super easy.


Yeah!

Here's the colab notebook, in case anyone is interested: https://github.com/TheLastBen/fast-stable-diffusion

I've trained a few smaller models using their Dreambooth notebook, but I think for 4000 training steps, an A100 will usually take 30-40min. I believe replicate also uses A100s for their dreambooth training jobs.


Ah I see, you're right, 40 minutes sounds about right for that amount of training. Curious why the decision to train on 40 images? I've used 15 for two separate subjects in Dreambooth with excellent results. I'm no expert, experimenting the same way as you, but I haven't trained on more than 15-20 images per subject.

I've found the most important part is spending a good amount of time getting the prompts right, although I'm not sure if having the person embodied in an environment and describing the objects around them helps give the model a "sense of scale". For example, if I just train "wincy" in fast Dreambooth, "wincy" will be the only token it knows; with no other info in the prompts, it didn't know what in the image was "wincy" (me). I accidentally did this when training my wife (no prompts at all) and she got really mad at me at how ugly the results were (you made me ugly! haha).
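
Roughly, the difference between the two captioning styles I mean (the captions below are just made-up examples, with "wincy" as the custom token):

  # Token-only captions: the model can't tell which part of each image is "wincy"
  token_only = ["wincy"] * 15

  # Descriptive captions: the subject is anchored inside a described scene
  descriptive = [
      "a photo of wincy sitting on a couch in a living room",
      "wincy standing in front of a bookshelf, indoor lighting",
      "close-up photo of wincy outdoors in a park",
  ]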

Have you tried it with and without your dog in an environment, then describing the environment your dog is in for the training data?


FYI we're building a service to make this process even simpler and faster:

dreamlook.ai

Upload your pictures, we train the model in a few minutes, then you can download your trained checkpoint. $1/model, first one for free.

For app builders, we provide a solid API that scales to 1000s of runs per day without breaking a sweat.


I did the exact same thing when I saw DreamBooth for the first time! I showed it to a bunch of friends and they convinced me to turn it into an iOS app. https://apps.apple.com/app/ai-avatar-for-dogs-floof-ai/id165...

People have been sending me the cute pics the AI generates of their pups. I think this is arguably the best thing so far in this latest wave of AI releases!


The dog picture is really nice, but then it’s hung on the same wall as 20 other crowded pieces of (in my opinion) dubious quality.

This would have been much better standalone.


Valuable processing info in the comments. But why so much effort to produce something without the option to have ownership (copyright) over my product? If I draw a strange line with any digital painting tool and put a circle and a square around it, I sign this art and this is my Art. If I spend a day prompting, upscaling, and fixing with ControlNet, at the end of the day I will have a funny picture which is not mine.

https://fortune.com/2023/02/23/no-copyright-images-made-ai-a...


Some of us don't care about ownership. You could ask the same question of anyone contributing to open source projects


This is a different use case. Why? Because you make a conscious decision to donate your work. The models which are used for image generation (Midjourney, Stable Diffusion) are full of scraped data without consent from the authors.

From this point of view, Adobe Firefly is obviously ahead:

"The current Firefly generative AI model is trained on a dataset of Adobe Stock, along with openly licensed work and public domain content where copyright has expired.

As Firefly evolves, Adobe is exploring ways for creators to be able to train the machine learning model with their own assets, so they can generate content that matches their unique style, branding, and design language without the influence of other creators’ content. Adobe will continue to listen to and work with the creative community to address future developments to the Firefly training models."

So the only way forward to have an ownership of your product is to train your own models over your own data.


It's unfortunate a lot of the nice artsy detail disappeared when he had to recreate part of the head, but I guess that is inevitable. Great work and interesting writeup.


If anyone wants to try Dreambooth online, I made a free website for this: https://trainengine.ai


I would highly recommend using Photoroom's background removal tool. Does a far, far better job than Photoshop.


PixelMator is a highly competitive native Mac app, has an excellent background remover and unlike photoroom/PS, it's a one time purchase.


But why pick a dog as an example?

Humans are much worse at telling dogs apart than at telling other humans apart (except perhaps the owner of the particular dog).

So for all we know, the AI didn't generate a portrait of this particular dog but instead a generic picture of this breed of dog.


Mostly because I thought of it more as an art project than a technical accuracy project. However, the honest answer to your question is that I have a ridiculous amount of photos of my dog on my phone. Getting training data is hard work.

But this is totally true: I found that maybe 30% of the images I generated did not look like my dog at all. However, the rest do a good job of capturing his eyes and the facial expressions he actually makes. I thought that the chosen image I worked from captured the look of his eyes super well.

But yeah, nobody but me would really appreciate that.


Because you invent a new word when you train dreambooth and teach it that your subject is an example of that word. The fact that the word you've created returns photos similar to subject is a sign that it worked.
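
Concretely, the setup looks something like this (a hedged sketch; "sks" is just the rare placeholder word commonly used in Dreambooth examples, and the prompts are hypothetical):

  # Dreambooth teaches the model that your subject is an instance of an invented word
  instance_prompt = "a photo of sks dog"   # "sks" = new token bound to *your* dog's photos
  class_prompt    = "a photo of a dog"     # prior-preservation prompt for the generic class
  # At inference you reuse the token: "a colorful vector art portrait of sks dog"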


I suppose that the base model Dreambooth fine-tunes is pretrained on a large dataset that includes many different dogs.

My point is that it is difficult to judge (for us) that the returned photos are actually similar to the subject.


The paper shows dogs with very distinctive fur coloring. Particularly the corgi with a white stripe between its eyes. I think the paper would be completely fraudulent if this dog were also featured heavily in the training set. So the point is the white stripe corgi isn’t in the set, and with a few examples, the model could then generate brand new images of corgis with a similar fur pattern. Maybe all it can do is fur patterns but it’s a start.


I linked this elsewhere but here are Pokémon image generations of my (mutt) dog: https://imgur.com/a/11OxoSA

She’s pretty unique looking and it comes through even with heavy styling.


There might be a few things missing from the article's Draw Things workflow: no mask blur, and not selecting the inpainting model for inpainting work.

Tomorrow's release should contain both mask blur and inpainting ControlNet, which might help these use cases.


Yeah, it was likely just user error. I actually really love Draw Things, because I can run it locally on my mac and quickly experiment without having to sling HTTP requests or spin up GPUs.

I did the actual work back on March 11th, so I was likely on an older build; but I was seeing issues where inpainting was just replacing my selection/mask with a white background. I had the inpainting model loaded, but couldn't figure it out.

I'm planning to continue playing with Draw Things locally, and exploring the inpainting stuff. For such an iterative process I feel like a local client would make for the best experience.


There is no user error, only UX issues :)

That being said, you probably used the paintbrush rather than the eraser? There would be more help on the Discord server though! https://discord.gg/5gcBeGU58f


> I was wrong. It seemed to take the top and bottom-most row of pixels and extend them down from 512px tall to 1344px tall.

I mean you cannot outpaint in the img2img tab; load the image in the inpaint tab and possibly use the inpainting model.


ah-ha! This was probably it


Pretty cool stuff. Personally though, not a huge fan of his “the one” choice. Some of the other images in his assortment were much better imo. Each to their own of course though!


I agree, but I find it pretty cool that they were able to generate and pick from what they wanted. This seems like one of the real strengths of generative AI — people can tune outputs they otherwise couldn’t create (unable to paint, draw, play guitar, etc).

People can debate if it’s actually good that people can create art without being artists, but again, I think it’s great that the author had the freedom to create what they had in mind without much outside influence. This has been a goal for computers in general for so long, and it seems like we’re actually arriving with some mediums.


> Pretty cool stuff. Personally though, not a huge fan of his “the one” choice. Some of the other images in his assortment were much better imo. Each to their own of course though!

Glad to see I’m not alone on this. I think the end result would have turned out much better if the author had simply adhered to the Huichol art palette, which I’m convinced they were aiming for at the beginning. That color scheme works for a reason.

https://en.wikipedia.org/wiki/Huichol_art


Nicely done. I built a t-shirt/mug/frame printing app. I am using Stable Diffusion (InstructPix2Pix for selfies) with prompts pulled in from Lexica. The larger images are created with SwinIR, and physical printing is from the good folks at printful.com.

Big props to the folks at replicate.com for making solid infrastructure for ML.

https://www.ai-ink.me/


Awesome work. I built an app to train Dreambooth models and generate images, which makes this process very easy.

The app also has a REST endpoint for anybody to create apps using it. Lots of clients create niche websites catering to different use cases. There is a kind of gold rush going on in this area.

https://aipaintr.com


Also, how much money was spent on the project?


Fantastic! Definitely bookmarking. I spent a big part of the last few days attempting this, but my model didn't come out nearly so well. I decided that it was because I don't have enough training images, and so I have been taking 3x as many pictures of my dog to compensate.


Recently saw a nice replica of Duchamp's work 'Bottle Rack' from 1959. Readymade, but maybe a bit expensive. For the price they asked, I could do it myself in a blacksmithing class and have more fun.


Isn’t it disappointing that nothing important is open source these days in AI?



I used Stable Diffusion Dreambooth to generate my GitHub profile picture, what do you think? https://github.com/dezmou


This is barely a full step from: "I used Stable Diffusion and Dreambooth to create nudes of a person I know".

Yes, They're Real and They're Spectacular.


All the current stable diffusion commercial applications are in this format: take picture of subject & create a bunch of new portraits of subject.

For example:

- There's one that creates gaming avatars based on your picture
- There's one that makes professional headshots for your CV

Even Microsoft's own product (Microsoft Designer) to use AI to create posters & flyers is most useful when you start with an image of your own creation then use the AI to change the style of the image, or integrate it into a template that it dreams up.


This is a great writeup on some of the nuances and gotchas you have to watch out for when finetuning using dreambooth and the generative creative process in general.


It's impressive, the end result was beautiful. I always used to wonder how to generate some meaningful art.

Any references where the same has been tried on humans?


I find it interesting that this green/orange colour palette so commonly appears in midjourney images, seemingly regardless of the subject.



This is really interesting. I do wish the author included the cost to train the model from replicate though.


If I wanted to do this, what kind of specs would I need to have on my Desktop Computer?


Is the link an AI-generated tutorial? "Write me a blog post tutorial, pretend you are …"


"art portrait" seems grammatically wrong...?


Results at the top of your article/project please


Great work writing up the process. Much appreciated!


The style looks like Andy Warhol to me


What are the tools we can run on a Linux machine?

EDIT: four downvotes and zero answers on how to run it on a Linux machine…


You were likely downvoted because you asked how to use it for NFTs, which you just edited out.


I don’t see why that is relevant. Why is using it for NFTs worthy of a downvote?


The only piece of software mentioned in the article that doesn’t run on Linux is Draw Things.


[flagged]


Highly subjective comment. Art is not something that is either "good" or "not good". It can hold value to the creator intrinsically. Like a kid's crayon scribbles.


They like it. And it was a good excuse to work with new tech. Why poo poo on it?


Leaving poo poo on things is a popular pastime for many dog people.


'is a popular pastime for many [] people'

Fixed That For You.

Interestingly for the ethologist, they have habitats: for example, the bottom comments in the stacks on YouTube...


I like my comment, and it was a good excuse to work with new tech. Why poo poo on my comment?


Because your comment was pretty objectively inappropriate and unproductive - gratuitous. Or did you mean something productive that we should have guessed?


If they like it, then it’s not garbage for them. Mission accomplished. What you think of art on a stranger’s wall isn’t really the point — it’s more so about the technology behind it.

I suppose you could be indirectly commenting about how you think the technology does a bad job generating art, but there are better ways to say it.


Good, so in order to produce good AI-aided graphics the producers will have to become critics, arts experts, with the important side effect of personal elevation and the collective gain of society. "Wins" on all sides.

Update: three minutes later, it seems that somebody did not get the irony.


What a useful and nuanced critique. Thanks!


Personally I paid a friend $200 to create an art portrait of my dog.


Not all of us have friends or $400


the lengths techbros will go for in order to avoid paying an artist for artwork

as well as doing all that nn/ml stuff, instead of just trying to learn a bit of how to make an artwork themselves, how to draw something, even by tracing over a photo, like doing a 'how to do a vector colorful painting dog' search and going off on that.

like, this end result doesn't even look far off from what a 'colorful vector dog portrait' tutorial would yield. it just involves tons and tons of questionably sourced artwork, and violated copyrights. (i know techbros are very confused about copyrights, but stuff like licenses and copyrights actually do have their meanings, limitations, and liabilities)

specifically picking stablediffusion, probably the most blatantly stolen artwork-based model (given how open and clear it is with what data has been used for it, and how you can't just squirm 'i didn't know what were the terms of use of their data' with other, more closed-off services), that's just another great touch as well.


> the lengths techbros will go for in order to avoid paying an artist for artwork

Hiring an artist removes control; it's not your art but the artist's that ends up on the wall. That's the reason I avoid hiring artists when I want some art made.

With that said, I have found that at least for now using AI for art is just too much work and too little control. I want my art on the wall, not whatever the AI model outputs. So, I'm in your camp of "just learn to paint the darn stuff". I find it's a lot more fun.


One of my hobbies is 3D art. I love 3D sculpting, but hate texturing and rigging. This is why I'd like the AI to do the unpleasant parts.


Why would he pay an artist when he's happy with what he has? Why do drawcels feel so entitled to be looped in financially for no reason?

I find the complaint about copyright so strange in this case. Copyright has a purpose; stopping some random person from creating an image only they will see and use is not that purpose. In this case it's just spiteful. Ultimately, if you think he's infringing your copyright you should sue him, but I don't think you'd win.


besides whether artists should get paid or not, or whether they should be reimbursed for use of their art or not, using something without permission, or rights, or license, without something that'd actually (legally) enable to do so without violating copyright, is just bad in itself.

there's a great alternative to "not paying/refusing to pay" (but using and stealing stuff anyway) - just, not using other people's stuff. not using stuff that's built on copyright/license violations. not using artwork you don't own, that you don't have rights, licenses, or permissions to use. (yes, simply 'taking something and making a model from it', would be a violation.) one could just not do a shitty thing, and they wouldn't have to jump hoops to find any justification for the shitty thing they did.

they could do a step-by-step art tutorial, and wouldn't have to pay anybody, nor use tools that rely on stolen artwork. but nope.

highly ironic how they made this thing, and promptly showed it off to thousands of people on the internet, immediately invalidating your example

they also promote (just by choosing and mentioning all of these things) those services, like Replicate, that monetize the use of stolen artwork (by selling compute, directly coupled with nn models), and ultimately profit from it (solely, without "giving back to artists whose art they perused" or anything).

they could make art in a way that wouldn't participate in tech art theft racket, but they didn't. and they didn't just participate in it, but promote it and perpetuate it.


Copyright laws evolve together with evolving tech, and it's interesting to think about how they could be refined for this purpose. They're currently very clearly stopping somebody from blatantly selling prints that are copies of another artist's work. Now, you could argue that these models merely look at art that is available for free consumption for humans, and draw inspiration from it, like a human artist would. Just think of the music industry, and how many songs are (legally) similar for the same reason.

Is the solution to prohibit any type of processing of such images, or should it specifically prohibit the calculation of derivatives of the loss function based on those images, or something else entirely?


> yes, simply 'taking something and making a model from it', would be a violation.

is there precedent for this? How are the services you reference operating if it were so clear?


they're operating while ignoring and violating copyrights and licenses, and opening themselves to a possibility of a lawsuit. there's getty v stability so we're gonna find out.


This sounds a lot like wagoners complaining about how these newfangled automobiles are profiting off their wagon designs, and not compensating them for it.


No, a better subject for that analogy would be software engineers complaining about ChatGPT's coding abilities. Wagons & cars are both a matter of mechanical engineering. A digital calculator didn't displace pen-and-paper computers any more than power drills displaced carpenters. Transcribing and operating a printing press are both rote procedures where the skill involved achieves accuracy and the materials involved are mostly the same. But operating Stable Diffusion and painting in Photoshop with a stylus require wildly different modes of operation, and enjoying or being good at the latter hardly suggests that one would enjoy or be good at the former.

This sort of image generation (especially extrapolating probable upgrades & improvements of just the next few years) displaces artists without providing an upgrade path. It shouldn't take much empathy to understand how frustrating and scary that must be for them.

I think pxoe's expression of frustration is totally reasonable. The engineers who made this stuff could have focused on using AI to enable new possibilities, instead of undercutting existing possibilities (to create new markets instead of overtaking existing ones). They could have used 100% consensual training data, but instead felt entitled to exploit a loophole/ambiguity in the social contract under which artists have been sharing their work on the internet.

A more appropriate analogy would be, this sounds a lot like social movements complaining about capitalist co-opting of their symbols, e.g. the use of "communist" rhetoric to build oligarchies (or the sale of "save the earth" mugs made from oil-derived plastics, etc.). Even that isn't a perfect analogy, though, as the co-opted output wasn't itself the displaced work-product, although it does a better job of capturing the emotional side of it, the sense of betrayal. Ultimately, I don't think it's productive to reduce the reaction to the decisions behind Stable Diffusion etc. to a single analogy, and it shouldn't be so hard to say "sorry, you're right, this is bad for you and good for me and you have every right to express your frustration over that irreversible decision".


> A digital calculator didn't displace pen-and-paper computers in the way that power drills didn't displace carpenters.

I'd like to point out that _computer_ used to be a job title for a human worker who would _compute_ things. This is a job that no longer exists.

https://en.wikipedia.org/wiki/Computer_(occupation)


feeling better?



