After using SD heavily for a week, I half agree with this. It is incredibly disruptive, and it's wild how much it accelerates the creative process. I'll give you that.
But two things I've noticed:
First, artists will still have a massive advantage over non-artists with this tool. A photographer who intimately knows different lenses, cameras, and industry terms will get to a representation of their idea much faster than someone without that experience. Without that depth of knowledge, someone might instead have to rely on random luck to create what's in their head. Art curators might be well-positioned here, since a wide breadth of knowledge and points of reference is their advantage.
Second, we need the ability to persist a design. If I create a character using SD, I need to be able to persist that character across different scenarios, poses, emotions, lighting, etc. Based on what I know about the methods SD/Midjourney/Dall-E are using, I'm not sure how easy this will be to implement, or if it's even possible at all. There will always be subtle differences and that's where being an artist who can use SD for inspiration instead of merely creation will retain their advantage over a non-artist.
For every tool, better output will always be created by people who specialize in a craft. Just as Photoshop revolutionized photography, to this day you can easily tell the difference between good and bad 'shops.
In video games, everyone is bad upon release and it's an even playing field. But as people practice and gain experience, they improve their usage and refine their approaches. Eventually you see metas and best practices develop.
This will all be exactly the same. Just give it some time and watch professionals create professional outputs compared to paying for some base level work on Fiverr or whatever.
I agree, this would be an incredible tool. I can see how some of the outputs may help me improve a piece I'm working on, even if I would never use the model's output for my final product.
It looks like a decent way to produce concept sketches.
Currently, working with creatives (including programmers) is a very iterative process for non-creatives.
"I want this."
"No, I meant this."
"Can we try making that line longer?"
"Eh, I'm not feeling it. Why don't we try brighter colors?"
"Ugh. That looks obnoxious. Can we tone down the red?"
etc.
If the non-creative can use this system to reduce that iteration, it will make life easier for them (maybe not for the creative, if they are banking on the hours doing the iterations).
Does SD have the ability to take this kind of micromanagement input though? From what I've seen, it works off of a general descriptive prompt. Will adding a very specific "but put the duck 5 pixels to the left" or a very vague "give it more pop" to the prompt actually have the intended effect?
honestly i feel the opposite. one of the worst situations to run into as a designer is a client who is simply too married to a bad design. i would much rather work with something you scribbled on the back of a napkin than some highly rendered ai vomit that is just fundamentally shit.
There's one huge difference between Copilot and stuff like this. Art that's 98% correct is awesome. Code that's 98% correct is completely useless.
I think Copilot is going to live off hype for a while then tank and be looked back on as a failed experiment. Whereas I think that this kind of AI will eventually get to a point where it's extremely useful and could change up certain industries (game assets, marketing materials etc).
As a new user of copilot for the last three months, I can't disagree more. I was initially skeptical, and I have noticed it often produces code that looks good but is wrong. However, it still saves me enough time each day to pay for the monthly subscription - in one day. That's a 30x ROI. I imagine it only gets better from here. I won't go back to programming without it.
Can I ask what kind of programming you do for it to be so helpful?
I mostly do maintenance of legacy codebases (also known as codebases, lol) where a lot of the work is figuring out where the changes need to be made and actually making the changes is frequently just a few lines here and there.
When I do have to figure out how to use some API, it's often not an open source one, so Copilot would not have it in its corpus.
I think these kinds of conditions are really common since software tends to last for maintenance longer than it is in initial greenfield development.
So I'm confused what kind of work benefits from Copilot. Just pumping out greenfield development of new websites/webapps that don't use much legacy or closed source code or services, just using existing popular open source libraries in commonplace ways?
The other thing I wonder about is code quality. When I look up API docs and stackoverflow examples, I get to read them all, maybe test some examples out in a CLI/REPL, and then decide carefully exactly what to do myself, what special cases to worry about or not, what errors to handle, etc.
Maybe what I end up writing is even the same as Copilot would have written. But in the process, I learn about finer details of the library and make detailed decisions about how to deal with rare edge cases. Might even end up writing a comment calling out a common pitfall I realized might exist.
My question is -- in order to save so much time with Copilot, are you still able to do all this extra thinking and deciding and learning (in cases that warrant it)? Or would doing that just end up consuming most of the time Copilot "saved"?
In other words do you end up producing code much more rapidly, but at the expense of code that looks more like a junior than a senior wrote it, because it is most concerned with working at all, and does not have time to worry about finer details? At the expense of not being as deeply familiar with the foibles of the API you're working on?
Honest questions as I haven't tried Copilot, and these are the thoughts that make me imagine it won't be of value. A lot of what I know I learned from doing the parts of the work that Copilot would be automating. Sure, Copilot would save me time when initially writing it. But would I then have less deep knowledge available when there's a fire in production because I never explored the fine details of my dependencies as much?
It's really more like a smarter autocomplete. I haven't tried it on a third-party API yet; we don't use many at work. I work at a startup on a Python and TypeScript code base. To give an example, last night I was creating a unit test and Copilot filled in the assertions. It missed one it couldn't know about, and it got two wrong. But it was a lot faster. The most amazing case to me was a function that transforms URLs for an image resize service. There was a bug in the function: it needed to return URLs ending in .svg as-is. I went to fix the bug by typing "if" and Copilot filled in `if url.lower().endswith(".svg"): return url`. It knew about the bug before I did. Too bad it couldn't do a code review when I originally wrote the function.
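For context, here is a minimal sketch of the kind of function being described (the function name, service URL, and resize parameter are hypothetical, not from the commenter's codebase):

```python
def resize_url(url: str, width: int) -> str:
    """Rewrite an image URL to point at a (hypothetical) resize service.

    SVGs are vector graphics, so server-side raster resizing is pointless:
    return them unchanged -- the bug the Copilot suggestion fixed.
    """
    if url.lower().endswith(".svg"):
        return url
    return f"https://img.example.com/resize?w={width}&src={url}"
```

The `.lower()` call is what makes the suggested guard handle `.SVG` as well as `.svg`.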
They have a 60 day free trial. Try it out, it's one of the most interesting changes in developer tools in a while. I feel like I'm living in the future sometimes when using it.
I haven't tried copilot either, but one of the things I'd be curious about is how well it can conform to a company's coding style guidelines and/or match its coding style with the existing legacy code that's being modified.
One of the major annoyances of working as a team with legacy code is when someone forgets to, or deliberately avoids, conforming their code to the style and techniques of the surrounding code. Nothing grinds my gears like working in a 500-line C++ file where_every_function_uses_underbars, has consistent 4-space indentation, avoids exceptions, and passes by reference, but right in the middle is that functionThatMiltonWrote that uses camelCase, has 8-space indentation, throws exceptions, and passes by pointer.
It works great in our codebase; it uses the current text in the file from above your cursor as reference.
So if you're creating a new file, it isn't always perfect, but once it catches on to your style it's seamless.
I haven't tried Copilot in languages less opinionated about style, so I'm not sure how well it handles that case. In Python, TypeScript, and Rust it seems to work well. In any case, I use auto-format on save, so that fixes some of the potential formatting issues.
It seems like you haven’t used CoPilot. Yeah, some of the harder bits I may have to code myself but the amount of boilerplate it reduces is incredibly liberating.
Whoa! Really? It seems like modern software teams would be thrilled to achieve 98%, given the incredible number of bug fixes/patches released very quickly after the massive beta test known as release day.
Code has to be 100% correct or else it’s considered a bug (assuming it’s syntactically valid).
Code that is 98% correct is actually much worse than no code at all. That’s the kind of code that will introduce subtle, systemic faults, and years later result in catastrophic failure when the company realizes they’ve been calculating millions of payments without sales tax, or a clever hacker discovers they can get escalated permissions by passing a well-crafted request etc.
What's your metric for the "percentage the code is wrong"? Is it how many lines of code were wrong, or how many test cases the code fails?
Presumably if AI-generated code passes every test case but would fail on edge cases that the human programmers did not anticipate in their suite of tests, the humans might well have made similar coding mistakes if they had had to write the code themselves.
No, it didn't, but people who took only one semester of art classes may remain under the delusion that art is about making pretty pictures.
People who need illustration or graphics with no particular style can meet their needs with this tool but that is far from art. This replaces commercial illustration, not artists.
Exactly, this replaces general illustrative commercial art. It's great for illustrations at the head of fiction stories. However, fine artists produce work where the object created is merely the totem for their actual art: the intellectual conceptualization of an idea, often concerning the finality of life and the immature behaviors we play out within it. That type of Fine Art is completely out of reach of modern AIs. I'm sure there are attempts by AI developers to mimic such art, but the mimicry will be gibberish. It requires a sentient being to create Fine Art, because Fine Art is a 360-degree expression of existence, not some pretty picture.
I can see this replacing the clip art and misappropriated art in slide presentations that no one is paying for currently, with a $20/month service that lets you do "line art angry man in toga at computer" to put into your talk on Kubernetes.
It might replace the "nice pic, do you have it at {size} so I can use it as a desktop image" comments on various social media sites.
I don't see it replacing an actual photograph as a wall hanging (or a painting) because they are subtly wrong in some ways. The reflection in the lake doesn't match the landscape... that one cloud has its lighting at slightly different angle than the rest.
Possibly... but I'm skeptical. For the type of photography that I do, these are things that come from an understanding of the world and its implications. You aren't going to see certain types of clouds in certain landscapes - they just don't form there. For example, having a cloud that indicates fast moving wind in a mountain environment in a plains landscape with a glassy smooth pond.
It's not that you can't paint that picture... but you'll never be able to capture that scene in camera. If someone was presenting that as a photograph, it would feel wrong to me because of an understanding of the meteorological criteria for the scene.
Yet, I am doubtful that I'll have a generated image at 11x17 that holds up to the same scrutiny that I apply to my own photographs.
I am absolutely certain that it will be able to generate images that are completely appropriate for contexts where you don't look at them for more than a minute at a time, or where they are used as complementary material for other content.
All that said, I am not concerned that I will get more or less sales of my photographs with AI generated art competing. The people who are going to pay for a photograph are going to pay for a photograph. Those who aren't - weren't going to in the first place.
Of course not, it just gained a new "advanced stable diffusion usage" track. Photography didn't displace painting from art school, and even outside art there are plenty of examples to be found. CNC machines are awesome, but every machining class in existence still teaches manual lathing and milling.
Not yet, but I can definitely imagine a future where these tools get more capable and refined, to the point where all the shortcomings listed above will be overcome. Knowledge about cameras and scene composition are already encoded in the networks to some degree, it just needs to become more accessible. There's probably also a better way to seed new images than by starting with random noise, so we could get similar variations easier. We have already made the big step towards creativity and real world understanding of objects and their lighting, the remaining issues are more technical and unless we are incredibly unlucky and run into a true show-stopper, we'll probably all have access to a high quality digital artist that can reduce production times dramatically.
You need to give some information about the scene to the network.
Camera settings are just shorthand for describing the field of view and depth of focus (at the very least). If you made that implicit, you'd still need to give the network the steradians, focal length, circle of confusion, etc. that you want your image to use.
You'd need to understand everything in Hecht's Optics to tweak all the parameters of an AI generated image.
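As a rough illustration of how much that photographic shorthand encodes, here is a back-of-the-envelope sketch assuming a thin-lens model, a full-frame 36 mm sensor, and the conventional 0.03 mm circle of confusion (the function names are mine, not from any library):

```python
import math

def horizontal_fov_deg(focal_length_mm: float, sensor_width_mm: float = 36.0) -> float:
    """Horizontal angle of view of a rectilinear lens (thin-lens approximation)."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

def hyperfocal_m(focal_length_mm: float, f_number: float, coc_mm: float = 0.03) -> float:
    """Hyperfocal distance in metres: focus here and everything from roughly
    half this distance to infinity is acceptably sharp."""
    h_mm = focal_length_mm ** 2 / (f_number * coc_mm) + focal_length_mm
    return h_mm / 1000

# "50mm at f/8" compresses both of these quantities into a few characters:
print(round(horizontal_fov_deg(50), 1))  # ~39.6 degrees
print(round(hyperfocal_m(50, 8), 1))     # ~10.5 m
```

A prompt interface without that vocabulary would need each of these parameters spelled out explicitly.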
That's an implementation problem, not a technical or conceptual one. Diffusion models have shown that they can learn practically all of these things if you make them sufficiently big.
i don't think it would be a massive stretch of the imagination to train a robotic arm to hold a brush and reproduce something that is on screen, and do it really well.
Machines like that already exist, but with oil painting specifically there is a lot of important information in how the paint was applied. So it is really, really hard: too many decisions in the tools used, the angles of the strokes, the pressure, the thickness of the paint, etc.
You might make painting that looks identical from far away but when you come close it will be completely different.
Nobody gets paid to be a telephone operator or a cobbler or a steeplejack anymore either. If new tools come along that automate and further commoditize the production of art, I don't see the issue. Automatic driving for trucking, public transport, and taxis would economically impact vastly more people, and everyone here seems to be cheering for that.
There's nothing special about art. People will still do it as a hobby, like they do with a lot of other dead trades and professions. In many ways I think the personal aspect of creating and sharing with people you know is the special thing, rather than the creation or performance of a commercial success (not that I've had any experience with the latter, mind you). And I'm not against the mass consumption of art; it needn't be produced by people, and if it has the same entertainment value then that's great.
I grew up in a car dependent area and moved to a walkable big city in my teenage years, and only then became aware that cobblers were still a thing -- in fact, in NYC, they're not an obscure thing, but as common as bank branches.
I think that's why there's a lot of people online who think cobblers aren't a thing anymore. They're from car dependent areas. If you drive everywhere, shoe soles don't wear out much faster than shoe uppers anyway, so it doesn't make sense to care. But if you move to a walkable city, you'll suddenly find it quite economical, since the soles wear out far faster, and the cost of a sole replacement is less than a new pair of shoes, so you might replace the sole a couple times before discarding the shoe.
Well, you got me. I guess chimney sweeps and blacksmiths and loom weavers and carriage drivers still exist too; not sure about steeplejacks since the passing of Fred Dibnah. But you get what I'm trying to say, hopefully.
You included “cobbler” only because it sounds old-timey to you, and I was pointing out that it’s not.
Repairing expensive shoes is not an automated process. It’s more like fixing a roof leak, landscaping, or changing a flat tire. Jobs for those things still exist and aren’t going anywhere.
You're nitpicking one of the given examples without engaging the user on the point they were trying to make.
To be nuanced, maybe they might have said, "cobblers are less in demand now that many people have moved from owning fewer pairs of shoes they make last through repair to owning more pairs of shoes that they tend to get rid of when they are worn out due to changes in construction materials used in production," but if people have to write like that to make points, nobody will ever make a point.
It's a nitpick, but a little bigger than that. It's as bad as including "bus driver" in the list. Cobblers just shouldn't be included in the category at all.
Cobblers are in just as much demand in most of the world as they always have been. They only fell out of demand in car-dependent areas, which is a small minority of the world population (but a vast majority of the HN commenting population since most of the USA outside of a few cities is car dependent)
I don't know if it has anything to do with construction but doubt it. If you actually walk everywhere shoes don't last very long these days, especially shoes under $100.
There are a lot of other urban services that exist in almost the entire populated world, but that most Americans think quaint because they are not relevant to a car dependent highway world.
All that said, this really is a nitpick and the original point stands very well. Some of us just don't like it when car-dependent people forget that they are a small minority worldwide and instead treat urban walkable people as the insignificant minority! Or rather, HN being a forum for all things interesting, we find it interesting to make it a teachable moment. What could be more interesting than finding out that something that has always seemed obvious to you is actually backwards?
> I don't know if it has anything to do with construction but doubt it. If you actually walk everywhere shoes don't last very long these days, especially shoes under $100.
By construction, I mean the material and design of shoes people tend to wear. I can't say I've ever met someone who takes sneakers or running shoes to a cobbler and these shoes are more common nowadays.
Those kinds of shoes you mention tend not to last very long at all and are not resoleable. If you walk a lot, you find yourself throwing away the $80 shoe after just 6 months.
These kinds of shoes are most of the market because most people don't walk much in the USA. If you walk a lot, you might still not change anything and keep buying the disposable sneakers, throwing away $160 a year.
But if you walk a lot AND are disposed to think critically about the situation, you find that if you pay a bit more for shoes you can make them last many years as long as you resole them periodically. And as I recall, a good $50 sole on a good $150 shoe costs half as much and lasts three times as long as a disposable $80 sneaker.
Not only do you save money (not really a ton) but it actually is more convenient, since even counting resolings, you get more miles between having to go repair or replace your shoe. And you don't have to wear in your leather uppers again. It is truly a luxurious feeling when you come back from the cobbler and have shoes that are worn in and fit your foot just perfectly like a glove... yet the soles are brand new and strong and comfortable and ready for another thousand miles.
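Running the comment's own numbers makes the comparison concrete (the 18-month sole life comes from "lasts three times as long"; the 4.5-year horizon, long enough for two resolings, is my assumption):

```python
# Disposable $80 sneaker, replaced every 6 months of heavy walking.
sneaker_per_year = 80 / 0.5                  # $160/yr

# $150 resoleable shoe plus two $50 resolings over a 4.5-year life.
resoleable_total = 150 + 2 * 50              # $250 total
resoleable_per_year = resoleable_total / 4.5

print(sneaker_per_year, round(resoleable_per_year))  # 160.0 56
```

Roughly a third of the annual cost, consistent with the "not really a ton, but real" savings described above.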
Why do you think there's a stereotype of leather boots being popular in NYC? I'm sure the resoleability and longevity in the face of large amounts of daily walking have a lot to do with it.
No I included it because it used to be a common profession and now it is an extremely niche one. I could have also put chimney sweep in there knowing that it's still a thing.
Someone is still going to pay for a person to perform or create art for them. Some professional driving jobs will continue to exist long after most are automated. When I said "nobody", that's what's known as hyperbole.
Yes, but the consequence for society is that professionalization is coming to every single field, and the blue-collar jobs are evaporating, and we had better figure out how to get all those people either white-collar jobs or some other form of income if we want to avoid significant social instability.
It used to be that you could find work as a political cartoonist and draw a picture of some satirizable politician each day - maybe Dr. Oz with a funny-looking vegetable platter, or Biden falling off a bike labeled "build back better," or something - and that would be a career. Now it seems you can just type "Dr. Oz holding a vegetable platter, political cartoon" into one of these AIs and the bulk of the work is automated. Sure, you could spend the rest of the day refining it, but nobody's really looking for perfection or transcendent skill in their political cartoons.
You can still find work making political videos of Dr. Oz and vegetables (e.g. https://twitter.com/JohnFetterman/status/1564432981841907713). Today's image generation AIs cannot do videos like that. Tomorrow's will, before we know it, and again nobody's looking for more than a baseline level of quality there either.
And even the local newspaper that might have been employing a political cartoonist is being swallowed by high-capital-ownership companies that can replace a lot of the writing with AIs. The expectations of quality are a bit higher - though the AIs have mostly gotten the hang of sports writing - but again people don't expect The Smalltown Gazette to have the standards of The New Yorker.
Sure, the highest-quality drawings, feature films, and longform journalism will get a lot better. And that's great! But most people don't work on such things. What are they going to do? If nothing else, how will they remain an audience with money to spend on the highest-quality works?
(I am not advocating for stopping this process, to be clear. Smashing the AIs isn't a coherent proposal, and smashing the physical machines of automation didn't actually work when the Luddites tried. I'm advocating for admitting that this process is happening and drying up employment, admitting that keeping as much of humanity as possible under a good standard of living is important for humanity, and figuring out what to do about it.)
For face generation, I think there are deep neural networks that can generate multiple views of the same face [1], [2]. Stable diffusion already provides the possibility to generate variations. So I don't think it is a stretch to imagine that these existing capabilities will only get better and/or be applied to SD.
[1]: Multi-View 3D Face Reconstruction with Deep Recurrent Neural Networks
[2]: Deep Neural Network Augmentation: Generating Faces for Affect Analysis
I haven't had a chance to try it yet, I'm optimistic but skeptical that it can have true persistence like I'm referring to. Especially since it requires training your own model which requires multiple images of the same asset/character/object/etc.
For what I think you have in mind, I suspect it will eventually not be "image to image" but "<ai thing> to <ai thing> + image", if the result is to be remotely repeatable. That "ai thing" is probably the persistence you're talking about.
I think that thing will necessarily contain representations of dimension, behavior (physics/bones), and "style". Without the "ai thing", using only an image/text, the model would need an impossibly complete representation of the character to guess all of these things predictably. For example, what does that character look like from a side profile, or from behind? What if it's an alien, and its arms should always bend backwards? Could a text representation ever completely describe this, with good reproducibility? Probably not. But I assume some non-human-readable representation would have a better chance.
As is, if something known is required, I think the behavior of these models can be considered "destructive" to the input image, more often than not. For this reason, I think artists are safe, for the time being. :)
I think this can be seen a bit like the invention of an "index fund" for art. Active investors are still needed to generate the market signals that an ETF can aggregate, but for the majority of people an ETF is preferable to the cost of hiring an active investor yourself. And similarly SD needs artists to generate the signal that it aggregates, but for the majority of people it might be (or might soon be) preferable to use SD to get a "generic"/"average" result instead of hiring an artist yourself.
There is bound to be a smart kid who already turned this idea into a shitcoin (meaning a pump-and-dump money grab, not an actual attempt to make an art index and tokenize it).
It seems someone indeed did this (the p&d, not the index): https://www.coingecko.com/en/coins/artonline
What about using this tech for ideation and artists for production?
You could use Stable Diffusion et al to create new characters based on a prompt, then farm the concept out to artists to produce individual works. Kind of like hiring a super expensive agency to design your new logo or brand identity, then using a stable of in-house designers to translate the concept into UI, ads, etc.
Honestly, I don’t even know if we’ll need humanity for the final product. It’s like using a chess computer to get an idea of a good move… and then a human to approve it? Adjust it? Be inspired by it?
Any signal humans give (painting x is better than y) is just another signal to encode. Take billions of such ratings and improve the AI's taste to superhuman levels.
In short, anything that humans would add to artistically improve the outcome is just another signal to be encoded. It’s weird to write it, but artistic creativity is deciding what new pixels go where, which is a search problem (in a large search space) which AIs are apparently doing great at.
We have a bias: we’re humans, we must be important somehow! But it comes down to a bigger neural net eventually outperforming the one in our heads.
The difference is that chess rules are extrinsic to the players at the board, whereas art is intended to communicate (emotions, narrative, ideas) to humans.
A neural net that can communicate with intent to human minds as well as a biological human is nothing short of strong AI. We’ll get there, but not with generative models.
> It’s like using a chess computer to get an idea of a good move
They did this for Backgammon and found styles of play humanity hasn’t discovered in millennia. Now humans can use those when playing each other and it makes an old boring game feel fresh and exciting.
The datasets these tools use don’t include any context. There’s no sense of what the images in the data might mean to the viewer, or how they relate to the time and place they were made. I would argue that means the tools will struggle to produce meaningful works, even if they become great at making beautiful works.
Useless? One can spend days/weeks/months enumerating different concepts of a design because of the round trip between "maybe we try <a prompt>" and an image. Now it's literally minutes, and a designer can draw you some ideas right at your office.
I've tried it too, and honestly I'll hire artists again (I have several times). It's easy to come up with nonsense, so in the end I think this is a tool in the hands of artists more than anything else.
To actually threaten artists, it would need to be much better. Maybe it's not the model; maybe it's the human-computer interface that fails, but in the end it's what we have.
Maybe someone with a lot of time on their hands can iterate enough times to come up with something nice that depicts what they wanted. Not for me.
That's possible, but my gut feeling is that this turns out to be the same as the DARPA Grand Challenge moment for self-driving cars.
It was an absolutely amazing accomplishment. I legitimately thought I could hold out long enough that my next car would be fully self driving. Truckers were an endangered species.
But here we are 20 years later, and we’re still almost there.
We’ve made amazing progress and I love the self driving features I do have on my car, but how many jobs have been replaced by self driving cars?
I always wondered if we massively overestimate human creativity. Maybe that belief is ingrained in our culture and our very being. I've never heard the counterargument that humans are not that creative.
Creativity demonstrated by Alpha zero chess engine blows Magnus Carlsen’s mind (from his recent interview with Lex Fridman), I wonder if at some point in the future, we’ll finally throw in the towel and get out of the denial phase.
"AlphaZero would sacrifice a knight or sometimes two pawns, three pawns, you can see that it's looking for some sort of positional domination, but it's hard to understand. It was really fascinating to see. "
No, apparently we just massively underestimate how important studying the humanities is.
This is illustration, not art. I'm flabbergasted how many people are so quick to confuse or conflate the two.
For the record, my significant other is an artist and my entire life has been surrounded by creative people. I’m questioning human creativity, say, 50 years from now, and how much machine creativity we currently underestimate. Sure these artworks lack imagination and fidelity today. Appreciating art and humanities has nothing to do with projecting AI capabilities in the future, I’m flabbergasted how many people are so quick to confuse or conflate the two. Maybe we should require people to study logic/reasoning and how important it is.
50 years from now, I have little doubt that will be the case, barring serious civilizational decline. But that isn't the sentiment on this page.
Ditto for me regarding being surrounded by creative people. This isn't going to lose them any business any time soon, not even the younger ones five years from now.
Alpha zero was trained by playing against itself, with a clear win condition. You'd need to quantify what it means for one image to be "more creative" than another to get a similar result.
Without trained models from human creativity, what can AI do ?
These AI emerged because of human creativity.
Picasso created his new art form from his own creativity. He created something no one had ever thought of before. Now AIs are fueled with Picasso's drawings and can maybe produce art that looks like his.
But what about creating something entirely new that has never been fed into the engine?
Could the AI invent something I'm about to dream tomorrow, and could it be an exact copy?
I think it's undeniable that AI can create novel things; the question is whether AI can create novel things that are also interesting. A randomized 600x600 png is novel, but it isn't at all interesting, and much of what makes a piece of art interesting is not a quantifiable or well-defined goal. That's not to say that AI is better than humans; just the opposite. Art is a deeply human object, and I do not know if we would appreciate art made and developed by AI in the same way we appreciate absorbing and creating art in response to each other.
I see your point.
We agree that for a 600x600 png there is a finite number of possibilities given a set number of colors.
It could be possible to « brute force » all the possible images, which would be the equivalent of trying all the combinations of letters to write a book. So creating things is not really the problem. Creating relevant things with a purpose is.
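The combinatorics behind that point can be sketched quickly. This is just a back-of-envelope count; the black-and-white and 24-bit palettes are illustrative assumptions:

```python
import math

# Number of distinct images of a given size and palette is
# palette_size ** (width * height). We only count its decimal digits,
# since the number itself is far too large to enumerate.
def image_count_digits(width, height, palette_size):
    """Decimal digits in the count of possible images."""
    return math.floor(width * height * math.log10(palette_size)) + 1

# Even restricted to pure black-and-white pixels, a 600x600 canvas
# admits 2**360000 images -- a number with over 100,000 digits.
print(image_count_digits(600, 600, 2))       # 108371
# With a 24-bit color palette the count is vastly larger still.
print(image_count_digits(600, 600, 2**24))   # 2600900
```

So « brute force » is only a theoretical possibility; the search space dwarfs anything physically enumerable.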
The fact is, we know of no intelligence other than ours. We are the only known beings that appreciate art or books, from our subjective conscience.
Can AI create its own art, understood and appreciated by itself?
Does intelligence have meaning without « humans »? It's part of the same thing. The AI is modeled after ours, so it's not its own thing and cannot understand what it creates, or whether what it creates is relevant. AI is just an amazingly powerful tool.
I believe the creativity is primarily the process, something we enjoy doing, not the method to _produce_ or _generate_ something. Sort of like dogs like playing fetch. Some creative processes do not create permanent artifacts at all, like all performing arts.
I think there will likely be better ways to bias the training of specific instances - Something like a 'training library' with biased training data that you can plug in for your use case.
For example let's say you're a well known designer with a distinct style, and you train your own instance on your lifetime's body of work. Now you can generate whatever you want in 'your style' (just like you can now ask for a painting in Dali's style).
Now you've turned your style and design process into a factory - for every new client you can create whatever they want, in your style, with multiple examples, in a button press. Perhaps you can even sell that 'training library' to other designers?
> A photographer who intimately knows the different lenses and cameras and industry terms will get to a representation of their idea much faster than someone without that experience
There are already websites which sell tailored "professional" prompts for DALL-E, GPT-3, etc.
I've seen an inpainting technique where you put in an existing character, describe what their twin is doing, then crop out the existing one. That seems to persist the character, at the cost of fewer pixels to work with.
Well, it won't replace creativity, it just makes starting out easier.
SQL was designed so that business people could describe queries in English. How did that work out?
Now, of course, we have ORMs, which will get you data, sure, but often in egregiously inefficient ways if not used correctly. If you want to get it right, you still need to pop the hood and adjust things, and you have to know what you are doing.
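The classic failure mode here is the "N+1 query" pattern. A minimal sketch with the stdlib `sqlite3` module (the tables and data are made up for illustration; real ORMs emit the naive version below when lazy loading isn't configured carefully):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO books VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C');
""")

# The "N+1" pattern a naive ORM often emits: one query for the parent
# rows, then one additional query per parent for its children.
queries = 0
authors = conn.execute("SELECT id, name FROM authors").fetchall()
queries += 1
for author_id, _name in authors:
    conn.execute("SELECT title FROM books WHERE author_id = ?",
                 (author_id,)).fetchall()
    queries += 1
print(queries)  # 3 queries for 2 authors (N + 1)

# What "popping the hood" gets you: one JOIN, one round trip.
rows = conn.execute(
    "SELECT a.name, b.title FROM authors a "
    "JOIN books b ON b.author_id = a.id"
).fetchall()
print(len(rows))  # 3
```

With 2 authors the difference is trivial; with 10,000 it's the difference between one query and 10,001.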
> A photographer who intimately knows the different lenses and cameras and industry terms will get to a representation of their idea much faster than someone without that experience.
Sure. Only now the gap has been shrunk by a factor of infinity, because the time to representation was infinite for someone without that experience before.
Ultimately software engineering isn’t about code, it’s about describing in sufficient detail exactly what you want. Of course, this just leads to business analysts and ux designers being the next ones to be replaced after coders… :P
Is it possible to provide another image as a prompt for SD? For example, could you provide a simple drawing of a house and expect it to render a house?
You can, using the img2img method built on top of SD. Another option is to create an embedding of a concept via textual inversion and then use the embedding to guide SD generation. Both methods are possible using this: https://github.com/hlky/stable-diffusion-webui
Thank you for this! I've been playing with the optimizedSD for a while but couldn't get what I wanted out of it. This guide makes sense to me. Gonna give it another shot on the weekend!
It'd have to get itself into the dataset, as if it was 'Snoop Dogg' or some other identifiable person that the AI can reproduce. The degree to which you can generate stuff that is deepfake 'Elon Musk' doing whatever you say, is the degree to which you can invent a character and have the AI generate images (or video) of it following (sort of) your script and directions.
Is it just me, or do the comments in this thread seem to be the exact opposite of the sentiment in the comments on similar Github Copilot threads?
I just find it a bit ironic that programmers are irate about Github Copilot using their copyrighted material for training. However, if it's an ML model training on copyrighted artists' material, clearly it's a transformative work. I just find the opposing sentiments for these scenarios a bit funny.
I don't think AI trained on my creative output (not just code, mind you) is a problem per se.
What is a problem, in my opinion, is the tendency of large corporations and small circles on top of these to monopolize access to these models, and if some of the functionality gets available to the public, it's going through a very paternalistic, corporate, puritan censorship pipeline.
If you train artificial intelligence on our hard-won data, the resulting artifact should be available to us. StableDiffusion executed very well here.
Problem is, there may be no stability.ai for general-purpose multimodal AIs that are coming this decade, and this technology is a dystopia fuel when it's owned by a select few.
If you use my copyrighted code to train your brain and reproduce exact copies of functions I wrote, I have a problem with that. Same for art. Exact copying is the copyright problem.
If you use my copyrighted (public) code to train your brain, and then produce new functions that do new things, there is no problem.
Replace "brain" with "AI" and you get my position on this (and a lot of others' positions). It seems to be a lot easier to transform art than code.
Indeed. A difference I see between the GPT-like language models and these image generation models is that it feels like the language models actually hold full copies of many samples from the training dataset (hence their ability to recite existing content), whereas the image generation models clearly do not: Stable Diffusion is 4 GB something, yet it can draw anything.
That's the amazing part: the training dataset contains 5B images, yet it distilled all of them into a mere 4 GB of data and can produce an infinite amount of content from that. It really feels like it learned how to draw, in the same sense that a human learns and does not simply reproduce exact copies of what they have already seen.
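The arithmetic behind that intuition is simple, taking the ~4 GB checkpoint size and ~5B-image dataset figures from the comment above as given:

```python
# ~4 GB of weights spread over ~5 billion training images leaves
# well under one byte per image, so the model cannot be storing
# verbatim copies of its training set.
model_bytes = 4 * 1024**3        # ~4 GB checkpoint
n_images = 5_000_000_000         # ~5B images in the training set
bytes_per_image = model_bytes / n_images
print(round(bytes_per_image, 2))  # 0.86
```

A typical training image is hundreds of kilobytes; less than one byte each rules out anything resembling storage, though it doesn't rule out a few heavily duplicated images being memorized.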
I see what you mean about GPT returning copies. But is there really a difference if your model just happens to 'calculate' an exact copy instead?
For example, if you enter in something simple and high profile, what it returns looks pretty close to an existing work...
Try, for example, typing in "Banksy", "Brad Pitt" or "Starbucks".
So if you type in "Photo of a Coffee", "Impressionist painting of a fruit bowl" or "Blue canvas with one red line in the middle", how do you guarantee that the image you get back isn't actually a copy of someone's work?
Copyright - at least for images - is going to become moot shortly.
Copyright exists to allow an artist time to reap the benefits of his labor. Without this time, they say, no rational person would invest his labor into making art. Dubious… but, whatever. The point is, if you remove the ‘labor’ from the mix, there’s no need for copyright.
If I can produce spectacular images with zero individual labor, there will be little reason for me to copy someone else’s work.
Not really a copy. I'm sure you could cajole it into something very similar to the actual piece, but at that point it's more the model + the extensive prompt instead of the model itself.
Generative models attempt to model their training data. Essentially, they try to be a model of the underlying data distribution from which all samples in the training data were drawn. A model of that distribution which cannot reproduce all samples from the original training data given the right prompting/query/seed is an incomplete model by definition. If it can't reproduce all samples drawn from the original distribution, then it clearly does not model the same distribution those samples were drawn from.
That said, this is very, very different from copying at a conceptual level. This is going to end up being an interesting legal question going forward. I'm curious to see how it turns out.
I think both are fine. One source of the difference in reception (aside from the selfish aspect) is that the space of work generated by Copilot has fewer degrees of freedom than visual art. So the things it produces are much more likely to be near-identical to existing code than Stable Diffusion is to images. I.e. if SD were just providing near identical copies of artists' work, people would be more sympathetic.
The difference appears to be that Github Copilot just reproduces, verbatim, snippets of code without actually doing anything transformative to it. You cannot describe a sort algorithm and have it spit out something other than a direct reproduction of someone else's implementation.
With these artwork models, they can emulate general styles, occasionally known characters or bits of text will show up in the output, but it is (as far as I've seen) never a 1 to 1 faithful reproduction of the training material with no changes made.
This is an important difference from a legal perspective, not just a moral one. Whether or not a use of copyrighted work is transformative is a big part of whether that use is fair or not.
It actually, by default I believe, now checks for exact collisions with any existing GitHub code (the training data) and removes them. These are somewhat rare in any case, is my understanding.
Not to detract from your other points. Just a common misconception I see a lot.
Yeah I hadn’t thought of it that way. Curious how designers feel about image generation. I imagine everyone will love AI that doesn’t threaten their job
Sometimes people think it ought to be obvious, bright-line-tests only, to determine what is truly transformative or novel. The divergence you're pondering, if taken real charitably, maybe signals that that is wrong and that it requires some domain expertise to be able to ascertain whether imitation is really problematic.
One among many differences between copyright issues for artists and software engineers: software engineers' copyright is designed to protect developers, while art copyright is designed to enrich (mostly Hollywood) lawyers.
A) It's almost like a Dunning-Kruger effect: people less familiar with a domain think this has superseded it and cheer it on, whereas they're more familiar with their own craft and can see where it falls short.
B) They see it as technology conquering a new area, but get nervous when it starts infringing on their own.
> Stable Diffusion has been trained on millions of copyrighted images scraped from the web.
My brain has been trained on even more copyrighted material. Every book I read, every tv show I watch, the toys I played with as a child. It's hard to imagine that I could come up with anything that is not inspired by copyrighted work.
1: You cannot produce thousands of detailed pictures in a day; this program can. The argument gets pretty clear if you transpose it to other objects, i.e., why is it fair to ride a bicycle on the sidewalk but not drive a car?
2: Copyright laws. You may not see a picture and imitate it. How do you know this AI didn't just imitate one of the million pictures it saw? And if you distribute it and the author wants to sue, whom should they sue? The author of the AI, or the person who prompted it?
It's a lot of work for a human to produce an image, but they can think thousands of images in a day quite easily. The artistic process often begins with some daydreaming.
Is it only the rate of production that makes a difference? If we hooked up our brains to computers that could turn our thoughts into pixels at a rate of thousands per day you'd see no issue?
I don't think we should distinguish between meat and silicon neural networks. People should be able to use whichever neural network they see fit for a particular task. If a person has a right to observe and incorporate an image, so should a ML model operated by that person.
Also, you may see a picture and imitate it. You may not copy or redistribute it directly.
> How do you know this AI didn't just imitate one of the million pictures it saw
I think the onus is on the copyright holder to prove a violation to a specific work.
> who should it sue?
I think the operator of the network is liable. If you distribute an image that is found to be in violation of a copyright, then you are liable no matter how you came about it.
Yes, I don't get a free pass for harming somebody with a tool I'm operating just because the tool is "automatic" in some way. If the person operating the car is the passenger, then you sue them. If it's Google operating their Waymo fleet, you sue them.
Car manufacturers in the future may offer to take on the liability themselves for autopilot mistakes, but that's not yet the deal offered.
At the end of the day, there's no legal magic or loopholes. Somebody is ultimately the operator of the vehicle, even if their hands aren't on the steering wheel.
I would assume that the point is that you had to pay - in one way or another - for access to this material, which is the intention when it is published and shared. You bought your books, your toys, your TV shows (through ads).
SD, by scraping the web, gets all the benefits of prior work without necessarily having adequately compensated the work's creators.
You also didn't flip millions of illustrations into a commercial product without paying the creators or rights holders to license them.
This is theft, plain and simple.
I think the point is that he did, and that we all do, when we train our meat neural networks (on copyrighted material) in a way that makes us better at providing a commercial service (our job).
If a human has a right to view and learn from a copyrighted image on the internet, why shouldn't an AI?
Yes, this is exactly my point. And I’d go even one step further: creating real artificial intelligence cannot be possible without learning from the real world - which means learning also from copyrighted material. So a law that prevents this (even by just requiring to pay a minuscule fee) would effectively ensure that such a technology will be developed in a country with different laws.
Besides the point that these are freely viewable images anyways, this is also clearly not people's real objection. If you were making a novel-writing ai, it would not placate anybody if you bought a single copy of each book before incorporating it into the model.
I agree.
In theory, some may be placated if the person creating an SD artwork had to ultimately compensate the creators of those artworks that were used by SD to stitch together the image, akin to sampling records in music.
> My brain has been trained on even more copyrighted material
What your brain has learned cannot be transferred with a USB stick in seconds.
Not even your offspring will receive any of it.
If I want to learn everything you know, I have to learn what you learned, assuming I'm even able to.
Kind of a big difference, don't you think?
These kinds of comments are embarrassingly low effort. Just because we threw rocks at each other when we were all chimps doesn't mean that guns haven't been a game changer and made homicide easier, even for people who would never be able to hit anybody by throwing rocks.
The original article talks about moral (not even legal) objections to learning from copyrighted data.
Your comment implies that DALL-E 2 is morally okay, because they don't distribute the model ("copy everything it knows to a USB stick") but only sell access to the algorithm to generate images, while Stable Diffusions open source model is a problem because it can be copied.
Most people would take the exact opposite stance I guess.
> The original article talks about moral (not even legal) objections to learning from copyrighted data.
but I am replying to
"My brain has been trained on even more copyrighted material. Every book I read, every tv show I watch, the toys I played with as a child. It's hard to imagine that I could come up with anything that is not inspired by copyrighted work"
Difference being, your brain has not been trained by someone (for profit); you have trained it using YEARS OF YOUR LIFE TO ACQUIRE KNOWLEDGE AND EXPERIENCE,
which is morally acceptable (though that does not imply the use you make of it is legally acceptable), given that you paid a very high price, sacrificing your own time for the objective.
And your knowledge is only yours; you can't transfer it to anyone, and it doesn't even show up in your DNA.
> Your comment implies that DALL-E 2 is morally okay, because they don't distribute the model
Implication doesn't mean what you think it means.
My comment doesn't imply anything of the sort, you are
> Most people would take the exact opposite stance I guess.
This is going to be the legal challenge for the next few years: every year there are lawsuits alleging that music, drawings, etc. are too similar to someone else's work but those take time and require human judgement to decide whether something crosses the line, and the scale which tools like this offer makes that basically as feasible as it was to individually sue Napster users for sharing other people's works.
I'd expect that level of legislative response, and I would also bet on lawsuits over any unauthorized data in their training corpus.
> My brain has been trained on even more copyrighted material
That's in fact a problem: plagiarism is considered cheating/intellectual dishonesty and can ruin a career, copyright infringement is punished by civil law (and fined), counterfeit is a crime, etc. etc.
You must restrain yourself from copying too much, but only the original author (and eventually a judge) can decide if that happened.
If Stable Diffusion is used for counterfeiting, or the result infringes copyright, do you think the fact that the model was trained on unlicensed copyrighted material is irrelevant?
Authoring any work, no matter how closely copied, is not an infringement. The infringement is commercial distribution of copyrighted work. And yes, I do think you should be liable for whatever you distribute commercially, regardless of how you obtain it.
Yes, selling an exact copy of a copyrighted work is an infringement (giving it away generally isn't). Creating a derivative work, or work in the same style, is fair use.
Generating a digital image of money isn't the same thing as counterfeiting currency.
> I would also not generate pictures of child pornography
Legality aside, why not? Who is harmed?
> There's a reason why SD apply censorship filters to generated images
I'm not sure there is. I don't think any one group of people is uniquely equipped to limit what images another group of people can generate with ML.
Maybe this current explosion in the relevance and visibility of this kind of AI model will finally lead us to rethink how insanely nonsensical our IP systems are. I'm not holding my breath, but there's hope that this sort of thing will (combined with situations like the HBO debacle) clarify the need for massive IP reform in the cultural zeitgeist.
The problem here isn't that the model was trained on copyrighted works, the problem is copyright itself and a cultural focus on collecting rent. We probably need UBI or equivalent to deal with the outcomes here in a healthy way.
Why do people keep bringing up copyright anyway? It seems pretty clear that the images being generated by StableDiffusion are transformative so it's protected under Fair Use.
There are really significant, novel copyright issues implicated by these large generative models trained on other people’s IP.
If you take a step back, you can see that there are different ways to frame what is happening. One frame is: “Defendant built an algorithm that memorized features of Plaintiff’s IP. Defendant’s algorithm recombines parts of those features in order to produce works in the same domain that compete with Plaintiff’s work, all without Plaintiff’s consent.”
Bear in mind that copyright holders are among the most litigious out there. If generative art becomes as big a deal as some people expect, they will have every incentive to use their huge litigation budgets to claim a piece of the action.
Because the law develops very slowly, the legal process has not yet had the occasion to really evaluate what transformative use means in this novel context. I’m personally interested in seeing where things go, but it’s going to be a while before we know where the law is headed.
> “Defendant built an algorithm that memorized features of Plaintiff’s IP. Defendant’s algorithm recombines parts of those features in order to produce works in the same domain that compete with Plaintiff’s work, all without Plaintiff’s consent.”
The fun part is, this is how human artists learn too.
They absolutely do. What they don’t do is mechanistically clone compressed mathematical representations of input data. The human part of the creative process could very well be a distinguishing feature, legally.
Tracing and collage both mechanistically clone elements of the input. Trendfollowing, mimicry, and copying others is standard in any creative area.
I don't think computer-generated works will be easily distinguishable from human ones unless they're designed to be, or are shipped with metadata. It's already hard enough to distinguish human artists from other human artists without having names attached up front.
> Tracing and collage both mechanistically clone elements of the input.
Yes, and tracing counts as art fraud.
Collage is a bit different, because you are mixing many clones of many other objects such that you create a new object; additionally the way you assemble the clones may transform them (a photo of the mona lisa has different surface texture than a painted version, even more different if it is clipped from newsprint), but while the borders of this are not clear, it is clear when people are far enough over the border. Think of hip-hop, sampling, and remixing music, and some of the legal battles which have come out of that.
They can (and do) clone compressed electrical representations of everything they see. Everything is just stored in memories in their brains instead of memory on the cloud. In the case of these complex AI, you cannot really extract the initial works, right? They have all been atomised, mashed together, and very imperfectly encoded as weights and parameters.
Yeah I am sure a lot of lawyers are going to have a lot of fun arguing every way imaginable.
I see this type of reduction again and again to advocate for one position or another relating to ML. Human consciousness and thought processes aren’t “just” anything (math, electrical impulses, etc.) — the fact is, we don’t know what the brain really does and how it’s connected to our conscious experience, or even what that is!
Deep learning is very powerful and impressive in its applications to date. However, it’s so saturated with hype (and humans are so prone to anthropomorphizing things) that it’s often viewed as something much more profound than it actually is. Neural networks, despite their name, don’t model the brain. And they lack a whole array of “intelligence” features that humans possess and use constantly.
All of this is to say that there are very significant differences between computer algorithms and human cognition, and I tend to think the legal system will be unpersuaded by arguments that ignore those differences.
Also, this is to say nothing of the public policy interests that shape the law. Regardless of what’s “under the hood,” the law can simply treat human and machine output differently. I’m not a copyright lawyer, of course, so I can’t speak to the norms or technicalities of copyright law itself.
> Human consciousness and thought processes aren’t “just” anything (math, electrical impulses, etc.) — the fact is, we don’t know what the brain really does and how it’s connected to our conscious experience, or even what that is!
There are a lot of things we don’t know, but it is not magic. There is no disputing that artificial neurons don’t have much in common with the real ones, which are very non-linear and much more connected. But in the end it’s all electrochemistry.
> Neural networks, despite their name, don’t model the brain.
But that’s not directly related to my point. My point is that even in the case of an ML model, you cannot get an exact reproduction any more than you can from a human’s memory. In one case it’s scrambled somewhere in someone’s brain, in the other on a hard drive, but the difference is not really relevant. Subjecting an AI’s production to the copyrights of all the things it’s been exposed to is very similar to subjecting a painter’s production to the copyrights of all the paintings they have seen.
It will be hard to tell whether an image was created by a “human creative process” or AI.
Will the same image be legal if a human made it but not if it was created by Stable Diffusion? How will someone even know, short of a legal discovery process?
A legal discovery process is a perfectly reasonable (and common) way to determine if something is legal, and yes, it's completely unsurprising that the exact same outcome might be legal or illegal depending on how it was obtained (or even with what intent the actions were done), legality takes these things into account.
> Is my brain's floating point calculations subject to round off error?
>
> In the (actual) neurons, is there a representation of real numbers? Where are the numbers in the brain stored?
None of these details are relevant to the bigger picture similarities of non-hardcoded learning from training data. None of these details change the ethics of what's being discussed here.
> I feel like people who assert this neither understand what neural networks are and how brains work.
> The fun part is, this is how human artists learn too.
We don't actually know exactly how human artists learn, and human artists are capable of innovation; nobody knew pointillism or Bauhaus before they were invented.
A little-known fact is that for humans it takes a long, long time to learn. While they learn, they develop a style; if they don't, they are not "real" artists but merely executors. Artists evolve, sometimes dramatically, in unexpected ways [1] [2].
So for us humans, learning is an experience, not just recombining parts of features of other things.
We are also highly influenced by feelings, unfortunately, so sometimes we do things a certain way because we felt that way, not because we wanted to paint that thing that way, or because we are not good enough to do exactly what we wanted to do.
Is Mona Lisa happy? Who can tell?
Was Leonardo happy when he painted it?
What was Leonardo thinking when he painted it?
What was happening in his life?
Is that the best smile Leonardo could paint, or is it an enigma he put there for future generations?
These questions are more important for an artist than the mere features of the painting.
The philosophical question is: is art discovered or invented?
If it's discovered, then SD can generate art; if it's invented, then SD's output is not even generative work, because to invent something from something else, you need inventiveness.
Artists today use the exact same method of learning from other people's artwork to generate new artwork and styles. These models are learning just like any artist learns and then producing new content.
This is patently, obviously _wrong_ for anyone who has tried learning any artistic skill in their life. Sorry to be this straightforward, but it gets on my nerves every time I read it.
If you tried learning, let's say, the chiaroscuro technique from Caravaggio, you'd be analyzing the way the painter simulated volumetric space by using light and dark tones in place of natural lighting and shadows. You wouldn't even think of splitting the whole painting into puzzle-size pieces and checking how similar those look when put next to one another.
Given somewhat decent painting skills, you'd be able to steadily apply this technique for the rest of your life just by looking at a very small sample of Caravaggio's corpus.
On the other hand, if you removed even just a single work from the original Stable Diffusion data set used to generate your painting, it would be absolutely impossible to recreate a similar enough picture, even starting from the same prompt and seed values.
Given how smart some of the people working on this are, I'm starting to believe they're intentionally playing dumb to make sure nobody is going to ask them to prove this during a copyright infringement case.
>This is patently, obviously _wrong_ for anyone who has tried learning any artistic skill in their life. Sorry to be this straightforward, but it gets on my nerves every time I read it.
Both my parents (though retired now) were commercial artists. I was trying to be an artist at one point in my life before moving in Engineering and Science. All my parents friends are artists so I grew up around artists.
Ask any artist here who is using Illustrator, Photoshop, Krita, etc. How often do they google image search for textures or reference images that get incorporated into their artwork? The final artwork is their own, but it may incorporate many elements from others' artwork.
>If you tried learning, let's say, the chiaroscuro technique from Caravaggio.. You wouldn't even think of splitting the whole painting into puzzle-size pieces and checking how similar those look when put next to one another.
Ever seen hyperrealistic pointillism?
Who are you to be the arbiter of how an artist creates their work? Have you ever gone to a modern art gallery and seen all the different methods people use to create artwork?
Art is boundless and unique to each who creates it.
If an artist uses a tool to create art, everyone agrees that it is art. It could be a paintbrush, clay, software on a computer, etc. But if an artist uses AI as a tool to create art, then suddenly it's not art.
> It's absolutely logically consistent to allow humans to do it while forbidding AI to do it.
Why would you, though? If art is for the sake of art, then all art is valuable regardless of origin. If art is for the sake of providing human employment, AI being better in no way stops performative make-work from existing. If art is for the sake of copyright trolls to troll harder, then fuck art, feed it to the AI!
Art is for the sake of humans producing said art. I don't give a flying f*ck about "art" (it's not art) generated by a neural net using statistical patterns.
This is such a strange position to hold. At some point in the future you may be confronted with an image that is emotionally stimulating, and you will have no idea whether a human or an AI created it. Are you suddenly going to dismiss and discard it just because you subsequently learn it was AI-generated? You must see how silly that is.
Your "logic" for making AI art illegal is basically "I don't like it". Your personal and subjective opinion is that it's not art by definition. This is like refusing to eat artificially grown meat because you have some strange idea about what food "should" be. Even if the meat were made MORE delicious, you would still claim it wasn't food and turn it away. There's no logical consistency to your position; it's purely reactionary.
A good chunk of art is just that, to my understanding. People go to art shows knowing by whom, when, and "how" the art was done. They would be very confused if asked to tell two pictures apart if you time-traveled to Picasso, asked him to paint a new unique picture, and then generated another one with AI. They could even find an idea, symbolism, and what he felt or thought behind the AI version.
All this boils down to a simple fact that egos like to think of themselves (and of artistic interaction) much more than there actually is.
I remember a story where a literature teacher insisted on a definite symbolism in some minor detail in a novel. People contacted the author about it, and he said no, there is nothing behind it. It was just a filler without any second thought. Makes you wonder how much symbolism is far-fetched in the classics, where you cannot simply email the author.
My position is perfectly logically consistent. Art is distilled human experience and human emotions emerging in a particular context after a chain of events. None of this is true of AI "art".
What is not logically consistent is to claim that a black box utilizing statistical relationships between pixels in a giant dataset is an "artist" and that its products create "value".
The compiler is not a programmer, AI can never be an artist.
- It appears people can train AIs from scratch or at least fine-tune them at home.
- Even if your art isn’t in “the training set”, that does not prevent the AI from learning its style. (Someone can decode it to CLIP embeddings. It could have a really good text model trained on vivid art museum descriptions of your art.)
- The ability of an image model to generate your art means it could also be trained in reverse to recognize it, producing a caption model, which would give vision to the blind. And surely you’d feel bad about that.
If you forbid it, the investment in developing those models disappears. They will be stuck at ~what we have now at best.
You can also require cloud providers to enforce a ban on training (and deploying) such models, it's doable. Good luck training it in your basement, it will probably take you a decade.
If this is banned, it will become a lot like piracy - yes, it's available, no, most people (at least in the West) don't do it, practically no businesses do it.
Training these models is much much cheaper than you think it is, and there’s good data for it already.
Either use a CC0 set like Wikimedia/Flickr and throw in some dead artists like Brueghel, or train on data from a country we don’t respect the IP of. Lots of Taobao product photos out there. It’s enough.
You are getting ridiculous now. Training this on "Taobao product photos" will lead to a useless model that is unable to produce practically any of the "cool" demos posted here in the last week.
A few months ago this task was virtually impossible. Then it was possible, but extremely expensive and pay-walled behind the "Open"AI website.
As of about a week ago this tech runs on consumer GPUs. The weights have been downloaded 100s of thousands of times, and fine-tuning / modifying is possible.
Training from scratch is about $500k still, but it will only get cheaper and easier.
This doesn't contradict anything I have written. The average technical user will be unable to train this exact model (not to mention the supposed future more powerful ones) in their basement in this decade.
That just feels like such a pessimistic forecast to me. Of course, the current trajectory of improvements in model efficiency and better commercial GPUs / ML-accelerators may hit a wall.
But I would not be surprised if this was trainable on a commercial GPU at home within that time. But I think another important trend that we are seeing is that you don't need to train these models from scratch.
Open-source "foundation models" means that you can usually get away with the much easier task of fine-tuning, as to not throw away / re-learn everything that these large models have already fit.
Edit: I initially said 2-5 years, but on more reflection this does seem optimistic (for training from scratch).
If things go that way, the 'legit' models will continue development, just using licensed content (along with public domain works). It will be more expensive for the end user, but that cost will shrink over time for general work. Tools that mimic working artists, though, might not be available (or will be expensive). This all seems pretty ideal, so the pessimist in me guesses it's fairly unlikely.
I am not sure it will be possible to get enough training data that way.
I don't know enough about diffusion models but if LLMs (of current size) have to use only public domain, they will be undertrained and we will see significant degradation in performance. Not to mention that Codex will be effectively dead.
Human artists can copy a work of art using transparent tracing paper on a lightbox just like that, and that would be plagiarism. No reason AI output shouldn't be treated the same.
Fair Use is a defense, not a right. Just being transformative isn't enough here; that's just one of many different factors that need to be weighed. Furthermore, Fair Use includes evaluating the effect an infringement has upon the original work's value.
So when your image generator keeps spitting out Gettyimages watermarks, while you are building a service that is in direct competition with Gettyimages for stock images, there is an argument to be made that Fair Use really doesn't apply here, as what you are doing is essentially stealing Gettyimages' work, laundering it through AI, and selling it back to their previous customers.
With StableDiffusion a Fair Use defense might have an easier time, as the results are released to the public. But it's still not exactly clear cut. If you type in "Mona Lisa", you'll still get something that looks like a copy of the Mona Lisa, not like an original work.
The new technology looks like it could threaten artists' livelihoods by replicating something about their work. The old legal framework for protecting artists' livelihoods from photocopiers etc was copyright. The goal of making sure artists can both make art and eat is still relevant, so people reach for the tool that used to make sure that was possible.
If it really does make it hard for artists to eat from selling art, that to me is just sour grapes. I mean, there used to be 1000s or 10s of 1000s of liveries to lock up your horse. There's no goal to keep horse keepers around so they can still keep horses and eat. Instead, their industry mostly disappeared. I'm not heartless, but at the point their services are no longer needed, that's just the way it is. Cloth weavers replaced by the loom.
I don't think SD will replace most top artists for now. It's hard for me to believe SD is going to come up with images like those from top concept artists. But I can imagine SD taking over in lots of situations, like maybe stock photography, where you can just ask the AI to draw "people in front of whiteboard discussing sales chart".
We need a different tool than artificial scarcity. Both for copyrights and patents. Government funded prize systems and patronage systems with a lot of mechanisms for citizens to choose what to fund seems ideal.
Whether a work is transformative or not isn't the only factor that goes towards a fair use ruling. Being transformative is just a part of the "purpose and character of use", which is a single factor weighed alongside the nature of the original work, the amount and substance used (whether the "heart of the work" was copied), and the effect the derivative has on the market for the original work. It's much more complicated here, and will have to be ruled on a case-by-case basis, as far as I know, and copyright holders could potentially make many cases using the other factors that a use isn't fair. I do think it should be considered fair use, but as the law currently stands, it does appear to be more complex, at least from my point of view.
Online artists esp. fanartists have a strict moral system with rules like “credit the original artist” that isn’t based on actual laws, so they’re upset about this.
I think that's wishful thinking. It's objectively a mechanical derivation based on many copyrighted works. In many cases it's going to reproduce some works with not that much transformation.
What's a similar thing that has come before this? I can't think of any, this is very novel. You'd want to wait for some rulings before you jump to conclusions.
It might be similar to some forms of (human composed) sample-heavy music which use bits and pieces from many different songs to create something entirely original.
As far as I understand it, this is still considered copyright infringement in most IP law systems. (If the samples aren't cleared)
Yes but we have a common law system and there's already tons of precedent that training AI systems is transformative.
It's also quite obvious just by looking at the generated images that it's clearly transformative. The images generated are unique and you can't trace the original copyrighted image from what's generated.
You really don't need a judge to see that Fair Use covers Stable Diffusion.
What happens if you give an image prompt like "mona lisa", "daffodils van gogh", or similar designed to describe an image the model was trained on. Will it generate that image?
Or for written works, start with a sentence from a copyrighted work, or part of licensed code. Will it start reproducing that work word for word (like Copilot can do with the GPL license)? Getting these to generate copies of GPL'd, company-owned, or other code with restrictions can lead to complex issues for the person/company using that code. Or likewise if a story contains significant elements of copyrighted works; worse if the works have trademarked elements.
> The images generated are unique and you can't trace the original copyrighted image from what's generated.
That might not always be true. I've gotten some results back that had the Getty watermark on them and others with the artist's signature. Unless the AI is adding that to images that never had one before (which might be a trademark issue), then you might be able to determine the provenance of the image components.
Is there something equivalent to the yellow dots printers add to their output that would survive the AI transformation?
Federal lawsuits are not cheap and the default is that you pay your own costs, you have to win the argument that you should get court costs & attorney's fees.
Well, you can still be sued even if the reason is almost totally bogus, and you have to go to court and make a stand that the reason is bogus. Doing only 100% legal things won't protect you from being sued.
Creating and allowing public use is a positive externality. Negative externalities should be taxed, and positive externalities should be subsidized.
Everyone should be able to freely use any public works, and governments should ensure that people whose art is being used to train these models are able to continue to do so.
What will happen is AI will be fed IP legal corpus as part of training. It will only show images that can't be linked (in a justice setting) to any particular work. Which will be interesting because it will produce media culturally alien but still appealing and probably addictive. It will literally be the engine of advancement of human culture. I'm not the conservative type, but I'm still slightly concerned as much as fascinated.
>What will happen is AI will be fed IP legal corpus
I first read this as a prediction that AIs will be employed to generate future IP legislation, and now I don't know if I'll be able to go back to sleep.
How about some company just makes a site that licenses art. In secret they run SD to generate images forever. They use image search to find similar images on the net and claim copyright. PROFIT!!!
My son is an artist and we have conversations around AI generated art almost every week these days. Our general consensus has been:
1. Artists that use SD etc. will save a lot of time and have a big edge. Examples: generating references instead of searching for them, generating random inspiration for abstract ideas, generating variations of a particular idea to compare and contrast.
2. Non artists who have powerful imagination but lack the artistic skill will stand to benefit as well. For example, I look forward to generating visual companions for my poetry, essays or fiction. I have the choice of commissioning an artist if my piece is successful.
3. Artists that are purists and masters of their craft with a very unique style will continue to thrive in their niche. These people have to lobby and come up with a license that explicitly prohibits feeding their images for learning. But a skilled artist can replicate the style and feed it. There is an ethical line here that may require creative tweaks to existing laws to protect people that fall in this category.
Super exciting tech and I can't wait for this to move over to 3D models. I know 2D image to 3D model is already pretty close to being real(nvidia ominiverse, nerf etc) and SD can be the starting point of that pipeline. Prompt -> 3D world that can be decomposed to meshes, textures and USD will change the game development landscape quite a bit.
So I just tried beta.dreamstudio.ai, and am a bit underwhelmed. Its compositions generally look silly or completely ignore the instructions, even if I turn cfg way up. I was generally generating 9 compositions at a time; here are some of the things I've tried:
"An elf fighting an orc in a forest"
Half the images had Christmas elves, so it's obviously bad at context; half looked like miniatures (models like Games Workshop), so it obviously got a load of its data from pictures of miniatures but doesn't realize it and thinks that's what it should draw; and the "fighting" always looked pathetic.
"A human berserker with an axe fighting 3 orcs in the desert"
That had problems with actually putting a human in. Also no axe. The orc designs were obviously stolen from the LotR films' orc design; the actual breadth of orc designs in fantasy didn't seem to be represented at all. Often the weapons are, for some inexplicable reason, only partly drawn.
"A space elevator leading up from new york"
That was a little more promising, but looked pretty amateurish: just a picture of a space elevator transposed over a picture of New York with terrible lighting, though it was choosing half-decent perspectives.
"A 12th century castle on a hill in a hellscape surrounded by demon hordes"
Pretty much ignored everything after "castle", just a bunch of 12th century looking castles.
"A castle on a hill in a hellscape surrounded by demon hordes"
None of the 9 images generated had any sort of demons; it just coloured some castles a bit red.
Your experience is quite common, especially when new to prompting. You can get more control by checking prompt guides, but this still leaves a lot to be desired.
You seem to have missed the main point of the article this thread is based on, though. With img2img, you can feed in a kid-level MS Paint input and control composition in detail. With inpainting you fix individual parts; with outpainting, the surroundings/background. With textual inversion, you can reuse successful parts in any context. People are even making the first videos and animations.
All of this is available today, not years from now. The workflows are still clumsy, but it's been a week. Many people are working on the UI to streamline all of this. You're severely underestimating progress.
It's funny how hyped up stable diffusion is on HN right now: reminds me of when style transfer first started making its rounds in 2017. https://news.ycombinator.com/item?id=13958366
I think as technologists we want to think that code can "solve" some of the problems in the art world... but I think we still have a really, really long way to go.
I tried to get style transfer adopted at work (I worked at a creative technology firm in NY), but frankly I think deep learning methods for art generation tend to be really unpredictable, which makes them pretty hard to use for professional applications. Imagine deploying production code that only worked 85% of the time... it would be a nightmare. I felt, and feel, similarly about deep learning approaches to art. They're just so finicky and unpredictable; for example, add a single extra pixel to the example in this article and the output would look completely different.
Either way, cynicism aside, stable diffusion is awesome :).
> Imagine deploying production code that only worked 85% of the time... it would be a nightmare. I felt, and feel, similarly about deep learning approaches to art. They're just so finicky and unpredictable; for example, add a single extra pixel to the example in this article and the output would look completely different.
Don't think the metaphor works. Code that only works 85% of the time is obviously broken, but art is subjective, so an 85% solution to a creative problem could be more than enough for most consumers.
What kind of GPU are you running this on? My 3080 seems to take about 30 seconds per image with 50 passes. I'm wondering if I'm missing out on some optimizations. Could just be the quality of Linux NVidia drivers.
I'd recommend trying a different fork. Perhaps you're using the official one. I believe that one still "ramps up the system" on every image generation. Other repos do the ramp-up only once.
I'm using 512x768 as the default, but a quick test shows only a marginal difference in speed between the two. I'll have to give Windows a try to see if it's the driver holding me back. Do you have any tips or resources for up-scaling the image after?
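As a back-of-envelope check (the throughput figure here is an assumed example, not a measurement of any specific setup), per-image latency is roughly the sampler step count divided by the card's iterations per second, which is why cutting steps is the quickest speed lever:

```python
def seconds_per_image(steps: int, iters_per_second: float) -> float:
    """Rough wall-clock estimate: sampler steps / sampler throughput."""
    return steps / iters_per_second

# If a 3080 were running at ~1.7 it/s, the default 50 steps would land
# near 30 s per image, and dropping to 25 steps roughly halves that.
print(f"{seconds_per_image(50, 1.7):.0f}s")  # ~29s
print(f"{seconds_per_image(25, 1.7):.0f}s")  # ~15s
```

So before blaming the driver, it may be worth comparing your effective it/s against other reports for the same card and resolution; a much lower number suggests a setup issue rather than an inherent limit.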
> No-one expected creative AIs to come for the artist jobs first, but here we are!
Maybe that's because we never really thought about it. In hindsight, it's only logical. For an artistic rendering, correctness doesn't matter much nor does understanding the model. For flying a plane or driving a car or just transforming code from one language to another, it very much does.
I don't know about that. A lot of the images I've seen SD generate have been interesting but "incorrect" in really important ways that indicate it doesn't _understand_. Things like people's limbs not being connected correctly, or three legs, or a wheel on a car not being in the right place, or elements of perspective that are just off. All things a real artist would never do unless it was intentional.
It's like some of the stuff that GPT-3 generates: it sounds extremely plausible and realistic, but then you read it properly and realise that parts just don't make any sense.
However, I don't think these things are show-stoppers. I think we'll see a lot of artists getting good at providing the "feedstock" and guidance to these systems to deal with those quirks and generating some really interesting work.
Symbolic AI dominated the field for 50 years: thanks partially to an accident of history (the highly influential work of Minsky, Newell, etc) and partially due to lack of the data needed to try anything else.
Now we’ve seen essentially 50 years of research on data-driven methods compressed into 5. In retrospect it makes sense that applications to which symbolic reasoning is especially ill-suited (including artistic rendering) would be the lowest-hanging fruit.
That depends on how easy it is to use the new tools.
Getting higher level languages and various libraries/frameworks didn't make coders obsolete, because there was still tons of work to do at a "higher level" that required algorithmic/engineering-style thinking. Yes, previously difficult tasks got a lot easier, but that just meant that even more ambitious projects could be tackled, and that was useful.
In contrast, it may be the case with art that the things a relative newbie can spit out with the help of image generation models are "good enough" for most use cases. Yes, a real artist would be able to squeeze out even more, but many companies may be willing to go with the option that's enormously cheaper and still seems to produce okay results.
Is the same not true for programming? There's a lot of non-programmers that create programs in Excel, or use no code solutions. A relative newbie can still get a Wordpress website up and running, you only need professional SWEs for the heavy duty stuff.
For better or worse, you very very quickly pile up complexity when programming in a way that novices can't deal with (and experts too, often).
What we're going to find out is whether novice users of image generator AIs frequently get stuck with inadequate results. IMHO the answer is likely to be "no", especially after we've iterated on the tools for a few more years.
Agreed, but the rub is that there was a ton of demand for "heavy duty stuff"; indeed, the improvement in code tooling actually increased that demand, because software programs became so much more powerful.
It's not clear that you'll see something similar here, that easier-to-produce high complexity art will greatly increase demand for said art.
There are some kinds of “correctness” that matter. When you’re doing something new, you can only really do one new weird thing at once. Too many weird things and it falls apart or distracts the viewers.
(The “cow tools” Far Side comic is a dumb example.)
Since these AIs can’t count, that extra weird thing is probably going to be people with too many fingers. I actually have a personal collection of weird Midjourney images because if you ask for wide aspect ratios it starts generating multiples of the same thing and fusing them together…
Regarding the ethics. Thought experiment: I look at a copyrighted image, get inspiration, and then draw a new image (which is not the same as the copyrighted image). Is that ethical?
But if I do this millions of times, after looking at millions of copyrighted images, is it still ethical (as long as I don't recreate copyrighted images)? Tough questions!
The ethics partly depends on the analogy you choose.
I can write a book about a boy wizard's adventures at wizard school and that's legal, but if I call him Harry Potter it isn't.
I can create Harry Potter fanart and distribute it online pretty freely - but slap it on a mug and sell it, and that's illegal.
I can record an audio description of a painting that's as detailed as I like and it's legal to distribute - but take a photograph of the same painting and it's a derivative work, no matter how artistic my choice of camera settings.
We don't really have any prior examples that are precisely like these huge ML models trained on copyrighted data - and depending on which imprecise analogy you choose, you can come to a different conclusion.
Harry Potter is a trademark. It protects and distinguishes identity and authority. It is much harder to get a trademark, and is also harder to fairly use one. Different issue.
If your fan art is infringing, it was infringing whether or not it was on a mug or on dropbox.
The photograph one is not true. Commentary can be by audio, or printed on a mug or whatever; commentary is transformative. Go out and take a photo of the world outside: if you live in a city, you've captured thousands of copyrighted materials in your image. Maybe it captures someone's painting, maybe it doesn't. Whether it's fair or not depends on whether it's transformative; the format doesn't matter.
I'm not seeing an ethical difference from any of this. Or did I miss the point?
> Harry Potter is a trademark. It protects and distinguishes identity and authority.
Perhaps I should have said Darth Vader, then - the point is you can copyright a character independent of the copyright on a book's text, and the trademark on the series name, and the fact that broad concepts like "black-clad masked evil overlord" are uncopyrightable. And that copyright can persist even if you transform a book character into an engraved coffee mug.
> I'm not seeing an ethical difference from any of this. Or did I miss the point?
The difference is:
If a person says "Stable diffusion is to its copyrighted training data as an audio description is to a painting" or "Stable diffusion is to its copyrighted training data as the concept of boy wizards is to harry potter" they would probably say it's ethically fine.
If a person says "Stable diffusion is to its copyrighted training data as a photograph of a painting is to the painting" or "Stable diffusion is to its copyrighted training data as video lecture is to a single image in its slides" they might well say it's not ethical.
Is your third example correct? Andy Warhol was famous for using photographs of existing art and transforming them only slightly. Roy Lichtenstein literally took panels from DC comic books and magnified them, and became celebrated in the art world for it.
Depending on how close it is it may not even be legal. To give a music analogy: at least in my country, if you take a melody, change all the notes' durations, and transpose it you'd still infringe
In my opinion it’s just a technical side of ethics, which has much less value than the main side: what it does to an author. If they suffer from these copies (morally, financially, spiritually, etc) without a way to offset it, then it is not ethical.
Well, but that's hardly a universal rule either. If we have a magic cure for bad eyesight, that'll be pretty terrible for optometrists, lensmakers and so on, but nobody would think it unethical that we're taking away their jobs.
To me it’s a different situation, because no previous hard, unique, “ownership” work of these people was used without consent to make this cure.
Think of this instead: a doctor collects a big volume of symptoms and analyses and creates a statistical way to cure people more easily. They publish many examples of their work without licensing anyone (legally or morally) to use it freely. Now some algorithm collects their data and many others' data and transforms it into a better method. The doctor suffers from going out of business. Is that ethical? On one hand, the algorithm invented something new and easier to access. On the other, it basically stole parts of their research, and similar research, on a previously unthinkable scale. We humans copy ideas all the time, and this is somewhat normal, but this enormous at-scale capability was never a thing.
Personally I don’t care for optometrists, uber drivers or designers. Nature will find a way. But when we talk about fundamental social contracts like property or protection of accumulated knowledge, I think it is unethical to break them, regardless of technicalities. If it’s such a great advancement benefiting everyone, why can’t the AI creators just ask permission for the 2.3B data points they used?
But that's what makes this so tough. Whether you consider the ways in which these AI models repurpose existing artistic works to be mere technical details or central to decide on the ethics of the matter depends largely on the analogies you reason by, as michaelt noted.
No need to bring AI into it. What if someone chooses to open a Wendy's in a town that already has a McDonald's? The competition will definitely reduce the McDonald's's business. And many would argue (famously Rockefeller) that competing with another business isn't ethical when businesses could band together into trusts or similar structures to limit competition.
These new models completely changed my mind about how much impact AI will have in my lifetime. They are the most impressive software achievements in decades and anyone who has a “meh” reaction will absolutely end up looking silly.
I’m not optimistic that their impact will be positive though.
I will up you by saying that even your prediction is silly, for using the word "lifetime".
Not lifetime. Right now. Just one week of StableDiffusion in the wild has seen the development of many clients, GUIs, optimizations, plugins for commercial software, editing workflows, the list is endless.
The speed of it all is dazzling. It will not take long before this is assembled into a smooth experience that runs everywhere, with obvious end state being your phone.
And that's still just artistic image generation. The open sourcing of it means people can make any vertical, precisely optimized for a particular (commercial) domain.
Almost all shortcomings seem to be crushed in no time. Bad at drawing eyes? Here's a new encoder that fixes it.
The creator of StableDiffusion indicated they will soon tackle music generation and, if I remember correctly, even poetry. And there's 3D and video.
The ultimate end state: if you can imagine it, it can generate it. Not only are we much closer to that point than society realizes, we're also moving at an exponential speed towards it.
> They are the most impressive software achievements in decades and anyone who has a “meh” reaction will absolutely end up looking silly.
With the risk of looking silly, I declare "meh" once more, just as I "meh"-eh when GPT-3 came out.
The so-called "AI" is not fundamentally different from the AI of the 80s, it's just that now we have much better hardware. The main problem of the past AI winters still remains - the existing algorithms focus on statistical methods, which can be rather inexact. Imagine a nuclear plant controlled by an AI, or an airplane flown by a neural network. They completely lack reasoning capability and therefore you can't trust they will be able to adapt to unpredictable situations.
Incremental improvements are easy to dismiss, but it can be hard to tell when some critical mass of utility is reached. The first cars were electric (and steam-powered), but it took electric another 100 years to supplant ICE (steam is next /s). The components for drone technology aren't new, but control systems and batteries needed time to improve; VR has incrementally improved since the 80s; even the internet was around for 20 years before incremental improvement brought us the web...
It sure looks to me like this new development is hailed as a leap, rather than incremental improvement. Although I'm judging just by reading the comments in the HN bubble.
Either way, I think the current "AI" is fundamentally limited by the statistical approach to problem solving. Without any reasoning capabilities, no amount of incremental improvements will change the fact that neural networks are simply making guesses based on existing data sets. Nothing magical or revolutionary, it's the same thing we've known for many decades.
This is happening with images first, but I can see how this might be able to be leveraged for all kinds of things. The speed at which society might be transformed could be astounding. But it won't be distributed evenly; not everyone is prepared to take full advantage of these benefits. This is early 2000s internet disruption x2 in half the time.
BUT it will likely disrupt the upper middle class more than the lower middle class, as it's mostly disrupting creative work. Upper middle class people have a louder voice.
It is capable of surprising beauty. It will however be as beautiful as the disaster that is modern Facebook -- and the misinformation will be more convincing and subtler.
I think that the cause of that is unlikely to be Stable Diffusion but the lack of education that allows so many educated people in this thread to mistake images for art. In that way a lot of the damage has already been done.
One day, images generated with these AIs will become ubiquitous online, and new AIs training on data will wind up with feedback problems because they got trained on photos that are AI generated, possibly even ones they themselves generated!
This tech is a huge deal. Huge. Ubiquitous unique, beautiful art. Entire movies made by machines. Too many to even watch them all. Actor models that aren't real people, owned by the thousands by people and corporations.
It reminds me a bit of "Low-background steel", which recently popped up again here. Images older than 2020 are predominantly human generated. Anything later is suspect!
> One day, images generated with these AIs will become ubiquitous online, and new AIs training on data will wind up with feedback problems because they got trained on photos that are AI generated, possibly even ones they themselves generated!
Possibly but not necessarily. The human curation prior to posting as well as additional textual context associated with the image could be valuable training signal, even if there is some feedback.
Or when you try to google something that would be better answered by a professional who costs $1000/hr, like a doctor or lawyer, but you just want a vague idea of the landscape -- and all the answers are "you need a lawyer/doctor" and not any real attempt at an answer. Just shut up!
Expect hit actors from today to become digitized and immortalized on the screen. Eventually there will be a canon of standard "actors" (customized per movie, of course) that will never change once they settle into the appropriate archetypes.
This already happens. Around 2010, the actor who played the colonel in Avatar had his face digitized for use in sequels. More recently, actors from earlier eras have been reincarnated in CGI (Luke Skywalker takes on his original-trilogy appearance in The Mandalorian).
Although it's likely that the images that appear online will be peoples' favourites of many candidates, also likely developed through many steps of AI tweaking. So that's still a useful signal for another AI to consume.
The images I generate don't have a watermark. Same applies for a lot of other AI generated images. The watermark is the most trivial part to remove and is definitely not a defining feature of what makes an image AI generated.
I've said this ad nauseam - but people who think this is going to kill an industry clearly have no idea of said industry.
It's a fantastically great tool, and a very exciting space, but reducing the function of creatives to people who draw pretty pictures is staggeringly ignorant.
Live music didn't disappear but the industry took a huge permanent culling when any cheap venue could just play a track. The replacement doesn't have to be great to trash an industry. It just has to be good enough for most. This is.
How about the fact that you can go on github today and find a thousand ml based music generators probably, and yet people still like going to concerts and seeing an artist play an instrument.
Not a meaningful comparison. Seeing an artist play an instrument is a fundamentally different experience. It's visual, much more impressive sound, a social experience, and so on.
With image generation, you just look at the output. There's no difference between seeing a human-created image and an AI-created image; people can't tell.
Ok, but you still see people passively listening to real artists instead of ai generated music. Provenance matters for music and for art. Maybe if you design retail art for Target without your name ever put on the work you have to worry.
A better comparison is like saying AI text generation bots will replace authors, or that AI drug testing will replace drug development - which clearly isn't the case. The core issue is that people with no idea what the creative fields offer are throwing their hat in with what they think is going to happen. People with experience are saying the opposite because they know better.
At best, this will remove those websites where you can pay $5 for "a designer" to make you "an image" - is this a loss though? Such things have never been a threat to creative fields.
You pose it like an all or nothing equation, which it isn't.
The way I see it, if you consider the art world a pyramid, the bottom is about to fall out. A lot of commercial artwork serves no deeper purpose than pretty decoration.
The emphasis will move to ideas instead of just execution. Artists will soon find out about the avalanche of people that have creative ideas yet can't draw or paint. They'll be unlocked.
In context it is all or nothing - because people on HN think illustrators and artists are what they find on fiverr.com. They're making hugely naive blanket statements saying that this will destroy an industry and make creatives unemployable. These doomsayers have literally not the faintest idea about the job they think is being erased by txt2img.
If people were saying "oh hey this is going to give the lazystock on iStockPhoto a run for their money", I wouldn't debate that point, it's true. However that's not the industry, and it's certainly not where the money is - neither in # of customers nor total spend. Those people who you might think are customers simply put: aren't, they get by with images stolen from google images and bundled clipart, or frankly: nothing at all.
Now this isn't to say that txt2img isn't useful or exciting. I can say that it is the largest and most significant expansion of creative tech since the advent of DTP. This will absolutely accelerate and open the door to not just higher standards, but new ways of rapidly ideating concepts. I've already seen fantastic examples of txt-to-image-to-mesh-to-live animation. All automated through AI.
This is also why I speak against the other kinds of naysayers: the ones that think this tech is unimportant. These types are being incredibly short sighted and acting like we're looking at this tech's endpoint, rather than its infancy.
tl;dr: No, creatives are not being put out of a job. Yes, this tech is incredibly important.
I agree with you. I think it's understandable that non-artists commonly associate art with what they interact with or see the most: illustration and decoration, stuff found at artstation, the like.
Surely you have a point that this does not cover the entire world of art, but I think it would be helpful if you constructively explain which parts are less or not affected, instead of calling people ignorant.
I’ll definitely continue to call people ignorant when they make grand unsubstantiated claims, that frankly are nothing more than trollish internet behaviour.
A better approach for people is to ask questions, rather than trying to write controversial falsehoods.
Right now there is at least one high-ranking submission on HN where a creative details how this won't end their career, but you don't need to read it -- social media is filled with creatives literally rejoicing. No one is sweating this.
So to that: I say that ignorance to this is definitely a choice.
I've started an open source implementation of a Discord bot that turns your prompt into images using Stable Diffusion [0].
It would be great if this would turn into a community-driven chat-based text2image offering - which is certainly challenging as the required GPU-powered server instances aren't exactly cheap. Maybe this could grow into an open network where people provide the GPU power of their personal PCs? Or we find a way to cover hosting expenses through a credit system or some kind of sponsoring?
Feel welcome to join the community [1]. You can easily test drive the current Bot implementation on this server.
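For what it's worth, the chat side of such a bot mostly reduces to parsing a command prefix and queueing prompts for a single GPU worker. Here's a minimal sketch of that layer only (the `!dream` prefix and `PromptQueue` name are my own inventions, not part of the linked bot; the actual discord.py wiring and the Stable Diffusion call are omitted):

```python
from collections import deque
from typing import Optional

COMMAND_PREFIX = "!dream "

def parse_prompt(message: str) -> Optional[str]:
    """Return the prompt text if the message is a generation command, else None."""
    if message.startswith(COMMAND_PREFIX):
        prompt = message[len(COMMAND_PREFIX):].strip()
        return prompt or None
    return None

class PromptQueue:
    """FIFO queue so one GPU worker can serve many chat users fairly."""

    def __init__(self) -> None:
        self._jobs = deque()

    def submit(self, user: str, message: str) -> bool:
        """Enqueue (user, prompt) if the message is a valid command."""
        prompt = parse_prompt(message)
        if prompt is None:
            return False
        self._jobs.append((user, prompt))
        return True

    def next_job(self):
        """Pop the oldest (user, prompt) pair, or None if the queue is empty."""
        return self._jobs.popleft() if self._jobs else None
```

A FIFO queue also makes the cost model simple if you later meter GPU time per user with a credit system.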
So why do you come here claiming "like a thousand files spread over a hundred folders"? Have you considered the guidelines [0] of this community?
[0] Be kind. Don't be snarky. Have curious conversation; don't cross-examine. Please don't fulminate. Please don't sneer, including at the rest of the community. https://news.ycombinator.com/newsguidelines.html
These systems make for a good family game in our house. We take turns with prompts and have the laptop broadcasting to the TV. I imagine you could make a more formal party game from it - some sort of weird version of Pictionary perhaps?
I'm honestly going to think about game ideas and try some with the family. It's fun to play with these models on your own, but it's way more fun with a group of people laughing and one-upping.
(Also, very happy for somebody to steal this idea!)
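As one sketch of such a game (the rules and scoring here are a made-up variant, not an existing game): show everyone an image generated from a hidden prompt, mix the real prompt in with player-written decoys, and award a point for guessing the real one.

```python
import random

def round_options(real_prompt, decoy_prompts, seed=None):
    """Shuffle the real prompt in with the decoys for display to players."""
    options = list(decoy_prompts) + [real_prompt]
    random.Random(seed).shuffle(options)
    return options

def score_guesses(real_prompt, guesses):
    """guesses maps player name -> chosen prompt; a correct pick scores 1 point."""
    return {player: int(pick == real_prompt) for player, pick in guesses.items()}
```

You could also score decoy authors for every player they fool, which is roughly how games like Balderdash keep the bluffing interesting.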
It is for authors that got their works stolen for "the greater good" without even being notified.
A friend of mine found his works in the Stable Diffusion dataset, the work was not meant for public use, he's never been notified and, most of all, he would have never agreed if they cared to ask.
I don't think the word "stolen" applies here, at least not any more so than I stole from him when I looked at that image. I'm not sure about that website rehosting his image, but in terms of the model people are running on their computers, it's not like it literally includes the image such that it can be pulled back out. A computer looked at that image and tweaked a very small number of values within ~5GB of numbers based on it.
That said, it did get a little better at drawing like your friend from that. But so does every other person who looks at his art. The model, however, might be better at it than most people. It's a dilemma, but it's hard to say any individual image was "stolen".
We kinda already went through this once when GitHub scanned all public codebases to build the model for Copilot. When they did that, Copilot got the ability to program a little more like me (sorry everyone). But when I publicly put my code out there, I was also giving people the opportunity to learn from my code and reproduce similar things to it as well.
> But when I publicly put my code out there, I was also giving people the opportunity to learn from my code and reproduce similar things to it as well.
Not really, IMO
I can enter a bookshop and look around, open a book, read some pages and maybe learn something.
It doesn't give me the right to republish a similar book putting together what I saw, unless the license specifically grants me that right or I obtained a license that does.
Copilot is a commercial product; free licenses don't automatically grant the right to commercial use of that code.
I would venture to guess that the author doesn't have a full understanding of what the AI is doing or how it works.
There are a gazillion artists who take clear inspiration from other artists, and there is no copyright violation in doing so. The AI viewing his work isn't any different.
> I would venture to guess that the author doesn't have a full understanding of what the AI is doing or how it works.
Actually he does.
> There are a gazillion artist who take clear inspiration from other artists
Inspiration is not the same thing Stable Diffusion does.
We can't even define inspiration in a proper manner, but for sure we can say that if someone wants to draw comics in the way Tezuka made them, they have to study, exercise, rinse and repeat for at least a few years.
No human can scrape billions of images and take inspiration from all of them, not even in 10 life times.
Also, no human will make something similar to something else seen for the first time in 10 seconds.
We can go back and forth all day. But legally your friend has no legs to stand on. People do "like copies" of works all the time, and it's been ok'ed by the courts just as much.
Unless your friend thinks the AI is copying snippets of his work into generated images, I'm not sure where they are getting these ideas from.
> But legally your friend has no legs to stand on.
My friend is just upset.
Legally he has every right in the world; he's the author, for Christ's sake!
Will he try anything?
Of course not.
Are you okay with this?
Well, then you should reconsider your values.
> People do "like copies" of works all the time,
If those copies are authorized, I don't see the problem.
Try to recreate a Star Wars image and sell it on the Internet.
See what happens.
> Unless your friend thinks the AI is copying snippets of his work into generated images
Would you bet your life on the fact that it doesn't?
You know why we Europeans came up with the GDPR?
Yes, exactly, because processing of the data, automatic or not, must be authorized by the holder of the rights on that data.
We are not talking about artistic expression here; I'm not sure where you are getting this idea from. This is not inspiration or art or human expression, this is simply data processing leading to algorithmic replicas of other people's works.
Without people's work, no model could replicate it.
Impressive, but still completely dependent on source material that they scraped; they haven't produced it themselves.
There will be a day when people like you will realize that appropriation is not right.
Books are different; most of the works those models are trained on are in the public domain. Shakespeare is public domain, nobody will ever contest that. But if you think a living author has no right to have a say before someone processes their work, you're the one with crazy ideas.
I think you don't know how copyright law and authors' rights work.
The simple fact his work became part of something else he did not authorize is the problem here.
And yes, it could spit out something that is very close to the original, so close that fair use could not stand.
Fair use is not a right!
> If your friend is still upset, maybe they should consider the artists they "stole" their learning material from.
He does, don't imply differently; ad hominem is a stupid argument for very stupid people.
That's why he spent 30 years of his life learning, and in the end he became good enough to meet the artists he "stole" from and thank them for what they did.
You seem to lack the ability to understand the difference between being a good person and being a senseless automaton...
If the model physically does not have enough information to create a work that would be found to be infringing on your friend's rights, then why is your friend upset? The model viewed his work and learned from it in the same way that humans do.
If your friend published images on the internet, he clearly intended them to be viewed by (meat) neural networks. Why are silicon ones operated by humans any different?
Edit: To elaborate, if (10 years ago) someone saw your friends works and started producing derivatives and publishing them to DeviantArt, would your friend have any good reason to be upset?
I think that your and your friend's argument reduces to "the AI is not a human, but a machine-like, and thus can not be `inspired' but only `reproduce'". I don't know if it is a novel argument to be tried in a court of (copyright) law, but it is certainly a good one.
> I think that your and your friend's argument reduces to "the AI is not a human, but a machine-like, and thus can not be `inspired' but only `reproduce'".
That's one argument, but it's not the most important one.
The most compelling issue here is that SD used copyrighted data scraped from the Internet without even informing the authors, who were not unknown to the SD authors, because they tagged them in the model.
Using work without permission is a copyright violation. IMO, the fact that the original has to be scraped to create the dataset completely negates any argument about whether the final product is transformative, because it cannot exist without the original data.
They didn't take the image. They looked at it and learned a few datapoints from it. It's not like they compressed the image and added it into the model, able to be reversed by anyone who downloads it. In fact, it's likely there are other completely distinct images that could have contributed the exact same tweaks to the model, like a hash.
>I can enter a bookshop and look around, open a book, read some pages and maybe learn something.
>It doesn't give me the right to republish a similar book putting together what I saw, unless the license specifically grants me that right or I obtained a license that does.
No, but the analogy would be if you learned something from the blurb you read - like a word, or a factoid, or an expression - and it influenced some writing you did later. You didn't steal from that book's author.
> They didn't take the image. They looked at it and learned a few datapoints from it
All rights reserved means "all rights"
Nobody authorized them to do what they did.
> No, but the analogy would be if you learned something from the blurb you read - like a word, or a factoid, or an expression - and it influenced some writing you did later. You didn't steal from that book's author
Wine contributors were not allowed to see Windows leaked source code to avoid copyright infringement claims.
both WINE and ReactOS have refused to use the leaks; ReactOS doesn't even allow people who have worked legitimately at MS in the past to be developers, simply because even the smell of contamination would expose the projects to enormous legal risks
So, to be fair, Stable Diffusion authors should take inspiration from all the source material, replicate it the best way they can using their own abilities and then train the model on what they produced.
They didn't do it for two reasons:
- it would have required centuries
- the result would have been much less compelling
Stable Diffusion is so interesting because the source material was of high quality.
So, we can conclude that the credit belongs to the authors of the source material, not simply to the model itself.
Shouldn't they be rewarded or at least consulted before using their works?
What if some of them are training their own model on works they have the right to use, and now Stable Diffusion has taken all their hard work away?
If stable diffusion was trained on my drawings, it would produce a steaming pile of shit.
This is why we have the fair use defense for ignoring copyright [0]. It's more efficient for society to limit the control copyright grants authors over their works, so those works can be leveraged to enable something even greater. While many in the media community ask permission for almost any use, to maintain good relationships and limit liability, you don't have to for many uses.
“Copyright” and fair use are human-defined, not absolutes, so the real debate is whether AI models should be allowed to learn from copyrighted works the way a human does. This use case is also close to people who clip or sample snippets of works into other pieces of art which they then profit from. It is far more efficient at this process than the previous method (having an artist study and practice a style, then reproduce it), so that is where the debate is.
I’m more trying to see what the utility of Stable Diffusion (or just the text-to-image problem) is in the long term. Right now people can play around with making weird art pieces, and maybe it will be integrated into design tools... but then what?
Eg with other AI problems out there I can see a potential application to medicine, self driving cars etc, but I just don’t see what the bigger goal of this is going to be.
Magazine illustrator, concept artist, stock photography, video game art asset creation, logo generation, interior design, fashion design, costume design, political cartoonist, caricature artist, website layout templates, font design, pixel art, book covers, children’s toy plush design, new car concept prototype…?
That's just existing industries. All the new crazy stuff it'll unlock... Custom operating system interfaces for everything based on the data they're working with, truly-open infinite video games where you can run or zoom any direction and have it write itself, bespoke LCD wallpaper customized on the fly to the person walking by, bespoke industrial design creating 3D objects (first digital, then printed) for exactly the context they're needed in, for merely the materials cost...
Does anyone know if it's possible, with a high end gpu or relatively cheap cloud provider, to use transfer learning to extend the model with in-house images? I think it would be useful to customize the output for certain genres or applications.
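I don't know whether full fine-tuning fits on a single high-end GPU, but the transfer-learning recipe itself is generic: freeze the pretrained weights and take gradient steps on a small set of new parameters against your in-house images. A deliberately tiny toy version of that loop, with the "pretrained network" reduced to a fixed function and one trainable scalar (nothing here is Stable Diffusion's actual API):

```python
def pretrained_features(x):
    """Stands in for the frozen pretrained network; never updated."""
    return 2.0 * x

def fine_tune(data, lr=0.1, steps=100):
    """Fit only a bias term (the 'new' parameter) to in-house data by SGD."""
    bias = 0.0
    for _ in range(steps):
        for x, y in data:
            pred = pretrained_features(x) + bias
            grad = 2.0 * (pred - y)  # derivative of squared error w.r.t. bias
            bias -= lr * grad
    return bias
```

In-house data following y = 2x + 3 drives the bias toward 3 while the "pretrained" part stays untouched; real fine-tuning does the same thing with a few tensors instead of one scalar, which is what keeps the memory cost far below full training.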
Yep it’s missing intent, just like machine translation. There somehow seems to be less information than there is in the input. I think more people will start to notice it as time goes by.
> There somehow seems to be less information than there is in the input.
This captures my thoughts very well. That's why these images get old very quickly - you can basically imagine the same thing in your mind. There's no real whimsy, surprise or creativity there. For now at least.
Yeah, one thing I think of is that when an artist 'creates' something, it's coming from their brain, like expressing what they feel. When AI generates the art, even though there is a human curator, the lines and shapes the AI picks out are all random/arbitrary/no-emotion (even though it draws upon a huge dataset of lines/strokes/styles to form them). It's "noise" art. The noise is just cleaned up to resemble real things.
They are very clean-looking; it could be a side effect of SD being compressed to such a small size.
MidJourney's production model is good about dirtying things up, you could use it as a post processor. I think its default style of turning everything into fine art-cyberpunk-movie matte painting gets old though.
I've gotten some good atmospheric photorealism results out of DALLE, but currently trying to get outpainting to work to extend them and it's tricky.
Why do Stable Diffusion images all sort of look the same? They look either deformed or like pastels. I tried it a few times on some free sites, and it was not accurate. I would put in 5 keywords and maybe it would get 2 of them right. These are common search terms.
It's not a search engine. It takes a while to get a feel for what prompts work and how to phrase things. I suggest browsing https://lexica.art/ and using some prompts from there as a template.
Some of the tech and especially the platform they're building is impressive, but in terms of raw image generation quality from results I've seen and my own experience, I don't find it anything close to DALLE-2
I tried SD for a bit and the results are at best mediocre.
Dalle2 is dimensions better at understanding prompts, but it's still in beta. So there is hope.
Linux started out pretty mediocre compared to closed source Unix, and even today there are operating systems that are better, but being freely available made it a game changer. I don’t foresee Dalle2 being open any time soon.
I'm struggling with the connection you're making between Linux and Stable Diffusion.
Linus didn't invent the core concepts of Unix. He copied the (arguably) good parts from an (arguably) closed ecosystem. His big innovation was leveraging the internet to create a new kind of community not really seen before it. The Linux bazaar gave smart developers excluded from the Bell Labs / BSD cathedral a place to be productive in that style on their own terms and others used it to disrupt the OS business. A lot of shit gets into Linux that the community turns into useful things.
I look at the inputs/outputs of Stable Diffusion and am reminded of the fractal craze of the 80's which revealed the underlying simplicity of incredibly complex looking things. Lots of interesting art and technology came out of that but I don't think anything like the Linux community did.
I think it's because we've evolved to be more sensitive to faces and pay more attention to them. I suspect the whole image is off, but we really notice it in faces, and particularly the eyes.
Playing with the prompt-only demo at https://huggingface.co/spaces/stabilityai/stable-diffusion I got the impression that many apparently harmless requests corner the model into a very sparse set of examples exhibiting extreme bias and utter nonsense. For example (seed 0, other advanced options at default values):
Blue hamsters filling donuts with nails
mostly edible donuts and quasi-donuts; some blue, but no hamsters and no nails.
purple hamsters in free fall, eating gold nuggets
no free fall, some purple objects and backgrounds, somewhat metallic hamster hair, a five-limbed black and white hamster, credible gold nuggets without rodents, eating hamsters without food, multiple almost identical instances of eyeless white and yellow hamsters in the same pose.
Martian hamsters wrestling
generic reddish hamsters, not wrestling at all.
a hamster is the CEO of a financially struggling startup, in San Francisco
remarkably standard hamsters, in somewhat office-like environments (wooden tables and harsh light), with curiously unbalanced camera angles that might be random or inspired by the source material.
the most beautiful hamster in the world, parading on a toy car and wearing sunglasses
No complete pair of sunglasses, but an interesting hybrid between round black lenses and round black hamster eyes; unusually deformed hamsters; an incoherent furry thing with a plastic part floating in the air; a deformed toy car with its back nicely replaced by a hamster.
Vampire hamsters feeding on unsuspecting tourists in a dirty alley
No vampirism and no tourists, but nice dirty alleys.
I insisted on hamsters because they are usually photographed in very few poses and activities, leading to completely ignored indications.
Prompt (modified from a recent post on the /r/StableDiffusion subreddit):
A picture of 2 hamsters wrestling on the surface of mars, intricate, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, art by greg rutkowski and alphonse mucha
I think many of your prompts have trouble getting the right representation because you're using a lot of abstract and implied language. For these prompts to work right, you need to be very specific and often add keywords as if you're specifying a mood board. You also need some overlap between what the model has likely been trained on and the expected output.
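One low-effort way to stay "very specific" is to template the prompt: a plain subject plus a reusable bag of mood-board modifiers, in the spirit of the hamster example above. A trivial helper (the modifier lists are just common community keywords, not any official syntax the model documents):

```python
# Reusable mood-board modifiers; illustrative only, not an official syntax.
DETAIL = ["intricate", "elegant", "highly detailed", "sharp focus"]
MEDIUM = ["digital painting", "concept art", "artstation"]

def build_prompt(subject, modifiers=tuple(DETAIL + MEDIUM)):
    """Join a subject with comma-separated style modifiers."""
    return ", ".join([subject] + list(modifiers))
```

Keeping the modifiers fixed while varying only the subject also makes it much easier to compare seeds and judge what the subject words are actually contributing.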
The system works much better in img2img mode. Slapping together a rough shape of what you want in GIMP/Photopea and then applying a prompt to blend everything together make the entire process a lot more reliable.
>You also need some overlap between what the model has likely been trained on and the expected output.
Ooooo. Ouch. That's... Kind of a death knell for a good tool in my experience, and suggests the tool in question is a glorified search engine in a sense. With a really confusing query syntax composed of words, and graphical starting states.
If I have to become <an expert on SD's training data> to get anything done... Why not just learn to draw/hire a creative?
Learning to paint hyper realistic paintings is something that takes years if not decades of hard work. Learning how to formulate your queries in a way that the algorithm outputs what you want takes days at worst.
If you want something unique and abstract, you're going to need to go through a lot of trial and error to get what you want. That's still a lot easier than teaching yourself how to create such art.
"Graphical starting states" in this instance isn't as hard, a very rough MS Paint picture of the general shapes you want things to appear in is enough. Alternatively you can grab rough cutouts from stock art, position them right, and the algorithm should figure out how to turn it into a single, flowing picture.
Take a look at these examples (https://huggingface.co/spaces/huggingface/diffuse-the-rest/d...), they're far from perfect but the autocompletion is done quite well. It should be noted that the demo application doesn't expose a lot of the flexibility the underlying model provides (like blend strength and such) but I don't know a free online alternative that does.
Trollface:
Disambiguation of lines in the source image is poorly executed. The model appears confused as to whether those lines are indicative of depth, or lighting artifacts. The shape and perspective are poorly chosen, and in all the resulting images the lighting arrangement is quite inconsistent.
The ears are completely unspecified, so too the nose. This is somewhat of a deliberate omission in a trollface, and adding them in without careful thought as to how it changes the piece is... Well, not the best move. The eyes are terribly arranged in all submissions.
The plate of meat, fries, and beans: you can barely see the beans in the first sample; they are hidden underneath the fries, enough that an inattentive eye may miss them entirely. No specification was given as to the state or kind of the meat, so I suppose it being cut is a nice bonus. Interesting, in a sense, since one may get the impression the model confused the grammatical deep structure such that "fries and beans" was taken as a compound predicate.
The second with the meat surrounded by the beans is an interesting contrast, but without more samples, I have questions about why all the curated samples include rare beef instead of say, sausage.
The Colloseum:
I too could use Photoshop and select a particular palette. The more interesting aspect here seems to be the color-palette processing, and I'll admit that I wasn't able to find source works by the artist being imitated to compare against. Still looking for those.
The Unicorn/Butterfly:
These still disturb me in the sense that once again, we're replacing actual artistic technique, with the ability to tweak prompts or assemble graphical starting states/prompt combos. Is it making some hellish form of combined Natural Language/graphical programming pipeline? Yes.
However, none of this would have any value without being trained on works done by previous artists who likely were not asked whether or not they wanted their works included in the dataset.
As the guy who blew up a Philosophy of Art class by positing that a well-executed forgery was as much a work of art as the imitated piece, I still see more problems than solutions here. Yes, a new art form may have emerged. However, with it come serious questions around data curation practices. As for the efficacy of the model, its runtime characteristics, and how this bodes for the environment... I'm increasingly concerned the more I apply my "what if everyone started doing this?" supposition.
In short, I see a hell of a lot of hype, but precious little coming to terms with what will ultimately be the hard questions.
The underlying data, I gather, must associate an image with a description. There aren't millions of well-written descriptions of images. Yet they've accomplished this, I think, with merely the sparse text that comprises the title of the scraped JPEG, or perhaps the surrounding text on a website.
However, shortly, there will be millions of extremely detailed descriptions of new images… that is, a person puts in a detailed prompt, and if you can capture how pleased the user is with the result, you could then add that new picture — along with the user’s prompt — to your training data. Eventually, you will have millions of well-described images, which will make the system even more amazing.
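Capturing that signal could be as simple as logging each generation with its prompt and the user's rating, then keeping only the well-rated ones as future (caption, image) training pairs. A sketch, with a made-up log format:

```python
def harvest_training_pairs(log, min_rating=4):
    """log entries are dicts like {"prompt": ..., "image_id": ..., "rating": 1-5}.
    Return the (prompt, image_id) pairs the user liked enough to train on."""
    return [(entry["prompt"], entry["image_id"])
            for entry in log
            if entry["rating"] >= min_rating]
```

The rating threshold is doing the curation work here: discarding poorly-rated generations is what keeps the feedback loop from amplifying the model's own failures.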
Now that there is so much very successful description -> image research and tooling, I find it surprising that most of the image -> description/label solutions are closed source and/or paid.
BTW, generating images on Windows using Anaconda and my RTX 2080 Super was very easy (I followed the instructions here: https://rentry.org/SDInstallGuide ). The only thing was, because it has just (!) 8GB of VRAM, I had to use --H 256 --W 256 to limit the output image to its minimum size, which barely fit into memory.
I am excited thinking the next generation of video cards will all have 10GB of memory from the mid-tier up. This is going to push up the requirements of some of the enthusiast GPUs for non-gamers.
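The resolution/memory trade-off is roughly quadratic: halving both sides quarters the number of pixels the network has to hold activations for. A back-of-the-envelope helper (the scaling is the real point; any absolute byte counts would be guesses):

```python
def relative_pixel_cost(width, height, base=(512, 512)):
    """How many times more (or fewer) pixels than the base resolution."""
    return (width * height) / (base[0] * base[1])
```

So a 256x256 render tracks only a quarter of the pixels of a 512x512 one, which is roughly why it squeaks into 8GB of VRAM while the default size does not.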
>You can use it for commercial and non-commercial purposes, under the terms of the Creative ML OpenRAIL-M license—which lists some usage restrictions that include avoiding using it to break applicable laws, generate false information, discriminate against individuals or provide medical advice.
What I'm going to do is take all those new image-generation AIs, and use them to train an image-generation-AIs-generation AI, this way I won't be bound by copyright anymore.
Is this going to start changing the world as much as search did? My guess: within 10 years, AI will have changed everything. It won't look like AGI, but it'll be good enough to be better than humans at most creative endeavours, including things like generating whole films and possibly games from a few carefully arranged text prompts and start images. Quite terrifying.
I wonder if it will be possible to train a neural network to do our programming tasks for us?
Reading about how it's done, it's not so clear those other things are close at hand. Most of these SD systems generate very low-res images and then use upscaling to make them high enough resolution that they don't look like crap.
Apparently the computational power required to generate them at larger sizes grows exponentially (or geometrically?), so until we find new algorithms it might be a long time before the same technique can generate whole films and games in any coherent way.
Of course maybe that breakthrough will be announced tomorrow.
In art humans always look for novelty and authenticity and not for mechanical reproduction, which quickly becomes generic. When every guy or girl on the planet could create a DeviantArt account and start drawing, did that have an impact on professional art? Not really.
We've increased total artistic output and reduced its cost by several orders of magnitude through tech, and if anything that has increased the demand for human novelty rather than reduced it. But as soon as the 'AI' label gets slapped on, people start to have weird Terminator fantasies.
IMO math > programming > everything else is the order. On proof search there's a lot of prior work. Programs are super easy and cheap to verify or simulate/generate. BTW, I bet that even if you aren't using Copilot, VS Code collects your input as training data. With programs it isn't necessary to understand natural language; showing examples of input -> output might work just as well.
Having seen some of the pictures of AI art (I don't know if they use the same or different programs or models), my opinion is that the pictures are not good enough, although sometimes they are almost good enough. (I think there might still be a use for it, despite this. Some other comments mention some possible ideas, although there may also be others.)
It's rapidly getting good enough for things like stock images for blog posts. Where you just need an image vaguely related and don't want to pay for an actual stock image.
After the novelty wears off and everyone has already seen tons of these automatically generated images, will writers still want to adorn their blogs with these generic illustrations? Or will it turn negative: "If you have nothing to say, add an AI illustration."
I could see it becoming a marker of low-brow taste where vaguely related illustrations become something akin to decorating your home with velvet paintings of Elvis.
Prompt writing is a challenging and creative endeavor like any other, and a well-written prompt/image need not be merely "vaguely related", certainly not in comparison to the stock images that are already used today.
While I agree that they usually look pretty bad (and that the resolution is too low), some of them have been on par with what I've seen in video games. Stellaris in particular comes to mind (my memory might not serve me right; I haven't played the game since release). So I think what we already have is good enough for some use cases.
SD is so popular these days. It has been at the top of GitHub trending for a long time: last hour, last day, and even last week. https://ossinsight.io/#trending-repos
Is it because it shows that AI can do things humans previously thought would never be taken over by AI?
This doesn't strictly have to do with Stable Diffusion, but I really, really hope that someone eventually creates a service that hosts live interfaces for all of the legacy image generation models. The turnaround on these things is wild, and I worry that unique image generation techniques will become antiquated and forgotten.
In the case of SD, you can download every version yourself. The main code is in git (so you can go back to any point in its history), and the weights are versioned; I have every version saved, for example.
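Concretely, going back to any point looks something like this (a sketch; `<commit-hash>` stands for whichever commit or tag you want to pin, and the weights file would be pinned separately by its version name):

```shell
# Clone the code and pin it to a known past version.
git clone https://github.com/CompVis/stable-diffusion
cd stable-diffusion
git log --oneline           # browse the history...
git checkout <commit-hash>  # ...and check out any past version
```
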
Sure. It is probably hard to parse because I do not have solid knowledge regarding image generation AI stuff.
To restate my previous sentiment, I am quite attached to image generation with GPT-2, which is now legacy. Even though GPT-2 is not as advanced in many ways as more modern alternatives, I am artistically attached to its shortcomings/artifacts. I really hope older image generation methods like GPT-2 remain available as a service to the general public.
I've been trying to get it to draw a picture of a man trapped inside of a light bulb. Can anyone think of a prompt that works? It draws all sorts of freaky things featuring men and light bulbs but none with the former inside the latter.
Try "a green field with sheep on it" versus "a green field with robots on it". From a human position that feels like it should be a simple juxtaposition of objects onto a background. Clearly not so for this model.
In this instance I didn't ask for an example. I was just observing that juxtaposing objects in a sentence doesn't work at all as you might reasonably expect.
I am repeating myself from another SD post here in HN [1]
"I mean, what stops us from building a life-imitates-art system where we have speech2img, like in Westworld (the narrative creating scenes)?
I guess I hope someone reads this and will pick it up. Maybe coupled with a VR set?"
Now I really do hope for this :) Imagine giving this power to kids (they cannot write, but they can talk!)
I'm not really sure I know how to use these tools. I tried the following prompt:
"elon musk giving donald trump a massage using pizza sauce instead of oil, in a majestic room filled with flowers and golden toilets"
...and the result was a crappy AI-generated picture of not-quite donald trump holding a terrible rendering of a pizza, and some hands sticking out of random places. There were some red flowers in the picture at least.
I realize this isn't a "serious" use case, but clearly the tech isn't doing what I'm hoping it does.
I tried "coffee beans with cartoon mouths" and it's just a picture of some coffee beans. I don't get it.
Yeah, I'm getting the same. Pick something much simpler, like:
"A cup on a plate".
And then replace cup for other objects. It generates nonsense pretty quickly. It seems most able to generate stuff close to something which already exists in a complete form. Sort of a pastiche machine.
That's the absence of reason sticking out: the "AI" mimics, but doesn't understand, so its creations are Frankenstein-like bodies, almost human and for that reason more repulsive. So this "AI" resembles an insane painter.
> [...] It takes a while to get a feel for what prompts work and how to phrase things. I suggest browsing https://lexica.art/ and using some prompts from there as a template.
I misspoke. I meant to say photorealistic faces. I haven't seen a good one yet from DALL-E 2 or Stable Diffusion. The systems that are custom built just for faces do a good job at photorealistic faces, though.
Not sure what you mean by "standards/frameworks". It's an art rather than a science. Some of the tips passed around smack somewhat of cargo-culting, some genuinely make a difference. Everyone has their own approach and some techniques work for one topic but not others.
> the model can be seen as a direct threat to their livelihoods.
When is somebody going to point out that this isn't Stable Diffusion’s fault, but capitalism’s? Why are we willing to stifle innovation for this instead of first making sure that everybody can live and create?
I am a contrarian by nature. Wearing my investor hat, I continue to be unimpressed with AI tools, including the latest image generation enhancements. I do find stable diffusion interesting, but I don't see the disruption. Similarly, I don't see the disruption from Github Copilot considering I run a company full of highly paid and extremely skilled developers and not one of them uses copilot.
How often have you wanted to hire a conceptual artist? I have never done so in my life. While I have visited many art museums and appreciated them, this has not affected my commercial endeavors.
I think if I was a creative writer or professional artist, the current generation of AI tools could be useful for inspiration. But the bar for quality is just so, so high in the creative industries, I am not concerned that AI will replace them.
It's September 2022; where is my self-driving car?
I think you're viewing this the wrong way entirely. Don't view AI as something unique; it's just a class of algorithms. View it the same way as the anti-lock brakes on your car. Did anti-lock braking systems entirely remove your need to operate the brakes? No. But their net effect was to massively reduce the number and severity of crashes.
All that these algorithms are going to do is slightly bump up the productivity of some people. But that's amazing. Productivity is basically the free money tool. If people are more productive, they get richer for doing less, and the single best thing we can do to improve people's lives is make them richer. Poor people? Make them richer and they'll be happier and healthier. Rich people? Make them richer and tax the hell out of them, make the world better for everyone. Productivity is the best tool we have for making some people richer without making others poorer.
I’m going to partially agree and partially disagree with you.
Agree that Stable Diffusion seems a bit over hyped. It is very cool! I am definitely going to play with it. I’m not sure really how much it’s going to change the world - there will be some neat apps and it may put a certain class of concept artist out of business (note I am not saying it will eliminate them - there will always always always be human artists creating wonderful art. But as a business model / way of earning a living, that might change).
I am going to disagree with you on GitHub Copilot. It has radically changed the way I do software engineering on my personal projects and it has increased the efficiency of the engineers at my company by at least 10%, maybe more. You should check it out. It eliminates an entire class of puzzle solving, and the least efficient kind (“how do I drop a column by condition in a pandas dataframe again?”). Simple answer via Google, but maybe 3-5 minutes of reading. 10 seconds via Copilot.
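For what it's worth, that particular pandas puzzle does have a short answer once you know it. Here's a minimal sketch (with made-up data) of one common reading of "drop a column by condition": dropping columns that are entirely missing.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [None, None], "c": [3.0, 4.0]})

# Drop every column whose values are all missing.
cleaned = df.dropna(axis=1, how="all")
print(list(cleaned.columns))  # ['a', 'c']
```

The point stands either way: recalling `axis=1, how="all"` from memory, a Google search, or a Copilot completion are three very different amounts of friction.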
I can think of two good businesses you could build with SD right now. Neither of them are text to image models or necessarily "art", both involve using it to visualize other kinds of sensing by transforming them to image embeddings.
Moravec's paradox is really an observation about the state of robotics and AI in the 1980s. I don't think it holds true, and low-level stuff absolutely does not cost more than high-level stuff. What holds full self-driving back is the vast number of rare edge cases that can't be dealt with by current methods (largely because the system is only trained on available data and doesn't understand first principles).
> I think if I was a creative writer or professional artist, the current generation of AI tools could be useful for inspiration. But the bar for quality is just so, so high in the creative industries, I am not concerned that AI will replace them.
I don't think AI will replace professional artists but rather supplement them. The creation of Photoshop, Wacom tablets, and 3D modelling and animation tools didn't put professional artists out of their jobs; it gave them new tools to create even better work in less time, and I think that's what's happening here too.
What effect will it have on kids? Every kid can now turn a stick-figure drawing into a painting, or ask Siri to "draw me a picture of Trogdor holding up a sword", etc. I can imagine that having some kind of large influence on creativity (not sure whether negative or positive).
> we grabbed the data for over 12 million images used to train Stable Diffusion, and used his Datasette project to make a data browser for you to explore and search it yourself.
> Read on to learn about how this dataset was collected, the websites it most frequently pulled images from, and the artists, famous faces, and fictional characters most frequently found in the data.
LAION collected the images used to train Stable Diffusion:
> All of LAION’s image datasets are built off of Common Crawl, a nonprofit that scrapes billions of webpages monthly and releases them as massive datasets. LAION collected all HTML image tags that had alt-text attributes, classified the resulting 5 billion image-text pairs based on their language, and then filtered the results into separate datasets using their resolution, a predicted likelihood of having a watermark, and their predicted “aesthetic” score (i.e. subjective visual quality).
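The first collection step described there can be sketched with nothing but the standard library. This is a toy illustration of the idea, not LAION's actual code; the language, resolution, watermark, and aesthetic filtering stages are not shown.

```python
from html.parser import HTMLParser

class AltTextCollector(HTMLParser):
    """Collect (image URL, alt-text) pairs from raw HTML."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            # Keep only images that have both a source and a caption.
            if attrs.get("src") and attrs.get("alt"):
                self.pairs.append((attrs["src"], attrs["alt"]))

html = '<p><img src="cat.jpg" alt="a cat"><img src="x.png"></p>'
collector = AltTextCollector()
collector.feed(html)
print(collector.pairs)  # [('cat.jpg', 'a cat')]
```

Run over billions of crawled pages, this simple "src plus alt" pairing is where the caption text that trains the model actually comes from.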
Probably because:
1. You don't make any actual point that could lead to a constructive discussion.
2. What you are saying is not true: this model was developed at a university in Germany and sponsored by a private company. The only public founder I could find is this one: https://www.crunchbase.com/person/emad-mostaque
Perhaps you should reevaluate your own biases as well.
You are exaggerating. HN is fairly broad in its politics, and there are often quite challenging discussions around topics related to diversity. While there is a tendency towards "Silicon Valley neo-rationalist" views and other voices are sometimes in the minority, using a term like "suppressed" comes across as slightly overdramatic.
> you using a term like "suppressed" comes across as slightly overdramatic.
I don't think so, because the person I was responding to was flagged. Abusing the flagging functionality to silence difficult opinions about race is not uncommon on this website.
When witnessing a huge leap forward in human technological achievement, doesn't it feel a little strange to respond with "looks like crap"?
Even on your own terms, you're engaging in hyperbole. I've looked through thousands of these images, and the harshest verdict I can come up with is: assuming the result isn't obviously broken (e.g. some generated anatomy), the results are sometimes a little bit uncanny or "off".
It seems like AI image generation is close, but still slightly "off" as you say. To me it remains to be seen whether that's actually a huge achievement, or whether there's more work to be done to achieve that last 10% than has been done so far. The poster you're replying to used gaming as an example, and you only have to look there to see how graphical improvements in the last 10 years have diminishing returns versus earlier improvements.
I could be wrong but this seems like an area where our progress will look like an S-curve. Getting that last 10% could be the real achievement.
Stable Diffusion is above all a waste of resources! Nearly a billion parameters trained, using I don't know how many GPU hours, all to make artistic images better than an amateur's and on par with a professional's. Art is nice, but not important for survival, unlike smartly using the planet's resources. Do you want to see a viable future, or "the rest of the Mona Lisa"?
That said, holy crap. This tech is insane.