Doesn't make sense to me why OpenAI has kept DALL-E closed source for so long. I can only guess either safety from misuse or leveraging it for money. At this rate though, Stable Diffusion is going to dwarf it
I still don't understand what this would mean. Where are all of the terrible things that were supposed to happen, now that Stable Diffusion is available?
We've been able to create completely photorealistic fiction for decades now. See any movie with CGI for an example of whole worlds, and people, that don't exist. The bar has gradually been lowering (see the amazing CGI that YouTubers do these days), and now maybe there is a bit of a step function down, but being able to make things that aren't real isn't remotely new. I don't understand the fear.
While I think the fear might be exaggerated, being able to make realistic fake content with such ease means it's harder to know what's true and what's not. Plus, this has been OpenAI's claim from the beginning. It's possible the true objective is to keep it private to leverage for money, and this is just their excuse.
Indeed, this talk OpenAI does is basically security through obscurity, and it's holding us back. Look at how often people make noise with screenshots of tweets or emails that never happened. You don't need photorealism or fancy machine learning for that, and it creates a lot of problems! If they weren't pretending that all we need is to put some yellow tape around machine learning, maybe there would be some interest in solving this type of stuff properly. But you don't need "AI" for that. You just need public awareness and some basic, pre-existing cryptography knowledge.
Well sure, I think that's dangerous too. I think more people should be skeptical of the images and content they consume in addition to it being a problem that truth is harder to discern.
Like every new technology that came before, it’s a tool that will be used for good and for nefarious purposes. Some tools are powertools. There will be new benefits, but be prepared for a new raft of modern problems.
We are already living at a Gatling-gun level of content, from any individual's perspective looking into the web.
We are way past that. If anything, trivialising and democratising the generation of fake content will educate the masses. Perhaps it will even bring back trust in science, which we badly need.
My grandmother once emailed my family frantically after she saw a picture of the Abraham Lincoln statue defaced with graffiti. Obviously that was a Photoshop, and in this case even a bad one, but clearly fake images and content make it harder to discern truth.
> I still don't understand what this would mean. Where are all of the terrible things that were supposed to happen, now that Stable Diffusion is available?
Mainly people making porn (e.g. stuff like deepnudes). It seems like a lot of work has gone into preventing that (e.g. filtering porn out of training data, having porn-detection models to block porny output). There's also been a lot of talk about political fakes, etc., but I'm not sure how likely that is to actually happen at this point. I think one of the "selling points" of limiting access to DALL-E was that they could revoke access from people who they deemed to be misusing it.
Stable Diffusion allows anyone to make kiddie porn with a half-second of curiosity/effort. Maybe you didn't know about that, maybe you think it's NBD, but in any case, that is the tire fire which aspiring AI majors want to avoid.
Actually, not everyone can draw everything, especially the closer you get to photorealism. Also, drawing by humans doesn't parallelise very well. With an AI you can get an unending stream of pictures.
It's been a week; there is going to be an explosion of believable fake items used to lure people into even more unbelievable conspiracy theories than currently exist. Your average conspiracy nut didn't have the skills or know-how before, but they sure do now.
Also you’re probably not seeing all the pedo content that people are already generating for themselves.
I think that's a good thing. A slow creep is dangerous, because people may not notice, and would be tricked. An explosion of insanity makes everyone realize they can't believe everything they see on the internet.
It might end up having the opposite effect. When all sides of an argument can bomb each other with AI generated content it could poison the well and make people less likely to believe things they see.
> Doesn't make sense to me why OpenAI has kept DALL-E closed source for so long. I can only guess either safety from misuse
Paternalistic moralizing as a method to discriminate who gets access to models. Everyone else gets these cloud-service table scraps. That's why Stable Diffusion is so awesome -- YOU have the model!
It's sort of both. OpenAI, being an outgrowth of the AI doomerist community, does have a bunch of people who really do think the technology is too dangerous to be given to the masses. This happens to mesh perfectly with the other group of people at OpenAI who want to make tons of revenue. It's a harmonious alignment for everyone! Except, y'know, us.
Content creators, like artists, also happen to hate filters. They do not want San Francisco VC-culture-induced political correctness imposed on their work. This helps Stable Diffusion gain popularity quickly.
Elites for democracy! With their elite studies and abilities they will democratise AI by teaching it what is right and wrong. They already know better than regular people and AI.
And being so open they first lock the model up and charge a fee, so anyone can pay. Just spreading democracy through paid API calls. /s
I was a bit mean, they did kick the field in the butt and pushed us ahead even with all the stubbornness and secrecy. But now they are just holding us back.
That's the thing, once the cat is out of the bag, it's out. Once someone develops AGI, it now exists. You can choose to either share it, or sell it.
You might think that the nuclear bomb is a good analogy to use here, but it is not, because once the field has advanced to the point in which one group can develop AGI, it is now possible for other groups to develop it with relative ease, unless you actively take over the world first and deny those other groups the compute resources necessary to train/run AGI.
The point is, once these algorithms are upon us, you must be willing to accept what impacts they will have, even if it destroys entire industries. The alternative being that you destroy the industry slowly rather than quickly, while simultaneously widening the gap between the elites and everyone else.
The mistake is thinking that people can't adapt to the times, which is only true if you are actively holding them back.
If someone developed AGI today, the best thing to do would be to instantly throw up a torrent of it and spread it as fast as possible, because if a sole entity is able to get it first and kick the ladder away, we are most likely screwed.
I've never understood what exactly the "open" in OpenAI is supposed to imply. They produce proprietary, gated models - not open in any meaningful sense.
OpenAI isn't open at all, it's just named that way to attract attention, like the bright green "FREE BIKES and rentals" place near Fisherman's Wharf in SF
I don't follow this stuff very closely - is there any open-source model for text generation that outclasses GPT-3? Stable Diffusion has been released for barely a week and already seems like the clear winner. It doesn't seem like any of the open (actually open) text models have made as much of a splash.
Of course maybe it's just because text is less visually impressive than images.
There are some open models as good as initial GPT-3 (which wasn't hard), but whatever they did to create InstructGPT hasn't been reproduced as far as I know, and it's the first one to really seem magical.
They’re just harder to run on your own resources, since large language models are very large. BLOOM was released a month ago, is likely better than GPT-3 in quality, and requires 8 A100s for inference, which pretty much no one has on their desk.
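For anyone curious what "just run it yourself" would even look like, here's a rough, untested sketch with the Hugging Face transformers/accelerate stack; the model ID is real, but the rest is an assumed outline that still presumes you somehow have hundreds of GB of GPU memory to shard across:

  # pip install transformers accelerate
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")

  # device_map="auto" lets accelerate shard the ~176B parameters across
  # whatever GPUs (plus CPU/disk offload) are available; in bfloat16 the
  # weights alone are still on the order of 350 GB.
  model = AutoModelForCausalLM.from_pretrained(
      "bigscience/bloom",
      device_map="auto",
      torch_dtype=torch.bfloat16,
  )

  inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda:0")
  output = model.generate(**inputs, max_new_tokens=20)
  print(tokenizer.decode(output[0]))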
GPT-3 was fine-tuned after release to be better at following instructions. I don’t think that’s been done for BLOOM.
BLOOM incorporates some new ideas like ALiBi which might make it better in a more general sense. They haven’t released official evaluation numbers yet though so we’ll have to see.
My headcanon is they realized this stuff might be the essence of consciousness itself and wanted to shelter it in a persistent storage medium where it could grow and learn safely instead of releasing it to the wild to be booted up and destroyed by every yokel with a gpu
It was a long gap between DALL-E 1 and 2, a whole year. In that time they just sat on it, didn't release anything. Such a bummer. My theory is that they wanted to hype everyone up even more for the grand commercial release.
The funny thing is that people didn't stand still: they invented diffusion and other CLIP-guided image synthesis methods, and DALL-E 2 copied the approach, completely changing from the first architecture.
Their arrogance is that they think they can ride the dragon. They want to be the ones to discover, advance it, and control it. But everyone else doesn't have time for that shit.
The fact that any of this stuff works is so surprising that none of it could possibly have been planned ahead on anyone's part. Stable Diffusion has some real novel research of its own in there; the methods OpenAI and Google used couldn't produce a model that trains or runs as fast as SD.
I think they wanted to keep public attention on this technology “positive” for as long as possible by curating the art that becomes public. Long enough to position themselves profitably. Eventually this technology will serve as a window into the minds of teenage boy gamers and 4channers and the general public will be disgusted.
Check out the Photoshop inpainting extension that also uses Stable Diffusion. I figured this Krita extension was just using a lower-quality setting or something, since the Stable Diffusion Photoshop results looked much better. Maybe it comes down to the amount of VRAM in the machine the demos were recorded on, or something.
That doesn't appear to be the same thing. That plugin looks like it's for generating AI images directly in Photoshop, rather than taking a provided image and automatically generating the area outside it.
sama is very active in YC and makes the call on OpenAI product roadmap. Furthermore YC encourages good CEO-community relations. The fact that OpenAI is so far behind Stable Diffusion and has reduced pricing shows that sama wants OpenAI to be a highly profitable enterprise company. I.e. not “Open.” You can do both (e.g. Cloudera) but clearly sama is not strong enough at AI to make this happen.
> sama is very active in YC and makes the call on OpenAI product roadmap. Furthermore YC encourages good CEO-community relations
Sam hasn't been at YC in years and (based on anything I've seen) isn't active in YC at all. As for "YC encourages good CEO-community relations", I have no idea what that means* but it has nothing to do with HN. We encourage good content-community relations and that's it.
You have a long history of posting dark insinuations about YC/HN, not to mention nagging the mods about how bad we are and how much better you yourself have done the job in the past. I mostly let the latter go, but when you start with the ethical insinuations, that gets my dander up. It's time you stopped smearing people's reputations on HN. If you have evidence of wrongdoing, post it—I'm sure the community will be extremely interested. If you have no evidence, please stop from now on.
(Edit: I realize it probably sounds like I'm over-reacting to the parent comment, but this has been a longstanding pattern. We can cut people slack for years, but not infinitely.)
OpenAI stuff and Stable Diffusion stuff (and DeepMind stuff for that matter) are all popular on HN because the community is super interested—that's literally it. We're not pulling strings or playing favorites (we don't even have favorites in that horserace, at least I don't). As a matter of fact, the last thing I did before randomly running across your comment was downweight the current thread because of the complaints at https://news.ycombinator.com/item?id=32665587.
* unless you mean that we advise founders about how to write content that actually interests the community—that we do, and not only YC founders but non-YC founders, open source programmers, bloggers, and anyone else. That's all a consequence of wanting HN to have good content and seeking to avoid the boring stuff. By the way, I'm working on an essay about how to write good for HN and avoid boring stuff too; if anyone would like to read it, email me at hn@ycombinator.com and I'll send you a copy.
I kinda feel like they chose the name "Open"AI when they started back in 2015 because Musk etc. wanted exactly the kind of thing Stability AI is now creating, i.e. something other than a corporation like Google having primary access to these models, and it being more democratized. But as time has gone by they've strayed from that vision, and changing the name would be a PR nightmare.
This is actually an interesting illustration of how infectious a name can be.
If they hadn't called it OpenAI, the conversion from open to corporate would be much easier, whereas now it's a significant pain point in OpenAI's reputation. It's kinda nice to see the original vision still having somewhat of an effect despite no one being left to propagate it.
You don't have to drive SD with a text prompt; if you take the model weights you can feed anything you want in there and generate an image out of it.
Since DALLEmini and DALLE2 are more "creative" (since they use a better text transformer) you can use them to generate the input and SD to refine it for more fine detail.
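For illustration, this kind of refinement is only a few lines if you drive SD through the Hugging Face diffusers library; a rough sketch, where the file path is a placeholder and older diffusers versions call the image argument init_image:

  # pip install diffusers transformers accelerate
  import torch
  from PIL import Image
  from diffusers import StableDiffusionImg2ImgPipeline

  pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
      "CompVis/stable-diffusion-v1-4",
      torch_dtype=torch.float16,
  ).to("cuda")

  # Placeholder input: e.g. a low-res DALL-E mini result or a rough sketch.
  init = Image.open("dalle_mini_output.png").convert("RGB").resize((512, 512))

  refined = pipe(
      prompt="the same scene, highly detailed, sharp focus",
      image=init,           # called init_image in older diffusers releases
      strength=0.6,         # how far SD may drift from the input
      guidance_scale=7.5,
  ).images[0]
  refined.save("refined.png")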
So... are we done politely coughing and looking out the window at the idea that the gatekeeping was motivated by altruism so that we can move on and just use this much better innovation model going forward?
Various (subjectively judged) SOTAs on at least some subset of at least this family of tasks is changing somewhere between daily and hourly right now. I've been watching this stuff closely since fairly early ImageNet days and I've never seen a Cambrian explosion of "how the hell did that do that?" events at anything like this cadence.
If you're talking about scientific gatekeeping like pre-publication peer review, the gatekeeping was never really the root problem.
If researchers have great products or findings to show off, it's easier than ever to simply publish them somewhere online and let impressed audiences spread the word. Gatekeepers have been irrelevant to truly great science for a long time.
It's mediocre science that needs gatekeepers to distinguish it from not-even-mediocre, truly substandard research.
The gatekeeping GP is referring to is OpenAI keeping DALL-E 2 behind an invite-only API, with weights unpublished, vs. Stable Diffusion publishing the whole model for anyone to download and use.
Feels like a race to the bottom. More features, lower cost, every week. No idea where it'll level out, but I like it. Just bought some more Dalle credits today because it's so much fun. This is a revolution in 'art technology'; it's like Steve Jobs's bicycle for the mind. Best I could do a month ago was a stick figure in MS Paint, but now..
Stable Diffusion is arguably better, has more features, and is free. OpenAI can't compete with free.
Even if you don't want to take the 30 seconds to set it up in a free Google Colab environment, the paid DreamStudio version is still half the price of Dalle.
I find Stable Diffusion better overall, but it has downsides. Stable Diffusion tends to be more creative than DALL-E, but does a lousy job of following directions, especially complex ones. DALL-E is good if I know what I want specifically.
I can think of ways to fix Stable Diffusion since it's open-source. I think I could bridge the gaps as I see them in about a weekend of hacking. I'm not sure when I'll get that weekend.
(Footnote: What I want to do is not something I can explain without a technical blog-post-length document or a zoom call; it's about the same level of complexity as the other major SD hacks we've seen)
Raising the cfg ("classifier-free guidance") scale is essential for following the prompt, but if you raise it too high the image gets weird and saturated.
According to Google's Imagen paper this is literally because the pixels get multiplied by the cfg scale and start clipping; they have a technique called dynamic thresholding that replaces it. Not sure if SD uses this, but I saw Emad hinting they were training an Imagen model…
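To make that concrete, here's a toy PyTorch sketch of classifier-free guidance plus Imagen-style dynamic thresholding; it's a standalone illustration of the idea, not code from SD or Imagen:

  import torch

  def cfg(noise_uncond, noise_cond, guidance_scale=7.5):
      # Classifier-free guidance: extrapolate from the unconditional prediction
      # toward the text-conditioned one. Larger scales follow the prompt more
      # closely but push values far outside the training range.
      return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

  def dynamic_threshold(x0_pred, percentile=0.995):
      # Imagen-style dynamic thresholding: instead of clipping the predicted
      # clean image to a fixed [-1, 1], clip to a per-sample percentile of its
      # absolute values and rescale, which counteracts the over-saturated look
      # at high guidance scales. Assumes x0_pred has shape (B, C, H, W).
      s = torch.quantile(x0_pred.abs().flatten(1), percentile, dim=1)
      s = s.clamp(min=1.0).view(-1, 1, 1, 1)
      return torch.maximum(torch.minimum(x0_pred, s), -s) / s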
I don't think Stable Diffusion is technologically better yet.
Sure, both SD and Midjourney produce absolutely beautiful artworks most of the time. But if you want something specific and out of the ordinary it takes a lot of attempts and promptcrafting (and sometimes you are unable to accomplish what you want at all).
However, my experience is that these prompts (which SD/MJ struggles with) often produce good results in Dalle2 even on the first try.
Of course, OpenAI has a very limiting content policy. But if I have something very specific in mind and it passes their rules, I currently choose Dalle-2, even though I've spent much more time with SD.
Note that Dalle2 uses CLIP guidance while SD doesn't; it's a feature that will be added soon and can already be added by users if they want, although I'm not sure how easy it is. MidJourney has already shown they've implemented it in their beta using Stable Diffusion, and it makes the results 200% better, trust me
After many months of waiting I got my invite, and I entered the prompt that is, for some reason, my greatest fear: "a red eyed hairy spider with human hands as feet". I got a warning about violating policy/harmful content or something. Not only that, the results I got were super underwhelming; after playing with it for half an hour I haven't looked back. Now I'm playing with SD and an upscaler, and there is no limit to what I can create. Also, I always found the company name hilarious: "Open"AI.
There's still a NSFW filter (at least, in the version I used). I'm sure it's easy enough to disable if you poke around but it's not exposed as a function parameter out-of-the-box.
I didn't investigate any further because I'm not actively trying to generate porn. It was pretty annoying to have my results blanked out because they were (apparently) NSFW - so either the filter was triggering a lot of false positives or the model was generating NSFW content for non-NSFW prompts.
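For what it's worth, in the Hugging Face diffusers pipeline the filter is just a pluggable component; a hedged sketch (assuming a reasonably recent diffusers release) of loading the open weights without it, which really only makes sense if you're fighting false positives on clearly safe prompts:

  import torch
  from diffusers import StableDiffusionPipeline

  # Passing safety_checker=None skips the NSFW post-filter entirely, trading
  # blacked-out false positives for no filtering at all; use responsibly.
  pipe = StableDiffusionPipeline.from_pretrained(
      "CompVis/stable-diffusion-v1-4",
      safety_checker=None,
      torch_dtype=torch.float16,
  ).to("cuda")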
Seconded. I got awesome results making "artwork in the style of Yoshitaka Amano" in DALLE but horrible ones in Stable Diffusion. Maybe the prompt was incorrect there (it would be great if these were more discoverable), but the art in SD was lacking.
There was a good example somewhere I can’t find, but of a really complex prompt that Dalle could understand, but SD couldn’t. Maybe some of the GPT-3 is being leveraged for parsing.
Anyways I think it's way too early to start taking sides. I enjoy using all these systems.
One of SD's big limitations (from what I had read about it) is positional prompts. DALL-E seems to understand "X on top of Y", but Stable Diffusion does not.
Running this at home is only free like mining cryptocurrency is free if you didn't buy your computer and don't pay for the electricity. Plus you can only run it on the computer that has the good graphics card, which probably isn't your laptop.
I expect most people aren't going to be generating images all day, so using a cloud-based service for occasional use will still make a lot of sense.
Stable Diffusion offers a paid service to do this too, and there's nothing wrong with that business model. Prices will probably come down, though.
Not sure if GP had this in mind, but SD is (more) free in terms of liberty. So yes, you pay with electricity and hardware, but you control the process yourself, which is invaluable. DALL-E could change or go offline at any time.
Considering the threat from DALL-E going offline, it seems quite acceptable. These aren't precious photos since it's all made up anyway, you can download any pictures you make, and you probably already did for the ones you care about.
I'd worry more about, say, keeping your photos on Google and losing your account somehow.
It's not only the threat of going offline. DALL-E makes it extremely difficult to generate many ideas because of its absurd content blocker - for example, I had something like "ominous, foreboding landscape beneath a black sun" blocked because (from what I could tell) it has words with negative connotations and the word "black" in the same sentence. It does this all the time, their discord is full of examples.
That's pretty good, but with that level of latency, I can still see people paying to use an online service that's faster. Maybe they'll speed it up more, though?
Race to the bottom implies that they're only competing on price. Here, they're competing on new functionality as well. If DALL-E's outputs were substantially better than Stable Diffusion, more people would use it, even if it cost more.
It often feels like words are losing their meaning, with everyone misusing terms they don't fully understand.
I don't want to be a doomer and have surely unknowingly misused terms as well, but it's definitely noticeable how these originally clearly defined terms are getting used in entirely new ways.
And it's not just with technical terms like this; it also applies to originally obvious terms such as racism, sexism, etc., which have lost their original meaning entirely.
I can understand the criticism about technical terms (they work better if stable and precise), but regarding the rest: that's just how language works. You can't (and shouldn't) expect words to keep their original meaning forever.
For example, the word "term" comes from the original Latin "terminus", which means "end" or "boundary". It only got the meaning you used it for centuries after it was first used in English. See: https://www.etymonline.com/word/term
Oh, it wasn't my intention to criticize anything or anyone in particular with that comment.
I was just pondering that our originally clearly defined terms are rapidly getting used in very confusing manner, which increases the difficulty of a discussion, as participants interpret words very differently.
I don't think that people look up the actual definition of terms in a dictionary anymore. They hear a term in some context and create their own personal definition. It wasn't as obvious before the internet, I think, but nowadays everyone is bombarded with technical terms all the time, which likely contributes massively to this increasingly fluid terminology.
There is generally a negative connotation to race to the bottom. The Investopedia definition captures this:
The race to the bottom refers to a competitive situation where a company, state, or nation attempts to undercut the competition's prices by sacrificing quality standards or worker safety (often defying regulation), or reducing labor costs.
Thanks. Yes this is what I'm trying to say - a race to the bottom is about companies seeing how crap of a service they can give you that you're still willing (or have little choice) to pay for.
This appears to be the exact opposite: a race to provide more services and more features for a lower price, based on optimising and/or lower profit margins. AKA capitalism actually working for a change.
Prices would have gone up if SD weren't open source. Look at the new Google Colab Pro limitations and you have indications that they're loving this new wave and milking it properly. I just ordered a GPU to run it locally.
I feel like you aren't using the phrase "race to the bottom" correctly here. Generally a race to the bottom implies some kind of detrimental outcome for the world as a result of people failing to internalize externalities generated by a business.
It has to do with commoditization and decreasing costs. Taking something technologically sophisticated and having it become open source and accessible so quickly is going from the top of the pyramid - big companies gate keeping betas, to the bottom - the public, available to everyone, cheaply. These companies are desperately trying to monetize this technology, but the value in terms of what people will pay is falling fast. It might not be a sustainable business model for OpenAI or anyone else for very long. Hence the race to the bottom - quickly make a buck before you can’t.
I share your enthusiasm for this development but curious what you mean by race to the bottom?
There does seem to be a lot of vague angst about how this will affect the nascent "Prompt Engineer" career track, but I hope most are comfortable letting the open innovation play out a bit before trying to personally monetize it..
In this context it's a good race. This software seems to have caught fire and tons of people are playing with it and providing tons of crazy new tools for cheap or free.
>Best I could do a month ago was a stick figure in MS Paint
You're forgetting (or not knowing?) NVIDIA Canvas, which came out a year ago, give or take. It literally turns stick-figure-complexity drawings into photorealistic stuff.
> Best I could do a month ago was a stick figure in MS Paint
That is still the best you can do... which happens to be about the best I can do! Just like my introduction to the computer at a young age has atrophied my handwriting quality.
I guess if we're going to get into semantics and the definition of self (where does the 'I' end and something else begin?), then I don't really do anything. You could also say I can't walk without the ground, either.
I was just being needlessly pedantic. I guess it's a spectrum from "I painted the Mona Lisa" to "I pushed a button and a Mona Lisa appeared". That's a very individualistic view too, though; maybe the thousands of programmers that committed to the thread of history that arrived at you pressing the button are part of the art performance.
It does feel like art's disruptive "Calculator moment" is happening where you can now leave a lot of basic/mechanical tasks to a tool and give more focus to higher-minded problems.
It's going to get so cool and interesting, I think.
A lot of the conversation around art may focus more on composition and objectives of the artist in the new prompt engineering world, with less bias from factors such as rendition quality etc. creeping in since it's so incidental.
New forms of art will emerge and/or gain popularity that focus on trying things the tools aren't good at yet. The human artist of the gaps. The niches will constantly be shifting.
I wonder if we'll learn to recognize the output of certain popular models and perceive them as instruments. "Made by xy on z" instead of "xz on guitar", so to speak. I remember the 90s/early 00s internet when it was always easy to tell when something had been done on Flash, just because of its line anti-aliasing rendition style being so distinct and familiar.
The novelty will wear off, and we'll all start to feel a bit disappointed that the average human's imagination is pretty limited and novel/original ideas remain somewhat rare as the patterns and tropes in all the generated art emerge. It's great you can put the space needle where you want it and get a good-looking city and space ship, but how many variations of a cyberpunky skyline with a space ship do you need? And then we'll celebrate the novel stuff that does happen, as always. I suppose the tropes will evolve faster as the throughput goes up.
>basic/mechanical tasks to a tool and give more focus to higher-minded problems.
>rendition quality... [is] so incidental.
There's this thing in painting called 'mark making' and it can be the difference between an all-time-great painting and a throwaway portrait. Mark making speaks to every momentary choice of physical process a painter employs and reveals their thought process. For some of the greatest painters, it reveals their genius.
Do not discount execution. Overlooking "basics" and "mechanics" is what results in disappointing work.
It's a fair point, and thanks for teaching me a new term!
There's a lovely documentary called "Tim's Vermeer" about a hobby side project by Tim Jenison - one of the founders of NewTek, the people behind Video Toaster and LightWave, incidentally both tools that made hard visual-art tasks accessible to wider audiences - to prove that Vermeer used sophisticated optical tools to capture and copy his scenes from physical sets, rather than, e.g., paint his famous grasp of lighting purely from his own mind. He builds such tools himself and then proceeds to successfully create his own Vermeer-alike painting, despite possessing very little artistic skill himself.
It's full of good ruminations (and good at sparking more) on tools-vs-artistry but also execution-vs-method, and whether designing and adopting innovative tools and the tedious process to use them made Vermeer less of a genius, or just a genius of a different kind than otherwise presumed.
It's very accessible and doesn't require knowing anything in particular from the art world.
Tim's Vermeer is kind of bad in my opinion. A lot of the musings border on misinformation. If you're not a painter, it sounds great, but if you have some training, it's a very frustrating doc. The resulting painting is neat, but it was immediately obvious (at least to my eyes) how different his result was from Vermeer's.
Hockney, one of the featured 'expert painters' is a hack who doesn't actually know how to paint* and therefore claims that certain gradations are certainly impossible without some sort of additional lens device. Meanwhile there are 19 year olds at the Grand Central Atelier pulling off just that.
I own a camera lucida. It got in the way more than it helped. At best, it's a novelty, now collecting lots of dust. Vermeer probably had one but it's altogether way more likely that he just had a well-trained observational/representational faculty. There are some killer painters using cameras now (Will St. John for one) but they typically have a decade or more of very rigorous direct observation to lean on.
I suggest following Ramon Alex Hurtado. IMO he's one of the more exciting young scholars on historical representational painting techniques. I don't think he has written anything himself yet, but he does do workshops and has a big informational update for his website coming.
*the definition of painting is now so broad it is meaningless. Here, I mean "attempting some degree of visual accuracy" which can be achieved in endless creative ways. Compare (easily on instagram):
- Colleen Barry
- Peder Mørk Mønsted
- Cecelia Beaux
- Jas Knight
- Felicia Forte
- Eric Johnson
- Ksenya Istomena
- Sergei Danchev
- Glenn Dean
- Blair Atherholt
- Jose Lopez Vegara
- Hongnian Zhang
- Hans Baluschek
These artists all have their own voice and stylistic choices. They also all represent things they see with some sincere accuracy. Look up Hockney's iPad paintings that he got lauded for. People treat them like they're the work of some misunderstood genius, but really, they're just bad paintings.
I'm sure he's a sweet old man and I'd drink tea with him. But if it weren't for his ilk I might have found proper instruction 10 years earlier in life. Modernists and postmodernists robbed generations of proper art instruction. Imagine if all the music teachers burned all the sheet music and refused (or forgot how) to teach the diatonic scale. "Hit the keys in a new way! Don't let yourself be bound by conformist ideals!"
> Do not discount execution. Overlooking "basics" and "mechanics" is what results in disappointing work.
That is what really bothered me in art lessons in high school. When discussing any famous work it was always about concepts, ideas, composition,... and execution was very much secondary. But for your own work all that is completely ignored if your coloring is just slightly uneven or lines are too rough. If you could hand in a photorealistic drawing of anything, no matter how boring, that would give you much higher marks than a rough drawing of something worthwhile.
It's not as if generative art is new. Nor has figurative painting been as relevant since the invention of the camera. A basic burger joint in a Gerhard Richter kind of style transfer is very much derivative. This isn't bad in view of the classics, but it's more like art-work to me.
The true artists in this one are the coders, no doubt (a corollary to the intelligence debate).
On the other hand, you mention an important point with layout, but you underestimate the progress these days. Surely there are companies working on automated design beyond CAD (computer-aided design), e.g. for specialized antennas.
> we'll all start to feel a bit disappointed that the average human's imagination is pretty limited and novel/original ideas remain somewhat rare as the patterns and tropes in all the generated art emerge
Well, one might argue that Richter's most highly priced piece looks a little like prehistoric art of the pleistocene. It's a little vain to mention it, because I can much better relate to the more basic form, of course. A more frequently sore point would be the pop music industry between professionals and the amateurish.
Anyway, this may be thinking too big. For the time being, the bunch of techniques is better understood as a toolbox, because it will be a long time before it trumps demo-scene productions, for instance. Here it is the technique that counts more often than not. The rest is an acquired taste.
So DALL-E is already old news, and even with this announcement the Stable Diffusion ecosystem is once again ahead.
Quite funny to see OpenAI panicking and falling on their own sword, as they were supposed to be 'Open' in the first place and are now being disrupted by open source.
Couldn't happen to a more deserving group of people. Good riddance. Squatting the name "open" and trying to reap the benefits therein while being anything but.
The parent may be non-profit, but OpenAI LP accepts investments and delivers returns to investors like any other regular company. The only difference is that they 'cap' the returns. However, the cap is negotiated with individual investors, and I haven't seen anything disclosing the cap except for the fact that in the opening round the cap was 100x the initial investment.
I'm not a business investor so I can't really tell if 100x is "a lot" or not; Apple and Amazon share prices went up by 500x in the last 20 years, Tesla 200x in the last 10. But how rare is that?
Their research is closed -- they don't release model weights, nor in most cases the training or model scripts. Certain things they release, just like any other for-profit research firm.
I don't see what it has to do with profit since this is pretty normal in academia too. Scientists will often publish papers, but not everything they do.
Open means something. It is a, for lack of a better phrase, virtue signal. When you do that but don't actually represent the virtue you are trying to signal, people will understandably get pretty upset about that.
The notion of open in software and in academia is different, that's for sure. But does that make one or the other wrong? My friends doing physics or math don't understand the hate at all when I discuss it with them. From their point of view, you don't have to pay hundreds of dollars to read it = it's open. The research is there, code is left as an exercise to the reader - that's also normal to them... And the notion of giving away free compute time is funny to them. Run your own cluster, they say.
I'm not saying I agree with them, I'm a software person and open means to me what it means to you. I'd prefer if it was truly open by our definition. But I don't think we can bash the choice of the word so easily.
Well, I can't speak for anyone else, but it makes it wrong for me (and judging by the number of upvotes I got from the original comment, a lot more than just me).
It's not just the code left to an exercise to the reader...it's the training sets. You don't get to trade on the suggestion of "open" while keeping everything closed. They aren't idiots, they knew exactly what they were implying when they picked the name.
Botany-related articles don't ship together with the discussed plants... If you want to repeat some experiments, you're looking at a hardcore mountain trek at the other side of the world. Is this too different?
What if the training sets are proprietary (as in, they don't have licence to share)? Should they keep the research to themselves just because of that? I don't think that's better than not sharing the training set. It also doesn't mean the research is invalidated - find your own pictures and it's going to work. Same as - find your own plants and it's going to work.
TBH, I just don't see the training set as part of the research... In my case, I'd feed it tons of electronic circuits to try to teach it generate some. Why should I care about some random other pictures? I care about the research and I have my own training sets.
This is just nonsense. I pay (a small amount) for both. They have different strengths and it’s fun to compare. Adding new features to a product is not a sign of panic, it’s just normal.
Dall-E so far hasn't been able to grow an ecosystem because of how restricted it is. Meanwhile Stable Diffusion makes trial-and-error and innovation around it easy, and as a result only 9 days after Stable Diffusion's release we see OpenAI release a feature that looks like a copy of a tool from the Stable Diffusion ecosystem.
I agree that Dall-E isn't obsolete. I'd also add MidJourney to that list. All three are great models in their own right with their own pronounced strengths and weaknesses. But when it comes to enabling novel workflows Stable Diffusion seems lightyears ahead of the others.
Except you are wrong. This feature was already available as part of the Dall-e ecosystem. There was a website called patch-e which facilitated this exact same workflow.
Also it's quite funny to see OpenAI (with all their researchers and engineers) get disrupted by someone (Emad) with little to no background in AI and ML, but who embraced OpenAI's original mission of making AI as open as possible.
I came to say something similar. It feels like “OpenAI” was just a trademark grab to prevent others from using it. Of course, all conspiracy theories work well when looking backward in time.
This was already possible with DALL-E using the inpainting feature going from defined image to transparent edge; this just automates what was a manual process before. Do wish the inpainting tool had more options, for example to fade a transparency in, since my understanding is it makes a difference; not to mention magic wand selection/deselection tool.
In case it is not obvious, every time a user generates an additional section of an image using the outpainting feature, it costs a credit.
Yes indeed, and it shows the advantages of Stable Diffusion's model of just releasing the model and letting people do what they want with it - this was straightforward to implement oneself.
And while OpenAI released this feature now, it's probably just a matter of days until even better features built on Stable Diffusion will be released, given how much community energy is focussed on it right now.
Only a matter of time before Adobe adds inpainting with hooks to local or API-based generative tools; using OpenAI to edit work like this is like being transported back to the past of basic image editing tools.
Temporal coherence will still take a while to solve but it's not undoable. Making things that look correct upon closer inspection rather than just looking "nice" will probably take some degree of human curation for quite a while.
Someone has been upscaling DS9 already[1]. Obviously not released anywhere.
Not sure I'd want them in 16:9; HD 4:3 like the other HD releases of TNG and TOS would do me. I understand they shot on video, so an official true HD remaster is likely to never happen.
I don't have a deep understanding of how training models work, but I wonder if training a model with every frame of TNG and then outpaint it into 16:9 would work.
Why would you do that? What an awful idea. People made those shows in the 4:3 format, you’d just be adding fluff. This is like adding more description to a book so it becomes an epic novel instead of a novella. I’d say keep to the creators intent…
I'm not the person who suggested it, but I wouldn't mind having it fill my (wide)screen when watching. That said, I understand that some film/tv uses the frame very precisely, however I'm not sure that these two particular examples do that throughout their entire episodes. (Though I bet that in Seinfeld in particular it might weaken/ruin a few visual gags.)
Still seems to me like adding fluff - it seems a bit impossible to me that the AI would add anything pertinent to the plot. It would add "stuff" like corridors and background sets and maybe someone out of focus.
Do the black bars actually bother you that much? You know there are cropped 16:9 widescreen versions of some of these shows (which I personally detest, but I work in the business of moving images).
It doesn't bother me much (if the show is good, I don't realize it after maybe 5 minutes), hence why I said "I wouldn't mind..." instead of something harder. But it'd be interesting to see what would happen if shows were expanded frame-wise like this... The 16:9 cropped widescreens you mentioned take something away, whereas expanding the frame with AI adds something, and theoretically if you don't want it you can just matte it out and still have the original. It seems like a more advanced version of Ambilights.
I think we still have a ways to go before the results would work well without looking like those nightmarish deepdream videos where things are constantly shifting, so society will have plenty of time to discuss the merits.
I'm inclined to agree, but if it were coherent I'd take it over what platforms like netflix do and chop off the tops and bottom of the content so it'll fit 16:9.
What if there is critical visual content in the half of the view that you're removing? Television is a visual medium; one could assume that a good filmmaker would be using the full viewport afforded them.
Not as terrible sometimes, since often television shows and films “protect” for different aspect ratios (yes this is a thing). So people might shoot 16:9 and protect (basically make sure you have a readable frame) for 4:3 or you can shoot Scope and protect for 16:9.
It’s not perfect but if the filmmakers thought of it it can be ok.
Damn, Dall-E really lost its competitive edge overnight when Stable Diffusion was released. They dropped their prices across the board in response, but honestly I think it still isn't enough to save them. The magic of open-source competition.
Oh shoot, I thought it was for everything. Fair enough. Although I think new competitive features like Outpainting are definitely in response to Stable Diff
The UX is evolving around AI image generation so fast, everyday is something new. There's so much greenfield exploration space for new interaction models.
6 months from now, how we interact with these models will probably look entirely different.
Comments made just 3 days ago, "well...it can't do that", are already obsolete. I've never seen innovation at this breakneck speed. We're talking a WEEK since release.
A few weeks ago I was skeptical that this technology would get past the emotional response we get from procedurally generated game environments, but I've been convinced otherwise. The emotional response I get from some of the best of these images are novel and thought provoking. Makes me wonder what percent of what makes us human is now algorithmically solved....
Not to be pedantic, but we have on the order of 100B neurons, and afaik each of them can be connected to thousands of other neurons. I assume we probably have a ways to go before we're encoding the amount of information a brain can comprehend.
My wife likes impressionists and sunflowers. "A lone sunflower in a grassy field at sunset oil painting claude monet" plus stable-diffusion and a few minutes of tweaking some settings; she now has a new desktop background.
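For anyone who hasn't tried it yet, the whole workflow is only a few lines with the Hugging Face diffusers pipeline; a rough sketch, where the checkpoint name and settings are just common defaults rather than necessarily what was used above:

  # pip install diffusers transformers accelerate
  import torch
  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained(
      "CompVis/stable-diffusion-v1-4",
      torch_dtype=torch.float16,
  ).to("cuda")

  image = pipe(
      "A lone sunflower in a grassy field at sunset, oil painting, Claude Monet",
      guidance_scale=7.5,       # the cfg scale; higher follows the prompt more literally
      num_inference_steps=50,
  ).images[0]
  image.save("sunflower.png")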
I actually paint and spend a lot of time looking at 'serious' paintings. AI hasn't even scratched the field to a trained eye.
Doesn't mean I'm not excited though. This kind-of feels like I'm watching the camera or printing press being invented. Everyone is comparing it to fine art, but I think ultimately it's going in a different and bigger direction.
What I did was, IMO, a different and bigger direction than fine art. I mean, I could tell that this wasn't an impressionistic painting just given that some areas of the grass were too detailed. It looks "just fine" though to untrained eyes, which are well over 90% of the population.
1. How long would it have taken me to get good enough at painting to exceed what I generated in under an hour? How many people have the motivation to spend that time?
2. How much would I have had to pay an art student to make a painting better than what I generated in under an hour?
Ten million sub-par Monet knock-offs didn't exist, but could exist very shortly at minimal cost. Even if it never gets any better this is already potentially disruptive, and the models are getting better every month.
I've heard this a lot, luckily it's not that hard to test if you can really tell the difference. We need someone to create the 'AI Pepsi challenge' for artists to settle this.
This was sort of misreported. He won in the "Digital Arts / Digitally-Manipulated Photography" category, not the entire contest, and it's fair to say AI fits in that category.
This picture is also unusually coherent for Midjourney; if you just ask for a 16:9 image the sides tend to evolve into totally different pictures.
Those are garbage western judges. Modernism really destroyed representational art in the west. Try a competition with Russians or Italians running the jury. Or ARC-recognized atelier instructors.
Also per the email release, variations/inpainting, the trick used to simulate outpainting before this, now generates 4 images like a normal DALL-E generation instead of 3 (which was arbitrary anyways).
I do wonder how expensive the outpainting is. I'm assuming that each additional step in the timelapse is a full generation, in which case ~15 generations is about $1 total.
A few possible theories, some might be mutually exclusive:
Organizational scar tissue making them more risk averse about the PR risks of letting the genpop use AI generation tools, and create something offensive. With the safe assumption that Google will get blamed, not the user.
Fear of government regulation on AI if they don't self-regulate.
No need to actually release it, since this isn't the core business but just research. (While OpenAI needs to actually create the business.) Corollary: more to lose -- a scandal around offensive content will not hurt OpenAI's non-existent other businesses. It might make some advertisers pull their ads from Google.
The opportunity cost of building a self serve platform is too high. (Can't pull in people writing those kind of apps from projects with more commercial importance. Can't make the ML researchers do that.)
They misjudged how much demand there would be, and thought that building a platform would not be useful for a few years. And if it now turns out to actually be a great business it'll now take them a year to productionize and build a platform.
Their compute requirements are so high that selling access is not viable, the costs are prohibitive for real users.
It's not that different from e.g. self driving cars. Pretty obviously they had better tech from early on, but were not willing to take the risks that Tesla was.
Google is most interested in maintaining mind-share so that researchers don't jump ship. They could always monetize Imagen through Google Cloud but are concerned about risks (NSFW, legal issues, bias, etc.) so would rather wait for others to step into the water first.
No one has actually figured out a business for this yet. Sure people are paying small amounts to play around with DALL-E, but it's not a business model, it's basically just a subsidised tech demo at the moment. What will the business model actually be?
I imagine there's a market for a stock-photo style service that has a very large number of good images for very diverse topics, but DALL-E etc are a bit low level for that, there's a lot of product development that needs to happen on top of that. A stock photo service doesn't seem to be the sort of thing that Google would get into.
Maybe it's an art-helper plugin to image editors? The Stable Diffusion based plugin sounds promising for this, but will _artists_ want to use it? Surely the point of art is (except for art making A Point) that the artist produced it with their talent? If you're not trying to make Art then maybe you need the stock photo service.
Perhaps I'm just not very imaginative, but I can't think of any use-cases that aren't either extremely niche, or are better served by a higher level style service that happens to use a model like this under the hood.
But my guess is that the market for “programmatically created good enough images quickly” is larger than the market for “inspired, perfected hand drawn digital images”.
Does the newly generated picture take into account all of the previously generated image, or just whatever is around the square? The first would be amazing; the latter was a feature that was already there.
Regardless, this is a great way for people to fight the lack of detail in Dall-E, which I think is one of its largest flaws.
Obviously, it’s going to be an incredible boon for content creation. I suppose that in the future it’ll make creating videos an order of magnitude easier, which will allow a single person or a small team to make a high quality movie where all the assets are generated, so that’ll really give us an eye into a lot of people’s imaginations, for better or worse.
To leave a thought provoking example, what’s going to happen when every adolescent has the ability to make a convincing deepfake?
It'll put nation states in a similar position to the one they already have with crypto, where they wonder if they should ban or regulate; doing nothing won't be an option.
I can't help but feel like they're adding this at this particular point since Stable Diffusion has announced they're releasing their 'inpainting' model next week.
I really doubt it's related at all, though everyone would think it looks that way. SD has only been out a week and this feature would have taken much more than that to build, test, enroll demo users, make a webpage for, etc.
I can't prove it of course but it wouldn't surprise me if they had this pretty much done already long ago (dall-e has been out for several months at this point). The actual implementation doesn't look like it'd take more than a few days to code honestly (and they've got quite competent coders over there). Only speculation of course.
I have been working on an outpainting piece (in Photoshop) currently 10609 x 8144. I am very pleased to see more support for this, though hoping it doesn't kill my current flow.
Seems like it is currently not working on their site.
I cannot get this to work properly (in Safari). It just won't regenerate anything above or to the left of the image; it acts like I selected the opposite sides if I try it.
Looks like they pushed a fix. Now I'm getting funny "you get what you asked for" issues; if my prompt mentions a face, then using outpainting to create a 16:9 background behind it doesn't work so well - it just starts making more faces.
While maybe not "as good as a human" creatively, I wonder whether, when this matures a little more, we'll see whole art/design departments fall by the wayside and be replaced by stuff like this...
I love this idea of extending the canvas to build out the scene. It makes me wonder if anyone's tried using Poe's stories for illustrating with AI? His descriptive writing style seems ideal.
SD is very capable, especially considering it runs on something like 6GB of VRAM (we don't know what kind of 80GB A100 clusters DALL-E runs on), but you will need to be more specific with your prompts to achieve comparable results.
Img2img mode, where you provide SD a sketch and a textual description of what you want to achieve and it figures out the result, is a killer!
I uploaded a digital painting, selected "Edit mode", added a generation frame and prompted "complete the painting in frame" ...but it just added a completely unrelated photo related to painting in that frame.
I guess by prompting something that is "similar" to the image. The output mine gave was pretty lackluster. I had to overlap the image significantly, and even then it didn't seem to take enough of the context into account to make something that resembled the style closely enough.
And every time I drag that little square reticle to fill in a 128x128 patch of an image, you can be sure it'll be a 15 second API call that I'm charged $0.25 for. Yipee! Very open.
It doesn't seem odd to me that a product that involves an absurd amount of data and computing power isn't an easily consumable commercial product available for mass download.
Oh sorry, didn't realize you wanted them to develop the model for you. And give it to you. I meant no one was stopping you from building what you want for yourself.
Because depicting a woman in a kitchen is perpetuating the pernicious male patriarchy? Sorry we're not doing that. You might find some reception for this sort of thing on Twitter though.
https://old.reddit.com/r/StableDiffusion/comments/wyduk1/sho...