DALL·E: Introducing Outpainting (openai.com)
450 points by dannyw on Aug 31, 2022 | 289 comments



Meanwhile someone has already built a photoshop plugin for Stable Diffusion that you can use today to do basically the _exact_ same thing:

https://old.reddit.com/r/StableDiffusion/comments/wyduk1/sho...


Doesn't make sense to me why OpenAI has kept DALL-E closed source for so long. I can only guess either safety from misuse or leveraging it for money. At this rate though, Stable Diffusion is going to dwarf it


> I can only guess either safety from misuse

I still don't understand what this would mean. Where are all of the terrible things that were supposed to happen, now that Stable Diffusion is available?

We've been able to create completely photorealistic fiction for decades now. See any movie with CGI for an example of whole worlds, and people, that don't exist. The bar has gradually been lowering (see the amazing CGI that YouTubers do these days), and now maybe there is a bit of a step function down, but being able to make things that aren't real isn't remotely new. I don't understand the fear.


While I think the fear might be overstated, being able to make realistic fake content with such ease means it's harder to know what's true and what's not. Plus this has been the claim of OpenAI from the beginning. It's possible the true objective is to keep it private to leverage for money and this is just their excuse.


> means it's harder to know what's true and what's not

The danger for society is in not already knowing that's the case, since it's relatively trivial to make fake content even without AI.


Indeed, this talk OpenAI does is basically security through obscurity, and it's holding us back. Look at how often people make noise with screenshots of tweets or emails that never happened. You don't need photorealism or fancy machine learning for that, and it creates a lot of problems! If they weren't pretending that all we need is to put some yellow tape around machine learning, maybe there would be some interest in solving this type of stuff properly. But you don't need "AI" for that. You just need public awareness and some basic, pre-existing cryptography knowledge.


Well sure, I think that's dangerous too. I think more people should be skeptical of the images and content they consume in addition to it being a problem that truth is harder to discern.


Like every new technology that came before, it's a tool that will be used for good and for nefarious purposes. Some tools are power tools. There will be new benefits, but be prepared for a new raft of modern problems.


Photoshop is a pistol, this thing is a gatling gun.


We are already living at a gatling gun level of content, from any individual's perspective on the web.

We are way past that. If anything, trivialising and democratising the generation of fake content will educate the masses. Perhaps it will bring back trust in science, which we need a lot of.


And how often does this happen with Photoshopped images that aren't immediately disproven?


My grandmother once emailed my family frantically after she saw a picture of the Abraham Lincoln statue defaced with graffiti. Obviously that was a Photoshop, and in this case, even a bad one, but clearly fake images and content make it harder to discern truth


> means it's harder to know what's true and what's not.

Obligatory XKCD: https://xkcd.com/2650/


>> I can only guess either safety from misuse

> I still don't understand what this would mean. Where are all of the terrible things that were supposed to happen, now that Stable Diffusion is available?

Mainly people making porn (e.g. stuff like deepnudes). It seems like a lot of work has gone into preventing that (e.g. filtering porn out of training data, having porn-detection models to block porny output). There's also been a lot of talk about political fakes, etc, but I'm not sure how likely that is to actually happen at this point. I think one of the "selling points" of limiting access to DALL·E was that they could revoke access from people who they deemed to be misusing it.


Someone else will come along that doesn't have the same arbitrary limitations; it's a battle you're bound to lose.


Agreed. Unfortunately, it's just a matter of time before someone develops a similar open source project that someone uses for unsavory purposes.


A problem is that training huge models costs $$. https://twitter.com/EMostaque/status/1563870674111832066

I wonder if it's possible to distribute the task across volunteers' idle GPUs (like BOINC).


The porn industry is much more than $600k. If there's a business model/need for deepfake porn somebody would do it eventually.


The porn industry will happily pay, but I suspect that they won't open source it.


Stable Diffusion allows anyone to make kiddie porn with a half-second of curiosity/effort. Maybe you didn't know about that, maybe you think it's NBD, but in any case, that is the tire fire which aspiring AI majors want to avoid.


Pen and paper can do the same. Or Photoshop. Anyone can draw anything! OMG, stop the paper factory.


>Anyone can draw anything!

I'm pretty sure one of the primary arguments for Dall-E and Stable Diffusion existing is that there are lots of people who can't draw anything.


Actually, not just anyone can draw anything, especially the closer you get to photorealism. Also, drawing by humans cannot be done in parallel very well. With an AI you can get an unending stream of pictures.


Stable Diffusion can make very realistic-looking images (probably videos soon), and it's accessible to anyone.


I can't quite decide if that's good or bad.

On one side there's no problem to like CP if it's AI generated and no real kids are harmed.

On the other side it may desensitize the public to this type of content.


It's been a week; there is going to be an explosion of believable fake items that are going to be used to lure people into even more unbelievable conspiracy theories than currently exist. Your average conspiracy nut didn't have the skills or know-how before, but they sure do now.

Also you’re probably not seeing all the pedo content that people are already generating for themselves.


> be an explosion of believable fake items

I think that's a good thing. A slow creep is dangerous, because people may not notice, and would be tricked. An explosion of insanity makes everyone realize they can't believe everything they see on the internet.


It will be all of the above so it doesn’t really matter since it’s all going to be indistinguishable from the real thing.


Which has been the case for the last few decades. Nobody should be trusting anything they see on the internet.


It might end up having the opposite effect. When all sides of an argument can bomb each other with AI generated content it could poison the well and make people less likely to believe things they see.


Instead they'll pick and choose to believe whatever conforms to their world view or comfort, and dismiss the rest as 'fake news'.

We'll get more Q-Anon than Wikipedia.


You have to wait a little bit more, until HD video synthesis is possible on a mid-range GPU. Then on a mid-range smartphone.


(see the amazing CGI that YouTubers do these days)

Any favorite examples?


Here's one (that also goes into the details of how it was done): https://www.youtube.com/watch?v=0Cz8CjLq0fQ


Thanks! Are you currently hacking on any interesting hardware projects, if you don't mind me asking?


CGI sucks. Monty python had the best special effects.


> Doesn't make sense to me why OpenAI has kept DALL-E closed source for so long. I can only guess either safety from misuse

Paternalistic moralizing as a method to discriminate who gets access to models. Everyone else gets these cloud-service table scraps. That's why Stable Diffusion is so awesome -- YOU have the model!


It's sort of both. OpenAI, being an outgrowth of the AI doomerist community, does have a bunch of people who really do think the technology is too dangerous to be given to the masses. This happens to mesh perfectly with the other group of people at OpenAI who want to make tons of revenue. It's a harmonious alignment for everyone! Except, y'know, us.


Content creators, like artists, also happen to hate filters. They do not want to have San Francisco VC culture induced political correctness imposed on their work. This helped Stable Diffusion quickly gain popularity.


That's not "VC political correctness", it's "you can't use credit cards or be hosted in many countries if your online service produces porn".


There is a wide gulf between "porn" and the things that will get DALL-E 2 to yell at you.


Wasn't OpenAI supposed to "democratize deep learning"?

It seems more like they were trying to accomplish the opposite.


Elites for democracy! With their elite studies and abilities they will democratise AI by teaching it what is right and wrong. They already know better than regular people and AI.

And being so open they first lock the model up and charge a fee, so anyone can pay. Just spreading democracy through paid API calls. /s

I was a bit mean, they did kick the field in the butt and pushed us ahead even with all the stubbornness and secrecy. But now they are just holding us back.


It's in their interest to posture that way publicly while controlling scarcity of access where there's financial upside to doing so.


Exclusively licensing GPT-3 to Microsoft seems like a clear example of this.

https://www.technologyreview.com/2020/09/23/1008729/openai-i...


That's the thing, once the cat is out of the bag, it's out. Once someone develops AGI, it now exists. You can choose to either share it, or sell it.

You might think that the nuclear bomb is a good analogy to use here, but it is not, because once the field has advanced to the point in which one group can develop AGI, it is now possible for other groups to develop it with relative ease, unless you actively take over the world first and deny those other groups the compute resources necessary to train/run AGI.

The point is, once these algorithms are upon us, you must be willing to accept what impacts they will have, even if it destroys entire industries. The alternative being that you destroy the industry slowly rather than quickly, while simultaneously widening the gap between the elites and everyone else.

The mistake is thinking that people can't adapt to the times, which is only true if you are actively holding them back.

If someone developed AGI today, the best thing to do would be to instantly throw up a torrent of it and spread it as fast as possible, because if a sole entity is able to get it first and kick the ladder away, we are most likely screwed.


I've never understood what exactly the "open" in OpenAI is supposed to imply. They produce proprietary, gated models - not open in any meaningful sense.


OpenAI isn't open at all, it's just named that way to attract attention, like the bright green "FREE BIKES and rentals" place near Fisherman's Wharf in SF


I don't follow this stuff very closely - is there any open-source model for text generation that outclasses GPT-3? Stable Diffusion has been released for barely a week and already seems like the clear winner. It doesn't seem like any of the open (actually open) text models have made as much of a splash.

Of course maybe it's just because text is less visually impressive than images.


There are some open models as good as initial GPT-3 (which wasn't hard), but whatever they did to create InstructGPT hasn't been reproduced as far as I know, and it's the first one to really seem magical.


They’re just harder to run on your own resources, since large language models are very large. BLOOM was released a month ago, is likely better than GPT-3 in quality, and requires 8 A100s for inference, which pretty much no one has on their desk.


Can anyone confirm if BLOOM is better than GPT-3 at instruction following? I might have read somewhere that it's not as well behaved.


GPT-3 was fine-tuned after release to be better at following instructions. I don’t think that’s been done for BLOOM.

BLOOM incorporates some new ideas like ALiBi which might make it better in a more general sense. They haven’t released official evaluation numbers yet though so we’ll have to see.


That makes sense, I didn't consider that angle. Thanks for the info.


> I can only guess either safety from misuse or leveraging it for money.

The former is being used as justification for the latter.


My headcanon is they realized this stuff might be the essence of consciousness itself and wanted to shelter it in a persistent storage medium where it could grow and learn safely instead of releasing it to the wild to be booted up and destroyed by every yokel with a gpu


Fun idea, but we've got 70 billion conscious beings killed every year for food. Money trumps consciousness.


>I can only guess either safety from misuse or leveraging it for money.

and it shouldn't be difficult to pick which of these is actually true.


>Doesn't make sense to me why OpenAI has kept DALL-E closed source for so long.

>leveraging it for money


It was a long gap between DALL-E 1 and 2, a whole year. In that time they just sat on it, didn't release anything. Such a bummer. My theory is that they wanted to hype everyone up even more for the grand commercial release.

Funny thing is that people didn't stand still; they invented diffusion and other CLIP-guided image synthesis methods, and DALL-E 2 copied the method, completely changing from the first architecture.

Their arrogance is that they think they can ride the dragon. They want to be the ones to discover, advance it, and control it. But everyone else doesn't have time for that shit.


The fact that any of this stuff works is so surprising that none of it could possibly have been planned ahead on anyone's part. StableDiffusion has some real novel research of its own in there; the methods OpenAI and Google used couldn't produce a model that trains or runs as fast as SD.


I think they wanted to keep public attention on this technology “positive” for as long as possible by curating the art that becomes public. Long enough to position themselves profitably. Eventually this technology will serve as a window into the minds of teenage boy gamers and 4channers and the general public will be disgusted.



Which now supports inpainting (as of 36 minutes ago): https://www.reddit.com/r/StableDiffusion/comments/x2tk1g/sta...


These examples aren't very convincing. It seems like Stable Diffusion's inpainting has a bit of catching up to do.


Have you checked out the Photoshop extension's inpainting, which also uses Stable Diffusion? I figured this Krita extension was just using a lower quality setting or something, since the Stable Diffusion Photoshop results looked much better. Maybe it could come down to the amount of VRAM in the machine the demos were recorded on, or something.


I'm not sure but by the looks of it, it appears to be doing only 3 iterations. On https://replicate.com/stability-ai/stable-diffusion I can only get good results with hundreds of iterations


re: Stable Diffusion: is there a site similar to https://www.craiyon.com/ where I can experiment with Stable Diffusion?


Here's one by Stability AI themselves: https://beta.dreamstudio.ai



A collection of sites using stable diffusion:

https://www.reddit.com/r/StableDiffusion/comments/wzj8kk/a_c...




That doesn't appear to be the same thing. That plugin looks like it's for generating AI images directly in photoshop, rather than providing an image and it automatically generates the area outside the image.


sama is very active in YC and makes the call on OpenAI product roadmap. Furthermore YC encourages good CEO-community relations. The fact that OpenAI is so far behind Stable Diffusion and has reduced pricing shows that sama wants OpenAI to be a highly profitable enterprise company. I.e. not “Open.” You can do both (e.g. Cloudera) but clearly sama is not strong enough at AI to make this happen.


> sama is very active in YC and makes the call on OpenAI product roadmap. Furthermore YC encourages good CEO-community relations

Sam hasn't been at YC in years and (based on anything I've seen) isn't active in YC at all. As for "YC encourages good CEO-community relations", I have no idea what that means* but it has nothing to do with HN. We encourage good content-community relations and that's it.

You have a long history of posting dark insinuations about YC/HN, not to mention nagging the mods about how bad we are and how much better you yourself have done the job in the past. I mostly let the latter go, but when you start with the ethical insinuations, that gets my dander up. It's time you stopped smearing people's reputations on HN. If you have evidence of wrongdoing, post it—I'm sure the community will be extremely interested. If you have no evidence, please stop from now on.

(Edit: I realize it probably sounds like I'm over-reacting to the parent comment, but this has been a longstanding pattern. We can cut people slack for years, but not infinitely.)

OpenAI stuff and Stable Diffusion stuff (and DeepMind stuff for that matter) are all popular on HN because the community is super interested—that's literally it. We're not pulling strings or playing favorites (we don't even have favorites in that horserace, at least I don't). As a matter of fact, the last thing I did before randomly running across your comment was downweight the current thread because of the complaints at https://news.ycombinator.com/item?id=32665587.

* unless you mean that we advise founders about how to write content that actually interests the community—that we do, and not only YC founders but non-YC founders, open source programmers, bloggers, and anyone else. That's all a consequence of wanting HN to have good content and seeking to avoid the boring stuff. By the way, I'm working on an essay about how to write good for HN and avoid boring stuff too; if anyone would like to read it, email me at hn@ycombinator.com and I'll send you a copy.


>that gets my dander up

Your w0t m8?


Endangered Words Bureau Agent D23 at your service


> D23 at your service

That's my username, and I'm also named Daniel. This is a conspiracy.


Endangered Words Bureau Agent D23a at your service


idk how you have time to shitpost while also moderating HN, hats off dang

EDIT: just realized.. what if dang was an advanced AGI the entire time and HN has been one giant turing test to see if any of us would notice


>Endangered Words Bureau<

"Use Them or we will lose Them!"

I used Twixt (An abbreviation of Betwixt) to replace inbetween in a submission.

Edit: to fix formatting and spelling


I kinda feel like they chose the name "open"ai when they started back in 2015 because Musk etc. wanted exactly the kind of thing Stability AI is now creating, i.e. something other than a corporation like Google having primary access to these models, and it being more democratized. But as time has gone by they've strayed from that vision, and changing the name would be a PR nightmare.


This is actually an interesting illustration of how infectious a name can be.

If they hadn't called it OpenAI, the conversion from open to corporate would be much easier, whereas now it's a significant pain point in OpenAI's reputation. It's kinda nice to see the original vision still somewhat having an effect despite no one being left to propagate it.


But aren’t the results from stable diffusion not nearly as good as DALLE2?


You don't have to drive SD with a text prompt; if you take the model weights you can feed anything you want in there and generate an image out of it.

Since DALL-E mini and DALL-E 2 are more "creative" (they use a better text transformer), you can use them to generate the input and SD to refine it for finer detail.

https://twitter.com/hardmaru/status/1559861001163788289
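
For anyone curious what that hand-off looks like in practice, here is a minimal img2img sketch using the Hugging Face diffusers library. It's a hedged example, not the workflow from the linked tweet: it assumes a CUDA GPU and a hypothetical init.png exported from DALL-E mini, the prompt/strength/model ID are illustrative, and parameter names have shifted between diffusers releases (older ones called the image argument init_image).

```python
# Hedged sketch: refine an image from another generator with Stable Diffusion img2img.
# Assumes the Hugging Face `diffusers` library and a CUDA GPU. "init.png" is a
# hypothetical file saved from DALL-E mini / DALL-E 2; all parameter values are illustrative.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

init = Image.open("init.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="highly detailed matte painting, sharp focus, intricate",  # re-state the original idea
    image=init,                # older diffusers releases called this `init_image`
    strength=0.5,              # lower = stay closer to the DALL-E composition
    guidance_scale=7.5,
    num_inference_steps=50,
).images[0]
result.save("refined.png")
```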


So... are we done politely coughing and looking out the window at the idea that the gatekeeping was motivated by altruism so that we can move on and just use this much better innovation model going forward?

Various (subjectively judged) SOTAs on at least some subset of this family of tasks are changing somewhere between daily and hourly right now. I've been watching this stuff closely since fairly early ImageNet days and I've never seen a Cambrian explosion of "how the hell did that do that?" events at anything like this cadence.


If you're talking about scientific gatekeeping like pre-publication peer review, the gatekeeping was never really the root problem.

If researchers have great products or findings to show off, it's easier than ever to simply publish them somewhere online and let impressed audiences spread the word. Gatekeepers have been irrelevant to truly great science for a long time.

It's mediocre science that needs gatekeepers to distinguish it from not-even-mediocre, truly substandard research.


The gatekeeping GP is referring to is OpenAI keeping DALL-E 2 behind an invite-only API, with weights unpublished, vs. Stable Diffusion publishing the whole model for anyone to download and use.


In fairness OpenAI aren’t the only ones to use fairly thin logic to avoid saying “this is about money”. Everybody was doing that until roughly, now.

They’re just the only ones calling themselves “OpenAI”.


Feels like a race to the bottom. More features, lower cost, every week. No idea where it'll level out, but I like it. Just bought some more Dalle credits today because it's so much fun. This is a revolution in 'art technology'; it's like Steve Jobs's bicycle for the mind. Best I could do a month ago was a stick figure in MS Paint, but now..


Stable Diffusion is arguably better, has more features, and is free. OpenAI can't compete with free.

Even if you don't want to take the 30 seconds to set it up in a free Google Colab environment, the paid DreamStudio version is still half the price of Dalle.


I find Stable Diffusion better overall, but it has downsides. Stable Diffusion tends to be more creative than DALL-E, but does a lousy job of following directions, especially complex ones. DALL-E is good if I know what I want specifically.

I can think of ways to fix Stable Diffusion since it's open-source. I think I could bridge the gaps as I see them in about a weekend of hacking. I'm not sure when I'll get that weekend.

(Footnote: What I want to do is not something I can explain without a technical blog-post-length document or a zoom call; it's about the same level of complexity as the other major SD hacks we've seen)


Setting a high cfg parameter, like 13, drastically helps with the prompt following.

That said, for me, I agree that dalle does much better pencil sketches.


Raising the cfg ("classifier-free guidance") scale is essential for following the prompt, but if you raise it too high the image gets weird and saturated.

According to Google's Imagen paper this is literally because the pixels get multiplied by the cfg scale and start clipping; they have a technique called dynamic thresholding that replaces it. Not sure if SD uses this, but I saw Emad hinting they were training an Imagen model…
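
For the curious, the dynamic thresholding trick described above is only a few lines. This is a hedged, standalone sketch of the idea from the Imagen paper (clamp the predicted x0 to a per-sample percentile instead of statically clipping to [-1, 1]); the percentile value and tensor shapes are assumptions, and stock Stable Diffusion samplers don't do this out of the box.

```python
# Hedged sketch of "dynamic thresholding" (Imagen paper): instead of statically
# clipping the predicted x0 to [-1, 1], which saturates at high cfg scales,
# clamp to a per-sample percentile s and rescale by s. Illustrative only.
import torch

def dynamic_threshold(x0_pred: torch.Tensor, percentile: float = 0.995) -> torch.Tensor:
    # x0_pred: predicted clean image, shape (batch, channels, h, w), nominally in [-1, 1]
    flat = x0_pred.reshape(x0_pred.shape[0], -1).abs()
    s = torch.quantile(flat, percentile, dim=1)       # per-sample threshold
    s = torch.clamp(s, min=1.0).view(-1, 1, 1, 1)     # never shrink below the static range
    return x0_pred.clamp(-s, s) / s                   # threshold, then rescale back into [-1, 1]
```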


Something like prompt weighting? I've seen implementations of that floating around.


I don't think Stable Diffusion is technologically better yet.

Sure, both SD and Midjourney produce absolutely beautiful artworks most of the time. But if you want something specific and out of the ordinary it takes a lot of attempts and promptcrafting (and sometimes you are unable to accomplish what you want at all).

However, my experience is that these prompts (which SD/MJ struggles with) often produce good results in Dalle2 even on the first try.

Of course, OpenAI has a very limiting content policy. But if I have something very specific in mind and it passes their rules, I currently choose Dall-E 2, even though I've spent much more time with SD.


Note that Dall-E 2 uses CLIP guidance while SD doesn't. It's a feature that will be added soon and can already be added by users if they want, although I'm not sure how easy it is. Midjourney has already shown they've implemented it in their beta using Stable Diffusion, and it makes the results 200% better, trust me.


After many months of waiting on my invite I finally got it, and I entered the prompt that for some reason is my greatest fear: "a red eyed hairy spider with human hands as feet". I got a warning about violating policy/harmful content or something. Not only that, the results I got were super underwhelming; after playing with it for half an hour I haven't looked back. Now playing with SD and an upscaler, there is no limit to what I can create. Also, I always found the company name hilarious: "Open"AI.


Also, unlike DALL-E, SD comes without a content filter and "anti-bias diversity" filter so it gives you what you ask and treats you as an adult.


There's still a NSFW filter (at least, in the version I used). I'm sure it's easy enough to disable if you poke around but it's not exposed as a function parameter out-of-the-box.

I didn't investigate any further because I'm not actively trying to generate porn. It was pretty annoying to have my results blanked out because they were (apparently) NSFW - so either the filter was triggering a lot of false positives or the model was generating NSFW content for non-NSFW prompts.


After removing the NSFW filter, I can confirm that 99% of the blocked results are not in-fact NSFW. The filter is VERY over aggressive.

To remove the filter, you just have to comment out the two lines of code which call it. Instructions are here: https://reddit.com/r/StableDiffusion/comments/wv2nw0/tutoria...


Stable Diffusion is much less of a nanny too.

Amusingly it's more open in every way.


Better in what way? I tried 10 prompts that returned good results in DALLE, but nothing good in stable diffusion.


Seconded. I got awesome results making "artwork in the style of Yoshitaka Amano" in DALLE but horrible ones in Stable Diffusion. Maybe the prompt was incorrect there (it would be great if these were more discoverable), but the art in SD was lacking.


SD definitely needs more coaxing and naive prompts tend not to fare as well as with Dall-E.


There was a good example somewhere (I can't find it now) of a really complex prompt that Dalle could understand but SD couldn't. Maybe some of GPT-3 is being leveraged for parsing.

Anyways I think it's way too early to start taking sides. I enjoy using all these systems.


One of SD's big limitations (from what I've read about it) is positional prompts. Dall-E seems to understand "X on top of Y", but Stable Diffusion does not.


Doesn't seem to get "in" examples either. E.g. a prompt like 'an eagle holding a snake in its beak' ends up generating eagle-snake hybrid creatures.


Ironically, "the cat is on the mat" is a conventional example sentence in linguistics of metonymy (semantics).

I have no examples, but I imagine things like "at the top of his game" are immensely problematic, albeit not very visual to begin with.


img2img drawing should take care of that


I'm experimenting with the base model right now, I'm going to be very excited to try that one out too.


Do you know the best Google Colab tutorial / repo?


Hi, there are a couple of good UIs. https://github.com/cmdr2/stable-diffusion-ui is an easy-to-install and use tool, written by me (with contributions by many). Version 2 is in beta, which is a 1-click installer for Windows, no dependencies or command line needed. v2 beta: https://github.com/cmdr2/stable-diffusion-ui/tree/v2

https://github.com/hlky/stable-diffusion is another popular and good tool.
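
If you'd rather skip the UIs entirely, a bare-bones run in a Colab GPU notebook is only a few lines with the Hugging Face diffusers library. This is a hedged sketch rather than a specific tutorial: it assumes diffusers is installed, the model license has been accepted on the Hub (a token may be required), and the prompt and settings are just examples.

```python
# Hedged sketch: minimal Stable Diffusion text-to-image, e.g. in a free Colab GPU runtime.
# Assumes `pip install diffusers transformers accelerate` and that the model license
# has been accepted on the Hugging Face Hub.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "an astronaut riding a horse on the moon, detailed oil painting",
    num_inference_steps=50,   # ~50 steps is a common sweet spot; a handful of steps looks rough
    guidance_scale=7.5,       # the cfg scale discussed elsewhere in the thread
).images[0]
image.save("out.png")
```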


I'm impressed how fast this is getting adopted. Dozens of repos have popped up.


Do any of these work with Apple arm64 (M2)?


Awesome, thanks a lot!


Thank you!


dreamstudio is also waaaay faster than openai. generally a second or two for 512x512 at 50 steps.


Running this at home is only free like mining cryptocurrency is free if you didn't buy your computer and don't pay for the electricity. Plus you can only run it on the computer that has the good graphics card, which probably isn't your laptop.

I expect most people aren't going to be generating images all day, so using a cloud-based service for occasional use will still make a lot of sense.

Stable Diffusion offers a paid service to do this too, and there's nothing wrong with that business model. Prices will probably come down, though.


Not sure if GP had this in mind, but SD is (more) free in terms of liberty. So yes, you pay with electricity and hardware, but you control the process yourself, which is invaluable. DALL-E could change or go offline at any time.


Considering the threat from DALL-E going offline, it seems quite acceptable. These aren't precious photos since it's all made up anyway, you can download any pictures you make, and you probably already did for the ones you care about.

I'd worry more about, say, keeping your photos on Google and losing your account somehow.


It's not only the threat of going offline. DALL-E makes it extremely difficult to generate many ideas because of its absurd content blocker - for example, I had something like "ominous, foreboding landscape beneath a black sun" blocked because (from what I could tell) it has words with negative connotations and the word "black" in the same sentence. It does this all the time, their discord is full of examples.


Yeah, if you run into those then you'll want to use something else. (I haven't in my casual usage.)


It does run on Apple silicon. 55 seconds in M1 Pro (vs 15 seconds on RTX 3070).


That's pretty good, but with that level of latency, I can still see people paying to use an online service that's faster. Maybe they'll speed it up more, though?


It will be faster soon; Emad said there were issues with the ISA for the GPU not being fully ported, or something of the sort.


Is this native? Or Rosetta?


Native, and judging by the speed it's using Metal too (as opposed to CPU fallback).


Many people now have a gaming PC, so they can use a virtually free GPU to generate images if they have enough VRAM.


It's a race to the top. New functionality is added and the model is improved week over week.


That's still a race to the bottom if the price isn't going up.


Race to the bottom implies that they're only competing on price. Here, they're competing on new functionality as well. If DALL-E's outputs were substantially better than Stable Diffusion, more people would use it, even if it cost more.


That’s not what a race to the bottom is. It’s just competition, which is usually good.


It often feels like words are losing their meaning, with everyone misusing terms they don't fully understand.

I don't want to be a doomer and have surely unknowingly misused terms as well, but it's definitely noticeable how these originally clearly defined terms are getting used in entirely new ways.

And it's not just with technical terms like this; it also applies to originally obvious terms such as racism, sexism, etc., which have lost their original meaning entirely.


I can understand the criticism about technical terms (they work better if stable and precise), but regarding the rest: that's just how language works. You can't (and shouldn't) expect words to keep their original meaning forever.

For example, the word "term" comes from the original latin "terminus" that means "end" or "boundary". It only got the meaning you used it for centuries after it was first used in English. See: https://www.etymonline.com/word/term


Oh, it wasn't my intention to criticize anything or anyone in particular with that comment.

I was just pondering that our originally clearly defined terms are rapidly getting used in a very confusing manner, which increases the difficulty of a discussion, as participants interpret words very differently.

I don't think that people look up the actual definition of terms in a thesaurus anymore. They hear it in some context and create their own personal definition. It wasn't as obvious before the internet, I think, but nowadays everyone is bombarded with technical terms all the time, which likely contributes massively to this increasingly fluid terminology.


Also, the use of "unironically". What is going on there.


There is generally a negative connotation to race to the bottom. The Investopedia definition captures this:

The race to the bottom refers to a competitive situation where a company, state, or nation attempts to undercut the competition's prices by sacrificing quality standards or worker safety (often defying regulation), or reducing labor costs.


Thanks. Yes this is what I'm trying to say - a race to the bottom is about companies seeing how crap of a service they can give you that you're still willing (or have little choice) to pay for.

This appears to be the exact opposite: a race to provide more services and more features for a lower price, based on optimising and/or lower profit margins. AKA capitalism actually working for a change.


Prices would have gone up if SD wasn't open source. Look at the new Google Colab Pro limitations and you have indications that they're loving this new wave and milking it properly. I just ordered a GPU to run it locally.


I don't think so, Colab pro limitations are precisely because they weren't charging by compute unit, so they were over-subscribed.


I feel like you aren't using the phrase "race to the bottom" correctly here. Generally a race to the bottom implies some kind of detrimental outcome for the world as a result of people failing to internalize externalities generated by a business.


It has to do with commoditization and decreasing costs. Taking something technologically sophisticated and having it become open source and accessible so quickly is going from the top of the pyramid - big companies gate keeping betas, to the bottom - the public, available to everyone, cheaply. These companies are desperately trying to monetize this technology, but the value in terms of what people will pay is falling fast. It might not be a sustainable business model for OpenAI or anyone else for very long. Hence the race to the bottom - quickly make a buck before you can’t.


I share your enthusiasm for this development but curious what you mean by race to the bottom?

There does seem to be a lot of vague angst about how this will affect the nascent "Prompt Engineer" career track, but I hope most are comfortable letting the open innovation play out a bit before trying to personally monetize it..


> race to the bottom?

In this context it's a good race. This software seems to have caught fire and tons of people are playing with it and providing tons of crazy new tools for cheap or free.

It's a race to the top for us.


>Best I could do a month ago was a stick figure in MS Paint

You're forgetting (or not knowing?) NVIDIA Canvas, which came out one year ago, give or take. It literally turns stick-figure-complexity drawings into photorealistic stuff.


> Best I could do a month ago was a stick figure in MS Paint

That is still the best you can do... which happens to be about the best I can do! Just like my introduction to the computer at a young age has atrophied my handwriting quality.


I guess if we're going to get into semantics and the definition of self (where does the 'I' end and something else begin?), then I don't really do anything. You could also say I can't walk either, without the ground.


I was just being needlessly pedantic. I guess it's a spectrum from "I painted the Mona Lisa" to "I pushed a button and a Mona Lisa appeared". That's a very individualistic view also, though; maybe the thousands of programmers that committed to the thread of history that arrived at you pressing the button are part of the art performance.


I think he’s calling you a cyborg.


Just switch to a PalmPilot.


> it’s like Steve Job’s bicycle for the mind

I have been thinking the same thing, it's sad Steve will not be able to see it


Steve would be trying to lock it down in his walled garden.


Not sure why that is more sad than all the other dead people that can't see it.


Nobody said it is more sad.


the same thing could be said about ethereum and solana


I think Dall-E would benefit from a "sketch-based" prompt in addition to the text-based one. This was mindblowing - https://andys.page/posts/how-to-draw/


It does feel like art's disruptive "Calculator moment" is happening where you can now leave a lot of basic/mechanical tasks to a tool and give more focus to higher-minded problems.

It's going to get so cool and interesting, I think.

A lot of the conversation around art may focus more on composition and objectives of the artist in the new prompt engineering world, with less bias from factors such as rendition quality etc. creeping in since it's so incidental.

New forms of art will emerge and/or gain popularity that focus on trying things the tools aren't good at yet. The human artist of the gaps. The niches will constantly be shifting.

I wonder if we'll learn to recognize the output of certain popular models and perceive them as instruments. "Made by xy on z" instead of "xz on guitar", so to speak. I remember the 90s/early 00s internet when it was always easy to tell when something had been done on Flash, just because of its line anti-aliasing rendition style being so distinct and familiar.

The novelty will wear off, and we'll all start to feel a bit disappointed that the average human's imagination is pretty limited and novel/original ideas remain somewhat rare as the patterns and tropes in all the generated art emerge. It's great you can put the space needle where you want it and get a good-looking city and space ship, but how many variations of a cyberpunky skyline with a space ship do you need? And then we'll celebrate the novel stuff that does happen, as always. I suppose the tropes will evolve faster as the throughput goes up.


>basic/mechanical tasks to a tool and give more focus to higher-minded problems.

>rendition quality... [is] so incidental.

There's this thing in painting called 'mark making' and it can be the difference between an all-time-great painting and a throwaway portrait. Mark making speaks to every momentary choice of physical process a painter employs and reveals their thought process. For some of the greatest painters, it reveals their genius.

Do not discount execution. Overlooking "basics" and "mechanics" is what results in disappointing work.


It's a fair point, and thanks for teaching me a new term!

There's a lovely documentary called "Tim's Vermeer" about Tim Jenison's hobby side project to prove that Vermeer used sophisticated optical tools to capture and copy his scenes from physical sets, rather than e.g. paint his famous grasp on lighting purely from his own mind. (Jenison is one of the founders of NewTek, the people behind Video Toaster and LightWave, incidentally both tools that made hard visual art tasks accessible to wider audiences.) He builds such tools himself and then proceeds to successfully create his own Vermeer-alike painting, despite possessing very little artistic skill himself.

It's full of good ruminations (and good at sparking more) on tools-vs-artistry but also execution-vs-method, and whether designing and adopting innovative tools and the tedious process to use them made Vermeer less of a genius, or just a genius of a different kind than otherwise presumed.

It's very accessible and doesn't require knowing anything in particular from the art world.


Tim's Vermeer is kind of bad in my opinion. A lot of the musings border on misinformation. If you're not a painter, it sounds great, but if you have some training, it's a very frustrating doc. The resulting painting is neat, but it was immediately obvious (at least to my eyes) how different his result was from Vermeer's.

Hockney, one of the featured 'expert painters' is a hack who doesn't actually know how to paint* and therefore claims that certain gradations are certainly impossible without some sort of additional lens device. Meanwhile there are 19 year olds at the Grand Central Atelier pulling off just that.

I own a camera lucida. It got in the way more than it helped. At best, it's a novelty, now collecting lots of dust. Vermeer probably had one but it's altogether way more likely that he just had a well-trained observational/representational faculty. There are some killer painters using cameras now (Will St. John for one) but they typically have a decade or more of very rigorous direct observation to lean on.

I suggest following Ramon Alex Hurtado. IMO he's one of the more exciting young scholars on historical representational painting techniques. I don't think he has written anything himself yet, but he does do workshops and has a big informational update for his website coming.

*the definition of painting is now so broad it is meaningless. Here, I mean "attempting some degree of visual accuracy" which can be achieved in endless creative ways. Compare (easily on instagram):

Colleen Barry, Peder Mørk Mønsted, Cecelia Beaux, Jas Knight, Felicia Forte, Eric Johnson, Ksenya Istomena, Sergei Danchev, Glenn Dean, Blair Atherholt, Jose Lopez Vegara, Hongnian Zhang, Hans Baluschek.

These artists all have their own voice and stylistic choices. They also all represent things they see with some sincere accuracy. Look up Hockney's ipad paintings he got lauded for. People treat them like they're some misunderstood genius, but really, they're just bad paintings.

I'm sure he's a sweet old man and I'd drink tea with him. But if it weren't for his ilk I might have found proper instruction 10 years earlier in life. Modernists and postmodernists robbed generations of proper art instruction. Imagine if all the music teachers burned all the sheet music and refused (or forgot how) to teach the diatonic scale. "Hit the keys in a new way! Don't let yourself be bound by conformist ideals!"

P.S. I'm not fun at parties


> Do not discount execution. Overlooking "basics" and "mechanics" is what results in disappointing work.

That is what really bothered me in art lessons in high school. When discussing any famous work it was always about concepts, ideas, composition,... and execution was very much secondary. But for your own work all that is completely ignored if your coloring is just slightly uneven or lines are too rough. If you could hand in a photorealistic drawing of anything, no matter how boring, that would give you much higher marks than a rough drawing of something worthwhile.


Surely this too can be instrumentalized to evoke emotions, stylized to ease execution or faked to justify a result.


It's not as if generative art is new. Nor is figurative painting relevant anymore since the invention of the camera. A basic Burger joint in Gerhard Richter kind of style transfer is very much derivative. This isn't bad in view of the classics, but it's more like art-work to me.

The true artists in this one are the coders, no doubt (corollary to the intelligence debate).

On the other hand, you mention an important point with layout but you underestimate the progress these days. Surely there are companies who are working on automated design beyond CAD (computer aided design), eg. for specialized antenna.

> we'll all start to feel a bit disappointed that the average human's imagination is pretty limited and novel/original ideas remain somewhat rare as the patterns and tropes in all the generated art emerge

Well, one might argue that Richter's most highly priced piece looks a little like prehistoric art of the Pleistocene. It's a little vain to mention it, because I can much better relate to the more basic form, of course. A more frequent sore point would be the pop music industry, between professionals and the amateurish.

Anyway, this may be thinking too big. For the time being, the bunch of techniques is better understood as a toolbox, because it will be a long time before it trumps demo-scene productions, for instance. Here it is the technique that counts more often than not. The rest is an acquired taste.


Someone should name the next image generator OWL, since it “draws the rest of the owl”.



I thought it was odd that I hadn’t seen anyone else make that joke. Turns out they had, I just hadn’t seen it. Thanks!

Reference, for those who haven’t seen the original joke to which my joke was referring: https://www.reddit.com/r/pics/comments/d3zhx/how_to_draw_an_...

(See also: https://knowyourmeme.com/memes/how-to-draw-an-owl)


I thought the crude MS paint images were a joke - I was not expecting them to be useful as actual input for the final image!


similar work using Stable Diffusion in a Photoshop plugin:

https://old.reddit.com/r/StableDiffusion/comments/wyduk1/sho...


So DALL-E is already old news and the Stable Diffusion ecosystem is once again already ahead especially with this announcement.

Quite funny to see OpenAI panicking and falling on their own sword, as they were supposed to be 'Open' in the first place and are now being disrupted by open source.


Couldn't happen to a more deserving group of people. Good riddance. Squatting the name "open" and trying to reap the benefits therein while being anything but.


I've been complaining for years about WikiLeaks not being a wiki -- no one wants to listen....


To be fair, it started out as a wiki, and they just never changed the name.

There's no CSS here but you can clearly see the MediaWiki template: https://web.archive.org/web/20090422103636/http://www.wikile...


Wiki means "quick". Are you complaining about the speed of their publishing submitted information?


Wiki means quick in Hawaiian. In English it refers to a website whose content is user-editable.


What benefits? The parent is non-profit.

I'd argue they're imperfect, but they don't look like arses. Big gap between the two, too.


The parent may be non-profit, but OpenAI LP accepts investments and delivers returns to investors like any other regular company. The only difference is that they 'cap' the returns. However, the cap is negotiated with individual investors, and I haven't seen anything disclosing the cap except for the fact that in the opening round the cap was 100x the initial investment.

100x seems like a pretty generous cap to me.


I'm not a business investor so I can't really tell if x100 is "a lot" or not — Apple and Amazon share prices went up by x500 in the last 20 years, Tesla x200 in the last 10. But how rare is that?


They call them unicorns for a reason.


Are they non-profit? Does receiving $1b investment from a for-profit company still mean you can be non-profit?


Yes to both questions. It's a (set of) specific thing(s) in company law.

https://projects.propublica.org/nonprofits/organizations/810...


I thought their research actually is open, at least? That's still something...


Their research is closed -- they don't release model weights, nor in most cases the training or model scripts. Certain things they release, just like any other for-profit research firm.


I don't see what it has to do with profit since this is pretty normal in academia too. Scientists will often publish papers, but not everything they do.

"Open" is not well-defined.


As the sibling says, in academia this is already more than open...


Open means something. It is a, for lack of a better phrase, virtue signal. When you do that but don't actually represent the virtue you are trying to signal, people will understandably get pretty upset about that.


The notion of open in software and in academia is different, that's for sure. But does that make one or the other wrong? My friends doing physics or math don't understand the hate at all when I discuss it with them. From their point of view, you don't have to pay hundreds of dollars to read it = it's open. The research is there, code is left as an exercise to the reader - that's also normal to them... And the notion of giving away free compute time is funny to them. Run your own cluster, they say.

I'm not saying I agree with them, I'm a software person and open means to me what it means to you. I'd prefer if it was truly open by our definition. But I don't think we can bash the choice of the word so easily.


> But does that make one or the other wrong?

Well, I can't speak for anyone else, but it makes it wrong for me (and judging by the number of upvotes I got from the original comment, a lot more than just me).

It's not just the code left to an exercise to the reader...it's the training sets. You don't get to trade on the suggestion of "open" while keeping everything closed. They aren't idiots, they knew exactly what they were implying when they picked the name.


Botany-related articles don't ship together with the discussed plants... If you want to repeat some experiments, you're looking at a hardcore mountain trek at the other side of the world. Is this too different?

What if the training sets are proprietary (as in, they don't have licence to share)? Should they keep the research to themselves just because of that? I don't think that's better than not sharing the training set. It also doesn't mean the research is invalidated - find your own pictures and it's going to work. Same as - find your own plants and it's going to work.

TBH, I just don't see the training set as part of the research... In my case, I'd feed it tons of electronic circuits to try to teach it generate some. Why should I care about some random other pictures? I care about the research and I have my own training sets.


This is just nonsense. I pay (a small amount) for both. They have different strengths and it’s fun to compare. Adding new features to a product is not a sign of panic, it’s just normal.


Dall-E so far hasn't been able to grow an ecosystem because of how restricted it is. Meanwhile Stable Diffusion makes trial-and-error and innovation around it easy, and as a result only 9 days after Stable Diffusion's release we see OpenAI release a feature that looks like a copy of a tool from the Stable Diffusion ecosystem.

I agree that Dall-E isn't obsolete. I'd also add MidJourney to that list. All three are great models in their own right with their own pronounced strengths and weaknesses. But when it comes to enabling novel workflows Stable Diffusion seems lightyears ahead of the others.


Except you are wrong. This feature was already available as part of the Dall-e ecosystem. There was a website called patch-e which facilitated this exact same workflow.


Also it's quite funny to see OpenAI (with all their researchers and engineers) get disrupted by someone with little to no background in AI and ML (Emad), but who embraced OpenAI's original mission of making AI as open as possible.


> and falling on their own sword

That's not what that means.


The only move left for OpenAI is to honour their name and make their own AI open source.


Or rename themselves OpaqueAI.


I came to say something similar. It feels like “OpenAI” was just a trademark grab to prevent others from using it. Of course, all conspiracy theories work well when looking backward in time.


This was already possible with DALL-E using the inpainting feature going from defined image to transparent edge; this just automates what was a manual process before. Do wish the inpainting tool had more options, for example to fade a transparency in, since my understanding is it makes a difference; not to mention magic wand selection/deselection tool.

In case it is not obvious, every time a user generates an additional section of an image using the outpainting feature, it costs a credit.


Yes indeed, and it shows the advantages of Stable Diffusion's model of just releasing the model and letting people do what they want with it - this was straightforward to implement oneself.

And while OpenAI released this feature now, it's probably just a matter of days until even better features built on Stable Diffusion will be released, given how much community energy is focussed on it right now.


Automation of manual processes is generally useful.


Maybe kludge it with a dithered transparency mask?
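
In case anyone wants to try that kludge, here's a rough sketch of one way to fake a fade with a binary alpha channel: dither the edge by dropping pixels with a probability that ramps toward the border. Everything here (band width, direction, file names) is an illustrative assumption, not how DALL-E's own tooling works.

```python
# Hedged sketch of a "dithered transparency" edge: approximate a soft alpha fade
# with a hard (binary) alpha channel by randomly dropping pixels, more often the
# closer they are to the edge. Purely illustrative parameters.
import numpy as np
from PIL import Image

img = Image.open("input.png").convert("RGBA")
w, h = img.size
fade_px = 64  # width of the faded band at the right edge (illustrative)

alpha = np.full((h, w), 255, dtype=np.uint8)
ramp = np.linspace(1.0, 0.0, fade_px)          # keep-probability across the band
noise = np.random.rand(h, fade_px)
alpha[:, w - fade_px:] = np.where(noise < ramp, 255, 0).astype(np.uint8)  # keep or drop each pixel

img.putalpha(Image.fromarray(alpha, mode="L"))
img.save("dithered_edge.png")
```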


It's only a matter of time before Adobe adds inpainting with hooks to local or API generative tools. Using OpenAI to edit works like this is like transporting back to the past of basic image editing tools.


How long until we can run this over shows like Star Trek Voyager/DS9 and Seinfeld to achieve believable 16:9 scenes?


Temporal coherence will still take a while to solve but it's not undoable. Making things that look correct upon closer inspection rather than just looking "nice" will probably take some degree of human curation for quite a while.


Anyone working on closed caption models at the moment?


Tesla could use some of that temporal coherence as well.


Someone has been upscaling DS9 already[1]. Obviously not released anywhere.

Not sure I'd want them in 16:9, hd 4:3 like the other HD releases of TNG and TOS would do me. I understand they shot on video so an official true HD remaster is likely to never happen.

1. https://www.extremetech.com/extreme/324466-tutorial-how-to-u...


DS9 shot on film: https://news.ycombinator.com/item?id=19454370

But it did use a lot of early CGI that would need to be redone.


I don't have a deep understanding of how training models work, but I wonder if training a model with every frame of TNG and then outpaint it into 16:9 would work.


Why would you do that? What an awful idea. People made those shows in the 4:3 format; you'd just be adding fluff. This is like adding more description to a book so it becomes an epic novel instead of a novella. I'd say keep to the creators' intent…


I'm not the person who suggested it, but I wouldn't mind having it fill my (wide)screen when watching. That said, I understand that some film/tv uses the frame very precisely, however I'm not sure that these two particular examples do that throughout their entire episodes. (Though I bet that in Seinfeld in particular it might weaken/ruin a few visual gags.)


Still seems to me like adding fluff - it seems a bit impossible to me that the AI would add anything pertinent to the plot. It would add "stuff" like corridors and background sets and maybe someone out of focus.

Do the black bars actually bother you that much? You know there are cropped 16:9 widescreen versions of some of these shows (which I personally detest, but I work in the business of moving images).

Genuinely interested in why this bothers people.


It doesn't bother me much (if the show is good, I don't realize it after maybe 5 minutes), hence why I said "I wouldn't mind..." instead of something harder. But it'd be interesting to see what would happen if shows were expanded frame-wise like this... The 16:9 cropped widescreens you mentioned take something away, whereas expanding the frame with AI adds something, and theoretically if you don't want it you can just matte it out and still have the original. It seems like a more advanced version of Ambilights.

I think we still have a ways to go before the results would work well without looking like those nightmarish deepdream videos where things are constantly shifting, so society will have plenty of time to discuss the merits.


I'm inclined to agree, but if it were coherent I'd take it over what platforms like Netflix do, chopping off the top and bottom of the content so it'll fit 16:9.


What if there is critical visual content in the half of the view that you're removing? Television is a visual medium; one could assume that a good filmmaker would be using the full viewport afforded them.


I’m saying adding is better than removing, and removing is terrible.


Not as terrible sometimes, since often television shows and films “protect” for different aspect ratios (yes this is a thing). So people might shoot 16:9 and protect (basically make sure you have a readable frame) for 4:3 or you can shoot Scope and protect for 16:9.

It’s not perfect but if the filmmakers thought of it it can be ok.


Next week?


Damn, Dall-E really lost its competitive edge overnight when Stable Diffusion was released. They dropped their prices across the board in response, but honestly I think it still isn't enough to save them. The magic of open-source competition.


They dropped the prices for GPT-3, not for Dall-E.


Oh shoot, I thought it was for everything. Fair enough. Although I think new competitive features like Outpainting are definitely in response to Stable Diff


The UX is evolving around AI image generation so fast, everyday is something new. There's so much greenfield exploration space for new interaction models.

6 months from now, how we interact with these models will probably look entirely different.


Comments made just 3 days ago, "well...it can't do that", are already obsolete. I've never seen innovation at this breakneck speed. We're talking a WEEK since release.


The scrambling to stay relevant after Stable Diffusion is very very enjoyable to watch.


A few weeks ago I was skeptical that this technology would get past the emotional response we get from procedurally generated game environments, but I've been convinced otherwise. The emotional response I get from some of the best of these images are novel and thought provoking. Makes me wonder what percent of what makes us human is now algorithmically solved....


We still know basically nothing about our brain and consciousness. I would say we have a lot more to explore and research.


Our brain is apparently just a 4 GB arrangement of electrical weights.


Not to be pedantic, but we have on the order of 100B neurons, and afaik each of them can be connected to thousands of other neurons. I assume we probably have a ways to go before we're encoding the amount of information a brain can comprehend.


Also, your brain knows how to look up things in external storage, but ML models have to keep everything in their weights even if it's not a good fit.


My wife likes impressionists and sunflowers. "A lone sunflower in a grassy field at sunset oil painting claude monet" plus stable-diffusion and a few minutes of tweaking some settings; she now has a new desktop background.


I actually paint and spend a lot of time looking at 'serious' paintings. AI hasn't even scratched the field to a trained eye.

Doesn't mean I'm not excited though. This kind-of feels like I'm watching the camera or printing press being invented. Everyone is comparing it to fine art, but I think ultimately it's going in a different and bigger direction.


What I did was, IMO, an example of that different and bigger direction from fine art. I mean, I could tell that this wasn't an impressionist painting just given that some areas of the grass were too detailed. It looks "just fine", though, to untrained eyes, which make up well over 90% of the population.

1. How long would it have taken me to get good enough at painting to exceed what I generated in under an hour? How many people have the motivation to spend that time?

2. How much would I have had to pay an art student to make a painting better than what I generated in under an hour?

Ten million sub-par Monet knock-offs didn't exist, but could exist very shortly at minimal cost. Even if it never gets any better this is already potentially disruptive, and the models are getting better every month.


I've heard this a lot; luckily, it's not that hard to test whether you can really tell the difference. We need someone to create the 'AI Pepsi challenge' for artists to settle this.


That was fast. Looks like artists can't tell: AI wins first prize in an art competition!

https://www.vice.com/en/article/bvmvqm/an-ai-generated-artwo...


This was sort of misreported. He won in the "Digital Arts / Digitally-Manipulated Photography" category, not the entire contest, and it's fair to say AI fits in that category.

This picture is also unusually coherent for Midjourney; if you just ask for a 16:9 image the sides tend to evolve into totally different pictures.


That was a state fair fine arts competition... I wouldn't exactly say you're about to fool MoMA.


Ah, OK. And when they are fooled, I wonder what rationalization you will fall back on next.


Those are garbage western judges. Modernism really destroyed representational art in the west. Try a competition with Russians or Italians running the jury. Or ARC-recognized atelier instructors.


Are they particularly good at telling the difference between wet neural nets and dry ones?


Maybe. I've often been reminded lately of Herbert Goldstone's "Virtuoso" (1958):

http://elateachers.weebly.com/uploads/2/7/0/1/27012625/virtu...


Also, per the release email, variations/inpainting (the trick used to simulate outpainting before this) now generates 4 images like a normal DALL-E generation, instead of 3 (which was arbitrary anyway).

I do wonder how expensive the outpainting is. I'm assuming that each additional step in the timelapse is a full generation, in which case ~15 generations is about $1 total.


Why would Google hold back on releasing Imagen if there are competitors that are publicly available already?

Imagen isn't special anymore.


A few possible theories, some might be mutually exclusive:

Organizational scar tissue making them more risk-averse about the PR risks of letting the general public use AI generation tools and create something offensive, with the safe assumption that Google will get blamed, not the user.

Fear of government regulation on AI if they don't self-regulate.

No need to actually release it, since this isn't the core business but just research. (While OpenAI needs to actually create the business.) Corollary: Google has more to lose -- a scandal around offensive content will not hurt OpenAI's non-existent other businesses, but it might make some advertisers pull their ads from Google.

The opportunity cost of building a self serve platform is too high. (Can't pull in people writing those kind of apps from projects with more commercial importance. Can't make the ML researchers do that.)

They misjudged how much demand there would be and thought that building a platform would not be useful for a few years. And if it now turns out to actually be a great business, it'll take them a year to productionize it and build a platform.

Their compute requirements are so high that selling access is not viable, the costs are prohibitive for real users.

It's not that different from e.g. self driving cars. Pretty obviously they had better tech from early on, but were not willing to take the risks that Tesla was.


Google is most interested in maintaining mind-share so that researchers don't jump ship. They could always monetize Imagen through Google Cloud, but they are concerned about risks (NSFW, legal issues, bias, etc.), so they would rather wait for others to test the waters first.


No one has actually figured out a business for this yet. Sure people are paying small amounts to play around with DALL-E, but it's not a business model, it's basically just a subsidised tech demo at the moment. What will the business model actually be?

I imagine there's a market for a stock-photo style service that has a very large number of good images for very diverse topics, but DALL-E etc are a bit low level for that, there's a lot of product development that needs to happen on top of that. A stock photo service doesn't seem to be the sort of thing that Google would get into.

Maybe it's an art-helper plugin to image editors? The Stable Diffusion based plugin sounds promising for this, but will _artists_ want to use it? Surely the point of art is (except for art making A Point) that the artist produced it with their talent? If you're not trying to make Art then maybe you need the stock photo service.

Perhaps I'm just not very imaginative, but I can't think of any use-cases that aren't either extremely niche, or are better served by a higher level style service that happens to use a model like this under the hood.


I think it’s really early.

But my guess is that the market for “programmatically created good enough images quickly” is larger than the market for “inspired, perfected hand drawn digital images”.


Does the newly generated picture take into account all of the previously generated image, or just whatever is around the square? The first would be amazing; the latter is a feature that was already there.

Regardless, this is a great way for people to fight the lack of detail in DALL-E, which I think is one of its largest flaws.


Just what's in the square I believe. The only difference here is one of UI, since they give you a canvas in which to place your generations.


This is moving fast!

Obviously, it’s going to be an incredible boon for content creation. I suppose that in the future it’ll make creating videos an order of magnitude easier, which will allow a single person or a small team to make a high quality movie where all the assets are generated, so that’ll really give us an eye into a lot of people’s imaginations, for better or worse.

To leave a thought provoking example, what’s going to happen when every adolescent has the ability to make a convincing deepfake?

It'll put nation-states in a similar position to the one they're already in with crypto, where they wonder if they should ban or regulate… doing nothing won't be an option.


Everyone says stable diffusion is a free alternative. Where do I get the weights without passing a gatekeeper?


They're currently hosted on Google in a way that you can download them via curl/wget. Here's a guide including the link: https://www.assemblyai.com/blog/how-to-run-stable-diffusion-...


I can't help but feel like they're adding this at this particular point since Stable Diffusion has announced they're releasing their 'inpainting' model next week.


I really doubt it's related at all, though everyone would think it looks that way. SD has only been out a week and this feature would have taken much more than that to build, test, enroll demo users, make a webpage for, etc.


I can't prove it of course but it wouldn't surprise me if they had this pretty much done already long ago (dall-e has been out for several months at this point). The actual implementation doesn't look like it'd take more than a few days to code honestly (and they've got quite competent coders over there). Only speculation of course.


Everything looks easier when someone else is doing it.


I have been working on an outpainting piece (in Photoshop) currently 10609 x 8144. I am very pleased to see more support for this, though hoping it doesn't kill my current flow.

Seems like it is currently not working on their site.


I cannot get this to work properly (in Safari). It just won't regenerate anything above or to the left of the image; it acts like I selected the opposite sides if I try it.


Looks like they pushed a fix. Now I'm getting funny "you get what you asked for" issues; if my prompt mentions a face, then using outpainting to create a 16:9 background behind it doesn't work so well - it just starts making more faces.


I would like to see Stable Diffusion and DALL-E work with words instead of images.

Go from Hello to any topic, and see how it responds. Trained on everything ever written.

Seems like it would/could generate realistic discourse. Is that possible, to work with words instead of images?


While maybe not "as good as a human" creatively, I wonder whether, once this matures a little more, we'll see whole art/design departments fall by the wayside and be replaced by stuff like this...


I love this idea of extending the canvas to build out the scene. It makes me wonder if anyone's tried using Poe's stories for illustrating with AI? His descriptive writing style seems ideal.


Lol I just want to be able to use the thing. How long is this waitlist?


How does Stable Diffusion compare to DALL-E in terms of features/capabilities?

I've used DALL-E/DALL-E-2, but have yet to try Stable Diffusion. Can someone give me some insight?


SD is very capable, especially considering it runs on like 6GB of VRAM (we don't know what kind of A100 clusters, with up to 80GB of VRAM each, DALL-E runs on), but you will need to be more specific with your prompts to achieve comparable results.

Img2img mode, where you provide SD a sketch and a textual description of what you want to achieve and it figures out the result, is a killer feature! (Rough code sketch below the links.)

https://www.reddit.com/r/StableDiffusion/comments/wzlmty/its...

Krita plugin: https://www.reddit.com/r/StableDiffusion/comments/x209sb/pre...


Thanks for the info. I'll check it out. Img2img mode looks incredible.


Do you reckon we will have Prompt Engineers who are skilled at getting AI to generate what they want before long?


What kind of prompting is required for this?

I uploaded a digital painting, selected "Edit mode", added a generation frame, and prompted "complete the painting in frame"... but it just added a completely unrelated, painting-related photo in that frame.


I guess prompting that is "similar" to the image. The output mine gave was pretty lackluster: I had to overlap the image significantly, and even then it didn't seem to take enough of the context into account to make something that matched the style closely enough.


Yeah, you have to first describe what the image is supposed to be and then it picks it up from there. Got some good results with digital art.


Is it broken right now? Makes my fans spin but it never finishes.


This is great news! I spent multiple hours doing this exact thing by hand only last week when creating new graphics for codeball.ai.


And every time I drag that little square reticle to fill in a 128x128 patch of an image, you can be sure it'll be a 15-second API call that I'm charged $0.25 for. Yippee! Very open.


And it's behind an invite system. I have to wonder: if this is the "open", what's the "closed"?


Do you expect them to provide computing power for free?


Did we really get to the point where anything that isn't SaaS seems alien?

You know, companies used to sell software that you paid for once and then ran as much as you wanted on your PC.


It doesn't seem odd to me that a product that involves an absurd amount of data and computing power isn't an easily consumable commercial product available for mass download.


The obvious counterpoint being what stable diffusion just released.


Why can't we provide it ourselves and skip the middleman?


No one is stopping you?


Sir, can you point us to the weights URL for DALL-E?


Can you develop it yourself?


DALL-E is currently a cloud-only service. How far behind are you?


Oh sorry, didn't realize you wanted them to develop the model for you. And give it to you. I meant no one was stopping you from building what you want for yourself.


This is cool and useful!

Putting "Girl with a Pearl Earring by Johannes Vermeer" in the kitchen in 2022 does not look good!


It's 2022, women are allowed to enjoy cooking and baking just as much as men are.


Because depicting a woman in a kitchen is perpetuating the pernicious male patriarchy? Sorry we're not doing that. You might find some reception for this sort of thing on Twitter though.


What makes you think it's a kitchen? It looks more like a thrift shop, or a store room to me.


It's an open-concept home. It's chic.



