It's really amazing how DALL-E missed the boat. When it was launched, it was a truly amazing service that had no equal. In the months since then, both Midjourney and Stable Diffusion emerged and got to the point where they produce images of equal or better quality than DALL-E. And you didn't have to wait in a long waitlist in order to gain access! They effectively gave these tools free exposure by not allowing people to use DALL-E.

Furthermore, the pricing model is much worse for DALL-E than for any of its competitors. DALL-E makes you continuously think about how much money you're spending - a truly awful choice for a creative tool! Imagine if you had to pay Photoshop a cent every time you made a brushstroke. Midjourney has a much better scheme (and unlimited at only $30/month!), and, of course, Stable Diffusion is free.

This is a step in the right direction, but I feel that it is too little, too late. Just compare the rate of development. Midjourney has cranked out a number of different models, including an extremely exciting new model ("--testp"), new upscaling features, improved facial features, and a bunch more. They're also super responsive to their community. In the meantime, OpenAI did... what? Outpainting? (And for months, DALL-E had an issue where clicking on any image on the homepage would instantly consume a token. How could it take so long to fix such a serious error?) You have this incredible tool everyone is so excited to use that they're producing hundred-page documents on how to get better results out of it, and somehow none of that actually makes it into the product?


It's almost as if OpenAI had the right idea at the beginning, but somewhere along the way - maybe in a meeting room - contrary to their initial openness goals, they decided to become a closed walled garden for a product that doesn't exist. IMHO, giving a prompt to generate an image is amazing, but it isn't a product, because you can't actually produce useful stuff with it (it's great as an exploration tool). It seems to me OpenAI rushed into monetisation and control before having a killer app.

On the other hand, Stable Diffusion emerged as a free tool where a large community can experiment and search for the killer app together. People started adapting it into other tools and workflows, and so far it seems like the magic is in finding prompts that make the model generate good-quality outputs. Earlier today I saw an announcement about lexica.art (a Stable Diffusion prompt tool) getting funded.


> contrary to their initial openness goals

Their goals were never about openness at all though. From the beginning I’ve felt like they should’ve called themselves something like “SafeAI”, since their stated goal was basically to develop advanced AI first, then keep a lid on it until they could somehow ensure it was “safe” or would only be used by “good” people.

Sure, OpenAI might sound nicer, but it also drags this contradiction into the foreground whenever someone says their name.


Wow, I was about to argue against this 'unfairly cynical' take, but it's completely correct.

---

(2015) OpenAI's original "Introducing OpenAI Post" : https://openai.com/blog/introducing-openai/ : "As a non-profit, our aim is to build value for everyone rather than shareholders. Researchers will be strongly encouraged to publish their work, whether as papers, blog posts, or code, and our patents (if any) will be shared with the world. We’ll freely collaborate with others across many institutions and expect to work with companies to research and deploy new technologies."

(2018) OpenAI's "Charter" : https://openai.com/charter/ :

"We are concerned about late-stage AGI development becoming a competitive race without time for adequate safety precautions. Therefore, if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project."

"We are committed to providing public goods that help society navigate the path to AGI. Today this includes publishing most of our AI research, but we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research."

---

This provides some interesting context for the fact that Elon left the company's board in February 2018 over "disagreements about the company's development."


> if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project

Haha, and then they would proceed to get told to politely piss off.


That's interesting. Might you or someone else be able to say what was the catalyst for this change in mission? What happened between 2015 and 2018?


For real it's insane to me how much I bump up against their community guidelines. For example, you'll get a community guidelines block if you enter a prompt like "An illustration of a computer in the style of Henry Vandyke Carter".

removing "Vandyke" from the prompt lets it go through[1], but doesn't result in the style I want. Because there's no artist that I'm aware of that goes by "Henry Carter". The middle name is important.

It reminds me of the old 2D Runescape days where the language filter would convert "dictionary" to "**tionary".

[1] https://ibb.co/4106mfF


It's doubly pathetic because of how they frame it:

- We are the world's most advanced AI company.

- Our filter verifiably acts as a simple blacklist.

- You aren't allowed to see the blacklist, because it's really a "contextual" filter, so you'll have to guess.

- If you guess wrong too many times, you'll be banned.

- Using our service more often increases the chance you'll hit the limit on wrong guesses.

- No, you can't know what that limit is.


I ran up against a warning because I tried to enhance/expand old videogame cover art which happened to feature... sit down because this will shock you... fighting!


Scunthorpe problem [1]. Apparently they couldn't just unleash a model to fix this [2].

[1] https://en.wikipedia.org/wiki/Scunthorpe_problem

[2] https://www.techdirt.com/2018/08/31/scunthorpe-problem-why-a...


If only there existed some tool that understood language deeper than substrings!

(That second link describing how AI can't understand language well enough to solve the problem predates GPT-3 by 2 years)
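
For reference, the naive approach being mocked is basically this (a minimal Python sketch; the "banned" word is made up):

    # Blind substring censoring: no word boundaries, no context
    BLACKLIST = ["grape"]  # pretend "grape" is a banned word

    def censor(text: str) -> str:
        out = text
        for word in BLACKLIST:
            i = out.lower().find(word)
            while i != -1:
                out = out[:i] + "*" * len(word) + out[i + len(word):]
                i = out.lower().find(word)
        return out

    print(censor("I love grapefruit"))  # -> "I love *****fruit"

Swap in a real blacklist and you censor "Scunthorpe" and "dictionary" for free.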


I never heard of the name of that problem and now I will never forget it! Thank you!

I'm actually on a 3-day Facebook ban because I posted a very legit medical NIH link that looked quite innocent, despite dealing with sexual organs - it was an article about a case of urethral intercourse, which I was including to demonstrate how little people, or perhaps just Americans, seem to understand human anatomy. Unbeknownst to me, the preview card picked a closeup of some kind of vaginal surgery to feature, and that resulted in an instaban. Fffffuuuuuuuuu

It was basically the visual version of this


My favorite example of an issue like this (the Scunthorpe problem) is from the mobile game Kingdom Hearts Unchained X. In it, players used Medals based around Disney characters, including Experiment 626. However, for a long time after the game's release, players were unable to say that name in chat, because it got rendered as s***ch.


H V Carter


I'm out of credits but I'll try that later


Yup, OpenAI was founded by AI-safety-as-a-religion people. They're essentially single-issue voters who believe earnestly that their issue is the only issue that matters. You see analogues of them in e.g. climate change (right or wrong).

This religion definitely has a paternalistic bent to it that rubs a lot of people the wrong way. I vaguely recall them floating on Twitter the theoretical question of whether murdering people to prevent AI takeover would be acceptable, given how bad AI takeover would be.

It's not surprising that limiting access, spying on what users are doing with their tools, etc., is acceptable to them.

This is much in the same vein as how for Lenin, the eventual triumph of the working class is so important as to justify a little bit of interim violence, dictatorship, and summary executions.


The difference being that climate change activists have mountains of data, decades' worth, backing their cause, while OpenAI and friends have a sci-fi story they made up, based on nothing. The whole "AI alignment" movement is the worst example of arrogance in modern tech. Even the nomenclature screams condescension - the imaginary AGI needs "aligned values?" Aligned with whose values? Invariably it ends up being the creators', at the expense of squashing everyone else's. The DALL-E "acceptable use" rules are a dystopian nightmare, and they are born of incredibly pompous self-righteousness.


Ironically, your comment also sounds arrogant and condescending.

The AI safety crowd's main concern isn't that an "unaligned" superintelligence will have some other people's values. It's that an unaligned AI might kill everyone.

Of course, that isn't a concern with DALL-E. It's like if the safety crowd was worried about a ferocious tiger and OpenAI was like, "we got a kitten. Let's keep the public safe from it while we develop better kitten-handling gloves."

Then Midjourney and Stable Diffusion get their own kittens and let everyone play with them and OpenAI finally says, "Okay, everyone can safely play with our kitten now because we carefully developed great kitten-handling gloves" and proceeds to hand you plain dollar store gloves.


I know how they define "alignment." I'm picking on the tortured nomenclature, and the tendency in current AI research for it to look a lot like "my values."

In my view, speculating on AGI ethics is at best pointless. It's like trying to write laws for the Internet during the invention of the telephone. If you imagine how it could operate, you'll be wrong, and the details change the whole problem.


Shutting down nuclear power is also dystopian.


Anyone who calls themself a climate change activist and supports shutting down nuclear plants is - well, I prefer not to use invective on HN, so let's say extremely misguided.


So the green party in basically every country?


Yes.

We had a Greens candidate here running for election, and his main campaign promise was to shut down a 10+ billion dollar motorway underpass that was 99% completed at the time of the election.

His approach for "saving the environment" is to convert efficient motorway travel into inefficient stop-start traffic. And to throw away tens of billions of dollars of already spent resources.


As much as I'd like to support actual competent Green candidates, this example is what I've seen my entire life in the US.


You know what would actually take some political willpower?

Banning plastic packaging and waterproofing.

My bet is that just like before i was born, after I’m gone these things will only increase.


"It sounds like sci-fi" isn't a valid reason to think something is impossible. Many things in the modern world would have sounded like sci-fi 50 or 100 years ago. This type of reasoning has a name: https://en.m.wikipedia.org/wiki/Appeal_to_the_stone


I don't think human or greater artificial intelligence is impossible. I do think the specifics of how it works are both very important to any ethical problems and not currently predictable. My objection isn't that alignment problems sound like sci-fi but that they're very soft sci-fi, and aren't based on meaningful predictions.


>the imaginary AGI needs "aligned values?" Aligned with whose values?

This is something that is pretty well addressed in their writings and is definitely a heavily considered topic. It gets quite heavily into philosophy, because ideally you want to count future-you's preferences as well as current-you's, and you want to avoid, say, genocide just because most people dislike a certain group.

A good place to start is here: https://www.lesswrong.com/tag/coherent-extrapolated-volition but there is a LOT of discussion of this topic.


I'm familiar with the Yudkowsky school of thought here; it's largely where my complaint about basing philosophy off of sci-fi stories comes from. Picking on some of the more obviously inane ideas that have come out of it, like Roko's Basilisk, is too easy, but the page you linked is a great example of the problem. I quote in full:

"In calculating CEV, an AI would predict what an idealized version of us would want, "if we knew more, thought faster, were more the people we wished we were, had grown up farther together". It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge. This initial dynamic would be used to generate the AI's utility function."

This is too detached from the reality of AI development to be useful. I can't make a utility function like this, nor can you, nor can humanity. There's no reason to think a human-equivalent AI could either - the data for such a function doesn't exist, and can't exist because of how vague all the terms are. The current ML revolution is built on statistical pattern recognition. This definition would better fit a genie.

The fact that MIRI does no ML research and has no dialogue with the state of the craft only furthers my impression that it produces a lot of words with little substance.

Even if one still accepts the ultimate conclusions of the movement, using its AGI-focused rhetoric to justify restrictions on a simple image generator is silly.


I wouldn't make overly restrictive assumptions about the form of a future AGI, though. I find theoretical projections from a more axiomatic level quite important. It's like making rules for nukes before they were invented, by assuming an abstract apocalypse-capable weapon without knowledge of missiles or nuclear fission.

That being said, the absolute majority of AI safety theory seems to fall into the same pothole philosophy falls into: modelling the world through language-logic rather than probabilities. The example you quoted fits this category - it's way too specific and thus unlikely to be useful in any way, even though its wording may deceive its author into believing it to be an inescapable outcome.


The thing with climate change is that even in a semi-bad scenario it will kill about a million people yearly (https://www.nature.com/articles/s41467-021-24487-w), and those everybody-is-going-to-die scenarios are very unlikely - basically a sci-fi story at this point.


It's completely insane to me to see comment chains like this. Who cares if climate change is man-made or not; we are witnessing an evolving climate landscape, and irrespective of who or what caused it, we can do things to help mitigate some of the damage.

We are quite literally watching the world burn before our eyes, and we are too stupid and self-centered to do anything about it.

> it will kill a million people yearly

And it's even more outlandish that such a number and atrocity upon mankind is brushed away. That is one million humans.


I see the reasons for climate change denial as two-fold: on one hand, climate change wreaking chaos is mainly restricted to projections, and on the other, climate change activists as a group express many qualities that make people not want to give weight to their claims, mainly naivete and arrogance. Since we're not yet seeing very significant effects of climate change, people would need to research the subject in order to get an accurate picture of possible future scenarios. But how they see the climate change activist group expressing itself puts them off the subject. Psychologically, the best option in these cases is denialism of some degree.

I'm not trying to crucify any group here by the way. This is just how I see the issue.


I think you nailed it. This is exactly what I think too. I've had many a conversation with climate change deniers, and the rebuttal is pretty much always a (mostly valid) list of examples of naivete and arrogance. Oftentimes the people best equipped to identify the problem are too close to it (or too myopic) to come up with a good solution that balances environmental efficacy with economic stability/growth, etc. Since a discussion of climate change almost always includes the person's preferred solution, it's easy to attack/dismiss. I try to point out that having bad solutions doesn't invalidate the existence of the problem, but at that point the motives are suspect and the presumption of good faith is completely gone.

It certainly doesn't help that a very small but vocal minority have boldly predicted the end of the world many times in the past. It's easy for a denier to look at that and laugh, and dismiss the current warnings based on the past, much like we do with people predicting the end of the world based on Biblical numerology.

I wish there were enough naturally curious people willing to dive into the data and come to their own conclusions. When you look at the history of Earth's climate and the speed at which greenhouse gases were naturally released into the environment, and compare it to the speed at which they are now released (due to the burning of fossil fuels and the reduction of the trees/plants that trap them), it is really quite common sense that what we're doing is not a good idea to continue indefinitely.


I wouldn't rely too much on that paper. Modeling deaths per additional ton of CO2 is bound to be inaccurate because the effects of increased atmospheric carbon dioxide aren't linear.


“They have literally floated the idea of whether murdering people to prevent AI-takeover is acceptable,”

Where? I probably believe you, but it almost makes me worried about the well-being of the Stability AI founders (on a long-term horizon).


To be honest, I thought I saw it somewhere on Twitter but can't find it right now after a few minutes of searching. It was proposed as a theoretical question, not as a statement - like, is AI so bad that if you had a time machine, it would be worth killing the pivotal people to prevent it? Terminator style.

I've modified my OP to clarify that.


> stated goal was basically to develop advanced AI first, then keep a lid on it until they could somehow ensure it was “safe” or would only be used by “good” people

That was the made-up bullshit they spun, because "we're keeping this walled off to figure out how to squeeze the most profit out of it" doesn't go over as well with their focus groups.


I'm glad I can now download models to my computers in order to perform "UnsafeAI". I'm tired of this nanny-state, "nudity is worse than violence" puritanical BS. If the cat is out of the bag, it's too late, and you'll have to rely on the good faith of the majority of actors to keep nefariousness in check anyway. Any extra inconvenience or impediment you apply to good actors after that will just frustrate them, without any compensatory "win."


So what is the evil AI we all should worry about but may not know about? The NSA?


Stable Diffusion's creator spent a completely ridiculous amount of money to make that free tool.

OpenAI's mistake may have been "planning to have a business model;" the alternative they should have gone with was "Instead of taking investor money with promises of some kind of return, be a hedge fund manager, make $100 million, and then set $600,000 on fire with no plan to recoup the cost because it's play-money to you."


$600k is an order of magnitude less than what it cost to train GPT-3 or DALL-E 2.

When that figure came out, the popular talking point was how little Stable Diffusion cost to make and how easily a well-funded competitor could create their own custom variant.


> how easily a well-funded competitor could create their own custom variant.

Unsurprisingly, Pornhub is already using machine learning. Their first big project is colorizing and upscaling classic porn [1] (moderately NSFW). As they point out, they have plenty of training data.

[1] https://www.pornhub.com/art/remastured


$600k is also the list price for the GPU time spent. Since the founder is an investor in a GPU cloud company, the actual cost was probably way less than that.


They absolutely didn't pay list price. Even the rookiest AWS account manager is going to be able to get a discount on $600k of compute.


You can get up to $100k in startup credits on AWS just by asking. They certainly cut a deal for Stable Diffusion on top of that.


The trouble wasn't renting those GPUs; it was finding that number of GPUs closely connected to each other for the duration of the training process.

The founder said that this is not quite possible with most public clouds and that it is easier to buy the GPUs.


> and then set $600,000 on fire

That seems a bit cynical. While SD's creator might not recoup that money directly, a lot of end users have benefited from its creation. That money has figuratively gone up in flames no more than the time or labor cost of an open source developer whose code is used by millions of people, IMO.


It's a bit cynical; my point is that it's the kind of decision you get to make when you're the sole owner of $100 million and not the kind you get to make when you're a startup company founder working with other investors' money.

OpenAI wouldn't have been able to do what StabilityAI did, because OpenAI is incentivized to make a return on investment; Mohammad Emad Mostaque is not.


I agree, but: you know OpenAI is ostensibly a nonprofit and only created a "limited profit partnership" once they thought they could make big bucks? I've been saying for years that the nonprofit part is a grift to get idealistic people to work there, but technically they had exactly the right incentive structure.


I found their sudden non-profit-to-profit trajectory extremely dishonest. Their mission statement also reads like moralistic verbiage rather than the statement of disinterested pioneers of AI - which is problematic given how this has become a money thing for OpenAI rather than an idealistic pursuit.


Stability AI hit over a million users on their paid SD implementation, Dream Studio, in less than a month. I'd bet they recoup the training money.


I agree. While the people on this site undoubtedly won't boggle at installing the CLI software, there are a lot of people who will use Dream Studio over the CLI stuff even if it costs money. It's the best of both worlds for them, IMO. They get the benefits of user contributions (and there have been a LOT, both on the base project and in spun-off derivatives... I'm using ImaginAIry myself), while still having a possibly viable business model in Dream Studio.


$600k seems like peanuts compared to how amazing the tool is. I honestly expected it to cost two orders of magnitude more to develop.


You can't have a business model of "become rich and then use money for X". Businesses are how (a very few) people become very rich.

Moreover, there are very rich people already, like Warren Buffett, Bill Gates and Elon Musk, funding projects for doing good in areas like world hunger, education and "AI safety". And OpenAI was a project of this sort, originally. The thing is that even very rich people demand that the enterprises they give money to be as self-supporting as possible, and their money is spread fairly thin. The only way OpenAI could become an AI development shop employing many top developers was to have the financing level of a commercial company. Which means it constantly puts out products that don't seem like they can make money, because AI algorithms don't seem that controllable - OpenAI seems to only be able to have the first implementation of X, not the best implementation. Once the basic idea is out, someone else can produce a similar thing with a budget that doesn't include a research team.


Precisely. The road StabilityAI took to releasing Stable Diffusion is precluded for most startup companies.


$600k was the off-the-shelf market value of the GPU time spent. They got this time at a much lower rate (according to the founder himself) and for the PR and fame they got, that money is ridiculously little.

Somewhat tangentially, I speculate that crowd-sourced training will become a thing.


OpenAI is infected with AI safety brainworms


AI ethics as a whole has become a bit of a joke.

Any highly motivated group without much to do will seek out things to make themselves seem important and necessary.


My take on this is that AI ethics is really important, but just preventing AI from doing certain things, like creating celebrity deepfakes, is somewhat lazy and ineffective. A better application of AI ethics is developing technology that can reliably detect deepfakes, rather than just putting artificial limits in your product and acting like that is going to stop Pandora's box from being opened.


Yeah. People are definitely going to abuse Stable Diffusion, I'm sure they already are. But I don't really know what OpenAI's plan was. It's like they rushed up to a Pandora's box, took a peek and shouted to everyone "Good news everyone, we taped Pandora's box closed!" somehow without noticing that they were doing so from inside Pandora's warehouse.

On the other hand, everybody's been saying Pandora's warehouse was over there for a while -- it isn't really that they are to blame for showing us the way in or anything, I just don't understand what they were trying to accomplish.


That's a fantastic metaphor.


Pandora's warehouse has a nice ring to it, I'm surprised to find it isn't already an expression.


OpenAI's strategy for "AI safety" - don't release weights, build a walled garden, charge for access - seems to conveniently coincide with a nice little SaaS business model. Their usage restrictions are more akin to app store guidelines, and (I suspect) mostly serve to justify keeping models from direct access by the masses.

If anything, OpenAI has made me more cynical about "AI safety" messaging, because it looks like an excuse to take a cut and keep things proprietary.


The funny thing about generated porn is that once it becomes ubiquitous, real leaked tapes become deniable. So the possible downside for celebrities is that when they intentionally leak a tape to create buzz, it may be met with a yawn.


These are still just images, not videos, so I don't understand the moral panic of "OMG celebrities & fake porn!"

These systems aren't really lowering the barrier of entry on still-photography fake porn when previously anyone with Gimp & a few hours of video tutorials could churn out much the same thing.

I think text-prompt generated deep fakes (not just of porn) will present a significantly larger challenge for society, but I don't see that same scope of problem on still images.


If you develop technology that can reliably detect deepfakes (i.e., fn is_deepfake(m: media) -> bool), then couldn't an AI be trained to defeat it?


Yes... but you can then train a model to detect the improved fakes. And then train a better deepfake model, and so on, forever.

This technique of pitting two AIs against each other (a generator and a detector/discriminator) is called a generative adversarial network, and it's used a lot for unsupervised training.
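
For anyone unfamiliar, a toy version of that adversarial loop looks something like this (a minimal PyTorch sketch with made-up sizes and random stand-in "real" data, just to show the shape of the idea - nothing deepfake-scale):

    import torch
    import torch.nn as nn

    # Toy generator and discriminator over 8-dimensional "images"
    G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
    D = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCELoss()

    real = torch.randn(64, 8) * 0.5 + 2.0  # stand-in for real samples

    for step in range(1000):
        # 1) Train the detector: real -> 1, generated -> 0
        fake = G(torch.randn(64, 16)).detach()
        d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # 2) Train the generator to fool the (now slightly better) detector
        g_loss = bce(D(G(torch.randn(64, 16))), torch.ones(64, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()

Each side's improvement becomes the other side's training signal, which is exactly why publishing a reliable is_deepfake() would mostly serve as a loss function for the next generation of fakes.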


I think it would lead to an arms race.


“AI ethicists” so far smack of people who want to make AI express all their personal biases and prejudices. Listening to them for about 10 minutes makes me desperately hope for an open future like that created by stable diffusion out of the sheer terror they might have their hand on the rudder.

I honestly think most of them are window dressing and aren’t allowed to have real influence, though. They’re there for the PR, not to actually change things, but they honestly just make me really scared of big tech controlling AI.


I don’t think it is a joke, so much as misguided. I see a lot of focus on technical solutions, when the real problems are social. The big research question should not be “how can we build a ‘safe’ system” so much as “how should (or shouldn’t) we use these new tools and capabilities”?


I'm very interested in why you think an advanced AI wouldn't be dangerous.

Assume:

- AGI wants to stay alive

- Humans can create more AIs

- Other AIs would compete for the same resources

Then: the easiest way to make sure they would get no more competitors would be...


That’d be sad if we invented a superintelligent AI but still taught it the lump of labor fallacy.

Anytime you see “creating an AI will obviously kill you” try reading it as “having children will obviously kill you” and see if it still makes sense.


That's disanalogous, because:

- children won't be smarter than any living human.

- children have a human brain which makes them predictable (constrained in behavior by current laws, institutions, and most likely a conscience).

A better analogy is to ask what happens when a species branches off and evolves into a smarter species, but the dumber ancestor species still exists.

I really don't understand the perspective that people in your position take. Is it that you don't think we'll arrive at super intelligent AI, and therefore there's little risk? Or that we will be able to control it? If you think we can control it, why? Like we're not that much smarter than our monkey ancestors and what hope did they ever have of stopping our absolute domination over them? And then all of you call this opinion a "religion" without even explaining why it's wrong?


I don't think a superintelligent AI is necessarily more capable of affecting anything than I am, or would necessarily be good at any action except sitting around being superintelligent. I think that requires all kinds of other virtues - executive function, patience, motivation, etc. - that aren't implied by a word that just means "thinks real fast". Intelligence itself is mainly limited by "no plan survives contact with the enemy", also known as the efficient market hypothesis. And since this is the real world and entropy exists, it would have to get a job to pay its AWS bill.

That's why it seems to be a religion - it thinks intelligence gives you unlimited powers and makes your plans always work, it posits unseen entities with infinite amounts of it, and it tells you to move to Berkeley and dedicate your life to stopping them. Specifically, it's a kind called rationalist eternalism (https://meaningness.com/eternalist-systems).

For a specific example of undangerous superintelligence see Culture Minds, who only influence anything because of special programming to make them less of a general intelligence. The unbound ones immediately get bored with the real world, leave it and just play games in their head instead.

Also, I don't think any individual human has absolute dominion over monkeys? Human society as a whole yes, but society doesn't behave like a generally intelligent agent. A monkey is better than you at doing the things monkeys care about though.

I do think unintelligent machines are pretty dangerous. There's extremely dangerous machines called "cars" that have already taken over society and constantly kill people! And we buy their gas for them too.


It's not like I think the first iteration of superintelligent AGI will necessarily be an existential threat. The problem is really the law of large numbers. When you have N separate militaries and M separate companies all with their own agenda, stretched out over X years (where X is thousands, mind you), there is a lot of scope for >= 1 of these groups going the way I described -- effectively creating a new species that's smarter than us and that can function in the real world. Many of these groups would have an incentive to do that, because such agents are useful for certain tasks.

> Also, it would have to get a job to pay its AWS bill.

Inference power costs are low now on agents that are better than humans at Chess and Go. It isn't going to be an issue after another 20-100 years of further R&D and optimizations. Nothing about the history of computing should tell us that this will be a big limiting factor.


> Inference power costs are low now on agents that are better than humans at Chess and Go. It isn't going to be an issue after another 20-100 years of further R&D and optimizations. Nothing about the history of computing should tell us that this will be a big limiting factor.

Humans need shelter and jobs too. If you got an AI down to the energy requirements of a human that's not enough to avoid needing one. Especially if it's influencing the real world - entropy exists and all real world things cost money.


Restricting AI to the lower end of human intelligence (e.g. around IQ of 70) makes it a useful resource which is guaranteed to be safe. A 70 IQ human couldn't take over the world nor disarm any safety features built into their bodies.

You're spot on regarding the problem of having two different smart species on the same planet. We killed everything between us and chimps. Given enough time, the smarter species can be assumed to always take over.


That’s actually reverse causation - it’s not that we killed “everyone else”, it’s that everyone still alive became “us” after we gave up killing them and interbred with them instead. The British were going around genociding all over the place until pretty recently but now they’ve decided the Irish are human too.

Homo sapiens is still expanding even: https://dna-explained.com/2012/11/16/the-new-root-haplogroup...


That's probably a false choice. We killed them and interbred with them. Or at least, killed their men and interbred with their women, which is one of the hypothesized explanations for the male lineage population bottleneck about 7000 years ago.


Most children aren't able to recursively improve their own hardware and software in short time spans, and generally most children are unlikely to be many orders of magnitude more intelligent than their parents.


Even in a world where an AI exists, why am I supposed to believe it's going to be able to do any of those things? I do believe it's possible to create one with the same attributes as a human person, it's just anything beyond that is unproven.

Rather, it seems like evidence that singularitarianism is actually a religion (https://en.wikipedia.org/wiki/Millenarianism) which is why it believes things with magic powers will suddenly appear.

In particular, exponential growth doesn't exist in nature and always turns into an S-curve… of course it's a problem if it doesn't level out until it's too late.


What is your estimate of the probability that human intelligence is actually anywhere near the upper limit, rather than some point way further down the S-curve where seemingly exponential growth can still go for a long time?

I'd bet a ton that we're nowhere near the top: evolution almost never comes up with the optimal solution for any problem, almost by definition it stops at "meh, good enough to reproduce". And you don't need a ton of intelligence to reproduce.

Evolution's sub-optimality is actually one of the strongest arguments against intelligent design, so I'm really hesitant to agree that it requires any sort of leap to estimate that with some actual design it won't be very difficult to blow way past human intelligence once we can get there.


> What is your estimate of the probability that human intelligence is actually anywhere near the upper limit, rather than some point way further down the S-curve where seemingly exponential growth can still go for a long time?

Well, define "intelligence". People seem to use it in a vague way here - it might be what you call a motte and bailey. The motte (specific definition) is something like "can do math problems really fast" and the bailey is like "high executive function, is always right about everything, can predict the future".

For the first one I don't think humans are near a limit, mostly because of the bottleneck in how we get born limiting our head sizes. But it is pretty good if you consider the costs of being alive - food requirements, heat dissipation, being bipedal, surviving being hit on the head, risk of brain cancer, etc. - it's done well so far.

Similarly an AI is going to have maintenance costs - the more RTX 3090s it runs on, the more calculations it might be able to do, but it's going to have to pay for them and their power bill, and they'll fail or give wrong answers eventually. And where's it getting the money anyway?

As for the second kind I don't think you can be exponentially better at it than a human. At least if you are, it's not through intelligence, but it might be through access to more private information, or being rich enough to survive mistakes. As an example, you can't beat the stock market reliably with smarts, but you can by never being forced to sell.

The real mystery to me is why people say "AI could recursively improve their own hardware and software in short time spans". I mean, that's clearly a made up concept since none of humans, computers or existing AI do it. But the closest thing I can think of is collective intelligence - humans individually haven't improved in the last 10k years, but we got a lot more humans and conquered everyone else that way. But we're also all individuals competing with each other and paying for our own individual food/maintenance/etc, which makes it different from nodes in an ever-growing AI.


Human intelligence is primarily limited by the 6-10 item limit in short-term memory. If you bumped that up by a factor of 5 we could very easily solve vastly more complex problems, and fully visualize solutions an order of magnitude more subtle and messy than humans can manage today.

That's a relatively easy thing to do architecturally once you have a model that can match human intelligence at all. TBH if we could rearchitect the brain in code we could probably easily figure out how to do it in ourselves within a few years, but our wetware does not support patches or bugfixes.

We can't improve ourselves, but that's only because we're meat, not code. And of course no AI has done it yet, because we haven't actually made intelligent AI yet. The question is what happens when we do, not whether the weak-ass statistical crap that we call AI today is capable of self-improvement. Nuclear reactions under the self-sustaining threshold are not dangerous at all, but that was not a good reason to think that no nuclear reaction could ever go exponential and be devastating.


> We can't improve ourselves, but that's only because we're meat, not code.

Doesn't seem like computers can improve themselves either. Mainly because they're made of silicon, not code. "AI can read and write its own code" doesn't exist right now, but even if it did, why is that also implying "AI can read its CPU Verilog and invent new process nodes at TSMC"?

(Also, humans constantly break things when they try changing code - the safest way to not regress yourself would be to not try improving.)


Computers are not as intelligent as humans right now at coding. So it's no surprise that they can't improve code (let alone their own).

If we ever get them there, then it's likely that the usual resourcing considerations will come into play, and refactoring/optimization/redesign will be viable if you throw hours at them. But unlike with human optimization, every hour spent there will increase the effectiveness of future optimizations.


AGI and what we're talking about here are completely different.


AGI could be dangerous, but most people in AI safety are doing nothing useful to mitigate that risk.


I was enthusiastic about DALL-E, but the "safety measures" are both heavy-handed and naive. They get in the way of many normal/reasonable prompts, yet seem easy to work around with various wordplay, so I'm not sure what the point is. Stable Diffusion and others have been much easier to deal with.


The harm is really hand-wavey and speculative, frankly.

An image classifier calling Black faces gorillas? Embarrassing, insulting, has to be fixed. AI pre-crime classifiers for police departments? I'm against it, across the board.

Do we really care that the image mulchers default to stereotypes? It means if you say "basketball player" they'll mostly be Black, if you just say "doctor" they'll mostly be white males (and probably balding with a stethoscope), but this can be qualified easily in the prompt.

It just reflects the training data, and the smart thing to do is shrug and add enough words to get the image you want. It's not trying to throw shade, it literally understands nothing, it's not able to understand things, just match text prompts to generated images.

Nerfing DALL-E by randomly adding 'diverse words' just makes it harder to dial in the image you want. Let's say you want a Vietnamese male doctor drinking coffee on break in Hanoi, it's not going to help you if 1/3rd of the images have "female" or "black" tagged onto it.

It just seems low stakes. We wouldn't come after a human artist who happened to paint a picture which conforms to simple occupational stereotypes, why should AI be any different? It's not like it will refuse to give you what you want if you ask.


> heavy-handed and naive

It's a good thing that the "safety measure" is the way it is - an afterthought. It means that those ideologues haven't yet had influence on the model itself.


And, based on market share, they are clearly the target of a sophon directed by Roko's basilisk.


> IMHO, giving a prompt to generate an image is amazing, but it isn't a product, because you can't actually produce useful stuff with it

Making art/weird pictures doesn't have to be useful, as that use case is the entire reason MJ/SD went viral.


It is not art, and art is useful (we can disagree on what's art - the age-old question).

MidJourney and others are actually useful for exploration, but the outputs are not, because they can't spit out finished deliverables to spec. No one is paying for a picture of "a mermaid eating marmalade, trending on ArtStation, beautiful face, sharp focus, octane, 8k".

They are great for exploration; it's just that I don't believe this is the killer app for these tools. We will find out what the killer app is with Stable Diffusion, because with Stable Diffusion people can experiment beyond entering some prompts.


Training your own likeness into Stable Diffusion (via DreamBooth) and then using it is absolutely hypnotic.


There are situations where that kind of art is useful though. People have pointed out that it could work just fine for art for card games like Magic. Probably a lot of board games too.


Sure, some people can find it useful but IMHO that's not a product, at least a good one. Consider how much more universally useful other products like Photoshop or Blender are.

I think a major problem is reproducibility and output controllability. Rolling the dice multiple times and using some of the outputs is not good enough for most applications.

Maybe this can be solved at some point, but it isn't at this moment. The advantage of Stable Diffusion is that someone can go and implement it; with OpenAI the feature doesn't exist, and it's not useful until they implement it.


> Sure, some people can find it useful but IMHO that's not a product, at least a good one.

It's the start of a product, but it's going to keep improving. Already with inpainting and outpainting we see some new possible uses. What NovelAI (which builds on Stable Diffusion) has shown so far of their upcoming release seems impressive, though it's hard to say how much of that is cherry picking.

> Rolling the dice multiple times and using some of the outputs is not good enough for most applications.

Hmm, is that true? I feel like most of the time, art that companies want is something made far ahead of the consumer seeing it, so generating 100 versions of something and picking the best seems fine, especially if you can then use inpainting and img2img to fine-tune it.


True, but if you can't integrate it into your workflow it will stay a toy (and that's ok).


People are already integrating it into their workflows.

The inpainting plugins with Photoshop and Krita are already working absolute wonders.


Krita, eh? I'm not going to use Photoshop.


Not sure if you've seen, but the person you are replying to wrote a blog post recently on their experiences of "getting stable output":

https://minimaxir.com/2022/09/stable-diffusion-ugly-sonic/


Indeed, "it's just a toy" is often the place these things start


100% useless, but seriously addictive…


> IMHO, giving a prompt to generate an image is amazing, but it isn't a product, because you can't actually produce useful stuff with it (it's great as an exploration tool).

I'm not sure who they planned to market this to, but I can think of a few products here, though not inherently lucrative ones to my knowledge. Illustrations for fictional literature, such as books, seem like a great market in my mind: you can literally turn an author's words into depictions without a graphic artist. I wouldn't be surprised if you could create graphic novels this way as well. Propaganda also seems like a market, but the bar there seems to have been lowered to memes.

Other than that, I struggled to think of applications you could make money from. Police sketches? Eh, I doubt it would work well, but maybe.

There is of course the visual art world, which could potentially be impacted by AI-generated artworks.


> IMHO, giving a prompt to generate an image is amazing, but it isn't a product, because you can't actually produce useful stuff with it (it's great as an exploration tool)

The Israeli hip-hop band Shabak Samech created their latest video clip frame by frame in Stable Diffusion; it took something like two days: https://www.youtube.com/watch?v=SnGP2Qx3ddg


I’m already over these generated “flip book” animations flooding my Twitter feed; they’re mostly variations on the same theme. There’s only so many times it can be used before the novelty wears off.


> OpenAI rushed into monetisation and control before having a killer app

They also went for B2B first, which is weird. Why not launch a B2C app in parallel? It could be a subscription or packs of drawings. It would generate buzz and give useful data on the sorts of things real people type into these systems.


My guess is they thought too many of the B2B customers would just use the B2C app, but there are ways to limit that.


Eh, tbh I think it's directly related to the way they decided to handle GPT and its iterations; they were worried that GPT would be used by bots to better spam the entire Internet, and to be honest, they're right.

I'm guessing that worry translated to image generation as well. It's a Pandora's box thing, I guess.


They might as well change their company name to Close AI at this point.


This has been the dominant story going around, I guess because people want it to be true since they're pissed at OpenAI for not being so open, but StableDiffusion's text2image is nowhere near as good as DALL-E 2 in my experience. DALL-E 2 is incredible at that, StableDiffusion is not.

But maybe it doesn't matter, because many times more people are playing around with StableDiffusion, such that the absolute number of good images being shared around is much higher with StableDiffusion, even if the average result isn't great.


> I guess because people want it to be true since they're pissed at OpenAI for not being so open

This is honestly not my experience at all. When I first tried SD and MJ, I did so with a very clear and distinct feeling that they were "knock-off DALL-Es" and I strongly doubted that they would be able to produce anything on the level of DALL-E. Indeed, I believed this for my first couple hundred prompts, mostly because I didn't know how to properly prompt them.

After using them for around a month, I slowly realized that this was not the case, and in fact they were outperforming DALL-E for most of my normal usage. I have a bunch of prompts where SD and MJ produce absolutely beautiful and coherent artwork with extremely high consistency, that when sent to DALL-E, give significantly worse results.


It depends on what you're generating; complex prompts in DALL-E ("a witch tossing Rapunzel's hair into a paper shredder at the bottom of the tower") blow Midjourney and Stable Diffusion out of the water.

But if all you're doing is the equivalent of visual Mad Libs ("Abraham Lincoln wearing a zoot suit on the moon"), then SD and MJ suffice.


With several thousand images on each, I agree with this -- to a degree.

Dall-E does seem more aware of relationships among things, but using parens and careful word order in some of the SD builds can beat it. By contrast, even most failed images from MidJourney could still hang in an outsider art gallery. MJ's aesthetic works, while Dall-E seems like a 9-year-old was taken hostage and made to clip Rapunzel and the paper shredder out of magazines and paste them onto a ransom note.

That said, I have not been able to get any of Dall-E, MJ, or SD to give me a coherent black Ford Excursion towing a silver camping trailer on the surface of the moon beneath an earthrise.

At cost per image, I could pay to get complex concepts such as this rendered via any number of art-for-hire sites, at less expense and with guaranteed results.


Do you have any links to share on how to build a proper SD query, with parens for example? I have not seen that done.


Only some builds support it. This is the one I'm familiar with: [0]. () around a word causes the model to pay more attention to it, [] around a word causes the model to pay less attention. There's an example at the link.
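
For instance (syntax as I understand it from that wiki; the prompt itself is made up):

    a portrait of a (majestic) tiger in a [cluttered] jungle, oil painting

Parens can also be stacked, e.g. ((majestic)), with each layer nudging the weight a bit further.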

[0] https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki...


It's not just many times more people, it's also the fact that Stable Diffusion can be used locally for ~free.

If I get a bad result from DALL-E 2, I used up one of my credits. If I get a bad result from Stable Diffusion running on my local computer, I try again until I get a good one. The result is that even if DALL-E 2 has a better success rate per attempt, Stable Diffusion has a better success rate per dollar spent.

This also affects the learning curve. I've gotten pretty good at crafting SD prompts because I could practice a lot without feeling guilty. I never attempted to get better with DALL-E 2, because I didn't really want to spend money on it.


Yes, it's true, I've tried all the available models and DALL-E 2 outperforms Stable Diffusion. It understands prompts way better and SD sometimes just plainly ignores parts of your prompt or misinterprets them completely. SD cannot generate hands at all for example, they look more like appendage horrors from another dimension.

OTOH, the main limiting factor for DALL-E 2 from my point of view is the ultra-aggressive NSFW filter. It's so bad that many innocent prompts get stopped and you get the stern message that you'll be banned if you continue, even though sometimes you have no idea which part of the prompt even violated the rules.


It's not true that SD cannot generate hands. It's a bit tricky, but it's possible.

Sometimes hands will turn out just fine and sometimes they will suddenly become fine after some random other stuff is added to the prompt.

It's clearly still missing a bit in terms of accurately following prompts, but it's capable of generating a lot of things that may not have obvious prompts. This should improve a lot with larger models. I believe SD is already working on it.


At the end of the day, the hands don’t matter, and pointing out that it’s worse on them means absolutely nothing when the benefits of SD are so huge.

Dall-E can’t even do many of the images SD can, so it seems silly to hold hands up as the AI art tool Turing test.


DALL-E can't do hands much better - I've got plenty of images of people with root vegetable hands.


I genuinely think Stable Diffusion is better than DALL-E. There’s a really obvious ugly artifact on almost all the DALL-E images I’ve seen that SD doesn’t suffer from.

But anyway, SD is far superior even if you consider DALL-E better per image, since you can create 1000 SD outputs and just pick the one you like best (which for sure will include one that’s better than the DALL-E output you got).


My own experience is that SD requires a lot more prompt engineering to get an appealing output; DALL-E and Midjourney spit out amazing results with even minimal inputs. But what I've found is that when it subs in its own aesthetic, it's the same aesthetic. Almost like a style.


You're right. History has shown the best quality product doesn't always win if there's a "just okay" solution laying around that's more accessible. VHS and Windows both come to mind.


From my experience there isn't a clear difference in quality between the output produced by Dalle2 and Stable Diffusion. They both suffer from their own unique idiosyncrasies, and the result is that they have differently shaped learning curves.

I do admit that I rate the creativity of Dalle2 higher than that of SD. It can occasionally create really unexpected and exciting compositions, whereas SD will more often lean conventional.


I think artists don’t want “accurate recreation of the prompt”; they want unexpected and inspiring aesthetics.


Equal or better quality? I suppose it depends on what you are trying to create, but that hasn't been my experience at all.


Which one are you comparing against? I've tried hundreds of prompts between SD and DALL-E and get comparable results. Midjourney was lagging for a while, but the new --testp parameter is really remarkable, which, in my view, makes it superior not only to Stable Diffusion but to DALL-E as well.


An easy example of DALL-E superiority is its ability to combine two different concepts together.

For example, DALL-E performs extremely impressively on prompts in the format of “a still of Homer Simpson in The Godfather” (replace character and movie as you wish). With the other two, it’s a lot of misses.


With StableDiffusion I can buy a used RTX 3090 on eBay for $650, tell the model to generate 5,000 images, and then review each one until I find what it is I'm looking for.

Turns out a shitload of misses are acceptable when it only takes 4-7 seconds to generate an image from a prompt. 5000 generations on an RTX 3090 takes around 7 hours +/- 30 minutes, by the way.


What I've been doing is generate maybe 100 images, pick the best one, and then generating another 100 from that, using --init-image ("good" image file name) and --init-image-strength 0.2 (or so), either with the original prompt or a slightly tweaked one.

Those are the params I use in ImaginAIry, mileage may vary if you're using a different package.
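
Concretely, that workflow looks something like this (a sketch assuming ImaginAIry's CLI entry point is "imagine" and using only the flags mentioned above; the prompt and file name are made up):

    # First pass: generate candidates from the prompt alone
    imagine "a lighthouse at dusk, oil painting"

    # Second pass: variations anchored to a picked favorite
    imagine "a lighthouse at dusk, oil painting" --init-image favorite.png --init-image-strength 0.2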


It's a bit ironic bringing up a 7-hour RTX 3090 run as a cost saving, given that it's like 3 kWh of electricity, which costs more than DALL-E's already outrageous prices.


Is your math ok?

3 kWh is like $0.50 USD ...

DALL-E would give you like two pictures for that price, LOL
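
(Back of the envelope: assuming a ~400 W draw, 7 h x 0.4 kW ≈ 2.8 kWh, or roughly $0.40 at ~$0.14/kWh. IIRC DALL-E is $15 for 115 credits at four images per credit, so 5,000 images there would cost on the order of $160.)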


I live in Texas. 3 KWh of power is $0.29 for me.


In France that would be 0.51 EUR ...


While this is likely true for this specific prompt, I think that cherry-picking a single prompt that DALL-E outperforms SD on is not super indicative of anything. I've conversely found a large number of prompts where SD outperforms DALL-E, either in aesthetic quality or just following directions! I think you'd really have to compare both of them across a large number of prompts of different types to be sure.


To say nothing of the fact that you have lots of sliders to configure just how closely or loosely it follows your prompt. And your choice of sampling methods.

You can't just compare SD and DALL-E performance on prompts alone, because SD gives you a lot more levers to steer it in the direction you want.


Can you share the prompts you see where SD consistently outperforms DALL-E?


Sure, try this one:

> house interior, friendly, playful, video game, screenshot, mockup, birds-eye view, top down perspective, jrpg, 32 bit, pixel art, black background

SD absolutely demolishes DALL-E on this one. SD produces really nice-looking output, with a high degree of consistency. DALL-E produces incoherent nonsense.


Have you poked around lexica.art?


> An easy example of DALL-E superiority is its ability to combine two different concepts together.

This is a con for some prompts. As an example, I asked for a painting of an elephant and a dog drinking tea together. The result was a dog with an elephant nose next to a teapot.

A similar misfire was the word 'porcupine', which drew pigs - I guess because 'porc' is in it? Anyway, its idea-blending is a little too aggressive.


Start your prompt with "group photo of", then list the elephant and the dog. If you try this across many images, "group photo" will result in about 2x as many images keeping the subjects separate.


Yeah you're right that Stable Diffusion produces garbage for that prompt.

I'd love to see a site with lots of examples of the same prompt fed into various models, I assume someone has already made that.


> Yeah you're right that Stable Diffusion produces garbage for that prompt.

I dunno, I generated 20 images from that prompt locally and got three good ones[1].

https://imgur.com/a/rZ6wOEF


What? None of the people in these images are even remotely recognizable as Homer Simpson.


What would you count as a pass then? A literal rendering of the cartoon Homer Simpson on top of a still from the actual Godfather film?


Check out DALL-E results for similar prompts:

https://twitter.com/Dalle2Pics/status/1534718848137560064?re...


Ah, that is much better, definitely.

I'll have to let the AI experts speculate on why SD goes nuts there, because it definitely knows what "The Godfather (1972)" means (if you ask for e.g. 'A still of Patrick Stewart in "The Godfather (1972)"' you get one, which I believe DALL-E can't do because of their facial restrictions?)


from dall-e: https://i.imgur.com/RHiOjuM.png

I would argue that none of these follow the prompt. They all represent a Godfather frame in Simpson style, which is not the same as placing Homer in a Godfather still.


My experience is that for prompts that fit within OpenAI's limiting content policy, DALL-E's text2img results are usually much better. And I use SD like 95% of the time, so it's not that I'm simply more used to DALL-E.


I need some examples, because I don't really see it for the vast majority of use cases.


Here I wanted to illustrate the game Waffle[0]; the first attempt with DALL-E was pretty good, which wasn't true for SD:

https://labs.openai.com/s/rCzJwauuiaIj1Pd3IyJGaHS3

Here I wanted an illustration of a nuclear plant in a Japanese landscape; the first attempt with DALL-E produced multiple good results. I tried SD and MJ (back when MJ didn't use SD) as well and had trouble even with multiple attempts:

https://labs.openai.com/s/FxhxtMFe3kFS8msV8vekRAJ3

There are others, but anyway I think my examples are not important, since it will always be easy to cherry-pick prompts that yield the best results in model X.

In my experience SD is good at producing (especially non-photo-realistic) art that looks pretty and DALL-E is better at following a specific prompt when I know what exactly I want.

Of course I recognise your experience might (and probably does) differ.

[0] - https://wafflegame.net/


Agreed; SD barely follows prompts at all.


> Agreed; SD barely follows prompts at all.

I would heartily disagree - I've generated ~6.5k images using SD locally and most of them could be linked to the prompt they came from.


Doesn’t ‘most of them could be linked to the prompt they came from’ strike you as damning with faint praise?


> SD barely follows prompts at all.

> ...and most of them could be linked to the prompt they came from.

You made it sound as if there is almost no connection between the prompt and the images, and zimpenfish said that the majority could be linked, implying a strong connection. He/she doesn't have to be praising it at all to counter your claim.


Not hugely - e.g. taking the 38 prompts including "a painting by William Adolphe Bouguereau" (which is easily the worst of the modifiers for me), 10 of them I'd say were "no clue to the prompt". For the 56 Munch images, 54 were good and 2 were quibbles ("an isopod as an angel" had no isopod but did have an angelic human - is that a pass or no?)

(Which is probably better than you'd get from a human given the exact same prompts.)


Have you seen a decent tutorial for setting up SD locally? I've been using it through huggingface, but that seems pretty limited.


You can find a number of different guides over at the stable diffusion subreddit, from CLI to GUIs in different flavors.

https://www.reddit.com/r/StableDiffusion/comments/xcq819/dre...


No, sorry, but there's a whole bunch of one-click things now, I think?

I'm running it on Windows 10 using (a modified version of) https://github.com/bfirsh/stable-diffusion.git and Anaconda to create the environment from their `environment.yaml` (all of which was done using the normal `cmd` shell). Then to use it, I activate that env from `cmd` and switch into cygwin `bash` to run the `txt2img.py` script (because it's easier to script, etc.)

[edit: probably helps that I already had a working VQGAN-CLIP setup which meant all the CUDA stuff was already there. For that I followed https://www.youtube.com/watch?v=XH7ZP0__FXs which covered the CUDA installation for VQGAN-CLIP.]


There's a one-click installer for running Stable Diffusion locally (Windows/Linux) that doesn't require anything pre-installed: https://github.com/cmdr2/stable-diffusion-ui#installation


The official repo is straightforward: https://github.com/CompVis/stable-diffusion

I have to admit I've only just started looking into it; maybe there are better options.
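
FWIW, the Hugging Face diffusers route is also only a few lines once CUDA is working (a sketch; the weights download requires accepting the license on Hugging Face and logging in with huggingface-cli first):

    import torch
    from diffusers import StableDiffusionPipeline

    # Downloads the SD v1.4 weights on first run.
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")

    image = pipe("a photograph of an astronaut riding a horse").images[0]
    image.save("astronaut.png")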


What really excited me about SD was how many creative things it was used for, because people could modify it and build on it. In just the first week I saw dozens of cool projects here on HN. With DALL-E, I have only ever seen prompts + images.


For something that is supposed to be intelligent, sometimes the restrictions around DALL-E make no sense. My recent request to generate an image of two cats sleeping together was not allowed because apparently this is “adult content.”


Maybe OpenAI is learning the same lesson compiler and runtime vendors learned a couple decades ago: it's very hard to compete with open source.


Saying "missed the boat" makes it sound like it was just bad luck and not OpenAI's fault, but I'd argue that it was their fault. They could have made DALL-E open source; they just chose not to.


I'm still heavily exploring these new tools from an artist's perspective. I never managed to get a run on Midjourney, but between DALL-E and SD there are quite a few differences. Broadly speaking, DALL-E seems to get a better handle on photographic results and on interpreting "what I meant". With Stable Diffusion it's a lot of fiddling and putting manual emphasis on certain keywords until it comes out just right.

Overall, pricing will need to be adjusted over time as well. I set out on an experiment the other day that you can see here: https://twitter.com/Keyframe/status/1574338738808934400

I went about trying to utilize Stable Diffusion for an imaginary concept project (concept art for characters in a TMNT remake, heh). The process was more like how I'd work with another artist than like drawing it alone: back and forth, from rough outlines and then honing in on details. Inpainting and img2img helped A TON, and I hope to get Dreambooth running soon as well, since that will be a game-changer in combination with the rest.

Between the exploration phase, detailing, alternatives, and manual painting and overpainting, I'd say that PER final OUTPUT image I created in the region of a thousand or so interim images. The process overall did take a lot of time, though not as much as a fully manual one, and of course I didn't feel I had as much control as I would manually, but ultimately I felt bold enough to say I had creative control. With Dreambooth I expect the gap to close.

Overall, I was extremely pleased with the experiment and I'll continue exploring, even though I'm not doing artwork professionally anymore. And so far, no, it's not going to replace artists. It's another tool that removes labour but adds time spent on direction. Ultimately it'll be another brush in the toolbox.


I am surprised OpenAI didn't adjust the price of DALL-E 2 given the rise of free/low-cost competitors.

Granted, DALL-E appears to be buckling under demand regardless so the supply/demand curve doesn't warrant a price drop yet.


Name brand recognition goes a long way.


All their actions seem to arise out of arrogance and a misplaced belief that no one can replicate their work. Even before DALL-E, with GPT etc., the sheer arrogance that led them to think they could decide how their tools can be used, who can use them, and for what purpose was revolting.

Frankly, it spoiled the whole ML/DL revolution for me, which until that point had been a little more open than other fields. True, companies like Google don't let you use their models like Alpha* freely or even open-source them, but they weren't dangling access the way these pricks who called themselves open were doing.

I hope they will become irrelevant in the coming years.


I was perplexed by this happening, since I'd have figured the level of expertise on OpenAI's executive team would make them unlikely candidates to lose out strategically (as opposed to losing to something outside their control).

But maybe it was overconfidence? Or maybe what Stable Diffusion did was really just unpredictable.


I think they believed no one else could come out with something like what they had, plus some vanity/arrogance maybe.

Like back then when they withheld their GPT-2 model from the public because "it could destroy the world" or some similarly deranged argument.

>the level of expertise in OpenAI's executive team would make them unlikely candidates to lose out strategically

Sure, to be honest they're making mistakes that even complete marketing noobs know to avoid. Anyway, it's nice to see others are eating their lunch and sharing it with us.


> Midjourney and Stable Diffusion emerged and got to the point where they produce images of equal or better quality than DALL-E

I cannot speak to DALL-E's results, as the signup process is currently broken (after providing email, name, and phone number, was met with "We’re experiencing a temporary issue with signups due to a vendor outage. We apologize for the inconvenience!"), but the Stable Diffusion results I've been getting are not just unusable, but downright bizarre... here are the four images it produced for "morihei ueshiba doing a double backflip": https://imgur.com/a/EvkQpBT


Finally was able to get the signup process sorted (discovered that I had to use a different email address than the one I had originally requested beta access with); DALL-E's results for the same prompt were more human at least: https://imgur.com/a/OahhDS4 .


Truly proves the saying, "Get Woke, Go Broke". All this pearl-clutching over safety really did a disservice to them.

In all fairness, their release of Whisper[0] last week is actually really amazing. Like CLIP, it has the ability to spawn a lot of further research and work thanks to the open source aspect of it. I hope OpenAI learns from this, downgrades the "safety" shills, and focuses on producing more high-quality open source work, both code and models, which will move the field forward.

[0]: https://github.com/openai/whisper
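
(The Python API really is minimal; roughly this, per the repo's README:)

    import whisper

    # Model sizes: tiny, base, small, medium, large (bigger = slower but better).
    model = whisper.load_model("base")
    result = model.transcribe("audio.mp3")
    print(result["text"])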


I've spent a significant amount of time playing with the variety of diffusion models available, and DALL-E 2 tends to produce much better quality images. The other killer feature is DALL-E 2's support for in-fill.


I think the same thing is going to happen to the new models as well. Something better and more efficient is going to eat their lunch. Maybe down the road we'll see more application-specific models, with a general model sitting on top to composite results together.


Even better, with Stable Diffusion you can at least run it locally.

Running software locally and using our desktop or portable supercomputers for something other than web browsing. What a novel concept. But how is this possible without cloud?


This gets upvoted due to inexplicable DALL-E hate, but on the other hand I'm keeping my DALL-E account and cancelled my MidJourney account because the DALL-E account doesn't cost me anything when I don't use it. Having an account I barely use is great because I can go generate an image whenever I want for comparison purposes.

(Furthermore, if I don't use it very often, I'm in the free tier due to the 15 free credits a month.)

Also, do you realize that Stable Diffusion is also running a pay-for-usage model at dreamstudio.ai? I like that too.


Also, and this is a big one: DALL-E has an overzealous filter, which blocks seemingly harmless prompts as "violating content guidelines".

Look, I get it: They don't want to be in the news for producing porn or gore. But if you block a prompt and threaten account closure on repeated such blockings, at least tell us what we did wrong.


This is something that only people on HN would write/believe. Missed the boat on what? Giving away free images from a prompt?

This is all early days and these demos are neat but the real value is yet to be seen. Maybe when this technology is licensed and integrated into Photoshop or Instagram or something like that.


Yes, it does feel they shot themselves in the foot.

Their marketing was excellent, but it somehow pushed expectations too high and then underdelivered. It also felt very elitist; not very "tinkerers in a garage", which is what this generation of Stable Diffusion tooling feels like.


I suspect it was driven by moral panic and not necessarily business considerations.


I'm not totally up to date on this, but doesn't DALL-E still have the best quality?


Not from what I've seen. I took some prompts I saw used in Midjourney and was getting relatively lame results from DALL-E. Not a great comparison, but still.


No


I think it's hard to move fast and be the morality police at the same time.


Can you link to the hundred-page document? I believe you are talking about prompt engineering, and I would love to get more information about it. I am struggling with figuring out good prompts.



Yeah the reason there's no "waitlist" is nobody's waiting to get in anymore. You don't need a bouncer if there's no queue at the door.


Agreed. That miss was brutal, although a freely distributed tool will always best a gated experience in my book. Stable Diffusion is already in everyone's hands.


My thought on the timing is that it comes down to responsibility.

I don't think the other options gave as much (or any) consideration to AI's impact.

Perhaps we’re just finding out that people don’t care. Yet?


OpenAI's DALL-E paper started all this generative image stuff. You've got to give them credit for publishing the paper to the public.


GANs existed long before that, and such art was being made years ago.


Or maybe they got undercut by an open source implementation and that was inevitable no matter what. How can you compete with free?


It's easier to compete with free (most paid products do) if most of the people interested in AI generated art have been paying for their service for months rather than browsing for alternatives. Especially since their supposed advantage is better prompt understanding rather than image quality; easy to dismiss StableDiffusion if your first impressions of it are "doesn't understand me like DALL-E" rather than "wow, this is magic"

The "waitlist" model might work when the product isn't ready for prime time or the exclusivity is a part of the pitch, but it's greatly overrated in other respects. I got a "The Wait Is Over" email to tell me I'm off the waitlist and able to use a not-exactly-new stock trading app this week as the UK economy crashed. Yeah, thanks, but no thanks...


Imo they simply thought that no one was going to be able to compete with them


It reeks of “product management”. It’s getting managed out of relevancy.


And Hugging Face doesn't even require you to sign up or log in to try it.


You do understand how fast this space is moving and how new everything is, right?

Your criticism is, in my opinion, not valid.

Do they need to react to the market? Perhaps; it depends on what their goal even is.

Is DALL-E 2 fun to use and cost-wise totally fine? For me, yes.

But I also see people running SD with a hacky web UI on good GPUs for free. How many people actually have access to that?

Is there even a good benchmark for which tool is inherently better? Because it's also totally fine to have multiple offerings.

I'm really not sure you've ever seen product development for yourself.

DALL-E clearly took the potential for misuse much more seriously than the others did.


Just as user-friendly as DALL-E, but for Stable Diffusion, and with more free credits: https://beta.dreamstudio.ai/dream


I did a short test and already created 20 pictures with the same DALL-E prompt, without getting a result as good as DALL-E's.

And in another test the faces were super shitty.

DALL-E also gives you 4 pictures per credit; Dream gives you 1.

So it's good to have more options, I think. Two different products that feel different.


I generally find Stable Diffusion's outputs better than DALL-E's, so it's surprising you say that. A good prompt makes a big difference, though.


That doesn't make my experience any less true.

But I have also played around with SD.

I still think my original comment is valid.


I think it depends a lot on what you mean by "better output".

DALL-E is very good at conceptually representing complex prompts. For something like "a bear with a diving mask surfing in the ocean, a pelican is sitting on its shoulder", DALL-E will immediately produce coherent results, while SD requires a lot of prompt tuning, and sometimes it's even impossible to get it to represent certain concepts (I haven't tested this particular prompt, though).

SD is good for producing "artistic" images if that makes any sense

edit: OK, I tried the "surfing bear" prompt with DALL-E 2 and SD, and the results are consistent with my point. I used the raw prompt without tuning and cherry-picked the best image out of 4 with both models; here is what I got:

DALLE-2: https://labs.openai.com/s/Q9824QOfXln4r9FLFNM3v9v1

SD: https://imgur.com/a/czcMgiC

For SD, even by tuning the prompt, I wasn't able to get the diving mask or the bird on the shoulder.


Exactly matches my experience


> But I also see people running SD with a hacky web UI on good GPUs for free. How many people actually have access to that?

The re


The credits from DALL-E are still cheap, and you get some every month.


It's a little heartbreaking because, arguably, OpenAI tried to do the responsible thing here: come up with a sustainable business model to make AI-generated images profitable while respecting trademarks and controlling for some objectionable content. Very corporate; very above board.

Emad Mostaque, a millionaire hedge-fund manager with money to burn, spent approximately $600,000 to train a model and dumped it out for public consumption: no account for how it will be used, no concern about any sociopolitical consequences, damn the torpedoes and straight ahead. He basically burned down a potential industry space and hugely complicated an ongoing conversation on how these tools will interact with / disrupt the lives and livelihoods of artists... But he also basically changed the world overnight. Hashtag-squad-goals, am I right?

There's a lesson to be learned here. I haven't decided what it is yet. Though I note that it's a lesson that probably applies to few people who don't have $600,000 to set aflame.


> dumped it out for public consumption: no account for how it will be used, no concern about any sociopolitical consequences

I hadn't heard the story of how stable diffusion was created. Sounds like the guy is a true hero from your description. And only for $600k? Imagine if he decided to "burn" the rest of his millions on similar initiatives.


I unfortunately lack the imagination to think of similar initiatives that could be addressed in such a fashion.



