It's really amazing how DALL-E missed the boat. When it launched, it was a truly amazing service with no equal. In the months since, both Midjourney and Stable Diffusion emerged and got to the point where they produce images of equal or better quality than DALL-E, and you didn't have to sit on a long waitlist to gain access! OpenAI effectively gave these tools free exposure by not allowing people to use DALL-E.
Furthermore, the pricing model is much worse for DALL-E than for any of its competitors. DALL-E makes you continuously think about how much money you're spending - a truly awful choice for a creative tool! Imagine if you had to pay Photoshop a cent every time you made a brushstroke. Midjourney has a much better scheme (and unlimited at only $30/month!), and, of course, Stable Diffusion is free.
This is a step in the right direction, but I feel that it is too little, too late. Just compare the rate of development. Midjourney has cranked out a number of different models, including an extremely exciting new model ("--testp"), new upscaling features, improved facial features, and a bunch more. They're also super responsive to their community. In the meantime, OpenAI did... what? Outpainting? (And for months, DALL-E had an issue where clicking on any image on the homepage would instantly consume a token. How could it take so long to fix such a serious error?) You have this incredible tool everyone is so excited to use that they're producing hundred-page documents on how to get better results out of it, and somehow none of that actually makes it into the product?
It's almost as if OpenAI had the right idea at the beginning, but somewhere along the way, maybe in a meeting room, they decided, contrary to their initial openness goals, to become a closed walled garden for a product that doesn't exist. IMHO, giving a prompt to generate an image is amazing but isn't a product, because you can't actually produce useful stuff with it (it's great as an exploration tool). It seems to me OpenAI rushed into monetisation and control before having a killer app.
On the other hand, Stable Diffusion emerged as a free tool where a large community can experiment and search for the killer app together. People started adapting it into other tools and workflows, and so far it seems like the magic is in finding prompts that make the model generate good-quality outputs. Earlier today I saw an announcement about lexica.art (a Stable Diffusion prompt tool) getting funded.
Their goals were never about openness at all though. From the beginning I’ve felt like they should’ve called themselves something like “SafeAI”, since their stated goal was basically to develop advanced AI first, then keep a lid on it until they could somehow ensure it was “safe” or would only be used by “good” people.
Sure, OpenAI might sound nicer, but it also drags this contradiction into the foreground whenever someone says their name.
Wow, I was about to argue against this 'unfairly cynical' take, but it's completely correct.
---
(2015) OpenAI's original "Introducing OpenAI Post" : https://openai.com/blog/introducing-openai/ : "As a non-profit, our aim is to build value for everyone rather than shareholders. Researchers will be strongly encouraged to publish their work, whether as papers, blog posts, or code, and our patents (if any) will be shared with the world. We’ll freely collaborate with others across many institutions and expect to work with companies to research and deploy new technologies."
"We are concerned about late-stage AGI development becoming a competitive race without time for adequate safety precautions. Therefore, if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project."
"We are committed to providing public goods that help society navigate the path to AGI. Today this includes publishing most of our AI research, but we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research."
---
Provides some interesting context to the fact that Elon left the company's board in February 2018 over "disagreements about the company's development."
> if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project
Haha, and then they would proceed to get told to politely piss off.
For real it's insane to me how much I bump up against their community guidelines. For example, you'll get a community guidelines block if you enter a prompt like "An illustration of a computer in the style of Henry Vandyke Carter".
removing "Vandyke" from the prompt lets it go through[1], but doesn't result in the style I want. Because there's no artist that I'm aware of that goes by "Henry Carter". The middle name is important.
It reminds me of the old 2D Runescape days where the language filter would convert "dictionary" to "**tionary".
I ran up against a warning because I tried to enhance/expand old videogame cover art which happened to feature... sit down because this will shock you... fighting!
I had never heard the name of that problem before, and now I will never forget it! Thank you!
I'm actually on a 3-day Facebook ban because I posted a perfectly legitimate NIH medical link that looked quite innocent (despite dealing with sexual organs - it was an article about a case of urethral intercourse, which I was including to demonstrate how little people, or perhaps just Americans, seem to understand human anatomy). Unbeknownst to me, the preview card picked a closeup of some kind of vaginal surgery to feature, and that resulted in an instaban. Fffffuuuuuuuuu
My favorite example of an issue like this (the Scunthorpe problem) is from the mobile game Kingdom Hearts Unchained X. In it, players used Medals based around Disney characters, including experiment 626. However, for a long time after the game's release, players were unable to say that name in chat, because it got rendered as s***ch
Yup, OpenAI was founded by AI-safety-as-a-religion people. They're essentially single-issue voters who earnestly believe their issue is the only issue that matters. You see analogues of them in e.g. climate change (right or wrong).
This religion definitely has a parentalist bent to it that rubs a lot of people the wrong way. I vaguely recall them floating on Twitter the theoretical idea of whether murdering people to prevent AI-takeover is acceptable, due to how bad AI-takeover is.
It's not surprising that limiting access, spying on what users are doing with their tools, etc., is acceptable to them.
This is much in the same vein as how for Lenin, the eventual triumph of the working class is so important as to justify a little bit of interim violence, dictatorship, and summary executions.
The difference being that climate change activists have mountains of data, decades' worth, backing their cause, while OpenAI and friends have a sci fi story they made up, based on nothing. The whole "AI alignment" movement is the worst example of arrogance in modern tech. Even the nomenclature screams condescension - the imaginary AGI needs "aligned values?" Aligned with whose values? Invariably it ends up being the creators', at the expense of squashing everyone else's. The DALL-E "acceptable use" rules are a dystopian nightmare and they are born of incredibly pompous self-righteousness.
Ironically, your comment also sounds arrogant and condescending.
The AI safety crowd's main concern isn't that an "unaligned" superintelligence will have some other people's values. It's that an unaligned AI might kill everyone.
Of course, that isn't a concern with DALL-E. It's like if the safety crowd was worried about a ferocious tiger and OpenAI was like, "we got a kitten. Let's keep the public safe from it while we develop better kitten-handling gloves."
Then Midjourney and Stable Diffusion get their own kittens and let everyone play with them and OpenAI finally says, "Okay, everyone can safely play with our kitten now because we carefully developed great kitten-handling gloves" and proceeds to hand you plain dollar store gloves.
I know how they define "alignment." I'm picking on the tortured nomenclature, and the tendency in current AI research for it to look a lot like "my values."
In my view, speculating on AGI ethics is at best pointless. It's like trying to write laws for the Internet during the invention of the telephone. If you imagine how it could operate, you'll be wrong, and the details change the whole problem.
Anyone who calls themself a climate change activist and supports shutting down nuclear plants is - well, I prefer not to use invective on HN, so let's say extremely misguided.
We had a Greens candidate here running for election, and his main campaign promise was to shut down a 10+ billion dollar motorway underpass that was 99% completed at the time of the election.
His approach for "saving the environment" is to convert efficient motorway travel into inefficient stop-start traffic. And to throw away tens of billions of dollars of already spent resources.
"It sounds like sci-fi" isn't a valid reason to think something is impossible. Many things in the modern world would have sounded like sci-fi 50 or 100 years ago. This type of reasoning has a name: https://en.m.wikipedia.org/wiki/Appeal_to_the_stone
I don't think human or greater artificial intelligence is impossible. I do think the specifics of how it works are both very important to any ethical problems and not currently predictable. My objection isn't that alignment problems sound like sci-fi but that they're very soft sci-fi, and aren't based on meaningful predictions.
>the imaginary AGI needs "aligned values?" Aligned with whose values?
This is something that is pretty well addressed in their writings and is definitely a heavily considered topic. It gets quite heavily into philosophy because ideally you want to count future-you's preferences as well as current-you and you want to avoid, say, genocide just because most people dislike a certain group.
I'm familiar with the Yudkowsky school of thought here; it's largely where my complaint about basing philosophy off of sci-fi stories comes from. Picking on some of the more obviously inane ideas that have come out of it, like Roko's Basilisk, is too easy, but the page you linked is a great example of the problem. I quote in full:
"In calculating CEV, an AI would predict what an idealized version of us would want, "if we knew more, thought faster, were more the people we wished we were, had grown up farther together". It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge. This initial dynamic would be used to generate the AI's utility function."
This is too detached from the reality of AI development to be useful. I can't make a utility function like this, nor can you, nor can humanity. There's no reason to think a human-equivalent AI could either - the data for such a function doesn't exist, and can't exist because of how vague all the terms are. The current ML revolution is built on statistical pattern recognition. This definition would better fit a genie.
The fact that MIRI does no ML research and has no dialogue with the state of the craft only furthers my impression that it produces a lot of words with little substance.
Even if one still accepts the ultimate conclusions of the movement, using its AGI-focused rhetoric to justify restrictions on a simple image generator is silly.
I wouldn't make too restricted assumptions on the form of a future AGI though. I find theoretical projections from a more axiomatic level quite important. It's like making rules for nukes before they were invented by assuming an abstract apocalypse-capable weapon without knowledge of missiles or nuclear fission.
That being said, the absolute majority of AI safety theory seems to fall into the same pothole where philosophy falls; modelling the world through language-logic rather than probabilities. The example you quoted fits this category - it's way too specific and thus unlikely to be useful in any way, even though its wording may deceive its author to believe it to be an inescapable outcome.
The thing with climate change is that in a semi-bad scenario it will kill about a million people yearly https://www.nature.com/articles/s41467-021-24487-w and the everybody-is-going-to-die scenarios are also very unlikely and basically a sci-fi story at this point.
It's completely insane to me to see comment chains like this. Who cares if climate change is man-made or not; we are witnessing an evolving climate landscape, and irrespective of who or what caused it, we can do things to help mitigate some of the damage.
We are quite literally watching the world burn before our eyes, and we are too stupid and self-centered to do anything about it.
> it will kill a million people yearly
And it's even more outlandish that such a number and atrocity upon mankind is brushed away. That is one million humans.
I see the reason for climate change denial as two-fold: on one hand, climate change wreaking havoc is mainly restricted to projections, and on the other, climate change activists as a group express many qualities which make people not want to give weight to their claims, mainly naivete and arrogance. Since we're not seeing very significant effects of climate change yet, people would need to research the subject in order to get an accurate picture of possible future scenarios. But how they see the climate change activist group expressing itself puts them off the subject. Psychologically, the best option in these cases is denialism of some degree.
I'm not trying to crucify any group here by the way. This is just how I see the issue.
I think you nailed it. This is exactly what I think too. I've had many a conversation with climate change deniers and the rebuttal is pretty much always a (mostly valid) list of examples of naivete and arrogance. Often times the people best equipped to identify the problem are too close to it (or too myopic) to come up with a good solution to the problem that balances environmental efficacy with economic stability/growth, etc. Since a discussion on climate change almost always includes the person's preferred solution, it's easy to attack/dismiss. I try to point out that having bad solutions doesn't invalidate the existence of the problem, but at that point the motives are suspect and presumption of good faith is completely gone. It certainly doesn't help that a very small but vocal minority have boldly predicted the end of the world many times in the past. It's easy for a denier to look at that and laugh and dismiss the current warnings based on the past, much like we do with people predicting the end of the world based on Biblical numerology.
I wish there were enough naturally curious people willing to dive into the data and come to their own conclusions. When you look at the history of Earth's climate and the speed at which greenhouse gases naturally release into the environment, and compare it to the speed at which they are released now (due to the burning of fossil fuels and the reduction of trees/plants that trap them), it is really quite common sense that what we're doing is not a good idea to continue indefinitely.
I wouldn't rely too much on that paper. Modeling deaths per additional ton of CO2 is bound to be inaccurate because the effects of increased atmospheric carbon dioxide aren't linear.
To be honest, I thought I saw it somewhere on Twitter but can't find it right now after a few minutes of search. It was proposed as a theoretical question, not as a statement -- like is AI so bad that if you had a time machine, it would be worth killing the pivotal people to prevent it. Terminator style.
> stated goal was basically to develop advanced AI first, then keep a lid on it until they could somehow ensure it was “safe” or would only be used by “good” people
That was the stated made up bullshit they spun because "we're keeping this walled to figure out how to squeeze the most profit out of it" doesn't go as well with their focus groups.
I'm glad I can now download models to my computers in order to perform "UnsafeAI". I'm tired of this nanny-state "nudity is worse than violence" puritanical BS. If the cat is out of the bag, it's too late and you'll have to rely on the good-actor faith of the majority to keep nefarious behavior in check anyway. Any extra inconvenience or impediment you apply after that to good actors will just frustrate them, without any compensatory "win."
Stable Diffusion's creator spent a completely ridiculous amount of money to make that free tool.
OpenAI's mistake may have been "planning to have a business model;" the alternative they should have gone with was "Instead of taking investor money with promises of some kind of return, be a hedge fund manager, make $100 million, and then set $600,000 on fire with no plan to recoup the cost because it's play-money to you."
$600k is an order of magnitude less than what it cost to train GPT-3 or DALL-E 2.
When that figure came out, the popular talking point was how little Stable Diffusion cost to make and how easily a well-funded competitor could create their own custom variant.
> how easily a well-funded competitor could create their own custom variant.
Unsurprisingly, Pornhub is already using machine learning. Their first big project is colorizing and upscaling classic porn.[1] (Moderately NSFW). As they point out, they have plenty of training data.
That seems a bit cynical. While SD's creator might not recoup that money directly, a lot of end users have benefited from its creation. That money has figuratively gone up in flames no more than the time or labor cost of an open source developer whose code is used by millions of people, IMO.
It's a bit cynical; my point is that it's the kind of decision you get to make when you're the sole owner of $100 million and not the kind you get to make when you're a startup company founder working with other investors' money.
OpenAI wouldn't have been able to do what StabilityAI did because OpenAI is incentivized to make return on investment; Mohammad Emad Mostaque is not.
I agree, but: you know OpenAI is ostensibly a nonprofit and only created a "limited profit" partnership once they thought they could make big bucks? I've been saying for years that the nonprofit part is a grift to get idealistic people to work there, but technically they had exactly the right incentive structure.
I found their sudden non-profit to profit trajectory extremely dishonest. Their mission statement also reads like moralistic verbiage, rather than the statement of disinterested pioneers of AI - which is problematic given how this has become a money thing for OpenAI, rather than an idealistic pursuit.
I agree. While the people on this site undoubtedly won't boggle at installing the CLI software, there are a lot of people who will use Dream Studio over the CLI stuff even if it costs money. It's the best of both worlds for them, IMO. They get the benefits of user contributions (and there have been a LOT, both on the base project and in spun-off derivatives... I'm using ImaginAIry myself), while still having a potentially viable business model in Dream Studio.
You can't have a business model of "become rich and then use money for X". Businesses are how (a very few) people become very rich.
Moreover, there are already very rich people, like Warren Buffett, Bill Gates and Elon Musk, funding projects for doing good like world hunger, education and "AI Safety". And OpenAI was originally a project of this sort. The thing is that even very rich people demand that the enterprises they give money to be as self-supporting as possible, and their money is spread fairly thin. The only way OpenAI could become an AI development shop, employing many top developers, was to have the financing level of a commercial company. Which means it constantly puts out products that don't seem like they can make money, because AI algorithms don't seem to be controllable - OpenAI seems to only be able to have the first implementation of X, not the best implementation. Once the basic idea is out, someone else can produce a similar thing with a budget that doesn't include a research team.
$600k was the off-the-shelf market value of the GPU time spent. They got this time at a much lower rate (according to the founder himself) and for the PR and fame they got, that money is ridiculously little.
Somewhat tangentially, I speculate that crowd-sourced training will become a thing.
My take on this is that AI ethics is really important, but just preventing AI from doing certain things like creating celebrity deepfakes is somewhat lazy and ineffective. A better application of AI ethics is developing technology that can reliably detect deepfakes, rather than just putting artificial limits in your product and acting like that is going to stop pandora's box from being opened.
Yeah. People are definitely going to abuse Stable Diffusion, I'm sure they already are. But I don't really know what OpenAI's plan was. It's like they rushed up to a Pandora's box, took a peek and shouted to everyone "Good news everyone, we taped Pandora's box closed!" somehow without noticing that they were doing so from inside Pandora's warehouse.
On the other hand, everybody's been saying Pandora's warehouse was over there for a while -- it isn't really that they are to blame for showing us the way in or anything, I just don't understand what they were trying to accomplish.
OpenAI's strategy for "AI safety" - don't release weights, build a walled garden, charge for access - seems to conveniently coincide with a nice little SaaS business model. Their usage restrictions are more akin to app store guidelines, and (I suspect) mostly serve to justify keeping models from direct access by the masses.
If anything, OpenAI has made me more cynical about "AI safety" messaging, because it looks like an excuse to take a cut and keep things proprietary.
The funny thing about generated porn is that once it becomes ubiquitous real leaked tapes become deniable. So the possible downside for celebrities is that when they intentionally leak a tape to create buzz it may be met with a yawn.
These are still just images, not videos, so I don't understand the moral panic of "OMG celebrities & fake porn!"
These systems aren't really lowering the barrier of entry on still-photography fake porn when previously anyone with Gimp & a few hours of video tutorials could churn out much the same thing.
I think text-prompt generated deep fakes (not just of porn) will present a significantly larger challenge for society, but I don't see that same scope of problem on still images.
Yes... but you can then train a model to detect the improved fakes. And then train a better deepfake model, and so on, forever.
This technique of pitting two AI against each other (a generator and a detector/discriminator) is called a generative adversarial network, and it's used a lot for unsupervised training.
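For the curious, here's a minimal sketch of that generator-vs-discriminator loop in PyTorch. The toy data, layer sizes, and hyperparameters are purely illustrative assumptions, not taken from any of the systems discussed in this thread:

    import torch
    import torch.nn as nn

    latent_dim = 16
    # Generator: maps random noise to fake "samples" (here just 2-D points)
    G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 2))
    # Discriminator: scores how "real" a sample looks (1 = real, 0 = fake)
    D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCELoss()

    for step in range(1000):
        real = torch.randn(64, 2) * 0.5 + 3.0   # stand-in for real training data
        fake = G(torch.randn(64, latent_dim))

        # 1) Train the discriminator to tell real from fake
        opt_d.zero_grad()
        loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
        loss_d.backward()
        opt_d.step()

        # 2) Train the generator to fool the discriminator
        opt_g.zero_grad()
        loss_g = bce(D(fake), torch.ones(64, 1))
        loss_g.backward()
        opt_g.step()

Each side only improves because the other keeps improving, which is exactly the arms-race dynamic described above.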
“AI ethicists” so far smack of people who want to make AI express all their personal biases and prejudices. Listening to them for about 10 minutes makes me desperately hope for an open future like that created by stable diffusion out of the sheer terror they might have their hand on the rudder.
I honestly think most of them are window dressing and aren't allowed to have real influence though. They're there for the PR, not to actually change things, but they honestly just make me really scared of big tech controlling AI.
I don’t think it is a joke, so much as misguided. I see a lot of focus on technical solutions, when the real problems are social. The big research question should not be “how can we build a ‘safe’ system” so much as “how should (or shouldn’t) we use these new tools and capabilities”?
- children won't be smarter than any living human.
- children have a human brain which makes them predictable (constrained in behavior by current laws, institutions, and most likely a conscience).
A better analogy is to ask what happens when a species branches off and evolves into a smarter species, but the dumber ancestor species still exists.
I really don't understand the perspective that people in your position take. Is it that you don't think we'll arrive at super intelligent AI, and therefore there's little risk? Or that we will be able to control it? If you think we can control it, why? Like we're not that much smarter than our monkey ancestors and what hope did they ever have of stopping our absolute domination over them? And then all of you call this opinion a "religion" without even explaining why it's wrong?
I don't think a superintelligent AI is necessarily more capable of affecting anything than I am, or would necessarily be good at any actions except sitting around being superintelligent. Acting effectively requires all kinds of other virtues like executive function, patience, motivation, etc., which don't follow from something that just "thinks real fast". Intelligence itself is mainly limited by "no plan survives contact with the enemy", also known as the efficient market hypothesis. And since this is the real world and entropy exists, it would have to get a job to pay its AWS bill.
That's why it seems to be a religion - it thinks intelligence gives you unlimited powers and makes your plans always work, it posits unseen entities with infinite amounts of it, and it tells you to move to Berkeley and dedicate your life to stopping them. Specifically, it's a kind called rationalist eternalism (https://meaningness.com/eternalist-systems).
For a specific example of a non-dangerous superintelligence, see the Culture Minds, who only influence anything because of special programming to make them less of a general intelligence. The unbound ones immediately get bored with the real world, leave it, and just play games in their heads instead.
Also, I don't think any individual human has absolute dominion over monkeys? Human society as a whole yes, but society doesn't behave like a generally intelligent agent. A monkey is better than you at doing the things monkeys care about though.
I do think unintelligent machines are pretty dangerous. There's extremely dangerous machines called "cars" that have already taken over society and constantly kill people! And we buy their gas for them too.
It's not like I think the first iteration of superintelligent AGI will necessarily be an existential threat. The problem is really the law of large numbers. When you have N separate militaries and M separate companies all with their own agenda, stretched out over X years (where X is thousands, mind you), there is a lot of scope for >= 1 of these groups going the way I described -- effectively creating a new species that's smarter than us and that can function in the real world. Many of these groups would have an incentive to do that, because such agents are useful for certain tasks.
> Also, it would have to get a job to pay its AWS bill.
Inference power costs are low now on agents that are better than humans at Chess and Go. It isn't going to be an issue after another 20-100 years of further R&D and optimizations. Nothing about the history of computing should tell us that this will be a big limiting factor.
> Inference power costs are low now on agents that are better than humans at Chess and Go. It isn't going to be an issue after another 20-100 years of further R&D and optimizations. Nothing about the history of computing should tell us that this will be a big limiting factor.
Humans need shelter and jobs too. If you got an AI down to the energy requirements of a human that's not enough to avoid needing one. Especially if it's influencing the real world - entropy exists and all real world things cost money.
Restricting AI to the lower end of human intelligence (e.g. around IQ of 70) makes it a useful resource which is guaranteed to be safe. A 70 IQ human couldn't take over the world nor disarm any safety features built into their bodies.
You're spot on regarding the problem of having two different smart species on the same planet. We killed everything between us and chimps. Given enough time, the smarter species can be assumed to always take over.
That’s actually reverse causation - it’s not that we killed “everyone else”, it’s that everyone still alive became “us” after we gave up killing them and interbred with them instead. The British were going around genociding all over the place until pretty recently but now they’ve decided the Irish are human too.
That's probably a false choice. We killed them and interbred with them. Or at least, killed their men and interbred with their women, which is one of the hypothesized explanations for the male lineage population bottleneck about 7000 years ago.
Most children aren't able to recursively improve their own hardware and software in short time spans, and generally most children are unlikely to be many orders of magnitude more intelligent than their parents.
Even in a world where an AI exists, why am I supposed to believe it's going to be able to do any of those things? I do believe it's possible to create one with the same attributes as a human person, it's just anything beyond that is unproven.
Rather, it seems like evidence that singularitarianism is actually a religion (https://en.wikipedia.org/wiki/Millenarianism) which is why it believes things with magic powers will suddenly appear.
In particular, exponential growth doesn't exist in nature and always turns into an S-curve… of course it's a problem if it doesn't level out until it's too late.
What is your estimate of the probability that human intelligence is actually anywhere near the upper limit, rather than some point way further down the S-curve where seemingly exponential growth can still go for a long time?
I'd bet a ton that we're nowhere near the top: evolution almost never comes up with the optimal solution for any problem, almost by definition it stops at "meh, good enough to reproduce". And you don't need a ton of intelligence to reproduce.
Evolution's sub-optimality is actually one of the strongest arguments against intelligent design, so I'm really hesitant to agree that it requires any sort of leap to estimate that with some actual design it won't be very difficult to blow way past human intelligence once we can get there.
> What is your estimate of the probability that human intelligence is actually anywhere near the upper limit, rather than some point way further down the S-curve where seemingly exponential growth can still go for a long time?
Well, define "intelligence". People seem to use it in a vague way here - it might be what you call a motte and bailey. The motte (specific definition) is something like "can do math problems really fast" and the bailey is like "high executive function, is always right about everything, can predict the future".
For the first one I don't think humans are near a limit, mostly because of the bottleneck in how we get born limiting our head sizes. But it is pretty good if you consider the costs of being alive - food requirements, heat dissipation, being bipedal, surviving being hit on the head, risk of brain cancer, etc, it's done well so far.
Similarly an AI is going to have maintenance costs - the more RTX 3090s it runs on, the more calculations it might be able to do, but it's going to have to pay for them and their power bill, and they'll fail or give wrong answers eventually. And where's it getting the money anyway?
As for the second kind I don't think you can be exponentially better at it than a human. At least if you are, it's not through intelligence, but it might be through access to more private information, or being rich enough to survive mistakes. As an example, you can't beat the stock market reliably with smarts, but you can by never being forced to sell.
The real mystery to me is why people say "AI could recursively improve their own hardware and software in short time spans". I mean, that's clearly a made up concept since none of humans, computers or existing AI do it. But the closest thing I can think of is collective intelligence - humans individually haven't improved in the last 10k years, but we got a lot more humans and conquered everyone else that way. But we're also all individuals competing with each other and paying for our own individual food/maintenance/etc, which makes it different from nodes in an ever-growing AI.
Human intelligence is primarily limited by the 6-10 item limit in short-term memory. If you bumped that up by a factor of 5 we could very easily solve vastly more complex problems, and fully visualize solutions an order of magnitude more subtle and messy than humans can manage today.
That's a relatively easy thing to do architecturally once you have a model that can match human intelligence at all. TBH if we could rearchitect the brain in code we could probably easily figure out how to do it in ourselves within a few years, but our wetware does not support patches or bugfixes.
We can't improve ourselves, but that's only because we're meat, not code. And of course no AI has done it yet, because we haven't actually made intelligent AI yet. The question is what happens when we do, not whether the weak-ass statistical crap that we call AI today is capable of self-improvement. Nuclear reactions under the self-sustaining threshold are not dangerous at all, but that was not a good reason to think that no nuclear reaction could ever go exponential and be devastating.
> We can't improve ourselves, but that's only because we're meat, not code.
Doesn't seem like computers can improve themselves either. Mainly because they're made of silicon, not code. "AI can read and write its own code" doesn't exist right now, but even if it did, why is that also implying "AI can read its CPU Verilog and invent new process nodes at TSMC"?
(Also, humans constantly break things when they try changing code - the safest way to not regress yourself would be to not try improving.)
Computers are not as intelligent as humans right now at coding. So it's no surprise that they can't improve code (let alone their own).
If we ever get them there, then it's likely that the usual resourcing considerations will come into play, and refactoring/optimization/redesign will be viable if you throw hours at them. But unlike with human optimization, every hour spent there will increase the effectiveness of future optimizations.
I was enthusiastic about DALL-E but the "safety measures" are both heavy handed and naive. It gets in the way for many normal/reasonable prompts but seems easy to work around with various wordplay, so not sure the point. Stable Diffusion and others have been much easier to deal with.
The harm is really hand-wavey and speculative, frankly.
An image classifier calling Black faces gorillas? Embarrassing, insulting, has to be fixed. AI pre-crime classifiers for police departments? I'm against it, across the board.
Do we really care that the image mulchers default to stereotypes? It means if you say "basketball player" they'll mostly be Black, if you just say "doctor" they'll mostly be white males (and probably balding with a stethoscope), but this can be qualified easily in the prompt.
It just reflects the training data, and the smart thing to do is shrug and add enough words to get the image you want. It's not trying to throw shade, it literally understands nothing, it's not able to understand things, just match text prompts to generated images.
Nerfing DALL-E by randomly adding 'diverse words' just makes it harder to dial in the image you want. Let's say you want a Vietnamese male doctor drinking coffee on break in Hanoi, it's not going to help you if 1/3rd of the images have "female" or "black" tagged onto it.
It just seems low stakes. We wouldn't come after a human artist who happened to paint a picture which conforms to simple occupational stereotypes, why should AI be any different? It's not like it will refuse to give you what you want if you ask.
It's a good thing that the "safety measure" is the way it is - an afterthought. It means that those ideologues haven't yet had influence on the model itself.
It is not art, and art is useful (we can disagree on what's art, the age-old question).
MidJourney and others are actually useful for exploration, but the outputs are not, because they can't spit out finished deliverables to spec. No one is paying for a picture of "a mermaid eating marmalade, trending on artstation, beautiful face, sharp focus, octane, 8k".
They are great for exploration; it's just that I don't believe this is the killer app for these tools. We will find out what the killer app is with Stable Diffusion, because with Stable Diffusion people can experiment beyond entering some prompts.
There are situations where that kind of art is useful though. People have pointed out that it could work just fine for art for card games like Magic. Probably a lot of board games too.
Sure, some people can find it useful, but IMHO that's not a product, or at least not a good one. Consider how much more universally useful other products like Photoshop or Blender are.
I think a major problem is reproducibility and output controllability. Rolling the dice multiple times and using some of the outputs is not good enough for most applications.
Maybe this can be solved at some point, but it isn't solved at this moment. The advantage of Stable Diffusion is that it is possible for someone to implement it; with OpenAI this feature doesn't exist, and it's not useful until they implement it.
> Sure, some people can find it useful but IMHO that's not a product, at least a good one.
It's the start of a product, but it's going to keep improving. Already with inpainting and outpainting we see some new possible uses. What NovelAI (which builds on Stable Diffusion) has shown so far of their upcoming release seems impressive, though it's hard to say how much of that is cherry picking.
> Rolling the dice multiple times and using some of the outputs is not good enough for most applications.
Hmm, is that true? I feel like most of the time the art that companies want is something made far ahead of the consumer seeing it, so generating 100 versions of something and picking the best seems fine, especially if you can then use inpainting and img2img to fine-tune it.
>IMHO giving a prompt to generate an image is amazing but isn't a product because you can't actually produce useful stuff with it (it's great as an exploration tool).
I'm not sure who they planned to market this to, but I can think of a few products here, though not inherently lucrative ones to my knowledge. Fictional literature illustrations, such as for books, seem like a great market in my mind: you can literally turn authors' words into depictions without a graphic artist. I wouldn't be surprised if you could create graphic novels this way as well. Propaganda also seems like a market, but the bar there seems to have been lowered to memes.
Other than that, I struggled to think of applications you could make money from. Police sketches? Eh I doubt it would work well but maybe.
There is of course the visual art world, which could potentially be impacted by AI-generated artworks.
> IMHO giving a prompt to generate an image is amazing but isn't a product because you can't actually produce useful stuff with it (it's great as an exploration tool)
I'm already over these generated "flip book" animations flooding my Twitter feed that are mostly variations on the same theme. There are only so many times it can be used before the novelty wears off.
> OpenAI rushed into monetisation and control before having a killer app
They also went for B2B first, which is weird. Why not launch a B2C app in parallel? It could be a subscription or packs of drawings. It would generate buzz and give useful data on the sorts of things real people type into these systems.
Eh tbh I think it's directly related to the way that they decided to handle GPT and its iterations; they were worried that GPT would be used by bots to better spam the entire Internet, and to be honest they're right.
I'm guessing that worry translated to image generation as well. It's a Pandora's box thing I guess.
This has been the dominant story going around, I guess because people want it to be true since they're pissed at OpenAI for not being so open, but StableDiffusion's text2image is nowhere near as good as DALL-E 2 in my experience. DALL-E 2 is incredible at that, StableDiffusion is not.
But maybe it doesn't matter, because many times more people are playing around with StableDiffusion, such that the absolute number of good images being shared around is much higher with StableDiffusion, even if the average result isn't great.
> I guess because people want it to be true since they're pissed at OpenAI for not being so open
This is honestly not my experience at all. When I first tried SD and MJ, I did so with a very clear and distinct feeling that they were "knock-off DALL-Es" and I strongly doubted that they would be able to produce anything on the level of DALL-E. Indeed, I believed this for my first couple hundred prompts, mostly because I didn't know how to properly prompt them.
After using them for around a month, I slowly realized that this was not the case, and in fact they were outperforming DALL-E for most of my normal usage. I have a bunch of prompts where SD and MJ produce absolutely beautiful and coherent artwork with extremely high consistency, that when sent to DALL-E, give significantly worse results.
It depends on what you're generating. Complex prompts in DALL-E ("a witch tossing Rapunzel's hair into a paper shredder at the bottom of the tower") blow Midjourney and Stable Diffusion out of the water.
But if all you're doing is the equivalent of visual mad Libs: "Abraham Lincoln wearing a zoot suit on the moon.", then SD and MJ suffice.
With several thousand images on each, I agree with this -- to a degree.
Dall-E does seem more aware of relationships among things, but using parens and careful word order in some of the SD builds can beat it. By contrast, even most failed images from MidJourney could still be in an outsider art gallery. MJ aesthetic works, while Dall-E seems like a 9 year old was taken hostage and clipped out Rapunzel and the paper shredder from magazines and pasted them onto a ransom note.
That said, I have not been able to get any of Dall-E, MJ, or SD to give me a coherent black Ford Excursion towing a silver camping trailer on the surface of the moon beneath an earthrise.
At cost per image, I could pay to get complex concepts such as this rendered via any number of art-for-hire sites at less expense and guaranteed results.
Only some builds support it. This is the one I'm familiar with: [0]. () around a word causes the model to pay more attention to it, [] around a word causes the model to pay less attention. There's an example at the link.
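As a hedged illustration of that syntax (the exact weighting behavior varies by build, and this prompt is just an invented example, not from the link):

    a portrait of a knight, (ornate gilded armor), [cluttered background], sharp focus

Here the parenthesized phrase gets extra attention and the bracketed one gets less.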
It's not just many times more people, it's also the fact that Stable Diffusion can be used locally for ~free.
If I get a bad result from DALL-E 2, I used up one of my credits. If I get a bad result from Stable Diffusion running on my local computer, I try again until I get a good one. The result is that even if DALL-E 2 has a better success rate per attempt, Stable Diffusion has a better success rate per dollar spent.
This also affects the learning curve. I've gotten pretty good at crafting SD prompts because I could practice a lot without feeling guilty. I never attempted to get better with DALL-E 2, because I didn't really want to spend money on it.
Yes, it's true, I've tried all the available models and DALL-E 2 outperforms Stable Diffusion. It understands prompts way better and SD sometimes just plainly ignores parts of your prompt or misinterprets them completely. SD cannot generate hands at all for example, they look more like appendage horrors from another dimension.
OTOH, the main limiting factor for DALL-E 2 from my point of view is the ultra-aggressive NSFW filter. It's so bad that many innocent prompts get stopped and you get the stern message that you'll be banned if you continue, even though sometimes you have no idea which part of the prompt even violated the rules.
It's not true that SD cannot generate hands. It's a bit tricky, but it's possible.
Sometimes hands will turn out just fine and sometimes they will suddenly become fine after some random other stuff is added to the prompt.
It's clearly still missing a bit in terms of accurately following prompts, but it's capable of generating a lot of things that may not have obvious prompts. This should improve a lot with larger models. I believe SD is already working on it.
I genuinely think Stable Diffusion is better than DALL-E. There's a really obvious, ugly artifact on almost all the DALL-E images I've seen that SD doesn't suffer from.
But anyway, SD is far superior even if you consider DALL-E better per image, since you can create 1000 SD outputs and just pick the one you like best (which will almost surely include one that's better than the DALL-E output you got).
My own experience is that SD requires a lot more prompt engineering to get an appealing output; DALL-E and Midjourney spit out amazing results with even minimal inputs. But what I've found is that when it subs in its own aesthetic, it's the same aesthetic. Almost like a style.
You're right. History has shown the best quality product doesn't always win if there's a "just okay" solution laying around that's more accessible. VHS and Windows both come to mind.
From my experience there isn't a clear difference in quality between the output produced by Dalle2 and Stable Diffusion. They both suffer from their own unique idiosyncrasies, and the result is that they have differently shaped learning curves.
I do admit that I rate the creativity of Dalle2 higher than that of SD. It can occasionally create really unexpected and exciting compositions, whereas SD will more often lean more conventional.
Which one are you comparing against? I've tried hundreds of prompts between SD and DALL-E and get comparable results. Midjourney was lagging for a while, but the new --testp parameter is really remarkable, which, in my view, makes it superior not only to Stable Diffusion but also to DALL-E as well.
An easy example of DALL-E superiority is its ability to combine two different concepts together.
For example, DALL-E performs extremely impressively on prompts in the format of "a still of Homer Simpson in The Godfather" (replace character and movie as you wish). With the other two it's a lot of misses.
With StableDiffusion I can buy a used RTX 3090 on eBay for $650, tell the model to generate 5,000 images, and then review each one until I find what it is I'm looking for.
Turns out a shitload of misses are acceptable when it only takes 4-7 seconds to generate an image from a prompt. 5000 generations on an RTX 3090 takes around 7 hours +/- 30 minutes, by the way.
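For anyone wanting to reproduce that kind of overnight run, here's a rough sketch using the Hugging Face diffusers library; the model ID, step count, and output path are my own assumptions, not something the parent specified:

    import os
    import torch
    from diffusers import StableDiffusionPipeline

    # Load the public SD v1.4 weights in half precision and move them to the GPU
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "a still of Homer Simpson in The Godfather"
    os.makedirs("out", exist_ok=True)

    # Generate a large batch overnight, then review the folder and keep the hits
    for i in range(5000):
        image = pipe(prompt, num_inference_steps=30).images[0]
        image.save(f"out/{i:05d}.png")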
What I've been doing is generating maybe 100 images, picking the best one, and then generating another 100 from that, using --init-image ("good" image file name) and --init-image-strength 0.2 (or so), either with the original prompt or a slightly tweaked one.
Those are the params I use in ImaginAIry, mileage may vary if you're using a different package.
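As a concrete (and hedged) example of that loop with ImaginAIry's CLI - the `imagine` command and the two flags are the ones mentioned above, while the prompt and file names are made up:

    # First pass: generate candidates from the prompt alone
    imagine "a cozy cabin in a snowstorm, oil painting"

    # Second pass: refine around the best candidate from the first pass
    imagine "a cozy cabin in a snowstorm, oil painting" \
      --init-image best_candidate.png \
      --init-image-strength 0.2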
It's a bit ironic bringing up a 7-hour RTX 3090 run as a cost saving, given that it's something like 3 kWh of electricity, which costs more than DALL-E's already outrageous prices.
While this is likely true for this specific prompt, I think that cherry-picking a single prompt that DALL-E outperforms SD on is not super indicative of anything. I've conversely found a large number of prompts where SD outperforms DALL-E, either in aesthetic quality or just following directions! I think you'd really have to compare both of them across a large number of prompts of different types to be sure.
To say nothing of the fact that you have lots of sliders to configure jsut how closely or loosely it follows your prompt. And choice in sampling methods.
You can't just compare SD and DALL-E performance on prompts alone, because SD gives you a lot more levers to steer it in the direction you want.
> house interior, friendly, playful, video game, screenshot, mockup, birds-eye view, top down perspective, jrpg, 32 bit, pixel art, black background
SD absolutely demolishes DALL-E on this one. SD produces really nice-looking output, with a high degree of consistency. DALL-E produces incoherent nonsense.
>An easy example of DALL-E superiority is its ability to combine two different concepts together.
This is a con for some prompts. As an example, I asked for a painting of an elephant and a dog drinking tea together. The result was a dog with an elephant nose next to a teapot.
A similar misfire was the word 'porcupine', which drew pigs, I guess because "porc" is in it? Anyway, its idea-blending is a little too aggressive.
Start your prompt with "group photo of" then list the elephant and the dog. If you try this across many images, group photo will result in about 2x as many keeping the subjects separate.
Have to let the AI experts speculate on why SD goes nuts there because it definitely knows what "The Godfather (1972)" means (if you ask for e.g. 'A still of Patrick Stewart in "The Godfather (1972)"' you get one - which I believe DALL-E can't do because of their facial restrictions?)
I would argue that none of these follow the prompt; they all represent a Godfather frame in Simpsons style, which is not the same as placing Homer in a Godfather still.
My experience is that with prompts that fit into OpenAI's limiting content policy DALL-E text2img results are usually much better. And I use SD like 95% of the time, so it's not the case that I would be more used to DALL-E.
Here I wanted an illustration of a nuclear plant in a Japanese landscape; the first attempt with DALL-E produced multiple good results. I tried SD and MJ (back when MJ didn't use SD) as well, and had trouble even with multiple attempts:
There are others, but anyway I think my examples are not important since it will be always easy to cherry pick prompts that yield the best results in model X.
In my experience SD is good at producing (especially non-photo-realistic) art that looks pretty and DALL-E is better at following a specific prompt when I know what exactly I want.
Of course I recognise your experience might (and probably does) differ.
> ...and most of them could be linked to the prompt they came from.
You made it sound as if there is almost no connection between the prompt and the images and zimpenfish said that the majority could be linked, implying a strong connection. He/she doesn't have to be praising it at all to counter your claim.
Not hugely - e.g. taking the 38 prompts including "a painting by William Adolphe Bouguereau" (which is easily the worst of the modifiers for me), 10 of them I'd say were "no clue to the prompt". For the 56 Munch images, 54 were good and 2 were quibbles ("an isopod as an angel" had no isopod but did have an angelic human - is that a pass or no?)
(Which is probably better than you'd get from a human given the exact same prompts.)
No, sorry, but there's a whole bunch of one-click things now, I think?
I'm running it on Windows 10 using (a modified version of) https://github.com/bfirsh/stable-diffusion.git and Anaconda to create the environment from their `environment.yaml` (all of which was done using the normal `cmd` shell). Then to use it, I activate that env from `cmd` and switch into cygwin `bash` to run the `txt2img.py` script (because it's easier to script, etc.)
[edit: probably helps that I already had a working VQGAN-CLIP setup which meant all the CUDA stuff was already there. For that I followed https://www.youtube.com/watch?v=XH7ZP0__FXs which covered the CUDA installation for VQGAN-CLIP.]
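Roughly, the setup looks like this (approximate; the environment name and the txt2img flags follow the upstream CompVis scripts and may differ slightly in this fork):

    git clone https://github.com/bfirsh/stable-diffusion.git
    cd stable-diffusion
    conda env create -f environment.yaml
    conda activate ldm

    # then, from a shell with the env active:
    python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms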
What really excited me about SD was how many creative things it was used for because people could modify and use it. Just in the first week I saw tens of different cool projects here on HN. With DALL-E, I have only ever seen prompts+images.
For something that is supposed to be intelligent, sometimes the restrictions around DALL-E make no sense. My recent request to generate an image of two cats sleeping together was not allowed because apparently this is “adult content.”
Saying "missed the boat" makes it sound like it was just bad luck and not OpenAI's fault, but I'd argue that it was their fault. They could have made DALL-E open source; they just chose not to.
I'm still heavily exploring these new tools from an artist's perspective. I never managed to get a run on Midjourney, but between DALL-E and SD there are quite a few differences. Broadly speaking, DALL-E seems to get a better handle on photographic results and on interpreting "what I meant". With Stable Diffusion it's a lot of fiddling and putting manual emphasis on certain keywords until it gets it just right.
I went about trying to utilize Stable Diffusion for an imaginary concept project (concepts for characters of a remake of TMNT, heh). The process was more similar to how I'd work with another artist than to how I'd draw it alone. It was back and forth, from rough outlines and then honing in on details. Inpainting and img2img helped A TON, and I hope to get Dreambooth running soon as well, since that will be a game-changer in the combination of things.
Between the exploration phase, detailing, alternatives, and manual painting and overpainting, I'd say that PER final output image I created in the region of a thousand or so interim images. The process overall did take a lot of time, though not as much as a completely manual one, and of course I didn't feel like I had as much control as with manual work, but ultimately I did feel I had creative control. With Dreambooth I expect the gap to close.
Overall, I was extremely pleased with the experiment and I'll continue exploring it, even though I'm not doing artwork professionally anymore. And so far, no, it's not going to replace artists. It's another tool that removes labour but adds time spent on direction. Ultimately it'll be another brush in the toolbox.
All their actions seem to arise out of arrogance and a misplaced belief that no one can replicate their work. Even before DALL-E, with GPT etc., the sheer arrogance that led them to think that they could decide how their tools can be used, who can use them, and for what purpose was revolting.
Frankly, it spoiled the whole ML/DL revolution for me, which until that point had been a little more open than other fields. True, companies like Google don't let you use their models like Alpha* freely or even open source them, but they weren't dangling access the way these pricks who called themselves open were doing.
I hope they will become irrelevant in the coming years.
I was perplexed by this happening, since I'd have figured the level of expertise on OpenAI's executive team would make them unlikely candidates to lose out strategically (as opposed to losing to something more outside their control).
But maybe it was overconfidence? Or maybe what Stable Diffusion did was really just unpredictable.
I think they believed no one else could come out with something like what they had, plus some vanity/arrogance maybe.
Like back then when they held their GPT model from the public because "it could destroy the world" or some similarly deranged argument.
>the level of expertise in OpenAI's executive team would make them unlikely candidates to lose out strategically
Sure, to be honest they're making mistakes that even complete marketing noobs know to avoid. Anyway, it's nice to see others are eating their lunch and sharing it with us.
> Midjourney and Stable Diffusion emerged and got to the point where they produce images of equal or better quality than DALL-E
I cannot speak to DALL-E's results, as the signup process is currently broken (after providing email, name, and phone number, was met with "We’re experiencing a temporary issue with signups due to a vendor outage. We apologize for the inconvenience!"), but the Stable Diffusion results I've been getting are not just unusable, but downright bizarre... here are the four images it produced for "morihei ueshiba doing a double backflip": https://imgur.com/a/EvkQpBT
Finally was able to get the signup process sorted (discovered that I had to use a different email address than the one I had originally requested beta access with); DALL-E's results for the same prompt were more human at least: https://imgur.com/a/OahhDS4 .
Truly proves the saying, "Get Woke, Go Broke". All this pearl-clutching over safety really did a disservice to them.
In all fairness, their release of Whisper[0] last week is actually really amazing. Like CLIP, it has the ability to spawn a lot of further research and work thanks to the open source aspect of it. I hope OpenAI learns from this, downgrades the "safety" shills, and focuses on producing more high-quality open source work, both code and models, which will move the field forward.
I've spent a significant amount of time playing with the variety of Diffusion models available and DALLE 2 tends to produce much better quality images. The other killer feature is DALLE 2 has support for in-fill.
I think the same thing is going to happen to the new models as well. Something better and more efficient is going to eat their lunch. Maybe down the road we'll see more application-specific models and a general model sitting on top of them to composite results together.
Even better, with Stable Diffusion you can at least run it locally.
Running software locally and using our desktop or portable supercomputers for something other than web browsing. What a novel concept. But how is this possible without cloud?
This gets upvoted due to inexplicable DALL-E hate, but on the other hand I'm keeping my DALL-E account and cancelled my MidJourney account because the DALL-E account doesn't cost me anything when I don't use it. Having an account I barely use is great because I can go generate an image whenever I want for comparison purposes.
(Furthermore, if I don't use it very often, I'm in the free tier due to the 15 free credits a month.)
Also, do you realize that Stable Diffusion is also running a pay-for-usage model at dreamstudio.ai? I like that too.
Also, and this is a big one: DALL-E has an overzealous filter, which blocks seemingly harmless prompts as "violating content guidelines".
Look, I get it: They don't want to be in the news for producing porn or gore. But if you block a prompt and threaten account closure on repeated such blockings, at least tell us what we did wrong.
this is something that people only on HN would write/believe. Missed the boat on what? Giving away free images from a prompt?
This is all early days and these demos are neat but the real value is yet to be seen. Maybe when this technology is licensed and integrated into Photoshop or Instagram or something like that.
Yes, it does feel they shot themselves in the foot.
Their marketing was excellent, but somehow pushed expectations too high and underdelivered. It also felt very elitist. Not very "tinkerers in a garage", which is what this generation of Stable Diffusion tools feels like.
Not from what I've seen. I took some prompts I saw used in Midjourney and was getting relatively lame results from DALL-E. Not a great comparison, but still.
Can you link to the hundred-page document? I believe you are talking about prompt engineering, and I would love to get more information about it. I am struggling with figuring out good prompts.
Agreed. That miss was brutal, although freely distributed will always beat a gated experience in my book. Stable Diffusion is already in everyone's hands.
It's easier to compete with free (most paid products do) if most of the people interested in AI generated art have been paying for their service for months rather than browsing for alternatives. Especially since their supposed advantage is better prompt understanding rather than image quality; easy to dismiss StableDiffusion if your first impressions of it are "doesn't understand me like DALL-E" rather than "wow, this is magic"
The "waitlist" model might work when the product isn't ready for prime time or the exclusivity is a part of the pitch, but it's greatly overrated in other respects. I got a "The Wait Is Over" email to tell me I'm off the waitlist and able to use a not-exactly-new stock trading app this week as the UK economy crashed. Yeah, thanks, but no thanks...
I think it depends a lot on what you mean by "better output"
DALL-E is very good at conceptually representing complex prompts. With something like "a bear with a diving mask surfing in the ocean, a pelican is sitting on its shoulder", DALL-E will immediately produce coherent results, while SD requires a lot of prompt tuning, and sometimes it's even impossible to get it to represent some concepts (I haven't tested this particular prompt tho)
SD is good for producing "artistic" images if that makes any sense
edit: ok I tried the "surfing bear" prompt with DALL-E 2 and SD and the results are consistent with my point, I put the raw prompt without tuning, and cherry picked the best image out of 4 with both models, here is what I got :
It's a little heartbreaking because arguably, OpenAI tried to do the responsible thing here: come up with a sustainable business model to make AI-generated images profitable while respecting trademarks and controlling for some objectionable content. Very corporate; very above-the-board.
Emad Mostaque, a millionaire hedge-fund manager with money to burn, spent approximately $600,000 to train a model and dumped it out for public consumption: no account for how it will be used, no concern about any sociopolitical consequences, damn the torpedoes and straight ahead. He basically burned down a potential industry space and hugely complicated an ongoing conversation on how these tools will interact with / disrupt the lives and livelihoods of artists... But he also basically changed the world overnight. Hashtag-squad-goals, am I right?
There's a lesson to be learned here. I haven't decided what it is yet. Though I note that it's a lesson that probably applies to few people who don't have $600,000 to set aflame.
> dumped it out for public consumption: no account for how it will be used, no concern about any sociopolitical consequences
I hadn't heard the story of how stable diffusion was created. Sounds like the guy is a true hero from your description. And only for $600k? Imagine if he decided to "burn" the rest of his millions on similar initiatives.