I think it's so interesting and positive that Stable Diffusion has come out and absolutely destroyed DALL-E as a product. What are the best examples of DALL-E being integrated into a product? Are there any at all?
I am soooooooooo glad this happened. OpenAI has defaulted on their promise of openness and it seems a lot of models are gatekept by profiteering and paternalistic moralism
All it takes is one good, actually open project to sidestep all of their chicanery.
Yes, and the end of the world they predicted if text-to-image technology got into the hands of bigots or if celebrities/politicians were deep-faked into odd situations, etc ... just has not materialized. I really think (and always thought) that the whole ethical reason for withholding access was bullshit! Same with generative text from prompts.
For one, software takes a while to diffuse into public usage. Secondly, if you were the victim of blackmail or some other criminal activity perpetrated with such a system, you wouldn't raise your hand immediately -- you'd get clear of the problem -- and then afterwards, depending on how embarrassing the situation is, you'd speak out publicly. Many victims will never identify the methods used, and most will never speak out publicly.
In other words, systems like this could be being used to harass people and it'd take a bit of time before 'the public' was ever very aware of it being an ongoing problem.
I'm pretty sure we haven't seen the deepfakes we are expecting yet. Give it a couple months for grainy photos of Clinton in the basement of a pizza parlor with handcuffs in the background.
I think those ethical concerns are very real. It's just inevitable that this technology will be used for nefarious purposes. But withholding access to the tech can't do anything to avoid the inevitable.
Well, there's also the problem that if a technology is possible but kept proprietary, people don't expect it to be in use. This increases its power.
A world where deep fakes are well known and accessible technology undermines their nefarious utility (the opposite of course also being a problem: actual video evidence now potentially being undermined).
There are some people making images that almost anyone would find offensive. But I haven't noticed those images having any impact at all, at least not yet.
That's usually how innovation stagnation happens. Tons of examples all around. Intel and AMD were fighting; at one point Intel got a solid lead on AMD, but eventually they became overconfident and lazy. The same will probably happen with AMD eventually, it's just a matter of how long the cycles are.
OpenAI decided to keep it for themselves; their tech was impressive, but they didn't have a killer app and they tried to prevent the inevitable by restricting the use of their machine.
StableDiffusion might be inferior to DALL-E in some aspects, but they built a community with full access to the tooling, and that community is much more likely to find a killer app for this impressive tech.
It's kind of ironic that OpenAI is losing out due to their closed ways and desire for control.
The people who originated the CLIP-guided diffusion approach (rivershavewings, around this time last year) are now working on Stable Diffusion, so it's somewhat arguable that DALL-E wasn't actually first (just the first to make a user-friendly SaaS for it).
You need CLIP to have CLIP guided diffusion. So the current situation seems to trace back to OpenAI and the MIT-licensed code they released the day DALL-E was announced. I would love to be corrected if I've misunderstood the situation.
You're totally right, OpenAI released CLIP in January. But CLIP isn't an image generator, it's just a classifier. If we restrict the question to actual text-to-image generators (ignoring Deep Dream or some of the 'kinda cool but far from the coherency of post-2021 generators'), then CLIP-guided diffusion is kinda the first.
This has happened so many times in the past 50 years in tech: one company creates a superior technology that has small sales compared to the bigger company that commodifies it at scale either through marketing or capex dollars, and thus squashes the smaller inventing company into irrelevance (or out of existence).
BetaMax > VHS
Xerox PARC > [ Mac > Windows ]
PS/2 bus > ISA bus
AltaVista > Google
I'm curious to see if Tesla is hit by this now that the big automakers finally woke up and realized there's a colossal market for low-cost EVs.
Whatever API they release is going to be way more restrictive than what people can do with Stable Diffusion. I doubt we’ll see anywhere near the same amount of integrations unless OpenAI just lets you download the weights and run Dalle locally.
We haven't seen any integrations but that doesn't mean we have any idea how many users DALL-E has. Stuff showing up on Hacker News isn't a good proxy for this.
Stable Diffusion easily has 100x more active users than DALL-E; this is based on stats OpenAI released and private sources I've been able to dig up. It's rumored that Stability AI is in the process of raising additional funding at over a billion dollar valuation. Unless OpenAI rapidly changes course, it's only a matter of time before they are a footnote in the history of AI, in my opinion, since Stability will likely rapidly go after every single current offering they have, including GPT-3/Copilot.
Well, I guess it's fortunate that OpenAI the company is owned by a nonprofit that's devoted to research. As long as they get funding they can keep doing research.
> OpenAI LP is governed by the board of the OpenAI nonprofit, comprised of OpenAI LP employees Greg Brockman (Chairman & President), Ilya Sutskever (Chief Scientist), and Sam Altman (CEO), and non-employees Adam D’Angelo, Reid Hoffman, Will Hurd, Tasha McCauley, Helen Toner, and Shivon Zilis.
> Our investors include Microsoft, Reid Hoffman’s charitable foundation, and Khosla Ventures.
Also, [2] is interesting reading. It's not official (the author writes "views expressed are my own"), but it's written by a researcher at OpenAI and was reviewed by people on the board of directors.
Thanks, no idea how I missed reviewing the about page — and the recent post on LessWrong was interesting too; in fact, I resubmitted it on HN, since I am curious what others make of it:
I'm currently trying to put 1000 seamless wallpaper textures into the UE5 Marketplace. I'm saddened to see this news. ^^ Well, fuck money anyway, right? Here's a tip, you can produce all you need if you follow this guide:
For textures: Once you have generated the color map (diffuse) from StableDiffusion, you can use CrazyBump to BATCH create the normal map and displacement map. I'm currently at my 200th file iteration.
http://www.crazybump.com/
CrazyBump, all the cool kids are using it.
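If you can't get CrazyBump (or want to script the whole batch yourself), a rough approximation of the normal-map step is to treat the grayscale image as a height field and take its gradients; a numpy sketch, assuming PIL/Pillow is installed:

    import numpy as np
    from PIL import Image

    def normal_from_height(height_path, out_path, strength=2.0):
        # Derive a tangent-space normal map from a grayscale height image.
        h = np.asarray(Image.open(height_path).convert("L"), dtype=np.float32) / 255.0
        # Gradients in x and y (np.roll wraps, so seamless inputs stay seamless)
        dx = (np.roll(h, -1, axis=1) - np.roll(h, 1, axis=1)) * strength
        dy = (np.roll(h, -1, axis=0) - np.roll(h, 1, axis=0)) * strength
        n = np.stack([-dx, -dy, np.ones_like(h)], axis=-1)
        n /= np.linalg.norm(n, axis=-1, keepdims=True)
        # Map [-1, 1] components into the usual [0, 255] normal-map encoding
        Image.fromarray(((n * 0.5 + 0.5) * 255).astype(np.uint8)).save(out_path)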
Now this is where I'm at. Call me crazy. I'm forgetting stuff surely, but it's the best I can do. Go and change the world.
Oh, it generates from a text prompt, not a sample texture. I thought this was just a tool to generate wrapped textures from non-wrapped ones.
The licensing is a mess. The Blender plug-in is GPL 3, the stable diffusion code is MIT, and the weights for the model have a very restrictive custom license.[1] Whether the weights, which are program-generated, are copyrightable is a serious legal question.
Shouldn't the arguments for applying copyright to photographs apply nicely to applying copyright to ML weights too? Sure, the output is generated by a machine, but the output is also created by the creative choices of the machine's user.
If anything, it would seem to me that photographs had a much better case to being uncopyrightable, with them being mechanistic reproductions of existing reality.
The license isn't particularly restrictive to users, but it does put restrictions on developers.
For example, the way I interpret it, if you write software that incorporates Stable Diffusion then it's not legal to use any GPL-licensed code, or vice versa (thanks to the "use-based restrictions").
(Of course, you could argue that this isn't much more "restrictive" than the GPL itself is.)
Pretty ironic to assert copyright on the weights while ignoring it on the training data that produces the weights. Are AI practitioners foolish enough to tempt that fate?
Very cool. There are some really interesting opportunities to integrate stable-diffusion into many creative apps right now. It's neat to see it all happening at once.
As a developer and past indie dev that creates awful art for any projects I take on, this is incredibly exciting. We're getting closer to the reality where an engineer can build a full game or experience completely on their own and maintain a high level of artistic quality.
Unfortunately none of the three textures shown as examples in the README are seamless pattern textures. That would have completely driven the point home. I really like the idea though.
I feel like that shouldn't be too hard to add. One of the all-in-one solutions for Stable Diffusion[1] has this feature already. I don't know how it is implemented but it does kind of seem once one of them gets a new feature they all get that new feature before long.
It already supports it (it uses the lstein fork, which most projects use. Probably your link does too).
Just none of the sample images in the readme make use of the seamless feature. It would be nice to see what they look like without installing everything.
I can't wait until these kinds of tools are usable live. I'd love open worlds with unique character interactions and scenery. I'm always incredibly disappointed when I've exhausted a game's content or when portions of content are obviously built on some simplistic pattern, either visual or interactive.
Aren't indie games already dangerously close to commodity status? Steam is overwhelming these days. There are dozens of games I can build bridges in, or new interesting strategy games. How is any one developer supposed to capture enough market share to make any money off their work? I am worried that tools like this will just lower the barrier even more.
Maybe it's a good thing because it will allow indie devs to spend less time/money on art.
Yip. I think a large motivation for many games is not to make money but to make something that you personally want that isn't already out there, where the money is just a nice perk.
"UnReal World", for the most extreme example I know of, was released and has been in development for more than 3 decades. It's still receiving regular updates, with the dev kind of mixing game and life. It's a game about surviving in the Finnish wilds, by a dev who lives out in the middle of the Finnish wilds.
The barrier of entry has been on the floor ever since Steam discontinued Greenlight and started just allowing everyone on the platform. But at the same time they invested a lot in better content discovery: personalized recommendations, the discovery queue, curators you can follow, etc.
If you're building the next rehash-of-popular-concept, this asset generator at best saves you a couple minutes shopping the Unity Asset Store, and selecting the right store-bought texture in blender. But it will raise the bar of what's possible with new, innovative settings, which I'm really looking forward to.
> How is any one developer supposed to capture enough market share to make any money of their work? I am worried that tools like this will just lower the barrier even more.
In an ideal society, everyone has time, energy, and resources to create art themselves just because it makes them happy, as opposed to having to turn a profit.
Maybe finally the pretending-my-programmer-art-is-a-super-opinionated-stylistic-choice-to-go-with-retro-pixel-art-and-not-just-because-it's-so-much-easier-not-to-hire-an-artist fad can be — if not laid to rest — perhaps toned down a bit.
These tools won't replace artists or the need for some sort of artistic sense - there are several indie games that had professional artists working on the assets but the developers behind them completely massacred their art.
As an example, check out Frayed Knights on Steam - I really like the game and think it is both very fun and a very competent freeform blobber RPG, but despite the author having help from artists (and he even worked on some beloved PS1 games himself, so he wasn't an amateur at it), the visuals are downright ugly - the UI even looks worse than the default Torque theme! The fact that the game shipped with what looks like a rough placeholder made in MS Paint for the inventory background tells me that the only reason for that is that the developer (whom, do not get me wrong, I otherwise respect, just not when it comes to visuals) is blind when it comes to aesthetics (which is a shame, because the actual game is both very humorous and has an actually deep character system - but due to the visuals it was largely ignored).
This won't be solved by AI; at the end of the day someone will have to decide that something looks good, and someone will have to integrate whatever output the AI creates with the game.
What will actually happen is that people with some artistic skills will be able to do things faster than they could before - it will improve several types of games (i.e. those whose art styles fit whatever the AI creates), but it won't let someone without artistic skills suddenly make high quality art assets.
Now-- someone figure out how to set up the boundary conditions so that it can fill in Penrose or Jarkko Kari's aperiodic Wang tilings to efficiently get aperiodic textures.
If you fill in a set of these tiles with the right edge rules, then you can just formulaically fill a plane and get a non-repeating texture without generating unreasonable amounts of SD output.
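Not a solution to the aperiodic part, but as a toy sketch of the "formulaically fill a plane" step: assuming you've already generated one image patch per tile (e.g. by SD-inpainting the shared borders), placement is just an edge-matching lookup per cell. I'm using the complete 16-tile set over two edge colors here purely so the greedy fill never gets stuck; an actual aperiodic set like Kari's needs the same matching logic but a smarter fill or backtracking.

    import random
    from itertools import product

    # Toy Wang-tile fill: each tile is (north, east, south, west) edge colors,
    # and would map to one SD-generated patch whose borders match those colors.
    COLORS = ("r", "g")
    TILES = list(product(COLORS, repeat=4))  # complete set, so a match always exists

    def fill_plane(width, height):
        # Place tiles left-to-right, top-to-bottom, matching each tile's
        # west/north edges to its already-placed left/upper neighbours.
        grid = [[None] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                options = [
                    i for i, (n, e, s, w) in enumerate(TILES)
                    if (x == 0 or w == TILES[grid[y][x - 1]][1])   # left neighbour's east edge
                    and (y == 0 or n == TILES[grid[y - 1][x]][2])  # upper neighbour's south edge
                ]
                grid[y][x] = random.choice(options)
        return grid

    print(fill_plane(8, 4))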
I’m kind of overwhelmed by this stuff at the moment.
On the one hand, it’s very clear by now that this new generation of AI is absolutely game changing. But for what? It feels a bit like discovering oil in 1850.
I’ve spent years false starting on personal game dev projects (all for fun and learning) because I have zero art skills.
The ability to ask OpenAI for an entire set of game artwork has changed everything for me.
I’m positively jubilant about the ability to just ask for textures and such. Not necessarily ideal for some master vision of a professional game, but a game changer for people like me. Also probably great for proof of concept and such.
I recently got into Blender addon development and have been really pleasantly surprised with how extensive its Python API actually is — it's not perfect yet, but it's certainly very very useful! My personal wishlist for the future still has things like direct memory access to volume trees, shipping with pyopenvdb, etc, but I'm pretty impressed with how extensible Blender is by default.
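For example, it only takes a few lines of bpy to pull a generated image into a new material and wire it into the Principled BSDF (the material name and file path below are just placeholders):

    import bpy

    # Create a material and hook a generated texture into its Base Color.
    mat = bpy.data.materials.new(name="SD_Texture")  # placeholder name
    mat.use_nodes = True
    tex = mat.node_tree.nodes.new("ShaderNodeTexImage")
    tex.image = bpy.data.images.load("/tmp/sd_texture.png")  # placeholder path
    bsdf = mat.node_tree.nodes["Principled BSDF"]
    mat.node_tree.links.new(tex.outputs["Color"], bsdf.inputs["Base Color"])
    bpy.context.object.data.materials.append(mat)  # assign to the active object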
That's because Stable Diffusion is built with PyTorch, which isn't really optimized for anything but CUDA. Even the CPU is a second-class citizen there, let alone AMD or other graphics.
Not saying PyTorch doesn't run on anything else. You can, but those backends lag behind and some are hackish.
Looks like Nvidia is on its way to be the next Intel.
Part of this is simply that AMD does a TERRIBLE job of writing tooling and software for anything that isn't just "render these triangles for this videogame". Doing raw compute things with AMD GPUs seems limited to those building supercomputers, apparently. Their promised "cross GPU" solution in ROCm is only available on a tiny fraction of the GPUs they make, seemingly without architectural excuses for why it isn't available on 5000-series cards; it took them YEARS to provide a backend that Blender could actually use productively and without crashes and bugs, and their drivers are in general very fragile.
It's weird to me how much lip service AMD puts into making cross platform, developer friendly, free and open GPU compute standards, and then turn around and just not do that.
>Looks like Nvidia is on its way to be the next Intel.
There's a non-trivial part of several industries who do not want to see that happen, because NVidia treats its customers and partners orders of magnitude worse than Intel ever did.
From the Arch wiki, which has a list of GPU runtimes (but not TPU or QPU runtimes) and Arch package names (OpenCL, SYCL, ROCm, HIP): https://wiki.archlinux.org/title/GPGPU :
> GPGPU stands for General-purpose computing on graphics processing units.
> The combination of the limited Cycles split kernel implementation, driver bugs, and stalled OpenCL standard has made maintenance too difficult. We can only make the kinds of bigger changes we are working on now by starting from a clean slate. We are working with AMD and Intel to get the new kernels working on their GPUs, possibly using different APIs (such as SYCL, HIP, Metal, …).
> hipify-clang is a clang-based tool for translating CUDA sources into HIP sources. It translates CUDA source into an abstract syntax tree, which is traversed by transformation matchers. After applying all the matchers, the output HIP source is produced.
> HIP is a C++ Runtime API and Kernel Language that allows developers to create portable applications for AMD and NVIDIA GPUs from single source code. [...] Key features include:
> - HIP is very thin and has little or no performance impact over coding directly in CUDA mode.
> - HIP allows coding in a single-source C++ programming language including features such as templates, C++11 lambdas, classes, namespaces, and more.
> - HIP allows developers to use the "best" development environment and tools on each target platform.
> - The [HIPIFY] tools automatically convert source from CUDA to HIP.
> - *Developers can specialize for the platform (CUDA or AMD) to tune for performance or handle tricky cases.*
Stable Diffusion works fine for me on my RDNA2 gfx1031 ( RX6700 XT ) under Debian. If you are on a Linux kernel with amdgpu, have your distro's latest version of ROCm HIP runtime, and a relatively recent AMD GPU, you just need to replace the default pytorch with the ROCm HIP version.
# Install your distro's HIP runtime and rocminfo
$ apt install rocminfo hip-runtime-amd # for debian and derivatives
# Confirm ROCm is installed properly and confirm your gpu is supported
$ rocminfo
ROCk module is loaded
...
Name: gfx1031
Marketing Name: AMD Radeon RX 6700 XT
...
# Must uninstall any non-ROCm torch libs
$ pip3 uninstall torch torchvision torchaudio
# Install latest ROCm pytorch, see website for more versions
$ pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.1.1
# Required env var for unofficially supported newer GPUs like gfx1030 & gfx1031 that give the error "hipErrorNoBinaryForGpu"; can be added to /etc/environment to make it permanent
$ export HSA_OVERRIDE_GFX_VERSION=10.3.0
# Test torch HIP is working
$ python3
>>> import torch
>>> torch.cuda.is_available()
True
>>> quit()
# Run Stable Diffusion workloads like normal...
Hope this points people in the right direction with AMD GPUs.
Blender can also take advantage of the installed HIP runtime for render acceleration in its latest versions, once you enable the setting in Preferences.
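(If you'd rather script that than click through Preferences, something like the following works from Blender 3.x's Python console, though double-check the attribute names against your version:)

    import bpy

    # Point Cycles at the HIP backend and enable the detected AMD GPUs.
    prefs = bpy.context.preferences.addons["cycles"].preferences
    prefs.compute_device_type = "HIP"
    prefs.get_devices()
    for device in prefs.devices:
        device.use = True
    bpy.context.scene.cycles.device = "GPU"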
> Whenever I ask for something like ‘seamless tiling xxxxxx’ it kinda sorta gets the idea, but the resulting texture doesn’t quite tile right.
Getting seamless tiling requires more than just having "seamless tiling" in the prompt. It also depends on whether the fork you're using has that feature at all.
https://github.com/lstein/stable-diffusion has the feature, but you need to pass it outside the prompt. So if you use the `dream.py` prompt CLI, you can pass it `"Hats on the ground" --seamless` and it should be perfectly tileable.
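I don't know exactly how each fork wires --seamless up internally, but the usual trick behind these tiling modes is to switch the model's Conv2d layers to circular padding before sampling, so the output wraps at the edges. A minimal sketch (the `model` argument is whatever pipeline/UNet object you've loaded):

    import torch.nn as nn

    def make_seamless(model):
        # Circular padding makes convolutions wrap around the image borders,
        # so the generated texture tiles seamlessly.
        for module in model.modules():
            if isinstance(module, nn.Conv2d):
                module.padding_mode = "circular"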
This is exactly the kind of post that keeps me coming back to HN. I've been using AI for concept art, but hadn't even thought about it for texturing. And now we have a plugin for Blender to do exactly that, simply incredible!
Very exciting work. However, I would not yet make it part of my workflow, as it does not output normal and roughness maps. I see that those are listed in 'Future Directions', though.
I'm not convinced either way yet. We're just now starting to crack this problem open.
Intuitively it seems like we could represent stable 3D positions and geometry that then passes through a separate shading stage.
Use a model to replace only the engine and game logic itself. Faster physics and rules than a physics engine. Much less human engineering needed to build a final product.
Add a final pass through an ML upscaler for photorealism or cartoon crispness.
I'd gamble 50-50 that this is an existential risk to Epic Games. I'd also gamble 50-50 that we see this in five years.
I can't think of any incumbent right now that is safe from disruption.