If people just want to run text-to-image models locally, by far the easiest way I've found on Windows is to install Visions of Chaos.
It was originally a fractal image generation app but it's expanded over time and now has a fairly foolproof installer for all the models you're likely to have heard of (those that have been released anyway).
Thank you, I've been looking for something like that and it looks very cool, judging by this tutorial that shows it in action, creating an image and displaying the results in progress:
https://softology.pro/tutorials/tensorflow/tensorflow.htm
It's only a matter of time before we get a good NSFW image generation algorithm. Text erotica generation is already a not-insignificant part of the public AI world (remember AI Dungeon? Most people used it for porn). The question is whether it's going to come out of the established adult industry or not. There are clear benefits (no real humans needed anymore, personalized fetish material forever, etc.); the only issue is whether they're willing to deal with the inevitable bad press (e.g. fake images of real people, taboos like underage content or disturbing imagery). I wouldn't be surprised if any models from the adult industry end up heavily gimped from the start to avoid liability.
I feel like anybody, especially women, who has high-res photos/videos of themselves (nude or not) on the internet is going to wake up one day to find they have been turned into a pornstar.
For example, kpop idols. There are high-res 60fps fancams of them dancing, there are high-res broadcasts, images from all angles.
There's already a huge market for deep fake kpop and I believe that they are in for a reckoning.
Your last sentence combined with your introductory visual of somebody “waking up” one day as a pornstar is pretty disturbing. Anybody waking up to discover they are the subject of pornography would be experiencing a traumatic public humiliation. Check your head.
BTW, basically every “female”, along with every male, out there has high-resolution photos on social media.
I agree with the other commenter: the connotations of your post, the hints at your enjoyment of the prospect of this potential future, are disturbing.
I'm not sure whether you actually think this is good or bad but honestly I bet it would actually be a net positive, in a "if everybody's a pornstar, nobody is" kind of way. If you can generate nsfw video from a picture of a face with the press of a button, it will probably cease to be a thing that matters to anybody within a generation or two, in the same way we think about Photoshop today. It might be a bit of an awkward transition though since it definitely offends most modern people's sensibilities.
Let's be real though, it's the anime-style generation models that are going to be the pioneers in this field.
Possibly related: I was at Cornell when there was a guy who was an excellent artist and was making money by selling pornographic art of other actual students. Needless to say, a stop was put to this once word got out...
wandb forbids reusing the models and other information, regardless of how they are used, so they should find another source for their models.
EDIT: since I am being accused of inventing this, I will quote the terms of agreement and license. Maybe the founder himself has not read them, or someone without training in writing proper terms and agreements drafted them for him, but the restrictive usage of "Materials" does apply to the hosted software.
Note that there is no formal definition of "Materials" or "Service", so it applies to all the contents of the webpage, including the software stored there:
https://wandb.ai/site/terms
I quote it:
2. Use License
Whether you are accessing the Services for personal, non-commercial transitory viewing only (our free license for individuals only), for academic use, or for commercial purposes (our subscription package for businesses), permission is granted to temporarily download one copy of the information or software (the “Materials”) from our website. This is the grant of a license, not a transfer of title, and under this license, you may not:
a. Modify or copy the Materials;
b. Use the Materials for any commercial purpose, or for any public display (commercial or non-commercial);
c. Attempt to decompile or reverse engineer any software contained in the Materials;
d. Remove any copyright or other proprietary notations from the Materials; or
e. Transfer the Materials to another person or "mirror" the Materials on any other server.
This license shall automatically terminate if you violate any of these restrictions and may be terminated by us at any time. Upon terminating your viewing of these materials or upon the termination of this license, you must destroy any downloaded materials in your possession whether in electronic or printed format.
f. Utilize our personal license for individuals for commercial purposes and any such use of our personal license for commercial purposes (e.g. using your corporate email) may result in immediate termination of your license.
Founder of Weights & Biases here (wandb). We don’t forbid anything; models are the property of the people who created them.
Why do you think that?
EDIT: I'll respond via an edit, since you did. Look at sections 3b and 3c in the terms; they cover Models and other user content specifically. Those are user property, not our property. But I can see how this is confusing. We will clarify it.
Did you write your own terms of agreement rather than having a lawyer do it? I signed up today, and this is explicitly written in the license agreement, point 2. Note that there is no formal definition of "Materials" or "Service", so it applies to all the contents of the webpage, including the software stored there, and as soon as anything is ambiguous the interpretation is free (or random).
2. Use License
Whether you are accessing the Services for personal, non-commercial transitory
viewing only (our free license for individuals only), for academic use, or for commercial purposes (our subscription package for businesses), permission is granted to temporarily download one copy of the information or software (the “Materials”) from our website. This is the grant of a license, not a transfer of title, and under this license, you may not:
a. Modify or copy the Materials;
b. Use the Materials for any commercial purpose, or for any public display (commercial or non-commercial);
c. Attempt to decompile or reverse engineer any software contained in the Materials;
d. Remove any copyright or other proprietary notations from the Materials; or
e. Transfer the Materials to another person or "mirror" the Materials on any other server.
This license shall automatically terminate if you violate any of these restrictions and may be terminated by us at any time. Upon terminating your viewing of these materials or upon the termination of this license, you must destroy any downloaded materials in your possession whether in electronic or printed format.
f. Utilize our personal license for individuals for commercial purposes and any such use of our personal license for commercial purposes (e.g. using your corporate email) may result in immediate termination of your license.
I am certainly not a lawyer; however, 3b and 3c (from the terms link you posted) state that user content, specifically including Models, is the property of the user.
Are you saying you think there is a conflict between 2 and 3b, 3c, or did you miss section 3?
```
3. Intellectual Property & Subscriber Content
a. All right, title, and interest in and to the Services, the Platform, the Usage Data, the Aggregate Data, and the Customizations, including all modifications, improvements, adaptations, enhancements, or translations made thereto, and all proprietary rights therein, will be and remain the sole and exclusive property of us and our licensors.
b. All right, title, and interest in and to the Subscriber Content, including all modifications, improvements, adaptations, enhancements, or translations made thereto, and all proprietary rights therein, will be and remain Subscriber’s sole and exclusive property, other than rights granted to us to enable (i) Subscriber to process its data on the Platform, and (ii) us to aggregate and anonymize Subscriber Content solely to improve Subscriber's user experience.
c. Subscriber Content means any data, media, and other materials that Subscriber and its Authorized Users submit to the Platform pursuant to this Agreement, including, without limitation, all Models and Projects, and any and all reproductions, visualizations, analyses, automations, scales, and other reports output by the Platform based on such Models and Projects.
```
Sorry Slewis, I cannot reply to your other comment with another subcomment.
This is an ambiguity, and it is an issue that, as founder, you should address: it could be interpreted as a self-contradictory agreement and then invalidate part of the agreement (as I've seen with informal open source licenses, ill-formed patents that ended up bypassed, etc.).
A way to solve that might be a glossary of what you mean by each term; however, I do recommend using a lawyer for such things, as this sort of mistake can become expensive later on.
Go buy a copy of https://www.survivingiso9001.com/ [No affiliation] for your lawyers. Not directly for the subject matter, but as a cautionary tale about consequences of word choice.
Please refrain from writing derogatory comments that don't contribute any constructive argument to the conversation; just downvote the comment, or even better, follow where the conversation goes.
I also recommend reading the guidelines on how to post in the HN community; check the section regarding how and what to comment at the link here:
I spun up an AWS Ubuntu EC2 instance with 2 Tesla M60s. When I run
python3 image_from_text.py --text='alien life' --seed=7
I get this error
detokenizing image
Traceback (most recent call last):
File "/home/ubuntu/work/min-dalle/image_from_text.py", line 44, in <module>
image = generate_image_from_text(
File "/home/ubuntu/work/min-dalle/min_dalle/generate_image.py", line 74, in generate_image_from_text
image = detokenize_torch(image_tokens)
File "/home/ubuntu/work/min-dalle/min_dalle/min_dalle_torch.py", line 107, in detokenize_torch
params = load_vqgan_torch_params(model_path)
File "/home/ubuntu/work/min-dalle/min_dalle/load_params.py", line 11, in load_vqgan_torch_params
params: Dict[str, numpy.ndarray] = serialization.msgpack_restore(f.read())
File "/usr/local/lib/python3.10/dist-packages/flax/serialization.py", line 350, in msgpack_restore
state_dict = msgpack.unpackb(
File "msgpack/_unpacker.pyx", line 201, in msgpack._cmsgpack.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.
I get a similar error running it locally (not sure if related, but it also can't find my GPU, which is a 3080ti and should be sufficient):
Traceback (most recent call last):
File "/home/pmarreck/Documents/min-dalle/image_from_text.py", line 44, in <module>
image = generate_image_from_text(
File "/home/pmarreck/Documents/min-dalle/min_dalle/generate_image.py", line 75, in generate_image_from_text
image = detokenize_torch(torch.tensor(image_tokens))
File "/home/pmarreck/Documents/min-dalle/min_dalle/min_dalle_torch.py", line 108, in detokenize_torch
params = load_vqgan_torch_params(model_path)
File "/home/pmarreck/Documents/min-dalle/min_dalle/load_params.py", line 12, in load_vqgan_torch_params
params: Dict[str, numpy.ndarray] = serialization.msgpack_restore(f.read())
File "/home/pmarreck/anaconda3/lib/python3.9/site-packages/flax/serialization.py", line 350, in msgpack_restore
state_dict = msgpack.unpackb(
File "msgpack/_unpacker.pyx", line 202, in msgpack._cmsgpack.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.
Can anyone give instructions for M1 Max MBP? I had a compilation issue in building the wheel for psutil that looks like "gcc-11: error: this compiler does not support 'root:xnu-8020.121.3~4/RELEASE_ARM64_T6000'" (gcc doesn't support ARM yet?)
Running the "alien life" example from the README took 30 seconds on my M1 Max. I don't think it uses the GPU at all.
I couldn't get the "mega" option to work, I got an error "TypeError: lax.dynamic_update_slice requires arguments to have the same dtypes, got float32, float16" (looks like a known issue https://github.com/kuprel/min-dalle/issues/2)
Edit: installing flax 0.4.2 fixes this issue, thanks all!
Founder of Weights & Biases here (wandb). There are a couple of issues raised in this thread: an API key shouldn’t be required to download a public model, and the cache in the home directory is annoying for this case. We will fix them.
I'd prefer to download it myself and choose where I put it too.
It now uses some hashed filename in a config directory in your home dir for this. I dislike this and want control over where I put models: make it more self-contained instead of spreading random directories all over your OS, and let models be given as input by file path.
This feedback is about the dalle mini playground instead, but it does the same thing. If this one is stripped to the bare essentials, I'd expect this type of dependency to be stripped too.
Edit: I don't want to seem like complaining too much though and am very happy with these open models and tooling for them. Thanks!
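If I remember the wandb client API right, something like this lets you choose the download location yourself instead of the hashed cache path (the artifact path below is a placeholder, not the real DALL-E mini artifact, and it still assumes you're logged in with an API key):
```
# Hedged sketch: downloading a wandb artifact into a directory you choose.
# "some-entity/some-project/model-weights:latest" is a placeholder name.
import wandb

api = wandb.Api()
artifact = api.artifact("some-entity/some-project/model-weights:latest")
artifact.download(root="./models/dalle-mini")  # files land in the path you pick
```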
Does it just download pre-trained DALL-E Mini models and generate images using them? Because I can't seem to find any logic in that repo other than that. I'm not into that field, just curious if I'm missing something.
To add to the sibling comment: the challenge is not converting the weights as such. Pre-trained model weights are just arrays of numbers with labels that identify which layer/operation they correspond to in the source model. The challenge is expressing the model in code identically between two frameworks and programmatically loading the original weights in, since these models can have hundreds of individual ops. That's why you can't just load a PyTorch model in TensorFlow or vice versa.
There are tools to convert to intermediate formats, like ONNX, but they are limited and don't work all the time. The automatic conversion tools usually assume that you can trace execution of the model for a dummy input and usually only work well if there isn't any complex logic (e.g. conditions can be problematic). Some operations aren't supported well, etc.
This isn't always technically difficult, but it's tedious because it usually involves double-checking at every step that the model produces identical outputs for a given input. An additional challenge when transferring weights is that models are fragile and minor differences might have large effects on the predictions (even though, if you trained from scratch, you might get similar results).
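As a minimal sketch of that "port a layer, then check the outputs match" workflow, here is a single linear layer ported from PyTorch to Keras and verified on a random input (purely illustrative, not code from any of the repos discussed here):
```
# Illustrative only: port one PyTorch Linear layer to Keras and verify outputs.
import numpy as np
import torch
import tensorflow as tf

torch_layer = torch.nn.Linear(in_features=8, out_features=4)

keras_layer = tf.keras.layers.Dense(4)
keras_layer.build((None, 8))
# Keras stores the kernel as (in, out), PyTorch as (out, in), hence the transpose.
keras_layer.set_weights([
    torch_layer.weight.detach().numpy().T,  # kernel
    torch_layer.bias.detach().numpy(),      # bias
])

# Verify both implementations agree on the same random input.
x = np.random.randn(2, 8).astype(np.float32)
out_torch = torch_layer(torch.from_numpy(x)).detach().numpy()
out_keras = keras_layer(tf.constant(x)).numpy()
print("max abs diff:", np.abs(out_torch - out_keras).max())  # expect ~1e-6
```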
Also for deployment, the less cruft in the repository the better. A lot of research repositories end up pulling in all kinds of crazy dependencies to do evaluation, multiple big frameworks etc.
I don't understand why execution of a model with the same layers and weights would be different between PyTorch and Tensorflow.
Is it a problem of accumulation of floating-point errors in operations that are done in a different order and with different kinds of arithmetic optimisations (so that they would be identical if they used un-optimised symbolic operations), or is there something else in the implementation of a neural network that I'm missing?
In principle you can directly replace function calls with their equivalents between frameworks, this works fine for common layers. I've done this for models that were trained in PyTorch that we needed to run on an EdgeTPU. Re-writing in Keras and writing a weight loader was much easier than PyTorch > ONNX > TF > TFLite.
Arithmetic differences do happen between equivalent ops, but I've not found that to be a significant issue. I was converting a UNet and the difference in outputs for a random input was at most O(1e-4), which was fine for what we were doing. It's more tedious than anything else. Occasionally you'll run into something that seems like it should be a find-and-replace, but it doesn't work because some operation doesn't exist, or some operation doesn't work quite the same way.
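For what it's worth, the ONNX leg of that pipeline is typically a trace-based export, which is exactly where the dummy-input limitation mentioned above comes from; roughly (the model here is a stand-in, not anything from this thread):
```
# Rough sketch of a trace-based PyTorch -> ONNX export. The exporter only sees
# the ops executed for this particular dummy input, which is why data-dependent
# control flow tends to break it.
import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(8, 10),
).eval()

dummy = torch.randn(1, 3, 64, 64)  # placeholder input shape
torch.onnx.export(model, dummy, "model.onnx", opset_version=13)
```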
It's just that expressing those "layers and weights" in code is different in TensorFlow and PyTorch. I think a good parallel would be expressing some algorithm in two programming languages. The algorithm might be identical, but JS uses `list.map(...)` and Python uses `map(f, list)`, JS doesn't have priority queues in the "standard lib" while Python does, etc. Similarly, the low-level ops and higher-level abstractions are (slightly) different in PyTorch and TensorFlow.
I'm not too familiar with Tensorflow, so I can't give an example there, but a similar issue I recently faced when converting a model from Pytorch to ONNX is that Pytorch has a builtin discrete fourier transform (DFT) operation, while ONNX doesn't (yet. They're adding it). So I had to express a DFT in terms of other ONNX primitives, which took time.
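To give a flavour of that kind of workaround (a generic illustration, not the actual rewrite I used): a 1-D DFT can be written as a plain matrix multiplication, which every framework and ONNX opset supports.
```
# A DFT expressed as two real-valued matmuls, compared against torch.fft.
import numpy as np
import torch

n = 16
k = np.arange(n)
omega = np.exp(-2j * np.pi * np.outer(k, k) / n)  # dense n x n DFT matrix

signal = torch.randn(n)
re = torch.from_numpy(omega.real).float() @ signal
im = torch.from_numpy(omega.imag).float() @ signal

ref = torch.fft.fft(signal)
print(torch.allclose(re, ref.real, atol=1e-4), torch.allclose(im, ref.imag, atol=1e-4))
```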
In principle all operations can be translated between frameworks, even if some ops aren't implemented in one or the other. This, however, depends on whether the translation software supports graph rewriting for such nodes.
Lambdas and other custom code are also problematic, as their code isn't necessarily stored within the graph.
Seems like it'll be a serious issue for people hoping we can someday upload human brains into machines if we can't even transfer models from TensorFlow to PyTorch reliably.
Unrelated problems, really. Having written such a translation library, I can say with confidence that the only reason for this is lack of interest in it.
Graph to graph conversion can be tricky due to subtle differences in implementation (even between different versions of the same framework), but it's perfectly possible, though not many utilities go all the way to graph rewriting if required.
Problems arise with custom layers and lambdas, which are not always serialised with the graph depending on the format.
Human brains have high degrees of plasticity -- our brain is much more generic than its usual functional organization would suggest. I don't think we'd be able to upload brains ("state vectors" was the sci-fi buzzword) before digital supports are able to emulate that.
They converted the original JAX weights to the format that Pytorch uses. Because JAX is still fairly new, it can be a lot easier to get Pytorch to run on e.g. CPU. I do find the number of upvotes interesting and I imagine many people just upvote things that have DALLE in the title, to a degree.
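Roughly, such a conversion amounts to flattening the nested flax params tree into name/tensor pairs for a PyTorch state_dict. A toy sketch (the real params come from the downloaded msgpack checkpoint; the names and kernel-transpose convention here are only illustrative):
```
# Toy sketch: flatten a flax-style params dict into a PyTorch state_dict.
import numpy as np
import torch

def flatten_params(tree, prefix=""):
    """Flatten a nested dict of arrays into {dotted.name: array}."""
    flat = {}
    for key, value in tree.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = np.asarray(value)
    return flat

# Stand-in for the real checkpoint contents.
jax_params = {"encoder": {"dense": {"kernel": np.ones((8, 4)), "bias": np.zeros(4)}}}

state_dict = {
    # flax Dense kernels are (in, out); PyTorch Linear weights are (out, in).
    name: torch.from_numpy(np.ascontiguousarray(arr.T if name.endswith("kernel") else arr))
    for name, arr in flatten_params(jax_params).items()
}
print(list(state_dict.keys()))
```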
Look how much easier it is to install and run; people are interested in, and upvoting, the result, not how much work was (or wasn't) required to achieve it.
> RuntimeError: This version of jaxlib was built using AVX instructions, which your CPU and/or operating system do not support. You may be able work around this issue by building jaxlib from source.
Unfortunately, following the instructions to build jaxlib from source (https://jax.readthedocs.io/en/latest/developer.html#building...) results in several 404 Not Found errors, which later cause the build to stop when it tries to do something with the non-existent files.
Unfortunately, it looks like I won't be running this today.
I have a 2019 MacBook Pro, 2.4GHz quad-core i5, 8GB RAM, with an Intel graphics card.
python3 image_from_text.py --text='a happy giraffe eating the world' --seed=7 154.61s user 22.18s system 262% cpu 1:07.40 total
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
As you can see, it took 1 minute 7 seconds to complete.
I assume it would be much faster with a grunty graphics card.
Interesting when testing with inputs like "Oscar Wilde photo" or "marilyn monroe photo" and comparing to a Google image search. After some iterations we can have quite similar images but the faces are always blurry.
Transformers output fixed-length sequences. For this model the sequence is 256 "image tokens", arranged in a 16-by-16 grid, each of which decodes to a 16-by-16 pixel patch, giving the 256x256 output.
You can technically increase or decrease this, or use a different aspect ratio, by using more or fewer image tokens, but this is fixed once you start training. It will also require more "decodes" from the backbone VQGAN model (responsible for converting between pixels and image tokens), and thus take longer to run inference on.
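As back-of-the-envelope arithmetic for that trade-off (the grid and patch sizes match the figures above, but treat them as illustrative):
```
grid = 16   # image tokens per side
patch = 16  # pixels each token decodes to
print(grid * grid, "tokens ->", grid * patch, "x", grid * patch, "pixels")    # 256 tokens -> 256 x 256
# Doubling the resolution at the same patch size quadruples the sequence length
# the transformer has to generate, hence slower inference.
print((2 * grid) ** 2, "tokens ->", 2 * grid * patch, "x", 2 * grid * patch)  # 1024 tokens -> 512 x 512
```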
CLIP-guided VQGAN can get around this by taking the average CLIP score over multiple "cutouts" of the whole image allowing for a broad range of resolutions and aspect ratios.
It's already being scaled up to 256x256 from something smaller anyway. You could add an extra upscaler to go further which I've tried with moderate success, but you're basically doing CSI style 'enhance' over and over.
The Google Colab link works if you replace the computed path to flax_model.msgpack on line 10 in load_params.py with ‘/content/pretrained/vqgan/flax_model.msgpack’.
Edit: actually it's easier to open a terminal and move /content/pretrained/vqgan to /content/min-dalle/pretrained/vqgan
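If it helps, the same move can be done from a Colab cell (paths taken from the comment above; adjust them if your clone lives elsewhere):
```
import os
import shutil

os.makedirs("/content/min-dalle/pretrained", exist_ok=True)
shutil.move("/content/pretrained/vqgan", "/content/min-dalle/pretrained/vqgan")
```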
I couldn't get this to pick up my graphics card when running it with WSL 2; it just says no CUDA devices found or something, so I gave up. Not sure if anyone has had any luck.
Not sure why this is downvoted: seems like the inevitable endpoint of AI vision/image generation research and it warrants consideration and discussion.
I tried a few non-descriptive statements from random tweets. As it turns out, nobody's made a random tweet since 2016, but for the few that exist, the results are great. E.g. "Good Morning Everyone , Happy Nice Day :D" generates something that can only be described as bored-ape meets Picasso in kindergarten. Probably the next-gen 1M$ NFT. If anybody needs proof that these models don't think, this is it.
Alternatively, you could try to hide the fact that something was created by an AI, and probably get away with it, but that isn't relevant to the legal question of whether something can be copyrighted when it is known to be created by an AI.
> But copyright law only protects “the fruits of intellectual labor” that “are founded in the creative powers of the [human] mind.” COMPENDIUM (THIRD) § 306 (quoting Trade-Mark Cases, 100 U.S. 82, 94 (1879)); see also COMPENDIUM (THIRD) § 313.2 (the Office will not register works “produced by a machine or mere mechanical process” that operates “without any creative input or intervention from a human author” because, under the statute, “a work must be created by a human being”). So Thaler must either provide evidence that the Work is the product of human authorship or convince the Office to depart from a century of copyright jurisprudence.
Please cite a source that confirms your claim, rather than stating it as a fact without evidence. You may be right, but I have no way of confirming that given the content of your comment.
The report I’ve cited makes a compelling argument against your claim, and several prominent organization’s copyright policies align with it.
Your quote says “without any creative input or intervention from a human author”.
Just hitting a "Generate random artwork" button certainly doesn't seem to qualify as "creative input or intervention", but as for how DALL-E and consorts currently work, I'd say that coming up with a suitable prompt text, potentially refining it to get the output closer to what you want, curating the output, maybe using one of the output pictures as input for further processing, etc., arguably all count as at least some amount of "creative input or intervention from a human author".
Is Dalle significantly different from Adobe Photoshop in the eyes of the law? In both cases you will use a software agent to create art. In fact CGI art has existed for decades. Surely this is a settled question
This stuff is so cool and it makes me happy that we're democratizing artistic ability. But I can't help but think RIP to all of the freelance artists out there. As these models become more mainstream and more advanced, that industry is going to be decimated.
Translators at least have official documents (aka the only times I see a translator in my life across 3 countries) because the government is retarded and needs someone with a title to translate "Name" and "Surname" on a birth certificate.
There is no equivalent for illustrators.
My friends who studied some specific language are all unemployed or doing unqualified jobs. Their peers from a generation before are teachers or work in some embassy.
That said, before some unicorn really starts doing some serious polishing, you'll still want some illustrators to piece art together. Taking the output of these models won't deliver a ready-made product easily.
I will rephrase it: if people today are willing to eat dirt instead of nourishment (and so on, for innumerable instances), to be contented with a lack of quality (with the accompanying acceptance of a consequent decline in the general perception of quality), the fault lies more in decadence than in the instruments.
You need well-cultivated intelligence to obtain a good product: if "anything goes" is the motto, if "cheap" is the "mandate", there lies the issue.
Only in the same manner that GPT-3 eliminates the need for writers. Or influencers remove the need for advertising.
That is, a surface-level view might show these things as equivalent, but the skills required to produce a decent result are not encapsulated in the averages that models contain.
I'm sure a lot of "content writers" for SEO spam will become obsolete. The content level is already rock bottom, so it is easily replaced by brainless machines.
But I'm more bothered by the societal effects where art is automated. I believe it'll expedite the effects we saw when the internet short-circuited the feedback loop for creators, killing any gaps where a non-revenue-optimizing, humane creative force could thrive. Not to mention the crazy mimetic positive feedback loops tearing the discourse apart.
I dunno. I've read a lot of GPT output. It lacks a certain consistency over medium scales. The big picture checks out, and the word-by-word grammar checks out, but the sentence-by-sentence information often isn't cohesive, or certain entity references change over time.
Text-to-image algos did the same thing for a while, but you look at the latest full-size DALL-E and it's pretty much flawless.
If I were considering art school, I'd certainly be reconsidering my options. Maybe there are some defects in the output, but nothing photoshop can't fix.
I think where humans win out (for now) is where a high degree of specificity/precision is needed (e.g. graphic design). Or certain legal requirements are present - AI art can't be copyrighted at this time - such as logo design.
Or most places where art is displayed and/or sold, because those places generally disparage purely digital art, and method is part of what goes into the valuation of the piece. "Oil on canvas" is worth more than "AI-assisted digital print", especially because duplicating it requires considerable effort.
I really don’t think that this will be the case anytime soon. Images can be generated from zany prompts, but making a coherent, fits-together-well set of images for a product like a web page or an illustrated book is far off.
Further, artists have a host of skills that DALL-E doesn’t, like “take that image, but change the colors a bit to make it more acceptable to the client, and move the cartoon bird a little further down”. Or “make an image that will look as good in a print as it does on a small screen”.
"an illustrated book is far off".
Hi, just to mention that I'm using DALL-E Mini for graphic novel experiments... Indeed not really human quality, but ... https://twitter.com/Dbddv01