Fidelity on the output isn't great, but the coherence (assuming the examples weren't massively cherry-picked) seems very good. Given the number of parameters, this should be able to run on end-user machines, and in theory it could be fine-tuned to produce better-looking output than Stable Diffusion et al.
What this model does more than anything else is demonstrate we're still in the early stages of generative models, and we can expect a lot of progress from architectural improvements over the next decade (in addition to the progress in compute and data that we're already counting on).
It'd be interesting to see some results where the training set has higher artistic quality (and how this model influences the "house style"). The output does not look great when compared to what other (trained) models deliver.
But the promise of a big efficiency gain will be an incentive for companies like Midjourney to give it a go with their data.
More amazement. I wonder where this field will end up. Cute animal and nature images are nice but have limited real-life use (I mean, we have to accept that visual media ends after everyone can be an artist). I wonder when we'll start interfacing language models with robotics to do some real-life work.
I don't think we have to accept the idea that "visual media ends after everyone can be an artist." On the contrary: such a scenario may well make visual media even more relevant than it is today. Already we communicate concepts and emotions with gifs, icons, memes...
This can go in any number of fantastical directions. But visual media both as a private/personal medium and salve as well as an enterprise-grade tool of mass entertainment and propaganda? Baby, we're just getting started!
> Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations;
Am I wrong or is that the same architecture as DALL-E 1?
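Not quite: DALL-E 1 also used discrete image tokens, but it decoded them autoregressively, one per forward pass, while Muse uses parallel masked decoding that fills in many tokens per pass. A toy step-count comparison (the grid size and unmasking fraction below are illustrative assumptions, not numbers from the paper):

```python
def autoregressive_steps(num_tokens):
    # DALL-E 1 style: one token generated per forward pass.
    return num_tokens

def parallel_masked_steps(num_tokens, per_step_fraction=0.25):
    # Muse-style: each pass predicts all masked tokens and keeps
    # a fraction of them, so the grid fills in over a few passes.
    steps, remaining = 0, num_tokens
    while remaining > 0:
        remaining -= max(1, int(remaining * per_step_fraction))
        steps += 1
    return steps

tokens = 16 * 16  # hypothetical 16x16 grid of image tokens
print(autoregressive_steps(tokens))   # 256 passes
print(parallel_masked_steps(tokens))  # an order of magnitude fewer
```

Pixel-space diffusion models like Imagen additionally pay for hundreds of denoising steps over full-resolution pixels, which is the other half of the efficiency claim in the quote.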
Most Google ML work is just papers; correct me if I'm wrong. Some models have made their way to Hugging Face, like T5, but I don't think any have a web interface.