I clicked through to the demo site ( http://gaugan.org/gaugan2/ ) and it was horrible.
The interface is clunky, slow, and confusing. I actually had to zoom out in my browser to see the whole thing. I had to click through a non-HTTPS warning. The onboarding tutorial is pretty bad.
I got a generic picture of the Milky Way for any prompt I tried ("rocks", "trees"). If you press Enter in the prompt field, it refreshes the page.
This feels like a hackathon front-end hooked up to an intro-to-PyTorch webservice. It's only neat because, unlike the other 20 copies of this same project I've seen, it didn't immediately throw its hands up and say "server overloaded, please wait."
If I'm meant to be impressed or fearful of "big data deepfake AI," this isn't it.
Here we have a web app that does something very clever, built by great devs, that fails because it doesn't work the way users expect it to.
So many highly intelligent, super technical founders believe that their amazing tech will sell itself, so they don't put the necessary time and effort into design, UX, or marketing, and they fail because their UI didn't make it clear what to do. It probably works brilliantly when they demo to people, with the authors driving it or helping users get the most from it. But when users have to use it without that help... it fails.
The lesson for founders here is simple - test your UX, because you won't get a second chance with most customers.
No matter what some backend folks believe, there will always be a need for highly skilled front-end engineers who can put together web apps in a way where the interface just 'gets out of the way' so you can focus on the actual utility.
This point is very important, and I hope not to make the same mistake.
I also watched the video and saw what was happening there; it looked nice, but trying it for myself absolutely did not work as expected.
In other words, if there is a simple video showing off the features, then trying it out needs to be as simple as the video, or it causes a lot of frustration.
First-time users do not want to deal with setup, configs, etc. first. That is something you want later.
It does produce some very cool results: https://imgur.com/LQuo4UM
Had the press release/tutorial emphasized this angle instead of the wonky text-to-image thing, my initial impression would have been a lot better. This is genuinely a really neat feature. All my UI and discoverability criticism stands though!
This comment explains how to use it:
No matter what variations I tried, the sky was cloudy and dark. Trees, however, were majestic. The UI does suck.
The picture is just really zoomed out. That's how awesome it is - it shows you ALL rocks and trees.
Alright, I've uttered the incantation for it to do the thing. I still don't get it.  https://imgur.com/4zbaiH0
I also tried another example prompt, which bore a striking similarity to the previous result. I don't know if it's persisting the result (it shouldn't - I didn't click the re-use image button), but the strange life-raft-looking artifact is very persistent. https://imgur.com/KTCM4xH
(and for what it's worth, you're totally right that the UI is just an absolute disaster)
I think I would have been more generous to the project had I known it could do that. Maybe I X'd out of the frustrating tutorial too early? :)
Maybe I'm just too old for tech demos.
For a research demo the UI is extremely polished. It even has an interactive tutorial!! Some people...
I work in a research group, and most of my colleagues would have a hard time making an interface like this one, no matter how confusing. They are ML/DL scientists and usually have had absolutely zero exposure to frontend work.
Let me put my tinfoil hat on. Maybe the idea is to normalize the tech and make it less scary. Look at the press releases. The general response was not a horrified 'omg, this tech has gone berserk', but 'oh, it's a benign lil biddy'.
The interface is absolutely atrocious.
A barrage of obtrusive prompts. When it wanted me to enter a phrase, I typed "sunset over a field" and also got a picture of the Milky Way galaxy, followed by more prompts.
I then closed the tab and came here to read the comments.
That's just default browser behaviour for forms: pressing Enter in a text input submits the enclosing form, which reloads the page unless the site prevents the default submit action.
Or how about this one for "dog playing with ball" https://i.imgur.com/ldGLdwF.png
I have tried about a dozen different input phrases and every time I get these very strange results.
"river flowing through desert": https://i.imgur.com/QSjH5hk.png
"sunset waterfall": https://i.imgur.com/wS3EEci.png
Or how about a lovely "green shoreline": https://i.imgur.com/RkbTV99.png
While the results on Nvidia's website aren't too impressive, if you look at the history of animated movies, you can see how trivial and simplistic the early art and animation were.
Having had some experience doing research on GANs at university, I know them to be very powerful. What's very important to note is that the images generated by the model are truly "novel", i.e. completely fictitious. The images may be biased toward some of the training data, such as the color and texture of the water and rocks, but every image is a fantasy of the model. The only way the model can generate such realistic images is because it has a very good abstract internal representation of what oceans, waves, and rocks are.
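To make the "fantasy of the model" point concrete: a GAN's generator never copies pixels from the training set; it maps random noise to an image, and a discriminator pushes those images toward the training distribution. Here's a minimal toy sketch in PyTorch - the fully-connected layers, sizes, and flattened 64x64 images are all illustrative assumptions on my part, nothing like NVIDIA's actual architecture:

    import torch
    import torch.nn as nn

    latent_dim = 128  # size of the random noise vector (arbitrary choice)

    # Generator: random noise in, flattened 64x64 RGB image out.
    generator = nn.Sequential(
        nn.Linear(latent_dim, 512),
        nn.ReLU(),
        nn.Linear(512, 64 * 64 * 3),
        nn.Tanh(),  # pixel values in [-1, 1]
    )

    # Discriminator: scores how "real" a (flattened) image looks.
    discriminator = nn.Sequential(
        nn.Linear(64 * 64 * 3, 512),
        nn.LeakyReLU(0.2),
        nn.Linear(512, 1),
    )

    loss = nn.BCEWithLogitsLoss()
    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

    def train_step(real_images: torch.Tensor) -> None:
        # One adversarial step; real_images has shape (batch, 64*64*3).
        batch = real_images.size(0)
        fake_images = generator(torch.randn(batch, latent_dim))

        # Discriminator learns to label real images 1, generated images 0.
        opt_d.zero_grad()
        d_loss = (loss(discriminator(real_images), torch.ones(batch, 1))
                  + loss(discriminator(fake_images.detach()), torch.zeros(batch, 1)))
        d_loss.backward()
        opt_d.step()

        # Generator learns to make the discriminator call its fakes real.
        opt_g.zero_grad()
        g_loss = loss(discriminator(fake_images), torch.ones(batch, 1))
        g_loss.backward()
        opt_g.step()

The training images only ever enter through the discriminator's verdict, which is why every generated image is synthesized rather than retrieved.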
Back at university, I pitched my professor the idea of using GANs to generate "novel" images in real time while parents read bedtime stories to children. I didn't get very far. Glad to see some real progress in that direction.
Hollywood has little awareness of just how much danger the legacy version of their industry is in.
ML-generated assets are slowly creeping toward reality, and at the same time doing 3D dev is 100-1000x easier than it was just a few years ago. It's now possible to do for free in many cases as well.
1. Just close the tutorial
2. Scroll down to the ToS checkbox and check that
3. At the top, in the "input utilization" row, make sure only "text" is checked
4. Enter your text (use only landscape terms)
5. Press the right-arrow button (the rectangle located below the text input).
6. Maybe zoom out a bit, because the result will be the image on the right, which for me was out of view by default. I had to zoom to 50% to see the whole UI.
1. uncheck "segmentation"
2. check "text"
- The screen zooms around disorientingly during the tutorial, and I get to "congratulations, you made your first image" - there's nothing there.
- Exiting the tutorial, checking 'text' instead of 'segmentation' just immediately switches back after entry.
- The whole site is a fixed width that is wider than my screen.
- A red alert checkbox at the bottom confuses me about whether that's why it's not working.
That tripped me up too. After typing in your text, don't hit enter or anything similar, just click or tap the button with the right arrow (or anything to the right of that button, effects vary).
Chrome appeared to work somewhat better than Brave, but it was still pretty frustrating.
I did manage to get it to work in a half-assed manner eventually, but the UI definitely needs a great deal of work.
Edit: Looks like it was only trained on landscape images.
As for the UI: layout via tables?
An AI generated house on a lake: https://imgur.com/a/0wtVKum
I have found the best results come from uploading an image, then using the demo tools to get a segmentation map and sketch lines, then editing those as you desire. Changing the styling at the end also makes a big difference!
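For anyone wondering why editing the segmentation map steers the result so directly: the original GauGAN is built on SPADE, where the map isn't just another input image - it modulates the activations of every generator layer through per-pixel scale and shift parameters. A rough sketch of that idea in PyTorch (the channel counts, the 151-class map, and the class index are made-up illustrations, not the real model's values):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SPADE(nn.Module):
        # Spatially-adaptive normalization: the segmentation map produces
        # a per-pixel scale (gamma) and shift (beta) for the features.
        def __init__(self, feature_channels: int, num_classes: int):
            super().__init__()
            self.norm = nn.BatchNorm2d(feature_channels, affine=False)
            self.shared = nn.Sequential(
                nn.Conv2d(num_classes, 64, kernel_size=3, padding=1),
                nn.ReLU(),
            )
            self.gamma = nn.Conv2d(64, feature_channels, kernel_size=3, padding=1)
            self.beta = nn.Conv2d(64, feature_channels, kernel_size=3, padding=1)

        def forward(self, x: torch.Tensor, segmap: torch.Tensor) -> torch.Tensor:
            # Resize the one-hot map to the current feature resolution.
            segmap = F.interpolate(segmap, size=x.shape[2:], mode="nearest")
            hidden = self.shared(segmap)
            return self.norm(x) * (1 + self.gamma(hidden)) + self.beta(hidden)

    # Toy usage: 256-channel features conditioned on a one-hot class map.
    features = torch.randn(1, 256, 32, 32)
    segmap = torch.zeros(1, 151, 256, 256)
    segmap[:, 3] = 1.0  # pretend class 3 is "sky" everywhere
    print(SPADE(256, 151)(features, segmap).shape)  # torch.Size([1, 256, 32, 32])

Because the map re-enters at every resolution, redrawing a region of it changes that region of the image cleanly, which matches what you see when editing the maps in the demo tools.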
However... this gallery on Imgur gives a better idea of its capability
I'd love for just the generation algorithm to be available for download. The web UI is clunky and just doesn't seem to work right.
Returned an image of stars in outer space.
There is a whole series of "this<blank>doesnotexist" sites, e.g. landscapes, faces, animals, etc.
Another unfortunate name: "Gauleiter" was a regional leader of the Nazi Party.
Even then, I don't think I'd care enough to fight through the layers of bullshit here.