Hacker News
Nvidia releases Paint me Picture – A web app for GauGAN2 (nvidia.com)
167 points by nitinreddy88 11 days ago | 90 comments

I really don't have anything constructive to say. I think in general we're getting too soft on shitty things, so I'm going to be harsh.

I clicked through to the demo site ( http://gaugan.org/gaugan2/ ) and it was horrible.

The interface is clunky, slow, and confusing. I actually had to zoom out in my browser to see the whole thing. I had to click through a non-HTTPS warning. The onboarding tutorial is pretty bad.

I got a generic picture of the milky way for any prompt I tried ("rocks", "trees"). If you press Enter in the prompt field it refreshes the page.

This feels like a hackathon front-end hooked up to an intro-to-PyTorch web service. It's only neat because, unlike the other 20 copies of this same project I've seen, it was the only one that didn't immediately throw its hands up and say "server overloaded, please wait."

If I'm meant to be impressed or fearful of "big data deepfake AI," this isn't it.

[0] https://imgur.com/BNLDt6A

Your comment is a really good example of how startups can fail.

Here we have a web app that does something very clever, built by great devs, that fails because it doesn't work the way users expect it to.

So many highly intelligent, super technical founders believe that their amazing tech will sell itself, so they don't put the necessary time and effort into design, UX, or marketing, and they fail because their UI didn't make it clear what to do. It probably works brilliantly when they demo it to people, with the authors driving it or helping users get the most from it. But when users have to use it without that help... it fails.

The lesson for founders here is simple - test your UX, because you won't get a second chance with most customers.

The app I used worked nothing like the slick demo in the video. In fact, the UX and UI are some of the worst I have used in recent memory.

No matter what some backend folks believe, there will always be a need for highly skilled front-end engineers who can put together web apps in a way where the interface just 'gets out of the way' so you can focus on the actual utility.

"The app I used worked nothing like the slick demo in the video."

This point is very important, and I hope not to make the same mistake.

I also watched the video and saw what was happening there; it looked nice, but trying it for myself absolutely did not work as expected.

In other words, if there is a simple video with features shown - then trying it out needs to be as simple as the video, or it causes lots of frustration.

First-time users do not want to deal with setup, configs, etc. first. That is something you want later.

After a sibling commenter very patiently pointed out that I was holding it wrong - I would encourage taking another look at this project if only to try out the Painting mode.

It does produce some very cool results: https://imgur.com/LQuo4UM

Had the press release/tutorial emphasized this angle instead of the wonky text-to-image thing, my initial impression would have been a lot better. This is genuinely a really neat feature. All my UI and discoverability criticism stands though!

The text-to-image thing is cool as well, as long as you can a) figure out how it works and b) only enter landscape terms.

This comment explains how to use it: https://news.ycombinator.com/item?id=29338213

I managed to produce a landscape with "canada fall colors red yellow green and bright blue sunny sky".

No matter what variations I tried, the sky was cloudy and dark. The trees, however, were majestic. The UI does suck.

Your query worked for me on first try: https://imgur.com/a/iTKwZ3v

I tested it again and it worked. Not sure what I was doing wrong the first few tries.

Type "cat" into the text box and see a wonderful variety of landscapes that look like they are straight from the surface of some plane of hell. Or furry flowers. Or fruit with eyes. I was expecting "lion in the savannah" type pics, not a visualisation of my DnD group's "baleful polymorph" spell...

Yes, whatever I type I get some abomination in the general settings I asked for (hills, mountains, ocean.) And I agree that the UX is also horrible. I want the webapp they are showing in the video, not the one they have online.

>I got a generic picture of the milky way for any prompt I tried ("rocks", "trees").

The picture is just really zoomed out. That's how awesome it is - it shows you ALL rocks and trees.


What if this is done on purpose? If they made the UX too easy, this tech is so impressive that it would be shared across the general population and the service would quickly get overloaded. This way, only those who truly have the patience to fiddle with the controls can get it working.

It looks like you have selected for it to use a segmentation mask, and not to use text.

I have no idea what that means, but the fact that it both told me to enter a text prompt and actually let me do it while not being in whatever magical mode it should have been in in order to actually use the text prompt is another point that can be added to my above rant.

Alright, I've uttered the incantation for it to do the thing. I still don't get it. [0] https://imgur.com/4zbaiH0

I also tried another example prompt, which bore a striking similarity to the previous result. I don't know if it's persisting the result (it shouldn't be - I didn't click the re-use image button), but the strange life-raft-looking artifact is very persistent. [1] https://imgur.com/KTCM4xH

Yeah, the text-to-image seems to be highly dependent on whether the generator knows how to generate the specific objects the text model thinks should be in the image. I got much more consistent results using the semantic segmentation drawing as input:


(and for what it's worth, you're totally right that the UI is just an absolute disaster)

The picture you drew and had it turn into rocks is actually really cool!

I think I would have been more generous to the project had I known it could do that. Maybe I X'd out of the frustrating tutorial too early? :)

Literally the first step after accepting the TOS:


Yes, but the selection is lost if you press Enter out of habit. I had the same frustration until I realized what had happened, which turned the frustration into anger :)

Maybe I'm just too old for tech demos.

Nah, the UI is not very intuitive, so there's plenty of blame to go around :-)

Wow, you weren't kidding. What an awful interface.

Hackernews has never gone soft on anything. Quite the opposite.

Yeah, Hacker News is almost universally critical. Even his comment is needlessly critical. Nvidia isn't positioning this as a polished consumer app. It's a demo! The URL is https://www.nvidia.com/en-us/research/ai-demos/

For a research demo the UI is extremely polished. It even has an interactive tutorial!! Some people...

I worked in research in human-computer interaction; this is a very poor UI even in a research setting. The technology is impressive, though.

Your parent comment didn't mean that this was a good UI for human-computer interaction research, he meant that it was a good UI considering it was probably just a quick demo built by the scientists who did the research to showcase their work.

I work in a research group, and most of my colleagues would have a hard time making an interface like this one, no matter how confusing. They are ML/DL scientists and have usually had absolutely zero exposure to frontend.

Yes, not every researcher is good at frontend development, but my research group wouldn't promote such a broken UI until someone fixed it at least a little bit.

> If I'm meant to be impressed or fearful of "big data deepfake AI," this isn't it.

Let me put my tinfoil hat on. Maybe the idea is to normalize it and make it less scary. Look at the press releases. The general response was not a horrified 'omg, this tech has gone berserk' but 'oh, it's a benign lil biddy'.

Tinfoil off.

The interface is absolutely atrocious.

I bailed on it also, with the same experience.

A barrage of obtrusive prompts. When it wanted me to enter a phrase I typed "sunset over a field" and got a picture of the Milky Way galaxy also, followed by more prompts.

I then closed the tab and came here to read the comments.

It just did absolutely nothing for me. I typed in words and hit Enter - just a green screen.

I'll say something constructive. While the text to image functionality is terrible, the segmentation based on arbitrarily drawn images is kind of fun. They should have just gone with the latter.

This is a demo piece not a paid app. Nvidia has zero interest in selling web apps. This is just a marketing stunt to show off their capabilities so they can sell hardware.

I don't know how to make sense of the ToS for projects like this. They all have clauses that let them modify their terms later, for example.

> I got a generic picture of the milky way for any prompt I tried

You need to uncheck "segmentation" and check "text".

> If you press Enter in the prompt field it refreshes the page.

That's just default browser behaviour for forms.
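Right, and it's avoidable. As a minimal sketch of how a demo like this could suppress that default behaviour (all names here are illustrative, not Nvidia's actual code): a submit handler that cancels the browser's default form submission, so Enter no longer reloads the page.

```typescript
// Hypothetical handler: cancel the default form submission so that
// pressing Enter in the prompt field does not reload the page.
interface SubmitLikeEvent {
  preventDefault(): void;
}

function onPromptSubmit(event: SubmitLikeEvent): boolean {
  event.preventDefault(); // keep input text and checkbox state intact
  // ...kick off image generation asynchronously here instead...
  return false;           // also blocks legacy inline onsubmit handlers
}
```

Wired up with something like `form.addEventListener("submit", onPromptSubmit)`, Enter would trigger generation instead of a page refresh.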

I'm still clicking on the next button.

I am just getting very weird results that don't look at all like the ones in the demo video. Here, for example, is the image it gave me for "car in front of house": https://i.imgur.com/QdtrtCR.png

Or how about this one for "dog playing with ball" https://i.imgur.com/ldGLdwF.png

I have tried about a dozen different input phrases and every time I get these very strange results.

Yeah, it can draw anything you want, as long as what you want is mountains, trees and lakes.

Cars and houses don't sound like landscapes. Maybe they should add a filter for non-landscape input.

I missed the part where it said it was trained only on landscapes. So I retried it with just those and got this:

"river flowing through desert": https://i.imgur.com/QSjH5hk.png

"sunset waterfall": https://i.imgur.com/wS3EEci.png

Or how about a lovely "green shoreline": https://i.imgur.com/RkbTV99.png

The 2nd one is pretty dope. The 3rd one is rather uncanny to me, though.

Well, you did better than I did. I only get pictures of stars and galaxies regardless of what I input, even restricting it to landscapey words.

Those are at least weird and interesting. I tried 'dog running by a river' and it just kept giving me astronomical images ¯\(°_o)/¯

Make sure "Input utilization" is set to "Text" if you're entering a text prompt.

what? are you saying they showcased the best results? that's unheard of.

It's not just a matter of showcasing the best results. It's a matter of night and day difference between normal output and the one showcased. They are not even remotely close. I actually wonder why they decided to release this to the public in its current form. I was very impressed by the noise suppression app they released so I expected something that delivers decent results.

The first image would make an excellent /r/writingprompt

You broke it!!

I wonder if in a decade or so, large tech companies will unseat Disney, Warner Brothers etc as creators of animation movies.

While the results on Nvidia's website aren't too impressive, if you look at the history of animated movies [1], one can see how trivial and simplistic the art and animation were.

Having had some experience doing research on GANs at university, I know them to be very powerful. What's very important to note is that the images generated by the model are truly "novel", i.e. completely fictitious. The generated images may be biased toward some of the training data, such as the color and texture of the water and rocks, but every image is a fantasy of the model. The only way the model can generate such realistic images is that it has a very good abstract internal representation of what oceans, waves, and rocks are.
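That "fantasy of the model" intuition comes from the adversarial objective itself. As a toy sketch (every function and number below is purely illustrative and has nothing to do with GauGAN2's actual architecture): the discriminator is trained to score real samples high and generated ones low, while the generator is trained to push its samples' scores up, so it can only win by capturing the structure of the data rather than memorizing it.

```typescript
// Toy sketch of the two adversarial loss terms in a GAN.
// "discriminator" and "generator" are stand-ins, not real networks.
const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));
const mean = (xs: number[]): number =>
  xs.reduce((a, b) => a + b, 0) / xs.length;

// Stand-in discriminator: probability that a sample is real.
const discriminator = (x: number): number => sigmoid(2 * x);
// Stand-in generator: maps latent noise z to a sample.
const generator = (z: number): number => 0.5 * z;

const real = [0.8, 1.1, 0.9, 1.3];                  // "real" data points
const fake = [-0.4, 0.2, 0.1, -0.1].map(generator); // generated samples

// Discriminator loss: wants D(real) -> 1 and D(fake) -> 0.
const dLoss =
  -mean(real.map((x) => Math.log(discriminator(x)))) -
  mean(fake.map((x) => Math.log(1 - discriminator(x))));
// Generator loss: wants D(fake) -> 1 (the "non-saturating" form).
const gLoss = -mean(fake.map((x) => Math.log(discriminator(x))));
```

Minimizing these two losses against each other is what forces the generator toward an internal representation of the data instead of a lookup table of training images.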

Back at university, I pitched my professor the idea of using GANs to generate "novel" images in real time while parents read bedtime stories to children. I didn't get very far. Glad to see some real progress in that direction.

[1] https://www.filmsite.org/animatedfilms.html

Close, but it will actually turn out to be a very small company (possibly in less than a decade).

Hollywood has little awareness of just how much danger the legacy version of their industry is in.

ML generated assets are slowly creeping towards reality and at the same time doing 3D dev is 100-1000x easier than it was just a few years ago. It's now possible to do for free in many cases as well.

Hardly. Disney now owns Pixar after the early 3D competition days; they can just as easily buy another concurrent.

That’s “competitor”.

Right, thanks.

This seems to be the link: http://gaugan.org/gaugan2/

A tip for people who get lost in the interface:

  1. Just close the tutorial
  2. Scroll down to the ToS checkbox and check that
  3. On top in the "input utilization" row make sure only text is checked
  4. Enter your text (use only landscape terms)
  5. Press the button with the right arrow inside a rectangle, located below the text input.
  6. Maybe zoom out a bit, because the result will be the image on the right, which for me was out of view by default. I had to zoom to 50% to see the whole UI.

Also, definitely don't hit Enter like you do in every other form, because that seems to clear your input, swap the "input utilisation" back to only segmentation, and sometimes also uncheck the ToS checkbox.

I only said lakes, mountains etc, and only got galaxy/star pictures...


You have to

1. uncheck "segmentation"

2. check "text"


Thanks! This UI is something else...

I finally got through to the demo through three links, and it's so busted in so many ways for me that I give up. Maybe it's stupid to try with my old netbook, but I get no indication of whether I need a fancy graphics card for it to work or whether it's running on my end. Anyway:

- The screen zooms around disorientingly during the tutorial, and when I get to "congratulations, you made your first image" there's nothing there.

- Exiting the tutorial, checking 'text' instead of 'segmentation' just immediately switches back after entry

- The whole site is a fixed width which is wider than my screen

- A red alert checkbox at the bottom confuses me about whether that's why it's not working, etc.

> Exiting the tutorial, checking 'text' instead of 'segmentation' just immediately switches back after entry

That tripped me up too. After typing in your text, don't hit enter or anything similar, just click or tap the button with the right arrow (or anything to the right of that button, effects vary).

More or less the same, and I have a nearly-new M1 iMac, so it's not your old netbook.

Chrome appeared to work somewhat better than Brave, but it was still pretty frustrating.

I did manage to get it to work in a half-assed manner eventually, but the UI definitely needs a great deal of work.

The UI is horrible... Does Nvidia not have a single UX person with two hours to spare who could help out?

I entered "kitten" and got typical surreal GAN output with disconnected topology, dozens of eyes, etc.

Edit: Looks like it was only trained on landscape images.

I like what it did with colorless green ideas sleeping furiously. It fits the mood of the sentence.


I entered "mountains" and "mountains and lake" and only got pictures of what looked like blurred galaxies and stars. Clicking any of the style buttons got me colored/tinted pictures of stars. Is it broken?


Your segmentation map contains "sky" by default. Try drawing "mountain" color in the lower part of the image first.

MtG dual-color lands serve as a good source of ideas on what to put in the textbox:


As for the UI: layout via tables?

Feeding all magic cards into a GAN would be a great way to generate new ones.

I am impressed. Obviously it's still a tech demo, but I like the future. Text-to-image isn't awesome, but segmentation worked beautifully.


The comments are overwhelmingly critical of the user interface, which is undoubtedly the weak part of this release, but I was still able to get some very impressive results.

An AI generated house on a lake: https://imgur.com/a/0wtVKum

I have found the best results come from uploading an image, then using the demo tools to get a segmentation map and sketch lines, then editing those as you desire. Changing the styling at the end also makes a big difference!

I input "a horse" and it gives me this: https://imgur.com/1coGQix

Behold, AI. I think our dev jobs are safe for the next 100 years at least.

I sit here quite impressed with my pseudo-'50s sci-fi book cover.


However... this gallery on imgur gives a better idea of its capability: https://imgur.com/gallery/coWN44P

The UI for the demo is atrocious, but that's probably because the text-to-image generation was glued to their existing AI painting tool.

I'd love for just the algorithm generation tool to be available for download. The web UI is clunky and just doesn't seem to work right.

After trying for 30 minutes I kind of understood some stuff, but the UI is a legit anxiety inducer. I hope they can fix it to make it fun. It currently felt like using '80s DOS graphics software, with so much manual input.

Chose building -> house, segmentation and text, wrote "skyscraper" as the text, and drew some lines of a silhouette of a skyscraper.

Returned an image of stars in outer space.

Trying to do everything that the tutorial and other commenters have suggested, but when I click the arrow button it just waits for a few moments and nothing is generated. Am I missing something?

Are there any open-source models that can do a similar type of landscape generation? I would really like to look at the code and try to understand how these things are built...

This might be a good start: https://thisbeachdoesnotexist.com

There is a whole series of "this<blank>doesnotexist" sites, e.g. landscapes, faces, animals, etc.

Nvidia should visualize a periodic 24-hour 3D landscape sweep of the chatter on social media platforms, for metaverse dive-throughs and interactive engagement.

Wow. I have a good degree of respect for Nvidia, but this should never have been released in the state it's in. Who's the product manager for this?

Link to actual demo: http://gaugan.org/gaugan2/

It seems that the web UI was generated by AI too, because it's really hard to make sense of it.

What license are the produced images under? I could see this being used for cheap stock photos

Now it's no longer just imagination if you can create visual art using text.

To whomever came up with this name, good job

This sounds quite ironic to me, since "Super-GAU" in German stands for a disaster beyond all expectations (usually meltdown of a nuclear reactor).


Another unfortunate name: "Gauleiter" was a regional leader of the Nazi Party.


"gaugan" sounds like the word for "rag" in Polish.


I hear they have a miraculous new AI tool that magically determines your sexual desires then uses lasers to induce those feelings through your eyeballs with no contact necessary! Coincidentally demonstrated at this very same URL!

Even then, I don't think I'd care enough to fight through the layers of bullshit here.
