Nvidia's approach to software is really interesting, and demonstrates that they have a hardware culture through and through.
They could turn Canvas into a web app and charge a monthly subscription. Alternatively they could go the OpenAI GPT-3/DALL-E-2 route and give it away as a web app or API to generate a huge potential customer list. Instead, they're only interested in technology demonstrations.
I'm not arguing that this is a good _or_ bad thing. It's just interesting to watch a company drive one of the greatest innovations humankind has ever developed (AI), yet fail to capitalize on the resulting value creation due to a hardware-focused culture.
Can I ask, was there an underlying reason that people decided to pursue this image generation task, or is this literally just the result of throwing lots of tasks at different types of AI until you finally find one it seems to do well?
I don't mean to denigrate this; the results are clearly interesting. But I just don't understand what problem it solves — it seems to raise the noise floor on reality.
There is a real use case for this type of technology.
I'm the founder of https://ayvri.com, and we have a 3D virtual world where outdoor athletes watch their activities, and the activities of others.
As the resolution (and speed) of our 3D world improved, people got more interested and engaged with it.
I believe this is the future of video. Not volumetrically created through 20+ cameras, but with a single camera capturing the scene, and AI filling in the blanks based on what it knows.
Right now, there are a whole bunch of architectures being discovered to be good for certain tasks.
At some point there will be higher-level ML research that generates an architecture for a particular task, and all of this work will feed into that.
When you look at GANs, I could see the data from this endeavor being used to improve their output. I get that it doesn't seem to have a direct application, but I suspect it is actually quite valuable to the media and entertainment space in the long run.
tl;dr - It’s a GAN; they have some interesting limitations but can output 1024px images in real time on a consumer GPU.
The training labels may have been “segmentation maps”. These are regions of an image with a known scene description such as “cloud”, “trees”, “sky”. I’m not certain what model they use, but I bet it is a Stylegan2/3 modified to generate an image from a given set of segmentation masks.
Indeed, without the research context, it’s a little unclear why you would want a product like this. Nvidia has done a lot of research to get GANs to run very fast on their RTX cards, since they are mostly convolutional and operate directly in pixel (or wavelet) space rather than an embedding space. On my RTX 2070, I can run Stylegan2 at 1024px at a somewhat reasonable 10 FPS.
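To make the "segmentation map" conditioning concrete, here is a minimal sketch of the usual preprocessing step: a user's scribble becomes a map of integer class IDs, which is one-hot encoded before being fed to a SPADE/GauGAN-style generator. The label names and class IDs here are made up for illustration; the real model's class list is an assumption.

```python
import numpy as np

# Hypothetical label palette; the actual class set used by Canvas
# is not public in this thread, so these IDs are illustrative.
LABELS = {"sky": 0, "cloud": 1, "trees": 2, "water": 3}

def one_hot_segmentation(label_map: np.ndarray, num_classes: int) -> np.ndarray:
    """Turn an (H, W) map of integer class IDs into a
    (num_classes, H, W) one-hot tensor, the typical conditioning
    input for a segmentation-map-driven generator."""
    classes = np.arange(num_classes)[:, None, None]   # (C, 1, 1)
    return (classes == label_map[None]).astype(np.float32)

# A tiny 4x4 "scribble": top half sky, bottom half water.
scribble = np.zeros((4, 4), dtype=np.int64)           # all sky
scribble[2:, :] = LABELS["water"]                     # bottom two rows water

mask = one_hot_segmentation(scribble, num_classes=len(LABELS))
print(mask.shape)       # (4, 4, 4): one channel per class
print(mask[0].sum())    # 8.0 -> eight "sky" pixels
print(mask[3].sum())    # 8.0 -> eight "water" pixels
```

The generator then maps this one-hot tensor (plus a noise vector) to an RGB image, so redrawing a region of the scribble only changes the corresponding channels of the conditioning input.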
Canvas remains my favorite way of demonstrating both the potential and the danger of AI to Boomers, parents, grandparents, and just non-techies in general.
Even the most computer illiterate of people these days are able to scribble an MS Paint landscape and have it (usually) turn into a gorgeous seascape or mountain vista.
First program/app since maybe WordLens (that old iOS real time translation overlay app) that gets consistent “wows” out of virtually everybody.