GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds (nvlabs.github.io)
350 points by lnyan 25 days ago | 60 comments

Take this idea, and apply it to Google Earth. Have it procedurally generate the rest of the data as I zoom in.

It's incredible I can check out my small home town on Google Earth now entirely in 3D (which wasn't the case just a few months ago). Yet the trees/cars are still these blocky low resolution things sticking out of the ground. Imagine Google Earth procedurally generating a high resolution mesh as I zoomed in. Train it with high resolution photogrammetry of other similar locations for the ground truth - and let me zoom in endlessly in their VR app.

I like that idea as a separate program (or just in the VR app) but I think it would be confusing/misleading in Google Earth itself. At the very least there needs to be a clear user-facing indication as to which content is procedurally generated (i.e. fictional) versus photographed (i.e. real). Obviously there’s a grey area with image processing but I think there’s a real concern with prioritizing nice pictures over actual information.

I believe Microsoft Flight Simulator does this (and it has VR support). It gets a bit tiring to see the same trees on every place on earth, though.

Their building gen is also super nice but it really takes away from the experience when flying a known location.

Yeah, flying over the small rural farm I grew up on was disappointing, since Flight Simulator had procedurally generated a whole bunch of random buildings that made it look like some cult compound.

What would be fun is the inverse. Feed in a section of Google Earth and have it generate a Minecraft world out the other end.

It appears that someone's worked that out.


This is fantastic, good find!

It's been a couple of years, but I still like the Outerra project. And also the Bohemia Interactive VBS.

Is this what https://earth2.io is doing?

No, that's probably a scam. https://www.youtube.com/watch?v=ZaijNcRuzsQ

A crypto scam isn't what I had in mind.

Super cool.

With that said I bet this would choke on lots of actual Minecraft worlds, because people often build things using blocks where the semantics get thrown completely out the window in favor of aesthetics. Want a big mural on the wall? You're going to be building the wall itself out of differently-colored blocks of wool

Maybe they'll solve that part one day :)

Edit: That said, it could choke in some really interesting ways...

Question for the experts: Is it possible that GANs will be used for rendering video game environments in the near-ish future? This has been one of my private predictions with respect to upcoming tech, but I'd love to know if people are already thinking about this, or alternatively, why it won't happen.

It depends on how exactly you frame the question.

If you ignore the implementation, it's basically a procedural texturing technique? Those are widely used now.

If you're talking about a real time post effect, it would probably be a bit too slow for a few more years.

If you count SLAM techniques that label camera feeds for AR games, those are very close but I don't think most run at a full framerate.

Yep, sorry about that.

No problem, I thought I could help with that :)

> If you're talking about a real time post effect, it would probably be a bit too slow for a few more years.

The paper in question can apparently render at 2K and 30fps though. Or at least that’s what the videos claim.

Non-real-time, as in generating levels for your game before you release it, is doable today. Real-time will probably be doable soon, especially for shared-world games where a server can generate for multiple people at a time rather than for a single player.

Even real-time today it should be doable if you create your game with that in mind. You really don't need to generate all the textures, just a compact representation of the level which is to be rendered normally after the fact.

Artistically, developers could do some trippy dream sequences with GANs, where the glitchiness and training artifacts add to the immersion. Because one can sample GANs or mix in latent dimensions, the experience can be tailored individually based on the character's decisions, for instance.
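To make "mix in latent dimensions" a bit more concrete, here is a minimal sketch of the two usual ways of blending latent vectors before handing them to a generator. Everything here is an assumption for illustration: the function names are made up, the latent vectors are plain Python lists, and no actual generator (GANcraft's or anyone else's) is involved.

```python
import random


def sample_latent(dim, rng):
    """Draw a latent vector from a standard normal, the usual GAN prior."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def lerp_latents(z_a, z_b, t):
    """Blend two latent vectors: t=0 returns z_a, t=1 returns z_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def mix_dimensions(z_a, z_b, take_from_b):
    """Copy z_a, but swap in the listed dimensions from z_b."""
    return [b if i in take_from_b else a
            for i, (a, b) in enumerate(zip(z_a, z_b))]


if __name__ == "__main__":
    rng = random.Random(0)
    calm = sample_latent(8, rng)       # hypothetical "calm dream" style code
    nightmare = sample_latent(8, rng)  # hypothetical "nightmare" style code
    # A game could drive t from the player's choices (hypothetical mapping).
    for t in (0.0, 0.5, 1.0):
        print(t, lerp_latents(calm, nightmare, t))
```

In a game, the blend weight or the set of swapped dimensions would be whatever the designer ties to player state; the blended vector is then fed to the generator like any other sample.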

If the result is also a 3d environment, it could save a lot of time designing scenarios.

I'm not sure if this is what you're asking about, but here's a Two Minute Papers (dear fellow scholars!) video about a deep learning paper for super sampling with some applications for games: https://www.youtube.com/watch?v=OzHenjHBBds

That seems like an obvious use case.

We've had procedurally generated worlds for a long time, but this would take it from roguelike top-down or isometric to immersive fps.

It has already started with DLSS. I am not sure about Nvidia's implementation, but super resolution can involve some adversarial training.

If you mean asset / level generation, then yes. It is the next step in procedural generation imo.

DLSS is just image upscaling using a neural network. It's a very different problem from what is shown here or what I believe GP is talking about.

Personally prefer blocky Voxel Art to the photoreal scene ;)

NVidia also released their RTXDI SDK for global illumination at a scale of millions of dynamic lights in real time. Combined with GANcraft, anyone could become a world class environmental artist using only Pixel Art tools.


NVidia really likes rendering landscapes, huh? https://blogs.nvidia.com/blog/2019/03/18/gaugan-photorealist...

Landscape images are classic exemplars for texture-by-number algorithms because nature's variety means that it's easier to make them look real enough.

See the OG transfer algorithm called "Image Analogies" from decades before the GAN boom:



Yes, there is all sorts of weirdness in their rendering, but that's what you get in a research paper. Put that in the hands of actual game designers and you will have incredible possibilities.

Sounds a lot like Google's GAN for fantastical creatures[0]. Labels to photorealism seems to be the core idea behind both

[0] https://ai.googleblog.com/2020/11/using-gans-to-create-fanta...

Maybe I'm just too dumb, but I wish these papers would cut the nonsense and explain the key elements in layman's terms with simple examples. I'm super curious how you can do something like this in a fully unsupervised fashion, but the "Hybrid Voxel-conditional Neural Rendering" doesn't mean much to me. Maybe if I knew what "voxel-bounded neural radiance fields" were...

> I wish these papers would cut the nonsense and explain the key elements in layman's terms with simple examples

If papers did that, they'd be a thousand pages long. The target audience is people intimately familiar with the state of the art.

The voxel-bounded neural radiance field is important because neural radiance fields come from a prior research paper that this one builds on. But the very high level is just voxel data to image generation using some form of neural nets. I didn’t look at the paper, but I’d hope it summarizes neural radiance fields; if not, it’ll at least cite them, and then you’d read that and see how this paper extends the work.

Wow, that site and all its autoplay videos crashed my phone. I get that you want to show off the cool tech, but please don't put that many videos on autoplay.

This is what happened in your head when you played Atari games as a kid.

It looks impressive, but what exactly is the machine learning doing on the original to produce the result?

And wouldn't it be possible to simply take the original minecraft map as a height map and texture map and then regenerate a new world with the original world data and more advanced post processing? You could interpolate and randomize more detail into the scene than you started with.
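For what it's worth, the "height map and texture map" idea above can be sketched in a few lines. This is only an illustration under assumptions: the `blocks[x][z][y]` layout, the `AIR = 0` block id, and the function names are all made up here and are not Minecraft's actual save format or anything from the paper.

```python
AIR = 0  # assumed id for empty space in this toy layout


def height_map(blocks):
    """Return heights[x][z]: y index of the highest non-air block, or -1 if empty.

    blocks[x][z] is a vertical column listed from bottom (y=0) to top.
    """
    heights = []
    for row in blocks:
        out_row = []
        for column in row:
            top = -1
            for y, block_id in enumerate(column):
                if block_id != AIR:
                    top = y
            out_row.append(top)
        heights.append(out_row)
    return heights


def surface_map(blocks):
    """Return the block id at the top of each column (a crude 'texture map')."""
    heights = height_map(blocks)
    return [[blocks[x][z][h] if h >= 0 else AIR
             for z, h in enumerate(row)]
            for x, row in enumerate(heights)]


if __name__ == "__main__":
    world = [
        [[1, 1, 0], [1, 0, 0]],
        [[0, 0, 0], [2, 0, 3]],  # second column has a floating block at y=2
    ]
    print(height_map(world))
    print(surface_map(world))
```

Note what gets thrown away: the floating block in the last column hides the gap beneath it, so a pure height map cannot represent overhangs, caves, or anything else that isn't a single surface per column.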

It’s not really adding any meaningful detail per se. Where there’s a grass block it’s just rendering grass. All it is doing is projecting a stable image of “grass” (taken from a labeled image database) in that voxel.

Not to minimize the awesomeness of that... doing it stably in 3D while moving the camera is the point of this paper, and is amazing.

But it’s not really adding detail beyond “these are the kinds of pixels that grass has and the AI figured out we can put them in this arrangement without making things jumpy”

It doesn't look like it can do structures, either.

It also seems like your brain is doing most of the work here:


The renderer seems to be adding some resolution, smoothing, and mipmapping. Shaders can do the same thing, and in real time.

Modern shaders[1] do a lot, but you'll never mistake them for anything but Minecraft. They don't quite get to where this paper is demonstrating.

[1] https://www.rockpapershotgun.com/best-minecraft-shaders

Anyone doing GAN with

before: old streetview pre bikelanes

after: streetview with new bike lanes

profit: now you can see what any town would look like with complete streets. I call it Complete Street View.

Please do implement. Of course it would be dreamlike; this is a strength, as you wouldn’t want the GAN to make design recommendations, just a plausible feel.

There's probably a larger opportunity for an urban redevelopment sketching/brainstorming tool of sorts.

Civil engineers, architects and landscapers are going to have a field day with this.

I like how if you look close enough, the outlines of trees and hills still are block-ish.

The image translators work for the construct program, but there is way too much information to decode the matrix. You get used to it. I don't even see the code, all I see is blonde, brunette, redhead...

Wow! Amazing results, it's like marching cubes on an acid trip!

At the end of the paper it says that one frame takes 10 seconds to render. I wonder whether one day this method will be able to render in real time (say 30fps).
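As a back-of-the-envelope check on that gap, taking the 10 seconds per frame figure and the 30fps target at face value (both numbers are just the ones quoted in this comment; the function names are made up):

```python
import math


def required_speedup(seconds_per_frame, target_fps):
    """How many times faster a renderer must get to hit target_fps."""
    return seconds_per_frame * target_fps


def doublings_needed(speedup):
    """Number of successive 2x improvements that add up to the speedup."""
    return math.log2(speedup)


if __name__ == "__main__":
    speedup = required_speedup(10, 30)
    print(speedup)  # 300
    print(round(doublings_needed(speedup), 1))
```

So the method would need to get roughly 300 times faster, which is a bit more than eight doublings from some mix of hardware and software improvements.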

Maybe, but OTOH we have very efficient 3D rendering technology that we understand very well. If I had more compute, I'd want to raytrace everything in real-time, but I wouldn't feel the need to bring neural networks into the mix. A better use case of machine learning is probably to help procedurally generate the data to be rendered. It would be really neat to be able to turn a few photos of a real-world location into high quality 3D meshes with no gaps, for example.

This creates a realistic topography from voxels. If you could do this in realtime then you could have a game where you have the flexibility of minecraft yet the appearance of a more photorealistic game. Imagine playing a game that looked like Control except everything is destructible and constructible. It's an exciting idea.

Almost definitely. There are many ways to optimize further in software, and hardware is only getting better. I wouldn't be surprised if it's doable today with some cheating, a bit more hardware and a lot of work on optimization.

Would having a higher resolution texture pack make for better results?

It doesn't look like it uses any texture information. I think it only takes in a list of block locations and spits out a scene. I would think you would have to train it with every different combination of textures.

Heh, I wonder if we'll ever get Minecraft 2.0. I get a good chuckle out of how it just barely runs on consoles and ultrapowerful PCs yet looks so "basic".

My son and I absolutely love it and have spent months in this pandemic deep in Minecraft worlds.

That’s mostly a product of Minecraft’s technical choices. Modern computers can render axis-aligned voxel grids on the order of 1,000,000^3 (think Minecraft scale but the blocks are sub-millimeter) with PBR/GI in real time. Interactive would be another story I suppose.

The clickbait article makes people believe "you can create 3d models of ANY 2d object" but in reality, this would only come down to cars, cats and human faces. We only have so many datasets that are suitable for a GAN.

The neural net part of this seems somewhat trivial and also misapplied. This is not a realtime renderer, and I would hazard that if you gave someone who knows GLSL the task, they would produce something far and away more compelling than this, that could probably render at <1 FPS.


They would produce something which won't generalize to other types of environment without another huge load of human labor.

Your complaint could be made about just about any new technology. It's usually worse than what came before it at first, but the value is in the potential to eventually become better than what came before it.
