
NeRF: Representing scenes as neural radiance fields for view synthesis - dfield
http://www.matthewtancik.com/nerf
======
uoaei
This is absolutely stunning.

As they say in ML, representation comes first -- and this is one of the most
natural and elegant ways to represent 3D scenes and subjective viewpoints.
It's great that it plugs into a rendering pipeline such that the whole thing
is end-to-end differentiable.

This is the first leap toward true high-quality real-time ML-based rendering.
I'm blown away.
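
For the curious, the "representation" here is concretely just an MLP that maps
a 5D input (3D position plus viewing direction) to a color and a volume
density. A toy sketch of that idea (my own simplification; the actual paper
also uses positional encoding, a deeper network, and hierarchical sampling):

    import torch
    import torch.nn as nn

    # Toy stand-in for a neural radiance field: an MLP from a 3D point plus
    # a 2D viewing direction to an RGB color and a volume density.
    # Layer sizes are illustrative, not the paper's architecture.
    class TinyRadianceField(nn.Module):
        def __init__(self, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(5, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 4),  # (r, g, b, sigma)
            )

        def forward(self, xyz, view_dir):
            out = self.net(torch.cat([xyz, view_dir], dim=-1))
            rgb = torch.sigmoid(out[..., :3])   # color in [0, 1]
            sigma = torch.relu(out[..., 3:])    # non-negative density
            return rgb, sigma

Rendering a pixel then amounts to sampling this field along the camera ray and
compositing, and because that compositing is differentiable you can train
straight from photos.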

------
jayd16
Very cool. Reminds me of when I played with Google's Seurat.

The paper says it's a 5 MB network, 12 hours to train, and then 30 seconds to
render novel views of the scene on an NVIDIA V100.

Sadly not something you can use in real time, but still very cool.

Edit: 12 hours and a 5 MB NN, not 5 minutes.

~~~
ssivark
Huh, what? It needs almost a million views, and takes 1-2 days to train on a
GPU. I’m not sure where the “5 minutes” number comes from.

EDIT: I was referring to the last paragraph of section 5.3 (Implementation
details), but maybe I’m misunderstanding how they use rays / sampled
coordinates.

Very impressive visual quality. But it seems like they need a LOT of data and
computation for each scene. So it's still plausible that intelligently done
photogrammetry will beat this approach in efficiency, but a bunch of important
details need to be figured out to make that happen.

~~~
scribu
> It needs almost a million views

Not sure what you mean by "views". The comparisons in the paper use at most
100 input images per scene.

~~~
bla3
A pixel is one view for their model if I understand correctly, so one hundred
100x100 images would be a million views.
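
Back-of-the-envelope, if every pixel contributes one training ray:

    # Each pixel yields one training ray, so rays = images * height * width.
    # The 100 images at 100x100 are the toy figures from this thread, not
    # the paper's actual resolutions.
    images, height, width = 100, 100, 100
    print(images * height * width)  # 1000000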

------
lifeisstillgood
Well, that took some effort just to work out what they actually did. How they
actually did it, I have no idea. Impressive, however - a sort of
fill-in-the-blanks for the bits that are missing. If our brains _don't_ do
this, one would be surprised.

And we are all supposed to become AI developers this decade?!

Come back Visual Basic all is forgiven :-)

------
raidicy
This blows my mind. This is probably a naive thought, but this technique looks
like it could be combined with robotics to help a robot navigate through its
environment.

I'd also like to see what it does when you give it multiple views of scenes in
a video game - some taken directly from the game and some from pictures of the
monitor.

~~~
yarg
They've only shown it working with static content - they'll need to do it
with video (multiple synchronised cameras) and in real time for any robotics
application.

~~~
macawfish
It'd be interesting to see what happened if they encoded an additional time
parameter on each 'view' (input image pixel). Surely someone is already trying
to extend this technique that way.
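
One naive way to try that (purely hypothetical, nothing from the paper) would
be to append t to the 5D input, so the field becomes f(x, y, z, theta, phi, t):

    import torch

    # Hypothetical tweak: condition the radiance field on time by
    # concatenating t to the position/direction input (6D per sample).
    def make_input(xyz, view_dir, t):
        return torch.cat([xyz, view_dir, t], dim=-1)  # shape (..., 6)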

------
teknopurge
This is bad-ass, partly because it's so elegant.

------
blackhaz
Could someone ELI5, please?

~~~
mooneater
If you give it a bunch of photos of a scene from different angles, this
machine learning method lets you see angles that did not exist in the original
set.

Better results than other methods so far.

~~~
notfed
Fist bump for actually answering as ELI5 (unlike the other responses).

------
kuprel
This would be great for instant replays

~~~
jayd16
Intel already does this with their "True View" setup. They also had a tech
demo at CES where they synthesized camera positions for movie sets.
[https://www.youtube.com/watch?v=9qd276AJg-o](https://www.youtube.com/watch?v=9qd276AJg-o)

------
macawfish
The neural networks representing these scenes take up just 5 MB... Less than
the input images used to train them. Wow. Mind blowing!
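
Rough parameter-count arithmetic, assuming something like an 8-layer,
256-wide MLP stored as float32 (the real architecture differs a bit, but it's
the right order of magnitude):

    # ~8 layers of 256x256 weights plus biases, 4 bytes per float32.
    layers, width = 8, 256
    params = layers * (width * width + width)
    print(params, params * 4 / 1e6, "MB")  # ~0.5M params, ~2 MB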

~~~
BubRoss
Keep in mind, though, that the way the data is represented is a form of lossy
compression, while the images may not be.

------
byt143
If you're only looking for one novel view, can it get away with fewer input
views that are close to the novel one?

------
ssivark
Does anyone know how they do the “virtual object insertion” demonstrated in
the paper summary video? Can that somehow be done with the network itself, or
is it a diagnostic for scene accuracy, obtained by performing SfM on the
network's output?

~~~
theresistor
I'm pretty sure they're rendering a depth channel and compositing it in.
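
Something like a per-pixel depth test, if so (a toy sketch with made-up names):

    import numpy as np

    # Toy depth compositing: per pixel, keep whichever fragment is closer.
    # Inputs: HxWx3 color images and HxW depth maps.
    def composite(nerf_rgb, nerf_depth, obj_rgb, obj_depth):
        closer = (obj_depth < nerf_depth)[..., None]
        return np.where(closer, obj_rgb, nerf_rgb)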

~~~
teraflop
You could do that, but I think it's simpler to just introduce additional
objects during the raytracing process that generates the images. That would
produce accurate results even with semitransparent objects, unlike compositing
with a depth buffer.
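
Roughly: at each sample along the ray, query both the learned field and the
inserted object, add the densities, blend the colors, and composite as usual.
A toy sketch (function names are placeholders, not the authors' code):

    import numpy as np

    def render_ray(samples, deltas, scene_field, inserted_object):
        # samples: 3D points along the ray; deltas: spacing between them.
        # scene_field / inserted_object: callables returning (rgb, sigma).
        color = np.zeros(3)
        transmittance = 1.0
        for x, delta in zip(samples, deltas):
            rgb_s, sigma_s = scene_field(x)       # learned radiance field
            rgb_o, sigma_o = inserted_object(x)   # extra virtual object
            sigma = sigma_s + sigma_o
            if sigma > 0:
                rgb = (sigma_s * rgb_s + sigma_o * rgb_o) / sigma
            else:
                rgb = np.zeros(3)
            alpha = 1.0 - np.exp(-sigma * delta)
            color += transmittance * alpha * rgb
            transmittance *= 1.0 - alpha
        return color

Because occlusion falls out of the densities themselves, semitransparent parts
of the scene in front of the inserted object stay semitransparent.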

------
philip368320
I would like to see a “neural enhance” pass applied to an already rendered 3D
scene, making the changes that would make it more realistic, given a depth map
and other information as input to the neural network.
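
Concretely, I'm imagining something like an image-to-image network whose input
is the rendered RGB plus the depth buffer (purely a sketch of the idea, nothing
from this paper):

    import torch
    import torch.nn as nn

    # Hypothetical "neural enhance" pass: rendered RGB plus a depth channel
    # in, enhanced RGB out. Purely illustrative.
    enhance = nn.Sequential(
        nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 3, 3, padding=1),
    )
    frame = torch.rand(1, 4, 256, 256)  # RGB + depth, batch of 1
    better = enhance(frame)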

~~~
BubRoss
How would it be made more realistic?

------
tanilama
This is REALLY cool, but kinda makes sense as well. Neural networks are very
good at interpolation, given the right prior.

------
2OEH8eoCRo0
This is the kind of shit I come here for. Awesome post! Thanks for sharing!

------
anthk
This is like the Blade Runner in-game tool.

